2. Design

The first four points are discussed below. Performance is addressed in detail in Section 5.

2.1 What to Change in a File System

Without changing a given Vnode interface, we have identified three items that file system developers want to inspect or manipulate: file data, names, and attributes. Changing file data is the most obvious (e.g., encryption).

Changing file names as part of, say, encryption is also desired. For example, a file system may refuse to create files that contain rarely used characters such as whitespace and other non-printable characters, or visually confusing names such as ``...'' (three dots) as these are sometimes used by intruders to obscure their tracks. More advanced use can be made by inspecting file names and selectively manipulating them. For example, a file system that adds immutable file support will need to have a list of file names to consider untouchable. Several file systems can place auxiliary information (such as access keys) in files that are hidden from normal view (beyond the obvious ``dot'' files) and are used only internally by the file system.
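As an illustration of the name-refusal policy mentioned above, the following is a minimal user-level sketch in C. The function name name_is_acceptable is hypothetical and not part of Wrapfs; the sketch assumes the default C locale, in which isprint() also rejects non-ASCII bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy check (not part of Wrapfs): returns true if a file
 * with this name may be created.  Rejects empty names, the visually
 * confusing name "...", and any name containing whitespace or
 * non-printable characters.  In the default C locale, bytes above 127
 * are also treated as non-printable. */
bool name_is_acceptable(const char *name)
{
    assert(name != NULL);

    if (name[0] == '\0' || strcmp(name, "...") == 0)
        return false;

    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++) {
        if (isspace(*p) || !isprint(*p))
            return false;
    }
    return true;
}
```

A file creation operation could consult such a check before passing the name down to the lower level file system, and fail the call if the check returns false.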

Working with file attributes promises some of the most interesting file systems, as attributes fundamentally reflect existing Unix file access control. For example, a file system can perform UID or GID mapping based on the file's ownership. It can exploit seldom used mode bits: the setuid bit on directories can be used to indicate immutable directories. Attempting to modify setuid binaries without prior authentication can be prevented. For every given file F, if a file named .acl.F exists, the file system can read the contents of that file and interpret them as additional access grants or revocations to the file F.
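Two of the attribute examples above can be sketched as follows. This is only an illustration under stated assumptions: struct simple_attr is a simplified stand-in for a real vnode attribute structure, the UID values are arbitrary, and the function names are hypothetical rather than the actual Wrapfs signatures.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Simplified, hypothetical attribute record; a real vnode attribute
 * structure carries many more fields. */
struct simple_attr {
    uid_t  uid;
    gid_t  gid;
    mode_t mode;
};

/* Arbitrary example mapping: UID 1000 above the stack is stored as
 * UID 2000 in the lower level file system. */
#define UPPER_UID ((uid_t)1000)
#define LOWER_UID ((uid_t)2000)

/* Map ownership on the way down to the lower level file system. */
void encode_attr_sketch(struct simple_attr *a)
{
    assert(a != NULL);
    if (a->uid == UPPER_UID)
        a->uid = LOWER_UID;
}

/* Undo the mapping on the way back up to the VFS and the user. */
void decode_attr_sketch(struct simple_attr *a)
{
    assert(a != NULL);
    if (a->uid == LOWER_UID)
        a->uid = UPPER_UID;
}

/* Build the name of the companion file ".acl.F" for a file F.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int acl_companion_name(const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);
    int n = snprintf(out, outlen, ".acl.%s", name);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

The mapping shown is deliberately trivial; a practical file system would need a table of mappings and a policy for lower level files that are already owned by the upper UID.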

Advanced developers may wish to combine changing various aspects of files with additional coding inside specific vnode operations. For example, a file system may wish to track removal of vital files such as logs that can be used to analyze attackers' actions. Attempts to remove the files can be transparently translated into file renaming. The name chosen can be a special name that is hidden from normal view: it will not get listed with the rest of the files in the directory, but be available on the underlying file system. A generalized scheme can include file versioning.
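The remove-to-rename idea can be sketched with two small helpers: one that derives the hidden name used in place of removal, and one that a directory-reading operation could use to skip such names. The prefix and both function names are hypothetical choices made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix marking files that were "removed" by renaming. */
#define HIDDEN_PREFIX ".removed."

/* Compute the hidden name a file is renamed to instead of being removed.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int make_hidden_name(const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);
    int n = snprintf(out, outlen, HIDDEN_PREFIX "%s", name);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* True if a name read from the lower level directory should be omitted
 * from the listing returned through the upper layer. */
bool is_hidden_name(const char *lower_name)
{
    assert(lower_name != NULL);
    return strncmp(lower_name, HIDDEN_PREFIX, strlen(HIDDEN_PREFIX)) == 0;
}
```

A versioning scheme could extend the hidden name with a counter or timestamp so that repeated removals of the same file do not collide.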

This list of examples is not exhaustive. It should be considered only a hint of what can be accomplished with the level of flexibility that Wrapfs offers.

2.1.1 The Wrapfs API

Table 1: Wrapfs Developer API

Command	Input Argument	Output Argument
encode_data	buffer from user space	encoded (same size) buffer to be written
decode_data	buffer read from lower level file system	decoded (same size) buffer to pass to user space
encode_filename	file name passed from user system call	encoded and allocated file name of any length to use in lower level file system. Encoded string must be a valid Unix file name.
decode_filename	file name read from the lower level file system	decoded and allocated file name of any length to pass back to a user process
encode_attr	attributes passed from upper level VFS (ownership, group, modes, etc.).	(optionally) modified attributes to pass to the lower level file system
decode_attr	attributes read from the lower level file system	(optionally) modified attributes to pass to the upper VFS, and possibly out to the user process.

The API for the Wrapfs developer is summarized in Table 1 and is described here. It consists of six calls to encode and decode file data, names, or attributes. Since it may be necessary to perform more sophisticated operations in these calls, they are passed additional information such as the current vnode, VFS, user credentials, etc.
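To make the file name calls in Table 1 concrete, here is a minimal user-level sketch of an encode/decode pair that returns newly allocated names of a different length. It uses hexadecimal encoding purely as an example, since the result never contains '/' or a NUL byte and is therefore a valid Unix file name (subject to the lower file system's name length limit). The function names and signatures are assumptions; the real calls also receive the vnode, VFS, and credentials.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Encode a name as lowercase hex digits.  Returns a newly allocated
 * string (twice the input length) that the caller must free, or NULL
 * if allocation fails. */
char *encode_filename_sketch(const char *name)
{
    static const char hex[] = "0123456789abcdef";
    assert(name != NULL);

    size_t len = strlen(name);
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        out[2 * i]     = hex[c >> 4];
        out[2 * i + 1] = hex[c & 0x0F];
    }
    out[2 * len] = '\0';
    return out;
}

/* Value of one lowercase hex digit, or -1 if `c` is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode a name read from the lower level file system.  Returns a newly
 * allocated string, or NULL if the input is not a valid encoding or
 * allocation fails. */
char *decode_filename_sketch(const char *lower_name)
{
    assert(lower_name != NULL);

    size_t len = strlen(lower_name);
    if (len % 2 != 0)
        return NULL;
    char *out = malloc(len / 2 + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_value(lower_name[2 * i]);
        int lo = hex_value(lower_name[2 * i + 1]);
        /* Reject bad digits and an embedded NUL byte ("00"). */
        if (hi < 0 || lo < 0 || (hi == 0 && lo == 0)) {
            free(out);
            return NULL;
        }
        out[i] = (char)(hi * 16 + lo);
    }
    out[len / 2] = '\0';
    return out;
}
```

Returning NULL from the decode call for names that are not valid encodings gives the caller a way to skip lower level files that were not created through the stackable layer.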

In order to simplify the manipulation of file data, and to enable MMAP operations (necessary for executing binaries), we perform data manipulations in a size that is native to the operating system, usually 4KB or 8KB. Another compelling reason for manipulating only whole pages is that some file data changes may require it. Some encryption algorithms work on blocks of data of a known fixed size such that bytes within the block depend on preceding bytes[22,27]. It was therefore important to confine users of Wrapfs to manipulating a fixed size data buffer.

To keep Wrapfs simple, we decided that the data encoding and decoding calls will return a buffer of the same size as the one passed to it. This design decision excludes the possibility of using algorithms such as compression and decompression, because such algorithms change the size of their input data, making file offset calculations costly. Supporting such algorithms would have complicated Wrapfs considerably. Therefore we left this support out of the first implementation of Wrapfs.
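The same-size constraint can be illustrated with a trivial page-wise transformation. The sketch below assumes a 4KB page and uses a fixed XOR byte, which offers no security and stands in for a real algorithm; the function names, signatures, and return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096   /* assumed native page size */
#define XOR_KEY    0x5A   /* placeholder transformation, not real encryption */

/* Encode exactly one page from `in` into `out`.  Returns the number of
 * bytes produced (always the page size), or -1 if `len` is not one page. */
int encode_data_sketch(const unsigned char *in, unsigned char *out, size_t len)
{
    assert(in != NULL && out != NULL);
    if (len != PAGE_BYTES)
        return -1;
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned char)(in[i] ^ XOR_KEY);
    return (int)len;
}

/* Decode exactly one page; XOR with the same key is its own inverse. */
int decode_data_sketch(const unsigned char *in, unsigned char *out, size_t len)
{
    return encode_data_sketch(in, out, len);
}
```

Because the output length always equals the input length, byte offset N in the upper file corresponds to byte offset N in the lower file, and no offset translation is needed.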

We decided that all Vnode calls that write file data will call the function encode_data before writing the data to the lower level file system. Then, all Vnode calls that read file data will call the function decode_data after reading the data from the lower level file system. In a similar fashion, all Vnode functions that manipulate file names or attributes have the appropriate encode or decode function called in the right places.
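The ordering described above (encode before writing down, decode after reading up) can be sketched in user space as follows. The lower level file system is simulated by a single in-memory page, and every name here (lower_page, wrap_write_page, wrap_read_page) is a hypothetical stand-in for the corresponding Vnode operations; locking, reference counts, and error handling are omitted.

```c
#include <assert.h>
#include <string.h>

#define PAGE_BYTES 4096   /* assumed native page size */
#define XOR_KEY    0x5A   /* placeholder transformation */

/* Stand-in for one page stored by the lower level file system. */
static unsigned char lower_page[PAGE_BYTES];

/* Placeholder same-size encoding; XOR is its own inverse. */
static void encode_data(const unsigned char *in, unsigned char *out)
{
    for (size_t i = 0; i < PAGE_BYTES; i++)
        out[i] = (unsigned char)(in[i] ^ XOR_KEY);
}

static void decode_data(const unsigned char *in, unsigned char *out)
{
    encode_data(in, out);
}

/* Write path: encode the caller's page, then hand the encoded page to
 * the lower layer. */
void wrap_write_page(const unsigned char *user_page)
{
    unsigned char encoded[PAGE_BYTES];

    assert(user_page != NULL);
    encode_data(user_page, encoded);
    memcpy(lower_page, encoded, PAGE_BYTES);   /* simulated lower write */
}

/* Read path: fetch the page from the lower layer, then decode it before
 * returning it to the caller. */
void wrap_read_page(unsigned char *user_page)
{
    unsigned char raw[PAGE_BYTES];

    assert(user_page != NULL);
    memcpy(raw, lower_page, PAGE_BYTES);       /* simulated lower read */
    decode_data(raw, user_page);
}
```

In this arrangement the lower layer only ever sees encoded pages, while callers above the stackable layer only ever see decoded ones.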

The user of Wrapfs who wishes to manipulate files and their names or attributes need not worry about which Vnode functions use them, how directory reading (readdir) is accomplished, or about holding and releasing locks, updating reference counts, or caching. The file system developer only needs to fill in the relevant encoding and decoding functions. Wrapfs takes care of all these operating system internals.

We are making the full sources to Wrapfs publicly available. This way it is possible for file system developers to modify every aspect of a prototype, not just through the six API calls. This also allows the security community to validate and improve the templates.

2.2 User Level Issues

There are three important issues relating to the extension of the Wrapfs API to user-level: mount points, caching, and ioctls.

2.2.1 Mount Points

Wrapfs supports two ways of mounting a file system: a regular mount and an overlay mount. In a regular mount two pathnames are given: one for the mount point (say /mnt), and one for the directory to stack on (the mounted directory /usr). For example: mount -t wrapfs /mnt /usr. After the mount is complete, there are two ways to access the mounted-on file system. Access via the mounted-on directory (/usr) yields the lower level files without going through Wrapfs. However, access via the mount point (/mnt) will go through Wrapfs first. This mount style exposes the mounted directory to user processes, and is useful for debugging and for letting backups proceed faster. (That users can bypass the mount point is a general property of stacking, not one brought on by Wrapfs.) For example, in an encryption file system, a backup utility can back up files faster and more safely if it uses the lower file system's files (ciphertext), rather than the ones available through the mount point (cleartext).

The second mount style, an overlay mount, is accomplished using mount -t wrapfs -O /usr. Here, Wrapfs is mounted directly on top of /usr. Accessing files such as those in /usr/ucb must go through Wrapfs. There is no easy way to get to the original file system's files under /usr without passing through Wrapfs first. This mount style makes backups and debugging more difficult, but has the advantage of hiding the lower level file system from user processes.

We consider an overlay mount more secure and thus made it the default mount style in Wrapfs. A sophisticated attacker might be able to overlay another file system whose purpose would be to bypass several layers and get directly into the lowest level file system. Such an attack requires root privileges, source access to all of the file systems currently mounted, and an understanding of kernel internals. The attacker would have to carefully follow kernel data structures to reach the ones representing the lowest level file system. This attack is therefore no easier than kernel memory manipulation via /dev/kmem.

2.2.2 Cache Coherency

An important point that relates to the mount style is that of caching. Most file systems cache pages to improve performance. When a stackable file system is used on top of, say, UFS, both layers may cache pages independently. Cache incoherency could result if pages at different layers are modified independently, but that could only occur in regular mounts; overlay mounts do not let user processes modify data pages at the lower layers. A mechanism for cache synchronization through a centralized cache manager was proposed by Heidemann[7], but that solution involved modifying the rest of the operating system and other file systems.

We decided that Wrapfs will perform its own caching, and may cache pages at the lower layer depending on the mount style. If the mount style is regular, Wrapfs also caches pages at the lower layer, because this improves performance when accessing files directly through the lower layer; in fact, there is no way to avoid caching pages at the lower layer in a regular mount, because processes can access files directly through the lower level file system. We also decided that the higher a layer is, the more authoritative it is. For example, when writing to disk, cached pages for the same file in Wrapfs overwrite their UFS counterparts. This policy matches the most common case of cache access, which is through the uppermost layer.

If an overlay mount style is used, Wrapfs does not cache pages at the lower layer. This cuts memory usage for pages by half, and performance is still very good, as pages are served from the upper (Wrapfs) layer, where pages are always cached.
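The caching policy of this section reduces to two small decisions, sketched below. The enumerations and function names are hypothetical and serve only to restate the policy; they do not correspond to actual kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

enum mount_style { MOUNT_REGULAR, MOUNT_OVERLAY };
enum layer       { LAYER_NONE, LAYER_LOWER, LAYER_UPPER };

/* Pages are cached at the lower layer only for regular mounts; the upper
 * (Wrapfs) layer always caches. */
bool caches_at_lower_layer(enum mount_style style)
{
    assert(style == MOUNT_REGULAR || style == MOUNT_OVERLAY);
    return style == MOUNT_REGULAR;
}

/* When both layers hold a cached copy of the same page, the higher layer
 * is authoritative. */
enum layer authoritative_copy(bool upper_cached, bool lower_cached)
{
    if (upper_cached)
        return LAYER_UPPER;
    if (lower_cached)
        return LAYER_LOWER;
    return LAYER_NONE;
}
```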

2.2.3 Ioctls

The third important user-level issue relates to the ioctl(2) system call. Ioctls have been used for years as a simple means to extend the API of a file system beyond that which system and Vnode calls offer. Wrapfs allows its user to define new ioctl codes and implement their associated actions. Two ioctls are already defined: one to set a debugging level, and one to query it. Wrapfs comes with many debugging traces that can be turned on or off at run time by a root user. Other possible ioctls that can be implemented by specific file systems include passing and retrieving additional information to and from the file system. An encryption file system (such as the one described in Section 4.9) might use an ioctl mechanism to set encryption keys.