Next: 3. Implementation Up: Cryptfs: A Stackable Vnode Previous: 1. Introduction


2. Design

Cryptfs is designed to be simple in principle. The file system interposes (mounts) itself on top of any directory, encrypts file data before it is passed to the interposed-upon file system, and decrypts it in the reverse direction. Our explicit design goals were:

The next five issues are treated in the following sections:

How to make it more difficult for others to decrypt data without authorization while at the same time providing simple encryption services transparently to authorized users?

Should encrypted bytes depend on the previously encrypted ones?

Does encryption affect file offsets or sizes and if so, how?

Should file names be encrypted?

How to keep the structure of all affected Unix file systems valid while encryption takes place?

2.1 Key Management

We decided that only the root user would be allowed to mount an instance of Cryptfs, but that root would not thereby be able to encrypt or decrypt files automatically. To thwart an attacker who gains access to a user's account or to root privileges, Cryptfs maintains keys in an in-memory data structure that associates keys not with UIDs alone but with the combination of UID and session ID. To succeed in acquiring or changing a user's key, an attacker would not only have to break into an account, but also arrange for their processes to have the same session ID as the process that originally received the user's passphrase. This is a more difficult attack, requiring session and terminal ``hijacking'' or kernel-memory manipulations.

Using session IDs to further restrict key access does not burden users during authentication. Login shells and daemons use setsid(2) to set their session ID and detach from the controlling terminal. Forked processes inherit the session ID from their parent. A user would therefore normally have to authenticate only once, in a shell. From this shell they could run most other programs, which would work transparently and safely with the same encryption key.

We made two small additional design decisions here. First, we decided to check real UIDs and not effective ones. That way a user could run setuid programs and they would work with the UID of the user running them, not that of the file's owner. Second, if users find session-based keys too inconvenient, Cryptfs can be mounted with keys processed on the basis of UIDs alone (though we do not recommend it).
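The key lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the real table lives inside the kernel, and the class and function names here (`KeyTable`, `caller_identity`) are hypothetical, not part of Cryptfs. The `use_session_ids` flag stands in for the UID-only mount option.

```python
import os


class KeyTable:
    """Hypothetical in-memory key table indexed by (real UID, session ID)."""

    def __init__(self, use_session_ids=True):
        self._keys = {}
        # False models the (not recommended) UID-only mount option.
        self._use_session_ids = use_session_ids

    def _index(self, uid, sid):
        # In UID-only mode the session ID is ignored.
        return (uid, sid if self._use_session_ids else None)

    def set_key(self, uid, sid, key):
        self._keys[self._index(uid, sid)] = key

    def get_key(self, uid, sid):
        # Returns None when no key is registered for this caller.
        return self._keys.get(self._index(uid, sid))

    def delete_key(self, uid, sid):
        self._keys.pop(self._index(uid, sid), None)


def caller_identity():
    """Return the real UID (not the effective one) and the session ID
    of the calling process."""
    return os.getuid(), os.getsid(0)
```

With session IDs enabled, a process running under the same UID but in a different session finds no key, which is the extra hurdle the design places in front of an attacker who has broken into the account.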

We designed a user tool which prompts users for passphrases that are at least 16 characters long. The tool hashes passphrases using MD5[14] and passes them to Cryptfs using a special ioctl(2). The tool can also instruct Cryptfs to delete or reset keys.
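As a rough sketch of the hashing step, the following derives a 16-byte value from a passphrase with MD5. The function name `passphrase_to_key` and the UTF-8 encoding of the passphrase are assumptions made for illustration; the ioctl(2) hand-off to Cryptfs is not shown because its interface is not described here.

```python
import hashlib

MIN_PASSPHRASE_LEN = 16  # minimum passphrase length stated in the text


def passphrase_to_key(passphrase: str) -> bytes:
    """Hash a passphrase with MD5, yielding a 128-bit (16-byte) digest."""
    if len(passphrase) < MIN_PASSPHRASE_LEN:
        raise ValueError("passphrase must be at least 16 characters long")
    # Assumption: the passphrase is encoded as UTF-8 before hashing.
    return hashlib.md5(passphrase.encode("utf-8")).digest()
```

An MD5 digest is always 128 bits long, regardless of the passphrase length.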

Our design decouples key possession from file ownership. For example, a group of users who wish to edit a single file would normally do so by having the file group-owned by one Unix group and adding each user to that group. However, Unix systems often limit the number of groups a user can be a member of to 8 or 16. Worse, there are often many subsets of users who are all members of one group and wish to share certain files, but are unable to guarantee the security of their shared files because there are other users who are members of the same group; e.g., many sites put all of their staff members in a group called ``staff'', students in the ``student'' group, guests in another, and so on. With our design, access to shared files can be further restricted to only those users who were given the key.

One disadvantage of this design is reduced scalability with respect to the number of files being encrypted and shared. Users who have many files encrypted with different keys will have to switch their effective key before attempting to access files that were encrypted with a different one. We did not perceive this to be a serious problem for two reasons. First, the amount of Unix file sharing of restricted files has always been limited. Most shared files are generally world readable and thus do not require encryption. Secondly, with the proliferation of windowing systems, users can associate different keys with different windows.

An alternative design option that would allow simultaneous access to multiple keys was to require that each user separately mount an instance of Cryptfs with a different key. This design option was rejected for two reasons. First, users would either require root privileges to mount or the file system would have to allow any user to mount Cryptfs. Secondly, this would not scale well with respect to the number of mounts required on a busy multi-user system.

2.2 Encryption Algorithm and Mode

To provide sufficiently strong encryption, it is necessary to encrypt as much data as possible together, in a chaining fashion that includes bit substitutions and transpositions, such that each byte encrypted depends on some of the prior ones. At the extreme, we could have designed Cryptfs to encrypt the whole file in this mode; but doing so would mean that each time we need to decrypt a single byte anywhere in the file, all prior bytes would have to be decrypted as well -- a major performance problem. We decided to encrypt blocks of data in a size that is natural to the operating system used -- 4096 or 8192 bytes. These values were chosen because they are the most common virtual memory subsystem page sizes, making it easier to handle memory-mapped operations (described in Section 3.4).

Next we picked the algorithm. We rejected patented or licensed ones, and also rejected DES[20] because it is too big and slow. We picked Blowfish[18] -- a 64-bit block cipher that was designed to be fast, compact, and simple. Blowfish is suitable for applications where the keys do not change often, such as automatic file decryptors. It can use variable-length keys of up to 448 bits. We kept the default 128-bit keys.

We selected the Cipher Block Chaining (CBC)[17] encryption mode because it allows us to encrypt byte sequences of any length -- suitable for encrypting file names. However, we decided to use CBC only within each block encrypted by Cryptfs. This way ciphertext blocks (of 4-8KB) would not depend on previous ones, allowing us to decrypt each block independently. This choice also minimizes potential data loss: if one byte is corrupted in a file, at most one page's worth of data could not be properly decrypted.
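The effect of restarting the chain at every page can be sketched as follows. Two loud assumptions apply: the 8-byte block cipher below is a toy Feistel construction built from SHA-256, used only as a stand-in because Blowfish is not in the Python standard library, and the chain is restarted at each page from one fixed IV, which this section does not specify. All names are hypothetical, and the input length is assumed to be a multiple of 8 bytes.

```python
import hashlib

BLOCK = 8     # cipher block size in bytes (64 bits, as for Blowfish)
PAGE = 4096   # Cryptfs encryption unit used in this sketch


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _f(half, key, rnd):
    # Round function of the toy cipher (NOT Blowfish).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:4]


def toy_encrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(right, key, rnd))
    return left + right


def toy_decrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(left, key, rnd)), left
    return left + right


def encrypt_page(page, key, iv):
    prev, out = iv, bytearray()
    for i in range(0, len(page), BLOCK):
        # CBC: XOR with the previous ciphertext block, then encrypt.
        prev = toy_encrypt_block(_xor(page[i:i + BLOCK], prev), key)
        out += prev
    return bytes(out)


def decrypt_page(page, key, iv):
    prev, out = iv, bytearray()
    for i in range(0, len(page), BLOCK):
        block = page[i:i + BLOCK]
        out += _xor(toy_decrypt_block(block, key), prev)
        prev = block
    return bytes(out)


def encrypt_file(data, key, iv):
    # The chain restarts at every page, so pages are independent.
    return b"".join(encrypt_page(data[i:i + PAGE], key, iv)
                    for i in range(0, len(data), PAGE))


def decrypt_file(data, key, iv):
    return b"".join(decrypt_page(data[i:i + PAGE], key, iv)
                    for i in range(0, len(data), PAGE))
```

Because no page depends on another, `decrypt_page` can be applied to any single page of the ciphertext on its own, and a corrupted ciphertext byte damages only the page that contains it.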

2.2.1 File Offsets

Many programs read or write arbitrary data within files. They seek to a specific offset within the file and perform the read or write operation starting there. Some encryption algorithms may change the size of the input being encrypted -- generally increasing it. If the encryption algorithm changes data size, it becomes difficult and costly to perform file operations at arbitrary offsets. Blowfish was also chosen to avoid this cost: it does not change the size of the data being encrypted, so offsets in encrypted and decrypted files are the same. Furthermore, since Blowfish does not change the total size of the file, operations like stat(2) (getting file attributes such as size) can be handled simply by passing them on to the interposed-upon file system layer.
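Since offsets are identical in the encrypted and decrypted views, finding the data needed for a read or write at an arbitrary offset reduces to integer arithmetic on the page size. The helper below is a minimal sketch; the name `pages_for_range` is hypothetical and a 4096-byte page is assumed by default.

```python
def pages_for_range(offset, length, page_size=4096):
    """Return the indices of the encrypted pages that cover
    `length` bytes starting at byte `offset`."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)
```

For example, a 100-byte read at offset 5000 touches only page 1, whereas a 200-byte read at offset 4000 straddles pages 0 and 1, so two pages must be decrypted.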

2.3 File Names

Users often choose comfortable file and directory names describing the nature of the data stored within. An attacker who discovers the names of files -- even if they cannot access the file data -- can still infer much about the nature of the data itself. Therefore, we decided to encrypt all file and directory names as well.

Encryption algorithms use a large subset of possible characters for the ciphertext. This strengthens the encryption by ``randomizing'' any possible patterns in the cleartext. But when encrypting strings that represent Unix file names, several characters may result that are illegal in file names, such as a forward slash (/) or a null. Such encrypted file names cannot be stored verbatim in the normal directory structures as they will corrupt the underlying file system. In addition, there are many non-printable characters that, while legal characters after encryption, are difficult to display on the screen (e.g., the output of ls) or may affect the terminal settings.

We decided that after encrypting file names, we will uuencode them to eliminate the unwanted characters and guarantee that all file names consist of printable characters. The uuencoding algorithm chosen is simple and fast. It converts every 3-byte encrypted sequence into a 4-byte sequence of ASCII characters drawn from a set of 64 characters with codes 48 through 111. Since each character in the chosen range carries only 6 bits, 3 bytes of encrypted data (24 bits, drawn from a 256-character set) map exactly onto 4 bytes drawn from the 64 printable characters.

The above choice also meant that file names become one third longer. This was necessary, but it did not have the same ramifications as changing offsets of data within files. Since file names are always read whole and from the beginning, we can read them from the underlying storage, apply our uudecoding algorithm, and finally apply the decryption algorithm. The resulting string is the original file name, which is returned to the caller.
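The 3-to-4 byte conversion can be sketched as below. The character range 48-111 comes from the text; the function names and the handling of a trailing group shorter than 3 bytes (emitting one more character than there are input bytes) are assumptions, since that case is not described here.

```python
BASE = 48  # first code of the 64-character range 48..111


def encode_name(data: bytes) -> str:
    """Map each 3-byte group to 4 characters with codes 48..111."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        n = len(group)
        value = int.from_bytes(group + b"\x00" * (3 - n), "big")
        chars = [BASE + ((value >> shift) & 0x3F)
                 for shift in (18, 12, 6, 0)]
        # Assumption: a short final group of n bytes emits n + 1 chars.
        out.extend(chars[:n + 1])
    return bytes(out).decode("ascii")


def decode_name(text: str) -> bytes:
    """Inverse of encode_name."""
    raw = text.encode("ascii")
    out = bytearray()
    for i in range(0, len(raw), 4):
        group = raw[i:i + 4]
        n = len(group)
        if n == 1:
            raise ValueError("truncated encoded name")
        value = 0
        for c in group:
            if not BASE <= c < BASE + 64:
                raise ValueError("character outside the range 48..111")
            value = (value << 6) | (c - BASE)
        value <<= 6 * (4 - n)  # left-align a short final group
        out += value.to_bytes(3, "big")[:n - 1]
    return bytes(out)
```

A 24-byte encrypted name becomes 32 characters, one third longer, and the range 48-111 contains neither the forward slash (code 47) nor the null byte.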

Special consideration was given to the two directories that always exist: the ``.'' and ``..'' directories. The encryption algorithm leaves them unchanged for two reasons:

If these two directories do not exist in the interposed-upon file system, normal directory operations such as changing directories to the parent one and other recursive operations would fail.

Since everyone knows that these two must always exist in Unix file systems, encrypting them may reduce the level of security by supplying a potential attacker with known decrypted strings as well as a small set of encrypted ones. An attacker would know that two of the encrypted strings must decrypt to result in ``.'' and ``..'' -- and may try a known-plaintext attack.
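The pass-through rule for these two entries can be sketched as a thin wrapper around whatever name transformation is in use. Everything here is hypothetical: the function names are invented, and the hex-based `demo_encode`/`demo_decode` pair is merely a reversible stand-in for the encrypt-then-uuencode step, not the Cryptfs algorithm.

```python
PASS_THROUGH = {".", ".."}


def lower_name(name, encode):
    """Name as stored in the interposed-upon file system."""
    return name if name in PASS_THROUGH else encode(name)


def upper_name(stored, decode):
    """Name as presented to the caller above Cryptfs."""
    return stored if stored in PASS_THROUGH else decode(stored)


# Stand-in transformation for demonstration only (NOT encryption).
def demo_encode(name):
    return name.encode("utf-8").hex()


def demo_decode(stored):
    return bytes.fromhex(stored).decode("utf-8")
```

The wrapper relies on the encoded form never being ``.'' or ``..'' itself; that holds for the stand-in, and also for any encoding whose output alphabet excludes the period.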

Finally, we decided that along with file names, we will also encrypt directory names, symbolic links and the values they point to, and all other special files. The targets of symbolic links will always be encrypted -- regardless of whether they point to ``.'' or ``..''. These measures provide added security.

2.4 Mount Points

A stackable file system is similar to a loopback file system (lofs) in that the mount point and the directory mounted upon are separate. Cryptfs can provide transparent encryption for, say, /home/ezk/private mounted on /mnt/ezk. Anyone accessing files directly through the mounted directory, /home/ezk/private, will see encrypted files and directories, with nothing but normal Unix permissions to stop a potential attacker. Only access through the mount point, /mnt/ezk, by a valid authenticated user will provide transparent decryption and encryption of data -- which would still be subject to Unix permission checks.

Providing access to the ``raw'' encrypted files is important for backups: the backup operator should not have to decrypt files, because decryption is CPU intensive and it is insecure to keep plaintext data on backup media. Having this access, however, gives an attacker who gains root privileges or the owner's privileges the ability to corrupt or remove data files. For this reason it was also desirable for Cryptfs to overlay the mounted directory with the mount point, making both of them the same. Since overlaying the mount point would prevent backups, we settled on a compromise: Cryptfs will overlay the mount point by default, allow valid authenticated users to decrypt files using their keys, deny unauthenticated non-root users any access, and otherwise behave like a read-only loopback file system toward root users who have not provided a key to Cryptfs. In other words, unauthenticated root users who access files via Cryptfs will see their encrypted names and data, but will not be allowed to make any changes. This allows backups to proceed quickly and safely, and prevents attackers from corrupting data or removing files.
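The compromise policy amounts to a three-way decision, which can be sketched as follows. The enum and function names are hypothetical, and the sketch covers only the Cryptfs-level decision; ordinary Unix permission checks would still apply on top of it.

```python
from enum import Enum


class Access(Enum):
    DECRYPTED_READ_WRITE = "transparent encryption and decryption"
    RAW_READ_ONLY = "encrypted names and data, no modifications"
    DENIED = "no access"


def access_for(uid, has_key):
    """Decide how Cryptfs treats a caller on an overlay mount.

    `has_key` means the caller has provided a valid key to Cryptfs.
    """
    if has_key:
        return Access.DECRYPTED_READ_WRITE
    if uid == 0:
        # Unauthenticated root: read-only loopback view, for backups.
        return Access.RAW_READ_ONLY
    return Access.DENIED
```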

An additional design decision born of these considerations was that the underlying storage must remain a valid file system of whatever type it was before. This is a must if backup programs and other tools are to browse the encrypted directories unimpeded.

Erez Zadok