

5 Evaluation

To evaluate elastic quotas in a real-world operating system environment, we implemented a prototype of our elastic quota system in Solaris 9, the latest operating system release from Sun Microsystems. We chose Solaris because it is widely used in large production environments such as the file servers on which elastic quotas would operate. We present experimental results using our prototype EQFS and rubberd implementations. We compared EQFS against Solaris 9 UFS [2], the most popular file system used on Solaris servers, and we measured the impact of rubberd on a running system.

We conducted all experiments on a Sun-Fire 480R multiprocessor system with four 750 MHz UltraSPARC-III CPUs and 4 GB of RAM, running Solaris 9. We believe this is a moderately sized machine for the type of large file servers on which elastic quotas will be useful. Although such installations will probably include RAID arrays or SAN products, we focused on the native disks in the machine; this helped us analyze the results without worrying about interactions with other storage systems. For all our experiments, we used a local UFS file system installed on a Seagate Cheetah 36LP disk with 36 GB capacity spinning at 10,000 RPM. UFS includes an optional logging feature, used in some installations, that journals meta-data updates to provide stronger reliability guarantees. We considered both UFS and UFS logging (LUFS) in our experiments. For each experiment, we only read, wrote, or compiled the test files in the file system being tested. All other user utilities, compilers, headers, and libraries resided outside the tested file system. Unless otherwise noted, all tests were run with a cold cache: after each test completed, we unmounted all file systems that participated in it and mounted them again before running the next iteration.

We report experimental results using both file system benchmarks and real applications. Sections 5.1 and 5.2 describe the file system workloads we used for measuring EQFS and rubberd performance, respectively. Section 5.3 shows results for three file system workloads comparing EQFS to UFS to quantify the performance overhead of using EQFS. Section 5.4 shows results quantifying the impact of rubberd's actions on a running system: reclaiming storage, building its database, etc.

5.1 EQFS Benchmarks

To measure EQFS performance, we stacked EQFS on top of UFS and compared its performance with native UFS. We measured the performance of four file system configurations on a variety of file system workloads: UFS without logging (UFS), UFS with logging (LUFS), EQFS on top of UFS (EQFS/UFS), and EQFS on top of LUFS (EQFS/LUFS). We used three file system workloads for our experiments: PostMark, a recursive find, and a compilation of a large software package, the Solaris 9 kernel.

PostMark

The first workload we used was PostMark [13], a well-known file system benchmark that creates a large pool of continually changing files to simulate a large electronic mail server workload. PostMark creates an initial pool of text files of various sizes, then performs transactions by reading from, appending to, or creating and deleting files. The workload provides a useful measure of file system performance for users performing daily tasks such as reading mail, editing files, and browsing their directories. This workload exercises some of the more complex EQFS file operations and provides a conservative measure of EQFS overhead. We only report PostMark measurements for EQFS using /home since EQFS performs identically when using either /home or /ehome in this experiment.
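To make the transaction mix concrete, the following is a minimal Python sketch of a PostMark-style workload. This is not the actual PostMark code; the pool size, transaction count, and file-size bounds here are deliberately tiny illustrative stand-ins for the real benchmark's configuration.

```python
import os
import random
import tempfile

def postmark_like(num_files=25, transactions=50, min_size=8, max_size=64, seed=42):
    """Sketch of a PostMark-style workload: build an initial pool of files
    of varying sizes, then perform a random mix of read, append, and
    create/delete transactions.  All parameters are illustrative."""
    rng = random.Random(seed)
    pool = tempfile.mkdtemp(prefix="pm")
    files = []
    for i in range(num_files):                       # initial file pool
        path = os.path.join(pool, "f%d" % i)
        with open(path, "wb") as f:
            f.write(b"x" * rng.randint(min_size, max_size))
        files.append(path)
    counts = {"read": 0, "append": 0, "create_delete": 0}
    for _ in range(transactions):
        op = rng.choice(sorted(counts))
        if op == "read":
            with open(rng.choice(files), "rb") as f:
                f.read()
        elif op == "append":
            with open(rng.choice(files), "ab") as f:
                f.write(b"y" * rng.randint(1, max_size))
        else:                                        # create one file, delete another
            path = os.path.join(pool, "f%d" % (num_files + counts[op]))
            with open(path, "wb") as f:
                f.write(b"z" * rng.randint(min_size, max_size))
            files.append(path)
            os.unlink(files.pop(rng.randrange(len(files))))
        counts[op] += 1
    return counts
```

The create/delete transactions keep the pool size roughly constant, which is what makes this style of workload exercise file creation and deletion paths as heavily as reads and writes.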

Because the default PostMark workload is too small to generate meaningful load, we configured PostMark to perform 5,000 transactions starting with an initial pool of 2,500 files with sizes between 8 KB and 64 KB, matching file size distributions reported in file system studies [29]. Previous results obtained using PostMark show that a single PostMark run may not be indicative of system performance under load, because a single run is single-threaded whereas practical systems perform multiple concurrent actions [28]. Therefore, we measured the four file systems running 1, 2, 4, and 8 PostMark runs in parallel. This not only allows us to conservatively measure EQFS's performance overhead, but also to evaluate EQFS's scalability as the amount of concurrent work increases. The latter is even more important than the former, since raw speed can be improved by moving to a larger machine, whereas poorly scaling systems cannot be easily helped by larger machines.

Recursive Find

The second workload we used was a recursive scan of the full Solaris source base, a collection of 32,416 Java, C, and assembly files in 7,715 subdirectories, using find . -print. Since EQFS is implemented as a stackable union file system, some EQFS file operations must be performed on both /elastic and /persistent. For example, READDIR must merge the contents of two directories, and LOOKUP must find a file in either of the two. Since LOOKUP operations are common [20], and merging the contents of two directories can be costly, this find test, when run with a cold cache, is intended to show the worst-case performance overhead of EQFS when using these file system operations. To measure EQFS performance with this workload, all files were stored persistently and we performed the recursive scan using both /home and /ehome.

Solaris Compile

The third workload we used was a build of the Solaris 9 kernel, which provides a more realistic measure of overall file system performance. The kernel build is inherently parallel, and as such the elapsed time masks overheads due to disk latency. As in all such measurements, the increase in system time is of interest, as it indicates the extra processing done by EQFS. This build processes 5275 C and assembly source files in 1946 directories to produce 4020 object files and more than 10,000 other temporary files. We used Sun's Workshop 5.0 compilers and set the maximum concurrency to 16 jobs to keep the CPU busy and to ensure that the overhead is not underrepresented due to time spent performing I/O. Overall this benchmark contains a large number of reads, writes, and file lookups, as well as a fair mix of most other file system operations such as unlink, mkdir, and rename. To measure EQFS performance with this workload, all source files were stored persistently and we performed the build in both /home and /ehome. When using /ehome, all object files are created elastic.

5.2 Rubberd Benchmarks

To evaluate rubberd, we measured how long it took to build its nightly elastic files log and to use that log for cleaning elastic files. The rubberd log we used contains the names of elastic files and their lstat(2) output. To provide realistic results on common file server data sets, we used a working set of files collected over a period of 18 months from our own production file server. The working set includes the actual files of 121 users, many of whom are software developers. The file set includes 1,194,133 inodes and totals over 26 GB in size; more than 99% of the files are regular files. 24% of the users use less than 1 MB of storage; 27% use between 1 MB and 100 MB; 38% use between 100 MB and 1 GB; and 11% consume more than 1 GB each. The average file size in this set is 21.8 KB, matching results reported elsewhere [20]. We treated this entire working set as elastic. Previous studies [23] show that roughly half of all data on disk and 16% of files are regeneratable. Hence, by treating all files as elastic, we are effectively modeling the cost of using rubberd on a disk containing a total of 52 GB in 7.5 million files. Using EQFS mounted on LUFS, we ran three experiments with the working set to measure rubberd performance: building the elastic files log, cleaning elastic files using the log, and cleaning elastic files while running a file system workload.

Elastic File Log Creation

The first rubberd benchmark measured the time it took to build an elastic file log by scanning the entire /elastic directory through EQFS. The scan is recursive and builds per-user log files in parallel, with a separate child process for each user, storing lstat(2) information on each file in the 26 GB data set described above. The completion time to create the log is therefore determined by the users with the most elastic files. Such a disk scan may take a while and can disrupt user activity, particularly when run on larger file systems; the log is therefore intended to be built at night or when few users are active. Once the log is created, we expect that scanning it to find elastic files suitable for removal will be much faster than scanning the file system directly, especially if the set of files to be removed is significantly smaller than the set of elastic files on the system.

Elastic File Cleaning

The second rubberd benchmark we used measured the time it took to use the elastic file log to clean a portion of the disk on an otherwise idle system using our default cleaning policy. Rubberd operates by retrieving the list of files for each user, ordering them based on the default cleaning algorithm as described in Section 4.3, and then removing files in order from this list. To provide a conservative measure of cleaning overhead, we set the rubberd parameters such that 5 GB of disk space, roughly 1/4 of the space used by elastic files, would need to be removed to achieve the desired state. While we do not propose using such a high hysteresis value for normal file systems, we chose a large value to avoid under-representing the cost of rubberd operation.

Rubberd Cleaning with Solaris Compile

The third rubberd benchmark measured the time it took to run the second rubberd benchmark in conjunction with the Solaris compile described in Section 5.1. This experiment measures the more practical impact of rubberd cleaning on a system operating under load. Here, we ran the previous elastic file cleaning benchmark on the same file set while simultaneously running the parallel Solaris compilation, simulating high CPU and I/O load. The kernel build was performed under /ehome; rubberd could not cause the build to fail because its database contained enough other files to satisfy the cleaning request. Note that both the kernel build and the rubberd cleaning were executed on the same physical disk.

5.3 EQFS Results

PostMark

The following two figures show the results of running PostMark on each of the four file systems. Figure 2 shows the total throughput of the system and Figure 3 shows the total time to complete all of the runs. EQFS incurs less than 10% overhead over LUFS: the EQFS/LUFS throughput rate and completion time are within 10% of LUFS's. The results for UFS are even better: the EQFS/UFS throughput rate and completion time are within 1% of UFS's, showing that EQFS adds hardly any overhead. These results show that EQFS's overhead is relatively modest even for a file system workload that stresses some of the more costly EQFS file operations.

Figure: PostMark transactions per second results

Figure: PostMark completion time results

EQFS exhibits higher overhead when stacked on LUFS than on UFS partly because LUFS performs better and is less I/O bound than UFS, so any EQFS processing overhead becomes more significant. LUFS logs transactions in memory, clustering meta-data updates and flushing them out in larger chunks than regular UFS, resulting in higher throughput and lower completion time than regular UFS for PostMark. However, UFS scales better than LUFS, as evidenced by the fact that the total throughput rate for UFS increases slightly with more parallel PostMark runs whereas the throughput rate for LUFS decreases significantly. More importantly, the results show that EQFS scales with the performance of the underlying file system and does not adversely impact performance as the amount of concurrent work increases.

Recursive Find

Figure 4 shows the results of running the recursive find benchmark on each of the file systems, with both cold and warm caches. The cold cache results show that EQFS incurs roughly 80% overhead in completion time when stacked on top of UFS or LUFS, taking about 80% longer to do the recursive scan than the native file systems. The high EQFS overhead is largely due to the frequent READDIR operations performed by the recursive scan. With a cold cache, each READDIR operation requires going to disk to read the respective directory block. Because EQFS must merge both persistent and elastic directories, READDIR requires two directory operations on the underlying file system. This causes twice as much disk I/O as using the native file system to read directories, resulting in a significantly higher completion time. This is compounded by the fact that FFS-like file systems such as UFS attempt to cluster meta-data and data together on disk; UFS does not necessarily place the two sister directories close to each other, so reading the two directories not only causes multiple I/O requests but also causes the disk to seek more, which slows overall performance. Overall, the recursive find benchmark is not representative of realistic file workloads, but it provides a measure of the worst-case overhead of EQFS, as READDIR is the most expensive EQFS operation.
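The union merge that makes READDIR expensive can be illustrated with a short sketch. This is an illustration of the merge semantics, not our kernel code; the assumption that a name appearing on both sides is reported once, with the first-listed side taking precedence, is ours.

```python
def merge_readdir(persistent_entries, elastic_entries):
    """Sketch of a union READDIR: combine the entries of the persistent
    and elastic sister directories into a single listing, suppressing
    duplicate names.  Reading both underlying directories is what doubles
    the disk I/O relative to a plain READDIR on a cold cache."""
    seen = set()
    merged = []
    for name in list(persistent_entries) + list(elastic_entries):
        if name not in seen:          # a name may exist on both sides
            seen.add(name)
            merged.append(name)
    return merged
```

Note that even when the merge itself is cheap in CPU terms, the two underlying directory reads may land far apart on disk, which is the seek cost discussed above.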

Figure: Elapsed times (seconds, log-scale) of a recursive find, using cold and warm caches

In this test all files found were located under /persistent. This meant that looking up files via /home found the files in the primary directory, whereas when looking them up via /ehome, the files were logically located in the sister directory and EQFS had to perform two LOOKUP operations to find those files. Nevertheless, Figure 4 shows that the overhead of looking up those files with an extra LOOKUP was small: 4.2% when mounted on LUFS and only 0.1% when mounted on top of UFS.
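The asymmetry between /home and /ehome lookups can be sketched as a simple fall-through. This hypothetical helper is not the Solaris vnode code; it only illustrates why a file stored on the primary side costs one underlying LOOKUP while a file on the sister side costs two.

```python
def union_lookup(name, persistent_dir, elastic_dir, via_ehome=False):
    """Sketch of the EQFS LOOKUP path: search the primary directory for
    the mount point first, and only on a miss search the sister directory.
    Directories are modeled as dicts mapping names to file objects; the
    second return value counts underlying LOOKUP operations."""
    if via_ehome:
        primary, sister = elastic_dir, persistent_dir
    else:
        primary, sister = persistent_dir, elastic_dir
    lookups = 1
    if name in primary:
        return primary[name], lookups
    lookups += 1                      # miss: fall through to the sister side
    return sister.get(name), lookups
```

In the recursive find test all files lived on the persistent side, so every uncached /ehome lookup paid the second underlying LOOKUP, yet the measured overhead remained small.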

When using a warm cache, Figure 4 shows that EQFS incurs essentially no overhead versus the native file system when stacked on top of either UFS or LUFS. For all file systems, the recursive find took less than two seconds to complete, roughly two orders of magnitude faster than when using a cold cache. Like other Solaris file systems, our EQFS implementation utilizes the Solaris Directory Name Lookup Cache (DNLC). The warm cache results illustrate the full benefits of caching. Since the directory contents are already merged and cached, EQFS does not spend additional time merging directories, resulting in negligible performance overhead. There is also no difference in EQFS performance when using /home versus /ehome since LOOKUP requests are satisfied from the cache and EQFS does not call the underlying file system.

Solaris Compile

Figure 5 shows the results for running the Solaris compile on each of the file systems. Results are reported in terms of elapsed time and system time. Although we do not report user time, we note that the sum of user and system time is higher than elapsed time, due to the parallel nature of the build and the multiprocessor machine used. The results show that EQFS incurs almost no overhead in completion time when stacked on top of UFS or LUFS, taking less than 1% longer to complete the compilation. EQFS incurs less than 5% overhead versus UFS or LUFS in terms of system time. These results show that EQFS imposes very little performance overhead, and does not limit file system scalability for realistic application workloads such as a large parallel compilation.

Figure: Elapsed and system times (seconds) of a large compile benchmark

The performance of EQFS when doing the compile from /ehome is slightly worse than when doing the compile from /home because the source files are located in the underlying persistent directory. As a result, LOOKUP operations for uncached entries from /ehome will cause a lookup in both underlying directories. We analyzed the cost and frequency of various file operations for the compilation and found that while LOOKUP operations are the most frequent, accounting for almost half of all file operations, the total time spent doing LOOKUP operations was small. Since the same file is typically referenced multiple times during the build, requests are satisfied from the cache, resulting in little performance difference between compiling in /home versus /ehome.

For comparison purposes, we also measured the overhead of a null stacking layer and found that it incurred about 0.5% overhead when stacked on top of UFS or LUFS. This means that EQFS only imposes roughly 0.5% more overhead beyond the basic stacking costs, even though EQFS provides significant additional functionality. EQFS's low overhead is due in part to its effective use of the DNLC for vnode caching. Previously published results for similar compilation benchmarks on trivial stacking systems [31] that simply copy data between layers show a 14.4% increase in system time, significantly higher than what we measure for EQFS.
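A null stacking layer simply forwards every operation, unmodified, to the file system below it, which is why measuring one isolates the fixed cost of stacking from the cost of EQFS's added functionality. The following Python delegation sketch mirrors that structure only; the real layer operates at the Solaris vnode interface, not on Python objects.

```python
class NullLayer:
    """Sketch of a null (pass-through) stacking layer: every operation on
    this layer is forwarded to the lower file system object unchanged.
    Any per-operation cost of such a layer is pure stacking overhead."""

    def __init__(self, lower):
        self._lower = lower

    def __getattr__(self, op):
        # Called only for attributes not defined on this layer itself,
        # so every file operation falls through to the lower layer.
        return getattr(self._lower, op)
```

EQFS replaces this trivial forwarding with union logic (merged READDIR, fall-through LOOKUP) while, per our measurements, adding only about 0.5% beyond the basic stacking cost.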

5.4 Rubberd Results

Elastic File Log Creation

Table 2 shows the results for building the elastic file log. The results show that the entire log was created in only about 10 minutes using a cold cache. This indicates that the cost of building the elastic file log is small and should have little if any effect on system operation if run during off-peak hours. Table 2 also shows that the entire log was created in less than three minutes when using a warm cache. In practice, we expect actual numbers to be closer to those of a cold cache.
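The log build amounts to a recursive walk that records one lstat(2) line per file. The sketch below is a single-process simplification; the real rubberd forks one child per user and writes per-user logs in parallel, and the exact record format shown here is our assumption.

```python
import os

def build_elastic_log(root, log_path):
    """Sketch of the nightly elastic-file log build: walk the elastic
    directory tree and record each file's path plus selected lstat(2)
    fields (size, owner, mtime), one line per file.  Returns the number
    of files logged."""
    count = 0
    with open(log_path, "w") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)   # lstat: do not follow symlinks
                out.write("%s %d %d %d\n"
                          % (path, st.st_size, st.st_uid, int(st.st_mtime)))
                count += 1
    return count
```

Because the walk is I/O bound on a cold cache, its cost is dominated by reading directory and inode blocks, which is consistent with the small user and system times in Table 2.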

Table: Times (seconds) to build the elastic file log
Time     Cold    Warm    Speedup
Elapsed  638.5   175.4   364%
User     7.3     7.2     1.3%
System   76.1    72.0    5.7%

Elastic File Cleaning

Table 3 shows the results of running the elastic file cleaning benchmark to clean 5 GB of disk space. The entire cleaning process took less than two minutes: less than one-fifth of the time it took to scan the disk and build the elastic file log, which shows the benefit of using the log for cleaning. Without the elastic file log, removing the same set of data would have required scanning the entire disk to find candidate files, which would have taken significantly longer. As expected, the figures indicate that the job is primarily I/O bound, with user and system times amounting to a small fraction of the completion time.
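The log-driven cleaning loop can be sketched as follows. The oldest-mtime-first ordering here is a stand-in for the default cleaning algorithm of Section 4.3, whose exact ordering this sketch does not reproduce, and the removal is simulated rather than performed with unlink.

```python
def clean_from_log(entries, bytes_to_free):
    """Sketch of log-driven cleaning: order candidate files from the
    elastic-file log by a policy (here, oldest modification time first)
    and select them for removal until the requested amount of space is
    freed.  Each entry is a (path, size, mtime) tuple; returns the list
    of selected paths and the total bytes freed."""
    freed = 0
    removed = []
    for path, size, mtime in sorted(entries, key=lambda e: e[2]):
        if freed >= bytes_to_free:
            break
        removed.append(path)          # the real rubberd unlinks the file here
        freed += size
    return removed, freed
```

Since candidate selection touches only the log, the remaining cost is the unlink traffic itself, which is why cleaning is so much cheaper than the full disk scan.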

The cleaning cost is low enough that rubberd may be run multiple times during the course of a day without much overhead. For instance, if rubberd were run once an hour, cleaning would consume only about three percent of each hour (111.8 seconds out of 3,600) while reclaiming as much as 120 GB of disk space per day. It is unlikely that this much storage would need to be reclaimed daily at most installations, so rubberd's cleaning overhead in practice would typically be even lower.

Table: Times (seconds) to clean 5 GB
Elapsed  User  System
111.8    16.5  10.1

Rubberd Cleaning with Solaris Compile

Table 4 shows the completion time for executing our large Solaris compile benchmark while rubberd is running. These results measure the impact of running rubberd cleaning on the Solaris compilation by comparing the compilation completion times when rubberd is not running, when rubberd is running at low priority, and when rubberd is running at normal priority.

Table: Elapsed time (seconds) to build kernel in /ehome in three ways: alone (rubberd not running), with rubberd running at a low priority, and with rubberd running at a normal priority.
Rubberd Status Elapsed Time
Not running 2872.1
Low Priority 2974.5
Normal Priority 2991.5

Compared with the Solaris compilation results without rubberd running, we observe a 3.5% degradation in completion time when rubberd runs at low priority and a 4% degradation when it runs at normal priority. Running rubberd as a lower-priority job does not make a large difference, primarily because both jobs are I/O bound, so CPU scheduling priority has little impact on completion time. Furthermore, there are numerous lull times during a regular system's operation in which rubberd could be scheduled to run with even lower impact on system operation [6].

Overall, however, we observe that the impact of rubberd running even once an hour with a conservatively large amount of data to remove does not significantly hamper normal system operation. It is also important to note that as these files are temporary they would be removed anyhow; rubberd provides the added convenience of automatically doing so when disk space becomes low and before the disk fills up and hampers user productivity.

Erez Zadok 2002-06-21