File System Benchmarking Tools and Techniques

Benchmarking is critical when evaluating performance, but it is especially difficult for file and storage systems. Complex interactions between I/O devices, caches, kernel daemons, and other OS components produce behavior that is difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. The wide variety of workloads that these systems experience in the real world adds to this difficulty.

We have found that some of the most commonly used benchmarks are flawed, and that many research papers do not give a clear enough picture of file system performance. We believe that a good performance evaluation should use micro-benchmarks to highlight both the strengths and weaknesses of a file system, together with general-purpose benchmarks or traces that indicate how it would perform under expected, realistic workloads. Care must be taken, however, to ensure that the general-purpose benchmarks actually reflect real-world workloads. In addition, benchmarks should scale well, and results should be reproducible and comparable across papers.

In this project, we survey the file system benchmarks used in many recent research papers. We found that no single benchmark adequately measures file system performance. We show how some commonly accepted and widely used benchmarks and benchmarking techniques can easily conceal overheads, unfairly overemphasize them, or, more generally, emphasize or de-emphasize many of a file system's properties. We offer suggestions on how to create and conduct benchmarks so that they provide a fairer and more accurate picture of file system performance.

Above all, this project presents our views on the future of file system benchmarking. To that end, we have been developing several technologies: fine-grained file system tracing, efficient file system trace replaying, automated file system benchmarking tools, and low-overhead tools for visualizing detailed file system behavior.

Journal Articles:

# | Title | Formats | Published In | Date | Comments
1 | A Nine Year Study of File System and Storage Benchmarking | PS, PDF, BibTeX | ACM Transactions on Storage (TOS) | May 2008 | Online data appendix

Conference and Workshop Papers:

# | Title | Formats | Published In | Date | Comments
1 | Accurate and Efficient Replaying of File System Traces | PS, PDF, BibTeX | Fourth USENIX Conference on File and Storage Technologies (FAST 2005) | Dec 2005 | -
2 | Auto-pilot: A Platform for System Software Benchmarking | PS, PDF, BibTeX | USENIX Technical Conference, FREENIX Track | Apr 2005 | -
3 | Tracefs: A File System to Trace Them All | PS, PDF, BibTeX | Third USENIX Conference on File and Storage Technologies (FAST 2004) | Apr 2004 | -

Technical Reports:

# | Title | Formats | Published In | Date | Comments
1 | A Nine Year Study of File System and Storage Benchmarking | PS, PDF, BibTeX | Stony Brook University CS Technical Report FSL-07-01 | May 2007 | Online data appendix
2 | Versatile File System Tracing with Tracefs | PS, PDF, BibTeX | Stony Brook University CS Technical Report FSL-04-05 | Aug 2004 | M.S. Thesis

Current Students:

# | Name | Program | Member Since
1 | Avishay Traeger | PhD | Sep 2003

Past Students:

# | Name | Program | Period | Current Location
1 | Nikolai Joukov | PhD | Jan 2004 - Dec 2006 | Research Staff Member, Storage and Data Services Research group, IBM T. J. Watson Research Center (Hawthorne, NY)
2 | Charles P. Wright | PhD | May 2003 - May 2006 | Research Staff Member, Network Server Systems Software group, IBM T. J. Watson Research Center (Hawthorne, NY)
3 | Akshat Aranya | MS | May 2003 - Aug 2004 | Associate Research Staff Member, NEC Labs America (Princeton, NJ)
4 | Tim Wong | BS | Dec 2004 - Jun 2005 | Analyst, Capital Markets Prime Services department, Quantitative Research division, Repo Trading Analytics group, Lehman Brothers (New York, NY)

Sponsors:

# | Sponsor | Amount | Period | Role | Award Title
1 | NSF HECURA | $760,253 | 2006-2009 | PI | File System Tracing, Replaying, Profiling, and Analysis on HEC Systems
2 | NSF Trusted Computing (TC) | $400,000 | 2003-2006 | Sole PI | A Layered Approach to Securing Network File Systems


(Last updated: Fri Jun 13 06:55:53 EDT 2008)