On the Role of Static Analysis in Operating System Checking and Runtime Verification

A RESEARCH PROFICIENCY EXAM PRESENTED
BY
ABHISHEK RAI
STONY BROOK UNIVERSITY

Technical Report FSL-05-01
May 2005

Abstract of the RPE
On the Role of Static Analysis in Operating System Checking and Runtime Verification
by
Abhishek Rai
Stony Brook University
2005

Software inevitably contains bugs. For certain classes of software like operating systems, reliability is a critical requirement. Recent research has shown that several commodity operating systems, even after careful design and extensive testing, still contain a number of bugs. Methods for automated detection of bugs in software can be classified into (1) static methods, (2) formal verification, and (3) runtime checking. In this paper, our focus is on verifying critical properties of large and complex software like OSs, one such property being correctness of memory accesses. For this purpose, we evaluate both static checking techniques and runtime checking techniques, and present our observations.

Static checking is useful because the cost of repairing a bug increases along the software development lifetime. Bugs discovered and fixed early result in significant cost savings. We evaluate a representative set of existing static checking techniques on the basis of soundness, precision, and usefulness in checking large software like the Linux kernel. We also cover the related problem of deriving property rules for automated checking.

Static checking though useful, cannot in general catch all possible bugs in C programs, without producing false alarms. Consequently, it is mostly a best-effort exercise to find some, but not all bugs. However, memory access checking is too critical to be trusted to static techniques. We introduce a survey of existing bounds checking techniques. In particular, we show that it is possible to classify the wide range of existing bounds checking techniques into a single hierarchy. We evaluated existing techniques based on soundness, performance, and usability. A software bounds checking framework for the Linux kernel is introduced and evaluated.

We observe that existing static checking methods have limited coverage in detecting bugs. For verifying the correctness of critical program properties like memory accesses, runtime checking is necessary. However, runtime methods by themselves are not very useful due to performance reasons. Hence, we conclude advocating static analysis for use in runtime checking systems.

To my family

1 Introduction
2 Overview
    2.1 Why Is C Not Enough
        2.1.1 Types of Bugs
        2.1.2 The Nature of Software Bugs
    2.2 Bug Detection Techniques
        2.2.1 Formal Verification
        2.2.2 Runtime Checking
        2.2.3 Static Checking
        2.2.4 Issues and Trade-Offs in Static Checking
3 Static Analysis Survey
    3.1 Annotation-Based Techniques
        3.1.1 Lint
        3.1.2 LCLint
        3.1.3 Splint
        3.1.4 Extended Static Checking
        3.1.5 Cqual
    3.2 Tool-Based Techniques
        3.2.1 PREfix
        3.2.2 ESP
        3.2.3 Abstract Interpretation: ASTREE
        3.2.4 SLAM
        3.2.5 Metacompilation
    3.3 Language-Based Approaches
        3.3.1 Cyclone
        3.3.2 Vault
        3.3.3 Comparison of Cyclone and Vault
        3.3.4 MOPS
4 Automated Deduction of Checking Rules
    4.1 The Need for Automated Deduction
    4.2 Invariant Inference using Predicate Abstraction
        4.2.1 Inferring Loop Invariants
        4.2.2 Predicate Abstraction
    4.3 Invariant Inference for Annotation Based Checkers: Houdini
    4.4 Runtime Methods for Inferring Invariants
    4.5 Bugs as Deviant Behavior: Combining Static Analysis and Invariant Inference
    4.6 Extracting Finite State Models from Code
        4.6.1 Metal Based System for Flash
5 Case Study: Memory Bounds Checking
    5.1 The Nature of Memory Bugs
    5.2 Static Methods for Detecting Memory Bugs
        5.2.1 Archer
    5.3 Runtime Checking for Detecting Memory Bugs
        5.3.1 Purify
        5.3.2 Safe-C
        5.3.3 Decoupling Computation and Bounds Checking
        5.3.4 Backward-Compatible Extensions to Safe-C
        5.3.5 Cash: Hardware Assisted Bounds Checking
    5.4 Bounds Checking for the Linux kernel
        5.4.1 KBCC
        5.4.2 Mudflap
        5.4.3 Kernel Code Electric Fence
    5.5 Performance Evaluation
        5.5.1 Kefence
        5.5.2 KBCC
6 Static Analysis as a Means for Facilitating Runtime Checking
    6.1 Static Analysis for Bounds Checking
        6.1.1 CCured
        6.1.2 Static Analysis in BCC
    6.2 Static Analysis for Instrumentation of Code
        6.2.1 Aspect-Oriented Programming
        6.2.2 Source Code Instrumentation in Aristotle
7 Conclusions and Future Work
    7.1 Static Checking
    7.2 Runtime Bounds Checking
    7.3 Future Work

List of Figures

    2.1 Example of an Application Level Protocol that should be satisfied in program code
    3.1 Static checkers plotted along the two dimensions coverage and effort
    5.1 Bounds checking overhead for ReiserFS with Am-utils and Postmark
    6.1 Architectural overview of the Aristotle system

List of Tables

    3.1 Table showing how Cyclone handles various types of errors
    3.2 Feature comparison of static checkers
    5.1 A two-level classification for runtime bounds-checking technique
    5.2 Table showing the differences between capability-based and availability-based runtime bounds checking
\LARGE{{Acknowledgments}}

Acknowledgments

I would like to thank my advisor, Professor Erez Zadok, for his constant support for this project. The vision of this project shaped during several insightful discussions with him. Besides, he taught me several interesting and useful research practices along the way. Sean Callanan, Radu Grosu, Scotta Smolka, Scott Stoller, and Annie Liu provided an excellent forum for discussion and development of ideas related to this project.

My committee members, Tzi-cker Chiueh and Scott Stoller, were generous of their time and provided helpful input and comments on the work. Amit Sasturkar and Rahul Agarwal provided valuable feedback on some ideas in this paper.

I thoroughly enjoyed the company of members of File Systems and Storage Laboratory, Sean Callanan, Gopalan Sivathanu, Charles Wright, Avishay Traeger, and Nikolai Joukov during the course of this project. Kapil Kumar, Mahadev Konar, Susanta Nanda, Sumit Jain, and Mohit Gupta have made my stay in Stony Brook entertaining and pleasant. And, friends from A1 are just as cherished today, on a daily basis, as they were years before.

Finally, I would like to thank my family, papa, mummy, montu, and didi for everything.

This work was partially made possible thanks to NSF CAREER award EIA-0133589, NSF award CCR-0310493, and HP/Intel gifts numbers 87128 and 88415.1.

Chapter 1
Introduction

Operating system reliability has become more important than ever. Users trust their computers with important data, and OS failures can have catastrophic effects, like data loss, resource unavailability, and at the very least, frustrated users. Moreover, commodity operating systems are increasingly being used in enterprise scenarios to create large scale computing and storage clusters. But commodity operating systems are still unreliable. Recent techniques for automated debugging of some commodity operating systems have discovered numerous bugs [47].

Manual checking of code is not enough. For one, it is both error-prone and time consuming, and secondly, humans easily get overwhelmed by complexity. This motivates the need for automation of the checking process. Moreover, the wide variety of bugs found in these systems call for innovative techniques for automated bug detection. Existing techniques can be classified into static methods, model checking, and runtime checking. In this report we study applications of static techniques for automated debugging and analysis of software. In particular, we consider two different ways to apply static analysis to software. First, we present a survey of static checking techniques for automated debugging of OS kernels. Second, we study the use of static analysis and instrumentation for runtime checking of OS kernels.

Static methods have significant benefits over other approaches to automated analysis of software. In comparison to runtime checking, static analysis has three main advantages. First, static analysis does not need to actually execute the software. Thus bugs can be discovered and fixed early during the development process. Second, software that is statically checked does not suffer from the performance overhead of runtime checking. Third, static analysis covers all code uniformly; unlike runtime checking techniques, bugs on rarely executed code paths also get discovered. Although model checking techniques share some of these merits over runtime checking, they are rendered effectively intractable for large software systems due to scalability problems and the lack of a formal specification or model of the system.

We believe that static analysis of code has great potential in helping improve software quality. Static checking of code is just one of its possible applications. Additionally, static analysis of code can be combined with other techniques like model checking and runtime checking to produce even more powerful checking systems. For example, one can first check software statically, and then leave the rest for runtime verification. Similarly, one can combine static analysis with model checking to automatically discover predicates using model checking heuristics and then check them using static analysis [16].

We demonstrate the amalgamation of static analysis and runtime checking through a memory bounds checking system we have developed for the Linux kernel. Memory bound overrun bugs often have serious consequences. They can result in security breaches, program misbehavior and worse still, they can silently damage the system. It is all the more acute for OS kernels where memory bugs can lead to security breaches and permanent data loss.

We have ported two open source memory bounds checking systems into the Linux kernel, and with the help of static techniques, have significantly cut down their performance overheads. Additionally, we have made significant modifications to reduce the number of false positives, and support concurrency.

The rest of this paper is organized as follows. In Section 2 we present a classification for bug detection techniques and synthesize the basic ingredients for static checking. In Section 3 we present a survey of a static checking techniques. Section 4 motivates the need for automated deduction of checking rules and presents a quick survey. Section 6 generalizes static checking and advocates using it as a means for enhancing the effectiveness of runtime checking. We present some of our own work in this section. Section 5 focuses on memory related bugs in software. We present a survey of existing techniques and present our own work in this field. We finally conclude in Section 7 and identify future avenues of research in this field.

Chapter 2
Overview

In this section we derive the basic thesis of automated bug finding in software. We first examine the nature of bugs in systems software. Based on these bugs, we then present the design space for automatic static checking techniques with particular emphasis on techniques for checking large complex software written in C, like OS kernels. Finally, we compare static checking with other methods of detecting bugs in software, like formal verification and runtime checking.

2.1 Why Is C Not Enough

Since its very inception, C has been the de-facto language for systems programming. In fact, systems programming and C have evolved side by side ever since C's introduction in the 1970s. Like other trends in computing industry, this was not necessarily designed to be that way, but this is how things turned out to be. So far, C has suited well for the requirements of a systems programming language. In general, any systems programming language is required to provide the following features.

Control over low-level data representation
Explicit memory management
Performance, and "transparency" of performance: performance characteristics of a C program should directly manifest in the source code (e.g., language features like automatic garbage collection make runtime overhead of a program unpredictable).

C satisfies all these requirements to the extent mandated by traditional systems. However, as the scale and complexity of systems increases, systems design as it is known today is getting increasingly overwhelmed by complexity and bugs. This can be attributed to two basic deficiencies in the design of the C language.

First, C is too permissive. As a result, compilers cannot catch all possible bugs. For example, C tries to give programmers much control over memory layout of data structures; and, it allows them to control the lifetime of program resources (like memory). This is partly by design: C does not enforce strong semantics at the language level (type system). Also, at the time C was conceived, program analysis was not advanced enough to develop a strongly typed but flexible language. In any case, the design of C served well in accommodating a wide range of programming styles and implementation requirements. But, it also opens up the possibility of programming bugs. The default type checking done by C compilers is weak and misses many bugs. It is ultimately left to the developer to exercise their prudence to produce "good" error-free code.

The second serious drawback of C is its lack of extensibility. There is a large gap between the semantics of a typical systems program and the simple rules enforced that the C compiler tries to enforce. As a result, the compiler is bound to miss bugs. They may not be "language" bugs, but they are bugs nevertheless. Consider the following example. Synchronization primitives are a common ingredient of systems programs. In the context of the Linux kernel, it is required that the spin_lock function is called, only when the lock is not already held. Similarly, the spin_unlock function should be called while the lock is held. This property can be conveniently depicted using a state diagram as shown in Figure 2.1.

Figure 2.1: Example of an Application Level Protocol that should be satisfied in program code. During normal operation, the system can be in one of two states: Locked or Unlocked. Calls to spin_lock and spin_unlock cause transitions between these states. Any other transition results in an error.

However, the C language as such, provides no support for incorporating application level semantics into compile-time checking. The developer has to validate higher level properties of their code using alternative methods.

Code inspection is still the most widely exercised code checking technique (after compilation). But humans can easily get overwhelmed by complexity beyond a few hundred lines of code. As a result, manual checking is error-prone and time consuming. Moreover, manual checking has poor incremental returns: as different components of the system evolve and rules and invariants change, it is quite an art trying to maintain such code. Local changes applied to only a small part of the program often end up breaking other, seemingly unrelated parts of the code. Despite all these challenges, C has no built-in support for language-extensions for checking program conformance to higher level properties.

2.1.1 Types of Bugs

It is not possible to give a complete listing, or even a useful classification of all possible bugs. In this section we discuss some common bugs and their causes. This discussion will help us classify bug detection techniques in the next section.

Let us consider a simple bug. The following piece of C code tries to copy the string "hello world" into the array arr.

fn()
{
  char arr[11];
  strcpy(arr, "hello world");
}

One can make two observations regarding this code. First, the allocated space for arr is too small to accommodate the entire string in it. Secondly, strcpy does not detect and handle this situation. As is obvious, this operation will cause data to be written beyond the limits of the array arr. The result could be a program crash, or misbehavior due to memory corruption, or, no visible effect at all. One can take two points from this example:

simple static checking can detect some potentially dangerous bugs.
knowledge of procedure interfaces would be useful, e.g., knowing that strcpy will not check the operation for bounds overflow.

We now present some common types of bugs.

Memory Errors

These are probably the most common bugs in C programs. Memory access bugs, especially memory buffer overruns have caused much grief to IT industry. We classify existing techniques for detecting memory bugs in Section 5. We also consider in detail the case of runtime bounds checking.

Inconsistent Interfaces

In large complex systems, which have several smaller components making up the whole system, it is important that individual components have strictly enforced interfaces. When this is not the case, the software can behave unpredictably. This problem can be related to the issue of automatically enforcing higher-level system specifications. Traditionally, software written in C has not had access to checkable specifications. For example, Jass or JML for Java.

Violation of Temporal Properties

Imperative programs are essentially temporal artifacts. There is often a prescribed temporal ordering of operations. For example, lock before unlocking, etc. Violations of temporal properties can result in synchronization errors (deadlocks, race conditions), memory leaks, use after free (dereferencing of pointers which have already been freed), etc.

Bugs due to Software Development Processes

Software development processes introduce several bugs. One of the nastier ones is regression bugs. As code evolves over time, it is more difficult to maintain. First, developers leave and with them goes ad-hoc undocumented knowledge about the code. Second, as new code is added, components may get out of sync with each other, or existing code may be broken. This is very much the case when software specifications are not tightly coupled with the code, and the developer is responsible for ensuring compatibility. For example, checkable interface specifications, or annotation-based systems can help prevent such regressive digressions (Section 3).

This paper does not address problems arising from incorrect synchronization in concurrent programs, including problems like data races, atomicity violations, and deadlocks. These problems have been extensively studied in literature, and static, dynamic, as well as hybrid techniques addressing these problems have been developed [23,30,56,51,52,112].

2.1.2 The Nature of Software Bugs

In recent times, a lot of research effort has been focused at detecting and studying bugs in operating system kernels. This is no surprise considering the possible implications of buggy operating systems. The two leading operating systems of today, Linux [47,137] and Microsoft Windows [16] have received much scrutiny from researchers developing automated static checking systems. Also, these methods have been successful in finding numerous bugs.

Chou et al. studied of bugs in the Linux kernel [32]. The bugs they report were found using their static checking system which we discuss in Section 3.2.5. In particular, they made the following notable observations about bugs in the Linux kernel which justify some of the claims we have made above:

Bug distribution is much higher in device drivers than the core kernel. This is because the core kernel undergoes much greater scrutiny and runtime testing than device drivers. However, a buggy device driver is just as likely to crash a kernel.
Long and complex functions have a higher number of bugs per line of code. Thus, programming in C does not handle complexity very well.
Bug distribution between files is logarithmic: few files have most bugs while most files have none or few bugs. This is probably due to the fact that thousands of developers are involved in kernel development, and they have very different backgrounds-they all not be trusted with producing completely bug-free code.
Some bugs in the Linux kernel had lived for a couple of years before they were detected. This serves to demonstrate that traditional methods do not achieve good coverage.

Apart from the issue of programming errors, another grossly mishandled case is the absence of a unified mechanism for graceful recovery from errors (exceptions). Since error detection is often postponed to runtime (instead of being pointed out at compile time), the system is usually not well equipped to handle every possible error case. In the worst case, this could cause the whole system to crash, leading to loss of data or service. The cases in which the system misbehaves without explicitly crashing are equally disagreeable.

The approaches we discuss in this section try to address some or all of these issues in different ways. Existing methods for detecting or preventing memory bugs can be classified into language based approaches, static analysis, and runtime checking techniques

2.2 Bug Detection Techniques

Bug detection is a very broad field. The central thesis of this paper is that static detection of bugs can be useful; however, they cannot completely verify large complex systems. For such systems, static checking is best combined with runtime checking for catching all bugs. Static checking is useful because the cost of repairing a bug increases along the software development time-line. An early discovery can result in a substantial saving. This paper demonstrates that static analysis can be very useful in these regards. Of course, it would be a better idea to avoid bugs altogether (better programming practices); however, in the absence of completely sound bug avoidance, having redundant checking is better than producing buggy programs.

Apart from "discovering" bugs, it would be very useful if the bug is detected as close to its origin in the code as possible. This makes for easy fixing of the bug. For instance, run-time memory checking facilities can also detect NULL pointer dereferences. However, they usually do not provide detailed history of events leading to the NULL pointer, a good static checker can in many cases backtrack in its analysis state to provide more useful information.

We now compare static checking with alternative bug detection strategies for the case of checking OS kernels.

2.2.1 Formal Verification

Twenty years ago, Rushby outlined the difficulties in verifying secure kernels [108] and suggested a physical separation of OS components. Neumann et al. built comprehensive mechanisms for proving the security of the PSOS operating system [96], a hierarchically-structured OS written in a strongly-typed language. Later works focused on compartmentalization, boundary protection, and some verification of OSs such as Mach [9,62], Pebble [59], EROS [117], DEOS [101], and Fiasco [68,124]. In the Kit project, Bevier tried to prove the correctness of an OS using theoretical techniques [21]. As security concerns grew with the Internet, the U.S. Department of Defense released The Orange Book [41], which serves as a "bible" for evaluating secure systems. Arbaugh showed the life-cycle of system vulnerabilities, emphasizing the importance of early discovery, disclosure and fixing of bugs, and then releasing patches [12]. They also showed how important it was for the security and integrity of computer systems, to be able to verify each component and layer independently [11].

Model checking of OSs has been used to detect specific problems within OS components. Pike et al. used the SPIN [20] verification system to detect a certain subtle race condition in Plan 9 [103]. More recent efforts seek to extract models from OS source code [14,17,87,48], with an emphasis on checking data-independent properties of device drivers and network protocols, and to apply model checking directly to software [36,61,121,80,93].

However, model checking techniques do not trivially extend to operating systems that were not designed with verifiability as a goal (e.g., Linux). Model checking such a huge system is impractical for two reasons: (1) lack of a complete model or specification for the system, (2) state space explosion. Nevertheless, several static checking techniques use ideas from the model checking world [16,36,42].

2.2.2 Runtime Checking

Most runtime external checking of systems is incidental. For example, the Bonnie tool is a program that exercises the memory (VM) and cache subsystems of the OS by creating very large files and then randomly reading and writing parts of them [24,35]. Similarly, file-system developers test a new or modified file system by compiling a large package (such as the kernel source itself) inside a mounted instance of the new file system [100,134,136,135]. The POSIX standard specifies a common set of system calls for OSs [70] and also offers a standard description of how to test code for POSIX compliance [69].

In the fields of security and intrusion detection, wrappers are used to guard software interfaces by monitoring inputs, outputs, and sequences of operations [126,18,19,114,85,115,133]. Chiueh showed how to protect loadable kernel modules from each other on x86-based OSs such as Linux [29]. We also propose to wrap system calls and monitor them for compliance, but our wrappers are generated from formal specifications of the system-call interfaces. Wrapper mechanisms with an overhead of no more than 1-2% are possible [18,19,134,136,135]. Our monitoring is intended not only for security, but for any deviant behavior.

Runtime checking in tools such as the Bounds-Checking Gnu C (BCC) [25], Purify [106] and Insure++ [79], perform runtime monitoring for memory corruptions; e.g., running off of the end of a string in C. We discuss this in detail in Section 5. Run-time monitoring can also detect potential race conditions [113] and atomicity violations [83,52]. Also, some methods have combined runtime checking with static checking to obtain robust property checking. Specifically, the static checking properties validates all that can be verified statically, the remaining checks are left to be checked at runtime.

2.2.3 Static Checking

The central tenet of static checking is that many abstract program restrictions map clearly to source code constructs. Static checking is based on the science of program analysis. Program analysis offers static techniques for predicting safe and computable approximations to the set of values or behaviors arising dynamically at run-time when executing a program on a computer [97]. However, program analysis-including finding possible run-time errors-is undecidable: there is no mechanical method that can always answer truthfully whether programs may not exhibit runtime errors. When faced with this mathematical impossibility, the alternative has been to apply certain techniques that simplify the overall program analysis problem into more tractable problems. There are four basic approaches to program analysis:

Data Flow Analysis treats programs as graphs: the nodes are the elementary blocks and the edges describe how control might pass from one elementary block to another. For example, for handling the problem of reaching definitions, the standard approach is to coin mathematical abstractions for the role of edges in the graph and solve them using a set of equations. Symbolic execution as used in PREfix and SLAM (Section 3.2) are types of data flow analysis.
Constraint-Based Analysis verifies global properties by solving a set of local constraints generated from the program text. Constraint generation from program text has the effect of separating specification from implementation. Constraints hide implementation details and pave the way for tractable, structural proofs. There are two basic constraint theories: data flow analysis and type-inference. In Section 3.1.5 we present Cqual which is a constraint-based type inference system.
Abstract Interpretation (AI) maps programs into more abstract domains [74,8]. This makes analysis more tractable and potentially useful for checking. The theory of AI differs from other approaches, it is a general methodology for calculating analyses rather than specifying them and then relying on a posteriori validation. For AI to be effective, it imposes a constraint on the nature of the abstraction and the concretization functions. This is called the galois connection and serves to produce a safe over-approximation analysis. ASTREE (discussed in Section 3.2.3) is an example of an AI-based static checker. In Sections 3.2.4 and 4.2, we use predicate abstraction [110], which is a special form of abstract interpretation in which the abstract domain is constructed using a given set of predicates.
Type and Effect Analysis is an amalgamation of two ingredients: an effect system and an annotated type system. The effect system tells the effect a statement's execution has. The annotated type system provides the semantics for the effect system. Type analysis is the most commonly used technique enforced by compilers for static checking (e.g., the gcc -Wall option). Type inference (Section 3.1.5) is a useful approach. Most of the checkers discussed in Section 3 use some form of strong type enforcement.

Despite the apparent differences, these approaches to program analysis are equally powerful. There is some commonality among these approaches. The art of program analysis entails cultivating the ability to choose the right approach for the right task and in exploiting insights developed in one approach to enhance the power of the other. Moreover, one can argue that it is not possible to give a single clear-cut classification of static checking techniques. This is because (1) there is a significant overlap in different techniques, (2) seemingly radical approaches are often equivalent in terms of program analysis power, and (3) designing good static checkers is increasingly an engineering problem-innovative implementations often beat more ambitious but less useful ones. Nonetheless, for the purpose of discussion, it helps to divide our subjects into a few classes. In Section 3 we classify static checking techniques into annotation-based methods, and tool-based methods. Although this may not be the right choice, it certainly has the advantage of better capturing the programmer's imagination.

2.2.4 Issues and Trade-Offs in Static Checking

Before we discuss existing static analysis techniques, it would be useful to get a taste of common engineering issues and trade-offs in designing static checkers. We qualify these as "engineering" because they require a deep knowledge of available information and a creative intuition for effective handling. The choices made in handling these issues often have little effect on the "power" of the static analysis as such, but they often are the deciding factor in the acceptance of the system.

We present below a listing of the issues and trade-offs we identified. This is not an exhaustive list. Also, they are mostly orthogonal; in most cases one can realign one trade-off without disrupting others too much.

Chapter 3
Static Analysis Survey

In this section we describe a representative set of existing static checking systems. We place particular emphasis on the scalability and the usefulness of the checker. Scalability is important because our ultimate goal is checking large complex software like OS kernels. Usefulness has many dimensions: the more bugs a checker finds the more useful it is, fewer false bug reports, and ease of use by the end-programmer to easily add custom checks. We classify static checking techniques into annotation based methods, and tool based methods. Although this may not be the right choice, it certainly has the advantage of better capturing the programmer's imagination. We conclude this section with a quick analysis of how individual systems compare against each other.

3.1 Annotation-Based Techniques

Program annotations is the term given to specifying behavioral aspects of code. For example, the const qualifier in C specifies a behavioral constraint. Program annotations serve as a useful tool for automated checking of behavioral aspects of programs. Annotations can also be viewed as a means for augmenting the semantic and behavioral content of a program, by specifying additional constraints that the code must satisfy. Thus, annotations help enforce additional semantic checking of the program over the default checking performed by the compiler.

In this section we consider a few annotation-based tools for static checking. Although this is not in any way a comprehensive listing, still to the best of our knowledge, it is a representative list of the state of annotation assisted checking in software.

3.1.1 Lint

Lint is probably the oldest static checker (other than the language compiler) that continues to be widely used today. Lint checks C programs for a small fixed set of syntax and semantic errors. It performs simple flow sensitive and type aware checking, but can only detect violations of basic program properties. In fact, Lint is now essentially superfluous since many of the checks performed by Lint have been in GCC for a long time now. However, Lint is of historical value given the set of checkers that were inspired by it, two of which are the subject of this section: LCLint and Splint.

Lint is a command-line driven tool with command line arguments for specifying which of the several checks provided by Lint should be applied to the program. The annotation support in Lint was introduced for suppressing false alarms. One additional feature of Lint is its "code portability" check: when invoked with the -p option, Lint checks the code for possible portability issues from its dictionary of known portability problems [122].

3.1.2 LCLint

LCLint [49,2] grew out of the Larch Project at the MIT Lab for Computer Science and DEC Systems Research Center. It was originally named after LCL [123], the Larch C Interface Language and Lint.

LCL and Larch

The Larch family of languages [65,92] represent a two-tiered approach to formal specification. A specification is built using two language: the Larch Shared Language (LSL), which is independent of the implementation language, and a Larch Interface Language designed for the specific implementation language. An LSL specification defines sorts, analogous to abstract types in a programming language, and operators, analogous to procedures. It expresses the underlying semantics of an abstraction.

The interface language specifies an interface to an abstraction in a particular programming language. It captures the details of the interface needed by a client using the abstraction and places constraints on both correct implementations and uses of the module. The semantics of the interface are described using primitives and the sorts and the operators defined in LSL specifications. Interface languages have been designed for several programming languages. LCL [123] is a Larch interface language for Standard C. LCL uses a C-like syntax.

LCLint

Much of the difference between Lint and LCLint is that Lint's annotations are related to minimizing the number of false positives, whereas LCLint's annotations have more features, only some of which are used for this purpose. LCLint makes extensive use of annotations to define behavioral aspects of code and to suppress false positives. LCLint's annotations were originally derived from a subset of LCL specifications. Although separate LCL specifications can provide more precise documentation on program interfaces than is possible with LCLint annotations, LCLint's designers chose the annotations approach over separately provided specifications for obvious reasons.

LCLint can detect problems like violations of information hiding; inconsistent modifications of caller-visible state or uses of global variables; memory management errors including uses of dead storage and memory leaks; and undefined program behavior. LCLint checking is done using simple dataflow analyses and is potentially as fast as a compiler.

LCLint was designed to run frequently on large programs and hence checking efficiency and scalability were important requirements. As a result, LCLint employs a number of simplifications for faster analysis and ease of use. For example, LCLint does not incorporate inter-procedural data flow analysis, instead relies on programmer annotations for adjudging the role of function calls, and loops are considered identical to if statements for the purpose of the analysis. Also, LCLint focuses specifically on errors at interface points. Hence, it only uses weak intra-procedural alias analysis that infers inter-procedural actions from the annotations.

3.1.3 Splint

Splint 3.0.1 [3] is the successor of LCLint version 2.5q. The main changes are support for secure programs (hence the name Splint) and extensible checks and annotations.

User-defined Annotations

Currently, LCLint users are limited to a pre-defined set of annotations and associated checking. This works well as long as their programming style is consistent with the methodology supported by LCLint (e.g., abstract data types implemented by separate modules, pointers used in a limited systematic way), but is problematic if they need to check a program that does not adhere to this methodology. For example, LCLint provides annotations for checking storage that is managed using reference counting. An annotation is used to denote an integer field of a structure as the reference count, and LCLint will report inconsistencies if new pointers to the structure are created without increasing the reference count, or if the storage associated with the referenced object is not deallocated when the reference count reaches zero. If a program implements reference counting in some other way (for example, by keeping the reference counts in a separate lookup table), however, LCLint provides no relevant annotations or checking. More generally, applications have application-specific constraints that should be checkable statically. These could include value consistency requirements, event ordering requirements, and global constraints. Splint provides extensions to LCLint that address this problem by supporting user-defined annotations. Programmers have some freedom to invent new annotations, express syntactic constraints on their usage, and define checking associated with the user-defined annotation in different contexts.

3.1.4 Extended Static Checking

ESC/Java [54] descends from the intellectual tradition of program verification. It is an annotation-based static checker for Java programs that uses theorem solving of program predicates to check for correctness. ESC/Java is intermediate in both power and ease of use between type-checkers and theorem-provers, but it aims to be more like type-checkers and is lightweight in comparison with to theorem-provers. This way, the designers of ESC/Java hoped to make it easy to use by the programmers (like type-checkers) rather than extraordinarily powerful but so difficult to use that programmers shy away from it (like theorem-provers). Also, hence the name "extended static checker." Figure 3.1 shows the intuitive placement of "extended static checking" between conventional type-checking and program verification [54].

Figure 3.1: Static checkers plotted along the two dimensions coverage and effort (not to scale). Extended Static Checking as implemented in ESC/Java is intermediate both in coverage and effort needed between type-checkers and program verification systems.

At the time ESC was first proposed [42], powerful static checkers needed programmer assistance for checking. This is not in agreement with ESC's goal for ease of use and hence ESC does not require any programmer assistance while checking.

ESC/Java is designed to scale well with code size. It employs modular checking. Specifically, when checking the body of a function f, ESC/Java does not examine bodies of functions called by f. Rather, it relies on the specifications of those functions, as expressed by annotations. Hence, ESC/Java can check individual components of a system separately, leading to good scalability.

ESC/Java supports a powerful specification language. The specification language is the language used to write ESC/Java annotations. It is composed of one or more specification expressions, or pragmas. ESC/Java provides a rich specification language which revolves around these expressions. ESC/Java expressions can be arbitrary first order logic formulas. Also, they can be nested or combined like standard Java expressions, and are type-checked in ways similar to Java code. However, there are inevitably some differences between them and standard Java syntax. Also, ESC/Java specifications can be provided in separate files (.spec files). This is useful when the Java sources they annotate are unavailable for use by the checker.

ESC/Java checks the first-order logic expressions using the Simplify automatic theorem prover [38]. First-order is logic is undecidable, hence no theorem prover can prove it in polynomial time. Simplify is a best effort theorem prover and tries to prove or disprove its input till as long as it can before giving up.

The operation of ESC/Java consists of the following steps:

First, ESC/Java loads, parses, and type-checks the files named on the command line, as well as any other files needed because their contents are directly or indirectly used by files named on the command line.
Next, for each class whose function bodies are to be checked, ESC/Java generates a type-specific background predicate encoding such information as subtype relations, types of fields, etc. in the class to be checked and the classes and interfaces it uses.
Next, ESC/Java translates each function to be checked into a logical formula called a verification condition (VC). As an intermediate step in this translation, ESC/Java produces a command in an intermediate language [86] based on Dijkstra's guarded commands. The intermediate language includes commands of the form assert E, where E is a boolean expression of the language. An execution of a command is said to "go wrong" if control reached subcommand of the form assert E when E is false. Ideally, when a function f is translated into a command C and then to a verification condition V, the following three conditions should be equivalent:
1. There is no way that f can be invoked from a state satisfying its specified preconditions and then behave erroneously by, for example, dereferencing null, violating an assert pragma, terminating in a state that violates its specified postconditions, etc.
2. There is no execution of C that starts in a state satisfying the background predicate of f's class and then goes wrong.
3. V is a logical consequence of the background predicate.

In practice, the translation is incomplete and unsound, so there may be semantic discrepancies between f, C, and V. Finally, ESC/Java invokes the Simplify [38] theorem prover, asking it to prove each body's verification given the appropriate background predicate. If an attempted proof succeeds (or if Simplify exceeds specified resource limits in attempting the proof, or if ESC/Java exceeds specified resource limits generating the verification condition), then ESC/Java reports no warnings for the body. If the proof fails (other than by exceeding resource limits), Simplify produces a potential counterexample context, from which ESC/Java derives a warning message.

Unsoundness and Incompleteness in ESC/Java

ESC/Java is unsound and incomplete and can hence miss errors actually present in the program it is analyzing. This is inevitable from ESC/Java's design, since it is meant to be an extended static checker rather than a program verifier. However, ESC/Java provides pragmas for the programmer to control the soundness and incompleteness of checking. Thus users can balance annotation effort, completeness, and verification time. Here are some known causes of unsoundness in ESC/Java (taken from the ESC/Java Manual [76]).

Loops: ESC/Java does not consider all possible execution paths through a loop. It considers only those that execute at most one complete iteration (plus the test for being finished before the second iteration). This is simple, and avoids the need for loop invariants, but it is unsound. The users can modify ESC/Java's treatment of loops by using the -loop command-line option to specify how many loop iterations should ESC/Java consider. We study methods for automatically inferring loop invariants in Section 4.2.1. Archer (Section 5.2.1) uses symbolic execution and heuristics to aggressively unroll loops whenever possible.
Trusting Pragmas: The assume, axiom, and nowarn pragmas allow the user to introduce assumptions into the checking process. ESC/Java trusts them. If the assumptions are invalid, the checking can miss errors. For example, ESC/Java depends on programmers to supply monitored and monitored_by pragmas telling which locks protect which shared variables. In the absence of such annotations, ESC/Java will not produce a warning when a routine might access a variable without holding the appropriate lock.
Search Limits in Simplify: If Simplify cannot find a proof or a (potential) counterexample for the verification condition for a routine within a set time limit, then ESC/Java issues no warnings for the method, even though it might have issued a warning if given a longer time limit. If Simplify reaches its time limit after reporting one or more (potential) counterexamples, then ESC/Java will issue one or more warnings, but perhaps not so many warnings as it would have issued if the time limit were larger. There is also a bound on the number of counterexamples that Simplify will report for any conjecture, and thus on the number of warnings that ESC/Java will issue for any routine. Again, these limits can be set by the programmer.
Concurrency: Although we do not consider concurrency in this paper, it is nonetheless an important topic in program analysis research. Interestingly, ESC/Java assumes that the value of a shared variable stays unchanged if a routine releases and then reacquires the lock that protects it, ignoring the possibility that some other thread might have acquired the lock and modified the variable in the interim.

Here are some causes and consequences of incompleteness in ESC/Java:

Incompleteness of the Theorem-Prover: The verification conditions that ESC/Java give to the Simplify theorem prover are in a language that includes first-order predicate calculus along with some (interpreted) function symbols of arithmetic. Since the true theory of arithmetic is undecidable, Simplify is necessarily incomplete. In fact, the incompleteness of Simplify's treatment of arithmetic goes well beyond that necessitated by Godel's Incompleteness Theorem. In particular, (1) Simplify has no built-in semantics for multiplication, except by constants, and (2) Simplify does not support mathematical induction. Also, first-order predicate calculus (FOPC) is only semidecidable-that is, all valid formulas of FOPC are provable, but any procedure that can prove all valid formulas must loop forever on some invalid ones. But it is not useful for Simplify to loop forever, since ESC/Java issues warnings only when Simplify reports (potential) counterexamples. Therefore Simplify will sometimes report a (potential) counterexample C, even when it is possible that more work could serve to refute C.
Modular Checking: ESC/Java's use of modular checking causes it to miss some inferences that might be possible through whole program analysis. However, the designers chose to retain modular checking in lieu of its scalability benefits.

3.1.5 Cqual

Cqual (previously Carillon) [57] is an annotation-based extension to C that uses the power of flow-sensitive type-state analysis to catch bugs statically. Cqual is an amalgamation of ideas from type-checking and constraint-based analysis. The user extends the language type system with flow-sensitive type qualifiers, which are atomic properties that refine standard types similar to standard C type qualifiers such as const. Cqual checks constraint satisfaction on these type qualifiers inter-procedurally. Cqual's analysis is sound, meaning that it does not miss a bug, although it may report false errors (it is incomplete).

Programmers uses Cqual by specifying two things: they augment standard C code with type qualifiers, and, identify sub-typing relationship between the type qualifiers. For example, consider the following program constraint introduced to prevent format string attacks [116].

Data entered by a user should not directly be used as argument to printf since the user may be an adversary.

This constraint can be specified in Cqual by introducing two type qualifiers for character strings: tainted for strings that could be controlled by an adversary, and untainted for strings that must not be controlled by an adversary. The sub-typing relationship between the two qualifiers is denoted as:

    untainted < tainted

meaning that untainted is a subtype of tainted, any piece of code that accepts a tainted argument can be passed an untainted argument, but not vice versa.

The sub-typing relation is typically specified as a lattice of type qualifiers with partial-order relations between the qualifiers. CQUAL has a few built-in inference rules that extend the subtype relation to qualified types. Using the extended inference rules, CQUAL performs qualifier inference to detect violations against the type relations defined by the lattice. Intuitively, Cqual works by converting a program augmented by type qualifiers into constraints on type qualifier compatibility, which can be represented graphically. The problem of checking correctness is then reduced to graph reachability.

Cqual's analysis employs a unique combination of flow-insensitive and flow-sensitive analysis. It models program state as an abstract store, which is a mapping from variables to types (types contain qualifiers). The effect of a program statement on the state is emulated as a "Hoare logic" type effect with possible changes in variable types (ALLOC and ASSIGN operations). This flow-insensitive phase also includes alias analysis and effect inference. Following this, the flow-sensitive analysis phase handles flow-sensitive qualifiers and linearity issues for alias analysis. Finally, even though Cqual's analysis is not path-sensitive, it does not suffer from false-positives due to impossible paths. This is because Cqual's constraint satisfaction logic tries to find at least one way to satisfy the constraints, rather than finding all possible violations. However, since it does not do path-sensitive analysis, the error message may not be as informative (as in PREfix, for example).

One of the key innovations in Cqual is its practical approach to alias analysis. The general problem in alias analysis is making updates to one alias visible to others. Cqual handles this by adding a layer of indirection between pointers and their types by introducing an abstract location with each variable. With this change, a variable is associated with an abstract location, and abstract stores map abstract locations to qualified types. Thus a type update to an abstract location is automatically available to other references pointing to that location. An abstract location is defined as linear if the type system can prove that it corresponds to a single concrete memory location in every execution; otherwise, it is non-linear. Updates to linear locations can arbitrarily change their qualifiers (strong updates). Updates to a non-linear location result in a unification of qualifiers (weak update). Locations may be inferred as non-linear even if strong updates are desired on them (Cqual is sound). Evidently, strong updates give better opportunities to test qualifier constraints. Linearity can be reclaimed by using Cqual's restrict keyword (or equivalently the confine keyword). The expression restrict x = e1 in e2 introduces a new name bound to the value of e1. The name x is given to a new abstract location, and among all aliases of e1 only x and values derived from x may be used within e2. Thus, the location of x may be linear, and hence may be strongly updated, even if the location of e1 is non-linear.

Cqual's scalable, annotation-based analysis has been applied to large software including the Linux kernel. It discovered several locking bugs in Linux [57], and has been used to detect format-string vulnerabilities too [137]. Cqual is easy to use and provides strong soundness guarantees. However, there are two problems with Cqual. Firstly, checking pre-written code with Cqual needs substantial rewriting effort. Secondly, Cqual's conservative aliasing assumptions may lead to a high number of false-positives requiring additional annotation effort to suppress them (restrict operator). Recently, Cqual has introduced automatic confine inference by the static analyzer to reduce the burden of aliasing annotations. In fact, it has been observed to apply to roughly 95% of pre-written C code [10].

Usability

Cqual's design places strong emphasis on usability. Firstly, due to its unique combination of intra-procedural and inter-procedural analysis, Cqual is quite scalable. This is further demonstrated by the number of bugs it has caught in the Linux kernel (4M+ LOC and counting). Secondly, Cqual is easier to use than some other annotation-based systems: it needs less annotation-effort, thanks to automatic inference of type qualifiers. Also, confine-inference has been shown to work 95% of the time on pre-written code. Thus, one does not need to add heavy annotations to enforce linearity. Finally, Cqual is getting ready for mainstream usage: it currently presents the analysis results using Program Analysis Mode, an Emacs-based GUI. However, there are two concerns with Cqual: (1) how much will the rewriting effort be for porting legacy code to Cqual (a Houdini [53] like annotation assistant would be useful), and (2) it is not clear what is the false alarm rate of Cqual's conservative alias analysis.

3.2 Tool-Based Techniques

Today, commodity-class static-checking technologies are still coming up and are in a constant state of flux. While it is widely recognized that static-checking should be integrated into the regular development process, emerging static-checking techniques have two major drawbacks. First, they suffer from highly noisy outputs, and second, existing technologies are still not very mature and continue to evolve. Consequently, developers tend to shy away from intrusive techniques like annotations. Tool-based checkers find it easier to be adopted, even if it is just out of curiosity. Indeed, many tool-based checkers do not aim for soundness and just try to find as many bugs as possible which help them achieve mass acceptance.

3.2.1 PREfix

PREfix [26] is a static-checking tool that was designed to achieve two main goals. First, PREfix should catch maximum number of bugs while keeping the number of false positives. This kind of requirement almost always translates to an unsound system. Second, and most importantly, PREfix should be developer-friendly, that is, developers should easily be able to integrate PREfix into their existing development practices. Although many static-checking tools cite this as their primary goal, none of them do it as good as PREfix does. In fact, Microsoft acquired the company that owned PREfix, and PREfix is now used in-house in Microsoft check code as it is written on a daily basis. It is also going to be available as a feature in upcoming releases of Visual Studio. PREfix makes three key choices to ensure developer friendliness:

Design a tool-based system rather than an annotation based system.
Minimum false positives: False positives are a serious practical drawback to the mainstream acceptance of any static-checking system. PREfix designers made minimizing false positives a first class design goal. Also, it is known that less than 10% of the code of PREfix is said to concern with analysis per se, most applies to the filtering and presentation of output, to reduce the number of false positives [125].
Provide useful error flow information.

PREfix mainly targets memory errors like uninitialized memory, buffer overflows, NULL-pointer dereferences, and memory leaks. One main weakness of PREfix is that in the general case, there is not a method for programmers to define their own custom checks.

PREfix is a path-sensitive static analyzer that employs symbolic evaluation of execution paths. Path-sensitivity ensures that program paths analyzed are only the paths that can be taken during execution, thus keeping the number of false positives to a minimum. Also, path-sensitivity helps PREfix provide useful information about the reported errors. However, path-sensitive analysis can incur a heavy cost; exponential path blowup due to control constructs and possibly infinite paths due to loops make it impractical. The solution adopted by PREfix is to explore only a representative set of paths, whose number is configurable by the user. This is archetypal of PREfix's design goals: sacrificing soundness to get maximum errors with minimal false positives. It is possible to make the system sound, by for example, applying less heavy-weight analysis on the remaining paths (e.g., fixed-point based data flow analysis). However, this would be in direct contradiction with PREfix's goal of keeping the false-positive count extremely low.

PREfix employs a summary-based bottom-up inter-procedural analysis:

Functions are analyzed in bottom-up order in the call graph.
As functions are analyzed, a summary of constraints is generated for them.
Since the analysis is bottom-up, every function call can be replaced by the callee's summary. Thus a function gets analyzed for each possible call site that calls it. In other words, the analysis minimizes the impact of missing summaries for functions yet to be analyzed.
Recursion is broken arbitrarily.

PREfix running time scales linearly with program size due to the fixed cutoff on number of paths. Some other interesting lessons learned from PREfix are:

A bulk of errors come for inter-procedural interactions. Hence, static analyzers should be inter-procedural.
Good user-interfaces with minimal false positives and visual traces of error paths are very handy.

3.2.2 ESP

ESP is a highly scalable, property verification system for large C/C++ programs. It takes as input a set of source files and a specification of a high level protocol that the code in the source files is expected to satisfy. Its output is either a guarantee that the code satisfies the protocol along all execution paths, or a browsable list of execution traces that lead to violations of the protocol. ESP is targeted towards large code bases where it is nearly impossible to verify system-wide properties manually . Also, in existing large scale systems, it is infeasible to annotate the code for potentially checking all possible properties of interest. In addition, protocol specifications can typically be isolated into separate aspects rather than be spread apart through out the entire code base. ESP uses these protocol specifications to check conformance in the code.

The analysis engine of ESP was designed to achieve four goals: it must not validate any program that contains errors, it must scale to very large programs, it must report few false errors, and it must produce useful feedback that allows the programmer to investigate and fix errors.

ESP achieves these goals by combining two separate phases of program analysis: Phase one is a global context-sensitive control-flow-insensitive analysis that produces a call graph and information about the flow of values in the program. Phase two is an inter-procedural context-sensitive data-flow analysis that incorporates a restricted form of inter-procedural path simulation. The first phase produces the program abstractions used by the second phase, and also serves as a safety net for the second phase: it is used to conservatively answer questions about the corner cases that the more precise (and expensive) analysis in the second phase cannot handle.

ESP's design reflects the fundamental trade-off between precision and scalability. Whereas type-inference or data-flow analysis are highly scalable (due to their unification semantics: program state is unified at each join point), path-sensitive analysis is not. However, path-sensitive analysis is more precise since it considers only feasible paths. Just like PREfix, it is ESP's primary goal to implement a precise analysis thus keeping the number of false positives very low while finding a lot of errors. Hence, ESP chose to retain path-sensitive analysis. Scalability concerns are addressed by restricting the property space to finite state temporal safety properties. This was assumed to be an intuitive model. The property is specified using a finite state automaton (FSA). As the program executes, a monitor maintains the current state of the FSA and signals an error when the FSA transitions into an error state. So the goal of verification is: is there some execution path that would cause the monitor to signal an error ?

Let us compare path-sensitive analysis with data-flow analysis. Assume that each of the analyses attempts to verify a certain property. The fundamental difference between the two is that data-flow property analysis tracks only the state of the FSA and ignores non-state-changing code. On the other hand, path-sensitive analysis symbolically executes the program and can thus track execution state in addition. As a result, at branch points, path-sensitive analysis can avoid infeasible paths: it queries the current execution state, if the current execution state can decide a branch direction, it is taken. Otherwise, the current state is split and both branches are processed individually. In comparison, data-flow analysis would simulate the state along both branches combining the end result at the join point.

ESP has been successfully used for checking some large software [91]. ESP has been used to verify certain file I/O properties of a version of the gnu C compiler (roughly 150,000 lines of C code). ESP has also been used for validating the Windows OS kernel against a category of security properties.

3.2.3 Abstract Interpretation: ASTREE

Abstract interpretation [74,8](AI) is a form of program analysis that maps programs into more abstract domains. This makes analysis more tractable and potentially useful for checking. ASTREE is a static program analyzer that uses abstract interpretation to discover bugs. The bugs targeted by ASTREE include out-of-bound array accesses, integer division by zero, and similar errors. Additionally, ASTREE checking can also include user-defined asserts. One restriction imposed by ASTREE on the class of C programs it checks is that they should not contain any dynamic memory allocation, string manipulation, and restricts pointer use. This allows for a fast and precise memory analysis which would not be possible otherwise. Also, this restriction has not proved to be a problem for verifying the class of programs ASTREE was originally designed for: embedded programs as found in automobiles and aerospace.

ASTREE makes certain design choices which can be explained in the context of safety-critical embedded programs as follows:

ASTREE's static-checking is sound considering the need to give correctness guarantees on safety-critical applications. Hence, in the case of ASTREE, the soundness-completeness trade-off is more in favor of soundness than completeness.
ASTREE is fully automatic and does not need any user help. In particular, ASTREE does not use annotations, since a huge body of code already exists for the target systems and annotating it would be too much effort.
ASTREE is designed to be efficient and scalable to handle industry class program sizes.

A consequence of soundness can be low precision or high rates of false alarms. ASTREE is carefully designed to not suffer from this problem. First, ASTREE is parametric, that is, the trade-off between cost of the analysis and precision of analysis can be fully adapted depending on need. Second, ASTREE is modular. It is made of pieces ( called abstract domains) that can be assembled and parameterized to build application-specific analyzers, adapted to a domain of application or to end-user needs. Third, ASTREE has been extended with domain-specific knowledge about target applications to better test sophisticated behavior. ASTREE counts on its modular, parametric, and domain-aware features to provide precise analysis with minimal false alarms.

One of the key benefits of using abstract interpretation is a parametrizable precision-efficiency trade-off. There is usually an easily-tunable balance between precision and efficiency in abstract interpretation that is duely exploited by ASTREE. This feature arises from the theory of abstract interpretation, especially, the notion of abstract domains [90].

Overall, ASTREE is sound, automatic, efficient, domain-aware, parametric, modular and precise. However, it still does not offer the same easy extensibility present in ESP or Metal (the official Website of ASTREE recognizes this as a focus of current work).

ASTREE has been successfully used in static analysis of industry-class synchronous, time-triggered, real-time, safety-critical, embedded software written or automatically generated in the C programming language. Specifically, it has been used for verifying parts of the Airbus A340 and A380 subsystems. Notably, one of the applications involved the verification of 132,000 lines of C code and was completed in 1 hour 20 minutes on a modestly equipped machine.

One serious criticism of ASTREE is that in the presence of sound analysis, it gets increasingly more difficult to keep false alarms low as the program size increases. The current methods employed in ASTREE seem to require developer participation. Introducing SLAM like predicate discovery and contradiction would potentially allow the system to reduce the number of false positives.

3.2.4 SLAM

SLAM combines and extends techniques from predicate abstraction, model checking, and predicate refinement to soundly and precisely check C programs. SLAM output is a set of error traces. SLAM is sound in that it does not miss errors, but it is nearly precise in that it raises very few false alarms. The key thesis behind SLAM is that it is possible to soundly and precisely check a program against a specification of API rules by:

creating a program abstraction,
exploring the abstraction's state space, and
refining the abstraction.

SLAM accepts as input the client C code as is, that is, without any annotations, and a specification of API usage rules in the SLIC language. The program abstraction component of SLAM is called c2bp (for C to Boolean Program). It accepts as input a C program P and a set of predicates E specified in SLIC. SLIC allows predicates to be specified as pure C boolean expressions. It outputs a boolean program bp(P,E) that is a sound and (boolean-) precise abstraction of P. In particular, the skeletal boolean program so produced has all its if conditionals removed thus exposing every possible path in the underlying program. This boolean program is then analyzed by Bebop. Bebop is the SLAM component that checks a boolean program for reachability of predicate violations. It uses symbolic inter-procedural data flow analysis and tests CFL reachability. However, not all error traces produced by Bebop may be real errors. This is because the boolean program does not have any conditionals and all paths are analyzed invariably. SLAM's answer to this imprecise analysis is a third stage of analysis called Newton. Newton performs counterexample-driven refinement. It symbolically executes the error paths in the original C program, checking for path infeasibility at each conditional. This is done with the help of the Simplify theorem prover. If the path is found infeasible, Newton generates new predicates to rule out the infeasible path.

An important problem in SLAM's design is proving termination. SLAM draws ideas from abstract interpretation: widening and abstract interpretation with infinite lattices (WAIL) [37]. The authors have shown that their strategy of finite abstractions combined with iterative refinement (FAIR) is more powerful than widening in abstract interpretation over infinite lattices [15]. Like widening, their algorithm loses precision as it tries to achieve a fixed-point. However, FAIR's termination condition is stronger than WAIL's termination condition and hence always terminates if WAIL does.

SLAM is considered a promising new technology for seamlessly combining code checking with development. The key features of SLAM that make this possible are no need for user intervention (API specifications can double up as SLIC input), low noise rate, soundness, and scalability.

3.2.5 Metacompilation

Metacompilation (MC) [66,47] is the process of augmenting a traditional compiler with system rules for checking at compile-time. It represents a clear division of labor: the compiler provides a generic infrastructure for static analysis, and the programmer provides the rules to be checked. The focus of MC is on pragmatism. Checking rules are specified using a language called Metal. Metal is designed for (1) ease of use and (2) flexibility to support a wide range of rules within a unified framework. Ease of use is important since MC by itself cannot detect bugs: it needs the developers to specify checking rules governing their system in Metal. MC can be just as successful in catching bugs as the developer is in specifying them effectively. Flexibility is important so that developers can specify a wide range of properties for checking.

Checking-rules in Metal are specified using a state machine (SM) abstraction. SMs were chosen since they are a familiar concept to programmers. However, the authors claim that SMs only provide sugar for common operations and they do not limit extensions to checking finite-state properties. When needed, extensions can be augmented with general-purpose code. The SMs transition from one state to the other based on the source code pattern observed. For example, the following transition tells Metal to transition to the state v.stop whenever the pointer v is dereferenced in the v.freed state:

v.freed: { *v } ==> v.stop

Thus the alphabet of each SM is defined by the Metal patterns used within the extension. Patterns serve the purpose of identifying source code actions that are relevant to a particular rule. Intuitively, SM transitions map directly to path-specific actions in the source code.

Each individual SM's current state consists of one global state value and one or more variable-specific state values. Global state values capture a program-wide property (e.g., "interrupts are disabled"). Variable-specific state values capture program properties associated with specific source objects (e.g., "pointer p is freed"). Besides, state transitions in Metal can be augmented with general-purpose code which has the effect of a "dynamic" state space for the SM.

Implementation

The inter-procedural analysis engine of the MC system is called xgcc. It performs bottom-up summary-based traversal of the call graph like PREfix (Section 3.2.1). xgcc's role can be described as mapping of the SM abstraction provided by Metal to lower level program analysis. xgcc follows the approach of splitting the analysis into an intra-procedural pass followed by an inter-procedural pass. The intra-procedural analysis is implemented as a DFS traversal over the abstract syntax tree (AST) of the function. Block-level state caching is used to speed things up. Since the analysis is path sensitive, a program point may have different states associated with different paths. Storage requirement is further reduced by separating variable states from program point states. In this way, variable states may be shared, and if the states are the same, applying the same transition will have the same effect.

The main contribution of the inter-procedural analysis phase is refinement and restore of the SM state. State refinement occurs when a function call is encountered and followed. Any object that passes from the caller's scope to the callee's scope should retain its state which has the effect of refining the incoming state at the callee. The state is restored when the analysis returns from the callee and resumes analyzing the caller. The restore operation may need to move the state back from an object in the callee's scope to the corresponding object in the caller's scope. The extension's global state passes across function call boundaries unchanged.

The inter-procedural path-sensitive analysis employed by the MC system is memory and CPU intensive. In fact, for analyzing systems of the size of the Linux kernel (more than 4 million lines of code and counting), analysis scalability can play a determining role in the overall feasibility of the system. The MC system uses composable summaries of function behaviors for reducing running time. Once functions are analyzed, they are condensed into summaries which state how the function transforms the possible incoming transition states. Function summaries are used to reproduced the effect of analyzing the function once there is a hit in the summary-cache at a function call boundary.

False Positive Suppression

The MC system adopts several methods to suppress false positives. False path pruning is used to identify infeasible paths. For example, in the following code, there are only two possible paths, not four. The two impossible paths can drive an SM into an error state: freeing unallocated pointer, and allocating a pointer but not using it.

if(x)
{
    v = kmalloc(10);

    // use v
}

if(x)
{
    kfree(v);
}

MC performs limited symbolic execution of the program to statically deduce which conditionals can be taken and which ones cannot. Based on this information, it prunes impossible paths. The actual implementation uses a congruence closure algorithm [45]. Variables that must have the same value are placed into a single equivalence class. The congruence closure algorithm is then used to derive as many equalities and non-equalities as possible. This information, combined with the symbolic tracking of inequalities, helps deduce the truth value of most conditionals. The developer may also be able to rewrite Metal extensions to weed out observed false paths in further runs. Finally, Metal saves false path pruning effort by storing descriptions of identified false paths for use in later runs of the extension, thus avoiding redundant effort.

Some false positives may be left behind despite the above measures. MC handles them by ranking error reports based on many heuristics so that errors most likely to be genuine bugs are reported at the top.

Conclusion

The MC system represents a significant point in the design space of static checkers. First, it follows a pragmatic approach to tackle the soundness-usefulness trade-off. MC is not intended to be sound: it uses incomplete symbolic execution, does not unroll loops, etc. However, it aims for ease-of-use by developers thus maximizing its impact on software development. Second, MC adds another dimension to "usefulness": finding as many bugs and as many types of bugs as possible. The authors justify unsoundness by claiming that soundness directly conflicts with this goal of maximum coverage. Indeed, the MC system has discovered several thousand bugs in the Linux and BSD kernels.

However, the scalability issues of the MC system are not very clear. Inter-procedural path-sensitive analysis can easily explode in CPU and memory requirement. It seems that the MC system avoids this by (1) having small or moderately sized SMs that do not generate a lot of state throughout the entire Linux kernel, and (2) running only a few extensions at a time.

3.3 Language-Based Approaches

Language-based methods address the problem of programming bugs at a fundamental level: by defining a new programming language that ensures safety. Particular emphasis is placed on memory related errors since they pose the most immediate threat to systems programs

It is tempting to explore the application of existing type safe languages like Java, C#, and OCaml for systems programming. However, these languages are not suitable for this purpose due to the following reasons:

Performance overhead from dynamic array-bound checks, garbage collection, dynamic typing, etc.
Memory overhead due to dynamic typing information.
Lack of control on low level layout of data structures and resource management.
Effort needed to port legacy C code to the new language.
Lack of language extensibility to add system-specific compile-time checking.

Due to these problems with existing type-safe languages, researchers are exploring the option of defining new safe languages that combine C's flexibility and performance with type-safety and extensibility. Cyclone [73] is a C-like language that enforces stronger semantics without sacrificing performance. Vault [104] is another C-like language that allows the programmer to extend the type system with system-specific rules.

The key language facility leveraged by these languages is the language type system. A language type system has two important roles. First, it serves to provide the semantics for various operations so that the compiler can generate appropriate machine code. For example, generating different instructions for adding integers and floating points. Second, an object's type defines the kind of operations and their semantics that can apply on that object. Equivalently, types define an equivalence class for all objects of that type with regard to the kind of allowed operations. For example, given a value x of struct type with a field f, it is valid to form the index x.f but the expression f.x is invalid.

3.3.1 Cyclone

Cyclone [73] is a programming language-based on C that has the following goals:

safety,
flexibility to give nearly the same control over data representations and memory management as C,
no worse performance than C, and
as close to C as possible in power, syntax, and semantics.

Cyclone achieves these goals without being drastically different from C. In fact, the language syntax and semantics of Cyclone are the same as C, with some differences. In particular, the language is enhanced with annotation support and additional keywords and operators. This would probably give the impression that Cyclone is a superset of C. But, there are C programs that a Cyclone compiler will not accept either because they are unsafe or because the compiler cannot prove that they are safe. Hence Cyclone is best described as a dialect of C.

Cyclone uses a combination of program annotations, static analysis, and runtime checking to ensure safety. The Cyclone compiler performs static analysis on the source code, and inserts run-time checks into the compiled output at places where the analysis cannot determine whether an operation is safe. The compiler can also refuse to compile a program, which can happen if the program is truly unsafe, or because Cyclone cannot guarantee that the program is safe even after inserting runtime checks. A downside is that the compiler can reject some truly safe programs as well since it is impossible to implement an analysis that perfectly separates the safe programs from the unsafe programs.

Cyclone can be best understood by taking C, removing unsafe constructs, and replacing them with new, safe operations.

From C to Cyclone

The goals of Cyclone seem to be conflicting and this makes its design interesting. The main goal is to give absolute safety guarantees in C-like code. For this purpose, it tries to address all possible causes of programming bugs in C code individually. The engineering approach taken is to strip off all those constructs in C which are absolutely not possible to be accommodated in a safe language without sacrificing any of the goals identified above.

The first thing to go under the axe is C's pointer arithmetic. Pointer arithmetic is unsafe and difficult to verify for safety. Note that it is not impossible to ensure pointer safety; this can be done with runtime checking, but such an approach would have serious effects on performance. Hence, Cyclone simply rejects all potentially unsafe and unverifiable code. Instead, Cyclone provides language mechanisms to safely implement the corresponding programming idioms. Let us consider buffer overflow bugs. Buffer overflows are often the result of incorrect pointer arithmetic. Cyclone simply disallows pointer arithmetic on regular C pointers. Instead, if programmers want to use pointer arithmetic, they are required to use Cyclone provided pointers for this purpose. Cyclone provides fat pointers, which are standard C pointers augmented with the length of the target buffer. Thus, it can check arithmetic on fat pointers at runtime by checking for overflow against the buffer length. Cyclone is careful to handle all possible corner cases possible with using fat pointers (pointer assignments, pointer parameter passing, etc.). For example, when a fat pointer is cast to a regular C pointer its bounds are checked, and when a regular C pointer is cast to a fat pointer, its length is initialized to one.

Dereferencing NULL pointers is a common programming error, Cyclone avoids it by inserting runtime checks for NULL values. However, the programmer has control over the placement of checks, in particular turning off unnecessary checks. Cyclone provides the never-NULL pointer type for this purpose. A never-NULL pointer is not checked for NULL value since it is known to be not null. When a pointer is first cast to a never-NULL pointer type it is checked for NULL. Thereafter, the never-NULL pointer is not checked. Note that the fat pointer and never-NULL attributes of pointers are orthogonal and hence can be combined freely. These are examples of Cyclone's extended pointer type system for ensuring safety.

Let us now look at Cyclone's mechanisms for implementing safe memory management [63,67]. This can be helpful in preventing common problems like dangling pointers. Cyclone provides a range of flexible options for implementing safe memory management:

Region-based memory management using lexical or dynamic regions.
Linear objects.
Reference-counted objects.
Garbage collection.

A region is a segment of memory that is deallocated all at once. For example, Cyclone considers all of the local variables of a block to be in the same region, which is deallocated on exit from the block. For code that is spread in different blocks but wants to use the same region, Cyclone provides growable regions. Cyclone's static region analysis keeps track of what region each pointer points into, and what regions are live at any point in the program. Any dereference of a pointer into a non-live region is reported as a compile-time error. However, Cyclone's region analysis is intra-procedural, it relies on programmer annotations to track regions across function calls.

In all, Cyclone ensures safety by outright disallowing a small subset of operations, and by checking the code using a combination of static analysis and runtime checking. Table 3.1 summarizes how Cyclone handles some of the common memory bugs found in C programs.

Problem	Cyclone's Solution
Null pointer	insert checks, cut down checks using static analysis and `@never-null` attribute
Buffer overflows	use fat pointers, cut down checks using: static analysis and `@nozeroterm` attribute
Uninitialized pointers	reject using static analysis at compile time
Dangling pointers	detect dangling pointer dereferences using static analysis (region analysis), convert all calls to `free` to no-ops
Type-varying arguments (e.g., arguments to `printf`)	tagged union, injections, varargs
goto,switch,setjmp/longjmp	reject at compile-time

Table 3.1: Table showing how Cyclone handles various types of errors

Discussion

Cyclone is an attempt at enforcing a stronger type system using annotations. For example, programmers can help Cyclone in its analysis, and in the process help themselves (by avoiding unnecessary checks to be inserted by Cyclone). One relevant question is what is the extent and power of cyclone's static analysis ?

Scalability

Cyclone only implements intra-procedural analysis. Consequently, it requires that function interfaces be clearly qualified with the appropriate types expected by the function. The designers argue that the ability to compile different components of the system separately is critical for scalability in systems design. Although interfaces may be used for unavailable code, a complete system-wide inter-procedural analysis is not acceptable as a standard programming language feature.

Character strings in Cyclone

One notable aspect of Cyclone's checking is its approach to inferring the end of character strings. The end of a character string is either the first occurrence of a zero byte in it or, the allocated length for that string pointer. Cyclone's type analysis automatically checks for the end of string using both these clauses in this order. Although this may be useful, it seems that this could result in false positives. A possible alternative to have the best of both worlds is to add another annotation for char \* that identifies the string as null-terminable.

3.3.2 Vault

Vault [104] is intended to be a safe version of C, although many of its features are closer to Java. In this section we present some interesting features of Vault.

Extending type system with system-specific rules

The key novelty in Vault as a programming language is that the programmer can easily extend the language type system with system-specific rules. Type system extensions allow the programmer to express behavioral aspects of their code in Vault, and have the compiler check for conformance automatically at compile time. In this way, Vault adds an important new dimension to the type system: resource management and access control.

Memory Management

A novel feature of Vault is that its analysis can support aliasing, at least to some extent. Aliasing is by far the bane of static analysis. Specifically, to be sound, static analyzers have to make conservative assumptions about aliasing. This hurts the precision of checking. Vault provides linear data types to handle aliasing. Linear data types cannot be aliased, and hence the static analysis can make strong predictions about their state. This helps provide powerful safety guarantees. However, past attempts at using linear types imposed strict non-aliasing restrictions limiting the use of linear types in mainstream programming. Vault introduces adoption and focus operations to have the best of both worlds [50]. The programmer uses these keywords to declare to the compiler one of two things: (1) no alias exists for this variable in this scope, and (2) this variable may be aliased in this scope. The compiler can use the information about the absence of any aliases in a particular scope to implement more aggressive analysis.

Vault uses region analysis to implement safe memory management. By combining region analysis with its advanced linear type system, Vault is able to effectively track memory usage at compile time. Also, Vault allows programmers to attach states to linear types. Programmers can use this feature to enforce higher-level protocols on linear objects.

Casting and converting between types

Experience suggests that C's permissiveness in casting incompatible types into each other is the cause of some of the most notorious bugs in C programs. Vault designers address this problem by disallowing such arbitrary casts. The underlying thesis is that there are two main reasons for programmers to use casting between different C data types: to create generic data structures and algorithms, and to create a string of raw bits. Allowing arbitrary casts is not the right way to implement these operations since it is bug prone. Instead, Vault handles the first case using parametric polymorphism, and provides specific mechanisms to support the second.

3.3.3 Comparison of Cyclone and Vault

Cyclone and Vault are different systems with different compatibility goals and hence different trade-offs. Cyclone places the emphasis on preserving the programming style of C even if that means runtime checking of some invariants. For example, Cyclone gives the programmer complete control over memory layout and memory management, and as a result, cannot avoid deferring some tests to runtime since they cannot be validated at compile time. Vault, on the other hand, aims for detecting all violations at compile-time. As a result, its memory usage model is designed for completeness rather than resemblance to C.

Annotation Effort Cyclone needs less annotation effort than Vault. However, Vault allows the programmer to check system-specific properties using annotations, which can significantly increase the total amount of annotations required. Still, Cyclone can infer most region annotations automatically, which is not true for Vault. Finally, even though Vault's annotations for system-specific properties can get quite verbose, these annotations can double up as formal specification for the properties.

Code Migration Efforts Cyclone presumably will need less time for code migration from C. This will mostly be for making syntactic changes to C programs. Certain advanced memory management idioms could need more work though. On the other hand, porting C code to Vault will need significantly more effort.

Performance Cyclone programs have been shown to have a performance overhead of 0% to 150% over their original C implementations. This can largely be attributed to fat pointers and runtime bounds checking. We wrote some simple programs using Cyclone and analyzed the runtime checks Cyclone compiler inserted in the output. Our experience indicates that Cyclone in its pursuit of soundness, inserts checks for catching all possible violations in code. Although programmers can insert annotations to reduce the inserted checks to minimum, they often cannot cover all possible cases. This is similar to the problem static checking set out to solve in the first place, only that it is simpler. It seems that Cyclone can possibly benefit from automated annotation inference based on usage, or, an automated "Cyclone editor" that provides annotation hints to programmers as they type code.

Even though these proposed replacements to C are designed to allow easy migration for programmers from C, language-based solutions are not likely to be adopted in mainstream systems because they are not completely backward compatible with all C code. Rewriting the huge existing C code base would be a monumental exercise. However, software projects that create a new code base from scratch may be more willing to embrace a new language which gives soundness guarantees, though compatibility issues with existing libraries and interfaces is not very well understood. Alas, most software development is incremental.

3.3.4 MOPS

MOPS is not a new language like Cyclone and Vault, but is similar in scope; it extends the core language by defining modular extensions called meta-object protocols (MOPS). MOPS was pioneered by Xerox PARC [130] and is part of their broader vision in which traditional software abstraction mechanisms are considered insufficient to divide complex properties into modules. They propose meta-level specifications which control or augment existing software. This idea further led to aspect-oriented programming.

Metaobject protocols were originally defined as supplemental interfaces to programming languages that give users the ability to incrementally modify the language's behavior and implementation. Languages that incorporate metaobject protocols blur the distinction between a language designer and a language user. Traditionally, designers were expected to produce languages with well-defined, fixed behaviors (or semantics). Users were expected to treat these languages as immutable black-box abstractions, and to derive any needed flexibility or power from constructs built on top of them. This sharp division was thought to constitute an appropriate division of labor. Programming language design was viewed as a difficult, highly-specialized art, inappropriate for average users to engage in. It was also often assumed that a language design must be rigid in order to support portable implementations, efficient compilers, and the like.

The metaobject protocol approach, in contrast, is based on the idea that one can and should "open" languages up, allowing users to adjust the design and implementation to suit their particular needs. In other words, users are encouraged to participate in the language design process. If handled properly, opening up the language design need not compromise program portability or implementation efficiency.

In a language-based metaobject protocols, the language implementation itself is structured as an object-oriented program. This allows the power of object-oriented programming techniques to be exploited to make the language implementation adjustable and flexible. In effect, the resulting implementation does not represent a single point in the overall space of language designs, but rather an entire region within that space.

More recently, it has become clear that systems other than programming languages can have a metaobject protocol. In fact, many existing systems can be seen as having ad-hoc metaobject protocols. This was one of the observations leading to the concept of Open Implementations. Open Implementation is a software design technique that helps write modules that are both reusable and very efficient for a wide range of clients. In the Open Implementation approach, modules allow their clients individual control over the module's own implementation strategy. This allows the client to tailor the module's implementation strategy to better suit their needs, effectively making the module more reusable, and the client code more simple. This control is provided to clients through a well-designed auxiliary interface.

Kiczales et al. [58] and Lamping et al. [84] are two of the earlier works on the use of MOPS that take the approach of pushing domain-specific information into compilation. However, such protocols are typically dynamic and have fairly limited analysis abilities. Open C++ [28] introduces static MOP that allow users to extend the compilation process. All these extensions are limited to syntax-based tree traversal or transformation. In particular, they do not perform elaborate data flow analysis for applying the MOP specification to the code.

	Feature	Splint	ESC/Java	Cqual	PREfix	ESP	ASTREE	SLAM	MC	Cyclone	Vault
1	Uses Annotations	X	X	X						X	X
2	Flow-Sensitive Analysis	X	X	X	X	X	X	X	X	X	X
3	Path-Sensitive Analysis				X	X		X	X
4	Extensibility		X	X		X		X	X		X
5	Sound			X		X	X	X		X	X
6	Precision		X	X	X	X	X	X	X	X	X
7	Inter-Procedural	X	X	X	X	X	X	X	X	X	X
8	Scalability	X	X	X	X	X	X	X	X	X	X
9	Predicate Discovery							X	X

Table 3.2: Feature comparison of static checkers. A check mark indicates that the feature is supported, otherwise it is not.

Chapter 4
Automated Deduction of Checking Rules

The checking methodologies we have seen so far serve the purpose of automating the checking process, freeing humans of the arduous and boring task of checking software manually. These automated checking techniques have been successful at catching bugs that have missed the human eye. This represents a significant advancement in software checking: checking software automatically without human intervention, and doing it better. However, human beings are not entirely out of the picture yet, due to the huge semantic gap between the artifacts produced by developers and those accepted by automated checking tools. Most development is done with general-purpose programming languages (e.g., C, C++, Java), but most checking and verification tools accept specification languages designed for the simplicity of their semantics (e.g., state machines, process algebras). In order to use a verification tool on a real program, the developer must extract an abstract model of the program's salient properties and specify this model as a "checking rule" in the input language of the checking tool. This process is both error-prone and time-consuming. In this section we motivate the need for automated deduction of checking rules, and present a representative set of existing techniques for doing the same.

4.1 The Need for Automated Deduction

Most of the software produced today lacks comprehensive documentation. Documentation is needed for specifying checks or writing annotations for use by the static checkers. The current state of poor documentation in software design can be attributed to many factors. First, documentation is perceived by programmers as a completely unrewarding exercise because it is often boring, too verbose, and annoyingly difficult to get completely right. Secondly, there is sometimes a huge semantic gap between the code of a program and the higher-level system specification it represents. As a result, bridging the gap needs significant documentation effort in the form of detailed specifications of how the code relates to the higher level specification. Thirdly, as software evolves, developers join and leave and with them goes the ad-hoc implementation details about the system. Consequently, as code grows older, less is known about its internals to write checkers for it.

Good documentation habits, in our opinion, can be cultivated. However, the real issue is the cost-to-benefit ratio in documentation. If the documentation is really worth its effort, programmers can be convinced to do it. Unfortunately, current state of systems design does not make for such documentation. The end result of having an incomplete documentation is that the code has much more semantic content than the documentation, and hence documentation is not enough for writing checks for the code. Manually understanding the code for writing checkers is virtually intractable.

There are other reasons that call for automation in writing checks. Systems development is incremental by nature. Programmers often have to interface with or use legacy code that has little or no documentation in order to write checkers for them. Finally, modern systems may have just too many "checkable" properties than human beings can handle. Automation can achieve significant cost savings while giving much better results.

One might argue that a better way to address this problem is at a more fundamental level by generating code from the documentation, and getting rid of the whole problem once and for all. For instance, systems should be designed from scratch using a high-level, modular specification language that is easily amenable to software verification. In fact, this is indeed how things are being done with some application-level software. However, low-level system design under such a framework is not very well understood yet. Incremental refinement of existing code has been the holy grail of systems design, and asking systems designers to abandon all the existing code in favor of a new approach is asking too much. Hence, instead of adopting a revolutionary new approach to designing systems, one has to find clever ways to make current systems more amenable to automated checking and verification.

The central connecting thread of the techniques we present in this section is reverse engineering. Reverse engineering is the process of analyzing a subject system to identify the system's components and their inter-relationships, and to create representations of the system in another form or at a higher level of abstraction. For example, producing control-flow graphs from source code, or generating interface specifications, would qualify as reverse engineering. The main goal of reverse engineering is to help with program comprehension. Techniques discussed in this section analyze program code or its runtime behavior to reverse-engineer invariants about the code. Code invariants are properties which should always remain true on the code and hence are a useful first step in automated generation of checking rules.

4.2 Invariant Inference using Predicate Abstraction

We introduced abstract interpretation in Section 2.2.3. Predicate abstraction [110] is a special form of abstract interpretation in which the abstract domain is constructed using a given set of predicates.

4.2.1 Inferring Loop Invariants

A long standing independent problem encountered in static checking of un-annotated or legacy code is automatic deduction of loop invariants. A loop invariant for a loop is a proposition composed of variables from the program that is true before the loop, during each iteration of the loop, and after the loop completes (if it completes). Knowledge of loop invariants considerably simplifies static checking of un-annotated or legacy programs. Loop invariants by definition do not depend on the number of loop iterations actually taken, and thus provide useful information for static checking of the program.

Loop invariant discovery is a non-trivial problem considering the enormity of possible invariants to consider. Blass and Gurevich proved [7] that as a consequence of Cook's completeness theorem, there exists a program whose loop invariants are undecidable. However, it seems that the case presented by them would be uncommon in practice.

Earlier attempts at automatic inference of loop invariants for sequential programs were based on heuristics [129,78]. These were followed by algorithms based on iterative forward and backward traversal, with heuristics for terminating the iteration [111]. The induction-iteration method for example [94], works by finding the weakest liberal precondition for the loop. Although the algorithm may not always terminate, it works well in most cases. ESC/Modula-3 [43] experimented with generating loop invariants using the widening operator from abstract interpretation and a second technique called loop modification inference.

4.2.2 Predicate Abstraction

Flanagan et al. [55] have used predicate abstraction (Section 2.2.3) for automatically inferring loop invariants in C code. Their technique uses predicate abstraction over predicates generated in a heuristic manner from the program text. The programmer is free to provide additional predicates. Given a suitable set of predicates for a given loop, their algorithm infers loop invariants that are boolean combinations of these predicates. In effect, the problem of guessing loop invariants is reduced to the easier problem of guessing a relevant set of simple predicates.

One key novelty in their algorithm is in connection to dealing with the universal quantifier over loop iterators. For example, in the following code, i is universally quantified over all possible values from 0 to n.

for (int i = 0; i < n; i++) {
    array[i] = i;
}

Systems like SLAM have faced similar problems but do not handle this issue. Due to the universal quantification over i, the analyzer will have to guess all the possible quantifications of the expression as predicates, which is no easier than guessing the invariant itself. They avoid this limitation by resorting to skolemization: predicates can refer to skolem constants instead of using the universal quantification. Skolem constants are arbitrary unknown constants whose values may not be known during analysis, but they are known to be constant.

SLAM The SLAM project (Section 3.2.4) has also explored predicate abstraction for sequential programs. The boolean program generated from the C input by c2bp has boolean predicates over the values of system variables at different points in the program. SLAM's use of a counter-example-driven technique to refine predicates has the effect of inferring stronger invariants for the program.

4.3 Invariant Inference for Annotation Based Checkers: Houdini

Houdini [53] was motivated by the high burden of annotating static checking systems. It targets ESC/Java (Section 3.1.4), which has been reported as needing approximately one programmer hour of annotation effort for every 300 lines of code. In fact, this drawback has severally impeded mainstream adoption of ESC/Java despite its powerful bug finding capabilities.

Houdini adopts an iterative refutation refinement algorithm. To begin with, it guesses invariants for the program code using heuristics, and inserts appropriate annotations into the code. For example, for int i,j it adds the following annotations:

//@ invariant i cmp j;
//@ invariant i cmp 0;
	where cmp is any of <, <=, =, >=, >, !=

Similarly, for the field Object[] a, it guesses:

//@ invariant a != null;
//@ a.length cmp i;
//@ invariant (forall int k; 0 <= k && k < a.length ==> a[k] != null);

The program thus annotated is run through the ESC/Java checker which identifies annotations that are not satisfied in the program code. Houdini automatically removes those annotations and re-runs the program through ESC/Java until there are no more warnings about annotation violations. If ESC/Java does not give any program correctness warnings, it means that Houdini has correctly figured out a minimal set of true annotations and it terminates. However, if the program still produces warnings, Houdini provides valuable information to the user to further debug.

Overall, Houdini's analysis is two-level. The inter-procedural analysis is derived from ESC/Java and is used for refuting and testing annotations. The intra-procedural analysis is similar to abstract interpretation based on powerset lattice.

4.4 Runtime Methods for Inferring Invariants

Invariants are valuable in many aspects of program development, including design, coding, verification, testing, optimization, and maintenance. They also enhance programmers' understanding of data structures, algorithms, and program operation. Unfortunately, explicit invariants are usually absent from programs, depriving programmers and automated tools of their benefits.

The Daikon tool by Ernst et al. [89] tries to infer invariants on variable values by observing program execution. All invariants are with this respect to program points, thus Daikon infers possible invariants on every possible program point. It works by instrumenting the program with tracing code and exercising it using a suite of test inputs. The trace is analyzed offline through various analyses to yield possible invariants on the values of variables at different program points. However, these "invariants" are not invariant in the true sense since they are guaranteed to hold only on that suite of test inputs and may not hold in general, hence this technique is not sound. However, knowledge of such invariants can be useful in the following scenarios:

Understanding program execution for the purpose of designing static checks or test cases.
Documenting legacy code by reverse-engineering them.
Supporting program evolution: invariants discovered this way can be used for regression testing as the program evolves.
Runtime checking of programs.

There are two issues in dynamic invariant inference. First, one has to choose which invariants to infer, and second, how to go about inferring them. Daikon's choice of which invariants to infer is guided by a library of inference patterns, it can only infer invariants which match this pattern. Invariants may only use scalar variables or sequences of scalar variables. Variables that do not fit this specification are converted into one of these forms (split an array of structs into individual arrays of struct fields). Daikon supports a wide but limited range of invariant patterns: for example, simple invariants over variable values, linear relationships over two variables including their sum, difference, etc., three variables and simple properties on sequence of variables (minimum, maximum, and membership).

The invariant inference algorithm employs an early discard scheme: an invariant is discarded once it is clear that it is not satisfiable. This allows Daikon to efficiently handle the large space of possible invariants. For example, in the invariant ax + by + cz = 1, constants a, b, and c can be determined from the first three trace observations. All later observations are checked against these constant values for satisfiability. If a violation is found, the invariant is discarded. In most cases it takes just a few program point observations to prune an unsatisfiable invariant. However, in the worst case, the cost of invariant inference is:

linear in the number of samples,
linear in the number of program points,
cubic in the number of variables at a program point (because some invariants involve three variables), and
linear in the number of invariants.

Invariant inference over 70 variables in a program with 10,000 calls whose entry and exit points were instrumented with monitoring code took only a few minutes per procedure.

Some program invariants may involve meta-values for variables or summary values, for example, j < size(A) where A is an array, or min(A) < i where min(A) is the minimum element of array A. Daikon includes these variables by monitoring values for "derived variables" alongside regular program variables. Thus derived variables are simply computable expressions on existing program variables.

Daikon performs several optimizations to speed up the inference process, early discard being just one of them. Some of the other optimizations are:

Multi-stage inference. For example, A[j] is useful only if j < size(A), so first infer j < size(A) and only if it is true, infer A[j],
Do not consider every program point, the default behavior is to consider only function entry and exit points, and loop headers.
Exploit relationships between invariants to automatically infer one invariant from another.

Finally, it is reasonable to assume that the kind of analysis Daikon performs can result in a huge deluge of possible invariants, all of which may not be true. While Daikon does not give any guarantees regarding the truthness of any invariants, it is conceivable that the majority of discovered invariants may simply reflect properties of the test set and may not hold true in general. Spurious invariants in the output can significantly reduce the readability of the output, dwarfing the benefits from the tool. Daikon performs statistical analysis on variable values available from the traces in order to make intelligent predictions about the likelihood of the invariants being true for the general case. Specifically, it calculates the chances that a discovered invariant is true only by coincidence. This is derived by assuming a uniform distribution on the range of values observed in the traces and calculating the probability for a chance validity of the invariant. If the probability is less than a user-specified threshold, the invariant is passed as true. Note that the precision of this method can be improved by increasing the size and rigor of the test suite.

Daikon's running times and scalability are not very well understood, and, there are conflicting results of Daikon performance and scalability issues (few minutes to 220 minutes per procedure). Another fundamental question is: "how useful are the invariants generated by Daikon ?" It is not clear how useful the restricted forms of invariants inferred by Daikon can be. However, in Section 4.3, we have seen that even Houdini starts out with seemingly simple invariants and ends up generating useful bug-finding annotations for the program. This suggests that even basic invariants can serve for detecting symptoms of common program bugs. Moreover, one way to automatically utilize the large number of invariants generated by Daikon can be as input annotations for an annotation-based checker. Nimmer et al. [72] explore this approach in the context of Daikon and ESC/Java. They also suggest using Daikon output as Houdini input so that Houdini can identify and remove invariants that do not hold true in the general case.

However, the weakest aspect of Daikon remains the test suite; if the test suite does not exercise a sufficiently large number of code paths, ESC/Java cannot do much about it. Finally, the general impression is that invariant inference is probably useful only for static checking (and regression testing). So, to be useful, Daikon has to do a good job at that.

4.5 Bugs as Deviant Behavior: Combining Static Analysis and Invariant Inference

This is largely the work of Engler et al. [48]. They use belief analysis to automatically deduce erroneous behavior in source code. The key idea is that the programmer's intention is to produce correct code. However, they might make errors in a few places, and these errors show up as deviant behavior from the rest of the code. Thus the problem of finding bugs is reduced to the problem of identifying anomalous behavior in code. An interesting consequence of this approach is that even bugs that have not been formally specified by the programmer to the checker can be found. The end result is a fully automatic system that can potentially discover bugs in programs without any human intervention at all.

The system collects sets of programmer beliefs, which are then checked for contradictions. Beliefs are facts about the system, implied by the code. Two kinds of beliefs are considered: MUST beliefs and MAY beliefs. MUST beliefs are directly implied by the code, and there is no doubt that the programmer has that belief. A pointer dereference implies that a programmer must believe that the pointer is non-null. MAY beliefs are cases where the code has certain patterns that suggest a belief but may instead be just a coincidence. For example, a call to lock followed by a call to unlock implies that the programmer may believe they must be paired, but it could be a coincidence.

Once the set of beliefs are identified, the algorithm handles MUST and MAY beliefs differently. For MUST beliefs, it looks for contradictions. Any contradiction implies the existence of an error in the code. For MAY beliefs, the algorithm tries to separate true beliefs from coincidences. This is done using statistical analysis over the number of program locations where the belief is satisfied and the locations where it is violated. MAY beliefs with high satisfaction ratio are regarded as true beliefs and any violations are reported as errors. The errors are ranked on the basis of the confidence in the corresponding MAY belief, and lower errors may be discarded. Since this method is imprecise and may contain false alarms, statistical ranking of errors provides a trade-off between the effort expended in analyzing the error traces and the number of bugs discovered.

The static analysis engine used in this system is based on Metal and xgcc discussed in Section 3.2.5. The search for MAY beliefs is guided by specification of generic rule templates: for example, a rule present by default is: <a> must be paired with <b>, where <a> and <b> will be dynamically matched by the analyzer. Some useful may belief patterns are: checking that the return value of a function is always checked to ensure success or handle a failure condition, lock <l> is always held when accessing variable <v>, etc.

One of the key novelties of this work is the successful use of statistical analysis for ranking errors. Errors that are more likely to be real errors are ranked higher. However, statistical analysis is still fuzzy, and as a result coincidences can cause false positives, or imperfect analysis can cause low-ranked real bugs.

Intuitively, this method exploits the "redundancy of correctness" and "rarity of bugs" feature of software to first define buggy behavior and then to report it. Correctness rules of the system do not have to be known before hand; logical inconsistencies and programmer behavior inconsistencies usually indicate errors. Once again, this method draws from the paradigm that program errors directly map to detectable code patterns.

4.6 Extracting Finite State Models from Code

This section discusses a related field, that of automatic extraction of models from code for the purpose of formal verification. We discuss the work by Engler et al. [88]. A related system that we do not discuss in this paper is Bandera [36].

4.6.1 Metal Based System for Flash

Engler et al. [88] used their Metal-based system to automatically extract model descriptions from low level software implementations. Their system has found previously unknown bugs in the implementation of cache coherence protocols for the FLASH multiprocessor [81].

Their method is implemented using an extensible compiler, xg++ [31,47], and its accompanying extension language metal. The goal is to statically analyze program code and translate it into a specification that can serve as input for the Murphy model checker. Metal, with its extensible static analysis, offers an excellent system to use for this purpose. The user performs the following steps for extracting models from code:

Write a metal slicer extension for xg++. The slicer extension identifies variables and functions of interest in connection with the properties being checked. The Metal slicer strips of code that does not involve these elements. The xg++based slicer computes a backward slice at the level of statements with a PDG like algorithm [71].
Write a metal printer extension for xg++. This extension controls the translation phase. It instructs to xg++'s translator how the remaining constructs in the sliced program should be translated into Murphy input.
Manually create a model for the environment, correctness properties, and initial state.
Automatically combine the extracted model with the manually specified components.

The model is now ready for checking with Murphy.

By using the Metal slicer which is based on well-known powerful algorithms, their system can utilize a rich set of properties for finding and extracting the desired models. However, it seems that detailed implementation knowledge is needed in writing the metal printer extension. Also, the effort will scale super-linearly with code size (the reported FLASH code was only 10k lines). Also, their system does not entirely rule out manual effort. However, most of the manual effort is only during the first time the model is extracted or during major rewrites.

This system is a progressive step because it has modeled and found bugs in real world software. It is more of a model extraction tool than an invariant extraction.

Chapter 5
Case Study: Memory Bounds Checking

In this section we study existing techniques for checking programs for memory errors. We first present a discussion on the nature of memory bugs followed by a survey of existing static and run-time techniques for detecting memory bugs. We then present our system for run-time checking of memory errors in the Linux kernel.

5.1 The Nature of Memory Bugs

Memory bugs are probably the most common programming bugs that are encountered by programmers on a daily basis. Still, they are also the most insidious and difficult to detect bugs. Incentives for detecting and fixing memory bugs are many. Memory bugs are found in every scale of computing, be it a desktop computer or high end server. The end result is equally severe: data loss, program crash, security breach, etc. Well-tested systems have often been shown to contain serious vulnerabilities involving a single buffer-overrun that allows an attacker to execute arbitrary code on a machine, thus gaining control over it. Such bugs have already caused much grief in the IT industry.

Memory bugs in C programs can be divided into safety violations and liveness violations. Intuitively, a safety violation is defined as a condition in which a pointer expression accesses a different object from the one intended by the programmer. The intended object is called the referent of the pointer. Memory errors are classified into spatial errors and temporal errors. A spatial error occurs when dereferencing a pointer that is outside the bounds of its referent. Examples include array bounds errors, dereferencing un-initialized or NULL pointers, and even pointers derived from illegal pointer arithmetic. A temporal error occurs when dereferencing a pointer whose referent no longer exists (i.e., it has been freed previously), or its referent has not been created yet (e.g., uninitialized pointers). Liveness violations as the name implies relate to memory leak or garbage collection bugs.

Combating memory bugs is a multi-faceted research problem. First, the class of "memory bugs" as we discuss in this section is very broad: it includes NULL pointer dereferences, bound overflows, dangling pointer dereferences, memory leaks, etc. Secondly, memory bugs can be the result of widely different source code actions, hence a single unified framework is not easy to conceive. For these reasons, researchers have actively embraced a wide range of static and run-time checking techniques for detecting memory bugs. In fact, memory errors are one class of errors for which we advocate using run-time techniques in this paper.

The notoriety of memory bugs in software is further demonstrated using concepts from software testing. Sometimes, a memory error will not cause the program to crash, but will silently corrupt its internal state. Since the program has not crashed and seems to be working fine, this may be taken as normal behavior. This kind of behavior has been studied before using sensitivity analysis introduced by Voas et al. [128]. Sensitivity analysis estimates the probability that a program location may be hiding a failure-causing fault. Memory bugs can potentially display a wide range of sensitivity. In fact, due to non-determinism, the same bugs can display different sensitivity on different systems, or even between different runs on the same system. Particularly insidious are low sensitivity cases in which a failure-causing fault may go undetected (an event known as infection). Here, failure is defined as any deviant behavior from the intended behavior. Thereafter, the fault may propagate to other parts of the program, corrupting the entire system state.

5.2 Static Methods for Detecting Memory Bugs

In this section we present a brief overview of static techniques for detecting memory bugs. We have already discussed static checking for software in Section 3. In this section we delve deeper into implementation aspects of static checkers for detecting memory errors.

Most static bounds checkers assume that the input data can assume any possible value and symbolic execution is used to check if array accesses are within the proper array bounds. Sizes of memory buffers are aggressively inferred statically: for example, using array size or arguments to malloc. We have seen PREfix in Section 3.2.1. The Archer system discussed below also employs the same methodology. Coen-Porisini et al. [34] give a good overview of the approach for a subset of the C-programming language. Evans and Larochelle [39], and Rugina and Regard [107] have used similar techniques for finding buffer overflow bugs statically. BOON [40] represents arrays or strings as ranges and the problem of detecting buffer overflows is transformed into a system of integer constraints, which can then be solved for detecting overflows. Ganapathy et al. [60] add flow and context sensitive analysis to a BOON like framework producing a high coverage static bounds checker.

Sound techniques for bounds checking either modify the language, or take help from runtime checking for operations that cannot be verified statically. Cyclone (Section 3.3.1) is an example of a new language. Cyclone infers buffer lengths statically using fat pointers that store the pointer size. Cyclone adds runtime checking to detect bound overflows. CCured is an example of the second approach. We discuss CCured in detail in Section 6.1.1.

CSSV [44], like Splint is an annotation driven bounds checker. CSSV aims to catch all buffer overflow vulnerabilities with minimum false alarms. The basic idea is to convert a C program into an integer program with correctness assertions included, and use a conservative static analysis algorithm to detect faulty integer manipulations, which directly translate to bugs in the C source code. The analysis is intra-procedural, however, by using procedure annotations ("contracts"), it becomes inter-procedural. The number of false alarms generated by the tool depends on the accuracy of the contracts. The analysis used by CSSV to check the correctness of integer manipulations was heavyweight, and may scale poorly to large programs. For instance, CSSV took > 200 seconds to analyze a string manipulation program with a total of about 400 lines of code. Splint on the other hand, sacrifices soundness and completeness, and uses a light-weight static analysis to detect bugs in source code.

5.2.1 Archer

Archer [131] is a memory error developed by Dawson Engler's group at Stanford University. Archer does not target the broad range of memory errors caught by PREfix (Section 3.2.1), like NULL pointer dereference, access to uninitialized or freed memory, etc. Instead, it focuses only on bound overflow errors. This is probably because the authors had already developed metacompilation-based tools to detect other kinds of memory errors [66,47]. By concentrating on bound overflow errors, Archer is able to provide a more powerful and more precise bounds overflow detector than PREfix.

Archer's analysis has two key components: a traversal module and a solver module. The traversal module performs a randomized, bottom-up traversal of the system wide call graph to exhaustively explore all possible legal program execution paths (in practice, some paths will not be taken, and some impossible paths will be taken). The solver module keeps track of the set of constraints generated by the traversal module and solves them to check for satisfiability. The goal is to build up a knowledge base (stored in the solver state) that tracks values of and relations between as many scalars, arrays, and pointers as possible, which is then used to detect errors and prune infeasible paths. Initially this state is empty. As each statement is handled in order, Archer (1) symbolically evaluates the statement using the current state and (2) records any side-effects of the statement, producing a new state. The solver's role can be seen as keeping track of assumptions and check safety properties that the traversal module specifies in the form of assertions to the solver.

Loop Handling

Archer tries to evaluate the termination condition of loops using the solver module. If the termination condition is known and the solver can evaluate it (symbolically), the loop body is iteratively evaluated until the termination condition is satisfied. If the termination condition is unknown or cannot be evaluated, Archer applies two heuristics to handle these cases. For "iterator loops" used to iterate through buffer accesses, even if the termination condition cannot be evaluated, the (symbolic) bounds of the iteration variable can often be used for analysis. For example in the following piece of code,

for (i = start; i < end; i++) {
  ...  /* loop body does not modify i */
}

the iteration variable i is replaced by the range [start, end). For all other termination conditions that cannot be evaluated by the solver, Archer evaluates the loop exactly once checking for any errors, and after exiting from the loop, it undoes all assignments performed in the loop.

Error Triggers

Bound overflow errors can either involve local memory objects whose sizes can be deduced intra-procedurally, or pointers passed as arguments to the function (in which case, the size cannot be ascertained unless the caller is analyzed). In the former case, the analysis may be able to detect an error and flag it. However, in the latter case, the presence of an error will depend on the actual arguments passed to this function. For these cases, Archer summarizes any possible error-causing arguments to a function in the form of triggers. Later on when analyzing calls to this function, if the prevailing condition at a call-site reaches any of the triggers for this function, an error is flagged. Consider the following contrived example:

char arr[10];

void fn() {
  gn(20);
}

void gn(int n) {
  arr[n] = 'x';
}

Archer will analyze the function gn first (bottom-up order). The solver infers that the array dereference arr[n] can result in a bounds overflow depending on the value of n. However, it will not know for sure until all possible callers of gn are analyzed. Hence it creates the error trigger: n > = 10 where 10 is the size of the array arr. During the analysis of function fn, the predicates prevailing at the call to gn satisfy this trigger (20 > = 10) and hence Archer will flag an error. Note that by using summaries in the form of error-triggers, Archer is able to implement highly scalable inter-procedural path-sensitive analysis.

Symbolic Execution of Statements

Assignment statements ((x = y)) are handled by first evaluating y against the current solver state and binding the result to x. For control flow edges, the traversal module uses the solver to evaluate the condition; if the condition can be evaluated, the appropriate path is taken. If not, then all possible paths are taken, and the condition's truth value is added to the solver state along each path. For function calls, first the triggers are evaluated, if no triggers are violated, any global state passed by reference to the callee is evaluated and the analysis resumes after the function call. Finally, for memory accesses, the solver is queried to see if the memory access can potentially go out of bounds. If so, an error is emitted, otherwise, Archer produces no errors.

Size Inference using Statistical Belief Analysis

There are two cases which traditional inter-procedural analysis cannot handle: (1) when the callee's code is not available, and (2) when the callee's code is too complicated to make inferences about its effect on the caller's state. Archer uses idea from statistical belief analysis [48] (Section 4.5) to solve this problem. Rather than analyzing a routine's implementation to determine how it acts, we instead analyze the routine's calling contexts to do so.

Archer's analysis is quite powerful due to its path-sensitive and inter-procedural nature. It seems to be more powerful than PREfix for the case of detecting bound overflows, since it tracks properties such as buffer lengths and pointer offsets with a symbolic solver, and can potentially infer interfaces of unavailable functions. Archer has discovered previously unknown bugs in several open source software packages [131].

5.3 Runtime Checking for Detecting Memory Bugs

Operating systems that run in protected mode already provide a crude form of checking for memory errors. Such operating systems, for security and reliability reasons, separate processes from each other. So if one process accidentally or deliberately tries to read or write the memory of another process, the first process is prevented from doing this. When this happens, Unix-based systems give the message.

	Segmentation fault

and the process is killed. Other operating systems like Windows NT for instance, give more information. In a sense, then, there is a primitive form of checking going on. However, this default support is insufficient in two ways:

error detection: it provides very coarse grained checking. For example, memory is not protected at the granularity of memory objects.
error recovery: it does not provide a graceful method to recover from errors at runtime, nor does it provide sufficient information to identify the origin of a memory bug that causes the program to crash.

These factors motivate the need for more fine-grained runtime checking for memory errors. The general problem of runtime bounds checking requires comparing the target address for each memory reference against the bounds of its referent object. Accordingly, bounds checking comprises of two subproblems: (1) identifying a given memory reference's referent object and thus its bounds (bound lookup problem), and (2) comparing the reference's address with the referent's bounds and raising an exception if the bound is violated (bound comparison problem). For the purpose of this discussion, we classify existing bounds checking techniques into two categories based on their approach to the bound lookup problem: (1) capability-based techniques, and (2) availability-based techniques. In capability checking, each pointer object is augmented with a capability. The capability specifies among other things, the exact memory bounds of the object referenced by the pointer. Before any pointer dereference, a check is performed on the pointer's capability to verify that the pointer is within the bounds, or in other words, is "capable" of dereferencing the address. Thus, illegal pointer dereferences are detected. Also, language semantics are modified to conservatively preserve capability information across pointer operations like assignments, arithmetic, etc. For example, in each of the following examples of pointer usage, pointer p always inherits the capabilities of q unchanged:

char *p, *q;
int i;

p = q;

p = q + i;

Similarly, in the case of p++, the capabilities of p remain unchanged even though the address value of p changes. This immutability of capability is the key idea behind capability-based checking: as pointers and addresses are assigned and changed during the execution of a program, the set of valid capabilities remain unchanged. Capabilities change only as a result of allocation and deallocation, and all addresses must have capabilities associated with them (arbitrary integer to pointer casts are not allowed).

Since capability checking entails augmenting pointer variables with extra capability information, it is not backward compatible with standard C code. Availability checking is a backward compatible form of run-time bounds checking. The idea is not to store any extra information along with the pointer, but to maintain a running map of currently allocated memory for the program (availability map). All pointer dereferences and pointer arithmetic can then be checked against this map: if it falls within the availability map, it is allowed; otherwise it is a bug. Table 5.1 shows a classification of runtime bounds checking techniques discussed in this section.

Capability based		Availability based
Fat Pointers	Shadow Variables	Track Arithmetic	Do Not Track Arithmetic
Safe-C (Section 5.3.2), Cyclone (Section 3.3.1)	Patil and Fischer (Section 5.3.3), Xu et al. (Section 5.3.4, Lam et al. (Section 5.3.5)	Jones and Kelly's BCC (Section 5.3), KBCC (Section 5.4.1)	Mudflap (Section 5.4.2), Purify (Section 5.3.1)

Table 5.1: A two-level classification for runtime bounds-checking technique

There is an interesting soundness and completeness trade-off between capability checking and availability checking that we present below.

We are interested in two operations on pointers: dereferences and arithmetic. Any run-time bounds checking scheme must check pointer dereferences, since dereferencing an incorrect address can have severe consequences. However, it is not always necessary to track pointer arithmetic. A pointer arithmetic operation typically takes the address stored in a pointer, recomputes a new address, and assigns it back to the pointer. The vulnerability in this operation is that the newly assigned address could be out of bounds with respect to the object referred by the pointer before the operation. This is a bounds overflow in action. However, arithmetic that causes a bounds overflow does not crash the program yet. For the program to crash, the new address should be an illegal address, and it should get dereferenced. A pointer update can have two results: (1) the new pointer points to an unallocated address, and (2) the new pointer now points within another (possibly adjacent) object. With capability checking, whatever the new value, any dereferences will have to be verified against the pointer's capability (which does not change from arithmetic). With availability checking, if the new pointer points within another object, the checking system cannot detect an overflow since the pointer falls within the availability map. However, if pointer arithmetic operations were to be tracked and verified to catch the case when pointers crossed object bounds, this scenario could be avoided. Hence, this is a soundness-performance issue: checking pointer arithmetic is important for soundness, but it may also have a lot of overhead. Jones and Kelly's bounds checking patch to GCC [75] is an example of an availability-based checker that tracks pointer arithmetic, and Purify [106] is an example of an availability-based checker that does not track pointer arithmetic. However, this soundness guarantee holds only under the assumption that the program does not write to any arbitrary numerical address.

In Section 5.4.1 we will see that tracking pointer arithmetic in an availability-based checker can lead to loss of precision. In particular, the C definition does not restrict pointers from crossing object boundaries as long as they are not dereferenced outside. For a variety of reasons (mostly having to do with convenience), programmers often have C pointers crossing object boundaries. An availability checker that checks pointer arithmetic flags an error in such cases even though it may be legitimate usage. Hence, among availability-based checkers, whereas pointer arithmetic checking schemes are more sound than other schemes, they are also less precise.

Table 5.2 summarizes the difference between capability-based checking and availability-based checking.

Capability based	Availability based
May not be backward compatible with C code (e.g., Safe-C), unless the capabilities are stored separately (Section 5.3.3). Even then, they are incompatible with pre-compiled C programs (e.g., libraries).	Backward compatible since data representation or layout is not changed.
Faster since no bound lookups: bound information is available within in the variable or shadow variable.	Slower: bound lookup entails searching the availability map for the address.
Can detect memory-recycling errors.	Cannot detect memory-recycling errors.
Sound	Cannot be completely sound due to memory recycling errors. However, if pointer arithmetic is not tracked, it will be even less sound.
Memory overhead due to shadow variables or fat pointers.	Memory overhead due to object padding and availability map. Note that padding is only required if the checker does not track pointer arithmetic.
Can catch overflows between `struct` members.	Cannot catch

Table 5.2: Table showing the differences between capability-based and availability-based runtime bounds checking

BCC

Jones and Kelly's patch to GCC [25] (BCC) is an availability based backward-compatible runtime bounds checker for C programs. BCC has two components: (1) a compiler component based on GCC that compiles C programs and instruments them with special function calls which call (2) an external runtime library called libcheck. The executable thus compiled runs in the context of libcheck. At runtime libcheck maintains an availability map containing all memory allocated by the program, with help from these inserted calls. Some of the calls inform libcheck of memory allocation and deallocation events so that the availability map can always be kept up to date. Others are placed immediately before pointer operations, and call libcheck to verify the correctness of the operation. The operation can execute only if libcheck validates the operation. A more detailed description is presented below:

If runtime bounds-checking is desired for a C program, it should be compiled with the -fbounds-checking flag to BCC.
During compilation, BCC identifies three categories of operations in the C program and instruments them with calls to libcheck as follows:
- All memory allocation operations, whether explicit (malloc, free, etc.) or implicit (function local variables), are followed by calls to libcheck detailing the location and the size of the memory thus allocated or deallocated.
- All pointer dereferences are preceded by calls to libcheck, passing details of the pointer: location, size of the target, etc.
- All pointer arithmetic expressions are preceded by calls libcheck, passing the pointer location and the details of the arithmetic.
The program thus instrumented is completely backward compatible with C code and can be linked with C code not compiled with runtime bounds checking support.
When the program is executed, it is checked for bounds errors by the inserted function calls with help from libcheck:
- For memory allocation and de-allocation operations, libcheck updates its availability map.
- For pointer dereferences, libcheck verifies if the pointer points to an address within the availability map. If it does, it returns and the program continues, if not, an error message is generated and the program terminates.
- For pointer arithmetic operations, libcheck verifies if (1) the pointer points to a valid location in the availability map, and (2) if before and after the operation, the pointer references the same referent object.
In the event of an error being discovered by BCC, a verbose trace of the error is displayed, providing the exact location and nature of the error (NULL pointer dereference or invalid arithmetic). After displaying the error the program can either be terminated, or allowed to execute. The behavior is specified at program compile time. Also, if the program continues after an error, its behavior may be undefined.

BCC maintains per-object allocation information in an availability map which is searched during the above operations to locate the referent object corresponding to the address passed to libcheck. The availability map is organized as a splay-tree. Splay-trees are self-adjusting binary search trees first introduced by Sleator and Tarjan [118]. The search key in this case is the address of the referent object. Splay-trees have the useful property that recently accessed elements are quick to access again. All normal operations on a splay tree are combined with one basic operation, called splaying. Splaying the tree for a certain element rearranges the tree so that the element is placed at the root of the tree. It performs basic operations such as insertion, look-up and removal in O(log(n)) amortized time. For many non-uniform sequences of operations, splay trees perform better than other search trees, even when the specific pattern of the sequence is unknown. The choice of a splay-tree for storing an availability map is justified by the temporal and spatial locality inherent in variable accesses in C programs.

BCC potentially has complete control on the compilation process and can insert runtime checks to check other operations too. In fact, BCC replaces calls to standard string library functions, like strlen, strcpy, etc. with calls to their bounds-safe counterparts in libcheck. This is because the standard library functions do not (and in most cases, cannot) perform safety checks on their arguments. libcheck ensures their safety by redirecting them to their safe counterparts which check the operation against the availability map.

BCC pads all memory allocations so that some unallocated memory is left before and after each allocated chunk. It seems that this padding is superfluous since its use is to detect overflows, which BCC can detect anyway by tracking pointer arithmetic. In fact, this padding can be harmful at times. For example, BCC pads all variables on the stack whose address is taken in the program. This frequently results in highly bloated, sparsely filled stack frames. This is clearly an undesirable feature for in-kernel code, since the kernel stack is typically very small (4-8K). Also, the padding can sometimes have undesirable side effects. For example, in our adaptation of BCC for the Linux kernel (Section 5.4.1) (KBCC), KBCC compiles a kernel module and loads it into the kernel using the insmod module loader tool in Linux. A key operation during the insmod process is passing the global data values from user space to kernel space. KBCC generates global variables padded on both sides, but the kernel is not aware of this and crashes when trying to parse the module.

We discuss other BCC in more detail in Section 5.4.1, when we discuss our adaptation of BCC for the Linux kernel.

5.3.1 Purify

Purify [106] is a commercial availability-based run-time bounds checker. It can check bound overflows, bad pointer dereferences, use of uninitialized data, and memory leaks. Purify instruments programs at object-code level. Hence, Purify is language independent, and it can instrument binary-only third-party software. Purify associates one of three possible states with every byte in the program's address space: unallocated (unwritable and unreadable), allocated and initialized (writable but unreadable), and allocated and initialized (writable and readable). Memory accesses are checked at the granularity of bytes and operations (read or write).

Unlike BCC, Purify stores availability information on a per-byte basis (possibly compressed into integer-ranges), instead of on a per-object basis. Hence, Purify cannot catch errors due to overlapping memory allocations whereas BCC can. Also unlike BCC, Purify does not check pointer arithmetic. However, it is still able to detect bounds overflows in some cases. Purify pads arrays and heap allocated memory with a small "red-zone." The memory in the red-zone is recorded as unallocated (unwritable and unreadable). If a program accesses these bytes, Purify signals an array bounds error. Note that this is not a fool-proof method, a bounds overflow that jumps over the red-zone to access outside memory can still go undetected. This is an example of a performance-soundness trade-off in availability-based checking: tracking pointer arithmetic makes the checking sound, but at the cost of increased performance overhead. Checkers like Purify and Mudflap (Section 5.4.2) trade-off soundness and memory (for the red-zone) for performance. This is no surprise since both these checkers are targeted for mainstream use: Purify is a commercial tool for runtime bounds checking, and Mudflap is available by default in GCC 4.0 (no longer a patch like BCC).

5.3.2 Safe-C

Safe-C [13] is an example of a capability-based runtime bounds checker. It modifies the internal representation of pointers at compile-time to support runtime checking. Safe-C uses fat pointers to store additional information about the referent object. Such a fat pointer is represented by a four-tuple comprising of the following attributes:

the value of the safe pointer,
base address and size of the referent object,
storage class of the referent object (stack/heap/global),
a capability to the referent (explained below).

Thus the fat pointer logically resembles the following template specification:

typedef {
        <type> *value;
        <type> *base;
        unsigned size;
        enum {Heap=0, Local, Global} storageClass;
        int capability;
} SafePointer<type>

The value attribute is the only attribute that can be manipulated by the program; the rest are used by the Safe-C runtime. A capability is a unique identifier attached to a referent. As objects are allocated and deallocated at runtime (due to function calls, heap allocation, etc.), a new capability value is assigned to each new object from a monotonically increasing program-wide counter. This capability value is maintained in a global capability store for later verification. Safe-C ensures that pointer attributes are preserved across pointer arithmetic and assignments, thus all pointer dereferences in the program can be checked by checking the corresponding attributes. Note that Safe-C does not change array references. Array references by nature are static and hence their bounds are known statically.

Assuming the safe pointer attributes are correct, complete safety for all pointer and array accesses is provided by inserting an access check before each pointer or array dereference. The dereference check first verifies that the referent is alive by performing an associative search for the referent's capability. If the referent has been freed, the capability would no longer exist in the capability store and the check would fail. Because capabilities are never re-used, the temporal check fails even if the storage has been reallocated. Once the storage is known to be alive, a bounds check is applied to verify that the entire extent of the access fits into the referent.

Pointer operations must interact properly with the composite safe pointer structure. When applied, they must reach into the safe pointer to access the pointer value and pass the rest of the attributes along as appropriate. For example, the assignment statement q = p + 6 causes the pointer attributes of p to be passed along to q. In fact, operations which manipulate pointer values never modify the copied object attributes because changing the value of the pointer does not change the attributes of the storage it references. However, certain cases need special handling. For example, if the right hand side of an assignment is NULL, the pointer's capability is set to a special value NEVER indicating an invalid pointer that should not be dereferenced. Similarly, for assignments from string constants the pointer attributes can be generated at compile time itself.

The technique proposed in Safe-C can be applied both at compile time, or using a source-to-source translator. Safe-C can detect all memory access errors provided (1) storage management is always apparent to the translator, and (2) the program must not manipulate any object attributes. However, a major drawback of Safe-C is that due to its use of fat pointers, it is incompatible with a majority of legacy C code. As a result, it is either incompatible with existing libraries, or breaks existing programs due to its use of fat pointers.

5.3.3 Decoupling Computation and Bounds Checking

Patil and Fischer [98] propose to get rid of fat pointers as used in Safe-C by separating pointer attributes from the actual pointers. Corresponding to each pointer p, a shadow variable G_p (they call it the "guarded pointer") stores its attributes. Their approach suggests correlating accesses to p with checks on G_p. In one respect, the scheme proposed by Patil and Fischer represents a significant departure from conventional runtime bounds checking. They make the following observations about the state of runtime bounds checking:

Performance overhead of traditional runtime bounds checking systems is due to in-line checks for memory access errors. If checking could somehow be decoupled from the main computation, performance overhead can be reduced.
Performance overhead is a necessary evil of bounds-checking. Given these overheads, the possibility of using runtime bounds checking in production systems seems unlikely. Hence, when bounds checking a system, its other computations do not matter.

Based on these premises, they argue for decoupling a program's computation from bounds checking using one of two methods:

Alternative 1: Strip down a program into a simplified version (akin to slicing) that has only the memory operations and any other operations that can affect these memory operations. This simplified program is called the guarded program. If no memory access errors are detected in the guarded program, there should be none in the original program, and vice versa.
Alternative 2: Execute the original program and the guarded program in parallel. The original program executes normally, however, the guarded program executes on the same input as the original program, but executes only memory access code and checking. This scheme is called shadow-checking. Any memory access errors in the original program will be detected and reported by the shadow program which simulates an identical memory state.

It is not very clear how generally applicable the shadow-checking scheme is. For example, for complete equivalence between the main program and the shadow program, it is required that they execute in exactly the same environments. However, sometimes it may not possible to reproduce exactly the same environments for the main and the shadow process (e.g., system call return status). Although their method does employ heuristics for handling such situations, it does not seem to be general enough in its current form.

Note that the assumptions made by Patil and Fischer regarding the unavoidable overhead in runtime bounds checking are not entirely true today. However, this was a plausible assumption at the time this tool was developed (1997). Recent techniques have achieved success in reducing bounds checking overheads to acceptable limits: for example, CCured (Section 6.1.1) addresses the bound comparison problem by removing redundant bound comparisons using static analysis, and Cash (Section 5.3.5) addresses the bound comparison problem by using hardware support to perform fast comparisons.

5.3.4 Backward-Compatible Extensions to Safe-C

Xu et al. [132] use ideas from the above technique to derive a fast, backward-compatible source-to-source translation based on Safe-C. For each pointer p, a new shadow variable p_info (they call it the "metadata variable") is defined in the output C source. p_info stores the attributes associated with the pointer p. The shadow variable in this case is a C struct that stores the base and size of the referent object (like Safe-C), and a capability. However, Xu's notion of capability is different from Safe-C, and this is one of the key novelties in this approach. Whereas Safe-C associates a capability with each pointer variable, this method associates capabilities with memory objects. A field in the shadow variable points to the capability of the referent object. In this way, their method introduces a level of indirection between referent objects and pointers. Multiple pointers that refer to the same referent object now share the same capability. This is a very effective way to handling aliasing: any changes to the state of the memory object through one alias (e.g., a deallocation operation) is immediately available to other aliases through the shared capability. However, their method is still not completely compatible with existing C code, especially when they pass pointers to external library interfaces.

5.3.5 Cash: Hardware Assisted Bounds Checking

Segmentation hardware has been used to implement protection domains starting from the Multics operating system [6].

Electric Fence [102] uses a system's virtual memory hardware to place an inaccessible memory page immediately after (or, at the user's option, before) each memory area allocated using malloc. It also aligns the allocated memory area to the end of page boundaries. When software reads to or writes from this page, the hardware issues a segmentation fault, stopping the program at the offending instruction. It is then easy to find the statement in the C source using a debugger. Memory that has been released by free is made similarly inaccessible. Although this method has zero overhead to perform array bound checking, depending on the allocation pattern, it can waste a lot of virtual address space available to the process. Also, its checking ability is limited to only checking dynamically allocated buffers.

In recent times, the X86 segmentation hardware has been used in novel ways to support software evolution. For example, the Palladium system uses segmentation [29] for implementing safe and efficient software extensions, and the Cash bounds checker [82] uses segmentation to solve the bound comparison problem thus reducing bounds checking overhead. Pearson has also used segmentation hardware feature to get bounds comparison for free [99], although only for the case of dynamically allocated memory buffers.

The Cash bounds checker is a capability-based checker that uses shadow pointers for storing bounds information associated with pointer variables. Each pointer variable P is augmented with another pointer variable P_A which stores:

Lower bound of P
Upper bound of P
the LDT index (described below) associated with the segment allocated to the object.

The pointer variable P_A is stored immediately after P, and is thus trivially available from the knowledge of P's location.

In the Intel X86 architecture, a virtual address consists of a 16-bit segment selector value, which resides in one of the six on-chip segment registers, and a 32-bit offset which is given by general purpose registers (GPR), EIP, or ESP depending on usage. The segment selector contains a 13-bit index into the Global Descriptor Table (GDT) or the current process's Local Descriptor Table (LDT). The choice between GDT and LDT is determined by one of the remaining three bits in the segment selector. The number of entries in each LDT and the GDT is 8,192 (only 8,191 are available for use). The GDT or LDT entry indexed by the segment selector contains a segment descriptor, which includes the base and limit addresses of the segment, and read and write protection bits (and some other fields). Combining the 32-bit offset with the segment's base address produces the requisite linear address.

The basic idea behind Cash is that every time an array is used, Cash allocates a segment register for the array. The segment register is loaded with the appropriate LDT entry indexed by the array's segment selector (stored in the shadow variable). Thus all accesses to the array are automatically checked. Since the number of segment registers is limited (three for general use, but authors have successfully used four in some cases), the available segments are mapped to buffers on-demand. Note that whenever an array or buffer is allocated in Cash, either due to a function call, or malloc, a segment is allocated for it and entered into the LDT. Similarly, the segment is freed on de-allocation. In case there are more live buffers at a program point than segments in an LDT (8,191), Cash checks only 8,191 buffers and performs software bounds checking on the excess buffers. One alternative to this approach is to use multiple LDTs and dynamically switch between them. The authors argue that such an implementation could have resulted in two many LDT switches (thrashing).

The Cash framework tackles some serious hurdles that otherwise prevent it from achieving good performance. A useful optimization is with respect to inserting new LDT entries into the LDT table when a new array is allocated. Linux provides a system call for doing this (modify_ldt). However, a system call imposes unacceptable overhead for such a frequent operation. Cash's solution is to provide an alternative call gate to the user process, allowing the user to execute a Cash-specific kernel function which modifies the LDT. The advantage of this approach is that it has less overhead than a system call (it backs up and restores only the EDX and DS registers).

Cash has two major limitations. First, it checks only code within loops. This was done given the high overhead associated with setting up segments for function local arrays. In the case of loop variables, this cost pays itself off over many iterations. Furthermore, the authors argue that most buffer overflow attacks are loop-based. A second weakness of Cash is its incompatibility with pre-compiled C code, like binary libraries. This is due to the shadow pointer variable that needs to be passed along in function calls.

Cash has impressive performance compared to other bounds checkers. Whereas other schemes like Purify and Jones and Kelly's approach have been shown to be 5 to 20 times slower, the overhead of Cash is within 50% for most benchmarks, and only marginally larger for others.

5.4 Bounds Checking for the Linux kernel

Programmers make mistakes, and C semantics leave ample opportunity for introducing memory errors. This problem becomes all the more acute when programming inside the kernel as a small memory-access bug could crash the entire system. Runtime bounds checking through hardware is an efficient method of detecting program bugs. We have developed such a system called Kefence (Section 5.4.3. KBCC, which is a software based approach provides more comprehensive checking.

5.4.1 KBCC

KBCC is an availability-based runtime bounds checker for the Linux kernel. It can detect many kinds of memory access errors like bound overflows, NULL pointer dereferences, bad arithmetic, use after free, double free, etc. The goal of KBCC is to perform runtime bounds checking on the entire Linux kernel. KBCC is an extension to BCC (Section 5.3 that is specifically targeted towards the Linux kernel. It is based on the GNU C compiler (GCC). KBCC was derived from Jones and Kelly's bounds checking patch to GCC [75] (BCC). However, KBCC extends BCC in several ways. Because KBCC is based on GCC, it gets GCC's excellent support for a wide range software for free. Also, KBCC leverages GCC's optimization and analysis features for help with bounds checking. KBCC's development has coincided with GCC development. We have kept KBCC uptodate with the latest GCC version until 4.0. The implementation we describe here was derived from GCC-3.4.2.

The development of KBCC can be described as a porting effort to port BCC for using with kernel code. There are two components in BCC: the compiler that instruments compiled code, and the runtime bounds-checking library libcheck; both needed different kinds of attention. For example, we taught the KBCC compiler to recognize kmalloc as the memory allocator, instead of malloc. Similarly, libcheck was packaged into a kernel module. Sending libcheck to kernel space was necessary since the checking library should run in the same address space as the checked program to minimize communication overhead. During this whole exercise, several interesting issues came up which offer useful insight into runtime bounds-checking systems, and are the subject of this section.

Soundness

KBCC is an availability based checker and is not completely sound. In Section 5.3 we saw that BCC inserts calls to the BCC runtime library (libcheck) before each memory access. The call serves the purpose of verifying the address value for validity with help from libcheck. libcheck maintains an availability map of all currently allocated memory regions (the "availability"), against which the address passed by the checking call is verified. If the address is found, the memory access must be valid and libcheck returns control to the program, else it signals an error. Now consider the following piece of code:

1.  char *p, *q;
2.  p = malloc(...);
3.  free(p);
4.  q = malloc(...);
5.  *p = 'x';

A chunk of memory is allocated using malloc and assigned to pointer p. In line 3, the memory is deallocated. Line 4 calls malloc again to allocate another chunk of memory. The dereference of pointer p in line 5 is illegal since the address corresponding to p has been de-allocated. However, it is possible that the same address was allocated to q in line 4. In this case, statement 5 is still illegal since p is a stale pointer from a previous incarnation of the memory location. But libcheck will not flag an error. libcheck is an availability-based checker that simply searches for the address value of p in its address map. In this case it will be found and libcheck incorrectly assumes that there is no error. Hence, KBCC is unsound and cannot catch memory recycling errors. Note that this is a problem with all availability based checkers.

Incompleteness

KBCC is incomplete, because it can report bugs that are actually not real bugs in the checked program. KBCC's incompleteness comes from its strict checking of pointer arithmetic operations. Languages with mutable pointer operations like C allow the program to legally create invalid pointers; for example, iterating a pointer across all the elements of an array exits the loop with the pointer pointing to the memory location following the last object. If the invalid pointer is never dereferenced, the program would not be in error. However, KBCC recognizes this operation as an illegal pointer arithmetic, since the pointer has gone beyond its legal bounds. In such scenarios it can generate false alarms. This might suggest that KBCC not flag an error during possibly incorrect pointer arithmetic, since it does not have enough information about whether the produced pointer will be dereferenced later on in the code. However, such behavior can result in more unsoundness for the following reason. Imagine that two pointers p and q point to memory areas next to each other. A bad arithmetic on p can cause it to overflow out of its bounds and point to within q's bounds. Since KBCC's bounds checking is availability based, it will not be able to detect such a scenario if it does not track pointer arithmetic operations. Thus, there is a trade-off in KBCC's design involving precision of checking, performance, and soundness. Systems code usually has a plethora of pointer arithmetic operations; tracking them all means poor performance. Thus tracking pointer arithmetic gives soundness at the cost of performance and precision of checking. In KBCC, we decided to track pointer arithmetic operations, since there is a large loss in soundness otherwise. Kernel objects are usually tightly packed next to each other, hence overflows are more likely to fall within adjacent object's territory than in unallocated space. An availability checker cannot detect this and will thus silently miss errors. One might argue that KBCC cannot be sound anyway from the memory recycling example above, and hence it makes little sense to sacrifice performance trying to be sound. However, the loss of soundness in the case of undiscovered overflows is much more serious than the rare case of memory recycling errors.

BCC was designed to be completely backward compatible with standard C code. In fact, one should be able to freely mix code compiled with BCC and code compiled with any other GCC-like compilers. However, in reality BCC fails to compile a majority of open-source C programs [109]. We have extensively experimented with one particular case, the Linux kernel and found it to fail frequently. There were two main reasons for this: (1) several previously unknown bugs in BCC which we fixed, and (2) BCC could not handle several "advanced" C programming constructs. We give examples of each of these below.

BCC Bugs: An Example

BCC 3.2 and 3.3 could not inline function calls declared as inline. Although most C programs would still work, the Linux kernel refused to compile without inlining. The reason was traced to the extern inline directive of GCC. This directive provides easy support for users to write plugins. If a function f is declared as extern inline in a header file header.h included from the C source file source.c, then GCC will try to locate another function definition for f in source.c. If found, the new definition replaces the definition in header.h. Otherwise, the compiler inlines all calls to f using the function body provided in header.h. Since BCC could not inline functions, this step could not be performed, resulting in a compile time error. We tracked this problem to a bug in the BCC source code, which resulted in patches to BCC.

Advanced C Constructs

Consider the following piece of code from the Linux kernel.


/* from linux-2.6.7/include/linux/kernel.h */
#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr  = (ptr);   \
        (type *)( (char *)__mptr - offsetof(type,member) );})

/* from linux-2.6.7/include/linux/list.h */
#define list_entry(ptr, type, member) \
        container_of(ptr, type, member)

#define list_for_each_entry(pos, head, member)                          \
        for (pos = list_entry((head)->next, typeof(*pos), member);      \
             &pos->member != (head);                                    \
             pos = list_entry(pos->member.next, typeof(*pos), member))

/* from linux-2.6.7/include/linux/list.h */
struct list_head {
        struct list_head *next, *prev;
};

/* from linux-2.6.7/include/linux/fs.h */
struct super_block {
        struct list_head       s_files;   /* List of open files */
    	...

/* from linux-2.6.7/fs/super.c */
static void mark_files_ro( struct super_block *sb)
{
        struct file *f;

	list_for_each_entry(f, &sb->s_files, f_list) {
	  ...
	}

list_head is a frequently used data structure in the Linux kernel to implement doubly linked lists of inode or dentry objects. Each member of a list has a list_head object embedded within it. The list_head has pointers to the list_heads of the predecessor and successor objects in the list. Note that the list_head pointers make a list of list_heads and not of the containing object in which they are embedded. However, the containing object can be derived by simply subtracting from the address of the embedded list_head its offset from the start of the containing object.

Consider the function mark_files_ro shown above. It calls the macro list_for_each_entry with sb->s_files as the head. This is a common usage pattern of list_head structs: one special list_head serves as the "head" of the list. The head is special because unlike other list_heads that make up the list of files, the head does not lie within a file object. The purpose of having this special head is to be a placeholder through which the list can be traversed. In this case, the kernel can traverse all files associated with a super_block.

Thus, the function mark_files_ro iterates over the entire list using the macro list_for_each_entry which is implemented using a for loop. Now consider the termination condition for the for loop: &pos- > member ! = head. It will be satisfied when pos has made one full circle of the list and has returned to the head. Note that when the termination condition is satisfied, pos->member is pointing to head. pos is of type file and is generated using the list_entry macro from a list_head in the list. list_entry as we have seen above obtains the file containing a list_head by simply subtracting from the list_head its offset from the start of file. Thus, when the termination condition is satisfied, the list_head in question is the special "head" list_head that does not have a containing file object. However, the comparison in the condition still works fine for C because &pos- > member is computed as pos + OFFSET_OF(TYPE_OF(pos), member), where OFFSET_OF and TYPE_OF have obvious meanings. So even though pos is a non-existent object, its address is never dereferenced; instead, it is simply added back to the appropriate offset to obtain the address of the head. However, BCC is not aware of this future usage of pos at the time it is generated. Consequently, this is flagged as an error.

Simple C Constructs that BCC cannot handle

The above example is a case of pointer arithmetic overflowing an object (in this case, the head) but not being dereferenced. It only serves to demonstrate the possible complexity of underlying operations in the kernel. The same effect can also be demonstrated using a simple example. Consider the following contrived example:

1.  void fn (unsigned i, unsigned j)
2.  {
3.     char arr[10], char *p;

4.     p = arr + i - j;
5.     *p = 'x';

On line 4, expression p = arr + i - j will be evaluated as p = ( (arr + i) - j). We can rewrite the code using a temporary variable as:

1.  void fn (unsigned i, unsigned j)
2.  {
3.     char arr[10], char *p;
3.1    char *t;
3.2    t = arr + i;
4.     p = t - j;
5.     *p = 'x';

This is the exact manner in which this function will be evaluated by KBCC. In line 3.2, the expression arr + i can be out of the bounds of array arr depending on the value of i. However, in line 4, upon evaluation of t - j, the result p may be back within the bounds of arr (if 0 � (i - j) \lt 10). Thus on line 5, the dereference will be legitimate. However, libcheck flags an error on line 3.2 if it detects that t = arr + i has gone out of arr's bounds. This is undesirable since this could be perfectly legal usage. We have fixed all these problems with BCC in our KBCC compiler. We describe the nature of our fix below.

Out-of-bounds pointers

We use the term out-of-bound pointers (OOB) to describe the above phenomenon. These are (often temporary) pointer values generated during computation that may fall out of bounds, but are perfectly safe since they are not dereferenced. BCC flags an error for such temporary out-of-bounds addresses, which is undesirable. Ruwase and Lam proposed a scheme [109] to fix this problem. In their method, whenever an OOB pointer is generated, libcheck enters a record for that OOB into a global table. The record has just one element: the out-of-bounds address of the OOB object. Thereafter, libcheck replaces the OOB pointer in the execution stream by the address of its record in the table. During all later operations on the OOB, the program passes the address of the record to libcheck and libcheck transparently uses the actual out-of-bounds address from the corresponding record to check an access or perform an arithmetic. Thus by replacing an OOB address with a special address in the table, libcheck gracefully handles the case when the OOB pointer undergoes some further operation.

This method does not work when there is a mix of KBCC-compiled code and other code. The problem arises due to the use of alternative addresses (into the table) to replace real OOB addresses. If one such address is passed to a part of code that is not compiled with KBCC, it is bound to fail. In our case, we want to have the flexibility of compiling only parts of our system with bounds checking. Hence, checked and unchecked code should mix without any problems.

We suggest two alternatives in this paper. In the first approach, libcheck makes note of every OOB pointer value in a table. However, we do not replace the OOB address in the execution stream. Instead, when the OOB pointer is later passed again to libcheck, it is first searched in the table, and if not found, it is searched in the splay tree.

This approach though sound is likely to be slow, since even in the common case of no OOBs, the OOB table will be searched before the splay tree is searched. In our other approach, we define a special relation among objects called peering. Whenever an out-of-bounds address is created by an arithmetic operation on an object O, we insert a special peer object into the splay tree at the new address, making it a peer of O. Object O becomes the peer-leader of the set of peers that originate from it. The only operations permitted on a peer pointer are those that either generate another peer or return to the bounds of the peer-leader. The peering relation between objects helps us track OOB pointers and unlike the method of Ruwase and Lam, we do not need to replace its value in the execution stream.

Removing redundant pointer checks.

Completely safe bounds checking of C code is difficult in concurrent environments. For sound bounds checking, pointer accesses must be validated and executed atomically, failing which, bounds checking is no longer sound. A concurrently executing thread could modify or free the pointer between validation and execution, causing the bounds checker to miss errors or access freed memory. The probability of such concurrent accesses is increased when redundant pointer checks are removed, since a check now potentially covers more than one access. However, redundant pointer checks do not necessarily imply freedom from such race conditions.

Our benchmarking results given in Section 5.5 indicate that redundancy elimination can significantly boost bounds-checking performance. Atomicity can be ensured by detecting and removing race conditions from bounds checking code by using static race-detection techniques [46,120]. However, KBCC provides redundancy elimination as an optional feature.

Several algorithms exist to remove redundant pointer checks from bounds checking code [40,64,95] and most of these can be incorporated into KBCC's bounds checking framework. We discuss some of these techniques in detail in Section 6.1. For the purpose of demonstrating the benefits of removing redundant checks from bounds checking code, we used a simple strategy based on common subexpression elimination. Redundant pointer checks are removed by marking checking calls to the monitor with the GCC attribute __attribute__ (("const")), which makes them candidates for removal by GCC's common subexpression elimination and loop-unrolling optimizations. GCC provides this attribute to mark functions whose only effect is their return value, and the return value depends only on the parameters. If parameter values have not changed since the last checking call, the new call can simply be replaced by the last return value. This optimization weeds out a bulk of redundant memory checks. Although the bounds checking monitor makes use of external state in the form of the splay tree, this does not violate the "const" attribute. The global state is maintained on a per-memory region basis and is only accessed by the monitor.

Summary of Our Contributions

We produced a robust and efficient runtime bounds checking framework for the Linux kernel.
Our system has high coverage: we have been able to compile most of the Linux kernel. We believe that using our techniques, BCC can be able to support a wide range of existing programs. We can support several version of the GCC compiler and several versions of the Linux kernel (both 2.4 and 2.6) including the latest releases.
We modified KBCC to be reentrant. The library libcheck is completely reentrant inside the kernel.
We optimized KBCC for size and efficiency in several ways. For example, by reducing stack usage using informed compile-time analysis, removing redundant tests by applying common-subexpression elimination on the abstract syntax tree, by removing redundant code with the help of compile-time analysis, function call argument packing, etc.
We added support to KBCC to effectively manage out-of-bounds pointers.
While working towards a kernel-ready BCC, we also created a more robust version of BCC, fixing several serious bugs (e.g., several problems with inline function calls and out-of-bounds pointers).

5.4.2 Mudflap

GCC Version 4.0 comes with a built-in bounds checker called Mudflap. It validates pointer operations against a fast cache to avoid querying the splay tree. The code for searching the fast cache is inlined in the checked code instead of a function call, resulting in inflated binary. In our port of Mudflap, we provide the option for optimizing for size rather than performance, by allowing sharing of the code for searching the cache.

However, the main difference between Mudflap and KBCC is that Mudflap does not check pointer arithmetic. Hence, it is less sound than KBCC and can miss some bound overflow bugs. Nevertheless, Mudflap does not suffer from the problem of false positives due to out-of-bounds pointers like BCC.

5.4.3 Kernel Code Electric Fence

Kernel Code Electric Fence (Kefence) is designed to detect memory buffer overflows at the hardware level. Kefence aligns memory buffers allocated in the kernel virtual address space (using vmalloc) to page boundaries. The kernel's vmalloc function allocates one or several pages for each request, facilitating this alignment. A guardian page table entry (PTE) is added adjacent to each buffer so that whenever a buffer overflow occurs, the guardian PTE is accessed. The guardian PTE has read and write permissions disabled; hence, accessing it causes a page fault. The page fault handler of the Linux kernel is modified so that whenever there is an access to a guardian PTE, it reports a buffer overflow.

Exact details about the context and location of buffer overflows are logged through syslog. The modified page fault handler can be configured to perform various additional tasks. When security is critical, Kefence can be configured to crash the module upon a memory overflow, thereby preventing further malicious operations. The system administrator can look at the logs to determine the location of the overflow. If debugging is more important, Kefence can be configured to just log the buffer overflow without terminating the module. We implemented this by auto-mapping a read-only or read-write page to the guardian PTE whenever there is an overflow. This way the code which caused the overflow can either be allowed to write or to just read the out-of-bounds memory locations. Since the logs contain full information about the location and the code which caused the overflow, buffer overflows in kernel code can be diagnosed easily. Kefence can do this in real time, making it suitable for security critical applications.

Since Kefence can only protect virtually-mapped buffers, those allocated using kmalloc are not protected. Therefore, to add bounds checking to a kernel module, one must use vmalloc instead of kmalloc for memory allocations. We have modified the Linux header files in such a way that this replacement is done automatically if a special compiler flag is set.

Using vmalloc consumes more virtual memory, since it allocates at least a page for each memory allocation. This is partly mitigated by the fact that modern 64-bit architectures make the address space a virtually inexhaustible resource. However, replacement of kmallocs with vmallocs results in extra consumption of physical memory because the memory is allocated in units of pages. To speed up the default vfree function we have added a hash table to store the information about virtual memory buffers. Since the alignment of buffers to page boundaries can be done either at the beginning or at the end, Kefence cannot detect buffer overflows and underflows simultaneously, unless the allocation is in multiples of the page size. Although this is a common case inside the kernel, we have found overflow protection to be sufficient in most other cases.

5.5 Performance Evaluation

In this section we present our benchmarking results for KBCC, Kefence, and Mudflap. We conducted our benchmarks on a 1.7GHz Pentium 4 machine with 1GB of RAM running Red Hat 9 with a vanilla 2.6.10 kernel. The machine was equipped with 10GB Seagate U5 5,400 RPM IDE hard drives.

5.5.1 Kefence

To evaluate the performance of Kefence, we instrumented a stackable file system [136] called Wrapfs. Wrapfs is a wrapper file system that just redirects file system calls to a lower-level file system. The vanilla Wrapfs uses kmalloc for allocation. Each Wrapfs object (inode, file, etc.) contains a private data field which gets dynamically allocated. In addition to this, temporary page buffers and strings containing file names are also allocated dynamically. We mounted Wrapfs on top of a regular Ext2 file system with a 7,200 RPM IDE disk to conduct our tests. In the instrumented version of Wrapfs, we used vmalloc for all memory allocations so that we could exercise Kefence for all dynamically allocated buffers.

We compiled the Am-utils [100] package inside Wrapfs and compared the time overhead of the instrumented version of Wrapfs with vanilla Wrapfs. The instrumented version of Wrapfs had an overhead of 1.4% elapsed time over normal Wrapfs. This overhead is due to two main causes. First, vmalloc and vfree functions are slower than kmalloc and kfree functions. Second, allocating an entire page for each memory buffer increases TLB contention. We found that the maximum number of outstanding allocated pages during the compilation of Am-utils over the instrumented version of Wrapfs was 2,085 and the average size of each memory allocation was 80 bytes. We conclude that Kefence performs well for normal user workloads. However, memory-intensive code may exhaust virtual or physical memory.

5.5.2 KBCC

For software bounds checking, we compared the performance of kernel code compiled with KBCC with the performance of kernel code compiled with plain GCC. We used the ReiserFS file system in Linux 2.6.10 as our code base, and exercised it using standard file-system benchmarks. The choice of ReiserFS was motivated by our experience with Linux file systems, which indicates that the source code of ReiserFS in general, uses more complicated C constructs and pointer operations than the code of most other file systems. We remounted ReiserFS before every benchmark run to purge the page cache. We ran each test at least ten times and used the Student-t distribution to compute 95% confidence intervals for the mean elapsed, system, user, and wait times. Wait time is the elapsed time less CPU time used and consists mostly of I/O, but process scheduling can also affect it. In each case, the half-widths of the confidence intervals were less than 5% of the mean.

We benchmarked both KBCC and Mudflap.

The first benchmark we ran was an Am-utils build. Am-utils is a CPU-intensive benchmark. We used Am-utils 6.1b3 [100]: it contains over 60,000 lines of C code in 430 files. Though the Am-utils compile is CPU intensive, it contains a fair mix of file system operations, which are compiled with bounds checking enabled. Figure 5.1 compares the results for building Am-utils inside ReiserFS compiled with our port of BCC and ReiserFS compiled with plain GCC. Previously, code compiled with software bounds checking has been reported to run 2-6x times slower than the original version [40,64,75]. ReiserFS compiled with our port of BCC exhibits only 19% overhead in elapsed time, and around 35% overhead in system time. This demonstrates that our optimizations have saved significant overhead over what was previously reported.

Figure 5.1: Bounds checking overhead for ReiserFS with Am-utils (left) and Postmark (right).

The second benchmark we ran was Postmark [77], which simulates the operation of electronic mail servers. It performs a series of file system operations such as appends, file reads, creations, and deletions. This benchmark uses little CPU, but is I/O intensive. We configured Postmark to create 20,000 files, between 512-10K bytes, and perform 200,000 transactions. Create and delete, and, read and write operations were selected with equal probability. We chose these parameters as they are typically used and recommended for file system benchmarks [127]. Figure 5.1 compares Postmark results for ReiserFS compiled with our port of BCC and ReiserFS compiled with plain GCC. In this case, the system time overhead is around 14 times. However, since there is a significant wait time spent waiting for I/O, the overall elapsed time has only 25% overhead.

To determine the residual overhead from using instrumented code, we measured system performance once all bounds-checking tests have been turned off. We observed that the performance in this case was within 3-4% of the original uninstrumented performance.

Our results indicate that we have significantly cut down on the overhead from bounds checking. Moreover, once bounds checking is turned off, performance is close to the unchecked case.

Chapter 6
Static Analysis as a Means for Facilitating Runtime Checking

In this section we discuss possible uses of static analysis to facilitate runtime checking. In particular, we focus on two categories of uses: static analysis for helping with runtime bounds checking (Section 6.1), and the general case of using static analysis for helping with code instrumentation (Section 6.2).

6.1 Static Analysis for Bounds Checking

Static analysis has been used for improving bounds checking system in several ways. The general problem of bounds checking can be divided into two subproblems: (1) identifying a given memory reference's referent object and thus its bounds (bound lookup problem), and (2) comparing the reference's address with the referent's bounds and raising an exception if the bound is violated (bound comparison problem). One of the common uses of static analysis has been used in solving the bound comparison subproblem. The general approach is to eliminate unnecessary checks, so that the number of checks is reduced. However, as seen in Section 5.4.1, these techniques may not be valid in the presence of concurrency.

Gupta [64] used flow analysis techniques to avoid redundant bound checks. Although this technique guarantees to identify any array bound violation, it does not necessarily detect them at the time of the error. This was done to reduce the overhead associated with bounds checking within a loop. Such bound-checks are moved outside the loop so that checking them guarantees prevention of bound overflow errors within the loop body.

ABCD [22] is a light-weight algorithm for eliminating Array Bounds Checks on Demand. Despite its simplicity, ABCD has proved to be quite effective. BOON [40] checks memory errors in using standard string library functions. It formulates memory buffer overruns as integer range analysis problem. Its approach consists of two logical steps: (1) infer possible ranges for array indices, and (2) for each possible input, check using range analysis if any memory access falls beyond an array's limits. Larson and Austin present an interesting extension to this idea. Their method is identical to BOON's approach in step (1). However, for detecting possible inputs in step (2), they argue against using static analysis. In general, any static analysis cannot always determine exactly which paths will be taken at runtime. Hence, BOON's method (which relies on symbolic execution) may produce false positives. Larson and Austin's approach is to dynamically monitor variable values at runtime possibly inferring all possible ranges in which they can fall. Using this information in the range analysis, one can potentially perform well-informed bounds checks.

6.1.1 CCured

CCured [95] is the project most similar to our work in spirit. CCured uses a combination of static analysis and runtime checking to detect all memory errors in C programs. CCured is a source-to-source converter that accepts as input a C program, and outputs another C program instrumented with bounds-checks. It uses type-based static analysis to eliminate redundant memory checks statically. The idea is to assign type qualifiers (through user annotations) to individual pointers. Thereafter, the static analysis infers type qualifiers throughout the program using this initial set of qualifiers. If the inferred type of a pointer is safe, it is not bounds-checked at runtime; for all other memory accesses, runtime checks are inserted.

In CCured, pointers and the memory objects they point to can be typed or untyped. A typed pointer is either SAFE, or SEQ (sequential), and points only to typed memory. Memory typing information is dynamically maintained in a bitmap at runtime. For operations on SAFE pointers, CCured does not insert any bounds checking code; safety is ensured by disallowing any arithmetic on a SAFE pointer, and checking the pointer for only NULL value at runtime. Pointer arithmetic is allowed on SEQ pointers: in fact, SEQ pointers were introduced for memory traversal operations (e.g., array references). In contrast, an untyped pointer always points to an untyped memory region. All operations on an untyped pointer are checked for bound overflows every time it is dereferenced.

With this type system in place, CCured's working can be understood as a sequence of three steps: (1) disallow incompatible operations, like pointer arithmetic on SAFE types, (2) statically validate using type inference which operations need not be checked, and (3) for all other operations, insert runtime checks. CCured enforces the invariant that untyped pointers cannot access typed memory areas. Consider the following example:

01    struct elem {
02        int * UNTYPED unsafe_pointer;
03        int * SAFE	safe_pointer;
04    };
05    int func (struct elem * SEQ p) {
06        struct elem * SAFE e = NULL;
07        int s = 0;
08        for (int i = 0; i < 100; i++) {
09            e = (struct elem * SAFE) (p+i);	//out-of-bounds check
10            s += *(e->unsafe_pointer;	//No out-of-bounds check on e,
					//out-of-bounds check on e->unsafe_pointer
11            s += *(e->safe_pointer;		//No out-of-bounds check on e,
				//Only NULL check on e->safe_pointer
12        }
13        return s;
14    }

This example demonstrates the use of typed pointers and untyped pointers. In this case the programmer has annotated the code with SAFE and UNTYPED annotations. However, in this case CCured could have inferred these attributes. Line 09 casts a SEQ pointer p to SAFE pointer e. A bounds-check is performed at this step. However, e is not checked for bounds overflow after this step.

A most conservative translation is to consider all pointers untyped. Thereafter, CCured infers SAFE and SEQ pointers using a flow-sensitive analysis. This involves constraint generation and solving for type inference. The authors report that in general even without programmer annotations CCured can infer many cases of pointer-typing information. However, the programmers may use annotations for optimizations.

CCured is sound: it detects all spatial memory errors using bounds checks and NULL-pointer checks. For temporal errors it provides its own garbage collection mechanism. Note that a language-based garbage collector makes it a poor candidate for use in systems programming, since the performance characteristic of the code is not obvious from the source code itself-there is an unpredictable and varying garbage collection overhead.

6.1.2 Static Analysis in BCC

BCC employs various static analyses during the course of compilation. Consider the issue of padding variables for the purpose of detecting bound overflows. Jones and Kelly's implementation [75] does not pad stack variables if both of the following conditions hold for them: (1) the variable is not a pointer, and (2) the variable's address is not assigned to any pointer in the scope of the function. Note that BCC derives this information from GCC's analysis framework. By not padding these variables, BCC reduces stack usage of the checked program.

As seen in Section 5.4.1, KBCC leverages GCC's optimization framework to remove unnecessary checks from the program. This is an example of conservative flow-sensitive analysis for mitigating the bound conversion problem.

6.2 Static Analysis for Instrumentation of Code

Static analysis of code can also be useful in the more general case of instrumenting code for tracing support, checking, etc. In this section we first discuss aspect oriented programming, a generic approach to instrumenting arbitrary code. We then present the compile-time instrumentation framework of Aristotle, a runtime verification system we developed.

6.2.1 Aspect-Oriented Programming

Aspect-oriented programming (AOP) is an upcoming programming paradigm which allows crosscutting concerns to be modularized as aspects that are separated from objects. Aspects can specify synchronization policies, resource sharing and performance optimizations over objects. A compiler called weaver, weaves aspects and objects together into a program.

The AOP paradigm was inspired by the difficulty programmers often face in understanding how the code works because it is often poorly localized, and its relation to the execution flow of the main code is hard to follow. Often, implementation for aspects of the program, like the scheduling policy is spread throughout the entire source code. As a result, understanding the code and maintaining it becomes difficult. Often, changing only a particular aspect of the code requires changing the source code in many different places. Aspect-oriented approaches attempt to address these problems by modularizing the scattered pieces of source code that implement a single concept.

Aspects are described as specifications in a spec language. The language comprises of a series of pattern-action pairs. The weaver searches for each occurrence of the pattern in the source code, and performs the corresponding action on it. For example, the action could be to insert a function call, or signal an error. AspectJ is an aspect-oriented framework for Java. It was an offshoot of the MOPS project (Section 3.3.4) for separating higher level protocol specifications from source code. The system comprises of an aspect-oriented language (AspectJ) for Java and a weaver. Some other frameworks currently under development are AspectC++ [1] and AspectC [33].

All these approaches use simple pattern matching when searching for the pattern in the target code (C or Java). The only form of flow-sensitive pattern searching supported is using the cflow keyword. However, the weaver as such does not perform any flow-sensitive analysis. Any flow-sensitive conditions in the pattern code are evaluated at runtime. This can significantly slow down runtime performance for frequently executed code.

The Bossa system [4] is an example of an AOP system that allows the programmer to use flow-sensitive static analysis for specifying patterns. The Bossa spec language supports temporal logic based flow-sensitive pattern specifications. Bossa has been used with the Linux kernel for separating the scheduling framework from the mainstream kernel.

Several other tools exist for code instrumentation. CTool [5] is a simple source transformation tool. AST-log [105] is a tool for simple pattern matching in parse trees. Tools such as ATOM [119] provide support for binary level translation. Purify [106] performs binary level translation of generic programs for inserting bound checking.

6.2.2 Source Code Instrumentation in Aristotle

Aristotle is a runtime verification system for the Linux kernel currently under development in our research group. Aristotle continually monitors kernel execution at runtime, checking it for conformance to pre-defined properties. Kernel monitoring is facilitated by tracing events generated by an instrumented kernel. The tracing events are typically function calls made to Aristotle's runtime checking engine as shown in Figure 6.1. For the purpose of our discussion, we will focus on KGCC.

Figure 6.1: Architectural overview of the Aristotle system. KGCC instruments kernel code with monitoring calls. During the execution of the instrumented kernel, the monitoring calls invoke individual monitors to dynamically verify system properties.

KGCC is a special compiler used by Aristotle for instrumenting the Linux kernel for runtime monitoring. KGCC is an extension to GCC. In fact, the KBCC compiler introduced in Section 5.4.1 is a subset of KGCC. KGCC is a general source code instrumentation framework based on GCC.

Aristotle inserts custom monitoring code into the Linux kernel using KGCC. KGCC scans the parse tree generated during compilation and instruments it with the monitoring code. Using a compiler for statically instrumenting kernel code offers several advantages. Compile-time instrumentation can benefit from higher-level semantic information about the code, e.g. user-defined data types that are typically not available to binary rewriters. Moreover, static analysis can be used to guide the placement of the monitoring code. In this context, Aristotle uses the compiler both as a provider of parse trees, type information, etc., and as a useful tool for instrumenting programs at the parse-tree level, code optimization, etc. KGCC also allows us to separate the monitoring code from the source code, making monitoring code easily portable to different kernel versions in the spirit of aspect-oriented programming.

GCC is a heavy-weight C compiler with little documentation of its internals. Static checking and meta-compilation [66] projects in the past have preferred light-weight compilers with well-documented internals over GCC [27], since they are much easier to understand and modify. The Linux kernel, however, is designed to compile exclusively with GCC, forcing Aristotle to execute code compiled with KGCC rather than just use the compiler for its parsing capabilities. These considerations obviate our choice of a GCC-based approach. KGCC was originally based on GCC-3.2 and later ported to GCC-4.0 (beta). GCC-4.0 offers several advantages: a cleaner interface to manipulate parse trees, better documentation which significantly reduced development times, and advanced optimizations.

Like AOP, KGCC's instrumentations are also guided by pattern-action pairs. Patterns identify source code constructs of interest, and actions identify the corresponding actions to be performed on the source code. For each such pair, GCC scans the parse tree for every occurrence of pattern, and inserts the corresponding action in the appropriate parse-tree node. Patterns can be specified using any information typically available in the parse tree: types, identifiers, containing functions, file names, etc. Note that in our current implementation, we do not yet support a generic language for specifying patterns and actions. KGCC currently hard-codes these into the compiler itself. However, our design is easily amenable to an AOP "weaver" like setup.

KGCC actions can be logically divided into two categories: pre-checks and post-ops. Pre-checks are to be performed before executing the operation identified in the pattern. For example, in the case of memory-bounds checking (see Section 5.4.1), a pre-check would verify pointer sanity before each pointer dereference. A post-op is inserted after the specified operation and can be an arbitrary piece of C code (as is the case for bounds checking) or simply a call to the event dispatcher. In the latter case, the operation (event) will be dispatched to the confidence engine and may be logged for post-processing purposes. For example, when reference-counting monitoring is enabled, an operation that manipulates an object's reference count would be dispatched to the reference-counting monitor to check that the operation does not violate a system-correctness property.

Chapter 7
Conclusions and Future Work

The contributions of this work are that we have presented a survey of available techniques for static checking of software. We have performed a comprehensive survey of existing runtime bounds checking techniques. The main conclusion of this paper is that, where correctness guarantees are desired, static analysis should be combined with runtime checking.

We have seen that static checking of C programs is limited in its scope due to computational limits. Although static checkers have been designed for detecting bugs in software, these checkers are often unsound and can miss bugs. Attempts to achieve complete soundness result in loss of precision in the form of a large number of false alarms. These false alarms are a significant practical drawback.

Existing static checking systems have favored one of soundness and precision in their design trade-offs. Soundness, as incorporated in systems like Cqual, requires user participation in the form of program annotations to be useful. Hence, although they may be adopted in fresh projects, it is unlikely that they will be adopted in existing systems.

Other systems favor bug finding power and precision over soundness, and can be best described as best-effort checkers. Even though they are unsound, they are nevertheless useful. Completely sound and precise systems are not compatible with existing C code. Since software checking for legacy C code like the Linux kernel cannot give correctness guarantees, best-effort bug detection is a useful practical alternative for increasing software quality.

It is evident that static checking alone is not in the state to give correctness guarantees about large complex software like the Linux kernel. Faced with this impossibility, the alternative has been to use runtime techniques for providing correctness guarantees. However, runtime checking techniques themselves are not acceptable in production systems due to performance reasons. The central thesis of this paper is that, where correctness guarantees are desired, static analysis should be combined with runtime checking.

The rest of this chapter is organized as follows. Section 7.1 discusses our conclusions about static checking techniques. Section 7.2 discusses our conclusions about runtime bounds checking. We present possible future directions of research in Section 7.3.

7.1 Static Checking

The cost of repairing a bug increases along the software development time-line. An early discovery can result in substantial savings, making static checking a useful tool for software developers.

From our survey of annotation-based checkers and tool-based checkers, we believe that both have some unique benefits. Annotation-based systems can be more sound and precise, but require additional effort on the part of the programmer. Tool-based checkers are easier to use, especially if they are extensible, but they are often unsound and need innovative design to be powerful in catching bugs.

Annotations can be viewed as a mechanism by which programmers help a static checker to check their code. Static checking systems use annotations for some or all of the following purposes:

Assumptions and Invariants: Programmers use annotations to specify assumptions and invariants. Thus the static analyzer is freed from the responsibility of inferring invariants. Completely sound and precise static invariant inference is often intractable. Thus with programmer help, the static analyzer can still usefully check program properties without being any more unsound and imprecise.
Interface Specifications: Programmers use annotations to specify procedure interfaces. This often saves the static analyzer from (a possibly unsound and imprecise) inter-procedural analysis. More importantly, this leads to highly scalable static analyzers since procedures can be easily checked in isolation. Interface level annotations are arguably a useful means to automatically generate code documentation as well.
Extensibility: Some static analyzers like ESC/Java (see Section 3.1.4) give programmers much freedom in specifying annotations (ESC/Java allows programmers to specify arbitrary first order logic annotations). Hence programmers can extend the default checking done by the checker with their own custom checks.

Tool-based checkers are trying to solve a much harder problem: checking possibly arbitrary code where the code speaks for itself (no annotations). In fact, tool-based checkers represent a much advanced point in the automated checking design space, especially with regard to engineering novelty. Since this problem is hard, tool based checkers trade-off soundness for tractability. From our survey we infer the following key features that increase the power and usefulness of tool-based checkers:

Extensibility: Most of the bugs in software are due to violations of some system-specific properties. For a checker to be useful in general, it should be able to detect such bugs. Tool-based checkers like ESP and Metacompilation allow the programmer to specify system-specific properties separately from code. The checker then uniformly checks the entire code base with this property specification. Thus, both tool-based and annotation-based methods use the programmer-only that tool-based methods can use them even with unmodified code.
Statistical Inference: Best-effort checkers can be more bold in their bug detection strategies than their sound and complete counterparts. One example of this ideology is the use of statistical inference for deriving system invariants, and then checking the code for violations of these invariants. Note that such an analysis can be very imprecise, and heuristics are needed to properly sort and rank such errors.
Rank Bugs: To win programmers' confidence, the checker must be useful. Just finding bugs is not enough. There should be minimum false alarms. Again, this is a difficult problem especially if the checker is aggressive in finding as many bugs as possible. One solution is to rank bugs: bugs are ranked in the order of decreasing likelihood of being a genuine bug and not just a false alarm.

7.2 Runtime Bounds Checking

Memory access errors cannot always be verified statically. However, they also represent a serious problem in systems today. Runtime bounds checking, possibly with help from static analysis, is a viable approach to ensuring memory safety at runtime.

Runtime memory access checking schemes can be evaluated based on four criteria [132]:

The ability to detect all memory errors,
low performance overhead,
the ability to handle all C programs without modification, and
preserving the existing C memory-allocation model.

We have seen classification of bounds checking methods based on their approach for handling the bounds lookup subproblem. We proposed a formal classification for existing techniques based on capability checking and availability checking. Capability checking are much better than availability-based techniques in most respects except one. Capability-based techniques are sound in that they can potentially detect all memory errors, they are much faster, they can protect existing C memory allocation model, and they can be backward-compatible with most existing C source code. However, capability-based methods are often incompatible with existing pre-compiled C code (e.g., libraries).

The independent subproblem of bounds comparison has also received much attention. Static analysis has been successfully used for eliminating unnecessary checks on bound accesses. A radically different approach is the use of hardware-based techniques.

The Linux kernel bounds checking compiler is an instructive experiment in runtime bounds checking of large safety-critical software. In particular, we have successfully applied a known bounds checking technique to check the Linux kernel. Arguably, this is the only existing software approach that could be applied to bounds checking of the Linux kernel in a completely backward-compatible way. During the process of this port, we identified and solved several issues like invalid pointers, redundant checks, etc.

7.3 Future Work

We plan to pursue future work in two directions: (1) bounds checking, and (2) static analysis for runtime checking.

Bounds Checking

Performance overhead is the bane of runtime bounds checking. Whereas removal of redundant pointer checks did improve KBCC's performance, it is still too slow for deployment in production environments. Hardware based techniques, though promising, fall short in comprehensiveness and speed in some cases. We are exploring an alternative novel approach to address this problem. The idea is that after bounds checks have been exercised a certain number of times, they can be turned off, giving back the original performance.

Static Analysis for Runtime Checking

We are actively developing the Aristotle system introduced in Section 6.2.2. Aristotle will be used for verifying properties of the Linux kernel at runtime. We plan to leverage static analysis in two ways to help achieve this goal:

Minimize performance overhead due to monitoring: statically deduce and verify all that is possible, leaving only the rest for runtime checking and verification.
Provide fine-grained control to Aristotle developers for arbitrarily instrumenting kernel code at compile time.

In the context of bounds checking, we have achieved limited success in fulfilling the first requirement. For providing fine-grained instrumentation control, we plan to develop upon ideas from aspect-oriented programming research to implement a useful extension to GCC for arbitrarily instrumenting source code at compile time. In particular, our GCC support is nearly ready, and we are working on an advanced specification language for describing the "aspects." Recent deliberations have suggested the possibility that overloading the already heavy-weight GCC compiler with this facility may not be a good idea, and the analysis engine and instrumentation engine should be implemented as separate tools.

Bibliography

[1]: AspectC++ Homepage. www.aspectc.org.
[2]: Splint Guide. www.splint.org/guide/guide.html.
[3]: Splint Manual. www.splint.org/manual/manual.html.
[4]: The Bossa Project. www.emn.fr/x-info/bossa/.
[5]: The GNU CTool Project. http://ctool.sourceforge.net.
[6]: A. Bensoussan and C. T. Clingen and R. C. Daley. The Multics Virtual Memory: Concepts and Design. Communcations of the ACM, 15(5):308-318, 1972.
[7]: A. Blass and Y. Gurevich. Inadequacy of Computable Loop Invariants. ACM Transactions on Computational Logic, 2(1):1-11, 2001.
[8]: S. Abramsky and C. Hankin, editors. Abstract interpretation of declarative languages. Halsted Press, New York, NY, USA, 1987.
[9]: M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young. Mach: A new kernel foundation for UNIX development. In Proceedings of the Summer USENIX Technical Conference, pages 93-112, Summer 1986.
[10]: A. Aiken, J. S. Foster, J. Kodumal, and T. Terauchi. Checking and inferring local non-aliasing. SIGPLAN Not., 38(5):129-140, 2003.
[11]: W. Arbaugh, D. Farber, and J. Smith. A Secure and Reliable Bootstrap Architecture. In Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, May 1997.
[12]: W. A. Arbaugh, W. L. Fithen, and J. McHugh. Windows of Vulnerability: A Case Study Analysis. IEEE Computer, 33(12):52-59, December 2000.
[13]: T. M. Austin, S. E. Breach, and G. S. Sohi. Efficient detection of all pointer and array access errors. SIGPLAN Not., 29(6):290-301, 1994.
[14]: T. Ball, R. Majumdar, T. Millstein, and S. K. Rajamani. Automatic predicate abstraction of c programs. In Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM Press, 2001. Appeared in ACM SIGPLAN Notices 36(5), pp. 203-213.
[15]: T. Ball, A. Podelski, and S. K. Rajamani. Relative completeness of abstraction refinement for software model checking. In Joost-Pieter Kaoen and Perdita Stevens, editors, Proceedings of TACAS02: Tools and Algorithms for the Construction and Analysis of Systems, volume 2280 of LNCS, pages 158-172, Grenoble, France, April 2002. Springer-Verlag.
[16]: T. Ball and S. K. Rajamani. The SLAM toolkit. In CAV '01: Proceedings of the 13th International Conference on Computer Aided Verification, pages 260-264. Springer-Verlag, 2001.
[17]: T. Ball and S. K. Rajamani. The slam project: Debugging system software via static analysis. In Conference Record of the 29th Annual ACM Symposium on Principles of Programming Languages, pages 1-3. ACM Press, 2002.
[18]: R. Balzer. Mediating connectors. Technical report, USC/ISI, 1998. www.isi.edu/software-sciences/instrumented-connectors.html.
[19]: R. Balzer and N. Goldman. Mediating connectors. In Proceedings of the 19th IEEE International Conference on Distributed Computing Systems Workshop, pages 72-7, Austin, TX, June 1999.
[20]: B. Bershad, S. Savage, P. Pardyak, E. G. Sirer, D. Becker, M. Fiuczynski, C. Chambers, and S. Eggers. Extensibility, safety, and performance in the SPIN operating system. In Proceedings of the 15th ACM Symposium on Operating System Principles (SOSP '95), pages 267-284, Copper Mountain Resort, CO, December 1995. ACM SIGOPS.
[21]: W. R. Bevier. Kit: A Study in Operating System Verification. IEEE Transactions on Software Engineering, 15(11):1382-1396, 1989.
[22]: R. Bodik, R. Gupta, and V. Sarkar. Abcd: Eliminating array bounds checks on demand. In PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, pages 321-333, New York, NY, USA, 2000. ACM Press.
[23]: C. Boyapati and M. C. Rinard. A parameterized type system for race-free Java programs. volume 36(11) of SIGPLAN Notices, pages 56-69, November 2001.
[24]: T. Bray. The Bonnie home page. www.textuality.com/bonnie, 1996.
[25]: H. T. Brugge. The BCC home page. http://web.inter.NL.net/hcc/Haj.Ten.Brugge, 2001.
[26]: W. R. Bush, J. D. Pincus, and D. J. Sielaff. A static analyzer for finding dynamic programming errors. Softw. Pract. Exper., 30(7):775-802, 2000.
[27]: C. W. Fraser and D. R. Hanson. A Retargetable C Compiler: Design and Implementation. Addison-Wesley Longman Publishing Co., Inc., 1995.
[28]: S. Chiba. A metaobject protocol for c++. In OOPSLA '95: Proceedings of the tenth annual conference on Object-oriented programming systems, languages, and applications, pages 285-299, New York, NY, USA, 1995. ACM Press.
[29]: T. Chiueh, G. Venkitachalam, and P. Pradhan. Integrating segmentation and paging protection for safe, efficient and transparent software extensions. In Proceedings of 17th ACM Symposium on Operating Systems Principles (SOSP '99), pages 140-153, Kiawah Island Resort, near Charleston, SC, December 1999. ACM SIGOPS.
[30]: J. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. pages 258-269, 2002.
[31]: A. Chou, B. Chelf, D. R. Engler, and M. Heinrich. Using meta-level compilation to check FLASH protocol code. In Architectural Support for Programming Languages and Operating Systems, pages 59-70, 2000.
[32]: A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler. An Empirical Study of Operating Systems Errors. In Proceedings of the 18th ACM Symposium on Operating System Principles (SOSP 2001), Chateau Lake Louise, Banff, Canada, October 2001. ACM SIGOPS.
[33]: Y. Coady, G. Kiczales, M. Feeley, and G. Smolyn. Using AspectC to Improve the Modularity of Path-Specific Customization in Operating System Code. In Joint 8th European Software Engineering Conference (ESEC) and 9th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), pages 88-98, 2001.
[34]: A. Coen-Porisini, G. Denaro, C. Ghezzi, and M. Pezze. Using symbolic execution for verifying safety-critical systems. In ESEC/FSE-9: Proceedings of the 8th European Software Engineering Conference held jointly with 9th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 142-151, New York, NY, USA, 2001. ACM Press.
[35]: R. Coker. The Bonnie++ home page. www.coker.com.au/bonnie++, 2001.
[36]: J.C. Corbett, M.B. Dwyer, J. Hatcliff, S. Laubach, C.S. Pasareanu, Robby, and H. Zheng. Bandera: Extracting finite-state models from java source code. In Proceedings of 22nd International Conference on Software Engineering, pages 439-448, 2000.
[37]: P. Cousot and R. Cousot. Comparing the galois connection and widening/narrowing approaches to abstract interpretation. In PLILP '92: Proceedings of the 4th International Symposium on Programming Language Implementation and Logic Programming, pages 269-295, London, UK, 1992. Springer-Verlag.
[38]: D. Detlefs and G. Nelson and J. Saxe. Simplify: A theorem prover for program checking. Technical Report HPL-2003-148, HP Labs, Palo Alto, CA, 2003.
[39]: D. Evans and D. Larochelle. Improving Security Using Extensible Lightweight Static Analysis. IEEE Software, 19(1):42-51, 2002.
[40]: D. Wagner and J. S. Foster and E. A. Brewer and A. Aiken. A First Step Towards Automated Detection of Buffer Overrun Vulnerabilities. In Networking and Distributed System Security Symposium 2000, San Diego, CA, February 2000.
[41]: Department of Defense. Trusted computer system evaluation criteria (The Orange Book). Technical Report DoD 5200.28-STD, National Computer Security Center, Alexandria, VA, December 1985. www.dynamoo.com/orange.
[42]: D. L. Detlefs. An overview of the extended static checking (esc) system. In Proceedings of the First Workshop on Formal Methods in Software Practice, pages 1-9, January 1996.
[43]: D. L. Detlefs, R. M. Leino, G. Nelson, and J. B. Saxe. Extended Static Checking. Technical Report SRC-159, Compaq SRC, 130 Lytton Ave., Palo Alto, December 1998.
[44]: N. Dor, M. Rodeh, and M. Sagiv. Cssv: Towards a realistic tool for statically detecting all buffer overflows in c. SIGPLAN Not., 38(5):155-167, 2003.
[45]: P. J. Downey, R. Sethi, and R. E. Tarjan. Variations on the common subexpression problem. J. ACM, 27(4):758-771, 1980.
[46]: D. Engler and K. Ashcraft. RacerX: effective, static detection of race conditions and deadlocks. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, pages 237-252. ACM Press, 2003.
[47]: D. Engler, B. Chelf, A. Chou, and S. Hallem. Checking System Rules Using System-Specific Programmer-Written Compiler Extensions. In Proceedings of the Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000), pages 1-16, San Diego, CA, October 2000.
[48]: D. Engler, D. Y. Chen, S. Hallem, A. Chou, and B. Chelf. Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code. In Proceedings of the 18th ACM Symposium on Operating System Principles (SOSP 2001), Chateau Lake Louise, Banff, Canada, October 2001. ACM SIGOPS.
[49]: D. Evans, J. Guttag, J. Horning, and Y. M. Tan. LCLint: A Tool for Using Specifications to Check Code. In Proceedings of the ACM SIGSOFT '94 Symposium on the Foundations of Software Engineering, pages 87-96, 1994.
[50]: M. Fahndrich and R. DeLine. Adoption and focus: practical linear types for imperative programming. In PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 13-24, New York, NY, USA, 2002. ACM Press.
[51]: C. Flanagan and S. N. Freund. Type-based race detection for Java. pages 219-232, 2000.
[52]: C. Flanagan and S. N. Freund. Atomizer: A dynamic atomicity checker for multithreaded programs. In Proc. 31st ACM SIGPLAN Symposium on Principles of Programming LANGUAGES (POPL), January 2004.
[53]: C. Flanagan and K. Rustan M. Leino. Houdini, an annotation assistant for esc/java. In FME '01: Proceedings of the International Symposium of Formal Methods Europe on Formal Methods for Increasing Software Productivity, pages 500-517, London, UK, 2001. Springer-Verlag.
[54]: C. Flanagan, K. Rustan M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe, and R. Stata. Extended static checking for java. In PLDI, pages 234-245, 2002.
[55]: C. Flanagan and S. Qadeer. Predicate abstraction for software verification. In POPL '02: Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 191-202, New York, NY, USA, 2002. ACM Press.
[56]: C. Flanagan and S. Qadeer. A type and effect system for atomicity. In Proc. ACM SIGPLAN Conference on Programming Language Design and IMPLEMENTATION (PLDI), pages 338-349. ACM Press, 2003.
[57]: J. S. Foster, T. Terauchi, and A. Aiken. Flow-sensitive type qualifiers. In PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, pages 1-12, New York, NY, USA, 2002. ACM Press.
[58]: G. Kiczales and J. D. Rivieres. The Art of the Metaobject Protocol. MIT Press, Cambridge, MA, USA, 1991.
[59]: E. Gabber, C. Small, J. Bruno, J. Brustoloni, and A. Silberschatz. The Pebble component-based operating system. In Proceedings of the Annual USENIX Technical Conference, pages 267-282, Monterey, CA, June 1999.
[60]: V. Ganapathy, S. Jha, D. Chandler, D. Melski, and D. Vitek. Buffer overrun detection using linear programming and static analysis. In CCS '03: Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 345-354, New York, NY, USA, 2003. ACM Press.
[61]: P. Godefroid. Model checking for programming languages using verisoft. In Proceedings of the 24th ACM Symposium on Principles of Programming Languages, pages 174-186, 1997.
[62]: D. B. Golub and R. P. Draves. Moving the Default Memory Manager out of the Mach Kernel. In Proceeding of the Second USENIX Mach Symposium Conference, pages 177-88, Monterey, CA, November 1991.
[63]: D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-based memory management in cyclone. In PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation, pages 282-293, New York, NY, USA, 2002. ACM Press.
[64]: R. Gupta. Optimizing array bound checks using flow analysis. ACM Letters on Programming Languages and Systems, 2(1-4):135-150, 1993.
[65]: J. V. Guttag and J. J. Horning. Larch: languages and tools for formal specification. Springer-Verlag New York, Inc., New York, NY, USA, 1993.
[66]: S. Hallem, B. Chelf, Y. Xie, and D. Engler. A System and Language for Building System-Specific, Static Analyses. In ACM Conference on Programming Language Design and Implementation, pages 69-82, Berlin, Germany, June 2002.
[67]: M. Hicks, G. Morrisett, D. Grossman, and T. Jim. Safe and flexible memory management in Cyclone. Technical Report CS-TR-4514, University of Maryland Department of Computer Science, July 2003. Also published as University of Maryland Institute for Advanced Computer Studies (UMIACS) Technical report UMIACS-TR-2003-82.
[68]: M. Hohmuth. The Fiasco kernel: Requirements definition. Technical Report TUD-FI98-12, TU Dresden, December 1998. http://os.inf.tu-dresden.de/fiasco/doc.html.
[69]: IEEE/ANSI. Information Technology-Test Methods for Measuring Conformance to POSIX-Part 1: System Interfaces. Technical Report STD-2003.1, ISO/IEC, 1992.
[70]: IEEE/ANSI. Information Technology-Portable Operating System Interface (POSIX)-Part 1: System Application: Program Interface (API) [C Language]. Technical Report STD-1003.1, ISO/IEC, 1996.
[71]: J. Ferrante and K. J. Ottenstein and J. D. Warren. The Program Dependence Graph and its use in Optimization. ACM Transactions on Programming Language Systems, 9(3):319-349, 1987.
[72]: J. W. Nimmer and M. D. Ernst. Static Verification of Dynamically Detected Program Invariants: Integrating Daikon and ESC/Java. In Proceedings of RV'01, First Workshop on Runtime Verification, Paris, France, July 23, 2001.
[73]: T. Jim, G. Morrisett, D. Grossman, M. Hicks, J. Cheney, and Y. Wang. Cyclone: A Safe Dialect of C. In Proceedings of the Annual USENIX Technical Conference, pages 275-288, Monterey, CA, June 2002.
[74]: N. D. Jones and F. Nielson. Abstract interpretation: a semantics-based tool for program analysis. pages 527-636, 1995.
[75]: R. Jones and P. Kelly. Bounds Checking for C. Technical report. www-ala.doc.ic.ac.uk/phjk/BoundsChecking.html.
[76]: K. Rustan M. Leino and G. Nelson and J. B. Saxe. ESC/Java User's Manual. Technical Report 2000-002, Compaq Systems Research Center, October 2000.
[77]: J. Katcher. PostMark: A New Filesystem Benchmark. Technical Report TR3022, Network Appliance, 1997. www.netapp.com/tech_library/3022.html.
[78]: S. Katz and Z. Manna. Logical analysis of programs. Commun. ACM, 19(4):188-206, 1976.
[79]: A. Kolawa and A. Hicken. Insure++: A Tool to Support Total Quality Software. www.parasoft.com/insure/papers/tech.htm, March 2001.
[80]: S. Kumar and K. Li. Using Model Checking to Debug Device Firmware. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 2002), pages 61-74, Boston, MA, December 2002.
[81]: J. Kuskin, D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni, K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter, M. Horowitz, A. Gupta, M. Rosenblum, and J. Hennessy. The stanford flash multiprocessor. In ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), pages 485-496, New York, NY, USA, 1998. ACM Press.
[82]: L. Lam and T. Chiueh. Checking Array Bound Violation Using Segmentation Hardware. In Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 2005), June 2005.
[83]: L. Wang and S. D. Stoller. Run-Time Analysis for Atomicity. In Proceedings of the Third Workshop on Runtime Verification (RV), volume 89(2) of Electronic Notes in Theoretical Computer Science. Elsevier, July 2003.
[84]: J. Lamping, G. Kiczales, L. H. Rodriguez Jr., and E. Ruf. An architecture for an open compiler. In Proceedings of the IMSA'92 Workshop on Reflection and Meta-level Architectures (A. Yonezawa and B. C. Smith, eds.), pages 95-106, 1992.
[85]: W. Lee, W. Fan, M. Miller, S. Stolfo, and E. Zadok. Toward cost-sensitive modeling for intrusion detection and response. In Workshop on Intrusion Detection and Prevention, Seventh ACM Conference on Computer Security, Athens, Greece, November 2000.
[86]: K. Rustan M. Leino, J. B. Saxe, and R. Stata. Checking java programs via guarded commands. In Proceedings of the Workshop on Object-Oriented Technology, pages 110-111, London, UK, 1999. Springer-Verlag.
[87]: D. Lie, A. Chou, D. Engler, and D. Dill. A Simple Method for Extracting Models from Protocol Code. In The 28th Annual International Symposium on Computer Architecture (ISCA 2001), Göteborg, Sweden, June/July 2001.
[88]: D. Lie, A. Chou, D. Engler, and D. L. Dill. A simple method for extracting models for protocol code. In ISCA '01: Proceedings of the 28th annual international symposium on Computer architecture, pages 192-203, New York, NY, USA, 2001. ACM Press.
[89]: M. D. Ernst and J. Cockrell and W. G. Griswold and D. Notkin. Dynamically Discovering Likely Program Invariants to Support Program Evolution. IEEE Transactions on Software Engineering, 27(2):99-123, 2001.
[90]: L. Mauborgne. ASTRÉE: Verification of Absence of Run-Time Error. In René Jacquart, editor, Building the information Society (18th IFIP World Computer Congress), pages 384-392. The International Federation for Information Processing, Kluwer Academic Publishers, August 2004.
[91]: Microsoft Corporation. ESP. www.microsoft.com/windows/cse/esp.mspx.
[92]: MIT LCS. Larch Project Home. www.sds.lcs.mit.edu/spd/larch/.
[93]: M. Musuvathi, D. Y.W. Park, A. Chou, D. R. Engler, and D. L. Dill. CMC: A Pragmatic Approach to Model Checking Real Code. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 2002), pages 75-88, Boston, MA, December 2002.
[94]: N. Suzuki and K. Ishihata. Implementation of an Array Bound Checker. In POPL '77: Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 132-143, New York, NY, USA, 1977. ACM Press.
[95]: G. C. Necula, S. McPeak, and W. Weimer. Type-Safe Retrofitting of Legacy Code. In Proceedings of the 29th Annual ACM Symposium on Principles of Programming Languages, pages 128-139, Portland, OR, January 2002.
[96]: P. Neumann, R. Boyer, R. Feiertag, K. Levitt, and L. Robinson. A Provably Secure Operating System: The System, its Applications, and Proofs. Technical Report CSL-116, SRI International, Menlo Park, CA, May 1980.
[97]: F. Nielson, H. R. Nielson, and C. Hankin. Principles of Program Analysis. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1999.
[98]: H. Patil and C. Fischer. Low-cost, concurrent checking of pointer and array accesses in c programs. Softw. Pract. Exper., 27(1):87-110, 1997.
[99]: G. Pearson. Array Bounds Checking with Turbo C. Dr. Dobb's Journal of Software Tools, 16(5):72, 74, 78-79, 81-82, 104-107, May 1991.
[100]: J. S. Pendry, N. Williams, and E. Zadok. Am-utils User Manual, 6.1b3 edition, July 2003. www.am-utils.org.
[101]: John Penix, Willem Visser, Eric Engstrom, Aaron Larson, and Nicholas Weininger. Verification of Time Partitioning in the DEOS Scheduler Kernel. In International Conference on Software Engineering, pages 488-497, 2000.
[102]: B. Perens. efence(3), April 1993.
[103]: R. Pike, D. Presotto, K. Thompson, and H. Trickey. Plan 9 from Bell Labs. In Proceedings of the Summer UKUUG Conference, pages 1-9, July 1990.
[104]: R. DeLine and M. Fähndrich. Enforcing High-Level Protocols in Low-Level Software. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 59-69, 2001.
[105]: R. F. Crew. ASTLOG: A Language for Examining Abstract Syntax Trees. In Proceedings of the USENIX Conference on Domain-Specific Languages, October 1997.
[106]: Rational Software. Rational PurifyPlus. www.rational.com/products/pqc/index.jsp, May 2001.
[107]: R. Rugina and M. Rinard. Symbolic bounds analysis of pointers, array indices, and accessed memory regions. In PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, pages 182-195, New York, NY, USA, 2000. ACM Press.
[108]: J. Rushby. Design and Verification of Secure Systems. In Proceedings of the Eigth ACM Symposium on Operating Systems Principles (SOSP), volume 15(5), pages 12-21, 1981. Appearing also as ACM Operating Systems Review, SIGOPS.
[109]: O. Ruwase and M. Lam. A practical dynamic buffer overflow detector. In Proceedings of the Network and Distributed System Security (NDSS) Symposium, pages 159-169, February 2004.
[110]: S. Graf and H. Saidi. Construction of Abstract State Graphs with PVS. In O. Grumberg, editor, Proc. 9th INternational Conference on Computer Aided Verification (CAV'97), volume 1254, pages 72-83. Springer Verlag, 1997.
[111]: S. M. German and B. Wegbreit. A Synthesizer of Inductive Assertions. IEEE Transactions on Software Engineering, 1(1):68-75, 1975.
[112]: A. Sasturkar, R. Agarwal, L. Wang, and S. D. Stoller. Automated type-based analaysis of data races and atomicity. In Proceedings of the Tenth ACM/SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005.
[113]: S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. ERASER: A Dynamic Data Race Detector for Multithreaded Programs. ACM Transactions on Computer Systems, 15(4):391-411, 1997.
[114]: M. G. Schultz, E. Eskin, E. Zadok, M. Bhattacharyya, and S. J. Stolfo. MEF: Malicious Email Filter - A UNIX Mail Filter that Detects Malicious Windows Executables. In Proceedings of the Annual USENIX Technical Conference, FREENIX Track, pages 245-252, Boston, MA, June 2001. (Won best student paper award).
[115]: M. G. Schultz, E. Eskin, E. Zadok, and S. J. Stolfo. Data mining methods for detection of new malicious executables. In Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, May 2001.
[116]: U. Shankar, K. Talwar, J. S. Foster, and D. Wagner. Detecting Format String Vulnerabilities with Type Qualifiers. In Proceedings of the 10th Usenix Security Symposium, Washington, D.C., August 2001.
[117]: J. S. Shapiro and S. Weber. Verifying the EROS Confinement Mechanism. In RSP: 21th IEEE Computer Society Symposium on Research in Security and Privacy, 2000.
[118]: D. D. Sleator and R. E. Tarjan. Self-adjusting binary search trees. J. ACM, 32(3):652-686, 1985.
[119]: A. Srivastava and A. Eustace. Atom: A system for building customized program analysis tools. SIGPLAN Not., 39(4):528-539, 2004.
[120]: N. Sterling. Warlock: A static data race analysis tool. In Proceedings of the USENIX Winter Technical Conference, pages 97-106, San Diego, CA, January 1993.
[121]: S. D. Stoller. Model-checking multi-threaded distributed Java programs. International Journal on Software Tools for Technology Transfer, 4(1):71-91, October 2002.
[122]: Sun Microsystems. LCLint Online Resource. http://docs.sun.com/source/806-3567/lint.html.
[123]: T. Y. Meng. Formal Specification Techniques for Engineering Modular C, volume 1. Kluwer International Series in Software Engineering, Boston, 1995.
[124]: H. Tews, H. Härtig, and M. Hohmuth. VFiasco - Towards a Provably Correct m-Kernel. Technical Report TUD-FI01-1, TU Dresden, January 2001. http://os.inf.tu-dresden.de/fiasco/doc.html.
[125]: The Spin Project Homepage. Static Analysis Tools for C code. www.spinroot.com/static/.
[126]: P. Uppuluri. Pattern Matching Based Intrusion Detection Systems. Technical Report RPE report, State University of New York at Stony Brook, Computer Science Department, August 2001. http://seclab.cs.sunysb.edu/~prem/rpe.ps.
[127]: VERITAS Software. VERITAS File Server Edition Performance Brief: A PostMark 1.11 Benchmark Comparison. Technical report, Veritas Software Corporation, June 1999. http://eval.veritas.com/webfiles/docs/fsedition-postmark.pdf.
[128]: J. Voas, L. Morrel, and K. Miller. Predicting where faults can hide from testing. IEEE Software, 8(2):41-48, 1991.
[129]: B. Wegbreit. The synthesis of loop predicates. Commun. ACM, 17(2):102-113, 1974.
[130]: Xerox Parc. Metaobject Protocols. http://www2.parc.com/csl/groups/sda/projects/mops/.
[131]: Y. Xie, A. Chou, and D. Engler. Archer: Using symbolic, path-sensitive analysis to detect memory access errors. SIGSOFT Softw. Eng. Notes, 28(5):327-336, 2003.
[132]: W. Xu, D. C. DuVarney, and R. Sekar. An efficient and backwards-compatible transformation to ensure memory safety of c programs. In SIGSOFT '04/FSE-12: Proceedings of the 12th ACM SIGSOFT Twelfth International Symposium on Foundations of Software Engineering, pages 117-126, New York, NY, USA, 2004. ACM Press.
[133]: E. Zadok. Stackable file systems as a security tool. Technical Report CUCS-036-99, Computer Science Department, Columbia University, December 1999. www.cs.columbia.edu/~library.
[134]: E. Zadok, J. M. Anderson, I. Badulescu, and J. Nieh. Fast Indexing: Support for size-changing algorithms in stackable file systems. In Proceedings of the Annual USENIX Technical Conference, pages 289-304, Boston, MA, June 2001.
[135]: E. Zadok, I. Badulescu, and A. Shender. Extending file systems using stackable templates. In Proceedings of the Annual USENIX Technical Conference, pages 57-70, Monterey, CA, June 1999.
[136]: E. Zadok and J. Nieh. FiST: A Language for Stackable File Systems. In Proceedings of the Annual USENIX Technical Conference, pages 55-70, San Diego, CA, June 2000.
[137]: X. Zhang, A. Edwards, and T. Jaeger. Using cqual for static analysis of authorization hook placement. In Proceedings of the 11th USENIX Security Symposium, pages 33-48, Berkeley, CA, USA, 2002. USENIX Association.

File translated from T_EX by T_TH, version 3.67.
On 21 May 2005, 00:38.