fn()
{
char arr[11];
strcpy(arr, "hello world");
}
One can make two observations regarding this code. First, the
allocated space for arr is too small to accommodate the entire
string in it. Secondly, strcpy does not detect and handle this
situation. As is obvious, this operation will cause data to be
written beyond the limits of the array arr. The result could be
a program crash, or misbehavior due to memory corruption, or, no
visible effect at all. One can take two points from this example:
Data entered by a user should not directly be used as argument to printf since the user may be an adversary.This constraint can be specified in Cqual by introducing two type qualifiers for character strings: tainted for strings that could be controlled by an adversary, and untainted for strings that must not be controlled by an adversary. The sub-typing relationship between the two qualifiers is denoted as:
untainted < taintedmeaning that untainted is a subtype of tainted, any piece of code that accepts a tainted argument can be passed an untainted argument, but not vice versa. The sub-typing relation is typically specified as a lattice of type qualifiers with partial-order relations between the qualifiers. CQUAL has a few built-in inference rules that extend the subtype relation to qualified types. Using the extended inference rules, CQUAL performs qualifier inference to detect violations against the type relations defined by the lattice. Intuitively, Cqual works by converting a program augmented by type qualifiers into constraints on type qualifier compatibility, which can be represented graphically. The problem of checking correctness is then reduced to graph reachability. Cqual's analysis employs a unique combination of flow-insensitive and flow-sensitive analysis. It models program state as an abstract store, which is a mapping from variables to types (types contain qualifiers). The effect of a program statement on the state is emulated as a "Hoare logic" type effect with possible changes in variable types (ALLOC and ASSIGN operations). This flow-insensitive phase also includes alias analysis and effect inference. Following this, the flow-sensitive analysis phase handles flow-sensitive qualifiers and linearity issues for alias analysis. Finally, even though Cqual's analysis is not path-sensitive, it does not suffer from false-positives due to impossible paths. This is because Cqual's constraint satisfaction logic tries to find at least one way to satisfy the constraints, rather than finding all possible violations. However, since it does not do path-sensitive analysis, the error message may not be as informative (as in PREfix, for example). One of the key innovations in Cqual is its practical approach to alias analysis. The general problem in alias analysis is making updates to one alias visible to others. Cqual handles this by adding a layer of indirection between pointers and their types by introducing an abstract location with each variable. With this change, a variable is associated with an abstract location, and abstract stores map abstract locations to qualified types. Thus a type update to an abstract location is automatically available to other references pointing to that location. An abstract location is defined as linear if the type system can prove that it corresponds to a single concrete memory location in every execution; otherwise, it is non-linear. Updates to linear locations can arbitrarily change their qualifiers (strong updates). Updates to a non-linear location result in a unification of qualifiers (weak update). Locations may be inferred as non-linear even if strong updates are desired on them (Cqual is sound). Evidently, strong updates give better opportunities to test qualifier constraints. Linearity can be reclaimed by using Cqual's restrict keyword (or equivalently the confine keyword). The expression restrict x = e1 in e2 introduces a new name bound to the value of e1. The name x is given to a new abstract location, and among all aliases of e1 only x and values derived from x may be used within e2. Thus, the location of x may be linear, and hence may be strongly updated, even if the location of e1 is non-linear. Cqual's scalable, annotation-based analysis has been applied to large software including the Linux kernel. It discovered several locking bugs in Linux [57], and has been used to detect format-string vulnerabilities too [137]. Cqual is easy to use and provides strong soundness guarantees. However, there are two problems with Cqual. Firstly, checking pre-written code with Cqual needs substantial rewriting effort. Secondly, Cqual's conservative aliasing assumptions may lead to a high number of false-positives requiring additional annotation effort to suppress them (restrict operator). Recently, Cqual has introduced automatic confine inference by the static analyzer to reduce the burden of aliasing annotations. In fact, it has been observed to apply to roughly 95% of pre-written C code [10]. Usability Cqual's design places strong emphasis on usability. Firstly, due to its unique combination of intra-procedural and inter-procedural analysis, Cqual is quite scalable. This is further demonstrated by the number of bugs it has caught in the Linux kernel (4M+ LOC and counting). Secondly, Cqual is easier to use than some other annotation-based systems: it needs less annotation-effort, thanks to automatic inference of type qualifiers. Also, confine-inference has been shown to work 95% of the time on pre-written code. Thus, one does not need to add heavy annotations to enforce linearity. Finally, Cqual is getting ready for mainstream usage: it currently presents the analysis results using Program Analysis Mode, an Emacs-based GUI. However, there are two concerns with Cqual: (1) how much will the rewriting effort be for porting legacy code to Cqual (a Houdini [53] like annotation assistant would be useful), and (2) it is not clear what is the false alarm rate of Cqual's conservative alias analysis.
v.freed: { *v } ==> v.stop
Thus the alphabet of each SM is defined by the Metal patterns used
within the extension. Patterns serve the purpose of identifying
source code actions that are relevant to a particular rule.
Intuitively, SM transitions map directly to path-specific
actions in the source code.
Each individual SM's current state consists of one global
state value and one or more variable-specific state values.
Global state values capture a program-wide property (e.g.,
"interrupts are disabled"). Variable-specific state values capture
program properties associated with specific source objects (e.g.,
"pointer p is freed"). Besides, state transitions in Metal can be
augmented with general-purpose code which has the effect of a
"dynamic" state space for the SM.
Implementation
The inter-procedural analysis engine of the MC system is called
xgcc. It performs bottom-up summary-based traversal of the
call graph like PREfix (Section 3.2.1). xgcc's role can be
described as mapping of the SM abstraction provided by Metal to lower
level program analysis. xgcc follows the approach of splitting the
analysis into an intra-procedural pass followed by an inter-procedural
pass. The intra-procedural analysis is implemented as a DFS traversal
over the abstract syntax tree (AST) of the function. Block-level
state caching is used to speed things up. Since the analysis is path
sensitive, a program point may have different states associated with
different paths. Storage requirement is further reduced by separating
variable states from program point states. In this way, variable
states may be shared, and if the states are the same, applying the
same transition will have the same effect.
The main contribution of the inter-procedural analysis phase is
refinement and restore of the SM state. State
refinement occurs when a function call is encountered and followed.
Any object that passes from the caller's scope to the callee's scope
should retain its state which has the effect of refining the incoming
state at the callee. The state is restored when the analysis returns
from the callee and resumes analyzing the caller. The restore
operation may need to move the state back from an object in the
callee's scope to the corresponding object in the caller's scope. The
extension's global state passes across function call boundaries
unchanged.
The inter-procedural path-sensitive analysis employed by the MC system
is memory and CPU intensive. In fact, for analyzing systems of the
size of the Linux kernel (more than 4 million lines of code and
counting), analysis scalability can play a determining role in the
overall feasibility of the system. The MC system uses composable
summaries of function behaviors for reducing running time. Once
functions are analyzed, they are condensed into summaries which state
how the function transforms the possible incoming transition states.
Function summaries are used to reproduced the effect of analyzing the
function once there is a hit in the summary-cache at a function call
boundary.
False Positive Suppression
The MC system adopts several methods to suppress false positives.
False path pruning is used to identify infeasible paths. For
example, in the following code, there are only two possible paths, not
four. The two impossible paths can drive an SM into an error state:
freeing unallocated pointer, and allocating a pointer but not using
it.
if(x)
{
v = kmalloc(10);
// use v
}
if(x)
{
kfree(v);
}
MC performs limited symbolic execution of the program to statically
deduce which conditionals can be taken and which ones cannot. Based
on this information, it prunes impossible paths. The actual
implementation uses a congruence closure algorithm [45].
Variables that must have the same value are placed into a single
equivalence class. The congruence closure algorithm is then used to
derive as many equalities and non-equalities as possible. This
information, combined with the symbolic tracking of inequalities, helps
deduce the truth value of most conditionals. The developer may also
be able to rewrite Metal extensions to weed out observed false paths
in further runs. Finally, Metal saves false path pruning effort by
storing descriptions of identified false paths for use in later runs
of the extension, thus avoiding redundant effort.
Some false positives may be left behind despite the above measures.
MC handles them by ranking error reports based on many
heuristics so that errors most likely to be genuine bugs are reported
at the top.
Conclusion
The MC system represents a significant point in the design space of
static checkers. First, it follows a pragmatic approach to tackle the
soundness-usefulness trade-off. MC is not intended to be sound: it
uses incomplete symbolic execution, does not unroll loops, etc.
However, it aims for ease-of-use by developers thus maximizing its
impact on software development. Second, MC adds another dimension to
"usefulness": finding as many bugs and as many types of
bugs as possible. The authors justify unsoundness by claiming that
soundness directly conflicts with this goal of maximum coverage.
Indeed, the MC system has discovered several thousand bugs in the
Linux and BSD kernels.
However, the scalability issues of the MC system are not very clear.
Inter-procedural path-sensitive analysis can easily explode in CPU and
memory requirement. It seems that the MC system avoids this by (1)
having small or moderately sized SMs that do not generate a lot of
state throughout the entire Linux kernel, and (2) running only a few
extensions at a time.
| Problem | Cyclone's Solution |
| Null pointer | insert checks, cut down checks using static analysis and @never-null attribute |
| Buffer overflows | use fat pointers, cut down checks using: static analysis and @nozeroterm attribute |
| Uninitialized pointers | reject using static analysis at compile time |
| Dangling pointers | detect dangling pointer dereferences using static analysis (region analysis), convert all calls to free to no-ops |
| Type-varying arguments (e.g., arguments to printf) | tagged union, injections, varargs |
| goto,switch,setjmp/longjmp | reject at compile-time |