Article

Inferring Local (Non)Aliasing and Strings for Memory Safety


Abstract

We propose an original approach for checking memory safety of C pointer programs, by combining deductive verification and abstract interpretation techniques. The approach is modular and contextual, thanks to the use of Hoare-style annotations (pre- and postconditions), allowing us to verify each C function independently. Deductive verification is used to check these annotations in a sound way. Abstract interpretation techniques are used to automatically generate such annotations, in an idiomatic way: standard practice of C programming is identified and incorporated as heuristics. Our first contribution is a set of techniques for identifying aliasing and strings, which we do in a local setting rather than through a global analysis, as is usually done. Our separation analysis in particular is a totally new treatment of non-aliasing. We present for the first time two abstract lattices to deal with local pointer aliasing and local pointer non-aliasing in an abstract interpretation framework. Our second contribution is the design of an abstract domain for implications, which makes it possible to build efficient contextual analyses. Our last contribution is an efficient back-and-forth propagation method to generate contextual annotations in a modular way, in the framework of abstract interpretation. We implemented our method in Caduceus, a tool for the verification of C programs, and successfully generated appropriate annotations for the C standard string library functions.

... E.g., a model some authors used in past work on verification of C programs [21] defines, for each allocated pointer, a base address, that points to the beginning of the allocated memory block, a block size, that contains the size of the allocated block, and an integer offset, that corresponds to the difference between the pointer and its base address. In more recent work [22], we describe a different model for pointers better suited to inference of annotations and automatic verification. It defines a left and a right integer bounds for pointers, such that adding an integer between these bounds to the pointer and dereferencing it is safe. ...
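The two pointer models mentioned in this excerpt can be pictured with a small C sketch. This is only an illustration under stated assumptions: the type and function names (`block_ptr`, `bounded_ptr`, `to_bounds`, `safe_index`) are hypothetical, offsets are counted in bytes, and the bounds are taken to be inclusive. None of this is taken from the cited papers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model 1 (hypothetical names): base address, block size, integer offset. */
typedef struct {
    char     *base;   /* beginning of the allocated block */
    size_t    size;   /* size of the block, in bytes */
    ptrdiff_t offset; /* pointer minus base, in bytes */
} block_ptr;

/* Model 2 (hypothetical names): left and right integer bounds.
 * Assumption: ptr[i] is a safe access exactly when left <= i <= right. */
typedef struct {
    char     *ptr;
    ptrdiff_t left;
    ptrdiff_t right;
} bounded_ptr;

/* Translate model 1 into model 2. */
bounded_ptr to_bounds(block_ptr p) {
    bounded_ptr b;
    /* The pointer must stay inside the block (or one past its end). */
    assert(p.offset >= 0 && (size_t)p.offset <= p.size);
    b.ptr   = p.base + p.offset;
    b.left  = -p.offset;
    b.right = (ptrdiff_t)p.size - 1 - p.offset;
    return b;
}

/* An access ptr[i] is safe when i lies between the two bounds. */
bool safe_index(bounded_ptr b, ptrdiff_t i) {
    return b.left <= i && i <= b.right;
}
```

For a pointer at offset 3 into an 8-byte block, this sketch gives the bounds -3 and 4, so indices from -3 to 4 are the safe ones.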
... We successfully generated correct annotations and automatically proved memory safety in WHY for 20 functions of the C standard string library implemented in the Minix 3 operating system [25]. This is to contrast with a previous work on sufficient preconditions inference [22] in which we could only prove 18 of these functions because the preconditions we could infer were not expressive enough. With the method we present in this paper, any kind of formula can be inferred using the appropriate combination of abstract domains and quantifier elimination. ...
Conference Paper
Assertion checking is the restriction of program verification to validity of program assertions. It encompasses safety checking, which is program verification of safety properties, like memory safety or absence of overflows. In this paper, we consider assertion checking of program parts instead of whole programs, which we call modular assertion checking. Classically, modular assertion checking is possible only if the context in which a program part is executed is known. By default, the worst-case context must be assumed, which may impair the verification task. It usually takes user effort to describe the execution context in enough detail for the verification task to succeed, by providing strong enough preconditions. We propose a method to automatically infer sufficient preconditions in the context of modular assertion checking of imperative pointer programs. It combines abstract interpretation, weakest precondition calculus and quantifier elimination. We instantiate this method to prove memory safety for C and Java programs, under some memory separation conditions.
... With functional dependencies, nullable pointers cannot be expressed. Dependencies of functions manipulating zero-terminated strings can be specified following the work of [27]. On the other hand, SAL annotations do not contain dependency information. ...
Article
We present functional dependencies, a convenient, formal, but high-level, specification format for a piece of procedural software (function). Functional dependencies specify the set of memory locations, which may be modified by the function, and for each modified location, the set of memory locations that influence its final value. Verifying that a function respects pre-defined functional dependencies can be tricky: the embedded world uses C and Ada, which have arrays and pointers. Existing systems we know of that manipulate functional dependencies, Caveat and SPARK, are restricted to pointer-free subsets of these languages. This article deals with the functional dependencies in a programming language with full aliasing. We show how to use a weakest pre-condition calculus to generate a verification condition for pre-existing functional dependencies requirements. This verification condition can then be checked using automated theorem provers or proof assistants. With our approach, it is possible to verify the specification as it was written beforehand. We assume little about the implementation of the verification condition generator itself. Our study takes place inside the C analysis framework Frama-C, where an experimental implementation of the technique described here has been implemented on top of the WP plug-in in the development version of the tool.
Conference Paper
Formal verification enables developers to provide safety and security guarantees about their code. A modular verification approach supports the verification of different pieces of an application in separation. We propose symbolic linking as such a modular approach, since it allows to decide whether or not earlier verified source files can be safely linked together (i.e. earlier proven properties remain valid). If an annotation-based verifier for C source code supports both symbolic linking and preprocessing, care must be taken that symbolic linking does not become unsound. The problem is that the result of a header expansion depends upon the defined macros right before expansion. In this paper, we describe how symbolic linking affects the type checking process and why the interaction with preprocessing results in an unsoundness. Moreover, we define a preprocessing technique which ensures soundness by construction and show that the resulting semantics after type checking are equivalent to the standard C semantics. We implemented this preprocessing technique in VeriFast, an annotation-based verifier for C source code that supports symbolic linking, and initial experiments indicate that the modified preprocessor allows most common use cases. To the extent of our knowledge, we are the first to support both modular and sound verification of annotated C source code.
Article
This thesis comes within the domain of proofs of programs by deductive verification. Deductive verification generates, from a program source and its specification, a mathematical formula whose validity proves that the program follows its specification. The program source describes what the program does and its specification represents what the program should do. The validity of the formula is mainly verified by automatic provers. During the last ten years separation logic has proved to be an elegant way to deal with programs which use data structures with pointers. However, it requires a specific logical language, provers, and specific reasoning techniques. This thesis introduces a technique to express ideas from separation logic in the traditional framework of deductive verification. Unfortunately, the mathematical formulas produced are not in the same first-order logic as that of the provers. Thus this work defines new conversions between the polymorphic first-order logic and the many-sorted logic used by most provers. The first part of this thesis leads to an implementation in the Jessie tool. The second part results in a significant contribution to the writing of the Why3 tool, in particular in the architecture and writing of the transformations which implement these conversions.
Article
JML is a notation for specifying the detailed design of Java classes and interfaces. JML's assertions are stated using a slight extension of Java's expression syntax. This should make it easy to use. Tools for JML aid in static analysis, verification, and run-time debugging of Java code.
Conference Paper
A program verifier is a complex system that uses compiler technology, program semantics, property inference, verification-condition generation, automatic decision procedures, and a user interface. This paper describes the architecture of a state-of-the-art program verifier for object-oriented programs.
Conference Paper
We describe an ongoing project, the deployment of a modular checker to statically find and prevent every buffer overflow in future versions of a Microsoft product. Lightweight annotations specify requirements for safely using each buffer, and functions are checked individually to ensure they obey these requirements and do not overflow. To date over 400,000 annotations have been added to specify buffer usage in the source code for this product, of which over 150,000 were automatically inferred, and over 3,000 potential buffer overflows have been found and fixed.
Conference Paper
Existing shape analysis algorithms infer descriptions of data structures at program points, starting from a given precondition. We describe an analysis that does not require any preconditions. It works by attempting to infer a description of only the cells that might be accessed, following the footprint idea in separation logic. The analysis allows us to establish a true Hoare triple for a piece of code, independently of the context in which it occurs and without a whole-program analysis. We present experimental results for a range of typical list-processing algorithms, as well as for code fragments from a Windows device driver.
Conference Paper
Predicate abstraction is a powerful technique for extracting finite-state models from often complex source code. This paper reports on the usage of statically computed invariants inside the predicate abstraction and refinement loop. The main idea is to selectively strengthen (conjoin) the concrete transition relation at a given program location by efficiently computed invariants that hold at that program location. We experimentally demonstrate the usefulness of transition relation strengthening in the predicate abstraction ...
Conference Paper
Semantic analysis of programs is essential in optimizing compilers and program verification systems. It encompasses data flow analysis, data type determination, generation of approximate invariant assertions, etc. This paper is devoted to the systematic and correct design of program analysis frameworks with respect to a formal semantics.
Conference Paper
Reasoning about heap-allocated data structures such as linked lists and arrays is challenging. The reachability predicate has proved to be useful for reasoning about the heap in type-safe languages where memory is manipulated by dereferencing object fields. Sound and precise analysis for such data structures becomes significantly more challenging in the presence of low-level pointer manipulation that is prevalent in systems software. In this paper, we give a novel formalization of the reachability predicate in the presence of internal pointers and pointer arithmetic. We have designed an annotation language for C programs that makes use of the new predicate. This language enables us to specify properties of many interesting data structures present in the Windows kernel. We present preliminary experience with a prototype verifier on a set of illustrative C benchmarks.
Conference Paper
Erroneous string manipulations are a major source of software defects in C programs yielding vulnerabilities which are exploited by software viruses. We present C String Static Verifyer (CSSV), a tool that statically uncovers all string manipulation errors. Being a conservative tool, it reports all such errors at the expense of sometimes generating false alarms. Fortunately, only a small number of false alarms are reported, thereby proving that statically reducing software vulnerability is achievable. CSSV handles large programs by analyzing each procedure separately. To this end procedure contracts are allowed which are verified by the tool. We implemented a CSSV prototype and used it to verify the absence of errors in real code from EADS Airbus. When applied to another commonly used string intensive application, CSSV uncovered real bugs with very few false alarms.
Article
The authors introduce a new programming language concept, called typestate, which is a refinement of the concept of type. Whereas the type of a data object determines the set of operations ever permitted on the object, typestate determines the subset of these operations which is permitted in a particular context. Typestate tracking is a program analysis technique which enhances program reliability by detecting at compile-time syntactically legal but semantically undefined execution sequences. These include reading a variable before it has been initialized and dereferencing a pointer after the dynamic object has been deallocated. The authors define typestate, give examples of its application, and show how typestate checking may be embedded into a compiler. They discuss the consequences of typestate checking for software reliability and software structure, and summarize their experience in using a high-level language incorporating typestate checking.
Article
This paper provides a detailed description of the automatic theorem prover Simplify, which is the proof engine of the Extended Static Checkers ESC/Java and ESC/Modula-3. Simplify uses the Nelson-Oppen method to combine decision procedures for several important theories, and also employs a matcher to reason about quantifiers. Instead of conventional matching in a term DAG, Simplify matches up to equivalence in an E-graph, which detects many relevant pattern instances that would be missed by the conventional approach. The paper describes two techniques, labels and counterexample contexts, for helping the user to determine the reason that a false conjecture is false. The paper includes detailed performance figures on conjectures derived from realistic program-checking problems.
Conference Paper
It is possible, but difficult, to reason in Hoare logic about programs which address and modify data structures defined by pointers. The challenge is to approach the simplicity of Hoare logic's treatment of variable assignment, where substitution affects only relevant assertion formulae. The axiom of assignment to object components treats each component name as a pointer-indexed array. This permits a formal treatment of inductively defined data structures in the heap but tends to produce instances of modified component mappings in arguments to inductively defined assertions. The major weapons against these troublesome mappings are assertions which describe spatial separation of data structures. Three example proofs are sketched. 1 Introduction The power of the Floyd/Hoare treatment of imperative programs [8][11] lies in its use of variable substitution to capture the semantics of assignment: simply, R^E_x, the result of replacing every free occurrence of variable x in R by...
Article
The use of type casts is pervasive in C. Although casts provide great flexibility in writing programs, their use obscures the meaning of programs, and can present obstacles during maintenance. Casts involving pointers to structures (C structs) are particularly problematic, because by using them, a programmer can interpret any memory region to be of any desired type, thereby compromising C’s already weak type system. This paper presents an approach for making sense of such casts, in terms of understanding their purpose and identifying fragile code. We base our approach on the observation that casts are often used to simulate object-oriented language features not supported directly in C. We first describe a variety of ways — idioms — in which this is done in C programs. We then develop a notion of physical subtyping, which provides a model that explains these idioms. We have created tools that automatically analyze casts appearing in C programs. Experimental evidence collected by using these tools on a large amount of C code (over a million lines) shows that, of the casts involving struct types, most (over 90%) can be associated meaningfully — and automatically — with physical subtyping. Our results indicate that the idea of physical subtyping is useful in coping with casts and can lead to valuable software productivity tools.
Conference Paper
We describe a new technique for finding potential buffer overrun vulnerabilities in security-critical C code. The key to success is to use static analysis: we formulate detection of buffer overruns as an integer range analysis problem. One major advantage of static analysis is that security bugs can be eliminated before code is deployed. We have implemented our design and used our prototype to find new remotely-exploitable vulnerabilities in a large, widely deployed software package. An earlier hand audit missed these bugs. 1. Introduction Buffer overrun vulnerabilities have plagued security architects for at least a decade. In November 1988, the infamous Internet worm infected thousands or tens of thousands of network-connected hosts and fragmented much of the known net [17]. One of the primary replication mechanisms was exploitation of a buffer overrun vulnerability in the fingerd daemon. Since then, buffer overruns have been a serious, continuing menace to system security. If any...
Article
Memory corruption errors lead to non-deterministic, elusive crashes. This paper describes ARCHER (ARray CHeckER), a static, effective memory access checker. ARCHER uses path-sensitive, interprocedural symbolic analysis to bound the values of both variables and memory sizes. It evaluates known values using a constraint solver at every array access, pointer dereference, or call to a function that expects a size parameter. Accesses that violate constraints are flagged as errors. Those that are exploitable by malicious attackers are marked as security holes. We carefully designed ARCHER to work well on large bodies of source code. It requires no annotations to use (though it can use them). Its solver has been built to be powerful in the ways that real code requires, while backing off on the places that were irrelevant. Selective power allows it to gain efficiency while avoiding classes of false positives that arise when a complex analysis interacts badly with statically undecidable program properties. ARCHER uses statistical code analysis to automatically infer the set of functions that it should track --- this inference serves as a robust guard against omissions, especially in large systems which can have hundreds of such functions. In practice ARCHER is effective: it finds many errors; its analysis scales to systems of millions of lines of code and the average false positive rate of our results is below 35%. We have run ARCHER over several large open source software projects --- such as Linux, OpenBSD, Sendmail, and PostgreSQL --- and have found errors in all of them (118 in the case of Linux, including 21 security holes).
Article
There are important classes of programming errors that are hard to diagnose, both manually and automatically, because they involve a program's dynamic behavior. This article describes a compile-time analyzer that detects these dynamic errors in large, real-world programs. The analyzer traces execution paths through the source code, modeling memory and reporting inconsistencies. In addition to avoiding false paths through the program, this approach provides valuable contextual information to the programmer who needs to understand and repair the defects. Automatically-created models, abstracting the behavior of individual functions, allow inter-procedural defects to be detected efficiently. A product built on these techniques has been used effectively on several large commercial programs.
Article
SMT stands for Satisfiability Modulo Theories. An SMT solver decides the satisfiability of propositionally complex formulas in theories such as arithmetic and uninterpreted functions with equality. SMT solving has numerous applications in automated theorem proving, in hardware and software verification, and in scheduling and planning problems. This paper describes Yices, an efficient SMT solver developed at SRI International. Yices supports a rich combination of first-order theories that occur frequently in software and hardware modeling: arithmetic, uninterpreted functions, bit vectors, arrays, recursive datatypes, and more. Beyond pure SMT solving, Yices can solve weighted MAX-SMT problems, compute unsatisfiable cores, and construct models. Yices is the main decision procedure used by the SAL model checking environment, and it is being integrated into the PVS theorem prover. As a MAX-SMT solver, Yices is the main component of the probabilistic consistency engine used in SRI's CALO system.
Article
In the context of standard abstract interpretation theory, a reduced relative power operation for functionally composing abstract domains is introduced and studied. The reduced relative power of two abstract domains D1 (the exponent) and D2 (the base) consists in a suitably defined lattice of monotone functions from D1 to D2, called dependencies, and it is a generalization of the Cousot and Cousot reduced cardinal power operation. The relationship between reduced relative power and Nielson's tensor product of abstract domains is also investigated. The case of autodependencies, i.e. base and exponent are the same domain, turns out to be particularly interesting: Under certain hypotheses, the domain of autodependencies corresponds to a powerset-like completion of the base abstract domain, providing a compact set-theoretic representation for autodependencies. Two relevant applications of the reduced relative power operation in the fields of logic program analysis and semantics design are presented. Notably, it is proved that the well-known abstract domain Def for logic program ground-dependency analysis can be characterized as the domain of autodependencies of the standard abstract domain representing plain groundness information only; on the semantics side, it is shown how reduced relative power can be exploited in order to systematically derive compositional semantics for logic programs.
Conference Paper
The technique of abstract interpretation analyzes a computer program to infer various properties about the program. The particular properties inferred depend on the particular abstract domains used in the analysis. Roughly speaking, the properties representable by an abstract domain follow a domain-specific schema of relations among variables. This paper introduces the congruence-closure abstract domain, which in effect extends the properties representable by a given abstract domain to schemas over arbitrary terms, not just variables. Also, this paper introduces the heap succession abstract domain, which when used as a base domain for the congruence-closure domain, allows given abstract domains to infer properties in a program's heap. This combination of abstract domains has applications, for example, to the analysis of object-oriented programs.
Conference Paper
This paper describes a new static analysis to show the absence of memory errors, especially string buffer overflows in C programs. The analysis is specifically designed for the subset of C that is found in critical embedded software. It is based on the theory of abstract interpretation and relies on an abstraction of stores that retains the length of string buffers. A transport structure makes it possible to change the granularity of the abstraction and to concisely define several inherently complex abstract primitives such as destructive update and string copy. The analysis integrates several features of the C language such as multi-dimensional arrays, structures, pointers and function calls. A prototype implementation produces encouraging results in early experiments.
Conference Paper
Abstract interpretation is a formal method that enables the static determination (i.e. at compile-time) of the dynamic properties (i.e. at run-time) of programs. So far, this method has mainly been used to build sophisticated, optimizing compilers. In this paper, we show how abstract interpretation techniques can be used to perform, prior to their execution, a static and automatic debugging of imperative programs. This novel approach, which we call abstract debugging, lets programmers use assertions to express invariance properties as well as inevitable properties of programs, such as termination. We show how such assertions can be used to find the origin of bugs, rather than their occurrences, and determine necessary conditions of program correctness, that is, necessary conditions for programs to be bug-free and correct with respect to the programmer's assertions. We also show that assertions can be used to restrict the control-flow of a program and examine its behavior along specific execution paths and find necessary conditions for the program to reach a particular point in a given state. Finally, we present the Syntox system that enables the abstract debugging of Pascal programs by the determination of the range of scalar variables, and discuss implementation, algorithmic and complexity issues.
Conference Paper
This paper describes a sound technique that combines the precision of theorem proving with the loop-invariant inference of abstract interpretation. The loop-invariant computations are invoked on demand when the need for a stronger loop invariant arises, which allows a gradual increase in the level of precision used by the abstract interpreter. The technique generates loop invariants that are specific to a subset of a program's executions, achieving a dynamic and automatic form of value-based trace partitioning. Finally, the technique can be incorporated into a lemmas-on-demand theorem prover, where the loop-invariant inference happens after the generation of verification conditions.
Conference Paper
This paper describes a system which checks correctness of array accesses automatically without any inductive assertions or human interaction. For each array access in the program a condition that the subscript is greater than or equal to the lower bound and a condition that the subscript is smaller than or equal to the upper bound are checked and the results indicating within the bound, out of bound, or undetermined are produced. It can check ordinary programs at about fifty lines per ten seconds, and it shows linear time complexity behavior. It has been long discussed whether program verification will ever become practical. The main argument against program verification is that it is very hard for a programmer to write assertions about programs. Even if he can supply enough assertions, he must have some knowledge about logic in order to prove the lemmas (or verification conditions) obtained from the verifier. However, there are some assertions about programs which must always be true no matter what the programs do; and yet which cannot be checked for all cases. These assertions include: integer values do not overflow, array subscripts are within range, pointers do not fall off NIL, cells are not reclaimed if they are still pointed to, uninitialized variables are not used. Since these conditions cannot be completely checked, many compilers produce dynamic checking code so that if the condition fails, then the program terminates with proper diagnostics. This dynamic checking code sometimes takes up much computation time. It is better to have some checking so that unexpected overwriting of data will not occur, but it is still very awkward that the computation stops because of error. Moreover, these errors can be traced back to some other errors in the program.
If we can find out whether these conditions will be met or not before actually running the program, we can benefit both by being able to generate efficient code and by being able to produce more reliable programs by careful examination of errors in the programs. Similar techniques can be used to detect semantically equivalent subexpressions or redundant statements to do more elaborate code movement optimization. The system we have constructed runs fast enough to be used as a preprocessor of a compiler. The system first creates logical assertions immediately before array elements such that these assertions must be true whenever the control passes the assertion in order for the access to be valid. These assertions are proved using similar techniques as inductive assertion methods. If an array element lies inside a loop or after a loop a loop invariant is synthesized. A theorem prover was created which has the decision capabilities for a subset of arithmetic formulas. We can use this prover to prove some valid formulas, but we can also use it to generalize nonvalid formulas so that we can hypothesize more general loop invariants. Theoretical considerations on automatic synthesis of loop invariants have been taken into account and a complete formula for loop invariants was obtained. We reduced the problem of loop invariant synthesis to the computation of this formula. This new approach to the synthesis of loop invariants will probably give a firmer basis for the automatic generation of loop invariants in general purpose verifiers.
Conference Paper
The ad-hoc use of unions to encode disjoint sum types in C programs and the inability of C's type system to check the safe use of these unions is a long standing source of subtle bugs. We present a dependent type system that rigorously captures the ad-hoc protocols that programmers use to encode disjoint sums, and introduce a novel technique for automatically inferring, via Craig Interpolation, those dependent types and thus those protocols. In addition to checking the safe use of unions, the dependent type information inferred by interpolation gives programmers looking to modify or extend legacy code a precise understanding of the conditions under which some fields may safely be accessed. We present an empirical evaluation of our technique on 350KLOC of open source C code. In 80 out of 90 predicated edges (corresponding to 1472 out of 1684 union accesses), our type system is able to infer the correct dependent types. This demonstrates that our type system captures and explicates programmers' informal reasoning about unions, without requiring manual annotation or rewriting.
Conference Paper
Good alias analysis is essential in order to achieve high performance on modern processors, yet precise interprocedural analysis does not scale well. We present a source code annotation, #pragma independent, which is a more flexible, intuitive and useful way for the programmer to provide pointer aliasing information than the current C99 restrict keyword. We describe a tool which highlights the most important and most likely correct locations at which a programmer can insert the pragmas. We analyze the effect of the improved alias information using a range of compilers and architectures.
Article
The authors introduce a new programming language concept, called typestate, which is a refinement of the concept of type. Whereas the type of a data object determines the set of operations ever permitted on the object, typestate determines the subset of these operations which is permitted in a particular context. Typestate tracking is a program analysis technique which enhances program reliability by detecting at compile-time syntactically legal but semantically undefined execution sequences. These include reading a variable before it has been initialized and dereferencing a pointer after the dynamic object has been deallocated. The authors define typestate, give examples of its application, and show how typestate checking may be embedded into a compiler. They discuss the consequences of typestate checking for software reliability and software structure, and summarize their experience in using a high-level language incorporating typestate checking.
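The idea of typestate tracking can be sketched for straight-line code as follows. This is an illustration only: the three states, the operation names and the function `check_typestate` are assumptions made for the example, not the authors' language or algorithm, and branches and loops (which need a merge of states) are left out.

```python
from enum import Enum


class State(Enum):
    UNINIT = "UNINIT"   # declared but never assigned
    INIT = "INIT"       # holds a usable value
    FREED = "FREED"     # dynamic object has been deallocated


# Typestates in which each operation is permitted (hypothetical rule table).
ALLOWED = {
    "assign": {State.UNINIT, State.INIT},
    "read": {State.INIT},
    "free": {State.INIT},
}

# Typestate of the variable after a permitted operation.
NEXT = {"assign": State.INIT, "read": State.INIT, "free": State.FREED}


def check_typestate(ops):
    """Return diagnostics for a straight-line list of (operation, variable) pairs."""
    states = {}
    errors = []
    for step, (op, var) in enumerate(ops):
        current = states.get(var, State.UNINIT)
        if current not in ALLOWED[op]:
            # Syntactically legal, but undefined in this typestate.
            errors.append(f"step {step}: {op} {var} in state {current.name}")
        else:
            states[var] = NEXT[op]
    return errors


if __name__ == "__main__":
    program = [("assign", "p"), ("read", "p"), ("free", "p"), ("read", "p")]
    for message in check_typestate(program):
        print(message)
```

The type of `p` permits `read` everywhere, while its typestate permits `read` only between the assignment and the deallocation, which is the distinction the abstract draws.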
Article
We propose an original approach for checking memory safety of C pointer programs possibly including pointer arithmetic and sharing (but no casts, structures, double indirection or memory deallocation). This involves first identifying aliasing and strings, which we do in a local setting rather than through a global analysis as is usually done. Our separation analysis in particular is a totally new treatment of non-aliasing. We present for the first time two abstract lattices to deal with local pointer aliasing and local pointer non-aliasing in an abstract interpretation framework. The key feature of our work is to combine abstract interpretation techniques and deductive verification. The approach is modular and contextual, thanks to the use of Hoare-style annotations (pre- and postconditions), allowing us to verify each C function independently. Abstract interpretation techniques are used to automatically generate such annotations, in an idiomatic way: standard practice of C programming is identified and incorporated as heuristics. Abstract interpretation and deductive verification are both used to check these annotations in a sound way. Our first contribution is the design of an abstract domain for implications, which makes it possible to build efficient contextual analyses. Our second contribution is an efficient back-and-forth propagation method to generate contextual annotations in a modular way, in the framework of abstract interpretation. Thanks to previously unknown loop refinement operators, this propagation method does not require iterating around loops. We implemented our method in Caduceus, a tool for the verification of C programs, and successfully verified automatically the C standard string library functions.
Conference Paper
The article presents a novel numerical abstract domain for static analysis by abstract interpretation. It extends a former numerical abstract domain based on Difference-Bound Matrices and allows us to represent invariants of the form (±x±y⩽c), where x and y are program variables and c is a real constant. We focus on giving an efficient representation based on Difference-Bound Matrices with O(n²) memory cost, where n is the number of variables, and graph-based algorithms for all common abstract operators, with O(n³) time cost. This includes a normal form algorithm to test the equivalence of representation and a widening operator to compute least fixpoint approximations.
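A common way to store constraints of the form ±x±y ≤ c in a Difference-Bound Matrix is to give each variable two nodes, one for +x and one for -x, so that every constraint becomes a plain difference between two nodes. The sketch below follows that encoding and tightens bounds with an ordinary Floyd-Warshall shortest-path pass. The class name `Octagon`, its methods and the indexing convention are assumptions for illustration; the shortest-path pass is a simplification and is not the normal form algorithm the abstract refers to.

```python
import math


def node(sign: int, var: int) -> int:
    """Index of the node standing for +var (even index) or -var (odd index)."""
    return 2 * var if sign > 0 else 2 * var + 1


class Octagon:
    def __init__(self, nvars: int):
        size = 2 * nvars  # 2n x 2n matrix, hence O(n^2) memory
        # m[a][b] = c encodes the difference constraint node_a - node_b <= c.
        self.m = [[0 if a == b else math.inf for b in range(size)]
                  for a in range(size)]

    def add(self, s1: int, x: int, s2: int, y: int, c: float) -> None:
        """Record the constraint s1*x + s2*y <= c, with s1 and s2 in {+1, -1}."""
        a, b = node(s1, x), node(-s2, y)   # s1*x - (-s2*y) <= c
        # The same constraint read the other way round: s2*y - (-s1*x) <= c.
        for i, j in ((a, b), (b ^ 1, a ^ 1)):
            self.m[i][j] = min(self.m[i][j], c)

    def close(self) -> None:
        """Tighten all bounds by shortest paths (three nested loops, O(n^3))."""
        size = len(self.m)
        for k in range(size):
            for i in range(size):
                for j in range(size):
                    via = self.m[i][k] + self.m[k][j]
                    if via < self.m[i][j]:
                        self.m[i][j] = via

    def bound(self, s1: int, x: int, s2: int, y: int) -> float:
        """Best known c such that s1*x + s2*y <= c (math.inf if unknown)."""
        return self.m[node(s1, x)][node(-s2, y)]

    def is_empty(self) -> bool:
        """After close(), a negative diagonal entry means no solution exists."""
        return any(self.m[i][i] < 0 for i in range(len(self.m)))


if __name__ == "__main__":
    X, Y, Z = 0, 1, 2
    octagon = Octagon(3)
    octagon.add(+1, X, -1, Y, 1)   # x - y <= 1
    octagon.add(+1, Y, -1, Z, 2)   # y - z <= 2
    octagon.close()
    print(octagon.bound(+1, X, -1, Z))   # derived bound on x - z
```

Here the derived bound on x - z is the sum of the two recorded bounds, obtained by chaining them through y.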
Article
In prior work [15] we studied a language construct restrict that allows programmers to specify that certain pointers are not aliased to other pointers used within a lexical scope. Among other applications, programming with these constructs helps program analysis tools locally recover strong updates, which can improve the tracking of state in flow-sensitive analyses. In this paper we continue the study of restrict and introduce the construct confine. We present a type and effect system for checking the correctness of these annotations, and we develop efficient constraint-based algorithms implementing these type checking systems. To make it easier to use restrict and confine in practice, we show how to automatically infer such annotations without programmer assistance. In experiments on locking in 589 Linux device drivers, confine inference can automatically recover strong updates to eliminate 95% of the type errors resulting from weak updates.
Article
Drawing upon early work by Burstall, we extend Hoare's approach to proving the correctness of imperative programs, to deal with programs that perform destructive updates to data structures containing more than one pointer to the same location. The key concept is an "independent conjunction" P & Q that holds only when P and Q are both true and depend upon distinct areas of storage. To make this concept precise we use an intuitionistic logic of assertions, with a Kripke semantics whose possible worlds are heaps (mapping locations into tuples of values).
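The "independent conjunction" can be illustrated on small finite heaps. In this sketch a heap is a Python dict from locations to values, and P & Q is decided by trying every split of the heap into two disjoint parts. The helper names `points_to` and `sep_conj` are hypothetical, values are plain integers rather than tuples, and the brute-force search is only meant to make the definition concrete.

```python
from itertools import combinations


def points_to(loc, value):
    """Assertion: the heap contains a cell at loc holding value.

    It stays true when the heap is extended, in the spirit of the
    intuitionistic reading mentioned in the abstract.
    """
    return lambda heap: loc in heap and heap[loc] == value


def sep_conj(p, q):
    """Independent conjunction: p and q hold on disjoint parts of the heap."""
    def holds(heap):
        locations = list(heap)
        for size in range(len(locations) + 1):
            for left in combinations(locations, size):
                part1 = {loc: heap[loc] for loc in left}
                part2 = {loc: val for loc, val in heap.items()
                         if loc not in left}
                if p(part1) and q(part2):
                    return True
        return False
    return holds


if __name__ == "__main__":
    x_cell = points_to(1, 5)
    y_cell = points_to(2, 7)
    print(sep_conj(x_cell, y_cell)({1: 5, 2: 7}))   # two distinct cells
    # Ordinary conjunction holds on a one-cell heap, but the independent
    # conjunction of an assertion with itself does not: the cell is shared.
    print(x_cell({1: 5}) and x_cell({1: 5}))
    print(sep_conj(x_cell, x_cell)({1: 5}))
```

The last two lines show the difference that matters for aliasing: two assertions about the same location are jointly true, yet they do not depend on distinct areas of storage.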
Article
In this paper we propose a scheme that combines type inference and run-time checking to make existing C programs type safe. We describe the CCured type system, which extends that of C by separating pointer types according to their usage. This type system allows both pointers whose usage can be verified statically to be type safe, and pointers whose safety must be checked at run time. We prove a type soundness result and then we present a surprisingly simple type inference algorithm that is able to infer the appropriate pointer kinds for existing C programs. Our experience with the CCured system shows that the inference is very effective for many C programs, as it is able to infer that most or all of the pointers are statically verifiable to be type safe. The remaining pointers are instrumented with efficient run-time checks to ensure that they are used safely. The resulting performance loss due to run-time checks is 0-150%, which is several times better than comparable approaches that use only dynamic checking. Using CCured we have discovered programming bugs in established C programs such as several SPECINT95 benchmarks.
Article
This article presents a new numerical abstract domain for static analysis by abstract interpretation. It extends a former numerical abstract domain based on Difference-Bound Matrices and allows us to represent invariants of the form (±x±y≤c), where x and y are program variables and c is a real constant. We focus on giving an efficient representation based on Difference-Bound Matrices (O(n²) memory cost, where n is the number of variables) and graph-based algorithms for all common abstract operators (O(n³) time cost). This includes a normal form algorithm to test equivalence of representation and a widening operator to compute least fixpoint approximations.
  • B Dutertre
  • L De Moura
Dutertre, B. and L. de Moura, "The YICES SMT Solver," Computer Science Laboratory, SRI International (2006), http://yices.csl.sri.com.
Multi-prover verification of C programs
  • J.-C Filliâtre
  • C Marché
Filliâtre, J.-C. and C. Marché, Multi-prover verification of C programs, in: Proc. ICFEM'04, LNCS 3308, 2004, pp. 15-29.
Using statically computed invariants inside the predicate abstraction and refinement loop
  • H Jain
  • F Ivancic
  • A Gupta
  • I Shlyakhter
  • C Wang
Jain, H., F. Ivancic, A. Gupta, I. Shlyakhter and C. Wang, Using statically computed invariants inside the predicate abstraction and refinement loop, in: Proc. CAV'06, LNCS 4144, 2006, pp. 137-151.
A reachability predicate for analyzing low-level software
  • S Chatterjee
  • S K Lahiri
  • S Qadeer
  • Z Rakamaric
Chatterjee, S., S. K. Lahiri, S. Qadeer and Z. Rakamaric, A reachability predicate for analyzing low-level software, in: Proc. TACAS'07, 2007.