Article

Verification Modulo Versions: Towards Usable Verification


Abstract

We introduce Verification Modulo Versions (VMV), a new static analysis technique for reducing the number of alarms reported by static verifiers while providing sound semantic guarantees. First, VMV extracts semantic environment conditions from a base program P. Environmental conditions can either be sufficient conditions (implying the safety of P) or necessary conditions (implied by the safety of P). Then, VMV instruments a new version of the program, P', with the inferred conditions. We prove that we can use (i) sufficient conditions to identify abstract regressions of P' w.r.t. P; and (ii) necessary conditions to prove the relative correctness of P' w.r.t. P. We show that the extraction of environmental conditions can be performed at a hierarchy of abstraction levels (history, state, or call conditions) with each subsequent level requiring a less sophisticated matching of the syntactic changes between P' and P. Call conditions are particularly useful because they only require the syntactic matching of entry points and callee names across program versions. We have implemented VMV in a widely used static analysis and verification tool. We report our experience on two large code bases and demonstrate a substantial reduction in alarms while additionally providing relative correctness guarantees.
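To make the two kinds of environment conditions concrete, here is a small, purely illustrative sketch (ours, not drawn from the paper or its implementation; all names and the dynamic check are hypothetical) of how a sufficient condition S and a necessary condition N obtained for a base version P could be replayed against a changed version P' of a single entry point:

    # Toy illustration of VMV-style reuse of environment conditions across
    # versions; the 'conditions' are hand-written here rather than inferred.

    def base_version(xs, i):        # P: aborts unless the environment is good
        return xs[i]

    def new_version(xs, i):         # P': the change under review
        return xs[i] + xs[i + 1]

    def sufficient_condition(xs, i):
        # S: implies that P runs without aborting.
        return 0 <= i < len(xs)

    def necessary_condition(xs, i):
        # N: implied by the safety of P; if N is violated, P aborts anyway.
        return i >= 0

    def check_relative(xs, i):
        # Environments violating N already crash P, so they cannot witness a
        # regression; relative correctness only speaks about the others.
        if not necessary_condition(xs, i):
            return "ignored: this environment already makes P fail"
        # Under S the base version is safe, so an abort of P' under S is an
        # abstract regression introduced by the change.
        if sufficient_condition(xs, i):
            try:
                new_version(xs, i)
            except IndexError:
                return "abstract regression: P' fails where P was safe"
        return "no regression witnessed on this environment"

    print(check_relative([1, 2, 3], 2))   # regression: xs[i + 1] is out of range
    print(check_relative([1, 2, 3], 1))   # no regression

In the paper the conditions are inferred statically and discharged by a verifier; the dynamic try/except above only stands in for that check.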


... In [27] Logozzo et al. introduce a technique for extracting and maintaining semantic information across program versions: Specifically, they consider an original program P and a variation (version) P' of P, and they explore the question of extracting semantic information from P, using it to instrument P' (by means of executable assertions) and then pondering what semantic guarantees they can infer about the instrumented version of P'. The focus of their analysis is the condition under which programs P and P' can execute without causing an abort (due to attempting an illegal operation), which they approximate by sufficient conditions and necessary conditions. ...
... The focus of their analysis is the condition under which programs P and P' can execute without causing an abort (due to attempting an illegal operation), which they approximate by sufficient conditions and necessary conditions. They implement their approach in a system called VMV (Verification Modulo Versions) whose goal is to exploit semantic information about P in the analysis of P' and to ensure that the transition from P to P' happens without regression; in that case, they say that P' is correct relative to P. The definition of relative correctness of Logozzo et al. [27] is different from ours, for several reasons: Whereas [27] talk about relative correctness between an original program and a subsequent version in the context of adaptive maintenance (where P and P' may be subject to distinct requirements), we talk about relative correctness between an original (faulty) software product and a revised version of the program (possibly still faulty yet more-correct) in the context of corrective maintenance with respect to a fixed requirements specification; whereas [27] use a set of assertions inserted throughout the program as a specification, we use a relation that maps initial states to final states to specify the standards against which absolute correctness and relative correctness are defined; whereas [27] represent program executions by execution traces (snapshots of the program state at assertion sites), we represent program executions by functions mapping initial states into final states; finally, whereas Logozzo et al. define a successful execution as a trace that satisfies all the relevant assertions, we define it as an initial state/final state pair that falls within the relational specification. ...
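Schematically, and only as our reading of the contrast drawn above (not the exact definitions of either paper): in the relational formulation, with a specification $R \subseteq \Sigma \times \Sigma$ and programs viewed as input/output relations $P$, $P'$,

$$P' \text{ is more-correct than } P \text{ w.r.t. } R \iff \mathrm{dom}(R \cap P) \subseteq \mathrm{dom}(R \cap P'),$$

i.e., the competence domain grows; in the VMV-style formulation, with $\mathcal{E}_{\mathrm{good}}(Q)$ the set of environments (entry-point traces) under which every assertion of $Q$ holds,

$$P' \text{ is correct relative to } P \iff \mathcal{E}_{\mathrm{good}}(P) \subseteq \mathcal{E}_{\mathrm{good}}(P').$$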
Article
Full-text available
Faults are an important concept in the study of system dependability, and most approaches to dependability can be characterized by the way in which they deal with faults (e.g., fault avoidance, fault removal, fault tolerance, fault forecasting). In their seminal work on modeling dependable computing, Laprie et al. define a fault as the adjudged or hypothesized cause of an error. In this paper, we propose a more formal definition of a fault in the context of software products and discuss some of its implications.
... X. Si and A. Naik (both authors contributed equally to the paper). - The framework should be able to transfer knowledge across programs, that is, past runs should boost performance on similar programs in the future, which is especially relevant for CI/CD settings [15,20,25]. - The framework should be able to adapt to generate different kinds of invariants (e.g. ...
... (Algorithm listing from the cited work.) Input: a verification instance T and a proof checker check. Output: an invariant inv accepted by the checker. Parameters: a graph constructor G and an invariant grammar A. The loop expands inv according to a production p and updates the weights of the learned policy components. ...
... there are no nonterminals). Specifically, the candidate is updated by either expanding its leftmost non-terminal according to one of its production rules (lines 19-20) or by replacing its leftmost placeholder symbol with some concrete value from the verification instance (lines 22-23). The selection of a production rule or concrete value is done through an attention mechanism, which picks the most likely one according to the current context and corresponding region of external memory. ...
Chapter
Full-text available
We propose a general end-to-end deep learning framework Code2Inv, which takes a verification task and a proof checker as input, and automatically learns a valid proof for the verification task by interacting with the given checker. Code2Inv is parameterized with an embedding module and a grammar: the former encodes the verification task into numeric vectors while the latter describes the format of solutions Code2Inv should produce. We demonstrate the flexibility of Code2Inv by means of two small-scale yet expressive instances: a loop invariant synthesizer for C programs, and a Constrained Horn Clause (CHC) solver.
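As a rough, self-contained illustration of the guess-and-check interaction described here (deliberately not the Code2Inv architecture: no neural embedding or attention, just grammar enumeration against a toy checker), the sketch below searches a tiny invariant grammar for the loop "x = 0; while x < n: x += 1" with precondition n >= 0 and postcondition x == n:

    from itertools import count, product

    def check(inv):
        """Toy 'proof checker': tests initiation, consecution and safety of a
        candidate invariant on a finite sample of states (a stand-in for a
        real solver-backed checker)."""
        for x, n in product(range(-3, 6), repeat=2):
            if n >= 0 and x == 0 and not inv(x, n):            # initiation
                return False
            if inv(x, n) and x < n and not inv(x + 1, n):       # consecution
                return False
            if inv(x, n) and not (x < n) and not (x == n):      # safety
                return False
        return True

    # A tiny invariant grammar: conjunctions of atoms over {x, n, constants}.
    ATOMS = [
        ("x >= 0", lambda x, n: x >= 0),
        ("x <= n", lambda x, n: x <= n),
        ("n >= 0", lambda x, n: n >= 0),
        ("x == 0", lambda x, n: x == 0),
    ]

    def candidates():
        for k in count(1):
            if k > len(ATOMS):
                return
            for combo in product(ATOMS, repeat=k):
                name = " and ".join(a[0] for a in combo)
                preds = [a[1] for a in combo]
                yield name, (lambda x, n, ps=preds: all(p(x, n) for p in ps))

    for name, inv in candidates():
        if check(inv):
            print("found invariant:", name)
            break

Code2Inv replaces the blind enumeration with a learned policy over the grammar and the program's graph representation, and the toy checker with a real one.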
... For 2-safety properties [19,45], the executions are of the same program, but the initial states need not be equal. For translation validation and (some notions of) program equivalence [6,13,32,34,48], the programs are different, but the initial states are the same. For more general notions of program equivalence, including representation independence, the two programs and the initial states are different. ...
... Moreover, researchers have developed specialized approaches for many relational properties, e.g., information flow [3,35,40], continuity [16], determinism [12], differential privacy [24,41], or quantitative reliability [13,14]. Other important applications of relational verification include regression verification [23,25], semantical differences [32,37] and cross or relative verification [27,34,39]. One advantage of our work over many of these approaches is that we can freely switch from the relational world to the non-relational world when program executions diverge. ...
Conference Paper
Establishing quantitative bounds on the execution cost of programs is essential in many areas of computer science such as complexity analysis, compiler optimizations, security and privacy. Techniques based on program analysis, type systems and abstract interpretation are well-studied, but methods for analyzing how the execution costs of two programs compare to each other have not received attention. Naively combining the worst and best case execution costs of the two programs does not work well in many cases because such analysis forgets the similarities between the programs or the inputs. In this work, we propose a relational cost analysis technique that is capable of establishing precise bounds on the difference in the execution cost of two programs by making use of relational properties of programs and inputs. We develop a refinement type and effect system for a higher-order functional language with recursion and subtyping. The key novelty of our technique is the combination of relational refinements with two modes of typing—relational typing for reasoning about similar computations/inputs and unary typing for reasoning about unrelated computations/inputs. This combination allows us to analyze the execution cost difference of two programs more precisely than a naive non-relational approach. We prove our type system sound using a semantic model based on step-indexed unary and binary logical relations accounting for non-relational and relational reasoning principles with their respective costs. We demonstrate the precision and generality of our technique through examples.
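The analysis revolves around a relational typing judgment; in simplified, approximate notation (the system's actual rules carry more structure) it can be read as

$$\Gamma \vdash e_1 \ominus e_2 \lesssim t : \tau \quad\text{meaning}\quad \mathrm{cost}(e_1) - \mathrm{cost}(e_2) \le t \text{ on all pairs of inputs related by } \Gamma,$$

complemented by a unary judgment that bounds the cost of a single expression in isolation and is used when the two computations cannot usefully be related.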
... To make matters worse, these contexts themselves may be changing, and code written under some assumptions today may be used under different ones tomorrow. With so many moving parts and adoption of formal methods an uphill battle, now more than ever, researchers and practitioners have found compositionality and reasoning across versions [22,27] to be indispensable. ...
... Researchers have explored relational invariants (over multiple executions of a single program) via program transformations that "glue" copies of the program to itself, including self-composition [6,33], product programs [5], Cartesian Hoare logic [32] and decomposition for k-safety [3]. Logozzo et al. describe verification modulo versions [22] and explore how necessary/sufficient environment conditions for a program C's safety can be used to determine whether program C′ introduced a regression or is "correct relative to C". The work does not involve refinement relations. ...
Preprint
Modern software is constantly changing. Researchers and practitioners are increasingly aware that verification tools can be impactful if they embrace change through analyses that are compositional and span program versions. Reasoning about similarities and differences between programs goes back to Benton, who introduced state-based refinement relations, which were extended by Yang and others. However, to our knowledge, refinement relations have not been explored for traces. We present a novel theory that allows one to perform compositional reasoning about the similarities/differences between how fragments of two different programs behave over time through the use of what we call trace-refinement relations. We take a reactive view of programs and found Kleene Algebra with Tests (KAT) [Kozen] to be a natural choice to describe traces since it permits algebraic reasoning and has built-in composition. Our theory involves a two-step semantic abstraction from programs to KAT, and then our trace refinement relations correlate behaviors by (i) categorizing program behaviors into trace classes through KAT intersection and (ii) correlating atomic events/conditions across programs with KAT hypotheses. We next describe a synthesis algorithm that iteratively constructs trace-refinement relations between two programs by exploring sub-partitions of their traces, iteratively abstracting them as KAT expressions, discovering relationships through a custom edit-distance algorithm, and applying strategies (i) and (ii) above. We have implemented this algorithm as Knotical, the first tool capable of synthesizing trace-refinement relations. It is built from the ground up in OCaml, using InterProc and SymKAT. We have demonstrated that useful relations can be efficiently generated across a suite of 37 benchmarks that include changing fragments of array programs, systems code, and web servers.
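A minimal example of the KAT-based restriction idea (our own, not taken from the paper): let $b$ be a test and $p, q, r$ atomic actions, and consider the programs $C_1 = b\,p + \bar{b}\,q$ and $C_2 = b\,p + \bar{b}\,r$. Restricting both to the trace class selected by $b$ gives $b\,C_1 = b\,p = b\,C_2$, so the two programs coincide on that class even though they differ on $\bar{b}$; a hypothesis such as $q = r$, correlating analogous events, would additionally relate the remaining class.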
... For 2-safety properties [19,45], the executions are of the same program, but the initial states need not be equal. For translation validation and (some notions of) program equivalence [6,13,32,34,48], the programs are different, but the initial states are the same. For more general notions of program equivalence, including representation independence, the two programs and the initial states are different. ...
... Moreover, researchers have developed specialized approaches for many relational properties, e.g., information flow [3,35,40], continuity [16], determinism [12], differential privacy [24,41], or quantitative reliability [13,14]. Other important applications of relational verification include regression verification [23,25], semantical differences [32,37] and cross or relative verification [27,34,39]. One advantage of our work over many of these approaches is that we can freely switch from the relational world to the non-relational world when program executions diverge. ...
Article
Establishing quantitative bounds on the execution cost of programs is essential in many areas of computer science such as complexity analysis, compiler optimizations, security and privacy. Techniques based on program analysis, type systems and abstract interpretation are well-studied, but methods for analyzing how the execution costs of two programs compare to each other have not received attention. Naively combining the worst and best case execution costs of the two programs does not work well in many cases because such analysis forgets the similarities between the programs or the inputs. In this work, we propose a relational cost analysis technique that is capable of establishing precise bounds on the difference in the execution cost of two programs by making use of relational properties of programs and inputs. We develop a refinement type and effect system for a higher-order functional language with recursion and subtyping. The key novelty of our technique is the combination of relational refinements with two modes of typing—relational typing for reasoning about similar computations/inputs and unary typing for reasoning about unrelated computations/inputs. This combination allows us to analyze the execution cost difference of two programs more precisely than a naive non-relational approach. We prove our type system sound using a semantic model based on step-indexed unary and binary logical relations accounting for non-relational and relational reasoning principles with their respective costs. We demonstrate the precision and generality of our technique through examples.
... Alternatively, there are approaches [25,34] to reason not only about differences between behaviors, but also to analyze differences between properties in different programs. The technique called Verification Modulo Versions (VMV) [25] transforms assertions from one program into assumptions for another program. VMV then tries to find (or prove absence of) bugs that are present only in the latter program. ...
Conference Paper
Full-text available
We present a novel approach for automated incremental verification that employs both reusable and relational specifications of software to incrementally verify pairs of programs with possibly nested loops. It analyzes two programs, P (the one already verified) and Q (the one to be verified), and proceeds by detecting an abstraction αP of P and a simulation ρ such that αP simulates Q via ρ. The key idea behind our simulation synthesis is to drive the construction of both αP and ρ by the safe inductive invariants of P, thus guaranteeing property preservation by the results. Finally, our approach allows effective lifting of the safe inductive invariants of P to Q using only αP and ρ. Based on our evaluation, in many cases when absolute equivalence between programs cannot be proven, our approach is able to establish property-directed equivalence, confirming that the program Q is safe.
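Roughly, and in our paraphrase of the abstract, the lifting step rests on the following fact: if $\alpha_P$ simulates $Q$ via $\rho$ and $\mathit{Inv}$ is a safe inductive invariant of $P$ (and hence of $\alpha_P$, by construction), then the $\rho$-preimage
$$\mathit{Inv}_Q = \{\, s \mid \exists s'.\ (s, s') \in \rho \wedge s' \in \mathit{Inv} \,\}$$
is inductive for $Q$ and establishes the same safety property, which is what makes the invariants of the already-verified program reusable for the new one.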
... While most program analysis techniques focus on the inference of sufficient preconditions to guarantee safety, some techniques also infer necessary preconditions [32,33,72,73,78]. For example, Verification Modulo Versions (VMV) infers both necessary and sufficient conditions and utilizes previous versions of the program to reduce the number of warnings reported by verifiers [73]. Similarly, necessary conditions are inferred to repair the program in such a way that the repair does not remove any "good" traces [72]. ...
Article
This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program P, program trimming generates a new program P′ such that P and P′ are equi-safe (i.e., P′ has a bug if and only if P has a bug), but P′ has fewer execution paths than P. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques.
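The equi-safety argument is easy to see on a toy model (ours; the actual tool performs a lightweight static analysis on real programs): if a path's assertion is proved, deleting that path can neither add nor remove bugs.

    # Toy program-trimming sketch over a list of straight-line paths.
    # Each path is (condition_on_input, asserted_predicate); a path is buggy
    # on input x when its condition holds but its assertion fails.

    PATHS = [
        (lambda x: x >= 0, lambda x: x + 1 > x),   # assertion trivially valid
        (lambda x: x < 0,  lambda x: x * x > 10),  # may fail (e.g. x = -1)
    ]

    def cheap_static_check(assertion):
        """Stand-in for a lightweight analysis: on this toy integer domain it
        claims 'proved' only for assertions it can discharge by sampling."""
        return all(assertion(x) for x in range(-100, 101))

    def trim(paths):
        # Remove paths whose assertion is proved; equi-safety holds because a
        # removed path cannot contain a bug.
        return [p for p in paths if not cheap_static_check(p[1])]

    def has_bug(paths, inputs):
        return any(cond(x) and not asrt(x) for cond, asrt in paths for x in inputs)

    trimmed = trim(PATHS)
    inputs = range(-5, 6)
    print(len(PATHS), "paths before,", len(trimmed), "after trimming")
    print("bug in original:", has_bug(PATHS, inputs))
    print("bug in trimmed: ", has_bug(trimmed, inputs))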
... While, to the best of our knowledge, our work is the first to apply relative correctness to program derivation, it is not the first to introduce a concept of relative correctness. In [13] Logozzo discusses a framework for ensuring that some semantic properties are preserved by program transformation in the context of software maintenance. In [11] Lahiri et al. present a technique for verifying the relative correctness of a program with respect to a previous version, where they represent specifications by means of executable assertions placed throughout the program, and they define relative correctness by means of inclusion relations between sets of successful traces and unsuccessful traces. ...
Article
Relative correctness is the property of a program to be more-correct than another program with respect to a given specification. Among the many properties of relative correctness, that which we found most intriguing is the property that program P' refines program P if and only if P' is more-correct than P with respect to any specification. This inspires us to reconsider program derivation by successive refinements: each step of this process mandates that we transform a program P into a program P' that refines P, i.e. P' is more-correct than P with respect to any specification. This raises the question: why should we want to make P' more-correct than P with respect to any specification, when we only have to satisfy specification R? In this paper, we discuss a process of program derivation that replaces traditional sequence of refinement-based correctness-preserving transformations starting from specification R by a sequence of relative correctness-based correctness-enhancing transformations starting from abort.
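The proposed derivation process can be pictured as replacing a refinement chain by a chain of correctness-enhancing steps,

$$\mathit{abort} \;\sqsubseteq_R\; P_1 \;\sqsubseteq_R\; P_2 \;\sqsubseteq_R\; \cdots \;\sqsubseteq_R\; P_n,$$

where $P \sqsubseteq_R P'$ abbreviates "P' is more-correct than P with respect to R" and the final program $P_n$ is absolutely correct with respect to R (the notation here is ours; the paper develops the formal machinery).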
... Several approaches have emerged to address this issue. Code review [36], regression testing [20], [45], test-suite augmentation [35], [40], [41], code contracts [5], [26], regression verification [21], [39] and verification modulo versions [34] are just a few of the approaches to ensure the quality of a change; they all directly benefit from change-impact analysis (CIA). Change-Impact Analysis determines the set of program elements that may be impacted by a syntactic change. ...
Article
Change-impact analysis (CIA) is the task of determining the set of program elements impacted by a program change. Precise CIA has great potential to avoid expensive testing and code reviews for (parts of) changes that are refactorings (semantics-preserving). Existing CIA is imprecise because it is coarse-grained, deals with only few refactoring patterns, or is unaware of the change semantics. We formalize the notion of change impact in terms of the trace semantics of two program versions. We show how to leverage equivalence relations to make dataflow-based CIA aware of the change semantics, thereby improving precision in the presence of semantics-preserving changes. We propose an anytime algorithm that allows applying costly equivalence relation inference incrementally to refine the set of impacted statements. We have implemented a prototype in SymDiff, and evaluated it on 322 real-world changes from open-source projects and benchmark programs used by prior research. The evaluation results show an average 35% improvement in the size of the set of impacted statements compared to standard dataflow-based techniques.
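A skeletal rendering of the idea (ours, far simpler than the SymDiff-based implementation): impact propagates forward along dependences from the changed statements, except through statements that an equivalence oracle shows to be semantics-preserving; an anytime loop could keep strengthening that oracle.

    from collections import deque

    # Hypothetical dependence edges: statement -> statements that depend on it.
    DEPS = {
        "s1": ["s2", "s3"],
        "s2": ["s4"],
        "s3": ["s5"],
        "s4": [],
        "s5": [],
    }

    def impacted(changed, deps, equivalent):
        """Forward closure over dependences, pruned by an equivalence oracle:
        statements proved semantics-preserving do not propagate impact."""
        seen, work = set(), deque(changed)
        while work:
            s = work.popleft()
            if s in seen or equivalent(s):
                continue
            seen.add(s)
            work.extend(deps.get(s, []))
        return seen

    # Without semantic information, a change at s1 impacts everything downstream.
    print(sorted(impacted({"s1"}, DEPS, lambda s: False)))
    # If an equivalence relation shows s3 still computes the same value, the
    # impact set shrinks (an anytime refinement could add such facts gradually).
    print(sorted(impacted({"s1"}, DEPS, lambda s: s == "s3")))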
... 46% of developers indicated that they use some mechanism to suppress warnings. The primary methods are through a global configuration file, source code annotations (i.e., not in comments), annotations in source code comments, an external suppression file, and by comparing the code to a previous (baseline) version of it [45]. When asked which of these methods they like and dislike, 76% of those that use source code annotations like them, followed by using a global configuration file (63%) and providing annotations in code comments (56%). ...
Conference Paper
Program Analysis has been a rich and fruitful field of research for many decades, and countless high quality program analysis tools have been produced by academia. Though there are some well-known examples of tools that have found their way into routine use by practitioners, a common challenge faced by researchers is knowing how to achieve broad and lasting adoption of their tools. In an effort to understand what makes a program analyzer most attractive to developers, we mounted a multi-method investigation at Microsoft. Through interviews and surveys of developers as well as analysis of defect data, we provide insight and answers to four high level research questions that can help researchers design program analyzers meeting the needs of software developers. First, we explore what barriers hinder the adoption of program analyzers, like poorly expressed warning messages. Second, we shed light on what functionality developers want from analyzers, including the types of code issues that developers care about. Next, we answer what non-functional characteristics an analyzer should have to be widely used, how the analyzer should fit into the development process, and how its results should be reported. Finally, we investigate defects in one of Microsoft's flagship software services, to understand what types of code issues are most important to minimize, potentially through program analysis.
... The differential-assertion-checking problem [25] is to determine if one version of a given program satisfies all assertions satisfied by a previous version of the program. Verification modulo versions [29] filters warnings generated by applying a static analysis to a new version of a program to only the warnings that are novel to the new version. Optimizations to static analysis have been proposed that compute function summaries using an interpolating theorem prover [41]; when analyzing a new version of the program, the optimized analysis first checks if the summaries computed for functions in the original version of the program are valid summaries for functions in the new version of the program. ...
Article
Verifying partial (i.e., termination-insensitive) equivalence of programs has significant practical applications in software development and education. Conventional equivalence verifiers typically rely on a combination of given relational summaries and suggested synchronization points; such information can be extremely difficult for programmers without a background in formal methods to provide for pairs of programs with dissimilar logic. In this work, we propose a completely automated verifier for determining partial equivalence, named Pequod. Pequod automatically synthesizes expressive proofs of equivalence conventionally only achievable via careful, manual constructions of product programs. To do so, Pequod synthesizes relational proofs for selected pairs of program paths and combines the per-path relational proofs to synthesize relational program invariants. To evaluate Pequod, we implemented it as a tool that targets Java Virtual Machine bytecode and applied it to verify the equivalence of hundreds of pairs of solutions submitted by students for problems hosted on popular online coding platforms, most of which could not be verified by existing techniques.
... contracts [5,25], regression verification [20,38] and verification modulo versions [33]; all benefit from change-impact analysis (CIA). ...
Conference Paper
Change-impact analysis (CIA) is the task of determining the set of program elements impacted by a program change. Precise CIA has great potential to avoid expensive testing and code reviews for (parts of) changes that are refactorings (semantics-preserving). However most statement-level CIA techniques suffer from imprecision as they do not incorporate the semantics of the change. We formalize change impact in terms of the trace semantics of two program versions. We show how to leverage equivalence relations to make dataflow-based CIA aware of the change semantics, thereby improving precision in the presence of semantics-preserving changes. We propose an anytime algorithm that applies costly equivalence-relation inference incrementally to refine the set of impacted statements. We implemented a prototype and evaluated it on 322 real-world changes from open-source projects and benchmark programs used by prior research. The evaluation results show an average 35% improvement in the number of impacted statements compared to prior dataflow-based techniques.
... Adding more and more specific features will have a minor effect if safety hinges on the convention of use of the module. A similar observation was made by Logozzo et al. [27], who suggest inferring assumptions as preconditions and analyzing subsequent versions of the software under those assumptions. We would need assumptions about whether certain threads may run in parallel. ...
Conference Paper
Device drivers rely on fine-grained locking to ensure safe access to shared data structures. For human testers, concurrency makes such code notoriously hard to debug; for automated reasoning, dynamically allocated memory and low-level pointer manipulation poses significant challenges. We present a flexible approach to data race analysis, implemented in the open source Goblint static analysis framework, that combines different pointer and value analyses in order to handle a wide range of locking idioms, including locks allocated dynamically as well as locks stored in arrays. To the best of our knowledge, this is the most ambitious effort, having lasted well over ten years, to create a fully automated static race detection tool that can deal with most of the intricate locking schemes found in Linux device drivers. Our evaluation shows that these analyses are sufficiently precise, but practical use of these techniques requires inferring environmental and domain-specific assumptions.
... Other researchers [12], [13], [14] have introduced a concept of relative correctness and have explored this concept in the context of program repair. Our work differs significantly from theirs in many ways: we represent specifications by relations whereas they specify them with assertions; we capture program semantics with input/output functions whereas they capture them by means of execution traces; we define relative correctness by means of competence domains and specification violations whereas they define it by means of correct traces and incorrect traces; we introduce relative correctness as a way to define faults whereas they introduce it as a way to compare program versions; we have explored the implications of relative correctness on several aspects of software engineering, whereas they focus primarily on software testing. ...
Article
Relative correctness is the property of a program to be more-correct than another with respect to a given specification. Whereas the traditional definition of (absolute) correctness divides candidate program into two classes (correct, and incorrect), relative correctness arranges candidate programs on the richer structure of a partial ordering. In other venues we discuss the impact of relative correctness on program derivation, and on program verification. In this paper, we discuss the impact of relative correctness on program testing; specifically, we argue that when we remove a fault from a program, we ought to test the new program for relative correctness over the old program, rather than for absolute correctness. We present analytical arguments to support our position, as well as an empirical argument in the form of a small program whose faults are removed in a stepwise manner as its relative correctness rises with each fault removal until we obtain a correct program.
... In this paper, we propose a novel approach, called Graphs for Partial programs (GRAPA), that boosts complete-code tools to analyze partial programs. Since analyzing whole projects can take huge effort, researchers [7], [22], [10], [21] have explored analyzing only a subset of a whole program. Their basic idea is to extract an abstraction of other parts of a program. ...
... Cross-version program analysis. Prior work on comparing closely related program versions includes regression verification that checks semantic equivalence using uninterpreted function abstraction of equivalent callees [Felsing et al. 2014; Godlin and Strichman 2008; Lahiri et al. 2012], mutual summaries [Wood et al. 2017], relational invariant inference to prove differential properties, and verification modulo versions [Logozzo et al. 2014]. Other approaches include static analysis for abstract differencing [Jackson and Ladd 1994; Partush and Yahav 2014], symbolic execution for verifying assertion-equivalence [Ramos and Engler 2011] and differential symbolic execution to summarize differences [Person et al. 2008]. ...
Article
Full-text available
Even though many programmers rely on 3-way merge tools to integrate changes from different branches, such tools can introduce subtle bugs in the integration process. This paper aims to mitigate this problem by defining a semantic notion of conflict-freedom, which ensures that the merged program does not introduce new unwanted behaviors. We also show how to verify this property using a novel, compositional algorithm that combines lightweight summarization for shared program fragments with precise relational reasoning for the modifications. Towards this goal, our method uses a 4-way differencing algorithm on abstract syntax trees to represent different program versions as edits applied to a shared program with holes. This representation allows our verification algorithm to reason about different edits in isolation and compose them to obtain an overall proof of conflict freedom. We have implemented the proposed technique in a new tool called SafeMerge for Java and evaluate it on 52 real-world merge scenarios obtained from Github. The experimental results demonstrate the benefits of our approach over syntactic conflict-freedom and indicate that SafeMerge is both precise and practical.
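One way to phrase the semantic notion, as we read the abstract (the paper's formal definition may differ in details): for a base program O, variants A and B, and merge M, conflict freedom on an input x requires
$$[\![A]\!](x) \neq [\![O]\!](x) \Rightarrow [\![M]\!](x) = [\![A]\!](x), \qquad [\![B]\!](x) \neq [\![O]\!](x) \Rightarrow [\![M]\!](x) = [\![B]\!](x),$$
and $[\![M]\!](x) = [\![O]\!](x)$ when neither variant changes the behavior on x, so the merge never exhibits behavior that neither edit justifies.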
... The drastic consequences of small program changes on verification tools is sometimes recognized as verification's "butterfly effect" [Leino and Pit-Claudel 2016]. Many program analysis techniques exhibit brittle behaviors [Karpenkov et al. 2016;Logozzo et al. 2014]. This may be in line with the inherent hardness; for instance, the class of programs for which an abstract interpreter is complete is undecidable [Giacobazzi et al. 2015]. ...
Preprint
We study the complexity of invariant inference and its connections to exact concept learning. We define a condition on invariants and their geometry, called the fence condition, which permits applying theoretical results from exact concept learning to answer open problems in invariant inference theory. The condition requires the invariant's boundary (the states whose Hamming distance from the invariant is one) to be backwards reachable from the bad states in a small number of steps. Using this condition, we obtain the first polynomial complexity result for an interpolation-based invariant inference algorithm, efficiently inferring monotone DNF invariants. We further harness Bshouty's seminal result in concept learning to efficiently infer invariants of a larger syntactic class of invariants beyond monotone DNF. Lastly, we consider the robustness of inference under program transformations. We show that some simple transformations preserve the fence condition, and that it is sensitive to more complex transformations.
... Since these techniques do not exploit program semantics, such techniques can only be used as a post-processing step (thus offering little control to users of a tool). Finally, work on differential static analysis [2] can be leveraged to suppress a class of warnings with respect to another program that can serve as a specification [29,32]. Our work does not require any additional program as a specification and therefore can be more readily applied to standard verification tasks. ...
Conference Paper
Full-text available
Verification of open programs can be challenging in the presence of an unconstrained environment. Verifying properties that depend on the environment yields a large class of uninteresting false alarms. Using a verifier on a program thus requires extensive initial investment in modeling the environment of the program. We propose a technique called angelic verification for verification of open programs, where we constrain a verifier to report warnings only when no acceptable environment specification exists to prove the assertion. Our framework is parametric in a vocabulary and a set of angelic assertions that allows a user to configure the tool. We describe a few instantiations of the framework and an evaluation on a set of real-world benchmarks to show that our technique is competitive with industrial-strength tools even without models of the environment.
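The following sketch (hypothetical; the real tool works on Boogie programs with a solver) conveys the reporting policy: a warning is kept only when no candidate environment specification from a fixed vocabulary can explain the assertion away.

    # Toy angelic-verification policy. The vocabulary, samples and 'verifier'
    # are all made up for illustration.

    VOCABULARY = [
        ("x is non-null",     lambda x: x is not None),
        ("x is non-negative", lambda x: x is not None and x >= 0),
    ]

    def verifies(assertion, assumption, samples):
        """Stand-in verifier: does the assertion hold on every sampled
        environment that satisfies the assumption?"""
        return all(assertion(x) for x in samples if assumption(x))

    def angelic_report(assertion, samples):
        for name, assumption in VOCABULARY:
            nonvacuous = any(assumption(x) for x in samples)
            if nonvacuous and verifies(assertion, assumption, samples):
                return f"suppressed (acceptable environment spec: {name})"
        return "warning: fails under every acceptable environment specification"

    SAMPLES = [None, -2, -1, 0, 1, 2]

    # Assertion that only fails when the environment hands us None:
    print(angelic_report(lambda x: x is not None and x + 1 > x, SAMPLES))
    # Assertion that fails even under the strongest vocabulary assumption:
    print(angelic_report(lambda x: x is not None and x > 100, SAMPLES))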
Article
A common theme in program verification is to relate two programs, for instance to show that they are equivalent, or that one refines the other. Such relationships can be formally established using relational program logics, which are tailored to reason about relations between two programs, or product constructions, which allow one to build from two programs a product program that emulates the behavior of both input programs. Similarly, product programs and relational program logics can be used to reason about 2-safety properties, an important class of properties that relate two executions of the same program and include as instances non-interference, continuity, and determinism. In this paper, we consider several notions of product programs and explore their relationship with different relational program logics. Moreover, we present applications of product programs to program robustness, non-interference, translation validation, and differential privacy.
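As a concrete instance of the product/self-composition idea for a 2-safety property, here is a toy non-interference check (our example, not from the paper): run the program against a renamed copy of itself on inputs that agree on the public part and assert that the public outputs agree.

    from itertools import product

    def program(public, secret):
        # Leaky variant: the public result depends on the secret.
        return public * 2 + (1 if secret > 0 else 0)

    def secure_program(public, secret):
        return public * 2

    def self_composition_check(f, publics, secrets):
        """2-safety check by composing f with a copy of itself:
        equal public inputs must yield equal public outputs."""
        for p in publics:
            for s1, s2 in product(secrets, repeat=2):
                if f(p, s1) != f(p, s2):      # the product program's assertion
                    return False
        return True

    print(self_composition_check(program, range(3), [-1, 1]))         # False: leaks
    print(self_composition_check(secure_program, range(3), [-1, 1]))  # True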
Conference Paper
Developing provably correct programs is an incremental process that often involves a series of interactions with a program verifier. To increase the responsiveness of the program verifier during such interactions, we designed a system for fine-grained caching of verification results. The caching system uses the program’s call graph and control-flow graph to focus the verification effort on just the parts of the program that were affected by the user’s most recent modifications. The novelty lies in how the original program is instrumented with cached information to avoid unnecessary work for the verifier. The system has been implemented in the Boogie verification engine, which allows it to be used by different verification front ends that target the intermediate verification language Boogie; we present one such application in the integrated development environment for the Dafny programming language. The paper describes the architecture and algorithms of the caching system and reports on how much it improves the performance of the verifier in practice.
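A minimal sketch of the caching policy (ours, far simpler than the Boogie implementation): key each procedure's verification result by a hash of its body together with the specifications of its callees, so an edit invalidates only the procedures whose key changes.

    import hashlib

    def key(proc, program, callgraph):
        """Cache key: hash of the procedure body plus the contracts it uses."""
        h = hashlib.sha256(program[proc]["body"].encode())
        for callee in sorted(callgraph.get(proc, [])):
            h.update(program[callee]["contract"].encode())
        return h.hexdigest()

    def verify_all(program, callgraph, cache, run_verifier):
        results = {}
        for proc in program:
            k = key(proc, program, callgraph)
            if k not in cache:
                cache[k] = run_verifier(proc)      # re-verify only on a miss
            results[proc] = cache[k]
        return results

    # Hypothetical two-procedure program; editing only 'main' leaves 'inc' cached.
    PROGRAM = {
        "inc":  {"body": "return x + 1", "contract": "ensures result > x"},
        "main": {"body": "y = inc(0); assert y > 0", "contract": ""},
    }
    CALLS = {"main": ["inc"]}
    cache = {}
    verify_all(PROGRAM, CALLS, cache, lambda p: print("verifying", p) or "ok")
    PROGRAM["main"]["body"] = "y = inc(5); assert y > 5"   # edit main only
    verify_all(PROGRAM, CALLS, cache, lambda p: print("verifying", p) or "ok")

The real system is considerably finer-grained (it injects cached information into the verification conditions themselves), but the keying discipline is similar in spirit.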
Conference Paper
In practice, software engineers are only able to spend a limited amount of resources on statically analyzing their code. Such resources may refer to their available time or their tolerance for imprecision, and usually depend on when in their workflow a static analysis is run. To serve these different needs, we propose a technique that enables engineers to interactively bound a static analysis based on the available resources. When all resources are exhausted, our technique soundly records the achieved verification results with a program instrumentation. Consequently, as more resources become available, any static analysis may continue from where the previous analysis left off. Our technique is applicable to any abstract interpreter, and we have implemented it for the .NET static analyzer Clousot. Our experiments show that bounded abstract interpretation can significantly increase the performance of the analysis (by up to 8x) while also increasing the quality of the reported warnings (more definite warnings that detect genuine bugs).
Article
Tracking code fragments of interest is important in monitoring a software project over multiple versions. Various approaches, including our previous work on Herodotos, exploit the notion of Longest Common Subsequence, as computed by readily available tools such as GNU Diff, to map corresponding code fragments. Nevertheless, efficient code-differencing algorithms are typically line-based or word-based, and thus do not report changes at the level of language constructs. Furthermore, they identify only additions and removals, but not the moving of a block of code from one part of a file to another. Code fragments of interest that fall within the added and removed regions of code have to be manually correlated across versions, which is tedious and error-prone. When studying a very large code base over a long time, the number of manual correlations can become an obstacle to the success of a study. In this paper, we investigate the effect of replacing the current line-based algorithm used by Herodotos by tree matching, as provided by the algorithm of the differencing tool GumTree. In contrast to the line-based approach, the tree-based approach does not generate any manual correlations, but it incurs a high execution time. To address the problem, we propose a hybrid strategy that gives the best of both approaches.
Article
We study the complexity of invariant inference and its connections to exact concept learning. We define a condition on invariants and their geometry, called the fence condition, which permits applying theoretical results from exact concept learning to answer open problems in invariant inference theory. The condition requires the invariant's boundary (the states whose Hamming distance from the invariant is one) to be backwards reachable from the bad states in a small number of steps. Using this condition, we obtain the first polynomial complexity result for an interpolation-based invariant inference algorithm, efficiently inferring monotone DNF invariants with access to a SAT solver as an oracle. We further harness Bshouty's seminal result in concept learning to efficiently infer invariants of a larger syntactic class of invariants beyond monotone DNF. Lastly, we consider the robustness of inference under program transformations. We show that some simple transformations preserve the fence condition, and that it is sensitive to more complex transformations.
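In symbols, and only as we read the abstract (details may differ from the paper's formal definition): for an inductive invariant $I$ over Boolean-vector states, with boundary $\partial I = \{\sigma \mid \mathrm{dist}_{\mathrm{Hamming}}(\sigma, I) = 1\}$ and bad states $\mathit{Bad}$, the fence condition with parameter $k$ asks that
$$\partial I \subseteq \bigcup_{i \le k} \mathrm{pre}^{\,i}(\mathit{Bad}),$$
i.e., every boundary state can reach a bad state within $k$ steps.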
Conference Paper
In earlier work, we had presented a definition of software fault as being any feature of a program that admits a substitution that would make the program more-correct. This definition requires, in turn, that we define the concept of relative correctness, i.e., what it means for a program to be more-correct than another with respect to a given specification. In this paper we broaden our earlier definition to encompass non-deterministic programs, or non-deterministic representations of programs; also, we study the mathematical properties of the new definition, most notably its relation to the refinement ordering, as well as its algebraic properties with respect to the refinement lattice.
Conference Paper
Many practical static analyzers are not completely sound by design. Their designers trade soundness to increase automation, improve performance, and reduce the number of false positives or the annotation overhead. However, the impact of such design decisions on the effectiveness of an analyzer is not well understood. This paper reports on the first systematic effort to document and evaluate the sources of unsoundness in a static analyzer. We developed a code instrumentation that reflects the sources of deliberate unsoundness in the .NET static analyzer Clousot and applied it to code from six open-source projects. We found that 33% of the instrumented methods were analyzed soundly. In the remaining methods, Clousot made unsound assumptions, which were violated in 2–26% of the methods during concrete executions. Manual inspection of these methods showed that no errors were missed due to an unsound assumption, which suggests that Clousot’s unsoundness does not compromise its effectiveness. Our findings can guide users of static analyzers in using them fruitfully, and designers in finding good trade-offs.
Conference Paper
Full-text available
We introduce a unified view of induction performed by automatic verification tools to prove a given program specification. This unification is done in the abstract interpretation framework using extrapolation (widening/dual-widening) and interpolation (narrowing, dual-narrowing, which are equivalent up to the exchange of the parameters). Dual-narrowing generalizes Craig interpolation in First Order Logic pre-ordered by implication to arbitrary abstract domains. An increasing iterative static analysis using extrapolation of successive iterates by widening followed by a decreasing iterative static analysis using interpolation of successive iterates by narrowing (both bounded by the specification) can be further improved by an increasing iterative static analysis using interpolation of iterates with the specification by dual-narrowing until reaching a fixpoint and checking whether it is inductive for the specification.
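Concretely, the iterations being unified follow the classical pattern (standard abstract-interpretation notation, not specific to this paper): an upward pass $X_{n+1} = X_n \,\nabla\, (X_n \sqcup F(X_n))$ from $X_0 = \bot$ until $F(X_n) \sqsubseteq X_n$, followed by a downward pass $Y_{n+1} = Y_n \,\Delta\, F(Y_n)$ from the obtained post-fixpoint; the contribution described here adds a further pass that interpolates the iterates with the specification (dual-narrowing) and then checks the limit for inductiveness.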
Article
Given a specification R, it is common for a candidate program P to be doing more than R requires; this is not necessarily bad, and is often unavoidable, due to programming language constraints or to otherwise sensible design decisions. In this paper, we introduce a relational operator that captures, for a given specification R and candidate program P, the functionality delivered by P that is relevant to R. This operator, which we call the projection of P over R (for reasons we explain), has a number of interesting properties, which we explore in this paper.
Conference Paper
This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program P, program trimming generates a new program P′ such that P and P′ are equi-safe (i.e., P′ has a bug if and only if P has a bug), but P′ has fewer execution paths than P. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques.
Conference Paper
We present an approach for comparing two closely related concurrent programs, whose goal is to give feedback about interesting differences without relying on user-provided assertions. This approach compares two programs in terms of cross-thread interferences and data-flow, under a parametrized abstraction which can detect any difference in the limit. We introduce a partial order relation between these abstractions such that a program change that leads to a “smaller” abstraction is more likely to be regression-free from the perspective of concurrency. On the other hand, incomparable or bigger abstractions, which are an indication of introducing new, possibly undesired, behaviors, lead to succinct explanations of the semantic differences.
Chapter
Relational safety specifications describe multiple runs of the same program or relate the behaviors of multiple programs. Approaches to automatic relational verification often compose the programs and analyze the result for safety, but a naively composed program can lead to difficult verification problems. We propose to exploit relational specifications for simplifying the generated verification subtasks. First, we maximize opportunities for synchronizing code fragments. Second, we compute symmetries in the specifications to reveal and avoid redundant subtasks. We have implemented these enhancements in a prototype for verifying k-safety properties on Java programs. Our evaluation confirms that our approach leads to a consistent performance speedup on a range of benchmarks.
Article
Full-text available
We present an approach for comparing two closely related concurrent programs, whose goal is to give feedback about interesting differences without relying on user-provided assertions. This approach compares two programs in terms of cross-thread interferences and data-flow, under a parametrized abstraction which can detect any difference in the limit. We introduce a partial order relation between these abstractions such that a program change that leads to a “smaller” abstraction is more likely to be regression-free from the perspective of concurrency. On the other hand, incomparable or bigger abstractions, which are an indication of introducing new, possibly undesired, behaviors, lead to succinct explanations of the semantic differences.
Article
Full-text available
The modern software engineering process is evolutionary, with commits/patches begetting new versions of code, progressing steadily toward improved systems. In recent years, program analysis and verification tools have exploited version-based reasoning, where new code can be seen in terms of how it has changed from the previous version. When considering program versions, refinement seems a natural fit and, in recent decades, researchers have weakened classical notions of concrete refinement and program equivalence to capture similarities as well as differences between programs. For example, Benton, Yang and others have worked on state-based refinement relations. In this paper, we explore a form of weak refinement based on trace relations rather than state relations. The idea begins by partitioning traces of a program C1 into trace classes, each identified via a restriction r1. For each class, we specify similar behavior in the other program C2 via a separate restriction r2 on C2. Still, these two trace classes may not yet be equivalent so we further permit a weakening via a binary relation A on traces, that allows one to, for instance disregard unimportant events, relate analogous atomic events, etc. We address several challenges that arise. First, we explore one way to specify trace refinement relations by instantiating the framework to Kleene Algebra with Tests (KAT) due to Kozen. We use KAT intersection for restriction, KAT hypotheses for A, KAT inclusion for refinement, and have proved compositionality. Next, we present an algorithm for automatically synthesizing refinement relations, based on a mixture of semantic program abstraction, KAT inclusion, a custom edit-distance algorithm on counterexamples, and case-analysis on nondeterministic branching. We have proved our algorithm to be sound. Finally, we implemented our algorithm as a tool called Knotical, on top of Interproc and Symkat. We demonstrate promising first steps in synthesizing trace refinement relations across a hand-crafted collection of 37 benchmarks that include changing fragments of array programs, models of systems code, and examples inspired by the thttpd and Merecat web servers.
Conference Paper
Frequently updated programs cause the cost of static analysis to be multiplied by the number of program versions. When the baseline cost is high (for example, analyzing JavaScript), this multiplicative factor can be prohibitive. As an example, JavaScript-based browser addons are continually updated and there are known instances where malicious code has been injected into such updates; thus the addons must be repeatedly vetted each time an update happens.
Article
Software model checking and static analysis have matured over the last decade, enabling their use in automated software verification. However, lack of scalability makes these tools hard to apply in industry practice. Furthermore, approximations in the models of program and environment lead to a profusion of false alarms. This paper proposes DC2, a verification framework using scope-bounding to address the issue of scalability, while retaining enough precision to avoid false alarms in practice. DC2 splits the analysis problem into manageable parts, relying on a combination of three automated techniques: (a) techniques to infer useful specifications for functions in the form of pre- and post-conditions; (b) stub inference techniques that infer abstractions to replace function calls beyond the verification scope; and (c) automatic refinement of pre- and post-conditions using counterexamples that are deemed to be false alarms by a user. The techniques enable DC2 to perform iterative reasoning over the calling environment of functions, to find non-trivial bugs and fewer false alarms. Based on the DC2 framework, we have developed a software model checking tool for C/C++ programs called Varvel, which has been in industrial use at NEC for a number of years. In addition to DC2, we describe other scalability and usability improvements in Varvel that have enabled its successful application on numerous large software projects. These include model simplifications, support for witness understanding to improve debugging assistance, and handling of C++ programs. We present experimental evaluations that demonstrate the effectiveness of DC2 and report on the usage of Varvel in NEC.
Conference Paper
Full-text available
We consider the problem of automatic precondition inference. We argue that the common notion of sufficient precondition inference (i.e., under which precondition is the program correct?) imposes too large a burden on callers, and hence it is unfit for automatic program analysis. Therefore, we define the problem of necessary precondition inference (i.e., under which precondition, if violated, will the program always be incorrect?). We designed and implemented several new abstract interpretation-based analyses to infer atomic, disjunctive, universally and existentially quantified necessary preconditions. We experimentally validated the analyses on large scale industrial code. For unannotated code, the inference algorithms find necessary preconditions for almost 64% of methods which contained warnings. In 27% of these cases the inferred preconditions were also sufficient, meaning all warnings within the method body disappeared. For annotated code, the inference algorithms find necessary preconditions for over 68% of methods with warnings. In almost 50% of these cases the preconditions were also sufficient. Overall, the precision improvement obtained by precondition inference (counted as the additional number of methods with no warnings) ranged between 9% and 21%.
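A tiny worked example of the sufficient/necessary distinction (ours): for a method that reads a[i], a sufficient precondition rules out every abort, while a weaker necessary one only captures conditions whose violation makes the abort certain.

    def read(a, i):
        return a[i]              # aborts if a is None or i is out of range

    def sufficient(a, i):
        # S: strong enough that read() cannot abort when S holds.
        return a is not None and -len(a) <= i < len(a)

    def necessary(a, i):
        # N (weaker than S): if N is violated, read() aborts on every run.
        return a is not None and len(a) > 0

    def aborts(a, i):
        try:
            read(a, i)
            return False
        except (TypeError, IndexError):
            return True

    for a, i in [(None, 0), ([], 3), ([1, 2, 3], 1), ([1, 2, 3], 7)]:
        print(f"S={sufficient(a, i)!s:5} N={necessary(a, i)!s:5} aborts={aborts(a, i)}")

Note that N holding does not prevent the abort in the last case; it only guarantees that callers violating N are definitely wrong, which is what makes necessary preconditions safe to report automatically.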
Article
Full-text available
Method extraction is a common refactoring feature provided by most modern IDEs. It replaces a user-selected piece of code with a call to an automatically generated method. We address the problem of automatically inferring contracts (precondition, postcondition) for the extracted method. We require the inferred contract: (a) to be valid for the extracted method (validity); (b) to guard the language and programmer assertions in the body of the extracted method by an opportune precondition (safety); (c) to preserve the proof of correctness of the original code when analyzing the new method separately (completeness); and (d) to be the most general possible (generality). These requirements rule out trivial solutions (e.g., inlining, projection, etc). We propose two theoretical solutions to the problem. The first one is simple and optimal. It is valid, safe, complete and general but unfortunately not effectively computable (except for unrealistic finiteness/decidability hypotheses). The second one is based on an iterative forward/backward method. We show it to be valid, safe, and, under reasonable assumptions, complete and general. We prove that the second solution subsumes the first. All justifications are provided with respect to a new, set-theoretic version of Hoare logic (hence without logic), and abstractions of Hoare logic, revisited to avoid surprisingly unsound inference rules. We have implemented the new algorithms on the top of two industrial-strength tools (CCCheck and the Microsoft Roslyn CTP). Our experience shows that the analysis is both fast enough to be used in an interactive environment and precise enough to generate good annotations.
Article
Full-text available
We motivate, define and design a simple static analysis to check that comparisons of floating point values use compatible bit widths and thus compatible precision ranges. Precision mismatches arise due to the difference in bit widths of processor-internal floating point registers (typically 80 or 64 bits) and their corresponding widths when stored in memory (64 or 32 bits). The analysis guarantees that floating point values from memory (i.e. array elements, instance and static fields) are not compared against floating point numbers in registers (i.e. arguments or locals). Without such an analysis, static symbolic verification is unsound and hence may report false negatives. The static analysis is fully implemented in Clousot, our static contract checker based on abstract interpretation.
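The mismatch being checked is easy to reproduce (illustrative only; the paper targets .NET/CIL registers versus memory slots, not NumPy): a value narrowed to 32 bits in memory generally no longer compares equal to its 64-bit counterpart.

    import numpy as np

    x64 = 0.1                 # value as kept in a wide (64-bit) representation
    x32 = np.float32(0.1)     # the same value after being stored in 32 bits

    print(x32 == x64)         # False: the round-tripped value differs from 0.1
    print(float(x32) - x64)   # the small but nonzero discrepancy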
Conference Paper
Full-text available
We present an overview of Clousot, our current tool to statically check CodeContracts. CodeContracts enable a compiler- and language-independent specification of contracts (preconditions, postconditions and object invariants). Clousot checks every method in isolation using assume/guarantee reasoning: for each method under analysis Clousot assumes its precondition and asserts the postcondition. For each invoked method, Clousot asserts its precondition and assumes the postcondition. Clousot also checks the absence of common runtime errors, such as null-pointer errors, buffer or array overruns, divisions by zero, as well as less common ones such as checked integer overflows or floating point precision mismatches in comparisons. At the core of Clousot there is an abstract interpretation engine which infers program facts. Facts are used to discharge the assertions. The use of abstract interpretation (vs. the usual weakest-precondition-based checkers) has two main advantages: (i) the checker automatically infers loop invariants, letting the user focus only on boundary specifications; (ii) the checker is deterministic in its behavior (which abstractly mimics the flow of the program) and it can be tuned for precision and cost. Clousot embodies other techniques, such as iterative domain refinement, goal-directed backward propagation, precondition and postcondition inference, and message prioritization.
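A small C# example (hypothetical method names) of the assume/guarantee discipline described above:

```csharp
using System.Diagnostics.Contracts;

public static class Accounts
{
    public static int Withdraw(int balance, int amount)
    {
        // Analyzing Withdraw: the preconditions are assumed...
        Contract.Requires(0 <= amount);
        Contract.Requires(amount <= balance);
        // ...and the postcondition must be proved.
        Contract.Ensures(Contract.Result<int>() >= 0);
        return balance - amount;
    }

    public static void Client()
    {
        // At the call site the roles flip: the preconditions are asserted
        // (checked here) and the postcondition is assumed afterwards.
        int rest = Withdraw(100, 40);
        Contract.Assert(rest >= 0);   // discharged from the assumed postcondition
    }
}
```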
Conference Paper
Full-text available
Considerable progress has been made towards automatic support for one of the principal techniques available to enhance program reliability: equipping programs with extensive contracts. The results of current contract inference tools are still often unsatisfactory in practice, especially for programmers who already apply some kind of basic Design by Contract discipline, since the inferred contracts tend to be simple assertions—the very ones that programmers find easy to write. We present new, completely automatic inference techniques and a supporting tool, which take advantage of the presence of simple programmer-written contracts in the code to infer sophisticated assertions, involving for example implication and universal quantification. Applied to a production library of classes covering standard data structures such as linked lists, arrays, stacks, queues and hash tables, the tool is able, entirely automatically, to infer 75% of the complete contracts—contracts yielding the full formal specification of the classes—with very few redundant or irrelevant clauses.
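The cited work targets Eiffel; purely as an illustration in the C#/Code Contracts notation used elsewhere on this page, the kind of quantified postcondition such inference aims at looks like the following (class and members invented):

```csharp
using System.Diagnostics.Contracts;

public class IntBuffer
{
    private readonly int[] data;

    public IntBuffer(int size)
    {
        Contract.Requires(size >= 0);
        data = new int[size];
    }

    public void Clear()
    {
        // A universally quantified postcondition: the kind of sophisticated
        // clause that is tedious to write by hand but valuable to infer.
        Contract.Ensures(Contract.ForAll(0, data.Length, i => data[i] == 0));
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = 0;
        }
    }
}
```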
Conference Paper
Full-text available
Programmers often insert assertions in their code to be optionally checked at runtime, at least during the debugging phase. In the context of design by contracts, these assertions would better be given as a precondition of the method/procedure which can detect that a caller has violated the procedure’s contract in a way which definitely leads to an assertion violation (e.g., for separate static analysis). We define precisely and formally the contract inference problem from intermittent assertions inserted in the code by the programmer. Our definition excludes no good run even when a non-deterministic choice (e.g., an interactive input) could lead to a bad one (so this is not the weakest precondition, nor its strengthening by abduction, since a terminating successful execution is not guaranteed). We then introduce new abstract interpretation-based methods to automatically infer both the static contract precondition of a method/procedure and the code to check it at runtime on scalar and collection variables.
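A C# sketch (invented method) of turning intermittent assertions into a checkable contract precondition, in the spirit of the abstract:

```csharp
using System.Diagnostics.Contracts;

public static class Packets
{
    // Before: the programmer documents expectations with assertions
    // scattered through the body.
    public static int Version(int[] packet)
    {
        Contract.Assert(packet != null);
        Contract.Assert(packet.Length > 0);
        int version = packet[0];
        Contract.Assert(version > 0);
        return version;
    }

    // After inference: the same conditions are hoisted into a precondition,
    // so a violating caller is blamed at the call site and the contract can
    // be checked separately or at run time.
    public static int VersionWithContract(int[] packet)
    {
        Contract.Requires(packet != null && packet.Length > 0 && packet[0] > 0);
        return packet[0];
    }
}
```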
Conference Paper
Full-text available
Relational program logics are formalisms for specifying and verifying properties about two programs or two runs of the same program. These properties range from correctness of compiler optimizations or equivalence between two implementations of an abstract data type, to properties like non-interference or determinism. Yet the current technology for relational verification remains underdeveloped. We provide a general notion of product program that supports a direct reduction of relational verification to standard verification. We illustrate the benefits of our method with selected examples, including non-interference, standard loop optimizations, and a state-of-the-art optimization for incremental computation. All examples have been verified using the Why tool.
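A minimal C# illustration of the idea (a plain sequential composition rather than the synchronized loop products of the paper; all names invented): the relational property "both implementations agree" becomes an ordinary assertion over a single product program.

```csharp
using System.Diagnostics.Contracts;

public static class ProductPrograms
{
    static int SumUp(int n)   { int s = 0; for (int i = 1; i <= n; i++) s += i; return s; }
    static int SumDown(int n) { int s = 0; for (int i = n; i >= 1; i--) s += i; return s; }

    // Product program: run both versions on the same input so that a
    // standard single-program verifier can check the relational property.
    public static void AgreeOn(int n)
    {
        int r1 = SumUp(n);
        int r2 = SumDown(n);
        Contract.Assert(r1 == r2);   // relational spec, checked non-relationally
    }
}
```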
Conference Paper
Full-text available
Specifying application interfaces (APIs) with information that goes beyond method argument and return types is a long-standing quest of programming language researchers and practitioners. The number of type system extensions or specification languages is a testament to that. Unfortunately, the number of such systems is also roughly equal to the number of tools that consume them. In other words, every tool comes with its own specification language. In this paper we argue that for modern object-oriented languages, using an embedding of contracts as code is a better approach. We exemplify our embedding of Code Contracts on the Microsoft managed execution platform (.NET) using the C# programming language. The embedding works as well in Visual Basic. We discuss the numerous advantages of our approach and the technical challenges, as well as the status of tools that consume the embedded contracts.
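A representative C# class using the Code Contracts embedding (the class itself is invented; the Contract calls and the ContractInvariantMethod attribute are the real System.Diagnostics.Contracts API):

```csharp
using System.Diagnostics.Contracts;

public class BoundedCounter
{
    private int count;
    private readonly int max;

    public BoundedCounter(int max)
    {
        Contract.Requires(max > 0);
        this.max = max;
    }

    // Contracts are ordinary method calls: no language extension is needed,
    // and they compile to plain IL that any downstream tool can consume.
    public void Increment()
    {
        Contract.Requires(count < max);
        Contract.Ensures(count == Contract.OldValue(count) + 1);
        count++;
    }

    [ContractInvariantMethod]
    private void ObjectInvariant()
    {
        Contract.Invariant(0 <= count && count <= max);
    }
}
```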
Conference Paper
Full-text available
We introduce Pentagons, a weakly relational numerical abstract domain useful for the validation of array accesses in byte-code and intermediate languages (IL). This abstract domain captures properties of the form x ∈ [a, b] ∧ x < y. It is more precise than the well-known Interval domain, but it is less precise than the Octagon domain. The goal of Pentagons is to be a lightweight numerical domain useful for adaptive static analysis, where Pentagons is used to quickly prove the safety of most array accesses, restricting the use of more precise (but also more expensive) domains to only a small fraction of the code. We implemented the Pentagons abstract domain in Clousot, a generic abstract interpreter for .NET assemblies. Using it, we were able to validate 83% of array accesses in the core runtime library mscorlib.dll in less than 8 minutes.
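A tiny C# loop (hypothetical) whose bounds check is discharged by exactly the two kinds of facts Pentagons tracks: an interval for the lower bound and a strict symbolic inequality for the upper bound.

```csharp
public static class PentagonsExample
{
    public static int SumPrefix(int[] a, int len)
    {
        if (a == null) return 0;
        int s = 0;
        for (int i = 0; i < len && i < a.Length; i++)
        {
            // Safe because 0 <= i (interval fact) and i < a.Length
            // (strict inequality, the "pentagon" part).
            s += a[i];
        }
        return s;
    }
}
```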
Conference Paper
Full-text available
We introduce FunArray, a parametric segmentation abstract domain functor for the fully automatic and scalable analysis of array content properties. The functor enables a natural, painless and efficient lifting of existing abstract domains for scalar variables to the analysis of uniform compound data-structures such as arrays and collections. The analysis automatically and semantically divides arrays into consecutive non-overlapping possibly empty segments. Segments are delimited by sets of bound expressions and abstracted uniformly. All symbolic expressions appearing in a bound set are equal in the concrete. The FunArray can be naturally combined via reduced product with any existing analysis for scalar variables. The analysis is presented as a general framework parameterized by the choices of bound expressions, segment abstractions and the reduction operator. Once the functor has been instantiated with fixed parameters, the analysis is fully automatic. We first prototyped FunArray in Arrayal to adjust and experiment with the abstractions and the algorithms to obtain the appropriate precision/cost ratio. Then we implemented it into Clousot, an abstract interpretation-based static contract checker for .NET. We empirically validated the precision and the performance of the analysis by running it on the main libraries of .NET and on its own code. We were able to infer thousands of non-trivial invariants and verify the implementation with a modest overhead (circa 1%). To the best of our knowledge this is the first analysis of this kind applied to such a large code base, and proven to scale.
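A sketch of the invariants a segmentation analysis of this kind infers, written as comments over a hypothetical C# initialization loop (the segment notation is informal):

```csharp
public static class Segmentation
{
    public static int[] Init(int n)
    {
        int[] a = new int[n];
        // Invariant at the loop head, in informal segment notation:
        //   {0} 42 {i} 0 {n}
        // meaning a[0..i) holds 42 and a[i..n) still holds the default 0,
        // with possibly empty segments when i == 0 or i == n.
        for (int i = 0; i < n; i++)
        {
            a[i] = 42;
        }
        // After the loop the first segment covers the whole array: {0} 42 {n}.
        return a;
    }
}
```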
Conference Paper
Full-text available
A program denotes computations in some universe of objects. Abstract interpretation of programs consists in using that denotation to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations. An intuitive example (which we borrow from Sintzoff [72]) is the rule of signs. The text -1515 * 17 may be understood to denote computations on the abstract universe {(+), (-), (±)} where the semantics of arithmetic operators is defined by the rule of signs. The abstract execution -1515 * 17 → -(+) * (+) → (-) * (+) → (-), proves that -1515 * 17 is a negative number. Abstract interpretation is concerned by a particular underlying structure of the usual universe of computations (the sign, in our example). It gives a summary of some facets of the actual executions of a program. In general this summary is simple to obtain but inaccurate (e.g. -1515 + 17 → -(+) + (+) → (-) + (+) → (±)). Despite its fundamentally incomplete results abstract interpretation allows the programmer or the compiler to answer questions which do not need full knowledge of program executions or which tolerate an imprecise answer, (e.g. partial correctness proofs of programs ignoring the termination problems, type checking, program optimizations which are not carried in the absence of certainty about their feasibility, …).
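The rule-of-signs example above is small enough to execute; the following self-contained C# sketch (an illustration only, not tied to any tool mentioned on this page) reproduces both the precise product and the imprecise sum:

```csharp
using System;

public static class RuleOfSigns
{
    public enum Sign { Neg, Zero, Pos, Top }   // Top = sign unknown

    public static Sign Alpha(int n) => n < 0 ? Sign.Neg : (n > 0 ? Sign.Pos : Sign.Zero);

    public static Sign Mul(Sign a, Sign b)
    {
        if (a == Sign.Zero || b == Sign.Zero) return Sign.Zero;
        if (a == Sign.Top || b == Sign.Top) return Sign.Top;
        return a == b ? Sign.Pos : Sign.Neg;
    }

    public static Sign Add(Sign a, Sign b)
    {
        if (a == Sign.Zero) return b;
        if (b == Sign.Zero) return a;
        if (a == b) return a;
        return Sign.Top;                       // mixed signs: no information
    }

    public static void Main()
    {
        Console.WriteLine(Mul(Alpha(-1515), Alpha(17)));  // Neg: the product is negative
        Console.WriteLine(Add(Alpha(-1515), Alpha(17)));  // Top: the abstraction cannot tell
    }
}
```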
Conference Paper
Full-text available
Static assertion checking of open programs requires setting up a precise harness to capture the environment assumptions. For instance, a library may require a file handle to be properly initialized before it is passed into it. A harness is used to set up or specify the appropriate preconditions before invoking methods from the program. In the absence of a precise harness, even the most precise automated static checkers are bound to report numerous false alarms. This often limits the adoption of static assertion checking in the hands of a user. In this work, we explore the possibility of automatically filtering away (or prioritizing) warnings that result from imprecision in the harness. We limit our attention to the scenario when one is interested in finding bugs due to concurrency. We define a warning to be an interleaved bug when it manifests on an input for which no sequential interleaving produces a warning. As we argue in the paper, limiting a static analysis to only consider interleaved bugs greatly reduces false positives during static concurrency analysis in the presence of an imprecise harness. We formalize interleaved bugs as a differential analysis between the original program and its sequential version and provide various techniques for finding them. Our implementation CBugs demonstrates that the scheme of finding interleaved bugs can alleviate the need to construct precise harnesses while checking real-life concurrent programs.
Article
Full-text available
Completeness is an ideal, although uncommon, feature of abstract interpretations, formalizing the intuition that, relatively to the properties encoded by the underlying abstract domains, there is no loss of information accumulated in abstract computations. Thus, complete abstract interpretations can be rightly understood as optimal. We deal with both pointwise completeness, involving generic semantic operations, and (least) fixpoint completeness. Completeness and fixpoint completeness are shown to be properties that depend on the underlying abstract domains only. Our primary goal is then to solve the problem of making abstract interpretations complete by minimally extending or restricting the underlying abstract domains. Under the weak and reasonable hypothesis of dealing with continuous semantic operations, we provide constructive characterizations for the least complete extensions and the greatest complete restrictions of abstract domains. As far as fixpoint completeness is concerned, for merely monotone semantic operators, the greatest restrictions of abstract domains are constructively characterized, while it is shown that the existence of least extensions of abstract domains cannot be, in general, guaranteed, even under strong hypotheses. These methodologies, which in finite settings give rise to effective algorithms, provide advanced formal tools for manipulating and comparing abstract interpretations, useful both in static program analysis and in semantics design. A number of examples illustrating these techniques are given.
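For a concrete reading of the completeness property discussed above, here is the standard pointwise condition together with the sign-abstraction example from the preceding excerpt (this is the textbook formulation, not quoted from the article itself):

```latex
% Soundness only requires an over-approximation,
%   \alpha \circ f \;\sqsubseteq\; f^{\sharp} \circ \alpha,
% whereas (pointwise) completeness asks for equality:
\[
  \alpha \circ f \;=\; f^{\sharp} \circ \alpha .
\]
% The sign abstraction is complete for multiplication but not for addition:
\[
  \alpha(-1515 \cdot 17) = (-) = \alpha(-1515) \cdot^{\sharp} \alpha(17),
  \qquad
  \alpha(-1515 + 17) = (-) \;\sqsubset\; (\pm) = \alpha(-1515) +^{\sharp} \alpha(17).
\]
```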
Article
Full-text available
We construct a hierarchy of semantics by successive abstract interpretations. Starting from the maximal trace semantics of a transition system, we derive the big-step semantics, termination and nontermination semantics, Plotkin's natural, Smyth's demoniac and Hoare's angelic relational semantics and equivalent nondeterministic denotational semantics (with alternative powerdomains to the Egli-Milner and Smyth constructions), D. Scott's deterministic denotational semantics, the generalized and Dijkstra's conservative/liberal predicate transformer semantics, the generalized/total and Hoare's partial correctness axiomatic semantics and the corresponding proof methods. All the semantics are presented in a uniform fixpoint form and the correspondences between these semantics are established through composable Galois connections, each semantics being formally calculated by abstract interpretation of a more concrete one using Kleene and/or Tarski fixpoint approximation transfer theorems. Copyright 2002 Elsevier Science B.V.
Article
Full-text available
Many techniques have been developed over the years to automatically find bugs in software. Often, these techniques rely on formal methods and sophisticated program analysis. While these techniques are valuable, they can be difficult to apply, and they aren't always effective in finding real bugs. Bug patterns are code idioms that are often errors. We have implemented automatic detectors for a variety of bug patterns found in Java programs. In this extended abstract, we describe how we have used bug pattern detectors to find serious bugs in several widely used Java applications and libraries. We have found that the effort required to implement a bug pattern detector tends to be low, and that even extremely simple detectors find bugs in real applications. From our experience applying bug pattern detectors to real programs, we have drawn several interesting conclusions. First, we have found that even well tested code written by experts contains a surprising number of obvious bugs. Second, Java (and similar languages) have many language features and APIs which are prone to misuse. Finally, simple automatic techniques can be effective at countering the impact of both ordinary mistakes and misunderstood language features.
Article
Full-text available
We introduce abstract interpretation frameworks which are variations on the archetypal framework using Galois connections between concrete and abstract semantics, widenings and narrowings and are obtained by relaxation of the original hypotheses. We consider various ways of establishing the correctness of an abstract interpretation depending on how the relation between the concrete and abstract semantics is defined. We insist upon those correspondences allowing for the inducing of the approximate abstract semantics from the concrete one. Furthermore we study various notions of widening and narrowing as a means of obtaining convergence in the iterations used in abstract interpretation.
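As a concrete reminder of what a widening does, here is a minimal interval widening in C# (a generic illustration, not the API of any tool named on this page): unstable bounds are pushed to infinity so that increasing chains of iterates stabilize in finitely many steps.

```csharp
public readonly struct Interval
{
    public readonly long Lo;
    public readonly long Hi;

    public Interval(long lo, long hi) { Lo = lo; Hi = hi; }

    // Widening: keep the stable bounds, give up on the ones that grew.
    public static Interval Widen(Interval previous, Interval next)
    {
        long lo = next.Lo < previous.Lo ? long.MinValue : previous.Lo;
        long hi = next.Hi > previous.Hi ? long.MaxValue : previous.Hi;
        return new Interval(lo, hi);
    }
}

// Example: widening [0,1] with [0,2] yields [0, long.MaxValue], so the
// analysis of a counting loop converges after a single widening step.
```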
Conference Paper
We present a new static analysis to infer necessary field conditions for object-oriented programs. A necessary field condition is a property that should hold on the fields of a given object, for otherwise there exists a calling context leading to a failure due to bad object state. Our analysis also infers the provenance of the necessary condition, so that if a necessary field condition is violated then an explanation containing the sequence of method calls leading to a failing assertion can be produced. When the analysis is restricted to readonly fields, i.e., fields that can only be set in the initialization phase of an object, it infers object invariants. We provide empirical evidence on the usefulness of necessary field conditions by integrating the analysis into cccheck, our static analyzer for .NET. Robust inference of readonly object field invariants was the #1 request from cccheck users.
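A C# illustration (class and members invented) of a necessary field condition that, because the field is readonly, doubles as an object invariant:

```csharp
using System.Diagnostics.Contracts;

public class Connection
{
    private readonly string host;   // readonly: assigned only during initialization

    public Connection(string host)
    {
        this.host = host;           // no guard: a null argument creates a bad object
    }

    // The condition such an analysis reports, shown as an explicit invariant.
    // Provenance: the calling context `new Connection(null)` followed by
    // `Open()` leads to the failure below, so host != null is necessary.
    [ContractInvariantMethod]
    private void ObjectInvariant()
    {
        Contract.Invariant(host != null);
    }

    public int Open()
    {
        return host.Length;         // fails on bad object state
    }
}
```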
Conference Paper
Modular assertion checkers are plagued with false alarms due to the need for precise environment specifications (preconditions and callee postconditions). Even the fully precise checkers report assertion failures under the most demonic environments allowed by unconstrained or partial specifications. The inability to preclude overly adversarial environments makes such checkers less attractive to developers and severely limits the adoption of such tools in the development cycle. In this work, we propose a parameterized framework for prioritizing the assertion failures reported by a modular verifier, with the goal of suppressing warnings from overly demonic environments. We formalize almost-correct specifications as the minimal weakening of an angelic specification (over a set of predicates) that precludes any dead code intraprocedurally. Our work is inspired by and generalizes some aspects of semantic inconsistency detection. Our formulation allows us to lift this idea to a general class of warnings. We have developed a prototype, acspec, which we use to explore a few instantiations of the framework and report preliminary findings on a diverse set of C benchmarks.
Conference Paper
A previous version of a program can be a powerful enabler for program analysis by defining new relative specifications and making the results of current program analysis more relevant. In this paper, we describe the approach of differential assertion checking (DAC) for comparing different versions of a program with respect to a set of assertions. DAC provides a natural way to write relative specifications over two programs. We introduce a novel modular approach to DAC by reducing it to safety checking of a composed program, which can be accomplished by standard program verifiers. In particular, we leverage automatic invariant generation to synthesize relative specifications for pairs of loops and procedures. We provide a preliminary evaluation of a prototype implementation within the SymDiff tool along two directions: (a) soundly verifying bug fixes in the presence of loops, and (b) providing a knob for suppressing alarms when checking a new version of a program.
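A simplified C# rendering of the composition idea (invented procedures; the assertion shown is an equivalence-style relative specification rather than DAC's weaker "fails no more often than the old version" property):

```csharp
using System.Diagnostics.Contracts;

public static class ComposedProgram
{
    static int AbsOld(int x) { return x < 0 ? -x : x; }   // old version
    static int AbsNew(int x) { return x <= 0 ? -x : x; }  // new version (candidate fix)

    // Composing both versions into one program turns the relative
    // specification into an ordinary safety property that a standard
    // verifier (plus invariant generation for loops) can check.
    public static void Check(int x)
    {
        int before = AbsOld(x);
        int after = AbsNew(x);
        Contract.Assert(before == after);
    }
}
```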
Conference Paper
It is widely believed that program analysis can be more closely targeted to the needs of programmers if the program is accompanied by further redundant documentation. This may include regression test suites, API protocol usage, and code contracts. To this should be added the largest and most redundant text of all: the previous version of the same program. It is the differences between successive versions of a legacy program already in use which occupy most of a programmer's time. Although differential analysis in the form of equivalence checking has been quite successful for hardware designs, it has not received as much attention in the static program analysis community. This paper briefly summarizes the current state of the art in differential static analysis for software, and suggests a number of promising applications. Although regression test generation has often been thought of as the ultimate goal of differential analysis, we highlight several other applications that can be enabled by differential static analysis. This includes equivalence checking, semantic diffing, differential contract checking, summary validation, invariant discovery and better debugging. We speculate that differential static analysis tools have the potential to be widely deployed on the developer's toolbox despite the fundamental stumbling blocks that limit the adoption of static analysis.
Conference Paper
Assertion checking is the restriction of program verification to validity of program assertions. It encompasses safety checking, which is program verification of safety properties, like memory safety or absence of overflows. In this paper, we consider assertion checking of program parts instead of whole programs, which we call modular assertion checking. Classically, modular assertion checking is possible only if the context in which a program part is executed is known. By default, the worst-case context must be assumed, which may impair the verification task. It usually takes user effort, in the form of strong enough preconditions, to describe the execution context in enough detail for the verification task to succeed. We propose a method to automatically infer sufficient preconditions in the context of modular assertion checking of imperative pointer programs. It combines abstract interpretation, weakest precondition calculus and quantifier elimination. We instantiate this method to prove memory safety for C and Java programs, under some memory separation conditions.
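A hypothetical C# example of the kind of sufficient precondition such a method aims to infer: under it, every access in the body is provably safe when the method is checked in isolation.

```csharp
using System.Diagnostics.Contracts;

public static class Buffers
{
    public static void Fill(int[] buffer, int from, int count)
    {
        // Inferred sufficient precondition for memory safety of the body.
        Contract.Requires(buffer != null);
        Contract.Requires(0 <= from && 0 <= count);
        Contract.Requires(count <= buffer.Length - from);

        for (int i = 0; i < count; i++)
        {
            buffer[from + i] = 0;   // in bounds: 0 <= from + i < buffer.Length
        }
    }
}
```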
Conference Paper
We present a sound method for clustering alarms from static analyzers. Our method clusters alarms by discovering sound dependencies between them, such that if the dominant alarm of a cluster turns out to be false (respectively true) then it is assured that all others in the same cluster are also false (respectively true). We have implemented our clustering algorithm on top of a realistic buffer-overflow analyzer and shown that our method reduces the number of alarm reports by 54%. Our framework is applicable to any abstract interpretation-based static analysis and is orthogonal to abstraction refinements and statistical ranking schemes.
Conference Paper
We introduce Subpolyhedra (SubPoly), a new numerical abstract domain to infer and propagate linear inequalities. SubPoly is as expressive as Polyhedra, but it drops some of the deductive power to achieve scalability. SubPoly is based on the insight that the reduced product of linear equalities and intervals produces powerful yet scalable analyses. Precision can be recovered using hints. Hints can be automatically generated or provided by the user in the form of annotations. We implemented SubPoly on top of Clousot, a generic abstract interpreter for .NET. Clousot with SubPoly analyzes very large and complex code bases in a few minutes. SubPoly can efficiently capture linear inequalities among hundreds of variables, a result well beyond state-of-the-art implementations of Polyhedra.
Conference Paper
While a typical software component has a clearly specified (static) interface in terms of the methods and the input/output types they support, information about the correct sequencing of method calls the client must invoke is usually undocumented. In this paper, we propose a novel solution for automatically extracting such temporal specifications for Java classes. Given a Java class and a safety property such as "the exception … should not be raised", our Java Interface Synthesis Tool synthesizes such an interface; we demonstrate that the tool can construct interfaces accurately and efficiently for sample Java2SDK library classes.
Conference Paper
Program verification is a promising approach to improving program quality, because it can search all possible program executions for specific errors. However, the need to formally describe correct behavior or errors is a major barrier to the widespread adoption of program verification, since programmers historically have been reluctant to write formal specifications. Automating the process of formulating specifications would remove a barrier to program verification and enhance its practicality. This paper describes specification mining, a machine learning approach to discovering formal specifications of the protocols that code must obey when interacting with an application program interface or abstract data type. Starting from the assumption that a working program is well enough debugged to reveal strong hints of correct protocols, our tool infers a specification by observing program execution and concisely summarizing the frequent interaction patterns as state machines that capture both temporal and data dependences. These state machines can be examined by a programmer, to refine the specification and identify errors, and can be utilized by automatic verification tools, to find bugs. Our preliminary experience with the mining tool has been promising. We were able to learn specifications that not only captured the correct protocol, but also discovered serious bugs.
Article
How Coverity built a bug-finding tool, and a business, around the unlimited supply of bugs in software systems.
Article
So-called “guarded commands” are introduced as a building block for alternative and repetitive constructs that allow nondeterministic program components for which at least the activity evoked, but possibly even the final state, is not necessarily uniquely determined by the initial state. For the formal derivation of programs expressed in terms of these constructs, a calculus will be shown.
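For reference, a classic instance of the alternative construct in guarded-command notation (computing a maximum): both guards may hold at once, so the component is nondeterministic in the activity evoked even though the final state here is unique.

```latex
% Guarded-command alternative construct ([] separates the guarded commands):
\[
\mathbf{if}\;\; x \ge y \rightarrow m := x
\;\;[]\;\; y \ge x \rightarrow m := y
\;\;\mathbf{fi}
\]
```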
Conference Paper
We present a generic framework for the automatic and modular inference of sound class invariants for class-based object oriented languages. The idea is to derive a sound class invariant as a conservative abstraction of the class semantics. In particular we show how a class invariant can be characterized as the solution of a set of equations extracted from the program source. Once a static analysis for the method bodies is supplied, a solution for the former equation system can be iteratively computed. Thus, the class invariant can be automatically inferred.
Permissive interfaces
  • Thomas A. Henzinger
  • Ranjit Jhala
  • Rupak Majumdar
Differential assertion checking
  • Shuvendu K. Lahiri
  • Kenneth L. McMillan
  • Rahul Sharma
  • Chris Hawblitzel