A Semantic Integrated Development Environment
Francesco Logozzo, Michael Barnett, Manuel Fähndrich
Microsoft Research
{logozzo, mbarnett, maf}@microsoft.com
Patrick Cousot (ENS, CNRS, INRIA, NYU), pcousot@cims.nyu.edu
Radhia Cousot (CNRS, ENS, INRIA), rcousot@ens.fr
Abstract
We present SIDE, a Semantic Integrated Development Environment. SIDE uses static analysis to enrich existing IDE features and also adds new features. It augments the way existing compilers find syntactic errors (in real time, as the programmer is writing code, without execution) by also finding semantic errors, e.g., arithmetic expressions that may overflow. If it finds an error, it suggests a repair in the form of code, e.g., an equivalent yet non-overflowing expression. Repairs are correct by construction. SIDE also enhances code refactoring (by suggesting precise yet general contracts), code review (by answering what-if questions), and code searching (by answering questions like “find all the callers where x < y”).
SIDE is built on top of CodeContracts and the Roslyn CTP. CodeContracts provide a lightweight and programmer-friendly specification language. SIDE uses the abstract interpretation-based CodeContracts static checker (cccheck/Clousot) to obtain a deep semantic understanding of what the program does.
Categories and Subject Descriptors D. Software [D.3 Programming Languages]: D.3.3 Language Constructs and Features; F. Theory of Computation [F.3 Logics and Meanings of Programs]: F.3.1 Specifying and Verifying and Reasoning about Programs, F.3.2 Semantics of Programming Languages; I. Computing Methodologies [I.2 Artificial Intelligence]: I.2.2 Automatic Programming
General Terms Design, Documentation, Experimentation,
Human Factors, Languages, Reliability, Verification.
Keywords Abstract interpretation, Design by contract, Integrated Development Environment, Method extraction, Program Repair, Program transformation, Refactoring, Static analysis
Copyright is held by the author/owner(s).
SPLASH’12, October 19–26, 2012, Tucson, Arizona, USA.
ACM 978-1-4503-1563-0/12/10.
1. Introduction
Integrated Development Environments (IDEs) provide a cohesive view of the software development environment in which many tools are unified under a common and uniform user interface. The ultimate goal of an IDE is to improve programmer productivity by simplifying and rationalizing program development. Routinely, IDEs include a source editor, build automation tools, debuggers, and profilers. Modern IDEs, like Eclipse or Visual Studio, provide additional functionality such as real-time compilation, type checking, IntelliSense, refactoring, class browsers, and quick fixes for compile-time errors. Existing IDEs have only a very partial and syntactic understanding of the program. We believe that, to provide further value to the programmer, IDEs should have a deeper, more semantic understanding of what the program does. In the demo we show a working prototype of a Semantic Integrated Development Environment (SIDE).
2. SIDE
SIDE is a smart programmer assistant. It statically analyzes the program in real time, while the programmer is developing it. Unlike similar program verification tools, our static analysis infers loop invariants, significantly reducing the annotation burden. The information gathered by the static analysis is used to verify the absence of common runtime errors (e.g., division by zero, arithmetic overflows, null pointer exceptions, and buffer overruns) as well as user-provided assertions and contracts [1].
If SIDE detects a potential runtime error, it suggests a fix in the form of code. The suggested fix is valid in that it guarantees that no good execution is removed: only bad ones are [7]. Since the fix is based on a static analysis, SIDE can suggest fixes for partial or even syntactically incorrect programs. No test runs are needed. Examples of fixes address object and constant initializations, arithmetic overflows, array indexing, wrong guards, and missing contracts, e.g., preconditions [4].
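To make the idea of an equivalent yet non-overflowing repair concrete, here is a minimal sketch of a bounds guard before and after such a fix. It is written in Java purely for illustration (SIDE targets C# and VB), and the class and method names are hypothetical; it assumes 0 <= offset <= length, and it is not claimed to be the exact repair that SIDE emits.

```java
// Hypothetical illustration of a guard whose arithmetic may overflow,
// and an equivalent guard that cannot (assuming 0 <= offset <= length).
final class GuardRepair {

    // Original guard: offset + count wraps around for large count,
    // so the check can wrongly succeed.
    static boolean fitsBuggy(int offset, int count, int length) {
        return offset + count <= length;
    }

    // Repaired guard: length - offset cannot overflow when
    // 0 <= offset <= length, and the result agrees with the original
    // guard whenever the original addition does not overflow.
    static boolean fitsRepaired(int offset, int count, int length) {
        return count <= length - offset;
    }

    public static void main(String[] args) {
        System.out.println(fitsBuggy(1, Integer.MAX_VALUE, 10));
        System.out.println(fitsRepaired(1, Integer.MAX_VALUE, 10));
    }
}
```

The two guards differ only on inputs where the original addition wraps around, which is the sense in which such a repair removes bad executions without removing good ones.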
SIDE helps the programmer in other common tasks, such as refactoring. For instance, when the programmer extracts a method, SIDE proposes a contract (precondition, postcondition) for the extracted method [5]. The proposed contract is valid, safe, complete, and general. In particular, completeness implies that the contract is precise (strong) enough to carry out the proof in the method from which the code was extracted. Generality guarantees that the extracted method can be called from other calling contexts, i.e., the contract is not just a projection of the state of the analyzer, which encodes the local context of the extracted method.
SIDE exploits the inferred semantic information to answer non-trivial queries on the program execution. For instance, SIDE supports what-if scenarios: the programmer adds extra assumptions on the program state at some points and then asks, e.g., whether some program point is reachable or a certain property holds. The assumptions and the queries are arbitrary Boolean expressions in the target language. SIDE enables semantic search, too. The programmer can ask whether a certain method is invoked in a certain state. Examples of semantic searches are callers such that: x ≠ null, a.f > b.c + 1, or a Boolean combination thereof. Overall, the semantic queries target common scenarios in the code-reviewing phases.
3. The Architecture
Our target languages are C# and VB, the two most popular .NET languages. We implemented SIDE on top of the Roslyn CTP and CodeContracts. The Roslyn CTP exposes the VB and C# compilers as services. We leverage Roslyn for the user interaction, e.g., the squiggles for warnings and the previews for applying fixes, as well as for basic services such as “standard” refactoring. We use the CodeContracts API as the specification language for preconditions, postconditions, and object invariants. The CodeContracts API is a standard part of .NET. The CodeContracts static checker (cccheck [6]) is the underlying semantic inference and reasoning engine for SIDE. cccheck is a static analyzer based on abstract interpretation [3]. To enable real-time analysis, cccheck draws on a SQL database to cache the analysis results, so that unmodified code is not re-analyzed. CodeContracts has been publicly available for 3 years and has been downloaded more than 60,000 times.
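The caching idea can be sketched as follows. This is only an illustration under stated assumptions: the cache key is a SHA-256 hash of the method's source text, the store is an in-memory map rather than a SQL database, and the class name AnalysisCache and its methods are hypothetical. The actual keying and storage scheme of cccheck is not described here. The sketch is in Java for illustration, although SIDE targets C# and VB.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache: results are stored under a hash of the method
// source, so a method whose text is unchanged is not analyzed again.
final class AnalysisCache {
    private final Map<String, String> resultsByHash = new HashMap<>();
    private int analysesRun = 0;

    // Returns the cached result for this source, or runs the analyzer
    // once and stores its result.
    String analyze(String methodSource, Function<String, String> analyzer) {
        String key = sha256Hex(methodSource);
        String cached = resultsByHash.get(key);
        if (cached != null) {
            return cached;
        }
        analysesRun++;
        String result = analyzer.apply(methodSource);
        resultsByHash.put(key, result);
        return result;
    }

    // Number of times the analyzer was actually invoked.
    int analysesRun() {
        return analysesRun;
    }

    private static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Keying on the content rather than on a file timestamp means that an edit elsewhere in the same file does not invalidate the results for untouched methods.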
4. The Demo
We show how SIDE acts as a smart programmer assistant, quickly catching tricky bugs, explaining them, and proposing fixes. In particular we show how the interaction is very natural for the user, despite the complex analyses and reasoning performed underneath.
In the first part of the demo, we code an Insert method, which inserts an element into a list represented as an array. SIDE points out several errors in a trivial implementation (a buffer overrun and a null dereference) and proposes preconditions to fix them. Then we add code to resize the array when an insertion into a full array occurs. SIDE points out that the new code is unreachable. Once that bug is fixed, it finds other weaknesses in the code: an arithmetic overflow and a buffer overrun. In both cases it suggests a code repair, in fact more than one: we will see and discuss in the demo that there are several different ways of fixing a program. In the case of the buffer overrun, we use the query system of SIDE to understand the origin of the warning (“what happens when ...”). Then we apply one of the (non-trivial) fixes proposed by SIDE. Finally, we realize that the code for resizing is more general than its use in the Insert body, so we decide to refactor it into a new method. SIDE generates a new method, Resize, and the corresponding contracts. In particular: (i) the inferred precondition is more general than the simple projection of the original abstract state, enabling more calling contexts; (ii) the inferred postcondition is strong enough to ensure safety in the refactored Insert method, i.e., no imprecision is introduced by the assume/guarantee reasoning. We conclude this part of the demo by asking SIDE some semantic queries (e.g., “which callers insert an empty string into the list?”).
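The demo code itself is not reproduced here, so the following is a hypothetical reconstruction of what the end state of this scenario might look like. It is written in Java for illustration only (SIDE targets C# and VB), with contracts approximated by explicit runtime checks instead of the CodeContracts API. The class GrowableList, the growth policy 2n + 1, and the capacity cap are assumptions, not the fixes SIDE actually proposes.

```java
import java.util.Arrays;

// Hypothetical array-backed list with an extracted resize method.
final class GrowableList {
    private String[] items = new String[0];
    private int count = 0;

    // Extracted method. The precondition only asks for a non-null
    // array; it does not require old.length == count, which would be
    // a mere projection of the state at the single call site in insert.
    static String[] resize(String[] old) {
        if (old == null) {
            throw new IllegalArgumentException("old must not be null");
        }
        // Computing in long avoids int overflow; the +1 makes an empty
        // array grow too (doubling 0 would leave it empty and cause a
        // buffer overrun in insert).
        long wanted = 2L * old.length + 1;
        int newLength = (int) Math.min(wanted, (long) Integer.MAX_VALUE - 8);
        if (newLength <= old.length) {
            throw new IllegalStateException("array cannot grow any further");
        }
        // Postcondition: the result is strictly longer than old and
        // keeps its elements; this is what insert relies on below.
        return Arrays.copyOf(old, newLength);
    }

    void insert(String s) {
        if (count == items.length) {
            items = resize(items);
        }
        // Safe: either count < items.length held already, or resize
        // returned a strictly longer array.
        items[count++] = s;
    }

    int size() {
        return count;
    }

    String get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return items[index];
    }
}
```

The comment on the final array write in insert spells out the assume/guarantee step: the postcondition of resize is exactly strong enough to justify the index, and the precondition is weak enough that other callers could reuse the method.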
In the second part of the demo, we consider a slightly more complicated example, a buggy implementation of the binary search algorithm. Discovering the bug(s) and presenting the fixes require the analysis to perform complex reasoning, e.g., inferring a complex loop invariant. However, we will show how all this machinery is totally transparent to the user. For instance we show how SIDE naturally suggests a (verified!) repair for the famous Java arithmetic overflow bug [2].
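For readers unfamiliar with the bug described in [2], the sketch below shows the overflowing midpoint computation and one well-known non-overflowing replacement. The class and method names are hypothetical, and the repaired expression is the standard one from [2]; whether SIDE proposes this exact form is not stated here. The repair assumes 0 <= low <= high, which holds inside the search loop.

```java
// Midpoint computation in binary search: buggy and repaired variants.
final class MidpointRepair {

    // Buggy: low + high can exceed Integer.MAX_VALUE and wrap to a
    // negative number, which then yields a negative array index.
    static int buggyMid(int low, int high) {
        return (low + high) / 2;
    }

    // Repaired: high - low cannot overflow when 0 <= low <= high,
    // and the result equals the mathematical midpoint (rounded down).
    static int repairedMid(int low, int high) {
        return low + (high - low) / 2;
    }

    // Binary search over a sorted array using the repaired midpoint.
    // Returns the index of key, or -1 if key is absent.
    static int binarySearch(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        while (low <= high) {
            int mid = repairedMid(low, high);
            if (sorted[mid] < key) {
                low = mid + 1;
            } else if (sorted[mid] > key) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int low = 1 << 30;
        int high = low + 2;
        System.out.println(buggyMid(low, high));
        System.out.println(repairedMid(low, high));
    }
}
```

The bug only shows up for index ranges above 2^30, which is why it survives ordinary testing and why a static analysis that reasons about all inputs is well placed to catch it.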
5. Presenters
F. Logozzo is a researcher in the RiSE group at MSR Redmond. He is the co-author of the CodeContracts static checker and of SIDE. His main interests are abstract interpretation, program analysis, optimization, and verification.
References
[1] M. Barnett, M. Fähndrich, and F. Logozzo. Embedded contract languages. In SAC’10, pages 2103–2110. ACM, 2010.
[2] J. Bloch. Nearly all binary searches and mergesorts are broken, 2008. http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html.
[3] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238–252, 1977.
[4] P. Cousot, R. Cousot, and F. Logozzo. Contract precondition inference from intermittent assertions on collections. In VMCAI, pages 150–168, 2011.
[5] P. Cousot, R. Cousot, F. Logozzo, and M. Barnett. An abstract interpretation framework for refactoring with application to extract methods with contracts. In OOPSLA, 2012.
[6] M. Fähndrich and F. Logozzo. Static contract checking with abstract interpretation. In FoVeOOS, pages 10–30, 2010.
[7] F. Logozzo and T. Ball. Modular and verified repairs. In OOPSLA, 2012.