TACO: Efficient SAT-Based Analysis of Annotated Code by
Automated Computation of Tight Bounds for Fields
Juan P. Galeotti, Nicolás Rosner, Carlos G. López Pombo, Marcelo F. Frias
Department of Computer Science, FCEyN, Universidad de Buenos Aires, Argentina
{jgaleotti, nrosner, clpombo, mfrias}@dc.uba.ar
Abstract
SAT-based analysis of annotated code consists in translating the annotated code to a propositional formula, and analyzing the formula for specification violations using a SAT-solver. If a violation is found, an execution trace exposing the error is exhibited. Code handling data structures with intricate invariants is particularly hard to analyze using these techniques.
In this article we present TACO, our prototype tool implementing a novel methodology for the analysis of JML-annotated sequential Java programs dealing with complex data structures. We instrument the analysis of code with a symmetry breaking predicate that allows the parallel, automated computation of tight bounds for Java fields. Experiments show that the translations to propositional formulas require significantly fewer propositional variables, improving the efficiency of the analysis by orders of magnitude. Moreover, we show that our tool can uncover bugs that go undetected by state-of-the-art tools based on model checking or SMT-solving.
1. Introduction
SAT-based analysis of code allows one to statically find failures in software. This requires appropriately translating the original piece of software, as well as some assertion to be verified, to a propositional formula. The use of a SAT-solver then allows one to find a valuation for the propositional variables that encodes a failure: a valid execution trace of the system that violates the given assertion. This procedure for SAT-based analysis of code has known limitations. As we will see in Section 2, the translation of annotated code we use requires limiting the size of data domains. Therefore, only a portion of the program domain is actually analyzed. Fortunately, this is sufficient to expose many failures, since failures can often be reproduced with small data [1].
In the presence of contracts for called methods, interprocedu-
ral SAT-based analysis can be done by first replacing the calls in a
method by the corresponding contracts, and then analyzing the re-
sulting code. This is the approach followed, for instance, in [8].
(A very preliminary version of this article was presented at Automatic Program Verification (APV) 2009. No formal proceedings of that workshop were published.)
Figure 1. (a) A feasible model. (b) An infeasible model.
One important limitation remains at the intraprocedural level, where the
code for a single method (already including the contracts for called
methods) has to be analyzed. Code involving data structures with
rich invariants (such as circular lists, red-black trees, AVL trees or
binomial heaps) is hard to analyze using these techniques.
In this article we present TACO (Translation of Annotated COde), our prototype tool implementing a novel and fully automatic technique for SAT-based analysis of sequential annotated code involving complex data structures. This technique relies on a novel and effective way of computing tight bounds for the relations modeling Java fields. In Section 4 we will present experimental results showing that this technique sets a new standard for SAT-based intraprocedural program analysis and allows us to uncover bugs that cannot be detected using state-of-the-art tools based on model checking or SMT-solving.
In order to describe the technique at a high level of abstraction, let us consider the following classes for singly-linked lists:

public class List {
    LNode head;
}

public class LNode {
    LNode next;
    int key;
}
We also require the following invariant to hold:
Lists do not contain repeated key values, and key values
are increasingly ordered.
For the sake of simplifying the presentation, let us assume that type int has values 1,...,10. Is it possible to have a list whose head node has key value 1? The answer is "yes": Figure 1(a) gives an example. Is it possible to have a list whose second node has key value 1? The answer is "no". If we attempt to fill the missing value in Fig. 1(b), the ordering constraint in the invariant forces x ≤ 1. Since we are considering integers in the range 1,...,10, it must be x ≥ 1. Therefore, x must be 1. But x = 1 is prevented by the lack of repeated key values granted by the invariant. This implies that no list exists whose key value in the second node is 1. Actually, a similar reasoning allows us to conclude that 1 can only be the key value for the head node. Moreover, a key value k cannot occur beyond the k-th node. The translation from code to a SAT problem (to be discussed in Section 2) uses a propositional variable p_{head,1} to model whether the head node can take the key value 1. Another variable p_{head.next,1} models whether the second node can take the key value 1. It is clear that there are heaps for which variable p_{head,1} will be true (for instance, one containing the list in Fig. 1(a)), and others for which variable p_{head,1} will be false (think of a list whose key value in the head node is 2). On the other hand, variable p_{head.next,1} will always be false. Notice that the same holds for those variables that describe the infeasible list configurations just discussed. But if we know that a propositional variable must be false, it is not necessary to ask the SAT-solver to find its value: that would require the SAT-solver to perform a (possibly costly) task whose outcome we already know in advance. In this article we present a fully automatic and effective method for discovering such infeasible field values. The translation to a SAT problem then removes the corresponding propositional variables, leading to a SAT formula that is easier to analyze.

Figure 2. Translating annotated code to SAT using TACO.
The article is organized as follows. In Section 2 we describe
the translation of JML-annotated sequential Java code to a SAT
problem. In Section 3 we present our novel technique for program
analysis. In Section 4 we present abundant experimental results.
In Section 5 we discuss related work. Finally, in Section 6 we
discuss lines for further work and draw conclusions about the
results presented in the article.
2. Translating JML-annotated Java Code to SAT
In this section we present an outline of our translation of JML [10]
annotated Java code to a SAT problem. We will present just enough
detail to make the remaining sections self-contained. The translation is, in intent, not much different from translations previously presented by other authors [9] or by some of the authors of this article [13]. A schematic description of the translation is provided in
Fig. 2.
Our translation uses Alloy [16] as an intermediate language (to
be introduced in Section 2.1). This is an appropriate decision be-
cause Alloy is close to JML, and the Alloy Analyzer [16] provides
a simple interface to several SAT-solvers. Also, Java code can be
translated to DynAlloy programs [11]. DynAlloy, an extension of Alloy to be described in Section 2.2, allows one to specify actions that modify the state much as Java statements do, and whose behavior is specified by pre- and postconditions given as Alloy formulas. From these atomic actions it is possible to build complex DynAlloy programs modeling sequential Java code.
Section 2.3 we describe the translation from JML-annotated Java
code to a SAT problem.
2.1 Alloy Through an Example
Alloy is a language for writing models of software. It is possible
to define data domains (called signatures), and operations on such
data. Below we present signature definitions for linked list-like
structures.
sig Data { }
one sig null {}
sig LNode {
key : Data,
next : LNode + null }
sig List {
head : LNode + null }
sig sizedList extends List {
size : Int }
Signature Data introduces atoms that have no particular structure. Similarly for signature null, but the modifier one permits exactly one atom in this signature. Akin to classes in object orientation, Alloy signatures may contain fields. Field key allows us to retrieve the datum stored in a node. Similarly, field next maps a node to the next node in the list (or to the value null marking the end of the list). According to Alloy's semantics, signatures are viewed as sets, and fields are viewed as relations from the signature in which they are defined to the signature in the field's range. For example, if Data and LNode are the domains associated with signatures Data and LNode, then the semantic interpretation of field key is a functional relation key : LNode → Data. Inheritance between Alloy signatures is expressed using the keyword extends. Objects in signature sizedList form a subset of the objects in signature List, and have an extra field (size) that returns (for an input list) an integer value.
From signatures and fields, and using the Alloy operations, we can build terms that denote relations. Signatures and fields are the building blocks. There are three constants in the language: univ (which denotes the set of all atoms in the universe), none (which denotes the empty set), and iden (which denotes the binary identity relation over the atoms in univ). If T is a term that denotes a binary relation, then ~T, *T and ^T denote the transposition, reflexive-transitive closure and transitive closure of the relation denoted by T. Union of relations is noted as +, intersection as &, difference as -, and sequential composition as ".". For instance, the expression head.*next relates each input list to the nodes in the list, or to the value null if it ends the list (notice that, so far, nothing prevents these lists from being cyclic). Cartesian product is denoted ->. Thus, by a->b we denote the relation { ⟨a, b⟩ }. Notation R : A -> one B states that R is a binary relation in A × B, where the modifier "one" makes the relation a total function. Using the modifier "lone" instead yields a partial function.
From terms we build atomic formulas of the form

T1 in T2    or    T1 = T2,

which assert that relation T1 is contained in relation T2, and that T1 and T2 are the same relation, respectively. From atomic formulas we build complex formulas using the connectives ! (negation), && (conjunction), || (disjunction) and => (implication). Existentially quantified formulas have the form "some x : S | α", where x ranges over the elements in signature S and α is a formula. Similarly, universally quantified formulas have the form "all x : S | α".
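As a small illustration combining these constructs (a formula we add here for exposition, written over the list signatures above), the "no repeated key values" property informally required in Section 1 can be expressed as:

all l : List | all n1, n2 : l.head.*next - null |
    n1 != n2 => n1.key != n2.key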
We can constrain atoms and fields in signatures using formu-
las. These axioms are called facts in Alloy. The following is, for
instance, a useful fact when modeling lists:
fact Acyclic { all l : List, n : LNode |
n in l.head.*next => n !in n.^next }
In order to ease notation, Alloy allows defining predicates and functions. For instance, the following function length computes the length of a list (using the Alloy operator # to retrieve the size of a set), while the binary predicate sameLength asserts that two lists have the same length:
fun length[l : List] : Int { #(l.head.*next - null) }
pred sameLength[l1, l2 : List]{ length[l1] = length[l2] }
One of the attractive features of Alloy is the possibility of
automatically analyzing Alloy models using the Alloy Analyzer
[16]. Therefore, models can include asserts to be checked by the
Alloy Analyzer. For instance, the assertion
assert sameLengthImpliesSameList { all l1, l2 : List |
sameLength[l1,l2] implies l1 = l2 }
is clearly false. Adding a check command of the
form
check sameLengthImpliesSameList
for 3 List, 5 Data, 5 LNode
gives the Alloy Analyzer instructions about the scopes to be used
for domains. In this case, the Alloy Analyzer will use up to 3 lists,
and up to 5 objects of types Data and LNode in the counterexample.
We will briefly discuss the translation from Alloy models to SAT
problems in Section 2.3. For a thorough description of the Alloy
language, see [16].
2.2 A Brief Introduction To DynAlloy
DynAlloy [11] is an extension of Alloy conceived for modeling and analyzing actions specified through pre- and postconditions written in Alloy. From these atomic actions, new, more complex actions can be constructed using action combinators. If
A[x1:T1,...,xn:Tn]
and
B[x1:T1,...,xn:Tn,y1:T1,...,yn:Tn]
are Alloy formulas, an atomic action “atomic” is declared through
a triple
action atomic[x1:T1,...,xn:Tn]
pre = { A[x1,...,xn] }
post = { B[x1,...,xn,x1’,...,xn’] }
A primed variable xi' in the postcondition refers to the value of variable xi upon action termination. As an example, action VarAssign below models assignment of a value to a variable:
action VarAssign[v1, v2 : C]
pre = { true }
post = { v1’ = v2 }
Since this atomic action is used often, we will use the more programmatic notation v1 := v2. Actions denote transitions between states (variable valuations). Atomic actions relate those pairs of valuations ⟨v, v'⟩ in which v satisfies the precondition and v' satisfies the postcondition. As a frame condition, those variables that do not occur primed in the postcondition are assumed to retain their original value. Given actions A1 and A2, A1 + A2 stands for the nondeterministic choice between the actions. Action A1 ; A2 stands for their sequential composition. Action A1* stands for the reflexive-transitive closure (finite iteration) of A1. Given an Alloy formula α, α? is a test (also called assume) action that returns the input state if it satisfies α, and halts otherwise. In order to keep DynAlloy actions modular, DynAlloy programs can be defined using the following syntax:
DynAlloy programs can be defined using the following syntax:
program progName [v1:T1,...,vk:Tk]
var [x1:T,...,xn:Tn]
{ action }
The var clause introduces local variables for the program. The formal semantics of DynAlloy is thoroughly discussed in [12].
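As a minimal illustration of this syntax (a toy program we introduce here for exposition; it is not part of the running example), the following DynAlloy program swaps the values of two variables of some signature C, using a local variable and the v1 := v2 notation for action VarAssign:

program swap[v1 : C, v2 : C]
var [tmp : C]
{
  tmp := v1 ;
  v1 := v2 ;
  v2 := tmp
}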
Like Alloy, DynAlloy was designed with the aim of being an
analyzable language. Where Alloy models include asserts, DynAl-
loy models include partial correctness assertions. An assertion of
the form
assertCorrectness nameCorrect [v1:T1,...,vk:Tk]{
pre = { alpha[v1,...,vk] }
program = { name[v1,...,vk] }
post = { beta[v1,...,vk,v1’,...,vk’] }
}
asserts that action name, when executed on states that satisfy
alpha, necessarily ends (if it terminates) in states that satisfy beta.
Notice that using the DynAlloy operators it is possible to trans-
late the Java control flow constructs as follows (predicates are Alloy
formulas):
T(while (pred) { stmt }) -> (pred? ; T(stmt))*;(!pred)?,
T(stmt1 ; stmt2) -> T(stmt1) ; T(stmt2),
T(if (pred) stmt1 else stmt2) ->
(pred?;T(stmt1)) + ((!pred)?;T(stmt2)).
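For instance, under these rules a simple list-traversal loop (a hypothetical Java snippet we add here for illustration) translates as follows:

Java:
    while (current != null) { current = current.next; }

DynAlloy:
    ((current != null)? ; current := current.next)* ;
    (!(current != null))?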
As in Alloy, a check statement of the form
check nameCorrect for scopes
is used by the DynAlloyToAlloyTranslator [11] in order to translate
the DynAlloy model and the assertion to an Alloy model with its
associated Alloy assertion. If instead we write a run statement of
the form
run program for scopes
the Alloy Analyzer will look for an execution of the given program.
The techniques behind the translation are described in [12]. It is
worth emphasizing that, as in Alloy, the analysis of DynAlloy
models is partial, but complete within the scopes constraining the
sizes of data domains and the number of unrolls performed on the
* (iteration) operator.
2.3 The Translation Through an Example
Let us consider the following JML-annotated classes for list nodes
and singly linked lists.
public class LNode extends Object {
LNode next;
int key;
}
public class List extends Object {
/*@
@ invariant (\forall LNode n;
@ \reach(this.head, LNode, next).has(n);
@ !\reach(n.next, LNode, next).has(n));
@*/
LNode head;
/*@
@ ensures (\exists LNode n;
@ \reach(this.head, LNode, next).has(n) &&
@ n.key==x) <==> \result == true;
@*/
boolean find(int x) {
LNode current;
boolean output;
current = this.head;
output = false;
while (output==false && current!=null) {
if (current.key == x) {
output = true; }
current = current.next;
}
return output;
}
}
Prior to the translation of annotated code, we will model the
Java class hierarchy in Alloy. The Alloy signature hierarchy for the
example is given by
sig Object {}
one sig null {}
sig List extends Object {}
sig LNode extends Object {}
sig Throwable extends Object {}
sig Exception extends Throwable {}
sig RuntimeException extends Exception {}
one sig NullPointerException extends RuntimeException {}
Notice that signatures List and LNode do not have fields. In
order to handle aliasing appropriately [17], former Java fields will
be (modifiable) relational parameters of the DynAlloy actions and
programs. Keep in mind that these relations are the ones for which
we will find bounds in Section 3. Also, in the example we show
how exceptions are modeled within the class hierarchy.
Given a Java method with requires/ensures annotations (as for example method find, where the absent requires clause means there are no constraints on the input), two Alloy predicates requires_find and ensures_find are introduced in the DynAlloy model. Moreover, an Alloy predicate List_Inv is used to model the class invariant. The (fully automatically generated) predicates are:
pred requires_find[]{ True[] }
pred ensures_find[
this_L : List,
head : List -> one (LNode + null),
next : LNode -> one (LNode + null),
key : LNode -> one Int,
x : Int,
result : Boolean]{
(some n : LNode | n in this_L.head.*next && n.key=x)
iff result=True[] }
pred List_Inv[
this_L : List,
head : List -> one (LNode + null),
next : LNode -> one (LNode + null)]{
all n : LNode | n in this_L.head.*next implies
!(n in n.next.*next) }
Translation JMLToAlloyTranslation is mostly straightforward, but the following points need to be clarified. While "this" has a clear meaning in the context of Java and JML, in DynAlloy we use a variable this_L that must be passed as a parameter. Also, expression \reach(l, T, [f1,...,fk]) denotes the objects of type T reachable from location l using fields f1,...,fk. This is an extension of the JML syntax that simplifies modeling. It translates to the Alloy expression l.*(f1+...+fk):>T (where e:>T restricts the range of expression e to signature T).
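For instance, applying this rule to the class invariant of List above, the expression \reach(this.head, LNode, next) translates to

this_L.head.*next :> LNode

which, up to the removal of null by the range restriction :> LNode, is the expression this_L.head.*next used in predicate List_Inv above.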
Translation of Java code to DynAlloy is also straightforward. Control flow constructs have already been translated in Section 2.2. Besides action VarAssign (see Section 2.2), we introduce an action NewC for the creation of objects of class C, as well as an action Setf that modifies the f-value of an object o (actions NewC and Setf will not be used in the example).
The translation of method find to DynAlloy is given by:
program find[
this_L:List, result:Boolean, x:Int,
head:List->one(LNode+null),
next:LNode->one(LNode+null),
key:LNode->one Int]{
var [current:LNode, output:Boolean]
current := this_L.head;
output := False[];
while (output=False[] && current!=null) {
if (current.key = x) {
output := True[]
};
current := current.next;
}
result := output
}
According to the architecture of TACO presented in Fig. 2, once
the contracts have been translated to Alloy and the methods to Dy-
nAlloy, we join the two translations into a single DynAlloy model
that comprises the typing information (signatures), includes the
atomic actions (for instance NewC), and also incorporates the Dy-
nAlloy partial correctness assertion. Since there is a class invariant,
besides checking that the contract is satisfied we also check that the
invariant is preserved.
assertCorrectness find[this_L:List, result:Boolean,
x:Int,
head : List -> one (LNode + null),
next : LNode -> one (LNode + null),
key : LNode -> one Int ]{
pre { List_Inv[this_L, head, next] }
program { find[this_L, result, x, head, next, key] }
post {ensures_find[this_L,head,next,key,x,result’]
&& List_Inv[this_L, head, next] }
}
Although we make no explicit mention of exceptions in the code translation, we check for runtime exceptions by default. The code is then instrumented to detect uncaught exceptions.
DynAlloy models are translated to Alloy models using the
DynAlloyToAlloyTranslator. We will not discuss this translation, which has already been presented in [12]. Instead, we will comment on the translation from Alloy models to propositional formulas, since this will allow us to show how the technique we present in Section 3 fits into the code analysis process.
Alloy models are translated to the intermediate language KodKod [23]. A distinguishing feature of KodKod is that it allows prescribing partial instances in models. In effect, each Alloy 4 relation R is translated to a KodKod relation to which a lower bound relation l_R and an upper bound relation u_R are attached. We will focus on upper bounds. For example, R could be the relation key : LNode -> one Int from Section 1. If a tuple is missing from u_R, then that tuple cannot be part of any interpretation of R. We can look at R as a Boolean matrix where entry R_{i,j} = true iff ⟨i, j⟩ ∈ R. The translation then begins by modeling R as a matrix whose elements are propositional variables; for each position ⟨i, j⟩, a variable p_{i,j} is introduced. The translation of expressions proceeds recursively by translating expressions to matrices whose entries are propositional formulas. The details of the translation for expressions, and the translation for formulas, are given in [23]. Notice that the presence of bounds allows fixing some entries of R. For instance, if ⟨i0, j0⟩ ∉ u_R, then ⟨i0, j0⟩ cannot be in R, allowing us to replace variable p_{i0,j0} by the truth value false. This removes variables from the translation. Since the SAT-solving time grows (in the worst case) exponentially with the number of variables, getting rid of variables often improves (as we will show in Section 4) the analysis time significantly. In our example, determining that a pair ⟨n, i⟩ (for an LNode n and integer i) can be removed from the bound u_key allows us to remove a propositional variable in the translation process. When a tuple is removed from an upper bound, the resulting bound is tighter. In Section 3 we concentrate on how to determine whether a given pair can or cannot belong to an upper bound relation.
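To make the effect of tighter bounds concrete, consider the sorted-list example from Section 1 (a back-of-the-envelope count we add for illustration; these numbers are not taken from the experiments in Section 4). With 5 LNode atoms and integers 1,...,10, the full upper bound for key contains 5 × 10 = 50 tuples, so the translation introduces 50 propositional variables. Since a key value k cannot occur beyond the k-th node, the i-th node can only store keys greater than or equal to i, leaving 10 + 9 + 8 + 7 + 6 = 40 feasible tuples, and hence 40 variables.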
3. Computing Tight Bounds for Fields
Complex data structures usually have complex invariants that im-
pose constraints on the topology of data and on the values that can
be stored. For instance, a red-black tree, a variety of balanced, or-
dered binary tree, requires that:
1. For each node n in the tree, the keys stored in the nodes in the left subtree of n are always less than the key stored in n. Similarly, the keys stored in nodes in the right subtree are always greater than the key stored in n.
2. Nodes are colored red or black. The tree root and the null value are colored black.
3. In any path from the root there are no two consecutive red nodes.
4. Every path from the root to the null value has the same number of black nodes.
In the Alloy model resulting from the translation, Java fields are mapped to binary relations. For instance, field left is mapped to a binary relation left : TNode -> one (TNode + null). Let us assume that:
1. nodes come from a linearly ordered set, and
2. trees have their nodes chosen in a canonical way (for instance, a breadth-first traversal of the tree lists the nodes in order).
That is, given nodes N0, N1, ..., Nk, node N0 is the tree root, N0.left = N1, N0.right = N2, etc. Notice that the breadth-first ordering already imposes some constraints. For instance, it is not possible that N0.left = N2. Moreover, if there is a node to the left of node N0, it has to be node N1 (otherwise the breadth-first listing of nodes would be broken). At the Alloy level, this means that ⟨N0, N2⟩ ∈ left is infeasible, and the same is true for N3, ..., Nk in place of N2. Recalling the discussion at the end of Section 2.3, this means that we can get rid of several propositional variables in the translation to a SAT problem. A similar analysis can be made for field right and other fields. If we look at field color : (TNode + null) -> one (Red + Black), constraint 2 guarantees that ⟨null, Red⟩ ∈ color and ⟨N0, Red⟩ ∈ color are both infeasible. This allows us to get rid of two propositional variables. Actually, as will be shown in Section 4, for a scope of 10 tree nodes this analysis reduces the number of variables from 650 to 200.
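As an illustration, constraint 3 (no two consecutive red nodes) can be written as an Alloy fact along the following lines (a sketch we add for exposition, using the fields left, right and color discussed above):

fact noTwoConsecutiveReds {
    all n : TNode | n.color in Red implies
        (n.left + n.right).color in Black
}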
The previous reasoning strongly depends on:
1. being able to guarantee, fully automatically, that nodes are placed in the heap in a canonical way, and
2. being able to automatically determine the infeasible values.
A predicate that reduces the number of equivalent models by inducing a canonical ordering on the heap nodes is a "symmetry breaking" predicate. We will present an appropriate symmetry breaking predicate in Section 3.1. In Section 3.2 we present a fully automatic and effective technique for checking feasibility.
3.1 A New Predicate for Symmetry Breaking in Heaps
The following predicate
pred acyclic[l : List] { all n : LNode |
n in l.head.*next => n !in n.^next }
describes acyclic lists. Running the predicate using the command
run acyclic for exactly 6 Object, exactly 1 List,
exactly 4 LNode, exactly 1 Data
yields the models depicted in Fig. 3.

Figure 3. Two symmetric (yet different for Alloy) lists.

Notice that the list on the left in Fig. 3 is a permutation (on signature LNode) of the one on the right. Actually, any permutation of LNode that stores data in the same order as any of these lists is also a model. Pruning the state space by removing permutations on signature LNode contributes to improving the analysis time. For singly linked lists, a predicate forcing nodes to be used in the order LNode0 → LNode1 → LNode2 → ... removes these symmetries, as sketched below.
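For the scopes used above (exactly 1 List and 4 LNode atoms), such a predicate can be sketched as follows, assuming singleton signatures LNode0, ..., LNode3 (one atom each, in the style formally introduced later in this section). This is an illustrative sketch, not the exact predicate TACO generates:

pred canonicalList[l : List] {
    l.head in LNode0 + null
    LNode0.next in LNode1 + null
    LNode1.next in LNode2 + null
    LNode2.next in LNode3 + null
    LNode3.next = null
}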
The idea of canonicalizing the heap in order to reduce symme-
tries is not new. In the context of explicit state model checking, the
articles [14; 21] present different ways of canonicalizing the heap
([14] uses a depth-first search traversal, while [21] uses a breadth-
first search traversal of the heap). The canonicalizations require
modifying the state exploration algorithms, and involve comput-
ing hash functions in order to determine the new location for heap
objects in the canonicalized heap. Notice that:
- The canonicalizations are presented algorithmically (which is not feasible in a SAT-solving context).
- Computing a hash function requires operating on integer values, which is appropriate in an algorithmic computation of the hash values, but is not amenable to a SAT-solver.
In the context of SAT-based analysis, [19] proposes to canon-
icalize the heap, but the canonicalizations have to be provided by
the user as ad-hoc predicates depending on the invariants satisfied
by the heap.
In this section we present a novel family of predicates that canonicalize arbitrary heaps. In order to include the predicates, we instrument the Alloy model obtained by the translation from the annotated source code. The predicates are automatically instantiated in these Alloy models. To make the presentation of the symmetry breaking predicates easier to follow, we develop them through a running example. Let us consider the Alloy model for
binary trees with information in the nodes presented in Fig. 4. We
assume we will use a run or check command where the scopes will
be:
exactly 1 Tree, exactly 5 TNode, exactly 5 Data
Our model of Java heaps consists of graphs ⟨N, E, L, R⟩ where:
1. N (the set of heap nodes) is a set comprising elements from signature Object and from appropriate value signatures (int, String, etc.).
2. E, the set of edges, contains pairs ⟨n1, n2⟩ ∈ N × N.
3. L is the edge labeling function. It assigns class field names to edges. An edge between nodes n1 and n2 labeled fi means that n1.fi = n2. The typing must be respected.
4. R ⊆ N is the set of heap root nodes (method arguments and static class fields, of object type).

sig Object { }
one sig null {}
sig Tree extends Object {
  root : TNode + null }
sig TNode extends Object {
  left : TNode + null,
  value : Data + null,
  right : TNode + null }
sig Data extends Object { }

Figure 4. An Alloy model for binary trees.

In our running example the nodes are the objects from signatures Tree, TNode and Data, or null. Labels correspond to the fields in the model, and the root is the argument this, of type Tree.
We then instrument the Alloy model as follows. If the scope for signature T is k, we include singletons T0, ..., T(k-1). For the trees example we have:
one sig Tree0 extends Tree {}
one sig TNode0,...,TNode4 extends TNode {}
one sig Data0,...,Data4 extends Data {}
We also introduce auxiliary functions defined as follows. Function nextT establishes a linear order between elements of type T. Function minT returns the least object (according to the ordering nextT) in an input subset of signature T. Function prevsT returns the nodes in signature T smaller than the input parameter. For the example (only for signature TNode), we have:
fun nextTNode[] : TNode -> lone TNode {
  TNode0->TNode1 + TNode1->TNode2 +
  TNode2->TNode3 + TNode3->TNode4 }
fun minTNode [ns: set TNode] : lone TNode {
ns - ns.^(nextTNode[]) }
fun prevsTNode[n : TNode] : set TNode {
n.^(~nextTNode[]) }
Each recursive field r : S -> one (S + null) from signature S is split into two fields fr : S -> lone (S + null) (the forward part of the field, mapping nodes to strictly greater nodes or null) and br : S -> lone S (the backward part of the field). Non-recursive fields are not modified. In our example the resulting fields are:
root : Tree -> one (TNode + null),
fleft : TNode -> lone (TNode + null),
bleft : TNode -> lone TNode,
fright : TNode -> lone (TNode + null),
bright : TNode -> lone TNode,
value : TNode -> one (Data + null).
Java fields must be total functions. We add new facts stating that for each recursive field ri, the domains of fri and bri form a partition of ri's domain, making fri + bri a well-defined total function. For our example we have:
fact { no ((fleft.univ) & (bleft.univ)) and
TNode = fleft.univ + bleft.univ and
no ((fright.univ) & (bright.univ)) and
TNode = fright.univ + bright.univ }
The instrumentation we are presenting requires us to talk about the reachable heap objects. Since all the objects will be reachable through forward fields, we obtain a more economical (regarding the translation to a propositional formula) description of the reachable heap objects using forward fields. In our example, instead of
using the expression
this.*(root + left + value + right)
to characterize the reachable heap objects, we will use the expres-
sion FReach defined as
this.*(root + fleft + value + fright).
We now introduce facts forcing the SAT-solver to choose nodes in a canonical order. Intuitively, we will order heap nodes by looking at their parents in the heap. A node may have no parents (in case it is a heap root), or may have several parent nodes. In the latter case, among the parents we will distinguish the minimal one (according to a global ordering) and call it the min parent of n (denoted minP[n]). For the example, functions globalMin and minP are defined as follows:
fun globalMin[s : set Object] : Object {
s - s.^(Tree0->TNode0 + TNode0->TNode1 +...+
TNode3->TNode4 + TNode4->Data0 +
Data0->Data1 +...+ Data3->Data4)}
fun minP[o : Object] : Object {
globalMin[(root + fleft + value + fright).o] }
Figure 5. Comparing nodes using their min-parents.
In order to determine how to order any pair of nodes from the same signature T, we consider the possibilities depicted in Fig. 5. The ordering is then characterized by the following conditions:
(a) In order to sort two root nodes of type T, we use the ordering in which the formal parameters or static fields were declared in the source Java file. A fact is then added including the explicit ordering information.
(b) In order to sort a root node and a non-root node of the same type, we add a fact stating that root nodes are always smaller than non-root nodes.
(c) In order to sort two nodes N1 and N2 of the same type such that minP[N1] = minP[N2] = N3, notice that since Java fields are functions, there must be two different fields f1 and f2 such that N3.f1 = N1 and N3.f2 = N2. We then use the ordering in which the fields were declared in the source Java file to determine which of N1 and N2 is smaller.
(d) Let N1 (with min parent N3) and N2 (with min parent N4) be nodes of the same type. If N3 and N4 are distinct and have the same type, then we sort N1 and N2 following the order between N3 and N4.
(e) Finally, in order to sort nodes N1 and N2 of type T whose min parents have different types, we use the ordering in which the classes of the parent nodes were defined in the source Java file.
Since in the running example there is exactly one root node (namely, this), we statically and automatically determine that it is not necessary to add a fact for condition (a). Since there is exactly one object in class Tree, we do not add a fact for condition (b). Regarding condition (c), since there is only one field from signature Tree to signature TNode, there cannot be two objects of type TNode with the same min parent in signature Tree.
Fact “orderTNodeCondition(c)” below orders objects of type
TNode with the same min parent of type TNode:
fact orderTNodeCondition(c){
all disj o1, o2 : TNode |
let a = minP[o1] | let b = minP[o2] |
(o1+o2 in FReach and some a and some b and
a = b and a in TNode and o1 = a.fleft and
o2 = a.fright) implies o2 = o1.nextTNode[]}
Fact “orderTNodeCondition(d)” orders objects of type TNode
with different min parents of type TNode. A similar fact is neces-
sary in order to sort objects of type Data with different TNode min
parents.
fact orderTNodeCondition(d) {
all disj o1, o2 : TNode |
let a = minP[o1] | let b = minP[o2] |
(o1+o2 in FReach and some a and some b and
a!=b and a+b in TNode and a in prevsTNode[b])
implies o1 in prevsTNode[o2]}
Regarding condition (e), fact “orderTNodeCondition(e)” orders
objects of type TNode whose min parents are one of type Tree, and
the other of type TNode:
fact orderTNodeCondition(e){
all disj o1, o2 : TNode |
let a = minP[o1] | let b = minP[o2] |
(o1+o2 in FReach and some a and some b and
a in Tree and b in TNode)
implies o1 in prevsTNode[o2]}
In order to avoid “holes” in the ordering, for each signature T
we add a fact “compactT” stating that whenever a node of type T
is reachable in the heap, all the smaller ones are also reachable. For
signature TNode we have:
fact compactTNode { all o : TNode | o in FReach
implies prevsTNode[o] in FReach}
Finally, the instrumentation modifies the facts, functions, predicates and asserts of the original model by replacing each occurrence of a recursive field ri by the expression fri + bri. For instance, if a fact acyclicTree is used to state that trees are acyclic structures:
fact acyclicTree { all t : Tree, n : TNode |
n in t.root.*(left + right) implies
n !in n.^(left + right) }
in the instrumented model it is replaced by the fact
fact acyclicTree { all t : Tree, n : TNode |
n in t.root.*(fleft+bleft+fright+bright)
implies n !in n.^(fleft+bleft+fright+bright) }
The following theorems show that the instrumentation is correct. Only proof sketches are provided due to space limitations.

THEOREM 3.1. Given a heap H for a model, there exists a heap H' isomorphic to H whose ordering between nodes respects the instrumentation. Moreover, if an edge ⟨n1, n2⟩ is labelled r (with r a recursive field), then: if n1 is smaller (according to the ordering) than n2 (or n2 is null), then ⟨n1, n2⟩ is labelled fr in H'. Otherwise, it is labelled br.
Proof sketch: For each signature T, let n_{root,T} be the number of root objects from T. For each pair of signatures T, T', let n_{T,T'} be the number of objects from T whose min-parent has type T'. Since there is a linear ordering between signature names, assign the first n_{root,T} elements from T to root elements. Then, for each signature T', assign n_{T,T'} objects from T to nodes with min parent in T'. When doing so, remember to assign smaller objects (w.r.t. the linear ordering nextT) to smaller signature names. It only remains to determine the order between elements within a class. Follow the directions given in Fig. 5(b)-(d).
Theorem 3.1 shows that the instrumentation does not miss any
bugs during code analysis. If a counterexample for a partial cor-
rectness assertion exists, then there is another counterexample that
also satisfies the instrumentation.
THEOREM 3.2. Let H, H' be heaps for an instrumented model. If H is isomorphic to H', then H = H'.

Proof sketch: Suppose H ≠ H'. Since they are isomorphic, there must be a minimal position i0 where their breadth-first search traversals differ. Then look for a contradiction using the fact that the axioms allow comparing objects that occur before position i0 in the ordering.
Theorem 3.2 shows that the instrumentation indeed yields a
canonicalization of the heap.
3.2 Using Symmetry Breaking to Compute Tight Bounds
In this section we present the main result of this article. It is
interesting to notice that despite the symmetry breaking predicates
that come with the standard distribution of the Alloy Analyzer,
isomorphic models are generated (see Fig. 3). While in the original
Alloy model functions left and right are each one encoded using
n×(n+ 1) propositional variables, due to the ordering of nodes
we can remove arcs from relations. In order to determine wether an
edge NiNjcan be part of field F, we perform the following
analysis using the Alloy Analyzer:
pred NiToNjInF[] {Ni+Nj in FReach and Ni->Nj in F}
run NiToNjInF for scopes
In the example, for field fleft we must check, for instance,
pred TNode0ToTNode1Infleft[ ] {
TNode0+TNode1 in FReach and
TNode0->TNode1 in fleft }
run TNode0ToTNode1Infleft for exactly 1 Tree,
exactly 5 TNode, exactly 5 Data
If a "run" produces no instance, then the edge is infeasible within the provided scopes. It is then removed from the upper bound relation associated with field F in the KodKod model. This produces tighter KodKod bounds which, upon translation to a propositional formula, yield a SAT problem involving fewer variables.
All these analyses are independent. A naive algorithm to determine feasibility consists in performing all the checks in parallel. Unfortunately, the time required for each of these analyses is highly irregular. Some of the checks take milliseconds, while others may exhaust the available resources. Our algorithm receives a collection of Alloy models to be analyzed, one for each edge whose feasibility must be checked. It also receives as input a threshold time T to be used as a time bound for the analyses. All the models are analyzed in parallel using the available resources. Those individual checks that exceed the time bound T are stopped and left for the next iteration. Each analysis that finishes as unsatisfiable tells us that an edge may be removed from the current bound. Satisfiable checks tell us that the edge cannot be removed. After all the models have been analyzed, we are left with a partition of the current set of edge models into three sets: unsatisfiable checks, satisfiable checks, and stopped checks for which we do not have a conclusive answer. We then refine the bounds (using the information from the unsatisfiable models) for the models whose checks were stopped. The formerly stopped models are sent again for analysis. This leads to an iterative process that, after a number of iterations, converges to a (possibly empty) set of models that cannot be checked (even using the refined bounds) within the threshold T. The bounds refinement process then finishes. For all the case studies reported in Section 4 it was possible to check all the edges using this algorithm. Since bounds depend only on the class invariant, the signature scopes and the typing of the method under analysis, the same bound can be used (as will be seen in Section 4) to improve the analysis of different methods. Therefore, once a bound has been computed, it is stored in a bounds repository, as shown in Fig. 2.
4. Experimental Results
In this section we analyze methods from classes with rich invariants. We will consider the following classes from the "collections" framework:
LList: An implementation of sequences based on singly linked
lists.
AList: The actual implementation AbstractLinkedList of the
List interface from the apache package commons.collections,
based on circular doubly-linked lists.
TreeSet: The actual implementation of class TreeSet from pack-
age java.util, based on red-black trees.
AVLTree: An implementation of AVL trees obtained from the case
study used in [3].
BHeap: An implementation of priority queues based on binomial
heaps.
In Section 3 we emphasized the fact that our technique allows us to remove variables in the translation to a propositional formula. In Table 1 we report, for each class, the number of variables in the non-instrumented upper bound (#NI), the size of the resulting upper bound (number of feasible variables, noted #I), and the time required by the algorithm from Section 3.2 to check feasibility. The parallel algorithm was run on a cluster with 16 nodes, each node having two Intel Dual Core Xeon processors (64 cores in total) running at 2.67 GHz. Each node has 2 GB of RAM shared among its 4 cores. Times are reported using the format hh:mm:ss.
#Node 5 7 10 12 15 17
LList #NI 30 56 110 156 240 306
#I 9 13 19 23 29 33
Time 00:11 00:14 00:23 00:36 01:01 01:23
AList #NI 76 128 252 344 512 676
#I 33 47 68 82 103 117
Time 00:16 00:25 00:51 01:26 02:47 09:28
TrSet #NI 170 280 650 852 1200 2006
#I 59 107 200 279 424 533
Time 00:49 01:13 03:03 05:11 11:30 44:23
AVL #NI 150 280 650 852 1200 2006
#I 55 98 177 251 389 491
Time 00:33 00:57 03:26 09:53 22:03 1:41:31
BHeap #NI 170 280 650 852 1200 2006
#I 56 97 176 246 365 455
Time 00:48 00:53 02:47 05:10 14:29 33:05
Table 1. Sizes for instrumented (#I) and non-instrumented (#NI)
bounds, and analysis time for computation of instrumented bounds.
Once we have computed the bounds, in Table 2 we compare the
analysis times for methods in the studied classes, under three con-
ditions: (a) Using our translation, but without including the instru-
mentation and the tight bounds (NI), (b) using JForge [9] (a similar
SAT-based tool developed at MIT), and (c) using the instrumen-
tation and the tight bounds (I). In all cases we are checking that
the invariants are preserved. Also, for classes LList and AList,
we show that methods indeed implement the sequence operations.
Similarly, in class TreeSet we also show that methods correctly
implement the corresponding set operations.

(Note for the referee: all the information necessary to reproduce the experiments is given in http://www.dc.uba.ar/TACO.)

Figure 6. An instance from class NodeCachingLinkedList.

For class BHeap we
also show that methods correctly implement the corresponding pri-
ority queue operations. Loops are unrolled up to 10 times, and no
contracts for called methods are being used (just their code). In
each column we consider different scopes for the nodes signature.
We set the scope for signature Data equal to the scope for nodes.
We have set a timeout (TO) of 10 hours for each one of the analyses. Entries "OofM" mean "out of memory error"; JForge actually ran out of memory while translating to a propositional formula. The code being analyzed is bug-free. Since the process of bug finding ends when no more bugs are found, this situation, where bug-free code is analyzed, is a stress test that necessarily arises in actual bug finding. Later in this section we analyze buggy code. Since all the analyses are sequential (the cluster was only used for computing the bounds in the instrumented case), we used a single core. When reporting times using the instrumentation in TACO (I), we are not adding the times (given in Table 1) needed to compute the bounds. Still, adding these times does not turn any analysis that finished within the 10-hour limit into a TO.
5 7 10 12 15 17
LList Contains NI 00:03 00:05 00:08 00:11 00:13 00:22
JF 00:01 02:00 TO TO TO TO
I 00:03 00:04 00:05 00:06 00:07 00:09
Insert NI 00:04 00:09 01:14 00:33 04:26 01:25
JF 00:02 04:56 TO TO TO TO
I 00:04 00:05 00:07 00:08 00:13 00:26
Remove NI 00:05 00:27 TO TO TO TO
JF 00:04 21:51 TO TO TO TO
I 00:04 00:06 00:11 00:12 00:17 00:33
AList Contains NI 00:05 00:11 00:29 00:38 00:42 01:20
JF 00:02 05:01 TO TO TO TO
I 00:04 00:06 00:16 00:22 00:27 00:58
Insert NI 00:04 00:05 01:02 26:22 TO TO
JF 00:03 11:52 TO TO TO TO
I 00:04 00:05 00:07 00:08 00:12 00:16
Remove NI 00:06 00:14 11:25 05:47:39 TO TO
JF 00:18 01:13:27 TO TO TO TO
I 00:05 00:06 00:17 00:31 01:08 03:13
TreeSet Find NI 02:13 04:36:49 TO TO TO TO
JF 00:42 01:57:49 TO TO TO TO
I 00:04 00:10 01:56 12:43 58:54 05:05:06
Insert NI 21:38 TO TO TO TO TO
JF OofM OofM OofM OofM OofM OofM
I 00:43 08:44 TO TO TO TO
AVL Find NI 00:14 27:06 TO TO TO TO
JF 00:26 03:10:10 TO TO TO TO
I 00:03 00:06 00:36 01:41 08:20 33:06
FMax NI 00:02 00:04 46:12 TO TO TO
JF 00:06 49:49 TO TO TO TO
I 00:01 00:01 00:03 00:04 00:09 00:13
Ins NI 01:20 05:35:51 TO TO TO TO
JF OofM OofM OofM OofM OofM OofM
I 00:07 00:34 04:47 21:53 02:53:57 TO
BHeap Min NI 00:03 00:41 TO TO TO TO
JF 00:22 01:23:07 TO TO TO TO
I 00:02 00:04 00:11 00:20 02:29 00:07
DecK NI 00:30 38:58 TO TO TO TO
JF 01:48 TO TO TO TO TO
I 00:10 00:59 24:05 02:42:30 TO 00:26
Insert NI 01:55 51:22 TO TO TO TO
JF 01:13:47 TO TO TO TO TO
I 00:16 01:05 10:44 21:31 01:20:09 51:55
Table 2. Comparison of code analysis times for 10 loop unrolls.
Looking at the progression of analysis times in Table 2 for TACO without bounds and for JForge, it is clear that TACO with tight bounds requires, in most cases, several orders of magnitude less analysis time.
Let us consider class NodeCachingLinkedList, an implementation of the List interface from the apache commons.collections package. As shown in Fig. 6, an instance has the actual (circular) list, and a singly linked list (the cache). The cache list has a maximum size "maximumCacheSize" (maxCS), set in the actual code to a default value of 20 nodes. When a node is removed from the circular list, it is added to the cache (unless the cache is full). Let us consider the code snippet from remove presented in Fig. 7(a). Figure 7(b) gives us a mutant. A bug arises in the mutant when a node is removed and the cache is full. Then, the 21st element can be added to the cache, violating the part of the invariant that constrains the cache size to be at most 20.

public Object remove(int index) {
    Node node = getNode(index, false);
    Object oldValue = node.getValue();
    super.removeNode(node);
    if (cacheSize >= maximumCacheSize) { // cache full: skip caching
        return oldValue;
    }
    Node nextCacheNode = firstCacheNode;
    node.previous = null;
    node.next = nextCacheNode;
    firstCacheNode = node;
    return oldValue;
}
(a)

public Object remove(int index) {
    Node node = getNode(index, false);
    Object oldValue = node.getValue();
    super.removeNode(node);
    if (cacheSize > maximumCacheSize) { // mutant: lets a 21st node into the cache
        return oldValue;
    }
    Node nextCacheNode = firstCacheNode;
    node.previous = null;
    node.next = nextCacheNode;
    firstCacheNode = node;
    return oldValue;
}
(b)

Figure 7. Code snippets from remove (a), and a mutant (b).
In Table 3 we report analysis information after looking for the
bug in the mutant (BM), for varying numbers of loop unrolls in
method super.removeNode. We have tailored the mutant (and
its contract) to be analyzed using the following tools: JForge, ESC/Java2 [5], JavaPathFinder [25], Sireum/Kiasan [7], Jahob [4], Dafny [20] and TACO. Notice that while Jahob does not claim to be an automatic tool for bug finding, we chose it because it provides an expressive specification language and interfaces with several state-of-the-art SMT-solvers. Since JPF does not include a way to prescribe the number of loop unrolls, we unrolled the loop as many times as necessary in the source code. When a tool runs out of memory after running N minutes, we report it as "OofM(N)".
We computed a bound for TACO in 27:04 using one iteration of the algorithm from Section 3.2. Table 3 shows that it is often not necessary to compute the tight bound; thinning the default bound with a few iterations of the algorithm already achieves a significant speed-up in analysis time. The debugging process consists in running a tool (such as TACO, JForge, etc.) and, if a bug is found, correcting the error and starting over to look for further bugs. Unlike JForge (where each analysis is independent of the previous ones), the same bound can be used by TACO when looking for all the bugs in the code. Therefore, the time required for computing the bound can be amortized among these bugs. Since the bound does not depend on the number of unrolls, in Table 3 we have divided 27:04 among the 7 experiments, adding 03:52 to each experiment. Time is reported as "bound computation time" + "SAT-solving time".
LU JForge ESC/Java2 JPF Kiasan Jahob TACO
4 OofM(227) OofM(206) TO OofM(4) 03:03:19 03:52 + 03:56
6 TO OofM(207) TO OofM(4) 05:05:29 03:52 + 31:14
8 OofM(287) OofM(213) TO OofM(4) 07:39:01 03:52 + 33:23
10 05:40:22 OofM(215) TO OofM(4) TO 03:52 + 00:11
12 06:53:04 OofM(219) TO OofM(4) TO 03:52 + 03:30
15 24:08 OofM(219) TO OofM(4) TO 03:52 + 15:00
20 TO OofM(218) TO OofM(4) TO 03:52 + 00:06
Table 3. Outcome of the analysis maxCS = 20. Ten hours timeout.
As expected, Jahob failed to prove that the code was correct, but did not report whether a bug was found.
We also compared with Boogie [2] using Z3 [6] as the back-end
SMT solver. In order to produce Boogie code we used Dafny [20]
as the high-level programming and specification language. When run on the BM code with 10 loop unrolls, Boogie produced on the order of 50 warning messages signaling potential bugs. A careful inspection allowed us to conclude that all warnings produced by Boogie were spurious.
Since most tools failed to find the bug with maxCS = 20, we
also considered a version of the code with up to 2 loop unrolls
and varying values for maxCS; in this way the bug can be found in
smaller heaps. Table 4 reports the corresponding analysis times. In
TACO we have restricted the algorithm that computes the bound
for each scope to run at most 30 minutes.
mCS JForge ESC/Java2 JPF Kiasan Jahob TACO
5 00:13 OofM(187) 00:07 00:18 01:15:23 01:21 + 00:01
10 05:13 OofM(212) 00:20 00:43 01:16:17 02:25 + 00:11
13 OofM(529) OofM(221) 00:38 OofM(3) 01:16:17 05:27 + 00:32
15 OofM(334) OofM(214) 00:53 OofM(3) 01:14:29 21:31 + 00:15
18 14:04 OofM(200) 01:27 OofM(4) 01:17:35 30:00 + 02:27
20 OofM(494) OofM(556) 02:17 OofM(4) 01:17:02 30:00 + 02:11
Table 4. Up to 2 unrolls and varying maxCS. Ten hours timeout.
The code has a bug that requires building a non-trivial heap to
expose it. The technique introduced in this article made TACO the
only tool capable of finding the bug in all cases reported in Tables
3 and 4. When the size of the code is small (2 loop unrolls in Table
4), tools based on model checking were able to find the bug. They
failed on larger code, which shows that in the example TACO scales
better. Tools based on SMT solving systematically failed to expose
the bug. The reason is that while SMT solving is a more powerful
technique that allows to actually verify properties of code, it does
so in very restricted scenarios and requires writing specifications in
a form amenable to the SMT solver. On the other hand, SAT-based
bug finding as reported in this article offers a more limited kind of
analysis, but can be used as an actual push-button technology.
5. Related Work
In Section 3.1 we have analyzed related work on heap canonical-
ization. In Section 4 we have compared our tool with several other
state-of-the-art tools for code analysis. In this section we review
related (but difficult to compare experimentally) work.
The Alloy Annotation Language (AAL) was introduced in [18]. It allows annotating Java-like code using Alloy as the annotation language. The translation proposed in [18] does not differ in major ways from the one we implement. Analysis using AAL does not include any computation of bounds for fields.
In [24] the authors present a set of rules to be applied during the translation to a SAT formula in order to profit from properties of functional relations. The article presents a case study where insertion in a red-black tree is analyzed. The part of the red-black tree invariant that constrains trees to not have two consecutive red nodes is shown to be preserved. In our experiment we prove that the complete (more complex) invariant is preserved. Actually, for 8 loop unrolls and scope 7 for nodes and data, the analysis time decreases from 08:53 (for the complete property we analyze) to 0.153 seconds using the weakened property.
Saturn [26] is also a SAT-based static analysis tool, targeting C. It uses as its main techniques a slicing algorithm and function summaries. While, as in our case, sequential code is faithfully modeled at the intraprocedural level (no abstractions are used), summaries of called functions may produce (unlike in TACO) spurious counterexamples.
Saturn can check assertions written as C “assert” statements. Its
assertion language is not as declarative as our extension of JML.
F-Soft [15] also analyzes C code. It computes ranges for values of integer-valued variables and for pointers, under the hypothesis that runs have bounded length. It is based on the framework presented in [22]. Our technique produces tighter upper bounds because it does not compute feasible intervals for variables, but instead checks each individual value. An advantage of F-Soft is that while we must provide initial bounds for the analysis, F-Soft starts without bounds and infers useful bounds. This is something we look forward to integrating into our approach.
6. Conclusions and Further Work
This article makes a case for SAT-based bug finding to become a mainstream method for program analysis. It shows that a methodology based on (1) adding appropriate constraints to SAT problems, and (2) using the constraints to remove unnecessary variables, makes SAT-solving a method for program analysis as effective as model checking or SMT-solving.
The experimental results presented in the article show that bounds can be computed effectively, and that once bounds have been computed, the analysis time improves enormously. This allowed us to analyze real code, using domain scopes well beyond the capabilities of current similar techniques. Still, this article presents a naive approach to bound computation, and is a starting point for a new research line on efficient bound computation.
We are developing a tool for the parallel analysis of Alloy models called ParAlloy. Preliminary results show that combining TACO with ParAlloy will produce a further speed-up of several orders of magnitude with respect to the times presented in this article. It is worth emphasizing that the kind of infrastructure we used during the parallel computation of bounds is inexpensive, and should be accessible to any small or medium-sized software development company.
7. Acknowledgements
We want to thank Darko Marinov for his careful comments on a previous version of the article; Kuat Yessenov and Greg Dennis for their support on JForge; Karen Zee for her help with writing Jahob models; Diego Garbervetsky for his support with Dafny; and Esteban Mocskos for his support on cluster computing.
References
[1] Andoni, A., Daniliuc, D., Khurshid, S. and Marinov, D., Evaluating the "Small Scope Hypothesis", unpublished, downloadable from http://sdg.csail.mit.edu/publications.html.
[2] Barnett M., Chang B.E, DeLine R., Jacobs B., Leino K.R.M.
Boogie: A Modular Reusable Verifier for Object-Oriented
Programs. FMCO 2005: pp. 364–387.
[3] Belt, J., Robby and Deng X., Sireum/Topi LDP: A Lightweight
Semi-Decision Procedure for Optimizing Symbolic Execution-
based Analyses, Technical Report SAnToS-TR2009-03-16,
2009
[4] Bouillaguet Ch., Kuncak V., Wies T., Zee K., Rinard M.C.,
Using First-Order Theorem Provers in the Jahob Data Struc-
ture Verification System. VMCAI 2007: 74-88
[5] Chalin P., Kiniry J.R., Leavens G.T., Poll E. Beyond Asser-
tions: Advanced Specification and Verification with JML and
ESC/Java2. FMCO 2005: 342-363
[6] Mendonc¸a de Moura L., Bjørner N. Z3: An Efficient SMT
Solver. TACAS 2008, pp. 337–340.
[7] Deng, X., Robby, Hatcliff, J., Towards A Case-Optimal Sym-
bolic Execution Algorithm for Analyzing Strong Properties of
Object-Oriented Programs, in Proceedings of SEFM 2007,
pp. 273-282.
[8] Dennis, G., Chang, F., Jackson, D.,. Modular Verification of
Code with SAT. in Proceedings of ISSTA’06, pp. 109–120.
[9] Dennis, G., Yessenov, K., Jackson D., Bounded Verification of
Voting Software. Second IFIP Working Conference on Ver-
ified Software: Theories, Tools, and Experiments (VSTTE
2008) . Toronto, Canada, October 2008.
[10] Flanagan, C., Leino, R., Lillibridge, M., Nelson, G., Saxe, J.,
Stata, R., Extended static checking for Java, In Proceedings of
PLDI 2002, pp. 234–245.
[11] Frias, M. F., Galeotti, J. P., Lopez Pombo, C. G., Aguirre, N.,
DynAlloy: Upgrading Alloy with Actions, in Proceedings of
ICSE’05, pp. 442–450.
[12] Frias, M. F., Lopez Pombo, C. G., Galeotti, J. P., Aguirre,
N., Efficient Analysis of DynAlloy Specifications, in ACM-
TOSEM, Vol. 17(1), 2007.
[13] Galeotti, J. P., Frias, M. F., DynAlloy as a Formal Method
for the Analysis of Java Programs, in Proceedings of IFIP
Working Conference on Software Engineering Techniques,
Warsaw, 2006, Springer.
[14] Iosif R., Symmetry Reduction Criteria for Software Model
Checking. SPIN 2002: 22-41
[15] Ivanˇ
ci´
c, F., Yang, Z., Ganai, M.K., Gupta, A., Shlyakhter, I.,
Ashar, P., F-Soft: Software Verification Platform. In Proceed-
ings of CAV’05, pp. 301–306.
[16] Jackson, D., Software Abstractions. The MIT Press, 2006.
[17] Jackson, D., Vaziri, M., Finding bugs with a constraint solver,
in Proceedings of ISSTA’00, pp. 14-25.
[18] Khurshid, S., Marinov, D., Jackson, D., An analyzable an-
notation language. In Proceedings of OOPSLA 2002, 2002,
pp. 231-245.
[19] Khurshid, S., Marinov, D., Shlyakhter, I., Jackson, D., A Case
for Efficient Solution Enumeration, in Proceedings of SAT
2003, LNCS 2919, pp. 272–286.
[20] Leino K.R.M., Specification and verification of Object-
Oriented Software, Lecture Notes from Marktoberdorf Inter-
national Summer School 2008.
[21] Musuvathi M., Dill, D. L., An Incremental Heap Canonical-
ization Algorithm, in Proceedings of SPIN 2005: 28-42
[22] Rugina, R., Rinard, M. C., Symbolic bounds analysis of point-
ers, array indices, and accessed memory regions, in Proceed-
ings of PLDI 2000, pp. 182–195, 2000.
[23] Torlak E., Jackson, D., Kodkod: A Relational Model Finder.
in Proceedings of TACAS ’07, LNCS 4425, pp. 632–647.
[24] Vaziri, M., Jackson, D., Checking Properties of Heap-
Manipulating Procedures with a Constraint Solver, in Pro-
ceedings of TACAS 2003, pp. 505-520.
[25] Visser W., Havelund K., Brat G., Park S. and Lerda F., Model
Checking Programs, ASE Journal, Vol.10, N.2, 2003.
[26] Xie, Y., Aiken, A., Saturn: A scalable framework for error de-
tection using Boolean satisfiability. in ACM TOPLAS, 29(3):
(2007).