A Relational Approach to
Interprocedural Shape Analysis
BERTRAND JEANNET
and
ALEXEY LOGINOV
and
THOMAS REPS
and
MOOLY SAGIV
This paper addresses the verification of properties of imperative programs with recursive procedure
calls, heap-allocated storage, and destructive updating of pointer-valued fields, i.e., interprocedural
shape analysis. The paper makes three contributions:
- It introduces a new method for abstracting relations over memory configurations for use in
  abstract interpretation.
- It shows how this method furnishes the elements needed for a compositional approach to shape
  analysis. In particular, abstracted relations are used to represent the shape transformation
  performed by a sequence of operations, and an overapproximation to relational composition
  can be performed using the meet operation of the domain of abstracted relations.
- It applies these ideas in a new algorithm for context-sensitive interprocedural shape analysis.
  The algorithm creates procedure summaries using abstracted relations over memory
  configurations, and the meet-based composition operation provides a way to apply the summary
  transformer for a procedure P at each call site from which P is called.
The algorithm has been applied successfully to establish properties of both (i) recursive programs
that manipulate lists, and (ii) recursive programs that manipulate binary trees.
Categories and Subject Descriptors: D.2.4 [Software Engineering]: Software/Program
Verification - Assertion checkers; D.2.5 [Software Engineering]: Testing and Debugging - symbolic
execution; D.3.3 [Programming Languages]: Language Constructs and Features - data types
and structures; dynamic storage management; Procedures, functions and subroutines;
Recursion; E.1 [Data]: Data Structures - Lists, stacks and queues; Trees; E.2 [Data]: Data Storage
A preliminary version of this paper appeared in the proceedings of the 11th Int. Static Analysis
Symposium (SAS) (Verona, Italy, August 26-28, 2004) [Jeannet et al. 2004].
This work was supported in part by ONR under grants N00014-01-1-0796 and N00014-01-1-0708,
and by NSF under grants CCR-9986308, CCF-0540955, and CCF-0524051.
Affiliations: Bertrand Jeannet; INRIA; Bertrand.Jeannet@inrialpes.fr. Alexey Loginov;
GrammaTech, Inc.; alexey@grammatech.com. Thomas Reps; Comp. Sci. Dept., University of
Wisconsin, and GrammaTech, Inc.; reps@cs.wisc.edu. Mooly Sagiv; School of Comp. Sci., Tel
Aviv University; msagiv@post.tau.ac.il.
When the research reported in the paper was carried out, Bertrand Jeannet was affiliated
with INRIA or visiting the University of Wisconsin, and Alexey Loginov was affiliated with the
University of Wisconsin.
Permission to make digital/hard copy of all or part of this material without fee for personal
or classroom use provided that the copies are not made or distributed for profit or commercial
advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and
notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish,
to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 20YY ACM 0000-0000/20YY/0000-0001 $5.00
ACM Journal Name, Vol. V, No. N, Month 20YY, Pages 1–0??.
Representations—composite structures; linked representations; F.3.1 [Logics and Meanings of
Programs]: Specifying and Verifying and Reasoning about Programs—Assertions; Invariants
General Terms: Algorithms, Languages, Theory, Verification
Additional Key Words and Phrases: Abstract interpretation, context-sensitive analysis,
interprocedural dataflow analysis, destructive updating, pointer analysis, shape analysis, static analysis,
3-valued logic
1. INTRODUCTION
This paper concerns techniques for static analysis of recursive programs that manipulate
heap-allocated storage and perform destructive updating of pointer-valued
fields. The goal is to recover shape descriptors that provide information about the
characteristics of the data structures that a program’s pointer variables can point
to. Such information can be used to help programmers understand certain aspects
of the program’s behavior, to verify properties of the program, and to optimize or
parallelize the program.
The work reported in the paper builds on past work by several of the authors
on static analysis based on 3-valued logic [Sagiv et al. 2002; Reps et al. 2003] and
its implementation in the TVLA system [Lev-Ami and Sagiv 2000]. In this setting,
two related logics come into play: an ordinary 2-valued logic, as well as a related
3-valued logic. A memory configuration, or store, is modeled by what logicians
call a logical structure, which consists of a predicate (i.e., a relation of appropriate
arity) for each predicate symbol of a vocabulary P. A store is modeled by a 2-valued
logical structure; a set of stores is abstracted by a (finite) set of bounded-size
3-valued logical structures. An individual of a 3-valued structure’s universe either
models a single memory cell or, in the case of a summary individual, a collection of
memory cells.
The constraint of working with limited-size descriptors entails a loss of information
about the store. Certain properties of concrete individuals are lost due to
abstraction, which groups together multiple individuals into summary individuals:
a property can be true for some concrete individuals of the group but false for other
individuals. It is for this reason that 3-valued logic is used; uncertainty about a
property’s value is captured by means of the third truth value, 1/2.
One of the opportunities for scaling up this approach is to exploit the compositional
structure of programs. In interprocedural dataflow analysis, one avenue
for accomplishing this is to create a summary transformer for each procedure P,
and use the summary transformer at each call site at which P is called. Each
summary transformer must capture (an overapproximation of) the net effect of a
call on P. To be able to create summary transformers, the abstract transformers
for individual transitions must have a “composable representation”; that is, given
the representations of two abstract transformers, it must be possible to represent
their composition as an object of roughly the same size. One then carries out a
fixpoint-finding procedure on a collection of equations in which each variable in the
equation set has a transformer-valued value (i.e., a value drawn from the domain
of transformers) rather than a dataflow value proper.
A number of approaches to interprocedural dataflow analysis based on summary
transformers are known [Cousot and Cousot 1977; Sharir and Pnueli 1981; Knoop
and Steffen 1992; Reps et al. 1995; Sagiv et al. 1996; Reps et al. 2005]. However, not
all program-analysis problems have abstract transformers that have a composable
representation.
For some problems, it is possible to address this issue by working pointwise,
tabulating composed transformers using either (i) sets of pairs that consist of an
input abstract value and an output abstract value [Sharir and Pnueli 1981], or (ii)
finer-granularity sets of pairs that capture how parts of an input abstract value
influence parts of an output abstract value [Reps et al. 1995; Sagiv et al. 1996; Ball
and Rajamani 2001]. In essence, these approaches start with the kinds of objects
used in intraprocedural analysis and pair them together to create the objects that
are used in interprocedural analysis.
However, for interprocedural shape analysis, tabulating pairs of 3-valued
structures (the kinds of objects used in intraprocedural shape analysis) has significant
drawbacks insofar as precision is concerned: in the 3-valued-logic approach
to shape analysis, individuals, which model memory cells, do not have fixed identities;
they are identified only up to their “distinguishing characteristics”, namely,
their values for a specific set of unary predicates. Because these “distinguishing
characteristics” can change during the course of a procedure call, there is no way to
identify individuals in an input abstract structure with their corresponding individuals
in the output abstract structure. In essence, a pair of input/output 3-valued
structures loses track of the correlations between the input and output values of
an individual’s unary predicates. Consequently, an approach based on tabulating
composed transformers as sets of pairs of 3-valued structures provides only a weak
characterization of a procedure’s net effect, and is fundamentally limited in the
properties that it can express.
All is not lost, however: instead of “abstracting and then pairing” (as discussed
above), the solution is to “pair and then abstract”.
Observation 1.1. By using a 3-valued structure over a doubled vocabulary $P \uplus P'$,
where $P' = \{p' \mid p \in P\}$ and $\uplus$ denotes disjoint union, one can obtain a finite
abstraction that relates the predicate values for an individual at the beginning of a
transition to the predicate values for the individual at the end of the transition.
This approach provides a way to create much more accurate composable representations
of transformers, and hence much more accurate summary transformers, for
a broad class of problems. The advantages come from two effects:
- The addition of the second vocabulary changes the abstraction in use because
  individuals now have additional “distinguishing characteristics” [Sagiv et al.
  2002].
- The second vocabulary helps permit the changes in a predicate to be tracked
  over a sequence of operations [Lev-Ami et al. 2000].
The benefit of these properties is that, in many cases, a relationship on the before
and after values of a predicate can be tracked on individual locations or tuples of locations,
over a sequence of operations, even when abstraction has been performed.
The consequence is that two-vocabulary 3-valued structures provide more precise
descriptors of relations between stores than an approach based on pairing abstract
stores from an existing store abstraction.
Moreover, by extending the abstract domain of 3-valued logical structures with
some new operations, it is possible to perform abstract interpretation of call and
return statements without losing too much precision (see §6 and §7). We have
used these ideas to create a context-sensitive shape-analysis algorithm for recursive
programs that manipulate heap-allocated storage and perform destructive updating.
The “pair and then abstract” principle of Observation 1.1 is related to several
well-known concepts:
Pairing without abstraction. The use of a doubled vocabulary is standard in
logic-based reasoning about execution behavior: the transition relations of a language’s
concrete semantics are often expressed by means of formulas over present-state
and next-state variables (e.g., [Gries 1981; Manna and Pnueli 1995; Clarke
et al. 1999]). For instance, the semantics of a statement x := y+1 can be expressed
as the formula $(x' = y + 1) \land (y' = y)$. Similarly, a procedure’s postcondition is
often expressed using such a doubled vocabulary (i.e., the postcondition expresses
a relation over input stores and output stores).
Pairing and then numeric abstraction. For analyzing programs that manipulate
numeric data, a composable abstract transformer for a statement such as x := y+1
can be created directly from the formula $(x' = y + 1) \land (y' = y)$ when using
the polyhedral abstract domain [Cousot and Halbwachs 1978]. The number of
dimensions in each polyhedron used by the analyzer is double the number $|V|$ of
numeric variables V about which the analyzer is trying to obtain information. Each
program variable has a primed and an unprimed version, and a polyhedron captures
linear relations among the $2|V|$ variables.
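To make composability concrete, here is a small worked instance of our own. The second statement, y := x + y, is an assumption chosen purely for illustration and does not appear in the paper. Composing the relation for x := y+1 with the relation for y := x + y amounts to conjoining the two formulas over a shared intermediate (single-primed) vocabulary and then eliminating that vocabulary:

```latex
\begin{align*}
R_1 &\equiv (x' = y + 1) \land (y' = y)
    && \text{relation for } x := y + 1 \\
R_2 &\equiv (x'' = x') \land (y'' = x' + y')
    && \text{relation for } y := x + y \\
R_2 \circ R_1 &\equiv \exists x', y' .\; R_1 \land R_2 \\
    &\equiv (x'' = y + 1) \land (y'' = 2y + 1)
\end{align*}
```

The result is again a conjunction of linear constraints relating an input vocabulary to an output vocabulary, so it has the same kind of representation as $R_1$ and $R_2$.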
In this paper, we use Observation 1.1 to create composable abstract transformers for
programs that manipulate non-numeric data. Our work provides a new approach to
performing context-sensitive interprocedural shape analysis, and allows us to verify
properties of imperative programs with recursive procedure calls, heap-allocated
storage, and destructive updating of pointer-valued fields.
The contributions of our work include the following:
(1) We introduce a new method for abstracting relations over memory configurations
    for use in abstract interpretation.
(2) We show how this method furnishes the elements needed for a compositional
    approach to shape analysis. In particular, abstracted relations are used to
    represent the shape transformation performed by a sequence of operations, and
    an overapproximation to relational composition can be performed using the
    meet operation of the domain of abstracted relations.
(3) We apply these ideas in a new algorithm for context-sensitive interprocedural
    shape analysis. The algorithm creates procedure summaries using abstracted
    relations over memory configurations, and the meet-based composition operation
    provides a way to apply the summary transformer for a procedure P at
    each call site from which P is called.
We have been able to apply this approach successfully to establish properties of both
(i) recursive programs that manipulate lists, and (ii) recursive programs that manipulate
binary trees. While list-manipulation programs can often be implemented
in tail-recursive fashion, and hence can be converted easily into loop programs,
tree-manipulation programs are much less easily converted to nonrecursive form.
In particular, the shape properties that characterize sorted binary trees are complex
and rely on global properties, whereas the shape properties that characterize
sorted lists are mostly local properties, with cyclicity properties being the main
exception.

    typedef struct node {
        struct node *n;
        int data;
    } *List;

    List res;
    List rev(List x);   /* forward declaration */

    void main(List l) {
        res = rev(l);
    }

    List rev(List x) {
        List y, z;
        z = x->n;
        x->n = NULL;
        if (z != NULL) {
            y = rev(z);
            z->n = x;
        }
        else y = x;
        return y;
    }

Fig. 1. Recursive list-reversal program. The recursive function rev destructively reverses a
non-empty, acyclic, singly-linked list using recursion to traverse the list.
Organization. The remainder of the paper is organized as follows: §2 presents,
at a semi-formal level, several of the principles that lie behind our approach. §3
presents some background on 2-valued and 3-valued logic. §4 defines the language
to which our analysis applies, and gives a concrete semantics, based on the use
of 2-valued logical structures for representing memory configurations. §5 describes
the abstraction of 2-valued logical structures with bounded-size 3-valued logical
structures [Sagiv et al. 2002]. Our interprocedural shape analysis is based on a
relational semantics, which establishes at each control point a relation between
the input state of the enclosing procedure and the state at the current point. This
semantics requires the ability to represent relations between memory configurations,
which presents certain difficulties at the abstract level. §6 addresses this problem
by abstracting relations between memory configurations using the same principles
as those used to abstract sets of memory configurations in §5. §7 describes the
interprocedural shape-analysis algorithm that we developed based on these ideas.
§8 presents experimental results. §9 discusses related work.
2. OVERVIEW
In this section, we discuss at a semi-formal level the “pairing” aspect of Obs. 1.1
(“pair and then abstract”). Abstraction is the subject of §5. §7 applies the “pair
and then abstract” principle in the context of interprocedural shape analysis.
Consider nonempty, acyclic, singly-linked lists constructed from nodes of the
type List whose declaration is given in Fig. 1. One of the issues discussed below
concerns how to create a summary transformer for a procedure that reverses a list,
using destructive updating. The summary transformer that we give applies both to
recursive and nonrecursive destructive list-reversal procedures. Because summary
transformers (also known as “procedure summaries”) are particularly useful for analyzing
recursive programs, the running example used in later sections of the paper
is the recursive list-reversal program shown in Fig. 1. That procedure destructively
reverses a nonempty, acyclic, singly-linked list using recursion to traverse the list.

(a)
    [1] a = <a 4-element list>; b = NULL; p = NULL;
    [2] b = rev(a);
    [3] p = b->n;
    [4] ...
(b)
    [1] a = <a 4-element list>; b = NULL; c = NULL;
    [2] b = rev(a);
    [3] c = rev(b);
    [4] ...
Fig. 2. Examples to illustrate one-vocabulary structures, two-vocabulary structures, transformer
application, and procedure summaries.

[Figure: diagrams of three four-node lists. (a) $S$, with pointer label a and n-edges; (b) $S'$, with
pointer labels a and b and n-edges; (c) $S''$, with pointer labels a, b, and p and n-edges.]
Fig. 3. (a) The (one-vocabulary) structure that represents a four-element acyclic list that is
pointed to by a; (b) the (one-vocabulary) structure that represents the list from (a) after the
operation “[2] b = rev(a);”; (c) the (one-vocabulary) structure that represents the list from (b)
after the operation “[3] p = b->n;”.
In the remainder of this section, we discuss the two code fragments shown in
Fig. 2. Fig. 3 depicts three four-element, singly-linked, acyclic lists. The nodes of
each graph represent memory cells. An address-valued program variable (“pointer
variable”) that points to a given memory cell is represented by an arrow from the
variable name to the node for the cell. (A pointer variable whose value is NULL is
not shown.) The other arrows in the graph, labeled with n, represent the values of
cells’ n-fields. Fig. 3(a), (b), and (c) represent lists that arise just before lines [2],
[3], and [4] of Fig. 2(a), respectively.
Two Kinds of Pairing. Figs. 4 and 5 illustrate two different kinds of pairing
operations that can be performed on lists:
- Fig. 4(a) depicts a pair of one-vocabulary structures that represent the net
  transformation from just before line [2] of Fig. 2(a) to just before line [3];
  Fig. 4(b) depicts a pair of one-vocabulary structures that represent the net
  transformation from just before line [2] of Fig. 2(a) to just before line [4].
- Fig. 5(a) depicts a two-vocabulary structure that represents the net transformation
  from just before line [2] of Fig. 2(a) to just before line [3]; Fig. 5(b)
  depicts a two-vocabulary structure that represents the net transformation from
  just before line [2] of Fig. 2(a) to just before line [4].

[Figure: (a) $\langle S, S' \rangle$: a pair of list diagrams, the first with pointer label a and n-edges, the
second with pointer labels $a'$ and $b'$ and $n'$-edges. (b) $\langle S, S'' \rangle$: a pair of list diagrams, the first
with pointer label a and n-edges, the second with pointer labels $a''$, $b''$, and $p''$ and $n''$-edges.]
Fig. 4. Pairs of one-vocabulary structures that represent (a) the net transformation from just
before line [2] of Fig. 2(a) to just before line [3]; (b) the net transformation from just before
line [2] of Fig. 2(a) to just before line [4].

[Figure: (a) $S^{\langle\cdot,\prime\rangle}$: a single set of nodes with pointer labels a, $a'$ (on the same node) and $b'$,
carrying both n-edges and $n'$-edges. (b) $S^{\langle\cdot,\prime\prime\rangle}$: a single set of nodes with pointer labels a, $a''$
(on the same node), $b''$, and $p''$, carrying both n-edges and $n''$-edges.]
Fig. 5. Two-vocabulary structures that represent (a) the net transformation from just before
line [2] of Fig. 2(a) to just before line [3]; (b) the net transformation from just before line [2]
of Fig. 2(a) to just before line [4]. (The superscript in each structure’s name indicates what
vocabularies are present in the structure; “·” stands for “unprimed”.)
A two-vocabulary structure has a single set of memory cells that are structured
using two vocabularies. In Fig. 5(a), one vocabulary is $\{a, b, p, n\}$; the second
vocabulary is $\{a', b', p', n'\}$. In Fig. 5(b), the two vocabularies are $\{a, b, p, n\}$ and
$\{a'', b'', p'', n''\}$.¹ (In Fig. 4(a) and (b), we have used single-primed and double-primed
vocabularies in the respective second-component structures to emphasize
how they correspond to the two-vocabulary structures of Fig. 5(a) and (b). Strictly
speaking, these should have been unprimed vocabularies.)
Even though we have drawn the list in the second component of the pair shown
in Fig. 4(a) so that each $n'$-edge appears to have been reversed from the n-edge
in the first component, we have not given names to the nodes, and thus Fig. 4(a)
does not contain sufficient information to ensure that each of the original edges has,
in fact, been reversed.²
¹ Variables b, p, and $p'$ do not appear in Fig. 5(a) because they have the value NULL. Likewise,
variables b and p do not appear in Fig. 5(b) because they have the value NULL.
² Although it would be easy to give indelible names to nodes in each concrete list, it will become
apparent in §5 that this is not the case for nodes in abstract lists. The discussion in this section is
intended to convey, using concrete lists, how we overcome the lack of indelible names for nodes
in abstract lists.
In contrast, because there is a unique set of nodes in the two-vocabulary structure
of Fig. 5(a), we know that for each n-edge there is a corresponding reversed $n'$-edge,
and vice versa.
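The difference between the two kinds of pairing can be sketched in code. The following is a minimal illustration for concrete lists only; the type `TwoVocab`, the constant `NO_EDGE`, and the function `edges_reversed` are hypothetical names introduced here, not part of the paper or of TVLA. Because both edge relations are indexed by the same node numbers, the reversal property can be checked edge by edge, which is what a pair of one-vocabulary structures with unnamed nodes does not permit.

```c
#include <assert.h>
#include <stddef.h>

#define NO_EDGE (-1)

/* A two-vocabulary structure over one shared set of nodes 0..count-1.
   n[u]       = target of u's n-edge before the operation, or NO_EDGE
   n_prime[u] = target of u's n'-edge after the operation, or NO_EDGE */
typedef struct {
    size_t count;
    const int *n;
    const int *n_prime;
} TwoVocab;

/* Returns 1 iff every n-edge u -> v has a matching n'-edge v -> u,
   and every n'-edge u -> w has a matching n-edge w -> u. */
int edges_reversed(const TwoVocab *s)
{
    for (size_t u = 0; u < s->count; u++) {
        int v = s->n[u];
        if (v != NO_EDGE) {
            assert(v >= 0 && (size_t)v < s->count);
            if (s->n_prime[v] != (int)u) return 0;
        }
        int w = s->n_prime[u];
        if (w != NO_EDGE) {
            assert(w >= 0 && (size_t)w < s->count);
            if (s->n[w] != (int)u) return 0;
        }
    }
    return 1;
}
```

For the list of Fig. 5(a), with nodes numbered 0 to 3 from the original head, one would set `n = {1, 2, 3, NO_EDGE}` and `n_prime = {NO_EDGE, 0, 1, 2}`; `edges_reversed` then reports that the property holds.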
Transformer Application. Let $\tau$ denote the transformation produced by the statement
“[3] p = b->n;” in line [3] of Fig. 2(a). Consider three ways of depicting the
effect:
- In terms of one-vocabulary structures, the transformation amounts to passing
  from Fig. 3(b) to Fig. 3(c):
  $\tau(S') = S''$.
- In terms of pairs of one-vocabulary structures, the transformation amounts to
  passing from Fig. 4(a) to Fig. 4(b):
  $\tau(\langle S, S' \rangle) = \langle S, \tau(S') \rangle = \langle S, S'' \rangle$.
- In terms of two-vocabulary structures, the transformation amounts to passing
  from Fig. 5(a) to Fig. 5(b):
  $\tau(S^{\langle\cdot,\prime\rangle}) = S^{\langle\cdot,\prime\prime\rangle}$,
  where the superscript indicates what vocabularies are included in the structure
  (“·” stands for “unprimed”).
Two-Vocabulary Structures as Procedure Summaries. Both (i) a pair of one-vocabulary
structures, and (ii) a two-vocabulary structure provide a way to represent
the net transformation performed by an operation (or a sequence of operations).
However, as illustrated above, in the absence of indelible names for nodes,
a two-vocabulary structure can represent information more precisely than a pair of
one-vocabulary structures, and thus a two-vocabulary structure can provide a more
precise procedure summary than a pair of one-vocabulary structures.
In the remainder of this section, we discuss the code fragment shown in
Fig. 2(b). Structure $S_2^{\langle\cdot,\prime\rangle}$ in Fig. 6(a) summarizes the transformation performed by
“[2] b = rev(a);”, and structure $S_3^{\langle\prime,\prime\prime\rangle}$ in Fig. 6(b) summarizes the transformation
performed by “[3] c = rev(b);”.
Transformer Composition. The result of composing the transformations represented
by two two-vocabulary structures can be expressed as another two-vocabulary
structure. For instance, consider the two-vocabulary structure $S_{2;3}^{\langle\cdot,\prime\prime\rangle}$ shown in
Fig. 7, which represents the result of composing Fig. 6(b) with Fig. 6(a) to obtain a
two-vocabulary structure for the sequence “[2] b = rev(a); [3] c = rev(b);”.

[Figure: table with columns "Operation" and "Resulting Structure". (a) [2] b = rev(a); gives
$S_2^{\langle\cdot,\prime\rangle}$: one set of nodes with pointer labels a, $a'$ (on the same node) and $b'$, carrying n-edges and
$n'$-edges. (b) [3] c = rev(b); gives $S_3^{\langle\prime,\prime\prime\rangle}$: one set of nodes with pointer labels $b'$, $b''$ (on the same
node), $a'$, $a''$ (on the same node), and $c''$, carrying $n'$-edges and $n''$-edges.]
Fig. 6. The two-vocabulary structures that summarize (a) the transformation performed by
“[2] b = rev(a);”, and (b) the transformation performed by “[3] c = rev(b);”.

[Figure: $S_{2;3}^{\langle\cdot,\prime\prime\rangle}$: one set of nodes with pointer labels a, $a''$ (on the same node), $b''$, and $c''$,
carrying n-edges and $n''$-edges.]
Fig. 7. The two-vocabulary structure $S_{2;3}^{\langle\cdot,\prime\prime\rangle}$ represents the net transformation performed by the
sequence “[2] b = rev(a); [3] c = rev(b)”. Note that for each (unprimed) n-edge there is a
corresponding (double-primed) $n''$-edge, and vice versa.

The composition of the transformations represented by two two-vocabulary structures
can be expressed in terms of a meet operation on three-vocabulary structures.
To explain this, we introduce the graphical notation of dotted edges to represent unknown
information (i.e., with truth value 1/2). For instance, Fig. 8(a) and Fig. 8(b)
show two three-vocabulary structures $S_2^{\langle\cdot,\prime,1/2\prime\prime\rangle}$ and $S_3^{\langle 1/2,\prime,\prime\prime\rangle}$, respectively, where
the symbol 1/2 in the superscript of a structure name indicates that the structure
has only unknown information for a given vocabulary. Note that $S_2^{\langle\cdot,\prime,1/2\prime\prime\rangle}$ and
$S_3^{\langle 1/2,\prime,\prime\prime\rangle}$ are three-vocabulary structures that correspond to the two-vocabulary
structures $S_2^{\langle\cdot,\prime\rangle}$ and $S_3^{\langle\prime,\prime\prime\rangle}$ from Fig. 6, respectively.
We introduce the meet operation ($\sqcap$), where “unknown” $\sqcap$ “definite information”
yields “definite information”.³ With this notation, the composition $S_3^{\langle\prime,\prime\prime\rangle} \circ S_2^{\langle\cdot,\prime\rangle}$ of
the transformations represented by two two-vocabulary structures $S_2^{\langle\cdot,\prime\rangle}$ and $S_3^{\langle\prime,\prime\prime\rangle}$
can be expressed in terms of three-vocabulary structures as
\[
S_3^{\langle\prime,\prime\prime\rangle} \circ S_2^{\langle\cdot,\prime\rangle}
  = \mathit{project}_{1,3}\bigl(S_3^{\langle 1/2,\prime,\prime\prime\rangle} \sqcap S_2^{\langle\cdot,\prime,1/2\prime\prime\rangle}\bigr)
  = S_{2;3}^{\langle\cdot,\prime\prime\rangle}.
\]
The three-vocabulary structure $S_{2;3}^{\langle\cdot,\prime,\prime\prime\rangle}$ obtained from $S_3^{\langle 1/2,\prime,\prime\prime\rangle} \sqcap S_2^{\langle\cdot,\prime,1/2\prime\prime\rangle}$ is shown
in Fig. 9. Finally, by projecting away the “middle” (single-primed) vocabulary from
$S_{2;3}^{\langle\cdot,\prime,\prime\prime\rangle}$, we obtain the two-vocabulary composition result $S_{2;3}^{\langle\cdot,\prime\prime\rangle}$ shown in Fig. 7.
How These Ideas are Used in Relational Shape Analysis. In §5, we introduce a
way to use 3-valued structures as abstractions of sets of 2-valued structures. In
§6, this is extended to using two-vocabulary 3-valued structures as abstractions
of transformations on 2-valued structures. This provides what is needed for a
compositional approach to shape analysis:
- the 3-valued analog of the two-vocabulary version of transformer application
  can be used for intraprocedural propagation;
- the 3-valued analog of transformer composition can be used for interprocedural
  propagation.
³ “Definite information” means “definitely present” (true, denoted by 1) or “definitely absent” (false,
denoted by 0). Thus, $1/2 \sqcap 1 = 1 = 1 \sqcap 1/2$ and $1/2 \sqcap 0 = 0 = 0 \sqcap 1/2$.

[Figure: (a) $S_2^{\langle\cdot,\prime,1/2\prime\prime\rangle}$: one set of nodes with pointer labels a, $a'$ (on the same node) and $b'$,
solid n-edges and $n'$-edges, dotted $n''$-edges, and the labels $a''$, $b''$, $c''$ marked as unknown.
(b) $S_3^{\langle 1/2,\prime,\prime\prime\rangle}$: one set of nodes with pointer labels $b'$, $b''$ (on the same node), $a'$, $a''$ (on the same
node), and $c''$, solid $n'$-edges and $n''$-edges, dotted n-edges, and the labels a, b, c marked as unknown.]
Fig. 8. Three-vocabulary structures for the two-vocabulary structures from Fig. 6. Dotted edges
indicate predicate tuples that have the value 1/2 (and hence correspond to information that is
unknown). In (a), the unprimed and single-primed vocabularies capture the transformation performed
by [2] b = rev(a);, and the information in the double-primed vocabulary (predicates $a''$,
$b''$, $c''$, and $n''$) is unknown. In (b), the single-primed and double-primed vocabularies capture the
transformation performed by [3] c = rev(b);, and the information in the unprimed vocabulary
(predicates a, b, c, and n) is unknown.

[Figure: $S_{2;3}^{\langle\cdot,\prime,\prime\prime\rangle}$: one set of nodes with pointer labels a, $a'$, $a''$ (on the same node), $b'$, $b''$ (on the
same node), and $c''$, carrying n-edges, $n'$-edges, and $n''$-edges.]
Fig. 9. The three-vocabulary structure $S_{2;3}^{\langle\cdot,\prime,\prime\prime\rangle}$ obtained from the meet ($\sqcap$) of the two structures
from Fig. 8: $S_3^{\langle 1/2,\prime,\prime\prime\rangle} \sqcap S_2^{\langle\cdot,\prime,1/2\prime\prime\rangle}$. Note that for each (unprimed) n-edge there is a corresponding
(double-primed) $n''$-edge, and vice versa.
Two-vocabulary 3-valued structures are used as summary transformers for the
shape transformations performed by the possible sequences of operations in each
procedure, and an overapproximation of composition can be performed using the
meet operation on three-vocabulary 3-valued structures. In particular, it is possible
to perform an overapproximation of the composition of the transformations
represented by two two-vocabulary 3-valued structures, $(S^\#)^{\langle\cdot,\prime\rangle}$ and $(T^\#)^{\langle\prime,\prime\prime\rangle}$,
by (i) promoting them to three-vocabulary 3-valued structures $(S^\#)^{\langle\cdot,\prime,1/2\prime\prime\rangle}$ and
$(T^\#)^{\langle 1/2,\prime,\prime\prime\rangle}$, (ii) taking their meet, and (iii) projecting away the middle vocabulary.
(See §6.5.)
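The promote, meet, and project steps can be sketched for the concrete list example of Figs. 6 to 9. This is an illustration under simplifying assumptions that we introduce here: both summaries range over the same four nodes, nodes are aligned by index, and only one binary predicate (the n-field) is tracked per vocabulary. The names `Truth`, `Edge`, `TwoVocab`, `ThreeVocab`, `meet`, `set_chain`, and `compose` are hypothetical; the paper's abstract operation in §6.5 works on 3-valued structures and is more general than this sketch.

```c
#include <assert.h>

#define NODES 4

/* Truth values; UNKNOWN3 plays the role of 1/2. */
typedef enum { FALSE3 = 0, TRUE3 = 1, UNKNOWN3 = 2 } Truth;

/* One binary predicate (an edge relation) over the shared node set. */
typedef Truth Edge[NODES][NODES];

/* A two-vocabulary structure: the edge relation before and after. */
typedef struct { Edge before, after; } TwoVocab;

/* A three-vocabulary structure: unprimed, single-primed, double-primed. */
typedef struct { Edge v0, v1, v2; } ThreeVocab;

/* Meet in the information order: unknown yields to definite information.
   Clears *consistent if the operands are contradictory (0 versus 1). */
Truth meet(Truth a, Truth b, int *consistent)
{
    if (a == UNKNOWN3) return b;
    if (b == UNKNOWN3) return a;
    if (a != b) *consistent = 0;
    return a;
}

/* Fill e with the edges of a 4-node chain, forward (i -> i+1) or reversed. */
void set_chain(Edge e, int forward)
{
    for (int u = 0; u < NODES; u++)
        for (int v = 0; v < NODES; v++)
            e[u][v] = FALSE3;
    for (int i = 0; i + 1 < NODES; i++) {
        if (forward) e[i][i + 1] = TRUE3;
        else         e[i + 1][i] = TRUE3;
    }
}

/* Compose `first` (unprimed to primed) with `second` (primed to
   double-primed). Returns 1 if the meet is consistent, 0 otherwise. */
int compose(const TwoVocab *first, const TwoVocab *second, TwoVocab *out)
{
    ThreeVocab a, b, m;
    int consistent = 1;
    assert(first && second && out);
    for (int u = 0; u < NODES; u++) {
        for (int v = 0; v < NODES; v++) {
            /* (i) promote: pad the missing vocabulary with unknown */
            a.v0[u][v] = first->before[u][v];
            a.v1[u][v] = first->after[u][v];
            a.v2[u][v] = UNKNOWN3;
            b.v0[u][v] = UNKNOWN3;
            b.v1[u][v] = second->before[u][v];
            b.v2[u][v] = second->after[u][v];
            /* (ii) meet, vocabulary by vocabulary */
            m.v0[u][v] = meet(a.v0[u][v], b.v0[u][v], &consistent);
            m.v1[u][v] = meet(a.v1[u][v], b.v1[u][v], &consistent);
            m.v2[u][v] = meet(a.v2[u][v], b.v2[u][v], &consistent);
            /* (iii) project away the middle vocabulary */
            out->before[u][v] = m.v0[u][v];
            out->after[u][v]  = m.v2[u][v];
        }
    }
    return consistent;
}
```

With `first` describing a reversal (forward chain before, reversed chain after) and `second` describing the reversal of that result, `compose` yields a structure whose before and after relations are both the forward chain, mirroring Fig. 7. If the middle vocabularies disagree, the meet is contradictory and the function returns 0.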
3. PRELIMINARIES
3.1 2-Valued First-Order Logic
We briefly discuss definitions related to first-order logic. We assume a vocabulary P
of predicate symbols and a set of variables, usually denoted by $v, v_1, \ldots$. Formulas
are defined by the syntax:
\[
\begin{array}{rcll}
\varphi & ::= & 1 & \text{logical literal} \\
 & \mid & p(v_1,\ldots,v_k) & \text{where $p$ is a predicate symbol of arity $k$} \\
 & \mid & \neg\varphi \;\mid\; \varphi \lor \varphi & \text{logical connectives} \\
 & \mid & \exists v : \varphi & \text{existential quantification}
\end{array}
\tag{1}
\]

\[
\begin{array}{rcl}
[\![1]\!]^S(Z) & = & 1 \\
[\![p(v_1,\ldots,v_k)]\!]^S(Z) & = & \iota(p)(Z(v_1),\ldots,Z(v_k)) \\
[\![\neg\varphi_1]\!]^S(Z) & = & 1 - [\![\varphi_1]\!]^S(Z) \\
[\![\varphi_1 \lor \varphi_2]\!]^S(Z) & = & \max([\![\varphi_1]\!]^S(Z), [\![\varphi_2]\!]^S(Z)) \\
[\![\exists v_1 : \varphi_1]\!]^S(Z) & = & \max_{u \in U} [\![\varphi_1]\!]^S(Z[v_1 \mapsto u])
\end{array}
\]
Table I. Meaning of first-order formulas, given a logical structure
$S = (U,\iota)$ and an assignment $Z$.
For reasons that will be made explicit in the next paragraph, we do not include
the formula $v_1 = v_2$ in the grammar itself. Instead, we assume that the vocabulary
P contains a special predicate symbol eq of arity 2 that will have a special interpretation.
We will write $v_1 = v_2$ and $v_1 \neq v_2$ for $eq(v_1,v_2)$ and $\neg eq(v_1,v_2)$. The
literal 0, the connectives $\Rightarrow$ and $\land$, and the quantifier $\forall v$ are defined in the usual
way, in terms of items in grammar (1). A conditional expression $\varphi_1\ ?\ \varphi_2 : \varphi_3$ is an
abbreviation for $(\varphi_1 \land \varphi_2) \lor (\neg\varphi_1 \land \varphi_3)$. The notion of free variables is defined in
the usual way.
The set $\{0,1\}$ of (2-valued) truth values is denoted by B. A 2-valued logical
structure $S = (U,\iota)$ is a pair, where the universe U is a set of individuals and
the valuation $\iota : P \to \bigcup_{k \geq 0} (U^k \to B)$ maps each predicate symbol of arity k
to a predicate (or truth-valued function). The set of 2-valued structures over a
vocabulary P is denoted by 2-STRUCT[P]. We assume that for any $(U,\iota) \in$
2-STRUCT[P], $\iota(eq)$ is defined by $\iota(eq)(u_1,u_2) = (u_1 = u_2)$.
An assignment $Z : \{v_1,\ldots,v_k\} \to U$ maps free variables (implicitly with respect
to a formula) to individuals. Given a 2-valued logical structure $S = (U,\iota)$ and an
assignment Z of free variables, the (2-valued) meaning of a formula $\varphi$, denoted by
$[\![\varphi]\!]^S(Z)$, is defined in Tab. I by induction on the syntax of $\varphi$. A logical structure
satisfies a closed formula $\varphi$ (i.e., without free variables), denoted by $S \models \varphi$, iff
$[\![\varphi]\!]^S = 1$. For open formulas, satisfaction with respect to assignment Z is defined
by $S, Z \models \varphi$ iff $[\![\varphi]\!]^S(Z) = 1$.
3.2 3-Valued First-Order Logic
We now extend the definitions from §3.1 to 3-valued logic, in which a third truth
value, denoted by 1/2, represents uncertainty. The set $B \cup \{1/2\}$ of 3-valued truth
values is denoted by T, and is partially ordered by the order $l \sqsubset 1/2$ for $l \in B$.
A 3-valued logical structure $S = (U,\iota)$ is almost identical to a 2-valued structure,
except for the fact that $\iota : P \to \bigcup_{k \geq 0} (U^k \to T)$ maps each predicate symbol of arity
k to a 3-valued truth-valued function. The syntax of formulas defined in Eqn. (1)
is extended with the logical literal 1/2, which is given the meaning $[\![1/2]\!]^S = 1/2$.
The meaning of other syntactic constructs is still defined by Tab. I. Note that the
operations “−” and “max” can accept the value 1/2 as an operand.
A 3-valued logical structure potentially satisfies a closed (3-valued) formula $\varphi$,
denoted by $S \models \varphi$, iff $[\![\varphi]\!]^S \in \{1/2, 1\}$. For open formulas, we have $S, Z \models \varphi$ iff
$[\![\varphi]\!]^S(Z) \in \{1/2, 1\}$.
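The way Tab. I carries over to three truth values can be illustrated with a short sketch. The encoding below is our own assumption, not the paper's: truth values are stored doubled as the integers 0, 1, and 2 (standing for 0, 1/2, and 1), so that "1 − x" becomes "2 − x" and "max" is ordinary integer maximum. All names (`Truth3`, `not3`, `or3`, `and3`, `exists3`, `potentially_satisfies`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Doubled encoding: 0 stands for 0, 1 for 1/2, 2 for 1. */
typedef int Truth3;
enum { T3_FALSE = 0, T3_HALF = 1, T3_TRUE = 2 };

/* Negation: 1 - x in the paper, 2 - x under the doubled encoding. */
Truth3 not3(Truth3 a) { return 2 - a; }

/* Disjunction: max of the operands. */
Truth3 or3(Truth3 a, Truth3 b) { return a > b ? a : b; }

/* Conjunction, derived from negation and disjunction. */
Truth3 and3(Truth3 a, Truth3 b) { return not3(or3(not3(a), not3(b))); }

/* Existential quantification: max of the body's value over all individuals.
   values[i] is the body's value when the bound variable maps to individual i. */
Truth3 exists3(const Truth3 *values, size_t count)
{
    Truth3 result = T3_FALSE;
    for (size_t i = 0; i < count; i++)
        result = or3(result, values[i]);
    return result;
}

/* Potential satisfaction: the value is 1/2 or 1. */
int potentially_satisfies(Truth3 v) { return v >= T3_HALF; }
```

Under this encoding, negating 1/2 gives 1/2 again, a disjunction with a true operand is true regardless of the other operand, and a formula whose value is 1/2 is potentially satisfied.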
We refer to [Sagiv et al. 2002] for the extension of first-order 2- and 3-valued
logic with transitive closure, which we have omitted here for the sake of simplicity.
The transitive closure of a formula with two free variables $\varphi(v_1,v_2)$ is denoted by
$\varphi^*(v_1,v_2)$.
Embedding of 3-Valued Logical Structures. To abstract memory configurations
represented by logical structures, we use the following notion of embedding:
Definition 3.1. Given $S = (U,\iota)$ and $S' = (U',\iota')$, two 3-valued structures
over the same vocabulary P, and $f : U \to U'$, a surjective function, f embeds S in
$S'$, denoted by $S \sqsubseteq^f S'$, if for all $p \in P$ and $u_1,\ldots,u_k \in U$,
\[
\iota(p)(u_1,\ldots,u_k) \sqsubseteq \iota'(p)(f(u_1),\ldots,f(u_k)).
\]
If, in addition,
\[
\iota'(p)(u'_1,\ldots,u'_k) = \bigsqcup_{u_1 \in f^{-1}(u'_1),\ldots,u_k \in f^{-1}(u'_k)} \iota(p)(u_1,\ldots,u_k),
\]
then $S'$ is the tight embedding of S with respect to f, denoted by $S' = f(S)$.
Intuitively, f(S) is obtained by merging individuals of S and by defining the valuation
of predicates accordingly (in the most precise way). Observe that $\sqsubseteq^{id}$, which will
be denoted simply by $\sqsubseteq$, is the natural information order between structures that
share the same universe. Note that one has $S \sqsubseteq^f S' \Leftrightarrow f(S) \sqsubseteq^{id} S'$.
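The tight embedding of Definition 3.1 can be sketched for a single binary predicate. This is an illustration only, under assumptions of our own: universes are index ranges bounded by a hypothetical constant `MAXN`, the surjective function f is given as an array, and the names `Truth`, `join`, and `tight_embed` are introduced here rather than taken from the paper or from TVLA.

```c
#include <assert.h>

#define MAXN 8

/* Truth values; UNKNOWN3 plays the role of 1/2. */
typedef enum { FALSE3 = 0, TRUE3 = 1, UNKNOWN3 = 2 } Truth;

/* Join in the information order: equal values are kept,
   differing values become 1/2. */
Truth join(Truth a, Truth b) { return a == b ? a : UNKNOWN3; }

/* Tight embedding of one binary predicate.
   f maps concrete individuals 0..conc_count-1 onto abstract individuals
   0..abst_count-1 and must be surjective, so that every abstract pair
   receives a value. abst[a1][a2] becomes the join of conc[u1][u2] over
   all u1, u2 with f[u1] == a1 and f[u2] == a2. */
void tight_embed(int conc_count, int abst_count, const int *f,
                 Truth conc[MAXN][MAXN], Truth abst[MAXN][MAXN])
{
    int seen[MAXN][MAXN] = {{0}};
    assert(conc_count <= MAXN && abst_count <= MAXN);
    for (int u1 = 0; u1 < conc_count; u1++) {
        for (int u2 = 0; u2 < conc_count; u2++) {
            int a1 = f[u1], a2 = f[u2];
            assert(a1 >= 0 && a1 < abst_count);
            assert(a2 >= 0 && a2 < abst_count);
            abst[a1][a2] = seen[a1][a2]
                ? join(abst[a1][a2], conc[u1][u2])
                : conc[u1][u2];
            seen[a1][a2] = 1;
        }
    }
}
```

For a four-element list 0 → 1 → 2 → 3 with f mapping node 0 to abstract individual 0 and nodes 1, 2, 3 to a summary individual 1, the embedded n-predicate is 1/2 from 0 to 1 and from 1 to itself, and 0 elsewhere. Applying the same function to the identity relation gives 1 on the non-summary individual and 1/2 on the summary individual.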
We can now explain the usefulness of the eq predicate.
Let $S = (U, \iota) \in$ 2-STRUCT and $S' = (U', \iota') = f(S)$. We have
$$\iota'(eq)(u'_1, u'_2) = \begin{cases} 1 & \text{if } \forall u_1 \in f^{-1}(u'_1), \forall u_2 \in f^{-1}(u'_2) : \iota(eq)(u_1, u_2) = 1 \\ 0 & \text{if } \forall u_1 \in f^{-1}(u'_1), \forall u_2 \in f^{-1}(u'_2) : \iota(eq)(u_1, u_2) = 0 \\ 1/2 & \text{otherwise} \end{cases}$$
which can be simplified to
$$\iota'(eq)(u'_1, u'_2) = \begin{cases} 1 & \text{if } u'_1 = u'_2 \wedge |f^{-1}(u'_1)| = 1 \\ 0 & \text{if } u'_1 \neq u'_2 \\ 1/2 & \text{if } u'_1 = u'_2 \wedge |f^{-1}(u'_1)| > 1 \end{cases}$$
Note that $u'_1 = u'_2$ in the simplified definition is not a shorthand for $eq(u'_1, u'_2)$;
it evaluates to true whenever $u'_1$ and $u'_2$ are the same individual of $U'$. Similarly,
$u'_1 \neq u'_2$ evaluates to true when $u'_1$ and $u'_2$ are distinct individuals of $U'$. Hence, for
any $S'' = (U'', \iota'') \sqsupseteq^f S$, if for some $u'' \in U''$, $\iota''(eq)(u'', u'') = 1$, then $|f^{-1}(u'')| = 1$,
otherwise $|f^{-1}(u'')| \ge 1$. Consequently, the value of the formula eq(v,v) evaluated
in a 3-valued structure $S''$ indicates whether an individual of $S''$ represents exactly
one individual in each of the structures S that can be embedded into $S''$, or at least
one individual.
The following preservation theorem about the interpretation of logical formulas
allows one to interpret logical formulas in embedded structures in a conservative way
with respect to the original structure.
[Figure: interprocedural CFG with nodes s_main, e_main, s_rev, e_rev, call and return-site nodes for the calls res=rev(l) and y=rev(z), and edges labeled z=x->n, x->n=NULL, if(z==NULL), z==NULL, z!=NULL, y=x, z->n=x.]
Fig. 10. Interprocedural CFG of the list-reversal program.
Theorem 3.2 (Embedding theorem [Sagiv et al. 2002]). Let $S = (U, \iota)$
and $S' = (U', \iota')$ be two 3-valued structures, such that there exists an embedding
function f with $S \sqsubseteq^f S'$. Then, for any formula $\varphi(v_1, \ldots, v_k)$ and assignment
$Z : \{v_1, \ldots, v_k\} \to U$ of free variables of ϕ, we have
$$[\![\varphi]\!]^S_3(Z) \sqsubseteq [\![\varphi]\!]^{S'}_3(Z'),$$
where $Z' : \{v_1, \ldots, v_k\} \to U'$ is the abstract assignment defined by $Z'(v_i) = f(Z(v_i))$.
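The statement of Theorem 3.2 can be checked on a small instance. The Python sketch below is an illustration under stated assumptions rather than the paper's machinery: it assumes the usual Kleene reading of the connectives (negation as 1 − x, disjunction and existential quantification as max), encodes formulas as nested tuples, and uses hypothetical names (`evaluate`, `leq_info`, `theorem_holds`) and a hypothetical three-cell list whose last two cells are merged into one abstract individual `s`.

```python
HALF = 0.5  # the indefinite truth value 1/2


def leq_info(a, b):
    """Information order on truth values: a ⊑ b iff a == b or b is 1/2."""
    return a == b or b == HALF


def evaluate(phi, universe, iota, Z):
    """Kleene evaluation of a formula encoded as nested tuples."""
    kind = phi[0]
    if kind == "pred":
        _, p, variables = phi
        return iota[p][tuple(Z[v] for v in variables)]
    if kind == "not":
        return 1 - evaluate(phi[1], universe, iota, Z)
    if kind == "or":
        return max(evaluate(phi[1], universe, iota, Z),
                   evaluate(phi[2], universe, iota, Z))
    if kind == "exists":
        _, v, body = phi
        return max(evaluate(body, universe, iota, {**Z, v: u}) for u in universe)
    raise ValueError(f"unknown connective: {kind}")


# S: hypothetical list u1 -> u2 -> u3; S1 = f(S) merges u2 and u3 into s.
U = ["u1", "u2", "u3"]
iota = {
    "eq": {(a, b): int(a == b) for a in U for b in U},
    "n": {(a, b): int((a, b) in {("u1", "u2"), ("u2", "u3")}) for a in U for b in U},
}
f = {"u1": "a", "u2": "s", "u3": "s"}
U1 = ["a", "s"]
iota1 = {
    "eq": {("a", "a"): 1, ("a", "s"): 0, ("s", "a"): 0, ("s", "s"): HALF},
    "n": {("a", "a"): 0, ("a", "s"): HALF, ("s", "a"): 0, ("s", "s"): HALF},
}


def theorem_holds(phi, v):
    """Check [[phi]]^S(Z) ⊑ [[phi]]^S1(Z') for every assignment Z of the free variable v."""
    return all(
        leq_info(evaluate(phi, U, iota, {v: u}), evaluate(phi, U1, iota1, {v: f[u]}))
        for u in U
    )


has_successor = ("exists", "w", ("pred", "n", ("v", "w")))
is_last = ("not", has_successor)
same = ("pred", "eq", ("v", "v"))
```

For each of the three formulas, the value computed in the abstract structure is either identical to the concrete value or 1/2. The formula `same` also illustrates the role of eq: it evaluates to 1 on `a`, which represents exactly one cell, and to 1/2 on the merged individual `s`.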
4. PROGRAMS AND MEMORY CONFIGURATIONS
We consider programs written in an imperative programming language in which
(1) it is forbidden to take the address of a local variable, a global variable, a parameter, or a function;
(2) parameters are passed by value;
(3) pointer arithmetic is forbidden.
These restrictions prevent direct aliasing among variables; thus, only nodes in heap-allocated
structures can be aliased. The third feature makes memory configurations
invariant under permutations of addresses. Note that both Java and ML follow
these conventions.
4.1 Program Syntax
A program is defined by a set of procedures Pi, 0 ≤ i ≤ K. Each procedure has
local variables, formal input parameters, and output parameters. To simplify our
notation, we will assume that each procedure has only one input parameter and
one output parameter; the generalization to multiple parameters is straightforward.
We also assume that an input parameter is not modified during the execution of
the procedure. This assumption is made solely for convenience, and involves no loss
of generality because it is always possible to copy input parameters to additional
local variables.
                     Set-theoretic view           Logical view
Set of cells         Cell                         U (universe)
Pointer variable z   z ∈ Cell ∪ {NULL}            z : U → B (unary relation)
Pointer field n      n ∈ Cell → Cell ∪ {NULL}     n : U × U → B (binary relation)
Data variable x      x ∈ D                        x : D (nullary function)
Data field d         d ∈ Cell → D                 d : U → D (unary function)

Table II. Two related models of a program state, where D may be B or Int.
[Figure: a linked list of four nodes terminated by NULL, with pointer variables x and y.]
Fig. 11. A possible store, consisting of a four-node linked list pointed to by x and y.
Thus, a procedure $P_i = \langle fpi_i, fpo_i, L_i, G_i \rangle$ is defined by its input parameter $fpi_i$,
its output parameter $fpo_i$, its set of local variables $L_i$ (containing $fpi_i$ and $fpo_i$),
and $G_i$, its intraprocedural control flow graph (CFG).
A program is represented by a directed graph $G^* = (N^*, E^*)$, called an interprocedural
CFG. $G^*$ consists of a collection of intraprocedural CFGs $G_1, G_2, \ldots, G_K$,
one of which, $G_{main}$, represents the program's main procedure. Each CFG $G_i$ contains
exactly one start node $s_i$ and exactly one exit node $e_i$. The nodes of a CFG
represent control points and its edges represent individual statements and branches
of a procedure in the usual way. A procedure call statement relates a call node
and a return-site node. For $n \in N^*$, proc(n) denotes the (index of the) procedure
that contains n. In addition to the ordinary intraprocedural edges that connect
the nodes of the individual flowgraphs in $G^*$, each procedure call, represented by
call-node c and return-site node r, has two edges: (1) a call-to-start edge from c to
the start node of the called procedure; (2) an exit-to-return-site edge from the exit
node of the called procedure to r. The functions call and ret record matching call
and return-site nodes: call(r) = c and ret(c) = r. We assume that a start node has
no incoming edges except call-to-start edges.
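The interprocedural CFG just described can be sketched as a small data structure. The Python code below is only an illustration: the class `InterproceduralCFG`, its field names, the node names, and the simplified topology of the recursive procedure `rev` are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class InterproceduralCFG:
    proc: dict = field(default_factory=dict)   # node -> procedure containing it
    start: dict = field(default_factory=dict)  # procedure -> its unique start node
    exit: dict = field(default_factory=dict)   # procedure -> its unique exit node
    edges: set = field(default_factory=set)    # (source, target, label) triples
    ret: dict = field(default_factory=dict)    # call node -> matching return-site node
    call: dict = field(default_factory=dict)   # return-site node -> matching call node

    def add_procedure(self, name, start, exit_, inner_nodes=()):
        for n in (start, exit_, *inner_nodes):
            self.proc[n] = name
        self.start[name], self.exit[name] = start, exit_

    def add_edge(self, src, dst, label):
        """Ordinary intraprocedural edge labeled with a statement or branch."""
        self.edges.add((src, dst, label))

    def add_call(self, c, r, callee):
        """Record ret(c) = r, call(r) = c, and add the two interprocedural edges."""
        self.ret[c], self.call[r] = r, c
        self.edges.add((c, self.start[callee], "call-to-start"))
        self.edges.add((self.exit[callee], r, "exit-to-return-site"))


# Hypothetical program: main calls rev once, and rev calls itself recursively.
g = InterproceduralCFG()
g.add_procedure("main", "s_main", "e_main", ["c_main", "r_main"])
g.add_procedure("rev", "s_rev", "e_rev", ["c_rev", "r_rev"])
g.add_edge("s_main", "c_main", "skip")
g.add_edge("r_main", "e_main", "skip")
g.add_edge("s_rev", "e_rev", "z == NULL")   # base case, simplified
g.add_edge("s_rev", "c_rev", "z != NULL")   # recursive case, simplified
g.add_edge("r_rev", "e_rev", "skip")
g.add_call("c_main", "r_main", "rev")
g.add_call("c_rev", "r_rev", "rev")
```

Registering both procedures before adding calls lets `add_call` look up the callee's start and exit nodes. In this instance the only edges entering a start node are call-to-start edges, matching the assumption stated above.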
4.2 Representing Memory Configurations
Consider a program that consists of several procedures, and, for the moment, ignore
the stack of activation records in each state. At a given control point, a program
state s ∈ State is defined by the values of the local variables and the heap. We
describe two ways in which such a state s can be modeled (see Tab. II):
— The set-theoretic model is perhaps more intuitive. We consider a fixed set Cell
of memory cells. The value of a pointer variable z is modeled by an element z ∈
Cell ∪ {NULL}, where NULL denotes the null value. If cells have a pointer-valued
field n, the values of n fields are modeled by a function n : Cell → Cell ∪ {NULL}
that associates with each memory cell the cell pointed to by the field. If cells
have an Int-valued (or, more generally, a data-valued) field d, the values of
d fields are modeled by a function d : Cell → Int that associates with each
memory cell the value of the corresponding field.
— Sagiv et al. [2002] model states using the tools of logic (cf. §3.1). Each state is
modeled as a 2-valued logical structure: the set of memory cells is replaced by
a universe U of individuals; the value of a program variable z is defined by a
unary predicate on U; and the value of a field n is defined by a binary predicate
on U. Integrity constraints are used to capture the fact that, for instance, a
unary predicate z that represents what program variable z points to can have
the value "true" for at most one memory cell [Sagiv et al. 2002].
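The correspondence between the two models can be illustrated by converting a set-theoretic store into a 2-valued structure. The Python sketch below is a simplified illustration: it handles only pointer variables and a single pointer field named `n`, assumes a finite set of cells, represents NULL by `None`, and uses hypothetical names (`store_to_structure`, `satisfies_integrity`) and a hypothetical four-cell list that is not claimed to be the store of Fig. 11.

```python
NULL = None  # the null value of the set-theoretic model


def store_to_structure(cells, pointer_vars, n_field):
    """Translate a set-theoretic store into a 2-valued structure (universe, iota).

    pointer_vars maps each variable z to a cell or NULL;
    n_field maps each cell to the cell its n field points to, or NULL.
    """
    universe = list(cells)
    # One unary predicate per pointer variable: true exactly on the pointed-to cell.
    iota = {
        z: {(u,): int(u == target) for u in universe}
        for z, target in pointer_vars.items()
    }
    # One binary predicate for the field n: true on (u, v) iff u.n == v.
    iota["n"] = {(u, v): int(n_field[u] == v) for u in universe for v in universe}
    return universe, iota


def satisfies_integrity(universe, iota):
    """Variable predicates hold for at most one cell; n is a partial function."""
    unary_ok = all(sum(t.values()) <= 1 for p, t in iota.items() if p != "n")
    n_ok = all(sum(iota["n"][(u, v)] for v in universe) <= 1 for u in universe)
    return unary_ok and n_ok


# Hypothetical store: c1 -> c2 -> c3 -> c4 -> NULL, x at c1, y at c3, t is NULL.
cells = ["c1", "c2", "c3", "c4"]
n_field = {"c1": "c2", "c2": "c3", "c3": "c4", "c4": NULL}
pointer_vars = {"x": "c1", "y": "c3", "t": NULL}
universe, iota = store_to_structure(cells, pointer_vars, n_field)
```

A NULL-valued variable such as `t` becomes a unary predicate that is false everywhere, and a NULL-valued field yields a row of the `n` relation with no true entry. Structures produced this way satisfy the integrity constraints by construction; the separate check only becomes relevant for structures built by other means.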
We use the term "predicate of arity n" for a Boolean function $U^n \to B$. We use
$P_n$ to denote the set of predicate symbols of arity n, and N to denote the set
of integer-valued function symbols. With such notation, the concrete state-space
considered is:⁴
$$State = (U \to B)^{|P_1|} \times (U^2 \to B)^{|P_2|} \times (U \to Int)^{|N|} \qquad (2)$$
where |E| denotes the size of a finite set E. A concrete property in ℘(State) is thus
a set of tuples, each field of which is a function.
From now on, for the sake of simplicity, we will first perform the trivial abstraction
of the concrete state space defined by Eqn. (2) to the state-space
$$State = (U \to B)^{|P_1|} \times (U^2 \to B)^{|P_2|} \qquad (3)$$
In this case, a state S ∈ State can be represented by a 2-valued logical structure
(U,ι) (§3.1), where the valuation function $\iota : P \to \bigcup_k (U^k \to B)$ associates each
predicate symbol of arity k with a k-ary relation over U. We thus have $State \simeq$
2-STRUCT[P].
In the sequel, we also assume that the universe U is infinite. Because all countably
infinite sets are isomorphic, we can omit the universe in declarations of 2-valued
structures S = (U,ι) ∈ 2-STRUCT[P], so that S will denote both the 2-valued
structure and its valuation function ι.
Remark 4.1. Because we want shape properties to be invariant under permutations
of memory cells, we implicitly quotient State by the equivalence relation
$S \approx S'$ if there is a permutation f : U → U such that
$$\forall p \in P : S'(p)(u_1, \ldots, u_k) = S(p)(f(u_1), \ldots, f(u_k))$$
□
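The equivalence of Remark 4.1 can be tested by brute force on small examples. The Python sketch below is only an illustration: the paper assumes an infinite universe, whereas this code assumes a finite one and enumerates all permutations, and the names `equivalent` and `list_structure` are hypothetical.

```python
from itertools import permutations


def equivalent(universe, S, S_prime):
    """True iff some permutation f of the universe satisfies
    S_prime(p)(u1, ..., uk) == S(p)(f(u1), ..., f(uk)) for every predicate p."""
    for image in permutations(universe):
        f = dict(zip(universe, image))
        if all(
            S_prime[p][t] == S[p][tuple(f[u] for u in t)]
            for p in S
            for t in S[p]
        ):
            return True
    return False


def list_structure(universe, order, x_target):
    """2-valued structure for a list that visits `order`, with x pointing to x_target."""
    links = set(zip(order, order[1:]))
    return {
        "x": {(u,): int(u == x_target) for u in universe},
        "n": {(a, b): int((a, b) in links) for a in universe for b in universe},
    }


U = ["c1", "c2", "c3"]
S = list_structure(U, ["c1", "c2", "c3"], "c1")        # x points to the head
S_perm = list_structure(U, ["c3", "c1", "c2"], "c3")   # same shape, cells renamed
S_other = list_structure(U, ["c1", "c2", "c3"], "c2")  # x points to the middle cell
```

Here `S` and `S_perm` describe the same shape laid out over different cells, so they fall into the same equivalence class, whereas `S_other` does not, because the cell that `x` points to has a predecessor. Enumerating permutations is exponential in the size of the universe and is meant only to make the definition concrete.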
The predicates that are part of the underlying semantics of the language to be
analyzed are called core predicates. They will be distinguished from additional predicates
that will be introduced later when abstracting concrete heaps. The set of
core predicates that are used is dictated by the semantics of the programming language
to be analyzed. (The programming language can have a degree of abstraction
already built into it by the analysis designer, as illustrated by Remarks 4.2 and 4.3
below.) For the programs that we consider, and the part of the state-space that we
chose to analyze (Eqn. (3)), we need to introduce a core predicate for each program
variable and data-structure field, following Tab. II. The set of core predicates is
thus uniquely defined for a given program.
⁴ Eqn. (2) is the concrete state-space that one has when the techniques of [Sagiv et al. 2002]
are combined with those of [Gopan et al. 2004]. To simplify Eqn. (2), we have omitted nullary
predicates, which would be used to model Boolean-valued variables, and nullary functions, which
would be used to model data-valued variables.