Bit Level Types for High Level Reasoning
Ranjit Jhala
UC San Diego
jhala@cs.ucsd.edu
Rupak Majumdar
UC Los Angeles
rupak@cs.ucla.edu
ABSTRACT
Bitwise operations are commonly used in low-level systems code to access multiple data fields that have been packed into a single word. Program analysis tools that reason about such programs must model the semantics of bitwise operations precisely in order to capture program control and data flow through these operations. We present a type system for subword data structures that explicitly tracks the flow of bit values in the program and identifies consecutive sections of bits as logical entities manipulated atomically by the programmer. Our type inference algorithm tags each integer value of the program with a bitvector type that identifies the data layout at the subword level. These types are used in a translation phase to remove bitwise operations from the program, thereby allowing verification engines to avoid the expensive low-level reasoning required for analyzing bitvector operations. We have used a software model checker to check properties of translated versions of a Linux device driver and a memory protection system. The resulting verification runs could prove many more properties than the naive model checker that did not reason about bitvectors, and could prove properties much faster than a model checker that did reason about bitvectors. We have also applied our bitvector type inference algorithm to generate program documentation for a virtual memory subsystem of an OS kernel. While we have applied the type system mainly for program understanding and verification, bitvector types also have applications to better variable-ordering heuristics in boolean model checking and memory optimizations in compilers for embedded software.
Categories and Subject Descriptors: D.2.4 [Software Engineering]: Software/Program Verification.
General Terms: Languages, Verification, Reliability.
Keywords: Bit vectors, type inference, model checking.
1. INTRODUCTION
Many programs manipulate data at subword levels where multiple data fields are packed into a single word.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGSOFT'06/FSE-14, November 5-11, 2006, Portland, Oregon, USA.
Copyright 2006 ACM 1-59593-468-5/06/0011 ...$5.00
For example, in the virtual memory subsystem of an operating system, a 32-bit linear address can represent 20 bits of an index into a page table, 10 more bits as an offset into a page, and two permission bits. Similarly, in embedded applications where the memory footprint must be small, different fields must often be packed (sometimes automatically by a compiler) into one word. These subword records are manipulated by the programmer using bit-level operations, namely, masks and shifts. To reason about such programs manually or automatically, we require techniques that accurately capture the flow of information through bitvector variables and the shifting and masking operations.
There are usually three ways of handling bitvectors in program analysis. The first is to treat bitvector operations as uninterpreted functions, ignoring the semantics of these operations. The second is to use specialized decision procedures for bitvectors [17, 2, 14]. The third is to reduce all variables to (32 or 64) individual bits ("bit-blasting") and use propositional reasoning [3, 23]. The first option, while surprisingly common [1, 8, 5, 10], especially when the tool builder does not anticipate the use of bitvectors, produces imprecisions in the analysis that show up as false positives. In our experience, the second approach of using specialized decision procedures makes the analysis much slower, because specialized decision procedures are comparatively slower than, for example, the well-tuned procedures for the theory of equality and arithmetic available in decision procedures [6, 20]. The third option is attractive, due to the efficiency of SAT solvers and BDD engines, and the accuracy with which the machine semantics is reflected [3, 23]. While this option has been successfully applied in "bounded" analyses for finding bugs, bit-blasting loses the high-level relationships between the variables that are critical for statically computing invariants, thereby making the option unsuitable for safety verification.
In this paper, we provide a fourth, type-based, option. We define a type system at the bit level that identifies consecutive sections of bits as logical subword entities manipulated by the programmer. The type system explicitly tracks bitwise operations in the program, and propagates the flow of bit values across the program. The type system captures the intuition that the programmer is using the masks and shifts to identify subsections of the bits.
Consider a virtual memory system where virtual addresses are 32-bit values, organized as a 20-bit page table index, a 10-bit offset into a page (for the address), and two permission bits. The programmer finds the index and the offset using bit masks:
01 index  = (x & 0xFFFFF000) >> 12;
02 offset = (x & 0xFFC) >> 2;
and in addition, can check the permission bits:
03 can_read  = (x & 0x1);
04 can_write = (x & 0x2);
Our type system generates constraints on subfields of the 32-bit quantities defined in the program. For example, the assignment on line 01 constrains the representation of index to be a subtype of the rhs, and the bitmask on the rhs constrains the representation of x to have a 20-bit part and a 12-bit part (which may be further refined by other constraints). By generating and solving all such constraints, we infer the bit-level type for virtual addresses:

⟨index,20⟩⟨offset,10⟩⟨wr,1⟩⟨rd,1⟩

representing a structure with an index field (of 20 bits), an offset field (of 10 bits), and two permission bits.
In a second step, the types identified by our algorithm are used to "compile away" the bitvector operations, by translating bitvectors into structures, and translating masks and shifts into field accesses. The resulting program contains only assignments between integer variables and boolean tests, for which existing decision procedures [6, 20] are extremely fast. For bitwise operations that do not conform to this access pattern (e.g., in signal processing operations, security algorithms, or in condition codes where each bit independently maintains some boolean information), the type of a bitvector is just the sequence of individual bits. In this case, we are no worse than the boolean reasoning obtained by reducing the word into bits.
We have implemented the bitvector type inference and program transformation algorithm, and we have applied the transformation to check safety properties of C programs using the Blast model checker [10]. Blast uses decision procedures for equality and arithmetic. Our previous attempt to use specialized decision procedures for bitvectors did not scale to large programs. The decision procedures were unable to infer and exploit the fact that the common idiomatic use of bit-level operations is to use words as packed structures, and thus were very inefficient. This work is the result of our attempt to build an analysis that discovered what high-level relationships were buried beneath the bitvector operations.
In our experiments, we took two implementations: one a memory management and protection framework (Mondrian [22]) and one a Linux device driver [4]. We first performed the bitvector type inference on these programs and translated them automatically, based on the types, to programs without bitvector operations. We then applied the Blast model checker to check safety properties on the translated programs. The Mondrian implementation was annotated with a set of assertions by the programmer. For the driver, we considered five safety properties identified in [11]. Both programs involve nontrivial bitwise operations. In a previous study [11], the operations from the driver were removed by hand. To the best of our knowledge, the Mondrian implementation was not automatically checked before. Blast was able to prove 12 of the 15 properties checked. In contrast, the bit-reduction approach implemented in Blast that applied boolean reasoning to individual bits did not finish for these examples. The remaining three properties involved reasoning about multidimensional arrays on the heap, which is a limitation of Blast and an important, but orthogonal, problem.
While our primary application was to lift bits to high-level structures, bitvector types are relevant even if the analysis is carried out at the boolean level, using for example SAT solvers or binary decision diagrams (BDDs). In these cases, the bitvector type determines a good variable-ordering heuristic. Precisely, variables with the same bitvector type should be interleaved in the variable order, while variables that have different bit types can be independently ordered. This can make an exponential improvement even in simple, and common, cases. Consider the following program where all variables are 32-bit integers:
y = (x & 0xFFFF0000) >> 16;
z = (x & 0x0000FFFF);
y = y | (z << 16);
The bitvector types of the variables are

x : ⟨a,16⟩⟨b,16⟩
y : ⟨b,16⟩⟨a,16⟩
z : 0^16⟨b,16⟩
indicating that values from the top 16 bits of x and the bottom 16 bits of y may flow into each other, and values from the low 16 bits of x, the low 16 bits of z, and the high 16 bits of y may flow into each other. This suggests the variable ordering where the high 16 bits of x and the low 16 bits of y are interleaved, and the low 16 bits of x, the high 16 bits of y, and the low 16 bits of z are interleaved. With this ordering, the size of the BDDs is linear in the number of bits. However, with the natural ordering where all bits of x are ordered before all bits of y and all bits of z, the BDDs are exponential in the number of bits.
Our inferred types can also be used for memory optimizations in resource-constrained systems [16, 21]: for a variable that has a bitvector type with k bits, the compiler can "pack" several allocations into one word. For variables where certain bits are unused, the compiler need not allocate those bits at all.
Related Work. Our type-based approach was directly inspired by the representation inference of Lackwit [18]. Our type system and inference algorithm are very similar to those presented by Ramalingam et al. for aggregate structure inference in Cobol [19, 13]. Both of the above use type inference to identify structure in weakly typed languages. Our contribution is to apply similar inference techniques at the subword level in C programs, and to apply them to the task of verifying code that uses low-level operations. This setting poses the additional challenge of developing a notion of subtyping that captures the idiomatic uses of bitwise operations, which is essential for not splitting the structures too finely. By producing constraints on the bit-level representations, bit-level types provide a means to identify application-level abstractions used by the programmer. The resulting types are an important program understanding tool that replaces complicated bitwise operations in the code with record structures and field accesses. By focusing on constraints derived from actual use, we are able to more accurately identify abstract structures, and high-level relationships between the subwords packed inside the integer, that are beyond the scope of the typedef facility provided by C at the source level and used by programmers to convey abstractions. While
Expressions   e ::= c | x | x[e] | e1 ⊕ e2
                  | e&c | e|c | e>>n | e<<n
              where c,n ∈ N, 0 ≤ n < N
Predicates    p ::= e1 ≤ e2 | ¬p | p1 ∧ p2 | p1 ∨ p2
Statements    s ::= x:=e | x[e]:=e1 | s;s
                  | if (p) then s else s
                  | while (p) s

Figure 1: Program Syntax
the translation does not preserve the run-time memory layout, we have found bit-level types to be an attractive tool to examine, understand and check the usage of bit fields in unknown programs and the flow of values between different bit sections in different variables.
Our work is also similar to [9], where a dataflow analysis is provided to identify bit sections in programs. This information is used by the compiler to allocate only the required number of bits, thereby optimizing memory allocation. Our type system enables us to formalize many of the properties that were informally stated in [9], to extend uniformly to complex source-level features such as pointers, functions, and subtyping, to provide principal typing (or "best possible" representation) guarantees, and to cleanly prove the correctness of the algorithm using standard type inference techniques. Further, we provide an explicit bound on the complexity of the algorithm (quadratic in the size of the program and in the number of bits). While quadratic in the worst case, in practice this has never been a bottleneck and all our type inference experiments ran within a few seconds.
2. BIT-LEVEL TYPES

2.1 Programs
We demonstrate our algorithm on an imperative language of while programs with bit-level operations. Let X be a set of program variables. For each array variable x in X, we include two variables x_idx and x_elem in X (which will be used as the "index" and "contents" of the array, respectively). Figure 1 shows the syntactic elements of our programs. For ease of exposition, we shall assume all variables and constants have N bits. Expressions are either integer constants, N-bit variables (x), array accesses (x[e], where x is an array variable and e its index), arithmetic operations e1 ⊕ e2, or the bit-level operations bitwise-and with a constant (e&c), bitwise-or with a constant (e|c), and right or left shift by a constant (e>>n and e<<n, respectively). Predicates are built using arithmetic comparisons and the boolean operators. Statements comprise assignments (to integer variables or array fields), sequential composition, conditionals, and while loops. Let Exp(X) denote the set of all expressions using the variables X. The operational semantics for the language is defined in the standard way, using a store mapping each variable to a bitvector and each array to an array of bitvectors, and interpreting the operations in the usual way.
Example 1: [Program] On the left in Figure 2, we show a program that uses bit-level operations. We assume all values have 32 bits. The program mget abstracts (and simplifies) a kernel's physical memory access routine. It gets a 32-bit virtual address p as an input. We assume memory is organized into a one-level page table, where each page is 1K bytes. We model the page table as an array tab. The page table is indexed using the top 20 bits of a virtual address. The next 10 bits of the virtual address are the offset into the page. The last bit of p is a permission bit that must be set in order to access memory at the physical address. The second-to-last bit of p is a dirty bit that is not relevant to the current program. If the rightmost permission bit of p is not set, then the function mget returns an error condition (permission failure). The variable p1 drops the last 12 bits from the virtual address p, and the variable pte shifts the top 20 bits to the right; this is used as an index into the page table tab. The page table entry returns the base address b1 of a memory page where the address resides, and the variable base zeroes out the low permission bits. The offset into the page is stored in the variable off, which zeroes out all but the 10 offset bits of p. Finally, the address in memory is obtained by adding the offset to the base address of the memory page, and actual memory is looked up at the index a of a global array m that represents physical memory. We ignore error conditions and array bounds checks for clarity. □
2.2 Types

We introduce special bit-level types that are refinements of the usual base types and that describe how the corresponding N-bit expressions are used in the program as packed records. Consider the example of Figure 2. We want the bit-level type to specify that the virtual address p is used as a packed record structured as a segment with 20 bits, followed by a segment with 10 bits, followed by two one-bit segments.
Let A be an infinite set of names. A block is a pair ⟨a,ℓ⟩, where a ∈ A and ℓ is an integer in {1,...,N}. We call a (resp. ℓ) the name (resp. size) of the block. We write B for the set of all blocks. For a block b, we write b^k for the sequence of k repetitions of b. A bit-level type is a sequence of blocks b̄ ≡ ⟨a1,ℓ1⟩...⟨an,ℓn⟩. The size |b̄| of a bit-level type is Σ_i ℓi. A bit-level type b̄ encodes a sequence of bits of length |b̄|, indexed from |b̄|−1 (the leftmost or most significant bit) to 0 (the rightmost or least significant bit). We write Typ(ℓ) for the set of all bit-level types of size ℓ.
Example 2: [Bit-level Types] The top row in Figure 3(a) shows a bit-level type:

b̄ ≡ ⟨e,20⟩.⟨a,10⟩.⟨k,1⟩.⟨u,1⟩

of size 32 that encodes a virtual address, with a 20-bit page table index, a 10-bit offset, and two permission bits. Above the blocks are integers indicating the ranges of bits occupied by each block. The page table index occupies the section from the 31st bit to the 12th (inclusive), the offset the section from the 11th to the 2nd, and the two permission bits are the 1st and 0th bits. □
Zero Blocks. We shall assume the set of names A contains a special name 0, which corresponds to blocks whose bits have value 0. Hence, the zero block ⟨0,1⟩ intuitively corresponds to a block of 1 bit whose value is 0. We abbreviate the block ⟨0,1⟩ to 0; thus 0^k represents k copies of the block ⟨0,1⟩, i.e., k consecutive 0 bits.
Projections. For a bit-level type b̄ ≡ b̄′.⟨a,ℓ⟩ and an integer i we define the block-fragment of b̄ at position i as:

  b̄[i] ≡  b̄′[i − ℓ]      if ℓ < i
          (a, i, ℓ − i)   otherwise        (1)
Program:

mget(u32 p){
  if ((p & 0x1) == 0){
    error("Permission Failure");
  } else {
    p1   = p & 0xFFFFF000;
    pte  = p1 >> 12;
    b1   = tab[pte];
    base = b1 & 0xFFFFFFFC;
    off  = p & 0xFFC;
    a    = base + off;
    return m[a];
  }
}

Zero Constraints:

τ_{p&0x1}[31:1] = 0
τ_{p&0xFFFFF000}[11:0] = 0
τ_{p1>>12}[31:20] = 0
τ_{b1&0xFFFFFFFC}[1:0] = 0
τ_{p&0xFFC}[1:0] = 0
τ_{p&0xFFC}[31:12] = 0

Inequality Constraints:

τ_p[0:0] ≤ τ_{p&0x1}[0:0]
τ_p[31:12] ≤ τ_{p&0xFFFFF000}[31:12]
τ_{p&0xFFFFF000} ≤ τ_{p1}
τ_{p1}[31:12] ≤ τ_{p1>>12}[19:0]
τ_{p1>>12} ≤ τ_{pte}
τ_{pte} ≤ τ_{tab_idx}
τ_{tab_elem} ≤ τ_{b1}
τ_{b1}[31:2] ≤ τ_{b1&0xFFFFFFFC}[31:2]
τ_{b1&0xFFFFFFFC} ≤ τ_{base}
τ_p[11:2] ≤ τ_{p&0xFFC}[11:2]
τ_{p&0xFFC} ≤ τ_{off}
τ_{base} ≤ τ_{base+off}
τ_{off} ≤ τ_{base+off}
τ_{base+off} ≤ τ_a
τ_a ≤ τ_{m_idx}
τ_{m_elem} ≤ τ_{mget_r}

Figure 2: Example Program, Constraints
Intuitively, the block-fragment of b̄ at position i is a triple (a,ℓ,ℓ′) such that the ith bit of b̄ lies in a block ⟨a,ℓ+ℓ′⟩, with ℓ′ bits to the left of the ith bit in ⟨a,ℓ+ℓ′⟩, and ℓ bits to the right. Equivalently, b̄ has as a suffix the sequence ⟨a,ℓ+ℓ′⟩.b̄″ where the size of b̄″ is i−ℓ. In particular, if the block-fragment at position i is (a,ℓ,0) then b̄ has as a suffix the sequence ⟨a,ℓ⟩.b̄″ of size i. In this case, we say that b̄ has a break at position i.
For a bit-level type b̄ and a pair of integers i,j we define the projection of b̄ to the interval [i:j] as:

  b̄[i:j] ≡  ε                  if i < j
            ⟨a,ℓ⟩.b̄[i−ℓ : j]   if b̄[i+1] = (a,ℓ,0)
            ⊥                  otherwise        (2)

where ⊥ means the value is undefined and the concatenation . is ⊥ if either argument is ⊥. When defined, the projection of b̄ to [i:j] is a bit-level type b̄′ of size i−j+1, such that b̄ is of the form b̄⁺.b̄′.b̄⁻ where the size of b̄⁺ (resp. b̄⁻) is N−i−1 (resp. j). In other words, the projection is a bit-level type corresponding to the sequence of blocks beginning at position i and ending at position j of b̄. The projection of b̄ to the interval [i:j] is defined iff b̄ has breaks at i+1 and j.
Example 3: [Block-Fragments, Projections] The second row of Figure 3(a) shows the block-fragment of b̄ (from the top row) at position 8. Notice that b̄ does not have a break at that position: in particular, the block ⟨a,10⟩ occupies the section from the 11th to the 2nd bit, and hence there are 6 bits of ⟨a,10⟩ "remaining" on the right, at position 8, and a prefix of 4 bits on the left. Hence b̄[8] is (a,6,4). The third row of Figure 3(a) shows the block-fragment of b̄ at position 12. Notice that b̄ has a break at position 12, as there are three whole blocks to the right. Hence b̄[12] is (a,10,0), which in turn indicates that a block ⟨a,10⟩ starts at position 12 of b̄. The fourth row of Figure 3(a) shows the projection of b̄ to the interval [11:1], which is the sequence of blocks ⟨a,10⟩⟨k,1⟩. It is well-defined as b̄ has a break at position 11+1, and hence b̄[11+1] = (a,10,0), and also has a break at position 1. The projection is the sequence of blocks occupying the 11th through the 1st bits (inclusive). □
Subtyping. For two bit-level types b̄1, b̄2, we say that b̄1 ≤ b̄2 if either (1) b̄1 = b̄2, or (2) b̄1 ≡ 0^ℓ.⟨a,ℓ′⟩.b̄′1, b̄2 ≡ ⟨a,ℓ+ℓ′⟩.b̄′2, and b̄′1 ≤ b̄′2. Intuitively, b̄1 ≤ b̄2 if for each i such that b̄2 has a break at position i, it is the case that b̄1 also has a break at that position, and moreover, if the block of b̄2 at that position is ⟨a,ℓ⟩ then at that position b̄1 has a sequence of zero blocks followed by a block ⟨a,ℓ′⟩ such that the length of the entire sequence is ℓ.
To get an intuitive understanding of our subtyping relation, note that if ℓ ≤ ℓ′ then

0^(N−ℓ)⟨a,ℓ⟩ ≤ 0^(N−ℓ′)⟨a,ℓ′⟩

as the LHS corresponds to integers less than 2^ℓ, which are a subset of the integers less than 2^ℓ′ that the type on the RHS corresponds to. However, we do not consider 0^(N−ℓ−k)⟨a,ℓ⟩0^k to be a subtype of 0^(N−ℓ′)⟨a,ℓ′⟩ when k > 0, because in the former type the non-zero segments are not right-aligned. This captures the idiomatic use of subtypes, wherein the programmer ensures right-alignment (via a right-shift) before assignment. Our subtyping definition generalizes this observation to bit-level types that contain multiple blocks, i.e., where the bitvector contains more than one packed integer. It is easy to check that the subtyping relation is a partial order.
Example 4: [Zero blocks and Subtyping] The lower half of the top row of Figure 3(b) shows a bit-level type b̄1 ≡ 0^10.⟨p,6⟩.0^6.⟨q,10⟩ containing zero blocks. The upper half shows a type b̄2 ≡ ⟨p,16⟩.⟨q,16⟩, such that b̄1 ≤ b̄2. □
Sound Type Assignment. An X-type assignment is a map Γ : X → Typ(N) that assigns a bit-level type to each element of X. For an X-type assignment Γ, we write Γ ⊢ e : b̄ to denote that under the assignment Γ the
Figure 3: (a) Bit-level types and Projections (b) Substitution and Subtyping (c) Bit-level type for mget
Var:    Γ(x) = b̄                                  ⟹  Γ ⊢ x : b̄
Arr:    Γ(x_idx) = b̄,  Γ(x_elem) = b̄′,  Γ ⊢ e : b̄  ⟹  Γ ⊢ x[e] : b̄′
Sub:    Γ ⊢ e : b̄,  b̄ ≤ b̄′                        ⟹  Γ ⊢ e : b̄′
BitOr:  Γ ⊢ e : τ1,  τ[h,l] = τ1[h,l] for [h,l] ∈ Brk(c,1)  ⟹  Γ ⊢ (e|c) : τ
BitAnd: Γ ⊢ e : τ1,  τ[h,l] = 0^(h−l+1) for [h,l] ∈ Brk(c,0),
        τ[h,l] = τ1[h,l] for [h,l] ∈ Brk(c,1)      ⟹  Γ ⊢ (e&c) : τ
Rshift: Γ ⊢ e : b̄                                  ⟹  Γ ⊢ (e>>c) : 0^c.b̄[N−1 : c]
Lshift: Γ ⊢ e : b̄                                  ⟹  Γ ⊢ (e<<c) : b̄[N−c−1 : 0].0^c
AOp:    Γ ⊢ e1 : b̄,  Γ ⊢ e2 : b̄,  b̄ = 0^(N−ℓ)⟨a,ℓ⟩,  Bound(e1 ⊕ e2, ℓ)
                                                   ⟹  Γ ⊢ (e1 ⊕ e2) : b̄
Leq:    Γ ⊢ e1 : b̄,  Γ ⊢ e2 : b̄,  b̄ = 0^(N−ℓ)⟨a,ℓ⟩ ⟹  Γ ⊢ (e1 ≤ e2)
BOp:    Γ ⊢ p1,  Γ ⊢ p2                            ⟹  Γ ⊢ (p1 ∧ p2)
Asgn:   Γ ⊢ l : b̄,  Γ ⊢ e : b̄′,  b̄′ ≤ b̄           ⟹  Γ ⊢ l := e
Seq:    Γ ⊢ s1,  Γ ⊢ s2                            ⟹  Γ ⊢ s1;s2
Cond:   Γ ⊢ p,  Γ ⊢ s1,  Γ ⊢ s2                    ⟹  Γ ⊢ if p then s1 else s2
While:  Γ ⊢ p,  Γ ⊢ s                              ⟹  Γ ⊢ while (p) s

Figure 4: Typing rules
expression e has type b̄, and we write Γ ⊢ s to denote that Γ is sound w.r.t. the program s. The rules that derive these judgements are shown in Figure 4.
The key rules are the assignment rule (Asgn), which requires that the type of the lvalue being assigned to be a supertype of the type of the assigned expression, the standard subtyping subsumption rule (Sub), the rules that relate the types of bit-level expressions to their subexpressions, and the rules for arithmetic operations (AOp).
A bit-level type b̄ is called single if it is of the form 0^ℓ.⟨a,ℓ′⟩. For arithmetic operations, we require that the operands have the same type, which must be a single type, and that the result expression be within the corresponding number of bits. The requirement that the result of an arithmetic operation does not overflow the number of bits available in the block ⟨a,ℓ′⟩ can be established either by statically proving the absence of overflows or by inserting run-time checks [7]. This requirement is stipulated by the predicate Bound(e,ℓ), which states that e < 2^ℓ, in the hypothesis of the rule for arithmetic operations AOp.
For the rules BitAnd and BitOr, which describe the types of bitwise expressions in terms of their subexpressions, we use the following notation. For a constant c and b ∈ {0,1}, let Brk(c,b) be the set of maximal consecutive b-intervals in c, that is, Brk(c,b) = {[h1 : l1],...,[hk : lk]}, where for each i ∈ {1,...,k}, the bits of c in the interval [hi : li] are all b, the (li−1)th and the (hi+1)th bits (if they are in the range [0,N−1]) are 1−b, and all bits of c outside the intervals are 1−b.
Sound typings capture the intuition of when bitvectors are correctly used in a program. A sound typing ensures the correct usage of packed fields within a bitvector (when the bitvector is used as a record), and it only allows arithmetic operations when there is effectively only one field in the record, thus avoiding arithmetic overflows that spill into the next fields. If the program can be soundly typed, then we can translate it into an equivalent program where the (packed) integers are replaced with records and the bitwise operations are replaced with the appropriate field selectors, as described in Section 4.
Example 5: For the program mget from Figure 2, Figure 3(c) demonstrates a sound bit-level typing. These types correspond to the use of virtual addresses that are structured as a 20-bit index, a 10-bit offset, a dirty bit, and a permission bit. □