A Machine-Verified Code Generator
Christoph Walther and Stephan Schweitzer
Fachgebiet Programmiermethodik, Technische Universität Darmstadt,
{chr.walther,schweitz}@informatik.tu-darmstadt.de
Abstract. We consider the machine-supported verification of a code
generator computing machine code from WHILE-programs, i.e. abstract
syntax trees which may be obtained by a parser from programs of an
imperative programming language. We motivate the representation of
states developed for the verification, which is crucial for success, as the
interpretation of tree-structured WHILE-programs differs significantly in
its operation from the interpretation of the linear machine code. This
work has been developed for a course to demonstrate to the students
the support gained by computer-aided verification in a central subject of
computer science, boiled down to the classroom-level. We report about
the insights obtained into the properties of machine code as well as the
challenges and efforts encountered when verifying the correctness of the
code generator. We also illustrate the performance of the VeriFun system
that was used for this work.
1 Introduction
We develop the VeriFun system [1], [22], a semi-automated system for the verifi-
cation of programs written in a functional programming language. One reason for
this development originates from our experiences when teaching Formal Meth-
ods, Automated Reasoning, Semantics, Verification, and similar subjects. As the
motivation of the students largely increases when they can gather practical ex-
periences with the principles and methods taught, VeriFun has been developed
as a small, highly portable system with an elaborated user interface and a simple
base logic, which nevertheless allows the students to perform ambitious verifica-
tion case studies within the restricted time frame of a course. The system has
been used in practical courses at the graduate level, cf. [24], for proving e.g. the
correctness of a first-order matching algorithm, the RSA public key encryption
algorithm and the unsolvability of the Halting Problem, as well as recently in an
undergraduate course about Algorithms and Data Structures, where more than
400 students took their first steps in computer-aided verification of simple state-
ments about Arithmetic and Linear Lists and the verification of algorithms like
Insertion Sort and Mergesort. VeriFun comes as a JAVA application which the
students can run on their home PC (whatever platform it may use) after a 1 MB
download to work with the system whenever they like to.
This paper is concerned with the verification of a code generator for a sim-
ple imperative language. Work on verified code generators and compilers dates
back more than 35 years [9]. With the development of elaborated logics and the
evolving technology of theorem proving over the years, systems have been developed
that provide remarkable support for compiler verification as well. Various impres-
sive projects have been carried out which demonstrate well the benefits of certain
logical frameworks and their implementation by reasoning systems in this do-
main. Meanwhile, a tremendous amount of literature exists, which precludes an
exhaustive account. E.g., [8] presents a case study using the Elf language, [14]
uses the HOL system to verify a compiler for an assembly language, [6] and [3]
report on compiler verification projects for a subset of CommonLisp using PVS,
and [15] verifies a compiler for Prolog with the KIV system. Much work also
centers around the Boyer-Moore prover and its successors, e.g. [4], and in one
of the largest projects the compilation of an imperative programming language
via an assembly language down to machine code is verified, cf. [10], [11], [27].
However, the high performance of these systems also comes at the price
of highly elaborated logics and complicated user interfaces, which makes their
use difficult for teaching within the restricted time frame of a course (if it is
not impossible altogether). Furthermore, as almost all of the cited work is concerned
with real programming languages and the bits-and-pieces coming with them, it
is hard to work out the essential principles and problems from the presentations
to demonstrate them in the classroom. And last but not least, it is also difficult
(in particular for the students) to assess the effort needed when using a certain
tool, as most of the papers do not provide appropriate statistics but refer to
large proof scripts in an appendix or to be downloaded from the web for further
investigation.
The work presented here was prepared (in addition to the material given in
[18]) for a course about Semantics and Program Verification to illustrate the
principles of state-based semantics and the practical use of formal semantics
when developing compilers etc. However, the main focus is to demonstrate the
support gained by computer-aided verification in a central subject of computer
science education, boiled down to the classroom-level. The code generator com-
putes machine code from abstract syntax trees as used in standard textbooks
of formal semantics, e.g. [7], [13], [26]. We report about the insights obtained
into the properties of machine code as well as the challenges and efforts en-
countered when verifying the correctness of this program. We also illustrate the
performance of the VeriFun system that was used for this work.
2 WHILE – Programs
The language of WHILE-programs consists of conditional statements, while-loops,
assignments, compound statements and statements for doing nothing, and is de-
fined by the data structure WHILE.PROGRAM in Fig. 1. WHILE-programs represent
abstract syntax trees which for instance are computed by a compiler from a
program conforming to the concrete syntax of a programming language to be
available for subsequent code generation, cf. e.g. [2]. The language of WHILE-
programs uses program variables and expressions built with arithmetical and
structure VARIABLE <= VAR(ADR:nat)
structure EXPR <=
EXPR0(e-op0:nullary.operator),EXPR1(e-op1:unary.operator,arg:EXPR),
EXPR2(e-op2:binary.operator,arg1:EXPR,arg2:EXPR), VAR@(index:nat)
structure WHILE.PROGRAM <=
SKIP, COMPOUND(LEFT:WHILE.PROGRAM,RIGHT:WHILE.PROGRAM),
SET(CELL:VARIABLE,TERM:EXPR), WHILE(WCOND:EXPR,BODY:WHILE.PROGRAM),
IF(ICOND:EXPR,THEN:WHILE.PROGRAM,ELSE:WHILE.PROGRAM)
Fig. 1. The languages of Expressions and WHILE-programs
logical operators, which are defined by further data structures, cf. Fig. 1. Fol-
lowing the standard approach for the definition of a (structural) operational
semantics of WHILE-programs, e.g. [7], [13], [26], we start by providing an op-
erational semantics for the expressions EXPR: Given a data structure memory
which assigns natural numbers to VARIABLEs, we define a procedure function
value(e:EXPR,m:memory):nat <= ... to compute the value of an expression
wrt. the assignments in memory m and the semantics provided for the operators.
Procedure value retrieves the number assigned to a program variable by mem-
ory m, and otherwise applies the operation corresponding to the expression’s
operator to the values of the expression’s arguments computed recursively by
value. For example, the computation of value(EXPR2(PLUS,e1,e2),m) yields
call-PLUS(value(e1,m),value(e2,m)), where call-PLUS and similar proce-
dures defining the semantics of the other operators are given elsewhere.
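For illustration, a definition of value in the notation of Fig. 1 might read as
follows. This is a sketch only: the paper does not show the procedure body, and
the helpers lookup (retrieving a variable's assignment from memory m) as well
as call-OP0, call-OP1 and call-OP2 (applying an operator to argument values)
are our inventions, whereas the actual development uses one procedure per
operator, such as call-PLUS.

function value(e:EXPR, m:memory):nat <=
  if e=VAR@(index(e))
  then lookup(VAR(index(e)), m)
  else if e=EXPR0(e-op0(e))
  then call-OP0(e-op0(e))
  else if e=EXPR1(e-op1(e), arg(e))
  then call-OP1(e-op1(e), value(arg(e), m))
  else call-OP2(e-op2(e), value(arg1(e), m), value(arg2(e), m))
  fi fi fi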
An operational semantics for WHILE-programs is given by the interpreter
eval, which maps a program state and a WHILE-program to a program state,
cf. Fig. 2. A program state is either the symbol timeout (denoting a non-
terminating interpretation of a WHILE-program) or is a triple consisting of a
counter loops, a memory store and a stack stack. The store holds the current
variable assignments under which expressions are evaluated by value (when ex-
ecuting a WHILE-, SET-orIF-statement), and which is updated when executing
aSET-statement. The rˆole of counter loops is discussed in Sections 6 and 7.
Also the stack is used for subsequent developments, cf. Section 3, and may be
ignored here as eval does not consider this component of a triple at all.
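Spelled out in the notation of Fig. 1, the state data structure thus might read
as follows; this is a reconstruction from the constructors and selectors used in
Figs. 2 and 3, as the paper does not print the definition itself.

structure state <= timeout, triple(loops:nat, store:memory, stack:Stack)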
3 Machine Programs
Our target machine consists of a program store, a random access memory, a stack
and a program counter pc. Arithmetical and logical operations are performed by
a subdevice of the target machine, called the stack machine. The stack machine
may push some content of the memory onto the stack and performs arithmetical
and logical operations by fetching operands from and returning results to the
stack. The operation of the stack machine is controlled by so-called stack pro-
function eval(r:state, wp:WHILE.PROGRAM):state <=
if r=timeout
then timeout
else if wp=WHILE(WCOND(wp),BODY(wp))
then if value(WCOND(wp),store(r))=0
then if loops(r)=0
then timeout
else if pred(loops(r))≥loops(eval(S,BODY(wp)))
then eval(eval(S,BODY(wp)),wp)
else timeout
fi fi
else r fi
else if wp=SET(CELL(wp),TERM(wp))
then triple(loops(r),
update(assoc(CELL(wp),value(TERM(wp),store(r))),
store(r)),stack(r))
else if wp=COMPOUND(LEFT(wp),RIGHT(wp))
then eval(eval(r,LEFT(wp)),RIGHT(wp))
else if wp=IF(ICOND(wp),THEN(wp),ELSE(wp))
then if value(ICOND(wp),store(r))=0
then eval(r,THEN(wp))
else eval(r,ELSE(wp))
fi
else r fi fi fi fi fi
where S abbreviates triple(pred(loops(r)),store(r),stack(r))
Fig. 2. An interpreter eval for WHILE-programs
grams. The target machine provides an instruction EXEC which calls the stack
machine to run the stack program provided by the parameter of EXEC.
Stack programs are finite sequences of PUSH-commands, defined by some data
structure STACK.PROGRAM. An operational semantics for stack programs is given
by a procedure function run(sp:STACK.PROGRAM,s:Stack,m:memory):Stack
<= ... which interprets the commands of a stack program step by step: When
executing PUSH.VAR(v,sp), the number assigned to program variable v by mem-
ory m is pushed onto the stack, and otherwise operands are fetched from and
the result of the operation is pushed onto the stack. For instance, stack s is
replaced by push(call-PLUS(top(pop(s)),top(s)),pop(pop(s))) when ex-
ecuting PUSH.OP2(PLUS,sp), where call-PLUS and the procedures defining the
semantics of the other operators are the same as for value. Having executed a
PUSH-command, run proceeds with the execution of the remaining stack program
sp.
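A possible definition of the data structure STACK.PROGRAM, reconstructed from
the commands mentioned above, could read as follows; the constructors
PUSH.OP0 and PUSH.OP1 for nullary and unary operators as well as all selector
names are our assumptions, since the paper only shows PUSH.VAR, PUSH.OP2 and
the empty stack program null.

structure STACK.PROGRAM <=
  null,
  PUSH.VAR(p-var:VARIABLE, rest.var:STACK.PROGRAM),
  PUSH.OP0(p-op0:nullary.operator, rest.0:STACK.PROGRAM),
  PUSH.OP1(p-op1:unary.operator, rest.1:STACK.PROGRAM),
  PUSH.OP2(p-op2:binary.operator, rest.2:STACK.PROGRAM)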
Besides the EXEC instruction, the target machine also provides a LOAD in-
struction to write the top-of-stack to a designated address of the memory, two
function exec(r:state, pc:nat, mp:MACHINE.PROGRAM):state <=
if r=timeout
then timeout
else if mp=VOID
then r
else if pc>size(end(mp))
then r
else if fetch(pc,mp)=LOAD(loc(fetch(pc,mp)))
then exec(triple(loops(r),update(assoc(loc(fetch(pc,mp)),
top(stack(r))),
store(r)),pop(stack(r))),succ(pc),mp)
else if fetch(pc,mp)=EXEC(prog(fetch(pc,mp)))
then exec(triple(loops(r),store(r),run(prog(fetch(pc,mp)),
stack(r),store(r))),succ(pc),mp)
else if fetch(pc,mp)=BRANCH-(displ.B-(fetch(pc,mp)))
then if top(stack(r))=0
then if succ(displ.B-(fetch(pc,mp)))>pc
then r
else if loops(r)=0
then timeout
else exec(triple(pred(loops(r)),
store(r),pop(stack(r))),
minus(pred(pc),
displ.B-(fetch(pc,mp))),
mp) fi fi
else exec(triple(loops(r),store(r),pop(stack(r))),
succ(pc),mp) fi
else if fetch(pc,mp)=JUMP+(displ.J+(fetch(pc,mp)))
then exec(r,succ(plus(pc,displ.J+(fetch(pc,mp)))),mp)
else if fetch(pc,mp)=NOOP then exec(r,succ(pc),mp) fi ...
Fig. 3. The interpreter exec for machine programs
unconditional jump instructions JUMP+ and JUMP- to move the pc forward or
backward in the program store, two conditional jump instructions BRANCH+ and
BRANCH- which are controlled by the top-of-stack, a HALT instruction which halts
the target machine, and a NOOP instruction which does nothing (except incre-
menting the pc). The instruction set is formally defined by some data structure
INSTRUCTION, and a MACHINE.PROGRAM is simply a linear list of instructions,
where VOID denotes the empty program, begin yields the first instruction and
end denotes the machine program obtained by removing the first instruction.
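Reconstructed from the selectors used in Figs. 3 and 4, the two data structures
might be defined as follows; the displacement selectors of JUMP- and BRANCH+ do
not occur in the shown excerpts, so their names are our guesses.

structure INSTRUCTION <=
  NOOP, HALT,
  LOAD(loc:VARIABLE),
  EXEC(prog:STACK.PROGRAM),
  JUMP+(displ.J+:nat), JUMP-(displ.J-:nat),
  BRANCH+(displ.B+:nat), BRANCH-(displ.B-:nat)

structure MACHINE.PROGRAM <= VOID, conc(begin:INSTRUCTION, end:MACHINE.PROGRAM)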
An operational semantics of machine programs is defined by procedure exec
of Fig. 3 (where some instructions are omitted). This interpreter uses a procedure
fetch to fetch the instruction to which the program counter pc points in the
machine program mp. The interpreter exec returns the input state if called in
state timeout, with an empty machine program, with a HALT instruction or if
pc is not within the address space [0, ..., size(mp)-1] of the machine program
mp, where size(mp) computes the number of instructions in mp.
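The paper gives neither definition, but size and fetch might be defined along
the following lines. This is a sketch; in particular, we leave open how VeriFun
treats the selector begin when applied to the empty program VOID, a case fetch
never reaches for legal values of pc.

function size(mp:MACHINE.PROGRAM):nat <=
  if mp=VOID then 0 else succ(size(end(mp))) fi

function fetch(pc:nat, mp:MACHINE.PROGRAM):INSTRUCTION <=
  if pc=0 then begin(mp) else fetch(pred(pc), end(mp)) fi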
4 Code Generation
We aim at defining a procedure code which generates a machine program from
a WHILE-program such that both programs compute the same function. To do
so, we start by generating stack programs from the expressions used in a WHILE-
program. This is achieved by some procedure postfix which simply translates
the tree-structure of expressions into the linear format of stack programs. E.g.,
the expression EXPR2(EQ,VAR@(1),VAR@(2)) is translated by postfix to the
stack program PUSH.VAR(VAR(1),PUSH.VAR(VAR(2),PUSH.OP2(EQ,null))).
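A definition of postfix along these lines might read as follows. This is a sketch
assuming a procedure extend which concatenates two stack programs (cf. lemma
(4) in Section 6) and the PUSH.OP0/PUSH.OP1 constructors sketched in Section 3.

function postfix(e:EXPR):STACK.PROGRAM <=
  if e=VAR@(index(e))
  then PUSH.VAR(VAR(index(e)), null)
  else if e=EXPR0(e-op0(e))
  then PUSH.OP0(e-op0(e), null)
  else if e=EXPR1(e-op1(e), arg(e))
  then extend(postfix(arg(e)), PUSH.OP1(e-op1(e), null))
  else extend(postfix(arg1(e)),
              extend(postfix(arg2(e)), PUSH.OP2(e-op2(e), null)))
  fi fi fi

With this definition, postfix(EXPR2(EQ,VAR@(1),VAR@(2))) indeed evaluates to
the stack program shown above.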
Using the stack code generation for expressions, procedure code of Fig. 4
defines the machine code generation for WHILE-programs in a straightforward
recursive way, where the recursively computed code is embedded into machine
instructions in order to translate the control structure of the statement under
consideration. The correctness property for code is formally stated by
lemma code is correct <= all wp:WHILE.PROGRAM, r:state
eval(r,wp) = exec(r,0,code(wp)) (1)
5 About VeriFun
We intend to prove (1) code is correct with VeriFun. In a typical session with
the system, a user defines a program by stipulating the data structures and the
procedures of the program, defines statements about these program elements
and verifies these statements and the termination of the procedures.
VeriFun consists of several automated routines for theorem proving and for
the formation of hypotheses to support verification. It is designed as an inter-
active system, where, however, the automated routines substitute the human
expert in striving for a proof until they fail. In such a case, the user may step in
to guide the system for a continuation of the proof.
When called to prove a statement, the system computes a prooftree. An
interaction, which may be required when the construction of the prooftree gets
stuck, is to instruct the system
to perform a case analysis, to unfold a procedure call,
to apply an equation, to use an induction axiom,
to use an instance of a lemma or an induction hypothesis, or
to insert, move or delete some hypothesis in the sequent of a proof-node.
For simplifying proof goals, a further set of so-called computed proof rules
is provided. For example, the Simplification rule rewrites a goalterm using the
function code(wp:WHILE.PROGRAM):MACHINE.PROGRAM <=
if wp=SKIP
then conc(NOOP,VOID)
else if wp=IF(ICOND(wp),THEN(wp),ELSE(wp))
then conc(EXEC(postfix(ICOND(wp))),
conc(BRANCH+(succ(size(code(ELSE(wp))))),
append(code(ELSE(wp)),
conc(JUMP+(size(code(THEN(wp)))),
code(THEN(wp))))))
else if wp=WHILE(WCOND(wp),BODY(wp))
then conc(JUMP+(size(code(BODY(wp)))),
append(code(BODY(wp)),
conc(EXEC(postfix(WCOND(wp))),
conc(BRANCH-(size(code(BODY(wp)))),
VOID))))
else if wp=SET(CELL(wp),TERM(wp))
then conc(EXEC(postfix(TERM(wp))),
conc(LOAD(CELL(wp)),VOID))
else append(code(LEFT(wp)),code(RIGHT(wp)))
fi fi fi fi
Fig. 4. Code generation for WHILE-programs
definitions of the data structures and the procedures, the hypotheses and the
induction hypotheses of the proof-node sequent and the lemmas already veri-
fied. The other computed proof rules perform a similar rewrite, however with
restricted performance. The computed proof rules are implemented by the Sym-
bolic Evaluator, i.e. an automated theorem prover over which the VeriFun user
has no control. VeriFun provides no control commands (except disabling induc-
tion hypotheses upon symbolic evaluation), thus leaving the proof rules as the
only means to control the system’s behavior. The symbolic evaluations and all
proofs computed by the system may be inspected by the user.
Having applied a user-suggested proof rule, the system takes over control
again and tries to develop the prooftree further until it gets stuck once more, and so on,
or it eventually succeeds. In addition, it may be necessary to formulate (and to
prove) an auxiliary lemma (sometimes after providing a new definition) in order
to complete the actual proof task.
VeriFun demands that the termination of each procedure that is called in
a statement be verified before a proof of the statement can be started. There-
fore the system’s automated termination analysis [16] is activated immediately
after the definition of a recursively defined procedure. If the automated termi-
nation analysis fails, the user has to tell the system useful termination functions
represented by (sequences of) so-called measure terms. Based on this hint, the
system computes termination hypotheses that are sufficient for the procedure’s
termination and then need to be verified like any other statement.
An introduction into the use of the system is given in [20], a short survey
is presented in [22], and a detailed account on the system’s operation and its
logical foundations can be found in [21].
6 A Machine Verification of code
Termination. eriFun’s automated termination analysis verifies termination
of all procedures upon definition, except for eval and exec. The interpreter
eval terminates because upon each recursive call either the size of the program
decreases or otherwise remains unchanged (for the outer eval-call in the WHILE-
case), but the loops-counter decreases. But the system is unable to recognize this
argumentation by itself and must be provided with the pair of termination func-
tions λ wp. |wp| and λ r. loops(r), causing the system to prove termination by the lexi-
cographic relation (wp1,r1) >_eval (wp2,r2) iff |wp1| > |wp2|, or |wp1| = |wp2| and
loops(r1) > loops(r2). Hence for the outer eval-call in the WHILE-case, the proof
obligation loops(r) > loops(eval(S,body(wp))) is obtained, which is trivially
verified as loops(r)≠0 and (*) pred(loops(r)) ≥ loops(eval(S,body(wp)))
must hold, cf. Fig. 2. Note that requirement (*) controlling the outer recursive
call in the WHILE-case is always satisfied, as we may prove (after having verified
eval’s termination)
lemma eval not increases loops <=
all r:state, wp:WHILE.PROGRAM loops(r) ≥ loops(eval(r,wp)) (2)
expressing that eval never increases the loops-component of a state. However,
when removing (*) from the definition of eval, VeriFun is unable to prove
termination because the system verifies only strong termination of procedures.
Roughly speaking, a terminating procedure terminates strongly iff termination
can be proven without reasoning about the procedure’s semantics, cf. [17] for a
formal definition. Since one has to reason about the semantics of eval, viz. (2),
for proving its termination if (*) is not provided, the system would fail to prove
eval’s termination.1
The interpreter exec terminates because with each fetch&execute cycle either
the loops-component of the state decreases (in case of a back-leap BRANCH- or
JUMP-) or otherwise remains unchanged, while pc moves towards the end of the program,
cf. Fig. 3. Here the system is unable to recognize this argumentation as well
and needs to be provided with the pair of termination functions λ r. loops(r) and
λ mp, pc. |mp| − pc.
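Spelled out (the paper does not print this relation explicitly), these termination
functions induce the lexicographic relation (r1,pc1) >_exec (r2,pc2) iff
loops(r1) > loops(r2), or loops(r1) = loops(r2) and |mp| − pc1 > |mp| − pc2,
for a fixed machine program mp.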
¹ The requirement that all procedures terminate is imposed by the logic implemented by our
system, while the failure to prove eval's termination without requirement (*) is a
limitation of our implementation only. This is because, as proved in [5], our approach for
automated termination proofs [16] is also sound for procedures with nested recur-
sions like eval. We learned how to transform a terminating procedure with nested
recursions into a strongly terminating procedure from [12].
Correctness of code. Before considering (1) code is correct, the correctness
of the procedure postfix has to be verified. Therefore we start with proving
lemma postfix is correct <= all e:EXPR, s:Stack, m:memory
top(run(postfix(e),s,m))=value(e,m) (3)
expressing that the value computed by value upon the evaluation of an expres-
sion e is obtained as top-of-stack after running the stack program returned by
postfix when applied to expression e. The proof of (3) requires two auxiliary
lemmas, one stating that stack programs can be executed step-by-step
lemma run extend <= all m:memory, s:Stack, sp1,sp2:STACK.PROGRAM
run(extend(sp1,sp2),s,m)=run(sp2,run(sp1,s,m),m) (4)
and the other one expressing that the execution of stack programs obtained by
postfix does not affect the stack initially given to run, i.e.
lemma pop run postfix <= all e:EXPR, s:Stack, m:memory
pop(run(postfix(e),s,m)) = s (5)
All three lemmas have an automated proof and are frequently (and automati-
cally) used in subsequent proofs.
The induction proof of (1) is based on eval’s recursion structure. Hence it
develops into 3 base cases, viz. wp is a SKIP- or a SET-statement or wp is a WHILE-
statement and loops(r)=0, and 3 step cases, viz. wp is a compound statement,
wp is an IF-statement, or wp is a WHILE-statement and loops(r)≠0.
The base cases are proved easily. However, Unfold Procedure must be applied
to exec-calls several times before Simplification (using (3) and (5)) succeeds.
Of course, it is annoying to call interactively for Unfold Procedure quite often
instead of just letting Simplification do the job. The reason is that the unfolding
of procedure calls needs to be controlled heuristically upon symbolic evaluation,
because otherwise unusable goalterms may result. VeriFun uses a heuristic which
is based on a similarity measure between the well-founded relation used for an
induction proof of a statement and the well-founded relations that have been
used to prove the termination of the procedures involved in the statement, and
only calls of procedures having a termination relation similar to the induction
relation are automatically unfolded. This heuristic proved successful in almost
all cases but may fail if a statement refers to procedures which differ significantly
in their recursion structure, as eval and exec do in the present case. In the proof
of (1), we must therefore call for Unfold Procedure not only in the base cases,
but also in the step cases to instruct the system interactively to execute parts
of the machine code in the goalterms symbolically.
Having proved the base cases, the system proceeds with the first step case
(where wp is a compound statement) and simplifies the induction conclusion to
if(r=timeout,
true,
exec(exec(r,0,code(LEFT(wp))),0,code(RIGHT(wp)))
= exec(r,0,append(code(LEFT(wp)),code(RIGHT(wp)))))
(6)
which we straightforwardly generalize to
exec(exec(r,pc,mp1),0,mp2) = exec(r,pc,append(mp1,mp2)) (7)
in order to prove the subgoal (6) with this equation.
However, (7) does not hold. If the pc points to some instruction of append(
mp1,mp2) but is not within the address space of mp1, equation (7) rewrites
to exec(r,0,mp2)=exec(r,pc,append(mp1,mp2)), which obviously is false. We
therefore demand size(mp1)>pc, but this restriction is not enough: Assume that
mp1 contains a HALT instruction and exec(r,pc,mp1) returns the state r’ upon
the execution of HALT. Then r’ is also obtained when executing append(mp1,
mp2), and consequently equation (7) rewrites to exec(r’,0,mp2)=r’. Hence
we also demand HALT.free(mp1), where procedure HALT.free returns true iff
it is applied to a machine program free of HALT instructions. But even with
this additional restriction equation (7) is still false. This time we assume that
mp1 contains some forward-leap instruction with a displacement pointing be-
yond the last instruction of append(mp1,mp2). If this instruction is executed in
a state r’, exec returns r’ and equation (7) rewrites to exec(r’,0,mp2)=r’
again. We therefore demand closed+(mp1) as a further restriction, where pro-
cedure closed+ returns true iff it is applied to a machine program mp such that
pc+d ≤ size(mp)-1 for each forward-leap instruction in mp with fetch(pc,mp)=
JUMP+(d) or fetch(pc,mp)=BRANCH+(d). We continue with our analysis and
now assume that mp2 contains some back-leap instruction with a displacement
pointing beyond the first instruction of mp2. For instance, mp1 may consist of
one instruction NOOP only and mp2 may consist of only one instruction JUMP-(0).
Then equation (7) rewrites to r=timeout, and a counter example is found again.
We therefore demand closed-(mp2) as another restriction, where procedure
closed- returns true iff it is applied to a machine program mp such that
pc ≥ d+1 for each back-leap instruction in mp with fetch(pc,mp)=BRANCH-(d)
or fetch(pc,mp)=JUMP-(d). But we also have to demand closed-(mp1), as
otherwise equation (7) may rewrite to exec(r’,0,mp2)=r’ again. We are done
with this final restriction, and the required lemma reads as
lemma exec stepwise <=
all pc:nat, mp1,mp2:MACHINE.PROGRAM, r:state
if(size(mp1)>pc,if(HALT.free(mp1),if(closed+(mp1),
if(closed-(mp1),if(closed-(mp2),
exec(exec(r,pc,mp1),0,mp2) = exec(r,pc,append(mp1,mp2)),
true), ... , true) .
(8)
However, in order to prove subgoal (6) by lemma (8), it must be verified that all
preconditions of (8) are satisfied for machine programs mp1 and mp2 computed by
code. We therefore formulate the lemmas code is closed-, code is closed+,
size code not zero and code is HALT.free, all of which are proved easily. Rec-
ognizing the lemmas just developed and verified, the system simplifies the in-
duction conclusion of the first step case to true.
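For illustration, HALT.free might be defined in the style of the other procedures
as follows; this is a sketch, assuming a boolean result type, and closed+ and
closed- can be defined analogously, except that the position of each jump or
branch instruction has to be tracked by an additional parameter.

function HALT.free(mp:MACHINE.PROGRAM):bool <=
  if mp=VOID
  then true
  else if begin(mp)=HALT
  then false
  else HALT.free(end(mp)) fi fi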
We continue in the verification of lemma (1) with the remaining step cases,
which are concerned with the code generation for the IF- and the WHILE-state-
ments (if loops(r)≠0). Here, too, a sequence of symbolic machine program
executions followed by Simplification rewrites the goalterm to a subgoal, which
then has to be generalized to a further lemma in order to complete the proof.
For the WHILE-case, the lemma
lemma exec repetition <=
all mp:MACHINE.PROGRAM, pc:nat, sp:STACK.PROGRAM, r:state
if(pc>size(mp),true,if(closed-(mp),if(closed+(mp),
if(HALT.free(mp),
exec(exec(r,pc,mp),size(mp),
     append(mp,conc(EXEC(sp),conc(BRANCH-(size(mp)),VOID))))
= exec(r,pc,append(mp,conc(EXEC(sp),
                           conc(BRANCH-(size(mp)),VOID)))),
true), ... , true).
(9)
is speculated, and the IF-case requires the lemma
lemma exec skip program <=
all mp1,mp2:MACHINE.PROGRAM, r:state, pc:nat
if(closed-(mp2),
exec(r,plus(size(mp1),pc),append(mp1,mp2))=exec(r,pc,mp2),
true)
(10)
To avoid over-generalizations, the equations in (9) and (10) must be restricted
to machine programs being closed-, closed+ and HALT.free, where these re-
quirements have been recognized by the same careful analysis undertaken when
developing lemma (8). Using these lemmas, each step case (and in turn the whole
lemma (1) code is correct) is proved easily, see [23] for details.
7 Discussion
Viewed in retrospect, the proofs required to verify (1) code is correct were
obtained without that much effort. However, theorem proving is sometimes like
crossing unknown terrain to climb a hill. Looking back from the top, it seems
quite obvious how to get there directly, but on the way one is faced with dead
ends and traps that turn the whole endeavour into a nightmare.
In case of (1) code is correct, the crucial steps to success were (1) the “right”
definition of the machine language’s interpreter exec, (2) the “right” definition
of a state, and (3) the invention of the key lemmas (8) exec stepwise, (9)
exec repetition and (10) exec skip program as well as the key notions needed
to formulate them.
Our first attempt was to define a state without the loops- and the stack-
components, and to ensure the termination of exec by limiting the total number
of fetch-calls performed when executing a machine program. This means using
a definition like function exec(r:state, pc:nat, mp: MACHINE.PROGRAM,
cycles:nat):state <= ..., where cycles is decremented in each recursive
call of exec to enforce termination (and the treatment of the stack is ignored
here for the moment). But this approach requires an additional procedure, say
get.cycles, computing the minimum number of cycles needed to execute a
machine program (if this execution does not result in timeout). This procedure
get.cycles is required to formulate the correctness statement for code, as the
number of loop-bodies of a WHILE-program wp evaluated by eval has to be re-
lated to the number of machine cycles needed to execute code(wp) by exec.
Procedure get.cycles is easily obtained from the definition of exec, but with
this procedure also a bunch of additional lemmas, in particular the get.cycles-
versions of the key lemmas (8) – (10) need to be formulated and verified.²
Having followed this approach for some time, we gave up and started to
work with another version of exec. This definition was based on the observation
that the number of evaluated loop-bodies in a WHILE-program wp is exactly
the number of BRANCH--calls performed upon the execution of code(wp). We
therefore got rid of get.cycles (and also, what is even more important, of all the
lemmas coming with it) by (renaming cycles by loops and) decrementing loops
only in the recursive exec-calls coming with a JUMP- or a BRANCH--instruction.
However, this approach has problems, too: This time another procedure, say
get.loops, is required to rephrase the equation in (8) exec stepwise for this
version of exec, which then would read
exec(exec(r,pc,mp1,loops),
0,mp2,minus(loops,get.loops(r,pc,mp1,loops)))
= exec(r,pc,append(mp1,mp2),loops).
(11)
Here the procedure get.loops is needed to compute the number of loops re-
maining after the execution of mp1 starting at pc in state r, hence this approach
necessitates the same burden of additional lemmas as the get.cycles idea.
But fortunately, there is an easy remedy to this problem. We simply let
loops become a component of a state rather than a formal parameter of exec,
and then the need for an additional procedure get.loops disappears. As similar
problems with the formulation of (8) arise if exec is provided with a formal
parameter s:stack, the stack also becomes a component of a state, and this
motivates our definitions of the data structure state and the procedure exec.
Having settled these definitions, the next problem was to formulate the key
lemmas (8), (9) and (10). It is interesting to see that each lemma corresponds
directly to a statement of the WHILE-language, viz. the compound-, the WHILE-,
and the IF-statement. Whereas the control structure of WHILE-programs is easily
captured by the notions of the meta-language, viz. functional composition, con-
ditionals and recursion, cf. Fig. 2, the control structure of a machine program mp
is encoded in the data (viz. mp) by the JUMP and BRANCH instructions, cf. Fig. 3.
² get.cycles corresponds to the clock-procedures used in [4], [10], [11], [27], causing
similar problems there. However, clock can be used to reason about running times,
and also for proof-technical reasons as Moore explains (personal communication).
This requires making the control structure of machine programs explicit in the
meta-language, and this is achieved by the key lemmas, whose syntactical sim-
ilarity with the recursive eval-calls (necessitated by the respective statements)
is obvious. However, this was not obvious to us when we developed the proof of
(1) code is correct, and most of the time was spent to analyze the system’s
outcome when it got stuck, to find out which properties of machine programs
are required in order to get the proof through. Upon this work, the key notions
of HALT.free,closed-, and closed+ machine programs were recognized, and
the key lemmas were speculated step by step.
While the speculation of the key lemmas constituted the main challenge for
us, the proofs of these lemmas constituted the main challenge for the system:
Due to the rich case-structure of procedure exec, goalterms evolved which are so
huge that the Symbolic Evaluator fails to process them within reasonable time.
A remedy to this problem is to interactively throw in a case analysis (stipulating
the kind of instructions fetch(pc,mp) may yield) which separates the goalterm
into smaller pieces so that the theorem prover can cope with them.³
But still the system showed performance problems for another reason: For
each key lemma, 8 induction hypotheses are available in the step case which
separate into 66 clauses having 10–12 literals each. This creates a large search
space for the Symbolic Evaluator and performance decreases significantly.⁴ We
therefore instructed the system to disable the induction hypotheses upon sym-
bolic evaluation, and then the system went through the proofs (needing further
advice from time to time).⁵ Using this setting, VeriFun succeeded because after
Simplification following the case analysis, the system picked up the right induc-
tion hypotheses with the Use Lemma rule, and a subsequent Simplification then
proved the case. In addition, a bunch of “routine” lemmas had been created
(expressing for example that HALT.free distributes over append, etc.), whose
need, however, was immediately obvious from the system’s outcome and whose
proofs needed an interaction in very rare cases only.
Fig. 5 gives an account on eriFun’s automation degree measured in terms of
prooftree edits.6For the whole case study, 13 data structures and 22 procedures
were defined to formulate 56 lemmas, whose verification required 1038 prooftree
edits in total, where only 186 of them had to be suggested interactively.
3This case study also revealed a shortcoming of eriFun’s object language, viz. not
having let- and case-instructions available. We intend to remove this lack of the
language in a future version of the system, as this would significantly improve the
performance of the Symbolic Evaluator.
⁴ The system uses a lemma-filter to throw out all clauses computed from verified lem-
mas which (heuristically) do not seem to contribute to a proof. However, induction
hypotheses are not considered by this filter, because they are quite similar to the
original statement, thus causing the lemma-filter to let them pass in almost all cases.
⁵ As a matter of fact, we had never seen the need for disabling induction hypotheses
before.
⁶ The values are computed for VeriFun 2.6.1 under Windows XP running JAVA
1.4.1_01 on a 2.2 GHz Pentium 4 PC. The 3 interactions to disable induction
hypotheses are not listed in Fig. 5.
Fig. 5. Proof statistics obtained for the verification of the code generator
The Symbolic Evaluator computed proofs with a total number of 64750
rewrite steps within 33 minutes running time, where the longest subproof of
1259 steps had been created for lemma (9) exec repetition.
The value for the automated calls of Use Lemma is unusually high as com-
pared to other case studies performed with VeriFun, e.g. [19], [25], which is
caused by the fact that the induction hypotheses had been disabled upon sym-
bolic evaluation of the key lemmas so that Use Lemma could succeed following
Simplification. Whereas the Induction rule performed perfectly here, the values
for Unfold Procedure and for Case Analysis are unusually high, which reflects the
need for frequent interactive calls for symbolic execution of machine programs
when proving (1), and also reflects the separation into subcases needed for the
proofs of the key lemmas.
In total, 82.1% of the required prooftree edits had been computed by ma-
chine, a number which (although it is not as good as the values encountered
in other cases) we consider good enough to provide significant support for
computer-aided verification. With the key notions for machine programs and
the key lemmas for the machine language interpreter, a clear and illuminating
structure for the proof of the main statement evolved, which is free of formal clutter
and therefore provides a useful basis to illustrate the rôle of formal semantics
and the benefits of computer-aided verification in the classroom.
Acknowledgement
We are grateful to Markus Aderhold for useful comments.
References
1. http://www.informatik.tu-darmstadt.de/pm/verifun/.
2. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques and
Tools. Addison-Wesley, New York, 1986.
3. A. Dold and V. Vialard. A mechanically verified compiling specification for a
Lisp compiler. In R. Hariharan, M. Mukund, and V. Vinay, editors, FST TCS
2001: Foundations of Software Technology and Theoretical Computer Science, vol-
ume 2245 of Lect. Notes in Comp. Sc., pages 144–155, 2001.
4. A. D. Flatau. A Verified Implementation of an Applicative Language with Dynamic
Storage Allocation. PhD. Thesis, Univ. of Texas, 1992.
5. J. Giesl. Termination of Nested and Mutually Recursive Algorithms. Journal of
Automated Reasoning, 19:1–29, 1997.
6. W. Goerigk, A. Dold, T. Gaul, G. Goos, A. Heberle, H. von Henke, U. Hoffmann,
H. Langmaack, and W. Zimmermann. Compiler correctness and implementation
verification: The Verifix approach. In P. Fritzson, editor, Proc. of the Poster
Session of CC’96 - Intern. Conf. on Compiler Construction, pages 65 – 73, 1996.
7. C. A. Gunter. Semantics of Programming Languages — Structures and Techniques.
The MIT Press, Cambridge, 1992.
8. J. Hannan and F. Pfenning. Compiler verification in LF. In A. Scedrov, edi-
tor, Proceedings of the Seventh Annual IEEE Symposium on Logic in Computer
Science, pages 407–418. IEEE Computer Society Press, 1992.
9. J. McCarthy and J. A. Painter. Correctness of a Compiler for Arithmetical Ex-
pressions. In J. T. Schwartz, editor, Proc. on a Symp. in Applied Math., 19, Math.
Aspects of Comp. Sc. American Math. Society, 1967.
10. J. S. Moore. A Mechanically Verified Language Implementation. Journal of Au-
tomated Reasoning, 5(4):461–492, 1989.
11. J. S. Moore. Piton: A Mechanically Verified Assembly-Level Language. Kluwer
Academic Publishers, Dordrecht, 1996.
12. J. S. Moore. An exercise in graph theory. In M. Kaufmann, P. Manolios, and
J. S. Moore, editors, Computer-Aided Reasoning: ACL2 Case Studies, pages 41–
74, Boston, MA., 2000. Kluwer Academic Press.
13. H. R. Nielson and F. Nielson. Semantics with Applications. John Wiley and Sons,
New York, 1992.
14. P. Curzon. A verified compiler for a structured assembly language. In M. Archer,
J.J. Joyce, K.N. Levitt, and P.J. Windley, editors, International Workshop on
Higher Order Logic Theorem Proving and its Applications, pages 253–262, Davis,
California, 1991. IEEE Computer Society Press.
15. G. Schellhorn and W. Ahrendt. The WAM case study: Verifying compiler correct-
ness for Prolog with KIV. In W. Bibel and P. H. Schmidt, editors, Automated
Deduction: A Basis for Applications. Volume III, Applications. Kluwer Academic
Publishers, Dordrecht, 1998.
16. C. Walther. On Proving the Termination of Algorithms by Machine. Artificial
Intelligence, 71(1):101–157, 1994.
17. C. Walther. Criteria for Termination. In S. Hölldobler, editor, Intellectics and
Computational Logic. Kluwer Academic Publishers, Dordrecht, 2000.
18. C. Walther. Semantik und Programmverifikation. Teubner-Wiley, Leipzig, 2001.
19. C. Walther and S. Schweitzer. A Machine Supported Proof of the Unique Prime
Factorization Theorem. Technical Report VFR 02/03, Programmiermethodik,
Technische Universität Darmstadt, 2002.
20. C. Walther and S. Schweitzer. The VeriFun Tutorial. Technical Report VFR
02/04, Programmiermethodik, Technische Universität Darmstadt, 2002.
21. C. Walther and S. Schweitzer. VeriFun User Guide. Technical Report VFR 02/01,
Programmiermethodik, Technische Universität Darmstadt, 2002.
22. C. Walther and S. Schweitzer. About VeriFun. In F. Baader, editor, Proc. of the
19th Inter. Conf. on Automated Deduction (CADE-19), volume 2741 of Lecture
Notes in Artificial Intelligence, pages 1–5, Miami, 2003. Springer-Verlag.
23. C. Walther and S. Schweitzer. A Machine-Verified Code Generator. Technical
Report VFR 03/01, Programmiermethodik, Technische Universität Darmstadt,
2003.
24. C. Walther and S. Schweitzer. Verification in the Classroom. To appear in Journal
of Automated Reasoning - Special Issue on Automated Reasoning and Theorem
Proving in Education, pages 1–21, 2003.
25. C. Walther and S. Schweitzer. A Verification of Binary Search. In D. Hutter and
W. Stephan, editors, Mechanizing Mathematical Reasoning: Techniques, Tools and
Applications, volume 2605 of LNAI, pages 1–18. Springer-Verlag, 2003.
26. G. Winskel. The Formal Semantics of Programming Languages. The MIT Press,
Cambridge, 1993.
27. W. D. Young. A Mechanically Verified Code Generator. Journal of Automated
Reasoning, 5(4):493–518, 1989.