
A Verification of Binary Search



Christoph Walther and Stephan Schweitzer
Fachgebiet Programmiermethodik
Technische Universität Darmstadt
Abstract. We demonstrate the use of the VeriFun system with a verification
of the Binary Search method. We present the challenges encountered when
working on this problem, illustrate the operation and performance of the
system, and finally discuss technical improvements as well as subjects of
further research which are not specific to our approach.
We develop the VeriFun system [1],[19], a semi-automated system for the
verification of statements about programs written in a functional programming
language. The motivation for this development is twofold: Since we are interested in
methods for automating reasoning tasks which usually require the creativity of
a human expert, we felt the need for an easily accessible experimental base
which we can use to evaluate new ideas of our own as well as proposals known
from the literature. Such an experimental base is needed because the value of
a method can ultimately be determined only after its integration into a running
system, taking into account
- the competing interplay with other implemented methods,
- the bits and pieces necessary to make a method really work, and
- the implicit assumptions of a method which may conflict with the settings
  of a real application.
The second reason for the development of VeriFun originates from our
experience in teaching Formal Methods, Automated Reasoning, Semantics,
Verification and related subjects. The motivation of the students increases
considerably if they can gather practical experience with the principles and
methods taught. Students of Computer Science expect to see the computer solve
some problem instead of working on their own on small problems using pencil and
paper only, thus treating the whole subject as a purely theoretical exercise. Of
course, powerful systems exist and may be used, e.g. NQTHM [2], ACL2 [9], PVS [11],
Isabelle [12], VSE [5], KIV [13], HOL [3], to mention only a few, which go far
beyond a small system like VeriFun in their abilities and in the verification and
reasoning problems they can handle. However, the performance of these
systems comes at the price of highly elaborated logics, complicated user
interfaces, severe installation requirements, license fees (for some of them) etc.,
which complicates their use (if it does not make it impossible) for teaching mainly
principles within the restricted time frame of a course. The situation greatly
improves, however, with a small, highly portable system with an elaborated user
interface and a simple base logic that is easy to grasp, which nevertheless allows
the students to perform verification case studies for problems already known from
other courses, e.g. Sorting, Searching, Basic Number Theory, Propositional Logic,
Protocols, Matching, Unification etc. Therefore VeriFun has been developed
as a JAVA [4] application, which the students can run on their home computer
(whatever platform it may use) after a 1 MB download, to work with the system
whenever they like.

Technical Report VFR 02/02
In this paper, we demonstrate the use of VeriFun with a verification of the
Binary Search method.¹ We illustrate the challenges encountered when working
on this problem and, based on this analysis, discuss technical improvements
to our system as well as subjects of further research which (as we feel) are not
specific to our approach.
Binary Search is a standard subject of textbooks
about Algorithms and Data Structures [10]. Searching an array of n so-called
keys for a certain key requires O(n) steps in the worst case, but needs only
O(log n) steps if the array is ordered. This is because, by the ordering property,
the domain of search can be halved after each array lookup, which yields the
logarithmic complexity of the method. Fig. 1 displays a version of Binary Search,
where |a| denotes the size of an array a of natural numbers with indices ranging
from 0 to |a|−1.
The correctness of this algorithm is not that obvious because of the puzzling
index calculations, which provide a plentiful source of programming errors. In
order to specify the correctness of the algorithm, additional notions are needed:
If we define “$\mathit{key} \in a :\Leftrightarrow \exists i \in \{0, \ldots, |a|-1\}.\ a[i] = \mathit{key}$” and
“$\mathit{ordered}(a) :\Leftrightarrow \forall i, j \in \{0, \ldots, |a|-1\}.\ i \le j \rightarrow a[i] \le a[j]$”,
then the correctness requirements for Binary Search can be stated formally as

$\forall \mathit{key}{:}\mathrm{nat},\ a{:}\mathrm{array}.\ \mathrm{BINSEARCH}(\mathit{key}, a) \rightarrow \mathit{key} \in a$   (1)
$\forall \mathit{key}{:}\mathrm{nat},\ a{:}\mathrm{array}.\ \mathit{ordered}(a) \wedge \mathit{key} \in a \rightarrow \mathrm{BINSEARCH}(\mathit{key}, a)$   (2)

where statement (1) formulates the soundness requirement and statement (2) is
the completeness requirement of the algorithm. In addition, statement

$\forall \mathit{key}{:}\mathrm{nat},\ a{:}\mathrm{array}.\ \mathrm{BINSEARCH.steps}(\mathit{key}, a) \le \lfloor \log_2 |a| \rfloor$   (3)

asserts that the algorithm causes only logarithmic costs (measured in terms of the
array size), where BINSEARCH.steps(key, a) counts the number of executions
of the while-loop body in Fig. 1.

procedure BINSEARCH(key:nat, a:array):bool <=
  var i := 0;
  var j := |a|-1;
  while j>i and a[i+⌊(j-i)/2⌋] ≠ key do
    if a[i+⌊(j-i)/2⌋] > key
      then j := i+⌊(j-i)/2⌋-1
      else i := i+⌊(j-i)/2⌋+1
    fi
  done ;
  if j>i then return (true) fi ;
  if j=i
    then return (a[i]=key)
    else return (false)
  fi ;
Fig. 1. The Binary Search Algorithm

¹ Roughly speaking, this is a “medium size” problem, more complicated than, e.g.,
the verification of quicksort or mergesort, but with less effort than needed for, e.g.,
proving the unique prime factorization theorem, the verification of heapsort or of a
first-order matching algorithm.
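As a cross-check of Fig. 1 and of statements (1) to (3), the following Python sketch transliterates the loop and tests the three statements exhaustively on small arrays. It is an illustration only and not part of the VeriFun development: the names monus, floor_log2, bin_search and statements_hold are assumptions introduced here, subtraction is truncated at 0 to mimic natural numbers, and the explicit guard for the empty array is an assumption that mirrors the test length(a)>j of the functional version.

```python
from itertools import product

def monus(x, y):
    """Subtraction on natural numbers, truncated at 0."""
    return x - y if x > y else 0

def floor_log2(n):
    """Binary logarithm truncated downwards, with floor_log2(0) = 0."""
    return n.bit_length() - 1 if n > 0 else 0

def bin_search(key, a):
    """Loop of Fig. 1; returns (result, number of executions of the loop body)."""
    if not a:                       # assumed guard: nothing to search
        return False, 0
    i, j, steps = 0, monus(len(a), 1), 0
    while j > i and a[i + (j - i) // 2] != key:
        mid = i + (j - i) // 2
        if a[mid] > key:
            j = monus(mid, 1)
        else:
            i = mid + 1
        steps += 1
    if j > i:
        return True, steps
    return (a[i] == key if j == i else False), steps

def statements_hold(max_len=5, max_val=3):
    """Check (1) soundness, (2) completeness and (3) the step bound."""
    for n in range(max_len + 1):
        for a in map(list, product(range(max_val + 1), repeat=n)):
            for key in range(max_val + 2):
                found, steps = bin_search(key, a)
                if found and key not in a:                      # violates (1)
                    return False
                if a == sorted(a) and key in a and not found:   # violates (2)
                    return False
                if steps > floor_log2(len(a)):                  # violates (3)
                    return False
    return True
```

Note that (1) and (3) are checked for all arrays, whereas (2) is only checked for ordered ones, exactly as the statements are formulated. Such a test covers finitely many cases and is no substitute for the proofs discussed in the sequel.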
We aim to verify statements (1), (2) and (3) using the VeriFun system: Data
structures are defined in VeriFun in a constructor-selector style discipline. The
data structures structure bool <= false, true and structure nat <= 0,
succ(pred:nat) (representing natural numbers) are predefined in the system,
and we use linear lists over natural numbers, defined by structure list <=
empty, add(hd:nat, tl:list), to represent arrays.
Procedures and while-loops are given by (recursively defined) functional
procedures. Here we use the procedure

function element(a:list, i:nat):nat <=
if a=empty
 then 0
 else if i=0 then hd(a) else element(tl(a),pred(i)) fi
fi

to compute the value a[i] of a (non-empty) array a at index i, and we define the
tail-recursive procedure

function find(key:nat, a:list, i:nat, j:nat):bool <=
if length(a)>j
 then if j>i
       then if element(a,plus(i,half(minus(j,i))))>key
             then find(key,a,i,pred(plus(i,half(minus(j,i)))))
             else if key>element(a,plus(i,half(minus(j,i))))
                   then find(key,a,succ(plus(i,half(minus(j,i)))),j)
                   else true
                  fi
            fi
       else if j=i then key=element(a,i) else false fi
      fi
 else false
fi

to represent the body of the while-loop in Fig. 1. Finally, the functional version
of the Binary Search algorithm is given as

function binsearch(key:nat, a:list):bool <=
find(key,a,0,pred(length(a))) ,

where the definitions of the predefined procedure > and the user-defined
procedures plus, minus, half, length, member and ordered, which are required to
define the procedure find or to formulate the correctness statements (1) and
(2), are given in Appendix 6.1 or 6.2 respectively.
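To see how the tail-recursive procedure mirrors the loop, the functional version can be mimicked in Python. This is a sketch under assumptions: Python lists stand in for the structure list, ordinary integers for nat (with pred and minus truncated at 0), and the arguments of the second recursive call follow the else-branch of Fig. 1.

```python
def pred(x):
    """Predecessor on natural numbers, with pred(0) = 0."""
    return x - 1 if x > 0 else 0

def minus(x, y):
    """Truncated subtraction on natural numbers."""
    return x - y if x > y else 0

def half(x):
    return x // 2

def element(a, i):
    """a[i]; 0 if i is not a legal index, as in the procedure element."""
    return a[i] if i < len(a) else 0

def find(key, a, i, j):
    """One call corresponds to one test of the while-condition in Fig. 1."""
    if not len(a) > j:
        return False
    if j > i:
        mid = i + half(minus(j, i))
        if element(a, mid) > key:
            return find(key, a, i, pred(mid))
        if key > element(a, mid):
            return find(key, a, mid + 1, j)
        return True
    return key == element(a, i) if j == i else False

def binsearch(key, a):
    return find(key, a, 0, pred(len(a)))

print(binsearch(3, [1, 3, 5, 7]))   # key is present
print(binsearch(4, [1, 3, 5, 7]))   # key is absent
```

The loop variables i and j of Fig. 1 become parameters of find, and the initialization of the loop becomes the argument list of the call in binsearch. This is the reason why statements about binsearch mention the terms 0 and pred(length(a)) instead of variables.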
We briefly sketch the outline of the correctness and complexity proofs before we
discuss their development with our system in Section 4.
For proving the correctness of the index calculations and also for proving the
complexity statement, some properties of the >-relation must be known and
some arithmetic groundwork has to be laid. The ordering lemmas and the
arithmetic lemmas which were used in the whole verification are
listed in Appendices 6.1 and 6.4.
3.1 Soundness
The soundness statement

lemma binsearch is sound <= all key:nat, a:list
  if(binsearch(key,a),member(key,a),true)   (4)

for binsearch,² cf. (1), immediately raises the proof-obligation

all a:list, key:nat
  if(find(key,a,0,pred(length(a))),member(key,a),true)   (5)

which originates from (4) simply by the execution of the non-recursively defined
procedure binsearch. This proof-goal has a straightforward generalization [15],
viz.

lemma find is sound <= all i,j,key:nat, a:list
  if(find(key,a,i,j),member(key,a),true) ,   (6)

which then immediately proves the soundness statement (4) by first-order
inferences only. For proving (6), some of the ordering and arithmetic lemmas and a
further auxiliary lemma

lemma element entails member <= all i,n:nat, a:list
  if(n=element(a,i),if(length(a)>i,member(n,a),true),true)   (7)

relating element with member are required.

² The ternary conditional if is the only connective for writing formulas in
VeriFun. Hence if(a, b, true) is written for $a \rightarrow b$, if(a, true, b) is written
for $a \vee b$, if(a, b, false) is written for $a \wedge b$ etc.
3.2 Completeness
As in the soundness case, the completeness statement

lemma binsearch is complete <= all a:list, key:nat
  if(ordered(a),if(member(key,a),binsearch(key,a),true),true)   (8)

for binsearch, cf. (2), immediately raises the proof-obligation

all a:list, key:nat
  if(ordered(a),if(member(key,a),find(key,a,0,pred(length(a))),true),true)   (9)

which originates from (8) simply by the execution of the non-recursively defined
procedure binsearch. However, this proof-goal lacks a straightforward
generalization. The difficulty in proving (9) is typical for a statement involving
a tail-recursive procedure stemming from the body of a loop: The variables
which vary in a procedure (or in the loop-body respectively), viz. i and j in the
present case, are replaced in the statement with non-variable terms representing
the initialization of the loop-variables, viz. 0 and pred(length(a)) here, thus
preventing an induction proof upon these variables according to the recursion
structure of the tail-recursive procedure, viz. find.
Often, a remedy to such a verification problem is to formulate an auxiliary
lemma based on a (useful) loop-invariant: Let a[i..j] denote the partition of a
wrt. the array-indices $i \le j$, i.e. a[i..j] denotes the sub-array of a with bounds i
and j. Then “$\mathit{key} \in a \rightarrow \mathit{key} \in a[i..j]$” is an invariant of the while-loop of Fig.
1 leading to the auxiliary lemma

lemma in.partition entails find <= all key,i,j:nat, a:list

where in.partition is defined by
function in.partition(key:nat, a:list, i:nat, j:nat):bool <=
if a=empty
 then false
 else if i=0
       then if key=hd(a)
             then true
             else if j=0
                   then false
                   else in.partition(key,tl(a),i,pred(j))
                  fi
            fi
       else if j=0
             then false
             else if pred(i)>pred(j)
                   then false
                   else in.partition(key,tl(a),pred(i),pred(j))
                  fi
            fi
      fi
fi
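The recursion of in.partition is easier to follow when compared with an ordinary slice. The Python sketch below transliterates the procedure case by case (Python lists for list, integers for nat) and compares it with the test "key occurs in a[i..j]" written as a slice. The helper agrees_with_slice is an assumption introduced for this comparison only.

```python
def in_partition(key, a, i, j):
    """Transliteration of in.partition: does key occur in a[i..j]?"""
    if not a:
        return False
    if i == 0:
        if key == a[0]:
            return True
        return False if j == 0 else in_partition(key, a[1:], i, j - 1)
    if j == 0:
        return False
    if i - 1 > j - 1:
        return False
    return in_partition(key, a[1:], i - 1, j - 1)

def agrees_with_slice(a, bound):
    """Compare with the slice a[i:j+1] for all keys and bounds below `bound`."""
    return all(in_partition(key, a, i, j) == (key in a[i:j + 1])
               for key in range(bound)
               for i in range(bound)
               for j in range(bound))
```

Both arguments i and j are decremented together with each step along tl(a), so the procedure follows the recursion structure of the list. Bounds with i > j, or bounds beyond the end of the list, simply yield false.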
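Using lemma (10) and a completeness statement about in.partition, viz.

lemma member entails in.partition <= all key:nat, a:list

the completeness statement (8) about binsearch can be proved by first-order
reasoning only. However, the proof of statement (10) requires a number of further
lemmas, viz.

$\forall \mathit{key}, i, j{:}\mathrm{nat},\ a{:}\mathrm{list}.\ j \le i \wedge \mathit{key} \in a[i..j] \rightarrow i = j$   (12)
$\forall \mathit{key}, i, j{:}\mathrm{nat},\ a{:}\mathrm{list}.\ j \le i \wedge \mathit{key} \in a[i..j] \rightarrow a[i] = \mathit{key}$   (13)
$\forall \mathit{key}, i, j{:}\mathrm{nat},\ a{:}\mathrm{list}.\ \mathit{key} \in a[i..j] \rightarrow \mathit{key} = a[i] \vee \mathit{key} \in a[i+1..j]$   (14)
$\forall \mathit{key}, i, j{:}\mathrm{nat},\ a{:}\mathrm{list}.\ \mathit{key} \in a[i..j] \rightarrow \mathit{key} \in a[i..j-1] \vee \mathit{key} = a[j]$   (15)
$\forall \mathit{key}, i, j, h{:}\mathrm{nat},\ a{:}\mathrm{list}.\ i \le h \le j \wedge \mathit{key} \in a[i..j] \rightarrow \mathit{key} \in a[i..h] \vee \mathit{key} \in a[h..j]$   (16)
$\forall \mathit{key}, i, j, h{:}\mathrm{nat},\ a{:}\mathrm{list}.\ \mathit{ordered}(a) \wedge \mathit{key} \in a[i..j] \wedge a[h] > \mathit{key} \rightarrow h > i$   (17)
$\forall \mathit{key}, h{:}\mathrm{nat},\ a{:}\mathrm{list}.\ \mathit{ordered}(a) \wedge \mathit{key} > a[h] \wedge |a| > h \rightarrow \mathit{key} > \mathit{hd}(a)$   (18)
$\forall \mathit{key}, i, j, h{:}\mathrm{nat},\ a{:}\mathrm{list}.\ \mathit{ordered}(a) \wedge \mathit{key} \in a[i..j] \wedge \mathit{key} > a[h] \wedge |a| > h \rightarrow j > h$   (19)

and also some of the ordering and arithmetic lemmas.³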
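The lemmas (12) to (19) can be tested on small instances before any proof is attempted. The Python sketch below does so under assumptions: the relation symbols of the lemmas are read as "j ≤ i" in (12) and (13) and as "i ≤ h ≤ j" in (16), a[i..j] is modelled by a slice, a[h] yields 0 for an illegal index h (as the procedure element does), and the names in_part, elem, lemmas and all_lemmas_hold are introduced here for illustration.

```python
from itertools import product

def in_part(key, a, i, j):
    """key occurs in a[i..j], written with a slice."""
    return key in a[i:j + 1]

def elem(a, i):
    """a[i], with 0 for illegal indices as in the procedure element."""
    return a[i] if i < len(a) else 0

def lemmas(key, a, i, j, h):
    """Truth values of (12)-(19) for one choice of the variables."""
    ordered = a == sorted(a)
    p = in_part(key, a, i, j)
    return [
        not (j <= i and p) or i == j,                                    # (12)
        not (j <= i and p) or elem(a, i) == key,                         # (13)
        not p or key == elem(a, i) or in_part(key, a, i + 1, j),         # (14)
        not p or in_part(key, a, i, max(j - 1, 0)) or key == elem(a, j), # (15)
        not (i <= h <= j and p)
            or in_part(key, a, i, h) or in_part(key, a, h, j),           # (16)
        not (ordered and p and elem(a, h) > key) or h > i,               # (17)
        not (ordered and key > elem(a, h) and len(a) > h) or key > a[0], # (18)
        not (ordered and p and key > elem(a, h) and len(a) > h) or j > h # (19)
    ]

def all_lemmas_hold(max_len=4, max_val=2):
    """Exhaustive test over all lists up to max_len with entries up to max_val."""
    idx = range(max_len + 1)
    return all(all(lemmas(key, list(a), i, j, h))
               for n in range(max_len + 1)
               for a in product(range(max_val + 1), repeat=n)
               for key in range(max_val + 2)
               for i, j, h in product(idx, repeat=3))
```

A failing instance would indicate a misread lemma rather than a flaw in the verification, since the lemmas themselves were proved with the system.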
³ It is interesting to see that the lemmas (12) - (15) refer to the bounds of the array,
known as a source of frequent programming errors. Lemma (16) states that each legal
index h separates a partition into a pair of covering sub-partitions, and lemmas (17)
- (19) relate the ordering of array elements with the ordering of the array indices.

3.3 Complexity
For measuring the costs of binsearch, the costs of find have to be determined
first. To this effect, the number of recursive calls in find (corresponding to the
number of executions of the while-loop body in Fig. 1) is counted by the
procedure find.steps, and then the costs of binsearch are defined by the procedure
binsearch.steps:

function find.steps(key:nat, a:list, i:nat, j:nat):nat <=
if length(a)>j
 then if j>i
       then if element(a,plus(i,half(minus(j,i))))>key
             then succ(find.steps(key,a,i,pred(plus(i,half(minus(j,i))))))
             else if key>element(a,plus(i,half(minus(j,i))))
                   then succ(find.steps(key,a,succ(plus(i,half(minus(j,i)))),j))
                   else 0
                  fi
            fi
       else 0
      fi
 else 0
fi

function binsearch.steps(key:nat, a:list):nat <=
find.steps(key,a,0,pred(length(a)))

Using binsearch.steps, the complexity statement now is formulated as

lemma binsearch log bounded <= all a:list, key:nat
  if(binsearch.steps(key,a)>log2(length(a)),false,true)   (20)

where the definition of log2 (computing the binary logarithm truncated
downwards) is given in Appendix 6.2.⁴ Also here, the system replaces the call of
binsearch.steps, raising after some simplification steps the proof-obligation

all a:list, key:nat
true) .

This subgoal now is straightforwardly generalized to a corresponding lemma
about find.steps, viz.

lemma find.steps log bounded <= all i,j,key:nat, a:list
true) ,

then proving statement (20) by first-order reasoning only. However, the proof of
statement (22) requires induction and two additional lemmas, viz.

lemma complexity lemma#1 <= all i,j:nat
lemma complexity lemma#2 <= all i,j:nat

and also some of the ordering and arithmetic lemmas.
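The step counting and the bound of statement (20) can be tried out with a small Python sketch. It assumes that find.steps makes the same two recursive calls as find and that binsearch.steps starts it with the bounds 0 and pred(length(a)); log2 follows the definition in Appendix 6.2, and log_bounded is a helper introduced here.

```python
def log2(x):
    """Binary logarithm truncated downwards; log2(0) = log2(1) = 0."""
    return 0 if x < 2 else 1 + log2(1 + (x - 2) // 2)

def element(a, i):
    """a[i], with 0 for illegal indices."""
    return a[i] if i < len(a) else 0

def find_steps(key, a, i, j):
    """Number of recursive calls made by find(key, a, i, j)."""
    if len(a) > j and j > i:
        mid = i + (j - i) // 2
        if element(a, mid) > key:
            return 1 + find_steps(key, a, i, max(mid - 1, 0))
        if key > element(a, mid):
            return 1 + find_steps(key, a, mid + 1, j)
    return 0

def binsearch_steps(key, a):
    return find_steps(key, a, 0, max(len(a) - 1, 0))

def log_bounded(a, keys):
    """Statement (20) for one list and several keys."""
    return all(not binsearch_steps(k, a) > log2(len(a)) for k in keys)
```

Statement (20) carries no ordered(a) hypothesis: the search range is at least halved by each recursive call whether or not the list is ordered, so the bound can be tested on arbitrary lists.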
4 Verifying binsearch with VeriFun
We now report on the efforts required to guide VeriFun to the verification
of Binary Search. First, we briefly illustrate how to use the system before we
consider the actual case. We then analyze the system’s behaviour and finally
discuss problems which either are specific to our system or constitute a general
challenge for computer supported verification.
⁴ We use if(x>y, false, true), i.e. $x \not> y$, instead of the more readable $x \le y$ because
> and several lemmas about > (like transitivity, irreflexivity etc.) are predefined in
VeriFun, cf. Appendix 6.1, thus saving the additional definition of $\le$ and the
formulation and proving of lemmas about $\le$.
4.1 About VeriFun
VeriFun is a semi-automated system for the verification of statements about
programs written in a functional programming language, cf. [1],[19]. In a typical
session with the system, a user
- defines a (functional) program by stipulating the data structures and the
  procedures of the program using VeriFun’s language editor,
- defines statements about the data structures and procedures of the program
  using VeriFun’s language editor,
- verifies these statements and the termination of the procedures using
  VeriFun’s proof editor.
VeriFun consists of several fully-automated routines for theorem proving
and for the formation of hypotheses to support verification. It is designed as
an interactive system, where, however, the automated routines substitute the
human expert in striving for a proof until they fail. In such a case, the user may
step in to guide the system for a continuation of the proof.
When called to prove a statement, the system computes a proof-tree. An
interaction, which may be required when the construction of the proof-tree gets
stuck, is to instruct the system to prune some unwanted branches of the
proof-tree (if necessary), and then
- to perform a case analysis,
- to use an instance of a lemma or an induction hypothesis,
- to unfold a procedure call,
- to apply an equation,
In addition, it may be necessary to formulate (and to prove) an auxiliary
lemma (sometimes after providing a new definition) in order to continue with
the actual proof task.
For proving the termination of a recursively defined procedure, it may be
required to tell the system a useful termination function. Using this hint, the
system computes termination hypotheses for the procedure which then must
be verified like any other given statement. In a user guided termination proof,
the termination hypotheses are based on the predefined procedure > (which is
“assumed” to compute a well-founded relation). In order to ease such termination
proofs, the system holds a set of predefined lemmas about >, cf. Appendix 6.1.
⁵ In VeriFun, the nodes of a proof-tree consist of sequents, and hypotheses can be
inserted into the antecedent of a sequent, can be deleted from the antecedent or
moved to the succedent. The insertion of a hypothesis implements a case analysis,
and the deletion of a hypothesis corresponds to a generalization step, cf. [15]. The
move of a hypothesis preserves equivalence and is sometimes needed (for technical
reasons) to enable a subsequent induction.
It is therefore useful (albeit not required) not to use other “usual” orderings
on natural numbers for a verification problem, e.g. < or ≤, thus saving the
formulation and proving of lemmas about these orderings. This also motivates
why our formulation of Binary Search is based on >.
Having proved the termination of a functional procedure, the system may
generate additional (terminating) procedures and (verified) lemmas about them.
These system-generated procedures and lemmas are used by VeriFun’s
automated termination analysis [16], but are also useful for proofs not related to
termination. Appendix 6.3 lists the system-generated procedures and lemmas
which were actually used in the verification of Binary Search.
For proving the correctness and complexity statements for Binary Search,
some lemmas about the arithmetic functions are obviously also needed. In
particular, monotonicity and estimation statements are required to justify the
soundness of the index calculations and to prove the loop-invariant. Here we used
VeriFun’s Import feature, which allows lemmas to be imported together with their
proofs from a file. While working with the system on several verification
problems, we set up certain libraries, e.g. for Arithmetic and for Linear Lists, which
we use when we start to work on a new problem. Usually we begin the work
with an import from the libraries of all lemmas stating properties about the
procedures in our verification problem. This guarantees that all proven lemmas
are available when needed, where VeriFun’s lemma-filter takes care that system
performance is not spoiled by a vast amount of irrelevant lemmas. A library is
updated by those lemmas (not too specific to a certain verification problem) which
were needed in the actual verification, but are not yet members of a library. For the
verification of Binary Search, we started with an import from our Arithmetic
library of all lemmas about plus, minus, half and log2. Appendix 6.4 lists the
arithmetic lemmas which were actually needed when developing the proofs for
Binary Search.
4.2 Termination
VeriFun demands that the termination of each procedure which is called in a
statement is verified before a proof of the statement can be started. Therefore,
the system’s automated termination analysis [16] is activated immediately
after the definition of a recursively defined procedure. The termination analysis
recognizes termination based on (nested) structural recursion in particular, as
e.g. for minus and half, which we consider as trivial termination problems. In
the Binary Search example, only log2, find and find.steps cause non-trivial
termination problems. The system proves the termination of log2, but fails
for find and for find.steps. Obviously, minus(j,i) is a termination
function for find, and given this hint, the system generates the termination
hypotheses (25) and (26), but fails to verify either of them. We therefore
demand to ignore all hypotheses except j>i in (25) and in (26), causing the
system to compute the generalized termination hypotheses (27) and (28), which
both have an automatic (induction) proof using some of the predefined,
system-generated and arithmetic lemmas.⁶

all a:list, key,i,j:nat
all a:list, key,i,j:nat
all i,j:nat
all i,j:nat

As find and find.steps share the same
recursion structure, we use VeriFun’s Create Lemma command to make (27)
and (28) available for subsequent use. Now the verification of the termination of
find.steps only requires providing the termination function minus(j,i), leaving
the remaining work to the system.
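Why minus(j,i) serves as a termination function can be illustrated numerically. The Python sketch below assumes that the generalized termination hypotheses state, for j>i, that minus(j,i) is greater than the value of the termination function at each of the two recursive calls of find. The exact formulas (25) to (28) are not reproduced here, and measure_decreases is a name introduced for this illustration.

```python
def minus(x, y):
    """Truncated subtraction on natural numbers."""
    return x - y if x > y else 0

def pred(x):
    """Predecessor on natural numbers, with pred(0) = 0."""
    return x - 1 if x > 0 else 0

def measure_decreases(i, j):
    """For j > i, minus(j, i) exceeds the measure of both recursive calls."""
    if not j > i:
        return True                      # find makes no recursive call
    mid = i + minus(j, i) // 2
    left = minus(pred(mid), i)           # call find(key, a, i, pred(mid))
    right = minus(j, mid + 1)            # call find(key, a, succ(mid), j)
    return minus(j, i) > left and minus(j, i) > right
```

Only the hypothesis j>i is used in this check, neither the list a nor the key, which matches the remark that all other hypotheses can be ignored.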
⁶ The motivation for these generalizations is not only to support the proofs, but also to
identify the hypotheses irrelevant for the procedure’s termination. As the system
generates induction axioms from the body of the terminating procedures, disregarding
irrelevant hypotheses results in induction axioms stronger than would be obtained
otherwise, see [15] for details.

4.3 Soundness, Completeness and Complexity
Proving the soundness of binsearch (4) does not require much effort: The
system immediately comes up with the proof-goal (5), which motivates us to
formulate lemma (6). The proof of this lemma gets stuck as well, but it is
immediately obvious from the failure that another lemma, viz. (7) (having a
straightforward induction proof), is needed. Given the proof of (7), (6) and in
turn (4) are easily proved.
The completeness statement is considerably harder to verify: The proof of
(8) gets stuck with the proof-obligation (9), and the task here is to develop a
lemma useful for proving this subgoal. However, unlike the soundness
(and complexity) case, the required lemma cannot be formulated with the given
notions. Instead a new concept, viz. partition, has to be invented which allows us to
demand the success of find(key, a, i, j) only for keys key satisfying $\mathit{key} \in a[i..j]$.
Generally, the challenge here is to spot the missing concept, of course, and,
once it is found, to represent it in such a way that subsequent proofs are
supported. The problem is, however, that (except in trivial cases) it seems hard
to distinguish a good representation from a bad one a priori. We were trapped
by this problem in the following way: It seemed straightforward to us to define
a procedure function partition(a:list,i:nat,j:nat):list <= ...
computing the sublist of list a between the indices i and j, and then to prove the

lemma member partition entails find <=
all key,i,j:nat, a:list
true) .
However, we failed in proving (29). New subgoals were created involving the
creation of new lemmas, which in turn raised new subgoals etc., until we gave up.
An analysis of this tedious and frustrating effort revealed that the problem is the
use of partition. Roughly speaking, this procedure (using tl) strips off the list
elements in a between positions 0 and i−1 from the beginning of a and cuts off
(using a procedure butlast) the list elements between |a|−1 and j+1, working
from the end of a towards the beginning. The problem which comes with this
approach is that the presence of butlast spoils the use of induction hypotheses
and lemmas, so that the proofs do not get through. This problem
immediately disappears if we use $\mathit{key} \in a[i..j]$, cf. function in.partition,
instead of a[i..j] as the new concept to express the lemma required to continue
with the proof of (8).
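The two representations can be put side by side in Python. The definition of partition is elided in the text, so the version below is a hypothetical reading of the verbal description (cut behind position j with butlast, then strip the first i elements with tl). The procedure in_partition is a compact rendering of in.partition, and same_concept is a helper introduced here.

```python
def tl(a):
    return a[1:]

def butlast(a):
    return a[:-1]

def partition(a, i, j):
    """Hypothetical sublist a[i..j]: cut behind j with butlast, then before i with tl."""
    while len(a) > j + 1:
        a = butlast(a)
    for _ in range(i):
        a = tl(a)
    return a

def in_partition(key, a, i, j):
    """Predicate 'key occurs in a[i..j]', recursing on tl(a) only."""
    if not a or (i > 0 and (j == 0 or i > j)):
        return False
    if i == 0:
        return key == a[0] or (j > 0 and in_partition(key, tl(a), 0, j - 1))
    return in_partition(key, tl(a), i - 1, j - 1)

def same_concept(a, bound):
    """member(key, partition(a, i, j)) and in_partition(key, a, i, j) agree."""
    return all((key in partition(a, i, j)) == in_partition(key, a, i, j)
               for key in range(bound)
               for i in range(bound)
               for j in range(bound))
```

Both formulations denote the same set of keys, but only in_partition recurses solely along tl(a), i.e. along the structure used by inductions over lists. This agrees with the observation that the difficulty lies in the representation and not in the concept itself.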
Using in.partition, lemma (10) can be formulated and a proof attempt
can be started. Now the situation develops similarly to the soundness case: Proofs
get stuck, the system’s outcome is analyzed, a new lemma is spotted as the
result of the analysis, and a proof of the new lemma is started which either requires
further lemmas (and so on ...) or gets through so that the proof of the statement
calling for the proved lemma can be resumed.⁷ In the course of the proof for (10),
the lemmas (12),...,(19) were invented and proved, where (16) is the central one.
Each of these lemmas evolved with less imagination from the analysis of the
system’s outcome when it got stuck in proving another lemma. The proofs use
some of the predefined and system-generated lemmas, and the proof of (10) also
uses some of the arithmetic lemmas. Finally, the completeness statement (8) is
proved by some first-order reasoning steps using (10) and (11).
The proof of the complexity statement also requires more work than the
soundness proof, but considerably less effort than needed in the completeness
case. The proof of (20) gets stuck with the proof-obligation (21), and the missing
lemma, viz. (22), is easy to spot. Again, the induction proof of (22) gets stuck, and
it is immediately obvious from the result and the induction hypotheses that the
lemmas (23) and (24) will do the job. Although these lemmas look rather
nasty and one may expect a lot of tedious proof steps, to our surprise both
lemmas had a straightforward induction proof (using some of the predefined and
arithmetic lemmas).⁸ After (23) and (24) have been proved, the system immediately
comes up with a proof of (22), and then the proof of (20) is routine.

⁷ The way proofs are developed with the system does not differ in principle from
the way proofs are developed using pencil and paper. It is good practice to continue
with the proof obligation which is most in doubt: If it seems obvious that the spotted
lemma will do the job, one should continue first with proving the lemma. On the
other hand, if the lemma’s proof seems easy but its use is in doubt, it is advantageous
to verify first that the lemma is useful indeed. VeriFun supports both ways of proof
development at the user’s wish.
⁸ In fact, it costs far more time to TeX one of the formulas than the system needs to
prove the log2-boundedness of binsearch.

4.4 Analysis
The verification of Binary Search is characteristic for the way proofs are
developed with VeriFun (which, as we suspect, does not differ in principle from
the way proofs are developed with other systems). User interactions are required
from time to time to recover from the weakness of the first-order theorem prover,
which, in particular, is responsible for proving the base and step cases of an induction.
Such a theorem prover needs to terminate, because the system frequently deals
with proof-obligations which are not valid but can be proved by induction only.
Consequently, a sound theorem prover must be incomplete. E.g., the proof of
(6) gets stuck with a proof-obligation which is true but invalid, and therefore a
lemma, viz. (7), must be formulated and proved to finish the proof of (6).
In addition, a reasonable compromise must be made between theorem-proving
performance and system efficiency (measured both in memory and computing
time), thus providing a further source of incompleteness. To yield an acceptable
answer time, VeriFun uses several kinds of resource restrictions. When searching
for a proof, the system’s first-order theorem prover considers verified lemmas
(and induction hypotheses). To test the applicability of such a lemma, the prover
calls itself recursively, however with additionally restricted resources so as not to
waste too much time on deciding whether a lemma supports the proof under
consideration. Consequently, the prover sometimes overlooks a useful lemma and then
must be told to use it. E.g. when proving (6), the system must be called to use
(7) albeit available, because it fails to prove “$j > i \rightarrow j > i + \lfloor (j-i)/2 \rfloor$” when
operating under additional restrictions, but easily proves this subgoal when
instructed to use (7), because then theorem proving proceeds with the resources
initially given to the system.

                     Sound  Complete  Complexity  Arithmetic  Termination    Σ
Proof-Obligations      3       11         4           13          11        42
Insert Lemma           2       10         3            1           4        20
Insert Function                 1                                            1
Proof-Tree Edits       1        3         3            3           4        14
  Case Analysis
  Use Lemma
  Unfold Procedure
  Apply Equation       −        −         −            −
  Induction            −        −         −            −
  Hypotheses           −        −                      2           4         6
none                   2        8         1           11           9        31
Table 1. User Interactions for Binary Search
Table 1 illustrates the efforts when verifying Binary Search with VeriFun.⁹
The row Proof-Obligations displays the overall number of statements proved in
this example, separated into the subtasks “Soundness”, “Completeness”,
“Complexity”, “Arithmetic” and “Termination”. E.g., 11 lemmas had to be proved
for verifying the completeness statement. The 13 lemmas about the arithmetic
functions (collected in Appendix 6.4), which were used for Binary Search, are
listed separately in the Arithmetic column. The whole case study uses 11
recursively defined procedures (including those for arithmetic, cf. Appendix 6.2) for
which the system also had to prove termination.
Below, the Insert Lemma row displays the number of lemmas we had to
create to support the proof of each of the main statements respectively, e.g. 10
for Completeness, and the Insert Function row shows the number of additional
definitions required for formulating these lemmas, e.g. 1 for Completeness. In
the Termination column, the number of termination functions which we had to
submit to the system is counted in addition. The Insert Lemma row gives an
account of the system’s inability to spot the “right” lemma by itself, which is
rather large for VeriFun as (except for the termination analysis) no methods
for lemma speculation (and generalization, in particular) are implemented in
system version 2.5.6, which was used here.
The Proof-Tree Edits row counts the number of proof-tree edits required
to guide the system to success even if the required lemmas are available,
subsequently separated into the different activities. E.g., 3 user interventions were
required for the completeness case.¹⁰ The Hypotheses rules had to be called 4 times
for Termination to provide the generalizations of the termination hypotheses for
find, and were called 2 times for Arithmetic, because the commutativity of
plus was proved by 2 nested inductions. Finally, the last row gives the number
of proof-obligations which went through the system with no user guidance at all.

⁹ Predefined and system-generated procedures and lemmas are not considered in this
table as they are given “for free”.
5 Concluding Remarks
The number of required user interactions, measured in terms of additional
definitions and proof-tree edits, gives a fair account of a system’s degree of
automation. Since approx. 1/3 proof-tree edit is required for each proof-obligation
in this case study, the first-order prover shows a good performance here. This
judgement is based on a further analysis of the proofs computed by the system,
cf. [1]. For instance, after being told to use (16) in the proof of (10), the system
continued (and succeeded) without any further intervention, computing a proof
of 127 steps using the 2 induction hypotheses and 20 different instances of 13
lemmas. We consider this degree of automation an important feature,
because it relieves the user from reasoning about which lemmas to consider and how
to apply them, which is not obvious for a large lemma set and a long proof. The
heuristic for choosing induction axioms and the implemented equality reasoning
also behave well here, as the system did not have to be told to use a certain induction
or to apply a certain equation.¹¹
Nevertheless, we intend to improve some of the values in Table 1: The need
for the insertion of a lemma can be decreased by extending the system with some
state-of-the-art technology for lemma speculation, e.g. [6],[7],[8],[14],[17],[18], and,
based on experience gained from further case studies, the need for proof-tree
edits can be decreased by a finer tuning of the theorem prover wrt. the
performance vs. efficiency tradeoff. The steady increase of hardware performance
may also improve system performance: VeriFun’s resource restrictions are controlled
by certain resource parameters. This makes it easy to benefit from an increase of
hardware performance, as only the resource parameters need to be adjusted when
moving to a more powerful platform.¹²
However, a problem which seems far more serious is the problem of concept
formation, i.e. the invention of new concepts which are required to formulate
a necessary lemma. E.g. in the Binary Search verification, some new concept,
viz. in.partition, is required to express the success of find, cf. (10). The
invention of new, usefully represented concepts requires much more creativity
than needed to formulate a lemma with the notions given, or to tell the prover
what to do next. We believe that this constitutes the main obstacle for a wider
use of theorem-proving based verification systems, and major improvements in
this situation will only come with some computer support also for this problem.

¹⁰ The Use Lemma row also counts the use of induction hypotheses. When
distinguishing proof-tree edits between Case Analysis, Use Lemma and Unfold Procedure, one
should bear in mind that very often the desired effect can be obtained by either of
the proof-rules. For instance, a case analysis may enable the system to recognize a
useful lemma or to unfold a procedure call, a required case analysis may implicitly
evolve from the use of a lemma, etc.
¹¹ VeriFun’s equality reasoning is based on conditional term rewriting, where the
orientation of the equations is computed by the system.
¹² Since the right tuning of the resource parameters requires a deep insight into the
system’s architecture and operation, these parameters are not at a user’s disposal,
but are fixed for each system version.
6 Appendix
6.1 Predefined Procedures and Lemmas
function >(x:nat, y:nat):bool <=
if x=0 then false else if y=0 then true else pred(x)>pred(y) fi fi
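$\forall x,y{:}\mathrm{nat}.\ x = y \rightarrow x \leq y$
$\forall x,y,z{:}\mathrm{nat}.\ x > y \wedge y > z \rightarrow x > z$
$\forall x,y{:}\mathrm{nat}.\ x > y \rightarrow x \neq 0$
$\forall x,y,z{:}\mathrm{nat}.\ x \leq y \wedge y \leq z \rightarrow x \leq z$
$\forall x,y{:}\mathrm{nat}.\ x > y \rightarrow y \leq x$
$\forall x,y{:}\mathrm{nat}.\ x \neq 0 \wedge y = 0 \rightarrow x > y$
[remaining lemmas of this list, most of which use the successor and predecessor notation of footnote 13, are illegible in the source]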
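The recursive definition of > can be tried out by transliterating it into Python. The following is a minimal sketch and not part of the VeriFun system: the names gt and check_order_lemmas are hypothetical, natural numbers are modelled by non-negative Python integers, and pred(x) is modelled by x - 1. The bounded test only samples two of the order properties listed above (transitivity of >, and that x > y implies x ≠ 0); it illustrates the statements and is no substitute for the inductive proofs.

```python
def gt(x: int, y: int) -> bool:
    """Transliteration of the procedure > on natural numbers."""
    if x == 0:
        return False
    if y == 0:
        return True
    # pred is only applied to non-zero arguments here.
    return gt(x - 1, y - 1)


def check_order_lemmas(bound: int) -> bool:
    """Test two order lemmas for all x, y, z below the given bound."""
    numbers = range(bound)
    for x in numbers:
        for y in numbers:
            # x > y implies x != 0
            if gt(x, y) and x == 0:
                return False
            for z in numbers:
                # x > y and y > z imply x > z
                if gt(x, y) and gt(y, z) and not gt(x, z):
                    return False
    return True


if __name__ == "__main__":
    print(check_order_lemmas(8))
```

Since gt recurses on both arguments at once, it terminates after at most min(x, y) + 1 calls, which mirrors the structural recursion of the original procedure.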
6.2 Auxiliary Procedures
function minus(x:nat, y:nat):nat <=
if x=0
then 0
else if y=0 then x else minus(pred(x),pred(y)) fi
fi
function half(x:nat):nat <=
if x=0
then 0
else if pred(x)=0 then 0 else succ(half(pred(pred(x)))) fi
fi
function log2(x:nat):nat <=
if x=0
then 0
else if pred(x)=0
then 0
else succ(log2(succ(half(pred(pred(x))))))
fi
fi
13 We write $1 \mathbin{\hat{+}}$ for the successor function and $\mathbin{\hat{-}}\, 1$ for the predecessor.
function plus(x:nat, y:nat):nat <=
if x=0 then y else succ(plus(pred(x),y)) fi
function member(n:nat, k:list):bool <=
if k=empty
then false
else if n=hd(k) then true else member(n,tl(k)) fi
fi
function length(k:list):nat <=
if k=empty then 0 else succ(length(tl(k))) fi
function ordered(k:list):bool <=
if k=empty
then true
else if tl(k)=empty
then true
else if hd(k)>hd(tl(k)) then false else ordered(tl(k)) fi
fi
fi
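To experiment with the auxiliary procedures outside the verifier, they can be transliterated into Python. This is a minimal sketch under stated assumptions: natural numbers are modelled by non-negative Python integers, lists of natural numbers by Python lists, hd(k) by k[0] and tl(k) by k[1:]. The function names simply repeat the procedure names; nothing here is taken from the VeriFun implementation.

```python
from typing import List


def minus(x: int, y: int) -> int:
    """Truncated subtraction: yields 0 whenever y is at least x."""
    if x == 0:
        return 0
    if y == 0:
        return x
    return minus(x - 1, y - 1)


def half(x: int) -> int:
    """Integer division by two, rounding down."""
    if x == 0 or x - 1 == 0:
        return 0
    return 1 + half(x - 2)


def log2(x: int) -> int:
    """Binary logarithm rounded down, with log2(0) = log2(1) = 0."""
    if x == 0 or x - 1 == 0:
        return 0
    return 1 + log2(1 + half(x - 2))


def plus(x: int, y: int) -> int:
    """Addition by recursion on the first argument."""
    return y if x == 0 else 1 + plus(x - 1, y)


def member(n: int, k: List[int]) -> bool:
    """True if n occurs in the list k."""
    if not k:
        return False
    if n == k[0]:
        return True
    return member(n, k[1:])


def length(k: List[int]) -> int:
    """Number of elements of the list k."""
    return 0 if not k else 1 + length(k[1:])


def ordered(k: List[int]) -> bool:
    """True if k is sorted in non-decreasing order."""
    if not k or not k[1:]:
        return True
    if k[0] > k[1]:
        return False
    return ordered(k[1:])


if __name__ == "__main__":
    print(half(7), log2(8), minus(3, 5), ordered([1, 2, 2, 5]))
```

Note that log2 recurses on succ(half(pred(pred(x)))), which equals half(x) for x ≥ 2, so its argument strictly decreases in each step.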
6.3 System-Generated Procedures and Lemmas
function minus$1(x:nat, y:nat):bool <=
if x=0
then false
else if y=0 then false else true fi
fi
function half$1(x:nat):bool <=
if x=0 then false else true fi
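$\forall x,y{:}\mathrm{nat}.\ x \leq (x - y) \rightarrow (x - y) = x$
$\forall x,y{:}\mathrm{nat}.\ \neg\,\mathrm{minus\$1}(x,y) \rightarrow (x - y) = x$
$\forall x,y{:}\mathrm{nat}.\ \mathrm{minus\$1}(x,y) \rightarrow x > (x - y)$
$\forall x,y{:}\mathrm{nat}.\ (x - y) \leq x$
$\forall x{:}\mathrm{nat}.\ x \leq \lfloor x/2 \rfloor \rightarrow \lfloor x/2 \rfloor = x$
$\forall x{:}\mathrm{nat}.\ \neg\,\mathrm{half\$1}(x) \rightarrow \lfloor x/2 \rfloor = x$
$\forall x{:}\mathrm{nat}.\ \mathrm{half\$1}(x) \rightarrow x > \lfloor x/2 \rfloor$
$\forall x{:}\mathrm{nat}.\ \lfloor x/2 \rfloor \leq x$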
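The role of the system-generated procedures can be illustrated with a bounded test. The sketch below assumes one particular reading of the lemmas above: minus$1(x,y) and half$1(x) hold exactly when the result of minus(x,y), respectively half(x), is strictly smaller than x, and otherwise the result equals x. The Python names minus_1, half_1, implies and check_difference_lemmas are hypothetical ($ is not allowed in Python identifiers), and the test covers only finitely many arguments, so it is an illustration rather than a proof.

```python
def minus(x: int, y: int) -> int:
    """Truncated subtraction, as in the auxiliary procedures."""
    if x == 0:
        return 0
    if y == 0:
        return x
    return minus(x - 1, y - 1)


def half(x: int) -> int:
    """Integer division by two, rounding down."""
    if x == 0 or x - 1 == 0:
        return 0
    return 1 + half(x - 2)


def minus_1(x: int, y: int) -> bool:
    """Transliteration of minus$1."""
    if x == 0:
        return False
    if y == 0:
        return False
    return True


def half_1(x: int) -> bool:
    """Transliteration of half$1."""
    return x != 0


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def check_difference_lemmas(bound: int) -> bool:
    """Test the assumed reading of the lemmas for all x, y below bound."""
    for x in range(bound):
        h = half(x)
        if not (h <= x
                and implies(x <= h, h == x)
                and implies(not half_1(x), h == x)
                and implies(half_1(x), x > h)):
            return False
        for y in range(bound):
            d = minus(x, y)
            if not (d <= x
                    and implies(x <= d, d == x)
                    and implies(not minus_1(x, y), d == x)
                    and implies(minus_1(x, y), x > d)):
                return False
    return True


if __name__ == "__main__":
    print(check_difference_lemmas(12))
```

Under this reading, a recursive call such as log2(succ(half(pred(pred(x))))) can be shown to decrease its argument by appealing to the half$1 lemma instead of unfolding half again.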
6.4 Arithmetic Lemmas
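$\forall x_1,x_2{:}\mathrm{nat}.\ x_1 = x_2 \rightarrow x_1 - x_2 = 0$
$\forall x,y,z{:}\mathrm{nat}.\ x > y \rightarrow x > y - z$
$\forall x,y,z{:}\mathrm{nat}.\ (x + y) + z = x + (y + z)$
$\forall x,y{:}\mathrm{nat}.\ x + y = y + x$
$\forall x,y{:}\mathrm{nat}.\ x \leq y \rightarrow \lfloor \log_2(x) \rfloor \leq \lfloor \log_2(y) \rfloor$
$\forall x,y{:}\mathrm{nat}.\ x \leq y \rightarrow \lfloor x/2 \rfloor \leq \lfloor y/2 \rfloor$
$\forall x,y{:}\mathrm{nat}.\ x \leq y \rightarrow x - y = 0$
[remaining lemmas of this list, which use the successor and predecessor notation of footnote 13, are illegible in the source]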
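Those arithmetic lemmas that can be read off the list above unambiguously can be sampled on small numbers. The following sketch is hypothetical throughout: it transliterates plus, minus, half and log2 into Python over non-negative integers and checks associativity and commutativity of plus, monotonicity of half and log2, and three properties of minus. The name check_arithmetic_lemmas is an assumption of this sketch, and a bounded test of this kind gives no guarantee beyond the tested range.

```python
def plus(x: int, y: int) -> int:
    """Addition by recursion on the first argument."""
    return y if x == 0 else 1 + plus(x - 1, y)


def minus(x: int, y: int) -> int:
    """Truncated subtraction."""
    if x == 0:
        return 0
    if y == 0:
        return x
    return minus(x - 1, y - 1)


def half(x: int) -> int:
    """Integer division by two, rounding down."""
    if x == 0 or x - 1 == 0:
        return 0
    return 1 + half(x - 2)


def log2(x: int) -> int:
    """Binary logarithm rounded down, with log2(0) = log2(1) = 0."""
    if x == 0 or x - 1 == 0:
        return 0
    return 1 + log2(1 + half(x - 2))


def check_arithmetic_lemmas(bound: int) -> bool:
    """Test the legible arithmetic lemmas for all x, y, z below bound."""
    numbers = range(bound)
    for x in numbers:
        for y in numbers:
            if plus(x, y) != plus(y, x):              # commutativity
                return False
            if x == y and minus(x, y) != 0:           # x = y implies x - y = 0
                return False
            if x <= y:
                if minus(x, y) != 0:                  # x <= y implies x - y = 0
                    return False
                if half(x) > half(y):                 # half is monotone
                    return False
                if log2(x) > log2(y):                 # log2 is monotone
                    return False
            for z in numbers:
                if plus(plus(x, y), z) != plus(x, plus(y, z)):  # associativity
                    return False
                if x > y and not x > minus(y, z):     # x > y implies x > y - z
                    return False
    return True


if __name__ == "__main__":
    print(check_arithmetic_lemmas(8))
```

A failing check would point to a misreading of a lemma or a transliteration error; a passing one only shows consistency on the sampled arguments.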
References
2. R. S. Boyer and J. S. Moore. A Computational Logic. Academic Press, New York,
3. M. J. C. Gordon and T. F. Melham. Introduction to HOL: A Theorem Proving
Environment for Higher-Order Logic. Cambridge University Press, Cambridge,
4. J. Gosling, B. Joy, and G. L. Steele. The Java Language Specification.
Addison-Wesley, Reading, Massachusetts, 1996.
5. D. Hutter, B. Langenstein, C. Sengler, J. Siekmann, W. Stephan, and A. Wolpers.
Deduction in the Verification Support Environment. In M.-C. Gaudel and J. Woodcock,
editors, Intern. Symp. of Formal Methods Europe (FME), volume 1051 of
Lecture Notes in Artificial Intelligence, New York, 1996. Springer-Verlag.
6. A. Ireland and A. Bundy. Extensions to a Generalization Critic for Inductive
Proofs. In M. McRobbie and J. Slaney, editors, Proc. of the 13th Inter. Conf.
on Automated Deduction (CADE-96), volume 1104 of Lecture Notes in Artificial
Intelligence, pages 47–61, New Brunswick, 1996. Springer-Verlag.
7. A. Ireland and A. Bundy. Productive use of failure in inductive proof. J. Automat.
Reason., 16(1–2), 1996.
8. D. Kapur and M. Subramaniam. Lemma discovery in automating induction. In
M. McRobbie and J. Slaney, editors, Proc. 13th Intern. Conf. on Automated
Deduction (CADE-96), volume 1104 of Lecture Notes in Artificial Intelligence, pages
538–552, New Brunswick, 1996. Springer-Verlag.
9. M. Kaufmann and J. S. Moore. ACL2: An Industrial Strength Version of NQTHM.
In Compass'96: 11th Annual Conf. on Computer Assurance, Gaithersburg,
Maryland, 1996. National Institute of Standards and Technology.
10. J. H. Kingston. Algorithms and Data Structures: Design, Correctness, Analysis.
Addison-Wesley, Reading, Massachusetts, 1990.
11. S. Owre, J. Rushby, and N. Shankar. PVS: A prototype verification system. In Proc.
11th Intern. Conf. on Automated Deduction (CADE-92), volume 607 of Lecture
Notes in Artificial Intelligence, New York, 1992. Springer-Verlag.
12. L. C. Paulson. Isabelle: a generic theorem prover, volume 828 of Lecture Notes in
Computer Science. Springer-Verlag, New York, 1994.
13. W. Reif. The KIV Approach to Software Verification. In M. Broy and S. Jähnichen,
editors, KORSO: Methods, Languages and Tools for the Construction of Correct
Software, volume 1009 of Lecture Notes in Computer Science. Springer-Verlag,
14. T. Walsh. A Divergence Critic for Inductive Proof. J. Artificial Intelligence
Research, 4:209–235, 1996.
15. C. Walther. Mathematical Induction. In D. Gabbay, C. Hogger, and J. Robinson,
editors, Handbook of Logic in Artificial Intelligence and Logic Programming,
volume 2, pages 127–227. Oxford University Press, Oxford, 1994.
16. C. Walther. On proving the termination of algorithms by machine. Artificial
Intelligence, 71(1):101–157, 1994.
17. C. Walther and T. Kolbe. On Terminating Lemma Speculations. Information and
Computation, 162:96–116, 2000.
18. C. Walther and T. Kolbe. Proving theorems by reuse. Artificial Intelligence,
116:17–66, 2000.
19. C. Walther and S. Schweitzer. VeriFun User Guide. Technical Report VFR 02/01,
Programmiermethodik, Technische Universität Darmstadt, 2002.