A Verification of Binary Search

Christoph Walther and Stephan Schweitzer

Fachgebiet Programmiermethodik

Technische Universität Darmstadt

Abstract. We demonstrate the use of the XeriFun system with a veri-

ﬁcation of the Binary Search method. We present the challenges encoun-

tered when working on this problem, illustrate the operation and perfor-

mance of the system and ﬁnally discuss technical improvements as well

as subjects of further research, which are not speciﬁc to our approach.

1 Introduction

We develop the XeriFun system [1],[19], a semi-automated system for the veri-

ﬁcation of statements about programs written in a functional programming lan-

guage. The motivation for this development is twofold: Since we are interested in

methods for automating reasoning tasks which usually require the creativity of

a human expert, we felt the need for having an experimental base of easy access

which we can use to evaluate new ideas of our own and also proposals known

from the literature. Such an experimental base is needed, because the value of

a method ultimately can be determined only after an integration into a running

system, as this is the only way to uncover

—the competing interplay with other methods being also implemented,

—the bits and pieces necessary to make a method really work, and

—the implicit assumptions of a method which may conﬂict with the settings

of a real application.

The second reason for the development of XeriFun originates from our experiences when teaching Formal Methods, Automated Reasoning, Semantics, Verification and similar subjects. The motivation of the students increases considerably if they can gather practical experience with the principles and methods taught. Students of Computer Science expect to see the computer solve some problem instead of working on their own on small problems using pencil and paper only, thus treating the whole subject as a purely theoretical exercise. Of course, powerful systems exist and may be used, e.g. NQTHM [2], ACL2 [9], PVS [11], Isabelle [12], VSE [5], KIV [13], HOL [3], to mention only a few, which go far beyond a small system like XeriFun in their abilities and in the verification and reasoning problems they can handle. However, the performance of these systems also comes at the price of highly elaborate logics, complicated user interfaces, severe installation requirements, license fees (for some of them) etc., which complicates their use for teaching mainly principles within the restricted time frame of a course (if it does not make it impossible altogether). The situation greatly improves, however, when having a small, highly portable system with an elaborate user interface and a simple base logic that is easy to grasp, which nevertheless allows the students to perform verification case studies for problems already known from other courses, e.g. Sorting, Searching, Basic Number Theory, Propositional Logic, Protocols, Matching, Unification etc. Therefore XeriFun has been developed as a JAVA [4] application, which the students can run after a 1 MB download on their home computer (whatever platform it may use) to work with the system whenever they like to do so.

{chr.walther,schweitz}@informatik.tu-darmstadt.de

Technical Report VFR 02/02

In this paper, we demonstrate the use of XeriFun with a verification of the Binary Search method.¹ We illustrate the challenges encountered when working on this problem and, based on this analysis, discuss technical improvements to our system and also subjects of further research which (as we feel) are not specific to our approach.

2 Binary Search

The Binary Search method is a well-known introductory example in courses about Algorithms and Data Structures [10]. Searching an array of n so-called keys for a certain key requires O(n) steps in the worst case, but needs only O(log n) steps if the array is ordered. This is because, by the ordering property, the domain of search can be halved after each array lookup, which accounts for the logarithmic complexity of the method. Fig. 1 displays a version of Binary Search, where |a| denotes the size of an array a of natural numbers with indices ranging from 0 to |a|−1.

The correctness of this algorithm is not that obvious because of the puzzling

index calculations which provide a plentiful source of programming errors. In

order to specify the correctness of the algorithm, additional notions are needed:

If we define "key ∈ a :⇔ ∃i ∈ {0, ..., |a|−1}. a[i] = key" and "ordered(a) :⇔ ∀i, j ∈ {0, ..., |a|−1}. i ≤ j → a[i] ≤ a[j]", then the correctness requirements for Binary Search can be stated formally as

    ∀key:nat, a:array. BINSEARCH(key, a) → key ∈ a          (1)

and

    ∀key:nat, a:array. ordered(a) ∧ key ∈ a → BINSEARCH(key, a)          (2)

where statement (1) formulates the soundness requirement and statement (2) is the completeness requirement of the algorithm. In addition, statement

    ∀key:nat, a:array. BINSEARCH.steps(key, a) ≤ ⌊log₂ |a|⌋          (3)

asserts that the algorithm causes only logarithmic costs (measured in terms of the array size), where BINSEARCH.steps(key, a) counts the number of executions of the while-loop body in Fig. 1.

¹ Roughly speaking, this is a "medium size" problem, more complicated than, e.g., the verification of quicksort or mergesort, but with less effort than needed for e.g. proving the unique prime factorization theorem, the verification of heapsort or of a first-order matching algorithm.

    procedure BINSEARCH(key:nat, a:array):bool <=
      var i := 0;
      var j := |a|-1;
      while j > i and a[i+⌊(j−i)/2⌋] ≠ key do
        if a[i+⌊(j−i)/2⌋] > key
          then j := i+⌊(j−i)/2⌋-1
          else i := i+⌊(j−i)/2⌋+1
        fi
      done ;
      if j > i then return (true) fi ;
      if j = i
        then return (a[i]=key)
        else return (false)
      fi ;
    end

Fig. 1. The Binary Search Algorithm
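The imperative algorithm of Fig. 1 can be transcribed almost literally and tested exhaustively on small inputs. The following Python sketch is an illustration only and not part of the XeriFun development: the names bin_search, floor_log2 and check are ours, subtraction is truncated at 0 to mimic natural numbers, and the treatment of the empty array is an assumption, since Fig. 1 does not say what a[0] means for |a| = 0.

```python
from itertools import product


def bin_search(key, a):
    """Transcription of Fig. 1; returns (found, steps).

    steps counts the executions of the while-loop body, i.e. the
    quantity called BINSEARCH.steps in statement (3).
    """
    if not a:                      # assumption: the empty array contains no key
        return False, 0
    i, j, steps = 0, len(a) - 1, 0
    while j > i and a[i + (j - i) // 2] != key:
        mid = i + (j - i) // 2
        if a[mid] > key:
            j = max(mid - 1, 0)    # natural-number subtraction: 0 - 1 = 0
        else:
            i = mid + 1
        steps += 1
    if j > i:
        return True, steps         # loop was left because a[mid] = key
    if j == i:
        return a[i] == key, steps
    return False, steps


def floor_log2(n):
    """Binary logarithm truncated downwards, for n >= 1."""
    return n.bit_length() - 1


def check(max_len=5, max_val=4):
    """Test statements (1), (2) and (3) on all small non-empty arrays."""
    for n in range(1, max_len + 1):
        for a in map(list, product(range(max_val), repeat=n)):
            for key in range(max_val + 1):
                found, steps = bin_search(key, a)
                assert not found or key in a              # (1) soundness
                if a == sorted(a) and key in a:
                    assert found                          # (2) completeness
                assert steps <= floor_log2(len(a))        # (3) complexity
    return True
```

Such a test only inspects finitely many arrays and therefore is no substitute for the proofs developed below, but it is a cheap way to catch errors in the index calculations before a proof is attempted.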

We aim to verify statements (1), (2) and (3) using the XeriFun system: Data structures are defined in XeriFun in a constructor-selector style discipline. The data structures structure bool <= false, true and structure nat <= 0, succ(pred:nat) (representing natural numbers) are predefined in the system, and we use linear lists over natural numbers, defined by structure list <= empty, add(hd:nat, tl:list), to represent arrays.

Procedures and while-loops are given by (recursively defined) functional procedures. Here we use the procedure

    function element(a:list, i:nat):nat <=
      if a=empty
        then 0
        else if i=0 then hd(a) else element(tl(a),pred(i)) fi
      fi

to compute the value a[i] of a (non-empty) array a at index i, and we define the tail-recursive procedure

    function find(key:nat, a:list, i:nat, j:nat):bool <=
      if length(a)>j
        then if j>i
          then if element(a,plus(i,half(minus(j,i))))>key
            then find(key,a,i,pred(plus(i,half(minus(j,i)))))
            else if key>element(a,plus(i,half(minus(j,i))))
              then find(key,a,succ(plus(i,half(minus(j,i)))),j)
              else true
            fi
          fi
          else if j=i then key=element(a,i) else false fi
        fi
        else false
      fi

to represent the body of the while-loop in Fig. 1. Finally, the functional version of the Binary Search algorithm is given as

    function binsearch(key:nat, a:list):bool <=
      find(key,a,0,pred(length(a))) ,

where the definitions of the predefined procedure > and the user defined procedures plus, minus, half, length, member and ordered, which are required to define the procedure find or to formulate the correctness statements (1) and (2), are given in Appendix 6.1 or 6.2 respectively.
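For readers who want to experiment with the functional version, the procedures element, find and binsearch can be mirrored in Python. This is a sketch under stated assumptions, not XeriFun code: Python lists stand in for linear lists, and the helpers minus, pred and half are our guesses at the definitions of Appendix 6.2 (truncated subtraction, pred(0) = 0, integer division by 2).

```python
from itertools import product


def element(a, i):
    """a[i] for a list a; 0 if the index is out of range, as in the paper."""
    if not a:
        return 0
    return a[0] if i == 0 else element(a[1:], i - 1)


def minus(x, y):
    """Natural-number subtraction, truncated at 0 (assumed definition)."""
    return max(x - y, 0)


def pred(x):
    """Predecessor on natural numbers; pred(0) = 0 is an assumption."""
    return max(x - 1, 0)


def half(x):
    """Integer division by 2 (assumed definition)."""
    return x // 2


def find(key, a, i, j):
    """Mirror of the tail-recursive procedure find."""
    if not len(a) > j:
        return False
    if j > i:
        mid = i + half(minus(j, i))
        if element(a, mid) > key:
            return find(key, a, i, pred(mid))
        if key > element(a, mid):
            return find(key, a, mid + 1, j)
        return True
    return key == element(a, i) if j == i else False


def binsearch(key, a):
    """Mirror of binsearch: search the whole list."""
    return find(key, a, 0, pred(len(a)))


def agrees_with_membership(max_len=4, max_val=4):
    """On ordered lists, binsearch should coincide with membership."""
    return all(
        binsearch(key, list(a)) == (key in a)
        for n in range(max_len + 1)
        for a in product(range(max_val), repeat=n)
        if list(a) == sorted(a)
        for key in range(max_val + 1)
    )
```

Note that the guard length(a)>j makes find return false for the empty list, so no separate case for |a| = 0 is needed here.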

3 A Proof Sketch for binsearch

We brieﬂy sketch the outline of the correctness and complexity proofs before we

discuss their development with our system in Section 4.

For proving the correctness of the index calculations and also for proving the complexity statement, some properties of the >-relation must be known and some arithmetic groundwork has to be laid. The ordering lemmas and the arithmetic lemmas used in the whole verification are listed in Appendices 6.1 and 6.4.

3.1 Soundness

The soundness statement

    lemma binsearch is sound <= all key:nat, a:list
      if(binsearch(key,a),member(key,a),true)          (4)

for binsearch,² cf. (1), immediately raises the proof-obligation

    all a:list, key:nat
      if(find(key,a,0,pred(length(a))),member(key,a),true)          (5)

which originates from (4) simply by the execution of the non-recursively defined procedure binsearch. This proof-goal has a straightforward generalization [15], viz.

    lemma find is sound <= all i,j,key:nat, a:list
      if(find(key,a,i,j),member(key,a),true) ,          (6)

which then immediately proves the soundness statement (4) by first-order inferences only. For proving (6), some of the ordering and arithmetic lemmas and a further auxiliary lemma

    lemma element entails member <= all i,n:nat, a:list
      if(n=element(a,i),if(length(a)>i,member(n,a),true),true)          (7)

relating element with member are required.

² The ternary conditional if is the only connective for writing formulas in XeriFun. Hence if(a, b, true) is written for a → b, if(a, true, b) is written for a ∨ b, if(a, b, false) is written for a ∧ b etc.
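The encoding of the usual connectives by the ternary conditional, and the role of the guard length(a)>i in lemma (7), can be made concrete with a few lines of Python. The sketch below is ours and purely illustrative: ite models the conditional if on truth values, member is assumed to be plain list membership, and element is written with an index test instead of recursion.

```python
from itertools import product


def ite(a, b, c):
    """The ternary conditional if(a, b, c) on truth values."""
    return b if a else c


def element(a, i):
    """a[i], or 0 if i is not a legal index (as in the procedure element)."""
    return a[i] if i < len(a) else 0


def lemma_7(n, a, i):
    """if(n=element(a,i), if(length(a)>i, member(n,a), true), true)"""
    return ite(n == element(a, i), ite(len(a) > i, n in a, True), True)


def connectives_ok():
    """if(a,b,true) is a -> b, if(a,true,b) is a or b, if(a,b,false) is a and b."""
    return all(
        ite(a, b, True) == ((not a) or b)
        and ite(a, True, b) == (a or b)
        and ite(a, b, False) == (a and b)
        for a, b in product((False, True), repeat=2)
    )


def lemma_7_ok(max_len=3, max_val=3):
    """Test lemma (7) on all lists up to the given length and value bounds."""
    return all(
        lemma_7(n, list(a), i)
        for size in range(max_len + 1)
        for a in product(range(max_val), repeat=size)
        for n in range(max_val)
        for i in range(max_len + 2)
    )
```

The guard matters because element returns 0 for an illegal index: for a = [1] and i = 5 we get element(a,i) = 0 although 0 is not a member of a, so the lemma would be false without the hypothesis length(a)>i.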

3.2 Completeness

As in the soundness case, the completeness statement

    lemma binsearch is complete <= all a:list, key:nat
      if(ordered(a),if(member(key,a),binsearch(key,a),true),true)          (8)

for binsearch, cf. (2), immediately raises the proof-obligation

    all a:list, key:nat
      if(ordered(a),
         if(member(key,a),find(key,a,0,pred(length(a))),true),
         true)          (9)

which originates from (8) simply by the execution of the non-recursively defined procedure binsearch. However, this proof-goal lacks a straightforward generalization. The difficulty in proving (9) is typical for a statement involving a tail-recursive procedure stemming from the body of a loop: The variables which vary in a procedure (or in the loop-body respectively), viz. i and j in the present case, are replaced in the statement with non-variable terms representing the initialization of the loop-variables, viz. 0 and pred(length(a)) here, thus preventing an induction proof upon these variables according to the recursion structure of the tail-recursive procedure, viz. find.

Often, a remedy to such a verification problem is to formulate an auxiliary lemma based on a (useful) loop-invariant: Let a[i..j] denote the partition of a wrt. the array-indices i ≤ j, i.e. a[i..j] denotes the sub-array of a with bounds i and j. Then "key ∈ a ↔ key ∈ a[i..j]" is an invariant of the while-loop of Fig. 1 leading to the auxiliary lemma

    lemma in.partition entails find <= all key,i,j:nat, a:list
      if(length(a)>j,
         if(ordered(a),
            if(in.partition(key,a,i,j),find(key,a,i,j),true),
            true),
         true)          (10)

where in.partition is defined by

    function in.partition(key:nat, a:list, i:nat, j:nat):bool <=
      if a=empty
        then false
        else if i=0
          then if key=hd(a)
            then true
            else if j=0
              then false
              else in.partition(key,tl(a),i,pred(j))
            fi
          fi
          else if j=0
            then false
            else if pred(i)>pred(j)
              then false
              else in.partition(key,tl(a),pred(i),pred(j))
            fi
          fi
        fi
      fi
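To see what in.partition computes, it helps to mirror it in Python and compare it with a slice. The following is a sketch for illustration, with names of our choosing; it assumes that Python lists represent linear lists and that key ∈ a[i..j] is meant with both bounds inclusive.

```python
from itertools import product


def in_partition(key, a, i, j):
    """Mirror of in.partition: does key occur in a at some index in i..j?"""
    if not a:
        return False
    if i == 0:
        if key == a[0]:
            return True
        if j == 0:
            return False
        return in_partition(key, a[1:], i, j - 1)
    if j == 0:
        return False
    if i - 1 > j - 1:              # pred(i) > pred(j), with i and j non-zero
        return False
    return in_partition(key, a[1:], i - 1, j - 1)


def matches_slice(max_len=4, max_val=3):
    """Compare in_partition with membership in the slice a[i:j+1]."""
    return all(
        in_partition(key, list(a), i, j) == (key in a[i:j + 1])
        for n in range(max_len + 1)
        for a in product(range(max_val), repeat=n)
        for key in range(max_val)
        for i in range(max_len + 2)
        for j in range(max_len + 2)
    )
```

The comparison also covers the corner cases i > j and j ≥ |a|, where both sides simply ignore the indices that do not exist.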

Using lemma (10) and a completeness statement about in.partition, viz.

    lemma member entails in.partition <= all key:nat, a:list
      if(member(key,a),
         in.partition(key,a,0,pred(length(a))),
         true)          (11)

the completeness statement (8) about binsearch can be proved by first-order reasoning only. However, the proof of statement (10) requires a number of further lemmas, viz.

    ∀key, i, j:nat, a:list. j ≤ i ∧ key ∈ a[i..j] → i = j          (12)

    ∀key, i, j:nat, a:list. j ≤ i ∧ key ∈ a[i..j] → a[i] = key          (13)

    ∀key, i, j:nat, a:list. key ∈ a[i..j] → key = a[i] ∨ key ∈ a[i+1..j]          (14)

    ∀key, i, j:nat, a:list. key ∈ a[i..j] → key ∈ a[i..j−1] ∨ key = a[j]          (15)

    ∀key, i, j, h:nat, a:list.
      i ≤ h ≤ j ∧ key ∈ a[i..j] → key ∈ a[i..h] ∨ key ∈ a[h..j]          (16)

    ∀key, i, j, h:nat, a:list. ordered(a) ∧ key ∈ a[i..j] ∧ a[h] > key → h > i          (17)

    ∀key, h:nat, a:list. ordered(a) ∧ key > a[h] ∧ |a| > h → key > hd(a)          (18)

    ∀key, i, j, h:nat, a:list.
      ordered(a) ∧ key ∈ a[i..j] ∧ key > a[h] ∧ |a| > h → j > h          (19)

and also some of the ordering and arithmetic lemmas.³
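Before investing in the proofs, lemmas of this kind can be tested on small instances. The Python sketch below does this for (12) to (19). It rests on assumptions that are ours: key ∈ a[i..j] is modelled by membership in the slice a[i:j+1], a[h] is element(a,h) (0 for an illegal index), j−1 is truncated at 0, and ordered(a) means that a is sorted in non-decreasing order, as in Section 2.

```python
from itertools import product


def elem(a, h):
    """a[h], or 0 for an illegal index (the convention of element)."""
    return a[h] if h < len(a) else 0


def inp(key, a, i, j):
    """key ∈ a[i..j], modelled by a slice with inclusive upper bound."""
    return key in a[i:j + 1]


def implies(p, q):
    return (not p) or q


def lemmas_hold(key, a, i, j, h):
    """Evaluate lemmas (12) to (19) for one choice of key, a, i, j, h."""
    in_ij = inp(key, a, i, j)
    ordered = a == sorted(a)
    return all([
        implies(j <= i and in_ij, i == j),                                   # (12)
        implies(j <= i and in_ij, elem(a, i) == key),                        # (13)
        implies(in_ij, key == elem(a, i) or inp(key, a, i + 1, j)),          # (14)
        implies(in_ij, inp(key, a, i, max(j - 1, 0)) or key == elem(a, j)),  # (15)
        implies(i <= h <= j and in_ij,
                inp(key, a, i, h) or inp(key, a, h, j)),                     # (16)
        implies(ordered and in_ij and elem(a, h) > key, h > i),              # (17)
        implies(ordered and key > elem(a, h) and len(a) > h,
                key > elem(a, 0)),                                           # (18)
        implies(ordered and in_ij and key > elem(a, h) and len(a) > h,
                j > h),                                                      # (19)
    ])


def check(max_len=3, max_val=3):
    """Test all lemmas on all lists and indices within the given bounds."""
    return all(
        lemmas_hold(key, list(a), i, j, h)
        for n in range(max_len + 1)
        for a in product(range(max_val), repeat=n)
        for key in range(max_val + 1)
        for i, j, h in product(range(max_len + 2), repeat=3)
    )
```

A failing instance found this way would point to a wrongly stated lemma long before a proof attempt gets stuck on it.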

³ It is interesting to see that the lemmas (12) - (15) refer to the bounds of the array, known as a source of frequent programming errors. Lemma (16) states that each legal index h separates a partition into a pair of covering sub-partitions, and lemmas (17) - (19) relate the ordering of array elements with the ordering of the array indices.

3.3 Complexity

For measuring the costs of binsearch, the costs of find have to be determined first. To this effect, the number of recursive calls in find (corresponding to the number of executions of the while-loop body in Fig. 1) is counted by the procedure find.steps, and then the costs of binsearch are defined by the procedure binsearch.steps:

    function find.steps(key:nat, a:list, i:nat, j:nat):nat <=
      if length(a)>j
        then if j>i
          then if element(a,plus(i,half(minus(j,i))))>key
            then succ(find.steps(key,a,i,pred(plus(i,half(minus(j,i))))))
            else if key>element(a,plus(i,half(minus(j,i))))
              then succ(find.steps(key,a,succ(plus(i,half(minus(j,i)))),j))
              else 0
            fi
          fi
          else 0
        fi
        else 0
      fi

    function binsearch.steps(key:nat, a:list):nat <=
      find.steps(key,a,0,pred(length(a)))

Using binsearch.steps, the complexity statement now is formulated as

    lemma binsearch log bounded <= all a:list, key:nat
      if(binsearch.steps(key,a)>log2(length(a)),false,true)          (20)

where the definition of log2 (computing the binary logarithm truncated downwards) is given in Appendix 6.2.⁴ Here too, the system replaces the call of binsearch.steps, raising after some simplification steps the proof-obligation

    all a:list, key:nat
      if(find.steps(key,a,0,length(tl(a)))>log2(succ(length(tl(a)))),
         false,
         true) .          (21)

This subgoal now is straightforwardly generalized to a corresponding lemma about find.steps, viz.

    lemma find.steps log bounded <= all i,j,key:nat, a:list
      if(find.steps(key,a,i,j)>log2(succ(minus(j,i))),
         false,
         true) ,          (22)

which then proves statement (20) by first-order reasoning only. However, the proof of statement (22) requires induction and two additional lemmas, viz.

    lemma complexity lemma#1 <= all i,j:nat
      if(log2(succ(minus(pred(plus(i,half(minus(j,i)))),i)))
           >pred(log2(succ(minus(j,i)))),
         false,
         true)          (23)

and

    lemma complexity lemma#2 <= all i,j:nat
      if(log2(succ(minus(pred(j),plus(i,half(minus(j,i))))))
           >pred(log2(succ(minus(j,i)))),
         false,
         true)          (24)

and also some of the ordering and arithmetic lemmas.
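The step counter and the bounds (20), (22), (23) and (24) can again be tried out on small instances. The following Python sketch is ours and rests on assumptions about Appendix 6.2 that are not confirmed by the text shown here: minus is truncated subtraction, pred(0) = 0, half is integer division by 2, and log2 is the binary logarithm truncated downwards with log2(0) = 0.

```python
from itertools import product


def pred(x):
    return max(x - 1, 0)           # assumption: pred(0) = 0


def minus(x, y):
    return max(x - y, 0)           # assumption: truncated subtraction


def half(x):
    return x // 2


def log2(x):
    """Binary logarithm truncated downwards; log2(0) = 0 is an assumption."""
    return max(x.bit_length() - 1, 0)


def element(a, i):
    return a[i] if i < len(a) else 0


def find_steps(key, a, i, j):
    """Mirror of find.steps: number of recursive calls of find."""
    if len(a) > j and j > i:
        mid = i + half(minus(j, i))
        if element(a, mid) > key:
            return 1 + find_steps(key, a, i, pred(mid))
        if key > element(a, mid):
            return 1 + find_steps(key, a, mid + 1, j)
    return 0


def binsearch_steps(key, a):
    return find_steps(key, a, 0, pred(len(a)))


def lemma_23(i, j):
    mid = i + half(minus(j, i))
    return not log2(minus(pred(mid), i) + 1) > pred(log2(minus(j, i) + 1))


def lemma_24(i, j):
    mid = i + half(minus(j, i))
    return not log2(minus(pred(j), mid) + 1) > pred(log2(minus(j, i) + 1))


def check(max_len=4, max_val=3, max_index=40):
    """Test (20), (22), (23) and (24) within the given bounds."""
    arith = all(lemma_23(i, j) and lemma_24(i, j)
                for i, j in product(range(max_index), repeat=2))
    steps = all(
        find_steps(key, list(a), i, j) <= log2(minus(j, i) + 1)      # (22)
        and binsearch_steps(key, list(a)) <= log2(len(a))            # (20)
        for n in range(max_len + 1)
        for a in product(range(max_val), repeat=n)
        for key in range(max_val + 1)
        for i, j in product(range(max_len + 1), repeat=2)
    )
    return arith and steps
```

Lemmas (23) and (24) are exactly what the step case of (22) needs: they bound the logarithm of the size of the left and of the right sub-interval by the logarithm of the size of the current interval minus one.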

4 Verifying binsearch with XeriFun

We now report on the eﬀorts required to guide XeriFun to the veriﬁcation

of Binary Search. First, we brieﬂy illustrate how to use the system before we

consider the actual case. We then analyze the system’s behaviour and ﬁnally

discuss problems which either are speciﬁc to our system or constitute a general

challenge for computer supported veriﬁcation.

⁴ We use if(x>y, false, true), i.e. x ≯ y, instead of the more readable x ≤ y because > and several lemmas about > (like transitivity, irreflexivity etc.) are predefined in XeriFun, cf. Appendix 6.1, thus saving the additional definition of ≤ and the formulation and proving of lemmas about ≤.

4.1 About XeriFun

XeriFun is a semi-automated system for the veriﬁcation of statements about

programs written in a functional programming language, cf. [1],[19]. In a typical

session with the system, a user

—defines a (functional) program by stipulating the data structures and the procedures of the program using XeriFun's language editor,

—defines statements about the data structures and procedures of the program using XeriFun's language editor,

—verifies these statements and the termination of the procedures using XeriFun's proof editor.

XeriFun consists of several fully-automated routines for theorem proving

and for the formation of hypotheses to support veriﬁcation. It is designed as

an interactive system, where, however, the automated routines substitute the

human expert in striving for a proof until they fail. In such a case, the user may

step in to guide the system for a continuation of the proof.

When called to prove a statement, the system computes a proof-tree. An interaction, which may be required when the construction of the proof-tree gets stuck, is to instruct the system to prune some unwanted branches of the proof-tree (if necessary), and then

—to perform a case analysis,

—to use an instance of a lemma or an induction hypothesis,

—to unfold a procedure call,

—to apply an equation,

—to use an induction axiom, or

—to insert, to move or to delete some hypotheses.⁵

In addition, it may be necessary to formulate (and to prove) an auxiliary

lemma (sometimes after providing a new deﬁnition) in order to continue with

the actual proof task.

For proving the termination of a recursively defined procedure, it may be required to tell the system a useful termination function. Using this hint, the system computes termination hypotheses for the procedure which then must be verified like any other given statement. In a user guided termination proof, the termination hypotheses are based on the predefined procedure > (which is "assumed" to compute a well-founded relation). In order to ease such termination proofs, the system holds a set of predefined lemmas about >, cf. Appendix 6.1. It is therefore useful (albeit not required) not to use other "usual" orderings on natural numbers for a verification problem, e.g. < or ≥, thus saving the formulation and proving of lemmas about these orderings. This also explains why our formulation of Binary Search is based on >.

⁵ In XeriFun, the nodes of a proof-tree consist of sequents, and hypotheses can be inserted into the antecedent of a sequent, can be deleted from the antecedent or moved to the succedent. The insertion of a hypothesis implements a case analysis, and the deletion of a hypothesis corresponds to a generalization step, cf. [15]. The move of a hypothesis preserves equivalence and is sometimes needed (for technical reasons) to enable a subsequent induction.

Having proved the termination of a functional procedure, the system may generate additional (terminating) procedures and (verified) lemmas about them. These system-generated procedures and lemmas are used by XeriFun's automated termination analysis [16], but are also useful for proofs not related to termination. Appendix 6.3 lists the system-generated procedures and lemmas which were actually used in the verification of Binary Search.

For proving the correctness and complexity statements for Binary Search, some lemmas about the arithmetic functions are obviously also needed. In particular, monotonicity and estimation statements are required to justify the soundness of the index calculations and to prove the loop-invariant. Here we used XeriFun's Import feature, which allows lemmas to be imported together with their proofs from a file. While working with the system on several verification problems, we set up certain libraries, e.g. for Arithmetic and for Linear Lists, which we use when we start to work on a new problem. Usually we begin the work with an import from the libraries of all lemmas stating properties about the procedures in our verification problem. This guarantees that all proven lemmas are available when needed, where XeriFun's lemma-filter takes care that system performance is not spoiled by a vast amount of irrelevant lemmas. A library is updated with those lemmas (not too specific to a certain verification problem) which were needed in the actual verification but are not yet members of a library. For the verification of Binary Search, we started with an import from our Arithmetic library of all lemmas about plus, minus, half and log2. Appendix 6.4 lists the arithmetic lemmas which were actually needed when developing the proofs for Binary Search.

4.2 Termination

XeriFun demands that the termination of each procedure which is called in a statement is verified before a proof of the statement can be started. Therefore, the system's automated termination analysis [16] is activated immediately after the definition of a recursively defined procedure. In particular, the termination analysis recognizes termination based on (nested) structural recursion, as e.g. for minus and half, which we consider trivial termination problems. In the Binary Search example, only log2, find and find.steps cause non-trivial termination problems. The system proves the termination of log2, but fails for find and for find.steps. Obviously, minus(j,i) is a termination function for find, and given this hint, the system generates the termination hypotheses (25) and (26), but fails to verify either of them. We therefore instruct the system to ignore all hypotheses except j>i in (25) and in (26), causing it to compute the generalized termination hypotheses (27) and (28), which both have an automatic (induction) proof using some of the predefined, system-generated and arithmetic lemmas.⁶ As find and find.steps share the same recursion structure, we use XeriFun's Create Lemma command to make (27) and (28) available for subsequent use. Now the verification of the termination of find.steps only requires providing the termination function minus(j,i), leaving the remaining work to the system.

    all a:list, key,i,j:nat
      if(length(a)>j,
         if(element(a,plus(i,half(minus(j,i))))>key,
            if(j>i,
               minus(j,i)>minus(pred(plus(i,half(minus(j,i)))),i),
               true),
            true),
         true)          (25)

    all a:list, key,i,j:nat
      if(length(a)>j,
         if(key>element(a,plus(i,half(minus(j,i)))),
            if(element(a,plus(i,half(minus(j,i))))>key,
               true,
               if(j>i,
                  minus(j,i)>minus(pred(j),plus(i,half(minus(j,i)))),
                  true)),
            true),
         true)          (26)

    all i,j:nat
      if(j>i,
         minus(j,i)>minus(pred(plus(i,half(minus(j,i)))),i),
         true)          (27)

    all i,j:nat
      if(j>i,
         minus(j,i)>minus(pred(j),plus(i,half(minus(j,i)))),
         true)          (28)
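The generalized termination hypotheses (27) and (28) are purely arithmetic and say that the termination function minus(j,i) strictly decreases in both recursive calls of find whenever j>i holds. A small Python sketch (ours, for illustration; minus, pred and half are assumed to be truncated subtraction, predecessor with pred(0) = 0, and integer division by 2) tests them on an initial segment of the natural numbers:

```python
def pred(x):
    return max(x - 1, 0)           # assumption: pred(0) = 0


def minus(x, y):
    return max(x - y, 0)           # assumption: truncated subtraction


def half(x):
    return x // 2


def hyp_27(i, j):
    """(27): the measure decreases for the call find(key, a, i, pred(mid))."""
    mid = i + half(minus(j, i))
    return (not j > i) or minus(j, i) > minus(pred(mid), i)


def hyp_28(i, j):
    """(28): the measure decreases for the call find(key, a, succ(mid), j)."""
    mid = i + half(minus(j, i))
    return (not j > i) or minus(j, i) > minus(pred(j), mid)


def check(bound=60):
    """Test (27) and (28) for all i, j below the given bound."""
    return all(hyp_27(i, j) and hyp_28(i, j)
               for i in range(bound) for j in range(bound))
```

In (28) the measure of the recursive call appears as minus(pred(j), mid), which denotes the same number as minus(j, succ(mid)).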

⁶ The motivation for these generalizations is not only to support the proofs, but also to identify the hypotheses irrelevant for the procedure's termination. As the system generates induction axioms from the body of the terminating procedures, disregarding irrelevant hypotheses results in induction axioms stronger than would be obtained otherwise, see [15] for details.

4.3 Soundness, Completeness and Complexity

Proving the soundness of binsearch (4) does not require much effort: The system immediately comes up with the proof-goal (5), which motivates us to formulate lemma (6). The proof of this lemma gets stuck as well, but it is immediately obvious from the failure that another lemma, viz. (7), which has a straightforward induction proof, is needed. Given the proof of (7), (6) and in turn (4) are easily proved.

The completeness statement is considerably harder to verify: The proof of (8) gets stuck with the proof-obligation (9), and the task here is to develop a lemma useful for proving this subgoal. However, unlike in the soundness (and complexity) case, the required lemma cannot be formulated with the given notions. Instead a new concept, viz. partition, has to be invented, which allows us to demand the success of find(key, a, i, j) only for keys key satisfying key ∈ a[i..j].

Generally, the challenge here is of course to spot the missing concept and, once it is found, to represent it in such a way that subsequent proofs are supported. The problem is, however, that (except in trivial cases) it seems hard to distinguish a good representation from a bad one a priori. We were trapped by this problem in the following way: It seemed straightforward to us to define a procedure function partition(a:list,i:nat,j:nat):list <= ... computing the sublist of list a between the indices i and j, and then to prove the lemma

    lemma member partition entails find <=
      all key,i,j:nat, a:list
        if(length(a)>j,
           if(ordered(a),
              if(member(key,partition(a,i,j)),find(key,a,i,j),true),
              true),
           true) .          (29)

However, we failed in proving (29). New subgoals were created involving the creation of new lemmas, which in turn raised new subgoals etc., until we gave up. An analysis of this tedious and frustrating effort revealed that the problem is the use of partition. Roughly speaking, this procedure (using tl) strips off the list elements in a between positions 0 and i−1 from the beginning of a and cuts off (using a procedure butlast) the list elements between |a|−1 and j+1, working from the end of a towards the beginning. The problem which comes with this approach is that, by the presence of butlast, the use of induction hypotheses and lemmas is spoiled so that the proofs do not get through. This problem immediately disappears if we use key ∈ a[i..j], cf. function in.partition, instead of a[i..j] as the new concept to express the lemma required to continue with the proof of (8).

Using in.partition, lemma (10) can be formulated and a proof attempt can be started. Now the situation develops similarly to the soundness case: Proofs get stuck, the system's outcome is analyzed, a new lemma is spotted as the result of the analysis, and a proof of the new lemma is started, which either requires further lemmas (and so on ...) or gets through so that the proof of the statement calling for the proved lemma can be resumed.⁷ In the course of the proof of (10), the lemmas (12),...,(19) were invented and proved, where (16) is the central one. Each of these lemmas evolved with little imagination from the analysis of the system's outcome when it got stuck in proving another lemma. The proofs use some of the predefined and system-generated lemmas, and the proof of (10) also uses some of the arithmetic lemmas. Finally, the completeness statement (8) is proved by some first-order reasoning steps using (10) and (11).

The proof of the complexity statement also requires more work than the soundness proof, but considerably less effort than needed in the completeness case. The proof of (20) gets stuck with the proof-obligation (21), and the missing lemma, viz. (22), is easy to spot. Again, the induction proof of (22) gets stuck and it is immediately obvious from the result and the induction hypotheses that the lemmas (23) and (24) will do the job. Although these lemmas look rather nasty and one may expect a lot of tedious proof steps, to our surprise both lemmas had a straightforward induction proof (using some of the predefined and arithmetic lemmas).⁸ Once (23) and (24) are proved, the system immediately comes up with a proof of (22), and then the proof of (20) is routine.

⁷ The way proofs are developed with the system does not differ in principle from the way proofs are developed using pencil and paper. It is good practice to continue with the proof obligation which is most in doubt: If it seems obvious that the spotted lemma will do the job, one should continue first with proving the lemma. On the other hand, if the lemma's proof seems easy but its use is in doubt, it is advantageous to verify first that the lemma is useful indeed. XeriFun supports both ways of proof development at the user's wish.

⁸ In fact, it takes far more time to typeset one of the formulas in TeX than the system needs to prove the log2-boundedness of binsearch.

4.4 Analysis

The verification of Binary Search is characteristic of the way proofs are developed with XeriFun (which, as we suspect, does not differ in principle from the way proofs are developed with other systems). User interactions are required from time to time to recover from the weakness of the first-order theorem prover, which, in particular, is responsible for proving the base and step cases of an induction. Such a theorem prover needs to terminate, because the system frequently deals with proof-obligations which are not valid but can be proved by induction only. Consequently, a sound theorem prover must be incomplete. E.g., the proof of (6) gets stuck with a proof-obligation which is true but invalid, and therefore a lemma, viz. (7), must be formulated and proved to finish the proof of (6).

In addition, a reasonable compromise must be made between theorem-proving performance and system efficiency (measured both in memory and computing time), thus providing a further source of incompleteness. To yield an acceptable response time, XeriFun uses several kinds of resource restrictions. When searching for a proof, the system's first-order theorem prover considers verified lemmas (and induction hypotheses). To test the applicability of such a lemma, the prover calls itself recursively, however with additionally restricted resources so as not to waste too much time on deciding whether a lemma supports the proof under consideration. Consequently, the prover sometimes overlooks a useful lemma and then must be told to use it. E.g. when proving (6), the system must be instructed to use (7), although it is available, because it fails to prove "j>i → j>i+⌊(j−i)/2⌋" when operating under additional restrictions, but easily proves this subgoal when instructed to use (7), because then theorem proving proceeds with the resources initially given to the system.

                       Sound  Complete  Complexity  Arithmetic  Termination    Σ
    Proof-Obligations    3       11         4           13          11        42
    Insert Lemma         2       10         3            1           4        20
    Insert Function      −        1         −            −           −         1
    Proof-Tree Edits     1        3         3            3           4        14
      Case Analysis      −        −         −            −           −         −
      Use Lemma          1        3         1            −           −         5
      Unfold Procedure   −        −         2            1           −         3
      Apply Equation     −        −         −            −           −         −
      Induction          −        −         −            −           −         −
      Hypotheses         −        −         −            2           4         6
    none                 2        8         1           11           9        31

Table 1. User Interactions for Binary Search

Table 1 illustrates the efforts when verifying Binary Search with XeriFun.⁹ The row Proof-Obligations displays the overall number of statements proved in this example, separated into the subtasks "Soundness", "Completeness", "Complexity", "Arithmetic" and "Termination". E.g., 11 lemmas had to be proved for verifying the completeness statement. The 13 lemmas about the arithmetic functions (collected in Appendix 6.4), which were used for Binary Search, are listed separately in the Arithmetic column. The whole case study uses 11 recursively defined procedures (including those for arithmetic, cf. Appendix 6.2) for which the system also had to prove termination.

⁹ Predefined and system-generated procedures and lemmas are not considered in this table as they are given "for free".

Below, the Insert Lemma row displays the number of lemmas we had to create to support the proof of each of the main statements, e.g. 10 for Completeness, and the Insert Function row shows the number of additional definitions required for formulating these lemmas, e.g. 1 for Completeness. In the Termination column, the number of termination functions which we had to submit to the system is counted in addition. The Insert Lemma row gives an account of the system's inability to spot the "right" lemma by itself, which is rather large for XeriFun as (except for the termination analysis) no methods for lemma speculation (and generalization, in particular) are implemented in the system version 2.5.6 used here.

The Proof-Tree Edits row counts the number of proof-tree edits required to guide the system to success even if the required lemmas are available, subsequently separated into the different activities. E.g., 3 user interventions were required for the completeness case.¹⁰ The Hypotheses rules had to be called 4 times for Termination to provide the generalizations of the termination hypotheses for find, and 2 times for Arithmetic, because the commutativity of plus was proved by 2 nested inductions. Finally, the last row gives the number of proof-obligations which went through the system with no user guidance at all.

5 Concluding Remarks

The number of required user interactions, measured in terms of additional definitions and proof-tree edits, gives a fair account of a system's degree of automation. Since approx. 1/3 proof-tree edit is required for each proof-obligation in this case study, the first-order prover shows a good performance here. This judgement is based on a further analysis of the proofs computed by the system, cf. [1]. For instance, after being told to use (16) in the proof of (10), the system continued (and succeeded) without any further intervention, computing a proof of 127 steps using the 2 induction hypotheses and 20 different instances of 13 lemmas. We consider this degree of automation an important feature, because it relieves the user from reasoning about which lemmas to consider and how to apply them, which is not obvious for a large lemma set and a long proof. The heuristic for choosing induction axioms and the implemented equality reasoning also behave well here, as the system never had to be instructed to use a certain induction or to apply a certain equation.¹¹

Nevertheless, we intend to improve some of the values in Table 1: The need for the insertion of a lemma can be decreased by extending the system with some state-of-the-art technology for lemma speculation, e.g. [6], [7], [8], [14], [17], [18], and, based on experiences gained from further case studies, the need for proof-tree edits can be decreased by a finer tuning of the theorem prover wrt. the performance vs. efficiency tradeoff. The steady increase of hardware performance may also improve system performance: XeriFun's resource restrictions are controlled by certain resource parameters. This makes it easy to benefit from an increase of hardware performance, as only the resource parameters need to be adjusted when moving to a more powerful platform.¹²

However, a problem which seems far more serious is the problem of concept formation, i.e. the invention of new concepts which are required to formulate a necessary lemma. E.g. in the Binary Search verification, a new concept, viz. in.partition, is required to express the success of find, cf. (10). The invention of new, usefully represented concepts requires much more creativity than is needed to formulate a lemma with the notions given, or to tell the prover what to do next. We believe that this constitutes the main obstacle to a wider use of theorem-proving based verification systems, and major improvements in this situation will only come with some computer support for this problem as well.

10 The Use Lemma row also counts the use of induction hypotheses. When distinguishing proof-tree edits between Case Analysis, Use Lemma and Unfold Procedure, one should bear in mind that very often the desired effect can be obtained by either of the proof rules. For instance, a case analysis may enable the system to recognize a useful lemma or to unfold a procedure call, a required case analysis may implicitly evolve from the use of a lemma, etc.

11 XeriFun's equality reasoning is based on conditional term rewriting, where the orientation of the equations is computed by the system.

12 Since the right tuning of the resource parameters requires a deep insight into the system's architecture and operation, these parameters are not at a user's disposal, but are fixed for each system version.

6 Appendix

6.1 Predeﬁned Procedures and Lemmas

function >(x:nat, y:nat):bool <=

if x=0 then false else if y=0 then true else pred(x)>pred(y) fi fi

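$\forall x,y{:}\mathrm{nat}.\ x = y \rightarrow x \ngtr y$

$\forall x,y{:}\mathrm{nat}.\ x = y \rightarrow x \mathbin{\hat{-}} 1 \ngtr y$

$\forall x,y,z{:}\mathrm{nat}.\ x > y \wedge y > z \rightarrow x > z$

$\forall x,y{:}\mathrm{nat}.\ x > y \rightarrow x \neq 0$

$\forall x,y,z{:}\mathrm{nat}.\ x \ngtr y \wedge y \ngtr z \rightarrow x \ngtr z$

$\forall x,y{:}\mathrm{nat}.\ x > y \rightarrow y \ngtr x$

$\forall x,y{:}\mathrm{nat}.\ x > y \vee y > x \vee x = y$

$\forall x,y{:}\mathrm{nat}.\ x \mathbin{\hat{-}} 1 > y \rightarrow x \neq 0$

$\forall x,y{:}\mathrm{nat}.\ x \neq 0 \wedge x = y \rightarrow x > y \mathbin{\hat{-}} 1$

$\forall x,y{:}\mathrm{nat}.\ x \neq 0 \wedge y = 0 \rightarrow x > y$

$\forall x,y{:}\mathrm{nat}.\ x > y \rightarrow x > y \mathbin{\hat{-}} 1$

$\forall x,y{:}\mathrm{nat}.\ x \mathbin{\hat{-}} 1 > y \mathbin{\hat{-}} 1 \rightarrow x > y$

$\forall x,y{:}\mathrm{nat}.\ x > y \mathbin{\hat{-}} 1 \rightarrow (x > y \vee x = y)$ ¹³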
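As an informal sanity check, the predefined procedure > and some of the lemmas listed with it can be transcribed into Python and tested exhaustively on a small range. This is only a sketch and not part of the original development: the names gt, pred and check_gt_lemmas, the test bound, and the convention pred(0) = 0 are assumptions made here, and a finite test is no substitute for the inductive proofs.

```python
def pred(x: int) -> int:
    """Predecessor on the naturals; pred(0) = 0 is an assumption of this sketch."""
    return x - 1 if x > 0 else 0


def gt(x: int, y: int) -> bool:
    """Python transcription of the predefined procedure '>'."""
    if x == 0:
        return False
    if y == 0:
        return True
    return gt(pred(x), pred(y))


def check_gt_lemmas(bound: int = 20) -> bool:
    """Test a selection of the listed lemmas for all naturals below `bound`."""
    nats = range(bound)
    for x in nats:
        for y in nats:
            ok = (
                gt(x, y) == (x > y)                              # agrees with Python's >
                and (gt(x, y) or gt(y, x) or x == y)             # trichotomy
                and (not gt(x, y) or not gt(y, x))               # x > y implies y is not > x
                and (not gt(x, pred(y)) or gt(x, y) or x == y)   # x > pred(y) implies x > y or x = y
                and (not gt(pred(x), pred(y)) or gt(x, y))       # pred(x) > pred(y) implies x > y
            )
            if not ok:
                return False
            for z in nats:
                # transitivity of >
                if gt(x, y) and gt(y, z) and not gt(x, z):
                    return False
    return True


print(check_gt_lemmas())
```

The recursion in gt mirrors the structure of the procedure definition, so each recursive call reduces both arguments by one until one of them reaches 0.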

6.2 Auxiliary Procedures

function minus(x:nat, y:nat):nat <=
  if x=0
    then 0
    else if y=0 then x else minus(pred(x),pred(y)) fi
  fi

function half(x:nat):nat <=
  if x=0
    then 0
    else if pred(x)=0 then 0 else succ(half(pred(pred(x)))) fi
  fi

function log2(x:nat):nat <=
  if x=0
    then 0
    else if pred(x)=0
      then 0
      else succ(log2(succ(half(pred(pred(x))))))
    fi
  fi
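To see what the three arithmetic procedures compute, they can be transcribed line by line into Python and compared with built-in integer operations. The sketch below is an illustration only: the Python names, the helper agrees_with_builtins, the test bound and the convention pred(0) = 0 are assumptions made here. On the tested range, minus behaves like truncated subtraction, half like integer division by 2, and log2 like the rounded-down binary logarithm (with value 0 for the arguments 0 and 1).

```python
def pred(x: int) -> int:
    """Predecessor on the naturals; pred(0) = 0 is an assumption of this sketch."""
    return x - 1 if x > 0 else 0


def succ(x: int) -> int:
    return x + 1


def minus(x: int, y: int) -> int:
    if x == 0:
        return 0
    if y == 0:
        return x
    return minus(pred(x), pred(y))


def half(x: int) -> int:
    if x == 0:
        return 0
    if pred(x) == 0:
        return 0
    return succ(half(pred(pred(x))))


def log2(x: int) -> int:
    if x == 0:
        return 0
    if pred(x) == 0:
        return 0
    return succ(log2(succ(half(pred(pred(x))))))


def agrees_with_builtins(bound: int = 200) -> bool:
    """Compare the transcriptions with Python's own arithmetic below `bound`."""
    for x in range(bound):
        if half(x) != x // 2:
            return False
        if log2(x) != max(x.bit_length() - 1, 0):
            return False
        for y in range(0, bound, 7):
            if minus(x, y) != max(x - y, 0):
                return False
    return True


print(agrees_with_builtins())
```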

13 We write $1\,\hat{+}$ for the successor function and $\hat{-}\,1$ for the predecessor.


function plus(x:nat, y:nat):nat <=
  if x=0 then y else succ(plus(pred(x),y)) fi

function member(n:nat, k:list):bool <=
  if k=empty
    then false
    else if n=hd(k) then true else member(n,tl(k)) fi
  fi

function length(k:list):nat <=
  if k=empty then 0 else succ(length(tl(k))) fi

function ordered(k:list):bool <=
  if k=empty
    then true
    else if tl(k)=empty
      then true
      else if hd(k)>hd(tl(k)) then false else ordered(tl(k)) fi
    fi
  fi
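The remaining auxiliary procedures can be read in the same way. The following Python sketch transcribes plus, member, length and ordered, representing lists of naturals as Python lists (hd(k) as k[0], tl(k) as k[1:]); this representation, the helper check_list_procedures and the test sizes are assumptions of the sketch, not taken from the paper. In particular, the check illustrates that ordered accepts exactly the non-decreasing lists among those tested.

```python
from itertools import product


def plus(x: int, y: int) -> int:
    return y if x == 0 else 1 + plus(x - 1, y)


def member(n: int, k: list) -> bool:
    if not k:
        return False
    return True if n == k[0] else member(n, k[1:])


def length(k: list) -> int:
    return 0 if not k else 1 + length(k[1:])


def ordered(k: list) -> bool:
    if not k:
        return True
    if not k[1:]:
        return True
    return False if k[0] > k[1] else ordered(k[1:])


def check_list_procedures(max_len: int = 4, values: int = 3) -> bool:
    """Compare the transcriptions with Python built-ins on all small lists."""
    for size in range(max_len + 1):
        for items in product(range(values), repeat=size):
            k = list(items)
            if length(k) != len(k):
                return False
            if ordered(k) != (k == sorted(k)):
                return False
            for n in range(values + 1):
                if member(n, k) != (n in k):
                    return False
    return all(plus(x, y) == x + y for x in range(10) for y in range(10))


print(check_list_procedures())
```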

6.3 System-Generated Procedures and Lemmas

function minus$1(x:nat, y:nat):bool <=
  if x=0
    then false
    else if y=0 then false else true fi
  fi

function half$1(x:nat):bool <=
  if x=0 then false else true fi

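$\forall x,y{:}\mathrm{nat}.\ x \ngtr (x - y) \rightarrow (x - y) = x$

$\forall x{:}\mathrm{nat}.\ x \ngtr \lfloor x/2 \rfloor \rightarrow \lfloor x/2 \rfloor = x$

$\forall x,y{:}\mathrm{nat}.\ \neg\,\mathtt{minus\$1}(x,y) \rightarrow (x - y) = x$

$\forall x{:}\mathrm{nat}.\ \neg\,\mathtt{half\$1}(x) \rightarrow \lfloor x/2 \rfloor = x$

$\forall x,y{:}\mathrm{nat}.\ \mathtt{minus\$1}(x,y) \rightarrow x > (x - y)$

$\forall x{:}\mathrm{nat}.\ \mathtt{half\$1}(x) \rightarrow x > \lfloor x/2 \rfloor$

$\forall x,y{:}\mathrm{nat}.\ (x - y) \ngtr x$

$\forall x{:}\mathrm{nat}.\ \lfloor x/2 \rfloor \ngtr x$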
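Read together with their lemmas, minus$1 and half$1 state exactly when the result of minus and half is strictly smaller than the first argument. The Python sketch below tests this reading on a small range. It is an illustration under assumptions: minus_1 and half_1 are hypothetical Python names for minus$1 and half$1, and minus and half are modelled directly by truncated subtraction and integer division rather than by the recursive procedures.

```python
def minus(x: int, y: int) -> int:
    """Truncated subtraction, standing in for the procedure minus (assumption)."""
    return max(x - y, 0)


def half(x: int) -> int:
    """Integer division by 2, standing in for the procedure half (assumption)."""
    return x // 2


def minus_1(x: int, y: int) -> bool:
    """Transcription of minus$1."""
    if x == 0:
        return False
    return False if y == 0 else True


def half_1(x: int) -> bool:
    """Transcription of half$1."""
    return False if x == 0 else True


def check_generated_lemmas(bound: int = 30) -> bool:
    """Test the eight system-generated lemmas for all naturals below `bound`."""
    for x in range(bound):
        # Lemmas about half and half$1
        if not x > half(x) and half(x) != x:
            return False
        if not half_1(x) and half(x) != x:
            return False
        if half_1(x) and not x > half(x):
            return False
        if half(x) > x:
            return False
        for y in range(bound):
            # Lemmas about minus and minus$1
            if not x > minus(x, y) and minus(x, y) != x:
                return False
            if not minus_1(x, y) and minus(x, y) != x:
                return False
            if minus_1(x, y) and not x > minus(x, y):
                return False
            if minus(x, y) > x:
                return False
    return True


print(check_generated_lemmas())
```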

6.4 Arithmetic Lemmas

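$\forall x_1,x_2{:}\mathrm{nat}.\ x_1 = x_2 \rightarrow x_1 - x_2 = 0$

$\forall x,y,z{:}\mathrm{nat}.\ x > y \rightarrow x > y - z$

$\forall x,y{:}\mathrm{nat}.\ x \neq 0 \rightarrow x + y > x \mathbin{\hat{-}} 1$

$\forall x,y,z{:}\mathrm{nat}.\ (x + y) + z = x + (y + z)$

$\forall x,y{:}\mathrm{nat}.\ x + y = y + x$

$\forall x,y{:}\mathrm{nat}.\ x \ngtr y \rightarrow \lfloor \log_2(x) \rfloor \ngtr \lfloor \log_2(y) \rfloor$

$\forall x,y{:}\mathrm{nat}.\ x \ngtr y \rightarrow \lfloor x/2 \rfloor \ngtr \lfloor y/2 \rfloor$

$\forall x,y{:}\mathrm{nat}.\ x > y \rightarrow y + \lfloor (x - y)/2 \rfloor \ngtr x \mathbin{\hat{-}} 1$

$\forall x{:}\mathrm{nat}.\ \lfloor x/2 \rfloor \ngtr x \mathbin{\hat{-}} 1$

$\forall x,y{:}\mathrm{nat}.\ (x - y) \mathbin{\hat{-}} 1 = (x \mathbin{\hat{-}} 1) - y$

$\forall x{:}\mathrm{nat}.\ \lfloor x/2 \rfloor \mathbin{\hat{-}} 1 \ngtr \lfloor (x \mathbin{\hat{-}} 1)/2 \rfloor$

$\forall x{:}\mathrm{nat}.\ (x \mathbin{\hat{-}} 1) - \lfloor x/2 \rfloor \ngtr \lfloor (x \mathbin{\hat{-}} 1)/2 \rfloor$

$\forall x,y{:}\mathrm{nat}.\ x \ngtr y \leftrightarrow x - y = 0$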
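The arithmetic lemmas can be spot-checked in the same spirit. The sketch below encodes each of the 13 lemmas as a Python predicate and reports the indices of any lemma that fails on a small range. It rests on assumptions: subtraction is truncated at 0, the predecessor of 0 is 0, the floor expressions are modelled by integer division and by int.bit_length, and all Python names are hypothetical. An empty result only means that no counterexample exists below the bound; it does not replace the proofs.

```python
def pred(x):
    return max(x - 1, 0)            # predecessor with pred(0) = 0 (assumption)


def minus(x, y):
    return max(x - y, 0)            # truncated subtraction (assumption)


def half(x):
    return x // 2


def log2(x):
    return max(x.bit_length() - 1, 0)   # rounded-down log2, 0 for x in {0, 1}


def implies(a, b):
    return (not a) or b


# One predicate per lemma; unused variables are simply ignored.
LEMMAS = [
    lambda x, y, z: implies(x == y, minus(x, y) == 0),
    lambda x, y, z: implies(x > y, x > minus(y, z)),
    lambda x, y, z: implies(x != 0, x + y > pred(x)),
    lambda x, y, z: (x + y) + z == x + (y + z),
    lambda x, y, z: x + y == y + x,
    lambda x, y, z: implies(not x > y, not log2(x) > log2(y)),
    lambda x, y, z: implies(not x > y, not half(x) > half(y)),
    lambda x, y, z: implies(x > y, not y + half(minus(x, y)) > pred(x)),
    lambda x, y, z: not half(x) > pred(x),
    lambda x, y, z: pred(minus(x, y)) == minus(pred(x), y),
    lambda x, y, z: not pred(half(x)) > half(pred(x)),
    lambda x, y, z: not minus(pred(x), half(x)) > half(pred(x)),
    lambda x, y, z: (not x > y) == (minus(x, y) == 0),
]


def failing_lemmas(bound=12):
    """Return the indices of lemmas with a counterexample below `bound`."""
    failed = set()
    for x in range(bound):
        for y in range(bound):
            for z in range(bound):
                for i, lemma in enumerate(LEMMAS):
                    if not lemma(x, y, z):
                        failed.add(i)
    return sorted(failed)


print(failing_lemmas())
```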


References

1. http://www.inferenzsysteme.informatik.tu-darmstadt.de/verifun/.

2. R. S. Boyer and J. S. Moore. A Computational Logic. Academic Press, New York, 1979.

3. M. J. C. Gordon and T. F. Melham. Introduction to HOL: A Theorem Proving Environment for Higher-Order Logic. Cambridge University Press, Cambridge, 1993.

4. J. Gosling, B. Joy, and G. L. Steele. The Java Language Specification. Addison-Wesley, Reading, Massachusetts, 1996.

5. D. Hutter, B. Langenstein, C. Sengler, J. Siekmann, W. Stephan, and A. Wolpers. Deduction in the Verification Support Environment. In M.-C. Gaudel and J. Woodcock, editors, Intern. Symp. of Formal Methods Europe (FME), volume 1051 of Lecture Notes in Artificial Intelligence, New York, 1996. Springer-Verlag.

6. A. Ireland and A. Bundy. Extensions to a Generalization Critic for Inductive Proofs. In M. McRobbie and J. Slaney, editors, Proc. 13th Intern. Conf. on Automated Deduction (CADE-96), volume 1104 of Lecture Notes in Artificial Intelligence, pages 47–61, New Brunswick, 1996. Springer-Verlag.

7. A. Ireland and A. Bundy. Productive use of failure in inductive proof. J. Automat. Reason., 16(1–2), 1996.

8. D. Kapur and M. Subramaniam. Lemma discovery in automating induction. In M. McRobbie and J. Slaney, editors, Proc. 13th Intern. Conf. on Automated Deduction (CADE-96), volume 1104 of Lecture Notes in Artificial Intelligence, pages 538–552, New Brunswick, 1996. Springer-Verlag.

9. M. Kaufmann and J. S. Moore. ACL2: An Industrial Strength Version of NQTHM. In Compass'96: 11th Annual Conf. on Computer Assurance, Gaithersburg, Maryland, 1996. National Institute of Standards and Technology.

10. J. H. Kingston. Algorithms and Data Structures: Design, Correctness, Analysis. Addison-Wesley, Reading, Massachusetts, 1990.

11. S. Owre, J. Rushby, and N. Shankar. PVS: A prototype verification system. In Proc. 11th Intern. Conf. on Automated Deduction (CADE-92), volume 607 of Lecture Notes in Artificial Intelligence, New York, 1992. Springer-Verlag.

12. L. C. Paulson. Isabelle: a generic theorem prover, volume 828 of Lecture Notes in Computer Science. Springer-Verlag, New York, 1994.

13. W. Reif. The KIV Approach to Software Verification. In M. Broy and S. Jähnichen, editors, KORSO: Methods, Languages and Tools for the Construction of Correct Software, volume 1009 of Lecture Notes in Computer Science. Springer-Verlag, 1995.

14. T. Walsh. A Divergence Critic for Inductive Proof. J. Artificial Intelligence Research, 4:209–235, 1996.

15. C. Walther. Mathematical Induction. In D. Gabbay, C. Hogger, and J. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, volume 2, pages 127–227. Oxford University Press, Oxford, 1994.

16. C. Walther. On proving the termination of algorithms by machine. Artificial Intelligence, 71(1):101–157, 1994.

17. C. Walther and T. Kolbe. On Terminating Lemma Speculations. Information and Computation, 162:96–116, 2000.

18. C. Walther and T. Kolbe. Proving theorems by reuse. Artificial Intelligence, 116:17–66, 2000.

19. C. Walther and S. Schweitzer. XeriFun User Guide. Technical Report VFR 02/01, Programmiermethodik, Technische Universität Darmstadt, 2002.
