The 2006 Federated Logic Conference

The Seattle Sheraton Hotel and Towers

Seattle, Washington

August 10 - 22, 2006

IJCAR’06 Workshop

DISPROVING’06:

Non-Theorems, Non-Validity, Non-Provability

August 16th, 2006

Proceedings

Editors:

W. Ahrendt, P. Baumgartner, H. de Nivelle

A Fast Disprover for XeriFun

Markus Aderhold, Christoph Walther, Daniel Szallies, and Andreas Schlosser

Fachgebiet Programmiermethodik, Technische Universität Darmstadt, Germany

{aderhold, chr.walther, szallies, schlosser}@pm.tu-darmstadt.de

Abstract. We present a disprover for universal formulas stating conjectures about functional programs. The quantified variables range over freely generated polymorphic data types, thus the domains of discourse are infinite in general. The objective in the development was to quickly find counter-examples for as many false conjectures as possible without wasting too much time on true statements. We present the reasoning method underlying the disprover and illustrate its practical value in several applications in an experimental version of the verification tool XeriFun.

1 Introduction

As a common experience, specifications are faulty and programs do not meet their intention. Program bugs range from easily detected simple lapses (such as not excluding division by zero or typos when setting array bounds) to deep logical flaws in the program design which emerge elsewhere in the program and therefore are hard to discover.

But programmers' faults are not the only source of bugs. State-of-the-art verifiers synthesize conjectures about a program that are needed (or are at least useful) in the course of verification: Statements may be generalized to be qualified for a proof by induction, the verifier might generate termination hypotheses that ensure a procedure's termination, or it might synthesize conjectures justifying an optimization of a procedure. Sometimes these conjectures can be faulty, i.e. over-generalizations might result, the verifier comes up with a wrong idea for termination, or an optimization simply does not apply.

Verifying that a program meets its specification is a waste of time in all these cases, and therefore one should begin with testing the program beforehand. However, as testing is a time-consuming and boring task, machine support is welcome (not to say needed) to relieve the human from the test-and-verify cycle. Program testing can be reformulated as a verification problem: A program conjecture $\varphi$ fails the test if the negation of $\varphi$ can be verified. However, for proving these negated conjectures, a special verifier, called a disprover, is needed.

In this paper, we present such a disprover for statements about programs written in the functional programming language L [14], which has been integrated into an experimental version [12] of the interactive verification tool XeriFun [15, 16]. The procedures of L-programs operate over freely generated polymorphic data types and are defined by using recursion, case analyses, let-expressions, and functional composition. The data types bool for Boolean values and N for natural numbers as well as equality $= \,:\, @A \times @A \to bool$ and a procedure $> \,:\, N \times N \to bool$ deciding the $>$-relation on $\mathbb{N}$ are predefined in L. Figure 1 shows an example of an L-program that defines a polymorphic data type list[@A], list concatenation <>, and list reversal rev. In this program, the symbols true, false are constructors of type bool, −(...) is the selector of the N-constructor +(...), and hd and tl are the selectors of constructor :: for lists. Subsequently, we let $\Sigma(P)$ denote the signature of all function symbols defined by an L-program $P$, and $\Sigma(P)^c$ is the signature of all constructor function symbols in $P$. An operational semantics for L-programs $P$ is defined by an interpreter $eval_P : \mathcal{T}(\Sigma(P)) \mapsto \mathcal{T}(\Sigma(P)^c)$ which maps ground terms to constructor ground terms of the respective monomorphic data types using the definition of the procedures and data types in $P$, cf. [10, 14, 18].

    structure bool <= true, false
    structure N <= 0, +(− : N)
    structure list[@A] <= ε, [infix] ::(hd : @A, tl : list[@A])

    function [outfix] |(k : list[@A]) : N <=
      if k = ε then 0 else +(|tl(k)|)

    function [infix] <>(k, l : list[@A]) : list[@A] <=
      if k = ε then l else hd(k) :: (tl(k) <> l) end

    function rev(k : list[@A]) : list[@A] <=
      if k = ε then ε else rev(tl(k)) <> hd(k) :: ε end

    lemma rev <> <=
      ∀k, l : list[@A] rev(k <> l) = rev(k) <> rev(l)

Fig. 1. A simple L-program

In L, statements about programs are given by expressions of the form lemma name <= $\forall x_1{:}\tau_1, \ldots, x_n{:}\tau_n\ b$ (cf. Fig. 1), where $b$, called the body of the lemma, is a Boolean term built with the variables $x_i$ (of type $\tau_i$) from a set $\mathcal{V}$ of typed variables and the function symbols in $\Sigma(P)$, where case analyses (like in procedure definitions) and the truth values are used to represent connectives. Hence the proof obligations we are concerned with are universal formulas of the general form $\varphi = \forall x_1{:}\tau_1, \ldots, x_n{:}\tau_n\ b$. Disproving such a formula $\varphi$ is equivalent to proving its negation $\neg\varphi \approx \exists x_1{:}\tau_1, \ldots, x_n{:}\tau_n\ \neg b$, and as the domain of each type $\tau_i$ can be enumerated, disproving $\varphi$ is a semi-decidable problem. A disproof of $\varphi$ (also called a witness of $\neg\varphi$) can be represented by a constructor ground substitution $\sigma$ such that $eval_P(\sigma(b)) = false$. Consequently, disproving $\varphi$ can be viewed as solving the (semi-decidable) equational problem $b \doteq false$.
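The semi-decidability argument above can be made concrete with a small Python sketch. It is not the calculus developed in this paper: it simply enumerates constructor ground substitutions of growing size and evaluates the lemma body, with Python lists of integers standing in for list[N]. The names `lists`, `disprove`, and `max_bound` are illustrative assumptions; the cut-off `max_bound` is what makes the search terminate, at the price of completeness.

```python
from itertools import product

def append(k, l):
    """k <> l, transcribed from Fig. 1."""
    return l if not k else [k[0]] + append(k[1:], l)

def rev(k):
    """rev, transcribed from Fig. 1."""
    return [] if not k else append(rev(k[1:]), [k[0]])

def lists(bound):
    """All lists over {0, ..., bound} with at most `bound` elements."""
    for length in range(bound + 1):
        for elems in product(range(bound + 1), repeat=length):
            yield list(elems)

def disprove(body, arity, max_bound):
    """Return the first tuple of lists falsifying `body`, or None if
    no such tuple exists up to the (hypothetical) cut-off `max_bound`."""
    for bound in range(max_bound + 1):
        candidates = list(lists(bound))
        for sigma in product(candidates, repeat=arity):
            if not body(*sigma):
                return sigma
    return None

# Body of lemma "rev <>" from Fig. 1.
rev_append = lambda k, l: rev(append(k, l)) == append(rev(k), rev(l))
witness = disprove(rev_append, arity=2, max_bound=2)
```

On a true conjecture such as `len(append(k, l)) == len(append(l, k))` the same function exhausts the bounded search space and returns `None`, which illustrates why blind enumeration spends all of its time budget on true statements.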

To solve such an equation, we develop two disproving calculi that constitute the two phases of our disprover. The inference rules of both calculi are inspired by the calculus proposed in [6] by Comon and Lescanne for solving equational problems. As disproving is semi-decidable, a complete disprover can be developed. However, as truth of universal formulas $\varphi$ is not semi-decidable by Gödel's first incompleteness theorem, disproving $\varphi$ is undecidable. Therefore a complete (and sound) disprover need not terminate. But the use in an interactive environment, such as the one XeriFun provides, requires termination of all subsystems, hence completeness must be sacrificed in favor of termination. The use in an interactive environment also demands runtime performance, so particular care is taken to achieve early failure on non-disprovable conjectures.

In Section 2, we explain how we disprove universal formulas φ. In Section 3,

we demonstrate the practical use of our disprover when XeriFun employs it in

diﬀerent disproving applications. We compare our proposal with related work in

Section 4 and conclude with an outlook on future work in Section 5.

2 Disproving Universal Formulas

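Our disproving method proceeds in two phases. The first phase is based on the elimination calculus (E-calculus for short). Its language is given by $\mathcal{L}_E := \{\langle E, \sigma\rangle \in \mathbb{L}_E \times Sub \mid \mathcal{V}(E) \cap dom(\sigma) = \emptyset\}$: $\mathbb{L}_E$ is the set of clauses in which atoms are built with terms from $\mathcal{T}(\Sigma(P), \mathcal{V})$ and the predicate symbols $\doteq$ of type $@A \times @A$ and $\gtrdot$ of type $N \times N$; negative literals are written $t_1 \not\doteq t_2$ or $t_1 \not\gtrdot t_2$, respectively. $Sub$ denotes the set of all constructor ground substitutions $\sigma$, i.e. $\sigma(v) \in \mathcal{T}(\Sigma(P)^c)$ for each $v \in dom(\sigma)$. The inference rules of the E-calculus (defined below) are of the form "$\frac{\langle E, \sigma\rangle}{\langle \lambda(E'), \sigma \circ \lambda\rangle}$, if cond", where cond stands for a side condition that has to be satisfied to apply the rule, and $\lambda \in Sub$. An E-deduction is a sequence $\langle E_1, \sigma_1\rangle, \ldots, \langle E_n, \sigma_n\rangle$ such that for each $i$, $\langle E_{i+1}, \sigma_{i+1}\rangle$ originates from $\langle E_i, \sigma_i\rangle$ by applying an E-inference rule, and $\langle E_1, \sigma_1\rangle \vdash_E \langle E_n, \sigma_n\rangle$ denotes the existence of such an E-deduction.

The second phase of our disproving method uses the solution calculus (S-calculus for short). It operates on $\mathcal{L}_S := \{\langle E, \sigma\rangle \in \mathbb{L}_S \times Sub \mid \mathcal{V}(E) \cap dom(\sigma) = \emptyset\}$: $\mathbb{L}_S \subset \mathbb{L}_E$ is the set of clauses in which atoms are formed with predicate symbols $\doteq$, $\gtrdot$ and terms from $\mathcal{T}(\Sigma(P'), \mathcal{V})$, where $\Sigma(P')$ emerges from $\Sigma(P)$ by removing the function symbols if and = as well as all procedure function symbols. The form of the S-inference rules (defined below) and deduction $\vdash_S$ are defined identically to the E-calculus, and $\langle E, \sigma\rangle \vdash_{S \circ E} \langle E'', \sigma''\rangle$ denotes the existence of a composed deduction $\langle E, \sigma\rangle \vdash_E \langle E', \sigma'\rangle \vdash_S \langle E'', \sigma''\rangle$.

A substitution $\sigma$ is an E-substitution for a clause $E \in \mathbb{L}_E$, $\sigma \in Sub_E$ for short, iff $\sigma(v) \in \mathcal{T}(\Sigma(P)^c)$ for each $v \in \mathcal{V}(E)$. We write $\sigma \models l$ if an $\{l\}$-substitution $\sigma$ solves an E-literal $l$, defined by $\sigma \models t_1 \doteq t_2$ iff $eval_P(\sigma(t_1)) = eval_P(\sigma(t_2))$ and $\sigma \models t_1 \gtrdot t_2$ iff $eval_P(\sigma(t_1)) > eval_P(\sigma(t_2))$. An E-clause $E$ is solved by $\sigma \in Sub_E$, $\sigma \models E$ for short, iff $\sigma \models l$ for each $l \in E$. Both calculi are sound in the sense that $\langle E, \sigma\rangle \vdash_{\ldots} \langle E', \sigma'\rangle$ entails $\theta \models \sigma'(E)$ for each $\theta \in Sub_{E'}$ with $\theta \models E'$, and $\sigma \subseteq \sigma'$.

To disprove a conjecture $\varphi = \forall x_1{:}\tau_1, \ldots, x_n{:}\tau_n\ b$, we search for a deduction $\langle \{b \doteq false\}, \varepsilon\rangle \vdash_{S \circ E} \langle \emptyset, \sigma\rangle$.¹ Hence $\sigma$ represents a disproof of $\varphi$, as $\sigma \models b \doteq false$.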

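¹ Since the domain of each data type is at most countably infinite, we actually use monomorphic types $\tau'_i$ instead of the polymorphic types $\tau_i$ in $\varphi$ without loss of generality. Type $\tau'_i$ originates from type $\tau_i$ by instantiating each type variable in $\tau_i$ with type N. E.g., to disprove ∀k, l : list[@A] k <> l = l <> k, the monomorphic instance ∀k, l : list[N] k <> l = l <> k is considered.

(1) $\frac{\langle E \uplus \{l\}, \sigma\rangle}{\langle E \cup \{l[\pi \leftarrow t_2],\ t_1 \doteq true\}, \sigma\rangle \;\mid\; \langle E \cup \{l[\pi \leftarrow t_3],\ t_1 \doteq false\}, \sigma\rangle}$, if $l|_\pi = if(t_1, t_2, t_3)$

(2) $\frac{\langle E \uplus \{l\}, \sigma\rangle}{\langle E \cup \{l[\pi \leftarrow w],\ w \doteq f(\ldots)\}, \sigma\rangle}$, if $f \in \Sigma_{proc}$, $l|_\pi = f(\ldots)$, and $\pi \notin \{1, 2\}$

(3) $\frac{\langle E \uplus \{l\}, \sigma\rangle}{\langle E \cup \{l[\pi \leftarrow \rho(r)]\} \cup E_{cond}, \sigma\rangle}$, if $l|_\pi = f(t_1, \ldots, t_n)$ for some $\pi \in \{1, 2\}$ and $\langle C, \overline{C}, r\rangle \in D_f$ for $f \in \Sigma_{proc}$, where $E_{cond} = \bigcup_{c \in C} \{\rho(c) \doteq true\} \cup \bigcup_{c \in \overline{C}} \{\rho(c) \doteq false\}$ and $\rho := \{x_1/t_1, \ldots, x_n/t_n\}$ for the formal parameters $x_i$ of $f$

(5) $\frac{\langle E \uplus \{v \doteq cons(t_1, \ldots, t_n),\ v \doteq cons(t'_1, \ldots, t'_n)\}, \sigma\rangle}{\langle E \cup \{v \doteq cons(w_1, \ldots, w_n)\} \cup \bigcup_{j=1}^{n} \{w_j \doteq t_j,\ w_j \doteq t'_j\}, \sigma\rangle}$

(6) $\frac{\langle E \uplus \{v \doteq cons(t_1, \ldots, t_n),\ v \not\doteq v',\ v' \doteq cons(t'_1, \ldots, t'_n)\}, \sigma\rangle}{\langle E \cup \{v \doteq cons(w_1, \ldots, w_n),\ v' \doteq cons(w'_1, \ldots, w'_n)\} \cup \{w_i \not\doteq w'_i\} \cup \bigcup_{j=1}^{n} \{w_j \doteq t_j,\ w'_j \doteq t'_j\}, \sigma\rangle}$, if $i \in \{1, \ldots, n\}$

(7) $\frac{\langle E \uplus \{l\}, \sigma\rangle}{\langle E \cup \{l[\pi \leftarrow w_i],\ w \doteq cons(w_1, \ldots, w_n),\ w \doteq t\}, \sigma\rangle}$, if $l|_\pi = sel_i(t)$ ²

(8) $\frac{\langle E \uplus \{{+}(t_1) \bowtie {+}(t_2)\}, \sigma\rangle}{\langle E \cup \{t_1 \bowtie t_2\}, \sigma\rangle}$, if $\bowtie\ \in \{\gtrdot, \not\gtrdot\}$

Fig. 2. Inference rules of the E-calculus

² Assuming $t \doteq cons(\ldots)$ is sound, as well-typedness is demanded.

The inference rules of both calculi are given in the subsequent paragraphs. The most important rules are formally defined whereas others (denoted by rule numbers in italics) are only informally described for the sake of brevity. In order to reduce the depth of the terms in E- and S-literals, some of the rules introduce fresh variables (called "auxiliary unknowns" in [6]), which we denote by $w$ and $w'$. Terms are written as $t$, $t_1$ and $t_2$, and $v$ and $v'$ denote variables.

2.1 Inference rules of the E-calculus

The E-calculus consists of the inference rules (1)–(3) and (5)–(8) of Fig. 2 plus rules (4) and (9)–(10) described informally. The purpose of the E-inference rules is to eliminate all occurrences of if, =, and of procedure function symbols so that some $\langle E, \sigma\rangle \in \mathcal{L}_S$ is obtained by an E-deduction $\langle \{b \doteq false\}, \varepsilon\rangle \vdash_E \langle E, \sigma\rangle$.³ All rules are supplied with an additional side condition (*) demanding $E \notin \mathbb{L}_\bot$ for each $\langle E, \sigma\rangle$ they apply to, where $\mathbb{L}_\bot$ is the set of all E-clauses containing evident contradictions such as $\{t \not\doteq t, \ldots\}$, $\{0 \gtrdot t, \ldots\}$ or $\{t \doteq 0,\ t \doteq 1, \ldots\}$. This proviso corresponds to the elimination of trivial disequations and the clash rule for equations in [6].

³ This elimination is possible whenever $b \doteq false$ is solvable, as each procedure call needs to be unfolded by rule (3) only finitely many times.

(15) $\frac{\langle E \uplus \{t_1 \bowtie t_2\}, \sigma\rangle}{\langle E \cup \{w_1 \doteq t_1,\ w_2 \doteq t_2,\ w_1 \bowtie w_2\}, \sigma\rangle}$, if $t_1, t_2 \notin \mathcal{V}$

(16) $\frac{\langle E \uplus \{v \not\doteq t\}, \sigma\rangle}{\langle E \cup \{v \not\doteq t,\ v \doteq cons'(w_1, \ldots, w_n)\}, \sigma\rangle}$, if $t \in \mathcal{V}$ or $t = cons(\ldots)$

(17) $\frac{\langle E \uplus \{t_1 \not\gtrdot t_2\}, \sigma\rangle}{\langle E \cup \{t_1 \doteq t_2\}, \sigma\rangle \;\mid\; \langle E \cup \{t_2 \gtrdot t_1\}, \sigma\rangle}$, if $t_1, t_2 \notin \mathcal{V}$

(18) $\frac{\langle E \uplus \{t_1 \not\doteq t_2\}, \sigma\rangle}{\langle E \cup \{t_1 \gtrdot t_2\}, \sigma\rangle \;\mid\; \langle E \cup \{t_2 \gtrdot t_1\}, \sigma\rangle}$, if $t_1, t_2 \in \mathcal{T}(\Sigma(P'), \mathcal{V})_N$

Fig. 3. Inference rules of the S-calculus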

Rule (1) eliminates an if-conditional and rule (2) eliminates an inner procedure call from a literal.⁴ Rule (3) unfolds a call of procedure $f$ that occurs as a direct argument in a literal. A procedure $f$ is represented here by a set $D_f$ of triples $\langle C, \overline{C}, r\rangle$ such that $r$ is the if-free result term in the procedure body of $f$ obtained under the conditions $C \cup \{\neg c \mid c \in \overline{C}\}$, where $C$ and $\overline{C}$ consist of if-free Boolean terms only. E.g., $D_{<>}$ consists of two triples, viz. $d_1 = \langle \{k = \varepsilon\}, \emptyset, l\rangle$ and $d_2 = \langle \emptyset, \{k = \varepsilon\}, hd(k) :: (tl(k) <> l)\rangle$, for procedure <> of Fig. 1.

A further rule (4) translates inequations and equations expressed with symbols from $\Sigma(P)$ into E-literals; e.g., "$t_1 > t_2 \doteq false$" is translated into "$t_1 \not\gtrdot t_2$". Rule (6) is like the decomposition rule from [6] for inequations, but restricted to constructors. Another rule (9), corresponding to the elimination of trivial equations and clash for inequations in [6], removes trivial literals such as $t \doteq t$ or $0 \not\gtrdot t$ from a clause $E \in \mathbb{L}_E$ and supplies arbitrary values in $\lambda$ for variables that disappear from the clause. Finally, literals are simplified by rule (10), which replaces subterms of the form $sel_i(cons(t_1, \ldots, t_n))$ with $t_i$. This rule does not exist in [6] and accounts for data type definitions with selectors.
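To make the representation of a procedure by definition triples more tangible, the following Python sketch encodes $D_{<>}$ and performs the term-level part of rule (3): for a call of <> it returns, per triple, the instantiated result term $\rho(r)$ together with the condition literals of $E_{cond}$. The term encoding (variables as strings, applications as tuples) and the names `substitute`, `unfold`, and `D_APPEND` are assumptions made for illustration only and are not taken from the implementation described here.

```python
# Terms: variables are strings, applications are tuples (symbol, arg1, ..., argn).

def substitute(term, rho):
    """Apply the substitution rho (dict: variable -> term) to a term."""
    if isinstance(term, str):
        return rho.get(term, term)
    head, *args = term
    return (head, *(substitute(a, rho) for a in args))

# The two definition triples of procedure <> from Fig. 1:
# (conditions that hold, conditions that fail, if-free result term).
K_IS_EMPTY = ("=", "k", ("eps",))
D_APPEND = {
    "params": ("k", "l"),
    "triples": [
        ([K_IS_EMPTY], [], "l"),
        ([], [K_IS_EMPTY], ("::", ("hd", "k"), ("<>", ("tl", "k"), "l"))),
    ],
}

def unfold(call, definition):
    """One alternative per triple: (rho(r), list of (rho(c), truth value))."""
    _, *actuals = call
    rho = dict(zip(definition["params"], actuals))
    alternatives = []
    for holds, fails, result in definition["triples"]:
        conds = [(substitute(c, rho), "true") for c in holds]
        conds += [(substitute(c, rho), "false") for c in fails]
        alternatives.append((substitute(result, rho), conds))
    return alternatives

# Unfolding the call x <> (a :: y) yields one alternative per triple.
alternatives = unfold(("<>", "x", ("::", "a", "y")), D_APPEND)
```

The second alternative again contains a call of <>, which is exactly the situation in which the number of unfoldings has to be limited to keep the search finite.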

2.2 Inference rules of the S-calculus

The S-calculus consists of the inference rules (5)–(19), to which the additional side condition (*) applies as well. Rules (5)–(10) are the same as in the E-calculus, rules (11)–(14) are "structural" rules to merge (in)equations, to replace variables with terms, and to solve equations $v \doteq t$ with $t \in \mathcal{T}(\Sigma(P)^c)$ by substitutions $\lambda := \{v/t\}$.

Rules (15)–(18) are given in Fig. 3. Rule (15) removes non-variable arguments from an S-literal, rule (16) is basically a case analysis on $v$ using some constructor $cons'$, and rules (17) and (18) eliminate negative literals. Rule (19) invokes a constraint solver.⁵ We call an S-literal $t_1 \bowtie t_2$ a constraint literal iff at least one of the $t_i$ is a variable of type N, $\Sigma(t_1) \cup \Sigma(t_2) \subseteq \{0, +, -\}$, and $\bowtie\ \in \{\doteq, \gtrdot\}$; $\mathcal{C}$ is the set of all constraint literals. When none of the other S-rules is applicable, rule (19) passes the constraint literals $E \cap \mathcal{C}$ to a modified version of Indigo [5]: To terminate on cyclic constraints like $\{x \gtrdot y,\ y \gtrdot x\}$, we simply limit the number of times a constraint can be used by the number of variables in $E \cap \mathcal{C}$. If $E \cap \mathcal{C}$ can be satisfied, we get a solving assignment $\lambda \in Sub_{E \cap \mathcal{C}}$; otherwise rule (19) fails.⁶

⁴ For a literal $l = t_1 \bowtie t_2$, $l|_\pi$ is the subterm of $l$ at occurrence $\pi \in Occ(l)$, and $l[\pi \leftarrow t]$ is obtained from $l$ by replacing $l|_\pi$ in $l$ with $t$. We use "|" as a shorthand for the succedents of different rules with the same premises and side conditions. All rules are applied "modulo symmetry" of $\doteq$ and $\not\doteq$ if possible (e.g., see rule (5)).
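The idea of bounding how often a constraint may be used can be illustrated with a much simplified Python sketch. It is not Indigo and not the modified solver used here: it only handles strict inequalities between variables over the naturals and raises lower bounds by local propagation. The function name `solve_greater` and the pair encoding of constraints are assumptions for illustration. A constraint that would have to fire more often than there are variables signals a cycle, so the solver gives up instead of looping.

```python
def solve_greater(constraints):
    """Solve a list of pairs (x, y), each meaning x > y over the naturals.

    Returns a dict of values, or None if some constraint would have to be
    used more often than there are variables (which happens on cycles).
    """
    variables = sorted({v for pair in constraints for v in pair})
    limit = len(variables)
    value = {v: 0 for v in variables}     # current lower bounds
    uses = [0] * len(constraints)         # how often each constraint fired
    changed = True
    while changed:
        changed = False
        for idx, (x, y) in enumerate(constraints):
            if value[x] <= value[y]:      # constraint violated: propagate
                if uses[idx] == limit:
                    return None           # usage budget exhausted
                uses[idx] += 1
                value[x] = value[y] + 1
                changed = True
    return value

cyclic = solve_greater([("x", "y"), ("y", "x")])
chain = solve_greater([("x", "y"), ("y", "z")])
```

For an acyclic set of such constraints no value ever exceeds the length of the longest chain below its variable, so each constraint fires fewer times than there are variables and the budget is never exhausted; only cyclic sets are rejected.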

2.3 Search Heuristic and Implementation

By the inherent indeterminism of both calculi, search is required for computing E- and S-deductions: An E-clause $E$ to be solved defines an infinite E-search tree $T_E^{\langle E, \varepsilon\rangle}$ whose nodes are labeled with elements from $\mathcal{L}_E$. The root node of $T_E^{\langle E, \varepsilon\rangle}$ is labeled with $\langle E, \varepsilon\rangle$, and $\langle E'', \sigma''\rangle$ is a successor of node $\langle E', \sigma'\rangle$ iff $\langle E'', \sigma''\rangle$ originates from $\langle E', \sigma'\rangle$ by applying some E-inference rule. The leaves of $T_E^{\langle E, \varepsilon\rangle}$ are given by the E-success and the E-failure nodes: $\langle E', \sigma'\rangle$ is an E-success node iff $E' \in \mathbb{L}_S \setminus \mathbb{L}_\bot$, and $\langle E', \sigma'\rangle$ is an E-failure node iff $E' \in \mathbb{L}_\bot$. A path from the root node to an E-success node is called an E-solution path. All these notions carry over to S-search trees $T_S^{\langle E, \sigma\rangle}$ by replacing E with S literally, except that an S-node labeled with $\langle E', \sigma'\rangle$ is an S-success node iff $E' = \emptyset$, and $\langle E', \sigma'\rangle$ is an S-failure node iff $E' \neq \emptyset$ and no S-inference rule applies to $\langle E', \sigma'\rangle$.⁷

An E-clause $E$ to be solved defines an infinite $S \circ E$-search tree $T_{S \circ E}^{E}$, which originates from $T_E^{\langle E, \varepsilon\rangle}$ by replacing each E-success node $\langle E', \sigma'\rangle$ with the S-search tree $T_S^{\langle E', \sigma'\rangle}$. An $S \circ E$-solution path in $T_{S \circ E}^{E}$ is an E-solution path $p$ followed by an S-solution path that starts at the E-success node of path $p$.

To disprove a conjecture $\varphi = \forall x_1{:}\tau_1, \ldots, x_n{:}\tau_n\ b$, the $S \circ E$-search tree $T_{S \circ E}^{\{b \doteq false\}}$ is explored to find an $S \circ E$-solution path. In order to guarantee termination of the search, only a finite part of $T_{S \circ E}^{\{b \doteq false\}}$ may be explored. Additional side conditions for the E- and S-inference rules ensure that one rule does not undo another rule's work. Rules that do not require a choice, e.g. rule (5), are preferred to those that need some choice, e.g. rules (6) and (16).

The most significant restriction in exploring $T_{S \circ E}^{\{b \doteq false\}}$ (supporting termination at the cost of completeness) comes from an additional side condition (**) of rule (3), called the paramodulation rule in [8]: Definition triples $d := (C, \overline{C}, r) \in D_f$ with $f \notin \Sigma(r)$, called non-recursive definition triples, can be applied as often as possible. However, if $f \in \Sigma(r)$, we need to limit the usage of $d$. Side condition (**) demands that recursive definition triples be used at most once on each side of a literal in each branch of $T_E^{\langle \{b \doteq false\}, \varepsilon\rangle}$. This leads to a fast disprover that works well on simple examples, cf. Sect. 3. We call this restriction simple paramodulation.

⁵ We use $\gtrdot$ and $\not\gtrdot$ in $\mathbb{L}_E$ only to handle calls of the predefined procedure > more efficiently by a constraint solver.

⁶ In our setting there is no need to assign priorities to constraints, so we can simplify the algorithm by treating all constraints as "required" constraints.

⁷ $E' \in \mathbb{L}_\bot$ is sufficient but not necessary for $\langle E', \sigma'\rangle$ being an S-failure node, as the constraint solver called by rule (19) might fail on some S-clause $E' \in \mathbb{L}_S \setminus \mathbb{L}_\bot$.

To increase the deductive performance of the disprover, we can allow more applications of a recursive definition triple $d$ by considering the "context" of $f$-procedure calls. Each procedure call $f(\ldots)$ in the original formula $\varphi$ is labeled with $(N, \ldots, N) \in \mathbb{N}^k$ for a constant $N \in \mathbb{N}$, e.g. $N = 2$, and $k = |D_f|$. Context-sensitive paramodulation modifies side condition (**) in the following way: A recursive definition triple $d_i := (C_i, \overline{C}_i, r_i) \in D_f$ may only be used if procedure call $f(\ldots)$ is labeled with $(n_1, \ldots, n_k)$ such that $n_i > 0$. The recursive calls of $f$ in $r_i$ are labeled with $(n_1, \ldots, n_{i-1}, n_i - 1, n_{i+1}, \ldots, n_k)$, and the other procedure calls in $r_i$ are labeled with $(N, \ldots, N)$. Context-sensitive paramodulation still allows only finitely many applications of rule (3), as the labels decrease with each rule application. Section 3 gives examples that illustrate the difference between these alternatives in practice. Note that simple paramodulation is not a special case of context-sensitive paramodulation (by setting $N = 1$), because it does not distinguish between different occurrences of a procedure as context-sensitive paramodulation does.
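The label bookkeeping of context-sensitive paramodulation can be sketched in a few lines of Python. The function names `initial_label`, `use_triple`, and `chain_length` are hypothetical, and triples are indexed from 0 here rather than from 1 as in the text. The sketch only tracks labels; it does not perform any unfolding.

```python
def initial_label(n, k):
    """Label (N, ..., N) of length k = |D_f| for a call in the original formula."""
    return (n,) * k

def use_triple(label, i):
    """Label of the recursive calls in r_i after using triple i (0-based),
    or None if the side condition n_i > 0 forbids using the triple."""
    if label[i] <= 0:
        return None
    return label[:i] + (label[i] - 1,) + label[i + 1:]

def chain_length(label, i):
    """How often triple i can be used along one chain of recursive calls."""
    uses = 0
    while (label := use_triple(label, i)) is not None:
        uses += 1
    return uses

# A procedure with two triples whose second triple is the recursive one,
# e.g. rev from Fig. 1, with the constant N = 2 mentioned in the text.
start = initial_label(2, 2)
after_one = use_triple(start, 1)
```

With $N = 2$ the recursive triple can be used twice along a chain of nested recursive calls. Every use strictly decreases one label component, which is why only finitely many applications of rule (3) are possible.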

For efficiency reasons (wrt. memory consumption), we explore $T_E^{\langle \{b \doteq false\}, \varepsilon\rangle}$ with a depth-first strategy, whereas $T_S^{\langle E', \sigma'\rangle}$ is examined breadth-first to avoid infinite applications of rule (16). Two technical optimizations considerably speed up the search for an $S \circ E$-solution path. Firstly, caching allows pruning a branch that has already been considered in another derivation. The cache hit rates are about 20 %. Secondly, while exploring $T_E^{\langle \{b \doteq false\}, \varepsilon\rangle}$ we can already start a subsidiary S-search on S-literals from a clause $E \notin \mathbb{L}_\bot$ (even though node $\langle E, \sigma\rangle$ of $T_E^{\langle \{b \doteq false\}, \varepsilon\rangle}$ is not an E-success node) and feed the results back to the E-search node $\langle E, \sigma\rangle$. For instance, if we derive $x \doteq {+}(y)$ from $E$, we can discard E-branches that consider the case $x \doteq 0$. In conjunction with simple paramodulation, this (empirically) leads to early failure on unsolvable examples.
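The first optimization, depth-first exploration with a cache of already visited nodes, might look as follows in Python. This is a generic sketch, not the Java implementation: the callbacks `successors`, `is_success`, and `is_failure` are hypothetical stand-ins for rule application and for the tests on clauses, and nodes are assumed to be hashable (for instance frozen sets of literals paired with a substitution).

```python
def depth_first_search(root, successors, is_success, is_failure):
    """Depth-first exploration that prunes nodes already seen on another branch."""
    cache = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in cache:
            continue              # cache hit: branch was considered before
        cache.add(node)
        if is_failure(node):
            continue              # failure leaf: backtrack
        if is_success(node):
            return node
        # Push in reverse so that the first successor is explored first.
        stack.extend(reversed(successors(node)))
    return None

# Toy search space on integers, used only to exercise the function.
def toy_successors(n):
    return [n + 1, 2 * n] if n < 20 else []

found = depth_first_search(1, toy_successors, lambda n: n == 13, lambda n: n > 20)
missing = depth_first_search(1, toy_successors, lambda n: n == 0, lambda n: n > 20)
```

The cache trades memory for time, so how much is cached is a design choice that has to be balanced against the memory argument for searching depth-first in the first place.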

3 Using the Disprover

In this section we illustrate the use and the performance of our disprover when

it is employed as a subsystem of XeriFun [12]. Unless otherwise stated, we use

simple paramodulation. We distinguish between conjectures provided by the user

and conjectures speculated by the system.

3.1 User-Provided Conjectures

Before trying to verify a program statement, it is advisable to make sure that it does not contain lapses that render it false. E.g., in arithmetic we are often interested in cancelation lemmas such as $x^y = x^z \to y = z$. However, the disprover finds the witness {x/0, y/0, z/1} falsifying the conjecture. Excluding x = 0 does not help, as now the witness {x/1, y/0, z/1} is quickly computed. But excluding x = 1 as well causes the disprover to fail, hence we expect that verification of $\forall x, y, z : N\ \ x \neq 0 \wedge {-}(x) \neq 0 \wedge x^y = x^z \to y = z$ will succeed. If we conjecture the associativity of exponentiation, $(x^y)^z = x^{(y^z)}$, the disprover finds the witness {x/2, y/0, z/0}. For the injectivity conjecture of the factorial function, i.e. $\forall x, y : N\ \ x! = y! \to x = y$, the disprover comes up with the witness {x/1, y/0} and fails if we demand $x \neq 0 \wedge y \neq 0$ in addition. For ∀k, l : list[@A] k <> l = l <> k, the solution {k/0 :: ε, l/1 :: ε} is computed.

All conjectures from above are disproved within less than a second.⁸ One might argue that these disproofs are quite simple, so they should be easy to find. XeriFun's old disprover [1] basically substitutes the variables with values (or value templates like n :: k for lists) of a limited size and uses a heuristic search strategy to track down a counter-example quickly if one exists. However, such a strategy does not lead to early failure on true conjectures: The old disprover fails after 46 s on the conjecture that procedure perm (deciding whether two lists are a permutation of each other) computes a symmetric relation.⁹ The new disprover fails after just a second.

The disprover also helps to ﬁnd simple ﬂaws in the deﬁnition of lemmas

or procedures. For instance, it disproves lemma “rev <>” (cf. Fig. 1), yielding

{k/0 :: ε, l/1 :: ε}. Also, the termination hypothesis for <> is disproved at once

if one inadvertently writes tl(l) in the recursive call of <> (instead of tl(k)).

Similar errors are the use of ≥instead of >in program conjectures and procedure

deﬁnitions.

To illustrate the consequences of simple paramodulation, consider formula ∀k : list[@A] rev(k) = k. As the smallest solution is {k/0 :: 1 :: ε}, we need to open rev twice. Thus the disprover fails to find this witness with simple paramodulation, but succeeds with context-sensitive paramodulation. The same effect is observed with lemma "rev <>" or with $\forall x : N\ \ 2^x > x^2$. However, as most conjectures do not need extensive search, we prefer to save time and offer this alternative only as an option to the user who is willing to spend more time on the search for a disproof.
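Several of the witnesses reported in this section can be checked directly by evaluating the lemma body under the substitution. The Python sketch below does this for four of the conjectures, using Python integers and lists in place of the L data types N and list[N]; the dictionary `conjectures` and the helper `is_witness` are illustrative names, and an implication a → b is encoded as `(not a) or b`.

```python
from math import factorial

# Each entry: (lemma body as a Boolean function, witness reported in the text).
conjectures = {
    "assoc_exp": (lambda x, y, z: (x ** y) ** z == x ** (y ** z),
                  {"x": 2, "y": 0, "z": 0}),
    "inj_factorial": (lambda x, y: factorial(x) != factorial(y) or x == y,
                      {"x": 1, "y": 0}),
    "comm_append": (lambda k, l: k + l == l + k,
                    {"k": [0], "l": [1]}),
    "rev_identity": (lambda k: k[::-1] == k,
                     {"k": [0, 1]}),
}

def is_witness(body, sigma):
    """A substitution is a witness iff the lemma body evaluates to false."""
    return not body(**sigma)

checked = {name: is_witness(body, sigma)
           for name, (body, sigma) in conjectures.items()}
```

Checking a given witness is cheap; the effort of a disprover goes into finding one, and into giving up early when none exists.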

3.2 Conjectures Speculated by the System

When generalizing statements by machine, a disprover is needed to detect over-generalizations. E.g., XeriFun's generalization heuristic [1] tries to generalize φ = ∀k, l : list[@A] half(|k <> l|) = half(|l <> k|) to φ′ = ∀k, l : list[@A] |k <> l| = |l <> k| and then φ′ to φ″ = ∀k, l : list[@A] k <> l = l <> k. Our disprover quickly fails on φ′ and succeeds on φ″ (see above), hence generalization φ′ is a good candidate for a proof by induction, whereas φ″ is recognized as an over-generalization of φ.

⁸ All timing details refer to our single-threaded Java implementation on a 3.2 GHz hyper-threading CPU, where the Java VM was assigned 300 MB of main memory.

⁹ The old disprover examined the conjecture for lists of length ≤ 2 and natural numbers between 0 and 2.

Another example of such a generate-and-test cycle is recursion elimination:

For user-deﬁned procedures, XeriFun synthesizes so-called diﬀerence and do-

main procedures which represent information that is useful for automated anal-

ysis of termination [13, 17] and for proving absence of “exceptions” [18] (caused

by division by 0, for example). Both kinds of procedures may contain unneces-

sary recursive calls, which complicate subsequent proofs. Therefore the system

generates recursion elimination formulas [13] justifying a sound replacement of

some recursive calls with truth values. For those formulas that the system could

not prove, the user has to decide whether to support the system either by inter-

actively constructing a proof or by giving a witness to disprove the conjecture.

He can also ignore the often unreadable conjectures (which most users do), not

being aware that missing a true recursion elimination formula means much more

work in subsequent proofs.

For example, for the domain procedure of a tautology checker (cf. procedure 0

in [14]), XeriFun generates 62 recursion elimination formulas. Our disprover

falsiﬁes all of them within 33 s. Without a disprover, we wasted four times longer

on futile proof attempts from which we cannot conclude anything. With the old

disprover, it took more than ﬁve times longer to disprove 59 formulas; it failed

on the others. For other domain or diﬀerence procedures, the disprover performs

equally well, so in the vast majority of cases the user does not need to worry

about recursion elimination any more. This is a tremendous improvement in

user-friendliness.

4 Related Work

The problem of automatically disproving statements in the context of program

veriﬁcation has been tackled in various research projects.

Protzen [8] describes a calculus to disprove universal conjectures in the INKA

system [4]. While it apparently performs quite well on false system-generated

conjectures, it has a rather poor performance on true ones; if the input conjecture

is true, it searches until it reaches an explicit limit of the search depth.

A disprover for KIV is presented in [9]. The existing proof calculus is modiﬁed

so that it is able to construct disproofs. This interleaves the incremental instan-

tiation of variables and simplifying proof steps. For solvable cases “good results”

are reported, whereas performance on unsolvable problems is not communicated.

Ahrendt has developed a complete disprover for free data type speciﬁca-

tions [2]. Since the interpretation of function symbols is left open in this loose

semantics approach, one needs to consider all models satisfying the axioms when

proving the non-consequence of a conjecture φ. Similarly, the Alloy modeling

system [7] can investigate properties of under-speciﬁed models. The correspond-

ing constraint analyzer checks only models with a bounded number of elements

in each primitive type, so (like our disprover) it is incomplete. Diﬀerently from

these approaches, we consider only a ﬁxed interpretation of function symbols

(given by the interpreter evalP) in our setting.

Isabelle supports a “quickcheck” command [3] to test a conjecture by substi-

tuting random values for the variables several times. A comparison of the success

rates and the performance of this approach with our results is planned as future

work.

Coral [11] is a system designed to ﬁnd non-trivial ﬂaws in security protocols.

It is based on the Comon-Nieuwenhuis method for proof by consistency and uses

a parallel architecture for consistency checking and so-called induction derivation

to ensure termination. Finding an attack on a protocol may take several hours

with Coral.

5 Conclusion

In the design of our disprover we tried to minimize the time wasted on true

conjectures. We achieved this by limiting the application of the paramodulation

rule. Apart from this, we do not need any explicit depth limits. In particular,

there is no explicit limit on the size of a witness. We also reduce the cost of

simplifications by restricting them to selector and constructor calls. By incorporating a constraint solver [5] for inequalities on the predefined data type N for natural numbers, we further improved the performance.

We identiﬁed several applications of our disprover that considerably improve

the productivity when working with the XeriFun system. The main application

of our disprover is bulk processing (such as recursion elimination) or automatic

generalization. While it is possible to approximate completeness arbitrarily well

to ﬁnd deeper ﬂaws in a program (conjecture), this would tremendously increase

the time wasted on true conjectures. The advantage of our disprover is that it

is successful in most solvable cases and quickly gives up in unsolvable cases, as

practical experiments reveal.

In future work, we intend to investigate further heuristics for the paramodu-

lation rule, which primarily controls the power of the disprover. We also intend

to examine whether the use of veriﬁed lemmas supports the disproving pro-

cess. Finally, it would be interesting to look at combinations of various disprov-

ing strategies. When we are aware of the strengths and weaknesses of diﬀerent

strategies, we could possibly decide beforehand which one is most suitable for a

speciﬁc problem.

References

1. Markus Aderhold. Formula generalization in XeriFun. Diploma thesis, Technische

Universität Darmstadt, 2004.

2. Wolfgang Ahrendt. Deductive search for errors in free data type speciﬁcations

using model generation. In A. Voronkov, editor, Proc. of the 18th International

Conference on Automated Deduction, volume 2392 of LNCS. Springer, 2002.

3. Stefan Berghofer and Tobias Nipkow. Random testing in Isabelle/HOL. In Software

Engineering and Formal Methods, pages 230–239. IEEE Computer Society, 2004.

4. Susanne Biundo, Birgit Hummel, Dieter Hutter, and Christoph Walther. The

Karlsruhe induction theorem proving system. In J. Siekmann, editor, Proc. of

CADE-8, volume 230 of LNCS, pages 672–674. Springer, 1986.

5. Alan Borning, Richard Anderson, and Bjorn N. Freeman-Benson. Indigo: A local

propagation algorithm for inequality constraints. In ACM Symposium on User

Interface Software and Technology, pages 129–136, 1996.

6. Hubert Comon and Pierre Lescanne. Equational problems and disuniﬁcation. Jour-

nal of Symbolic Computation, 7:371–425, 1989.

7. Daniel Jackson. Alloy: A lightweight object modelling notation. ACM Transactions

on Software Engineering and Methodology, 11(2):256–290, 2002.

8. Martin Protzen. Disproving conjectures. In D. Kapur, editor, Proc. of CADE-11,

volume 607 of LNAI, pages 340–354. Springer, 1992.

9. Wolfgang Reif, Gerhard Schellhorn, and Andreas Thums. Flaw detection in formal

speciﬁcations. In Proc. of IJCAR-1, pages 642–657. Springer, 2001.

10. Stephan Schweitzer. Symbolische Auswertung und Heuristiken zur Veriﬁkation

funktionaler Programme. Doctoral dissertation, TU Darmstadt, to appear 2006.

11. Graham Steel and Alan Bundy. Attacking group protocols by refuting incorrect

inductive conjectures. Journal of Automated Reasoning, pages 1–28, 2005.

12. Daniel Szallies. Ein Werkzeug zur automatischen Widerlegung von Aussagen in

XeriFun. Diplomarbeit, Technische Universität Darmstadt, 2006.

13. Christoph Walther. On proving the termination of algorithms by machine. Artiﬁ-

cial Intelligence, 71(1):101–157, 1994.

14. Christoph Walther, Markus Aderhold, and Andreas Schlosser. The L1.0 Primer.

Technical Report VFR 06/01, Technische Universität Darmstadt, 2006.

15. Christoph Walther and Stephan Schweitzer. About XeriFun. In F. Baader, editor,

Proc. of CADE-19, volume 2741 of LNCS, pages 322–327. Springer, 2003.

16. Christoph Walther and Stephan Schweitzer. Veriﬁcation in the classroom. Journal

of Automated Reasoning, 32(1):35–73, 2004.

17. Christoph Walther and Stephan Schweitzer. Automated termination analysis for

incompletely deﬁned programs. In F. Baader and A. Voronkov, editors, Proc. of

LPAR-11, volume 3452 of LNAI, pages 332–346. Springer, 2005.

18. Christoph Walther and Stephan Schweitzer. Reasoning about incompletely deﬁned

programs. In G. Sutcliﬀe and A. Voronkov, editors, Proc. of LPAR-12, volume 3835

of LNAI, pages 427–442. Springer, 2005.
