The 2006 Federated Logic Conference
The Seattle Sheraton Hotel and Towers
Seattle, Washington
August 10 - 22, 2006
IJCAR’06 Workshop
DISPROVING’06:
Non-Theorems, Non-Validity, Non-Provability
August 16th, 2006
Proceedings
Editors:
W. Ahrendt, P. Baumgartner, H. de Nivelle
A Fast Disprover for XeriFun
Markus Aderhold, Christoph Walther, Daniel Szallies, and Andreas Schlosser
Fachgebiet Programmiermethodik, Technische Universität Darmstadt, Germany
{aderhold, chr.walther, szallies, schlosser}@pm.tu-darmstadt.de
Abstract. We present a disprover for universal formulas stating conjectures
about functional programs. The quantified variables range over freely
generated polymorphic data types, thus the domains of discourse are
infinite in general. The objective in the development was to quickly find
counter-examples for as many false conjectures as possible without wasting
too much time on true statements. We present the reasoning method
underlying the disprover and illustrate its practical value in several
applications in an experimental version of the verification tool XeriFun.
1 Introduction
As a common experience, specifications are faulty and programs do not meet
their intention. Program bugs range from easily detected simple lapses (such
as not excluding division by zero or typos when setting array bounds) to deep
logical flaws in the program design which emerge elsewhere in the program and
therefore are hard to discover.
But programmers’ faults are not the only source of bugs. State-of-the-art
verifiers synthesize conjectures about a program that are needed (or are at least
useful) in the course of verification: Statements may be generalized to be qualified
for a proof by induction, the verifier might generate termination hypotheses that
ensure a procedure’s termination, or it might synthesize conjectures justifying
an optimization of a procedure. Sometimes these conjectures can be faulty, i.e.
over-generalizations might result, the verifier comes up with a wrong idea for
termination, or an optimization simply does not apply.
Verifying that a program meets its specification is a waste of time in all these
cases, and therefore one should begin with testing the program beforehand. However,
as testing is a time-consuming and boring task, machine support is welcome
(not to say needed) to relieve the human from the test-and-verify cycle. Program
testing can be reformulated as a verification problem: A program conjecture φ
fails the test if the negation of φ can be verified. However, for proving these
negated conjectures, a special verifier (called a disprover) is needed.
In this paper, we present such a disprover for statements about programs
written in the functional programming language L [14], which has been
integrated into an experimental version [12] of the interactive verification tool
XeriFun [15, 16]. The procedures of L-programs operate over freely generated
polymorphic data types and are defined by using recursion, case analyses,
let-expressions, and functional composition. The data types bool for Boolean values
and N for natural numbers as well as equality $= \;:\; @A \times @A \to \mathit{bool}$ and
a procedure $> \;:\; N \times N \to \mathit{bool}$ deciding the >-relation on $\mathbb{N}$ are predefined
in L. Figure 1 shows an example of an L-program that defines a polymorphic
data type list[@A], list concatenation <>, and list reversal rev. In this program,
the symbols true, false are constructors of type bool, −(...) is the selector of
the N-constructor +(...), and hd and tl are the selectors of constructor :: for
lists. Subsequently, we let $\Sigma(P)$ denote the signature of all function symbols
defined by an L-program $P$, and $\Sigma(P)^c$ is the signature of all constructor function
symbols in $P$. An operational semantics for L-programs $P$ is defined by an
interpreter $\mathit{eval}_P : \mathcal{T}(\Sigma(P)) \mapsto \mathcal{T}(\Sigma(P)^c)$ which maps ground terms to constructor
ground terms of the respective monomorphic data types using the definition of
the procedures and data types in $P$, cf. [10, 14, 18].

  structure bool <= true, false
  structure N <= 0, +(− : N)
  structure list[@A] <= ε, [infix] ::(hd : @A, tl : list[@A])
  function [outfix] |(k : list[@A]) : N <=
    if k = ε then 0 else +(|tl(k)|)
  function [infix] <>(k, l : list[@A]) : list[@A] <=
    if k = ε then l else hd(k) :: (tl(k) <> l) end
  function rev(k : list[@A]) : list[@A] <=
    if k = ε then ε else rev(tl(k)) <> hd(k) :: ε end
  lemma rev <> <=
    ∀k, l : list[@A] rev(k <> l) = rev(k) <> rev(l)
Fig. 1. A simple L-program
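For readers who want to experiment with the program of Fig. 1 outside of L, the following is a minimal Python sketch of the procedures <> and rev and of the body of lemma "rev <>". It is only an illustration: Python lists stand in for list[@A], and the function names `append`, `rev` and `lemma_rev_append` are chosen here and are not part of L or XeriFun.

```python
def append(k, l):
    """List concatenation, mirroring procedure <> of Fig. 1."""
    if not k:                      # k = ε
        return l
    return [k[0]] + append(k[1:], l)   # hd(k) :: (tl(k) <> l)


def rev(k):
    """List reversal, mirroring procedure rev of Fig. 1."""
    if not k:                      # k = ε
        return []
    return append(rev(k[1:]), [k[0]])  # rev(tl(k)) <> hd(k) :: ε


def lemma_rev_append(k, l):
    """Body of lemma "rev <>" exactly as stated in Fig. 1."""
    return rev(append(k, l)) == append(rev(k), rev(l))


if __name__ == "__main__":
    print(lemma_rev_append([], [1, 2]))   # holds for this instance
    print(lemma_rev_append([0], [1]))     # fails for this instance
```

The second call evaluates the lemma body to false, so the lemma as written in Fig. 1 is a conjecture that a disprover is expected to refute.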
In L, statements about programs are given by expressions of the form lemma
name <= $\forall x_1{:}\tau_1, \ldots, x_n{:}\tau_n\; b$ (cf. Fig. 1), where $b$ (called the body of the
lemma) is a Boolean term built with the variables $x_i$ (of type $\tau_i$) from a set $\mathcal{V}$
of typed variables and the function symbols in $\Sigma(P)$, where case analyses (like
in procedure definitions) and the truth values are used to represent connectives.
Hence the proof obligations we are concerned with are universal
formulas of the general form $\varphi = \forall x_1{:}\tau_1, \ldots, x_n{:}\tau_n\; b$. Disproving such a formula $\varphi$ is equivalent
to proving its negation $\neg\varphi \approx \exists x_1{:}\tau_1, \ldots, x_n{:}\tau_n\; \neg b$, and as the domain of each
type $\tau_i$ can be enumerated, disproving $\varphi$ is a semi-decidable problem. A disproof
of $\varphi$ (also called a witness of $\neg\varphi$) can be represented by a constructor ground
substitution $\sigma$ such that $\mathit{eval}_P(\sigma(b)) = \mathit{false}$. Consequently, disproving $\varphi$ can be
viewed as solving the (semi-decidable) equational problem $b \doteq \mathit{false}$.
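The semi-decidability argument can be made concrete with a naive enumeration. The sketch below is not the method of this paper (which avoids blind enumeration); it only illustrates why a witness is found whenever one exists. The names `stage` and `disprove`, and the restriction to lists of naturals, are assumptions made for this illustration.

```python
from itertools import count, product


def stage(n):
    """All lists of naturals with length <= n and elements <= n (a finite set)."""
    return [list(t)
            for size in range(n + 1)
            for t in product(range(n + 1), repeat=size)]


def disprove(body, arity, max_stage=None):
    """Search for a witness: a tuple of lists on which `body` is False.

    Without `max_stage` this is only a semi-decision procedure: it returns
    if a witness exists and runs forever on a true conjecture.
    """
    stages = count() if max_stage is None else range(max_stage + 1)
    for n in stages:
        for args in product(stage(n), repeat=arity):
            if not body(*args):
                return args
    return None  # gave up: no witness within the explored stages


# Conjecture: for all k, l : list[N], k <> l = l <> k (false).
witness = disprove(lambda k, l: k + l == l + k, arity=2)
print(witness)
```

Every pair of finite lists of naturals occurs in some stage, so a false conjecture is eventually refuted, while a true one can only be abandoned by imposing a bound such as `max_stage`.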
To solve such an equation, we develop two disproving calculi that consti-
tute the two phases of our disprover. The inference rules of both calculi are
inspired by the calculus proposed in [6] by Comon and Lescanne for solving
equational problems. As disproving is semi-decidable, a complete disprover can
be developed. However, as truth of universal formulas φ is not semi-decidable
by Gödel's first incompleteness theorem, disproving φ is undecidable. Therefore
a complete (and sound) disprover need not terminate. But the use in an
interactive environment (such as the one XeriFun provides) requires termination of
all subsystems, hence completeness must be sacrificed in favor of termination.
The use in an interactive environment also demands runtime performance, so
particular care is taken to achieve early failure on non-disprovable conjectures.
In Section 2, we explain how we disprove universal formulas φ. In Section 3,
we demonstrate the practical use of our disprover when XeriFun employs it in
different disproving applications. We compare our proposal with related work in
Section 4 and conclude with an outlook on future work in Section 5.
2 Disproving Universal Formulas
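Our disproving method proceeds in two phases. The first phase is based on
the elimination calculus (E-calculus for short). Its language is given by
$\mathcal{L}_E := \{\langle E, \sigma\rangle \in L_E \times \mathit{Sub} \mid \mathcal{V}(E) \cap \mathrm{dom}(\sigma) = \emptyset\}$: $L_E$ is the set of clauses in which
atoms are built with terms from $\mathcal{T}(\Sigma(P), \mathcal{V})$ and the predicate symbols $\doteq$ of
type $@A \times @A$ and $\gtrdot$ of type $N \times N$; negative literals are written $t_1 \not\doteq t_2$ or $t_1 \not\gtrdot t_2$,
respectively. $\mathit{Sub}$ denotes the set of all constructor ground substitutions $\sigma$, i.e.
$\sigma(v) \in \mathcal{T}(\Sigma(P)^c)$ for each $v \in \mathrm{dom}(\sigma)$. The inference rules of the E-calculus
(defined below) are of the form
\[ \frac{\langle E, \sigma\rangle}{\langle \lambda(E'), \sigma \circ \lambda\rangle}, \quad \text{if } \mathit{cond}, \]
where $\mathit{cond}$ stands for a side condition that has to be satisfied to apply the rule,
and $\lambda \in \mathit{Sub}$. An E-deduction is a sequence
$\langle E_1, \sigma_1\rangle, \ldots, \langle E_n, \sigma_n\rangle$ such that for each $i$, $\langle E_{i+1}, \sigma_{i+1}\rangle$
originates from $\langle E_i, \sigma_i\rangle$ by applying an E-inference rule, and $\langle E_1, \sigma_1\rangle \vdash_E \langle E_n, \sigma_n\rangle$
denotes the existence of such an E-deduction.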
The second phase of our disproving method uses the solution calculus
(S-calculus for short). It operates on
$\mathcal{L}_S := \{\langle E, \sigma\rangle \in L_S \times \mathit{Sub} \mid \mathcal{V}(E) \cap \mathrm{dom}(\sigma) = \emptyset\}$: $L_S \subset L_E$ is the set of clauses in which atoms are formed with predicate
symbols $\doteq$, $\gtrdot$ and terms from $\mathcal{T}(\Sigma(P'), \mathcal{V})$, where $\Sigma(P')$ emerges from $\Sigma(P)$
by removing the function symbols if and = as well as all procedure function
symbols. The form of the S-inference rules (defined below) and deduction $\vdash_S$
are defined identically to the E-calculus, and $\langle E, \sigma\rangle \vdash_{S \circ E} \langle E'', \sigma''\rangle$ denotes the
existence of a composed deduction $\langle E, \sigma\rangle \vdash_E \langle E', \sigma'\rangle \vdash_S \langle E'', \sigma''\rangle$.
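A substitution $\sigma$ is an E-substitution for a clause $E \in L_E$, $\sigma \in \mathit{Sub}_E$ for short,
iff $\sigma(v) \in \mathcal{T}(\Sigma(P)^c)$ for each $v \in \mathcal{V}(E)$. We write $\sigma \models l$ if an $\{l\}$-substitution $\sigma$
solves an E-literal $l$, defined by $\sigma \models t_1 \doteq t_2$ iff $\mathit{eval}_P(\sigma(t_1)) = \mathit{eval}_P(\sigma(t_2))$ and
$\sigma \models t_1 \gtrdot t_2$ iff $\mathit{eval}_P(\sigma(t_1)) > \mathit{eval}_P(\sigma(t_2))$. An E-clause $E$ is solved by $\sigma \in \mathit{Sub}_E$,
$\sigma \models E$ for short, iff $\sigma \models l$ for each $l \in E$. Both calculi are sound in the sense that
$\langle E, \sigma\rangle \vdash_{\ldots} \langle E', \sigma'\rangle$ entails $\theta \models \sigma'(E)$ for each $\theta \in \mathit{Sub}_{E'}$ with $\theta \models E'$, and $\sigma \subseteq \sigma'$.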
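To disprove a conjecture $\varphi = \forall x_1{:}\tau_1, \ldots, x_n{:}\tau_n\; b$, we search for a deduction
$\langle \{b \doteq \mathit{false}\}, \varepsilon\rangle \vdash_{S \circ E} \langle \emptyset, \sigma\rangle$.¹ Hence $\sigma$ represents a disproof of $\varphi$, as $\sigma \models b \doteq \mathit{false}$.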
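¹ Since the domain of each data type is at most countably infinite, we actually use
monomorphic types $\tau'_i$ instead of the polymorphic types $\tau_i$ in $\varphi$ without loss of
generality. Type $\tau'_i$ originates from type $\tau_i$ by instantiating each type variable in $\tau_i$ with
type N. E.g., to disprove ∀k, l : list[@A] k <> l = l <> k, the monomorphic instance
∀k, l : list[N] k <> l = l <> k is considered.

(1)
\[ \frac{\langle E \uplus \{l\}, \sigma\rangle}
        {\langle E \cup \{l[\pi \leftarrow t_2],\; t_1 \doteq \mathit{true}\}, \sigma\rangle \;\mid\; \langle E \cup \{l[\pi \leftarrow t_3],\; t_1 \doteq \mathit{false}\}, \sigma\rangle},
   \quad \text{if } l|_\pi = \mathit{if}(t_1, t_2, t_3) \]
(2)
\[ \frac{\langle E \uplus \{l\}, \sigma\rangle}
        {\langle E \cup \{l[\pi \leftarrow w],\; w \doteq f(\ldots)\}, \sigma\rangle},
   \quad \text{if } f \in \Sigma_{proc},\; l|_\pi = f(\ldots), \text{ and } \pi \notin \{1, 2\} \]
(3)
\[ \frac{\langle E \uplus \{l\}, \sigma\rangle}
        {\langle E \cup \{l[\pi \leftarrow \rho(r)]\} \cup E_{cond}, \sigma\rangle},
   \quad \text{if } l|_\pi = f(t_1, \ldots, t_n) \text{ for some } \pi \in \{1, 2\}
   \text{ and } \langle C, \bar{C}, r\rangle \in D_f \text{ for } f \in \Sigma_{proc} \]
    where $E_{cond} = \bigcup_{c \in C} \{\rho(c) \doteq \mathit{true}\} \cup \bigcup_{c \in \bar{C}} \{\rho(c) \doteq \mathit{false}\}$ and
    $\rho := \{x_1/t_1, \ldots, x_n/t_n\}$ for the formal parameters $x_i$ of $f$
(5)
\[ \frac{\langle E \uplus \{v \doteq \mathit{cons}(t_1, \ldots, t_n),\; v \doteq \mathit{cons}(t'_1, \ldots, t'_n)\}, \sigma\rangle}
        {\langle E \cup \{v \doteq \mathit{cons}(w_1, \ldots, w_n)\} \cup \bigcup_{j=1}^{n} \{w_j \doteq t_j,\; w_j \doteq t'_j\}, \sigma\rangle} \]
(6)
\[ \frac{\langle E \uplus \{v \doteq \mathit{cons}(t_1, \ldots, t_n),\; v \not\doteq v',\; v' \doteq \mathit{cons}(t'_1, \ldots, t'_n)\}, \sigma\rangle}
        {\langle E \cup \{v \doteq \mathit{cons}(w_1, \ldots, w_n),\; v' \doteq \mathit{cons}(w'_1, \ldots, w'_n)\}
          \cup \{w_i \not\doteq w'_i\} \cup \bigcup_{j=1}^{n} \{w_j \doteq t_j,\; w'_j \doteq t'_j\}, \sigma\rangle},
   \quad \text{if } i \in \{1, \ldots, n\} \]
(7)
\[ \frac{\langle E \uplus \{l\}, \sigma\rangle}
        {\langle E \cup \{l[\pi \leftarrow w_i],\; w \doteq \mathit{cons}(w_1, \ldots, w_n),\; w \doteq t\}, \sigma\rangle},
   \quad \text{if } l|_\pi = \mathit{sel}_i(t) \]²
(8)
\[ \frac{\langle E \uplus \{+(t_1) \bowtie +(t_2)\}, \sigma\rangle}
        {\langle E \cup \{t_1 \bowtie t_2\}, \sigma\rangle},
   \quad \text{if } \bowtie\; \in \{\gtrdot, \not\gtrdot\} \]
Fig. 2. Inference rules of the E-calculus

² Assuming $t \doteq \mathit{cons}(\ldots)$ is sound, as well-typedness is demanded.

The inference rules of both calculi are given in the subsequent paragraphs.
The most important rules are formally defined whereas others (denoted by rule
numbers in italics) are only informally described for the sake of brevity. In order
to reduce the depth of the terms in E- and S-literals, some of the rules introduce
fresh variables (called "auxiliary unknowns" in [6]), which we denote by $w$ and
$w'$. Terms are written as $t$, $t_1$ and $t_2$, and $v$ and $v'$ denote variables.
2.1 Inference rules of the E-calculus
The E-calculus consists of the inference rules (1)–(3) and (5)–(8) of Fig. 2 plus
rules (4) and (9)–(10) described informally. The purpose of the E-inference rules
is to eliminate all occurrences of if, =, and of procedure function symbols so that
some $\langle E, \sigma\rangle \in \mathcal{L}_S$ is obtained by an E-deduction $\langle \{b \doteq \mathit{false}\}, \varepsilon\rangle \vdash_E \langle E, \sigma\rangle$.³ All
rules are supplied with an additional side condition (*) demanding $E \notin L_\bot$ for
each $\langle E, \sigma\rangle$ they apply to, where $L_\bot$ is the set of all E-clauses containing evident
contradictions such as $\{t \not\doteq t, \ldots\}$, $\{0 \gtrdot t, \ldots\}$ or $\{t \doteq 0,\; t \doteq 1, \ldots\}$. This
proviso corresponds to the elimination of trivial disequations and the clash rule
for equations in [6].

³ This elimination is possible whenever $b \doteq \mathit{false}$ is solvable, as each procedure call
needs to be unfolded by rule (3) only finitely many times.

(15)
\[ \frac{\langle E \uplus \{t_1 \bowtie t_2\}, \sigma\rangle}
        {\langle E \cup \{w_1 \doteq t_1,\; w_2 \doteq t_2,\; w_1 \bowtie w_2\}, \sigma\rangle},
   \quad \text{if } t_1, t_2 \notin \mathcal{V} \]
(16)
\[ \frac{\langle E \uplus \{v \not\doteq t\}, \sigma\rangle}
        {\langle E \cup \{v \not\doteq t,\; v \doteq \mathit{cons}'(w_1, \ldots, w_n)\}, \sigma\rangle},
   \quad \text{if } t \in \mathcal{V} \text{ or } t = \mathit{cons}(\ldots) \]
(17)
\[ \frac{\langle E \uplus \{t_1 \not\gtrdot t_2\}, \sigma\rangle}
        {\langle E \cup \{t_1 \doteq t_2\}, \sigma\rangle \;\mid\; \langle E \cup \{t_2 \gtrdot t_1\}, \sigma\rangle},
   \quad \text{if } t_1, t_2 \notin \mathcal{V} \]
(18)
\[ \frac{\langle E \uplus \{t_1 \not\doteq t_2\}, \sigma\rangle}
        {\langle E \cup \{t_1 \gtrdot t_2\}, \sigma\rangle \;\mid\; \langle E \cup \{t_2 \gtrdot t_1\}, \sigma\rangle},
   \quad \text{if } t_1, t_2 \in \mathcal{T}(\Sigma(P'), \mathcal{V})_N \]
Fig. 3. Inference rules of the S-calculus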
Rule (1) eliminates an if-conditional and rule (2) eliminates an inner
procedure call from a literal.⁴ Rule (3) unfolds a call of procedure $f$ that occurs as
a direct argument in a literal. A procedure $f$ is represented here by a set $D_f$ of
triples $\langle C, \bar{C}, r\rangle$ such that $r$ is the if-free result term in the procedure body of $f$
obtained under the conditions $C \cup \{\neg c \mid c \in \bar{C}\}$, where $C$ and $\bar{C}$ consist of if-free
Boolean terms only. E.g., $D_{<>}$ consists of two triples, viz. $d_1 = \langle \{k = \varepsilon\}, \emptyset, l\rangle$
and $d_2 = \langle \emptyset, \{k = \varepsilon\}, \mathit{hd}(k) :: (\mathit{tl}(k) <> l)\rangle$, for procedure <> of Fig. 1.
A further rule (4) translates inequations and equations expressed with
symbols from $\Sigma(P)$ into E-literals; e.g., "$t_1 > t_2 \doteq \mathit{false}$" is translated into "$t_1 \not\gtrdot t_2$".
Rule (6) is like the decomposition rule from [6] for inequations, but restricted
to constructors. Another rule (9), corresponding to the elimination of trivial
equations and clash for inequations in [6], removes trivial literals such as $t \doteq t$
or $0 \not\gtrdot t$ from a clause $E \in L_E$ and supplies arbitrary values in $\lambda$ for variables
that disappear from the clause. Finally, literals are simplified by rule (10), which
replaces subterms of the form $\mathit{sel}_i(\mathit{cons}(t_1, \ldots, t_n))$ with $t_i$. This rule does not
exist in [6] and accounts for data type definitions with selectors.
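The unfolding step of rule (3) can be pictured with a small Python sketch. It is an illustration under assumptions, not the implementation in XeriFun: terms are encoded as nested tuples, variables as strings, the tag "eq" stands for the predicate symbol of E-literals, and the names `D_APPEND`, `substitute` and `unfold` are invented here. The sketch computes only the replacement term ρ(r) and the condition literals E_cond for each definition triple; inserting ρ(r) at occurrence π of the literal is omitted.

```python
# Terms: a variable is a str, an application is a tuple (symbol, arg1, ..., argn).
EPS = ("eps",)  # the empty list ε

# Definition triples <C, C_bar, r> for procedure <> of Fig. 1,
# stated over the formal parameters k and l.
D_APPEND = [
    ([("=", "k", EPS)], [], "l"),
    ([], [("=", "k", EPS)], ("::", ("hd", "k"), ("<>", ("tl", "k"), "l"))),
]
FORMALS = ("k", "l")


def substitute(term, rho):
    """Apply substitution rho (dict: variable name -> term) simultaneously."""
    if isinstance(term, str):
        return rho.get(term, term)
    return (term[0],) + tuple(substitute(arg, rho) for arg in term[1:])


def unfold(call, triples, formals):
    """Unfold one call f(t1, ..., tn): one successor per definition triple.

    Each successor is a pair (rho(r), E_cond), where E_cond contains
    rho(c) "eq" true for c in C and rho(c) "eq" false for c in C_bar.
    """
    rho = dict(zip(formals, call[1:]))
    successors = []
    for conds, neg_conds, result in triples:
        e_cond = [("eq", substitute(c, rho), ("true",)) for c in conds]
        e_cond += [("eq", substitute(c, rho), ("false",)) for c in neg_conds]
        successors.append((substitute(result, rho), e_cond))
    return successors


for new_term, e_cond in unfold(("<>", "x", "y"), D_APPEND, FORMALS):
    print(new_term, e_cond)
```

For the call x <> y this yields two successors: the term y under the condition x = ε, and hd(x) :: (tl(x) <> y) under its negation, matching the two triples d1 and d2 given above.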
2.2 Inference rules of the S-calculus
The S-calculus consists of the inference rules (5)–(19), to which the additional
side condition (*) applies as well. Rules (5)–(10) are the same as in the E-calculus,
rules (11)–(14) are "structural" rules to merge (in)equations, to replace variables
with terms, and to solve equations $v \doteq t$ with $t \in \mathcal{T}(\Sigma(P)^c)$ by substitutions
$\lambda := \{v/t\}$.
Rules (15)–(18) are given in Fig. 3. Rule (15) removes non-variable
arguments from an S-literal, rule (16) is basically a case analysis on $v$ using some
constructor $\mathit{cons}'$, and rules (17) and (18) eliminate negative literals. Rule (19)
invokes a constraint solver.⁵ We call an S-literal $t_1 \bowtie t_2$ a constraint literal iff
at least one of the $t_i$ is a variable of type N, $\Sigma(t_1) \cup \Sigma(t_2) \subseteq \{0, +, -\}$, and
$\bowtie\; \in \{\doteq, \gtrdot\}$; $\mathcal{C}$ is the set of all constraint literals. When none of the other S-rules
is applicable, rule (19) passes the constraint literals $E \cap \mathcal{C}$ to a modified version
of Indigo [5]: To terminate on cyclic constraints like $\{x \gtrdot y,\; y \gtrdot x\}$, we simply
limit the number of times a constraint can be used by the number of variables
in $E \cap \mathcal{C}$. If $E \cap \mathcal{C}$ can be satisfied, we get a solving assignment $\lambda \in \mathit{Sub}_{E \cap \mathcal{C}}$;
otherwise rule (19) fails.⁶

⁴ For a literal $l = t_1 \bowtie t_2$, $l|_\pi$ is the subterm of $l$ at occurrence $\pi \in \mathit{Occ}(l)$, and $l[\pi \leftarrow t]$
is obtained from $l$ by replacing $l|_\pi$ in $l$ with $t$. We use "|" as a shorthand for the
succedents of different rules with the same premises and side conditions. All rules
are applied "modulo symmetry" of $\doteq$ and $\not\doteq$ if possible (e.g., see rule (5)).
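The termination guard described for rule (19) can be illustrated by a toy propagation loop. The sketch below is not Indigo and not the modified solver used in the disprover; it handles only constraints of the form "x is strictly greater than y" between variables over the naturals, and the function name `solve_strict_greater` is hypothetical. What it shares with the description above is the budget: each constraint may be used at most as many times as there are variables.

```python
def solve_strict_greater(constraints):
    """Try to satisfy constraints (x, y), each meaning x > y over the naturals.

    Lower bounds are raised by local propagation. Each constraint may fire
    at most len(variables) times; exceeding that budget only happens on a
    cycle, which is reported as failure. Returns a dict or None.
    """
    variables = sorted({v for c in constraints for v in c})
    value = {v: 0 for v in variables}
    uses = {c: 0 for c in constraints}
    changed = True
    while changed:
        changed = False
        for x, y in constraints:
            if value[x] <= value[y]:          # constraint currently violated
                if uses[(x, y)] >= len(variables):
                    return None               # budget exhausted: give up
                uses[(x, y)] += 1
                value[x] = value[y] + 1       # smallest repair for x > y
                changed = True
    return value


print(solve_strict_greater([("x", "y"), ("y", "z")]))
print(solve_strict_greater([("x", "y"), ("y", "x")]))
```

In a satisfiable system no value rises above the length of the longest chain, which is below the number of variables, so the budget never rejects a solvable input in this simplified setting; a cyclic system keeps raising values until the budget is hit.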
2.3 Search Heuristic and Implementation
By the inherent indeterminism of both calculi, search is required for computing
E- and S-deductions: An E-clause $E$ to be solved defines an infinite E-search tree
$T_E^{\langle E, \varepsilon\rangle}$ whose nodes are labeled with elements from $\mathcal{L}_E$. The root node of $T_E^{\langle E, \varepsilon\rangle}$
is labeled with $\langle E, \varepsilon\rangle$, and $\langle E'', \sigma''\rangle$ is a successor of node $\langle E', \sigma'\rangle$ iff $\langle E'', \sigma''\rangle$
originates from $\langle E', \sigma'\rangle$ by applying some E-inference rule. The leaves of $T_E^{\langle E, \varepsilon\rangle}$
are given by the E-success and the E-failure nodes: $\langle E', \sigma'\rangle$ is an E-success node
iff $E' \in L_S \setminus L_\bot$, and $\langle E', \sigma'\rangle$ is an E-failure node iff $E' \in L_\bot$. A path from the
root node to an E-success node is called an E-solution path. All these notions
carry over to S-search trees $T_S^{\langle E, \sigma\rangle}$ by replacing E with S literally, except that
an S-node labeled with $\langle E', \sigma'\rangle$ is an S-success node iff $E' = \emptyset$, and $\langle E', \sigma'\rangle$ is
an S-failure node iff $E' \neq \emptyset$ and no S-inference rule applies to $\langle E', \sigma'\rangle$.⁷
An E-clause $E$ to be solved defines an infinite $S \circ E$-search tree $T_{S \circ E}^{E}$, which
originates from $T_E^{\langle E, \varepsilon\rangle}$ by replacing each E-success node $\langle E', \sigma'\rangle$ with the S-search
tree $T_S^{\langle E', \sigma'\rangle}$. An $S \circ E$-solution path in $T_{S \circ E}^{E}$ is an E-solution path $p$ followed by
an S-solution path that starts at the E-success node of path $p$.
To disprove a conjecture $\varphi = \forall x_1{:}\tau_1, \ldots, x_n{:}\tau_n\; b$, the $S \circ E$-search tree
$T_{S \circ E}^{\{b \doteq \mathit{false}\}}$ is explored to find an $S \circ E$-solution path. In order to guarantee
termination of the search, only a finite part of $T_{S \circ E}^{\{b \doteq \mathit{false}\}}$ may be explored. Additional
side conditions for the E- and S-inference rules ensure that one rule does not
undo another rule's work. Rules that do not require a choice, e.g. rule (5), are
preferred to those that need some choice, e.g. rules (6) and (16).
The most significant restriction in exploring $T_{S \circ E}^{\{b \doteq \mathit{false}\}}$ (supporting
termination at the cost of completeness) comes from an additional side condition (**) of
rule (3), called the paramodulation rule in [8]: Definition triples $d := (C, \bar{C}, r) \in D_f$
with $f \notin \Sigma(r)$, called non-recursive definition triples, can be applied as often
as possible. However, if $f \in \Sigma(r)$, we need to limit the usage of $d$. Side
condition (**) demands that recursive definition triples be used at most once on each
side of a literal in each branch of $T_E^{\langle \{b \doteq \mathit{false}\}, \varepsilon\rangle}$. This leads to a fast disprover
that works well on simple examples, cf. Sect. 3. We call this restriction simple
paramodulation.

⁵ We use $\gtrdot$ and $\not\gtrdot$ in $L_E$ only to handle calls of the predefined procedure > more
efficiently by a constraint solver.
⁶ In our setting there is no need to assign priorities to constraints, so we can simplify
the algorithm by treating all constraints as "required" constraints.
⁷ $E' \in L_\bot$ is sufficient but not necessary for $\langle E', \sigma'\rangle$ being an S-failure node, as the
constraint solver called by rule (19) might fail on some S-clause $E' \in L_S \setminus L_\bot$.
To increase the deductive performance of the disprover, we can allow more
applications of a recursive definition triple $d$ by considering the "context" of
$f$-procedure calls. Each procedure call $f(\ldots)$ in the original formula $\varphi$ is labeled
with $(N, \ldots, N) \in \mathbb{N}^k$ for a constant $N \in \mathbb{N}$, e.g. $N = 2$, and $k = |D_f|$.
Context-sensitive paramodulation modifies side condition (**) in the following way: A
recursive definition triple $d_i := (C_i, \bar{C}_i, r_i) \in D_f$ may only be used if procedure
call $f(\ldots)$ is labeled with $(n_1, \ldots, n_k)$ such that $n_i > 0$. The recursive calls of $f$
in $r_i$ are labeled with $(n_1, \ldots, n_{i-1}, n_i - 1, n_{i+1}, \ldots, n_k)$, and the other procedure
calls in $r_i$ are labeled with $(N, \ldots, N)$. Context-sensitive paramodulation still
allows only finitely many applications of rule (3), as the labels decrease with each
rule application. Section 3 gives examples that illustrate the difference between
these alternatives in practice. Note that simple paramodulation is not a special
case of context-sensitive paramodulation (by setting $N = 1$), because it does
not distinguish between different occurrences of a procedure as context-sensitive
paramodulation does.
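The label bookkeeping of context-sensitive paramodulation is easy to state in code. The following Python sketch covers only the labels, not the search itself; the function names and the 0-based triple index are assumptions made for this illustration.

```python
def initial_label(num_triples, budget=2):
    """Label (N, ..., N) for a call of a procedure with k = num_triples triples."""
    return (budget,) * num_triples


def label_after_unfolding(label, i):
    """Label for the recursive calls in r_i after using triple d_i (0-based i).

    Returns None if n_i is already 0, i.e. d_i may not be used on this call.
    """
    if label[i] == 0:
        return None
    return label[:i] + (label[i] - 1,) + label[i + 1:]


def max_recursive_unfoldings(label, i):
    """How often d_i can be used along one chain of nested recursive calls."""
    steps = 0
    while (label := label_after_unfolding(label, i)) is not None:
        steps += 1
    return steps


start = initial_label(2)             # e.g. procedure <> has two triples
print(start)                         # (2, 2)
print(label_after_unfolding(start, 1))
print(max_recursive_unfoldings(start, 1))
```

Because every use of a recursive triple strictly decreases one component of the label, any chain of nested unfoldings is finite, which is the termination argument given above.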
For efficiency reasons (wrt. memory consumption), we explore $T_E^{\langle \{b \doteq \mathit{false}\}, \varepsilon\rangle}$
with a depth-first strategy, whereas $T_S^{\langle E', \sigma'\rangle}$ is examined breadth-first to avoid
infinite applications of rule (16). Two technical optimizations considerably speed
up the search for an $S \circ E$-solution path. Firstly, caching allows pruning a
branch that has already been considered in another derivation. The cache hit
rates are about 20 %. Secondly, while exploring $T_E^{\langle \{b \doteq \mathit{false}\}, \varepsilon\rangle}$ we can already
start a subsidiary S-search on S-literals from a clause $E \notin L_\bot$ (even though
node $\langle E, \sigma\rangle$ of $T_E^{\langle \{b \doteq \mathit{false}\}, \varepsilon\rangle}$ is not an E-success node) and feed the results back
to the E-search node $\langle E, \sigma\rangle$. For instance, if we derive $x \doteq +(y)$ from $E$, we
can discard E-branches that consider the case $x \doteq 0$. In conjunction with simple
paramodulation, this (empirically) leads to early failure on unsolvable examples.
3 Using the Disprover
In this section we illustrate the use and the performance of our disprover when
it is employed as a subsystem of XeriFun [12]. Unless otherwise stated, we use
simple paramodulation. We distinguish between conjectures provided by the user
and conjectures speculated by the system.
3.1 User-Provided Conjectures
Before trying to verify a program statement, it is advisable to make sure that
it does not contain lapses that render it false. E.g., in arithmetic we are often
interested in cancelation lemmas such as $x^y = x^z \to y = z$. However, the disprover
finds the witness {x/0, y/0, z/1} falsifying the conjecture. Excluding x = 0 does
not help, as now the witness {x/1, y/0, z/1} is quickly computed. But excluding
x = 1 as well causes the disprover to fail, hence we expect that verification
of $\forall x, y, z : N\;\; x \neq 0 \wedge -(x) \neq 0 \wedge x^y = x^z \to y = z$ will succeed. If we
conjecture the associativity of exponentiation, $(x^y)^z = x^{(y^z)}$, the disprover finds the
witness {x/2, y/0, z/0}. For the injectivity conjecture of the factorial function,
i.e. ∀x, y : N x! = y! → x = y, the disprover comes up with the witness {x/1, y/0}
and fails if we demand x ≠ 0 ∧ y ≠ 0 in addition. For ∀k, l : list[@A] k <> l = l <> k,
the solution {k/0 :: ε, l/1 :: ε} is computed.
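Three of the witnesses reported above can be re-checked directly. The Python sketch below is only a sanity check outside of L, under the assumption that Python's integer exponentiation (with 0 ** 0 == 1) and `math.factorial` agree with the corresponding L procedures on these inputs; the function names are chosen for this illustration.

```python
from math import factorial


def exp_associative(x, y, z):
    """Body of the associativity conjecture for exponentiation."""
    return (x ** y) ** z == x ** (y ** z)


def factorial_injective(x, y):
    """Body of x! = y! -> x = y, with the implication written as a disjunction."""
    return factorial(x) != factorial(y) or x == y


def append_commutative(k, l):
    """Body of k <> l = l <> k on Python lists."""
    return k + l == l + k


print(exp_associative(2, 0, 0))        # witness {x/2, y/0, z/0}
print(factorial_injective(1, 0))       # witness {x/1, y/0}
print(append_commutative([0], [1]))    # witness {k/0 :: ε, l/1 :: ε}
```

Each call evaluates the respective body to false, so each substitution is indeed a counter-example under the stated assumptions.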
All conjectures from above are disproved within less than a second.⁸ One
might argue that these disproofs are quite simple, so they should be easy to
find. XeriFun's old disprover [1] basically substitutes the variables with values
(or value templates like n :: k for lists) of a limited size and uses a heuristic search
strategy to track down a counter-example quickly if one exists. However, such
a strategy does not lead to early failure on true conjectures: The old disprover
fails after 46 s on the conjecture that procedure perm (deciding whether two
lists are a permutation of each other) computes a symmetric relation.⁹ The new
disprover fails after just a second.
The disprover also helps to find simple flaws in the definition of lemmas
or procedures. For instance, it disproves lemma “rev <>” (cf. Fig. 1), yielding
{k/0 :: ε, l/1 :: ε}. Also, the termination hypothesis for <> is disproved at once
if one inadvertently writes tl(l) in the recursive call of <> (instead of tl(k)).
Similar errors are the use of ≥instead of >in program conjectures and procedure
definitions.
To illustrate the consequences of simple paramodulation, consider formula
∀k : list[@A] rev(k) = k. As the smallest solution is {k/0 :: 1 :: ε}, we need to open
rev twice. Thus the disprover fails to find this witness with simple
paramodulation, but succeeds with context-sensitive paramodulation. The same effect is
observed with lemma "rev <>" or with $\forall x : N\;\; 2^x > x^2$. However, as most
conjectures do not need extensive search, we prefer to save time and offer this
alternative only as an option to the user who is willing to spend more time on
the search for a disproof.
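That the smallest counter-examples for these two conjectures are not the very first candidates can be confirmed by brute force. The Python sketch below merely enumerates values in increasing size; it is not how the disprover searches, and the function names are hypothetical.

```python
from itertools import count, product


def smallest_list_with_rev_differs(max_elem=1):
    """Shortest list k over {0..max_elem} with rev(k) != k, in enumeration order."""
    for size in count():
        for t in product(range(max_elem + 1), repeat=size):
            k = list(t)
            if k[::-1] != k:
                return k


def smallest_x_violating_power_inequality():
    """Smallest natural x for which 2^x > x^2 does not hold."""
    return next(x for x in count() if not 2 ** x > x ** 2)


print(smallest_list_with_rev_differs())
print(smallest_x_violating_power_inequality())
```

The first function returns a list of length two, in line with the remark that rev has to be opened twice; the second returns the smallest natural number at which the inequality fails.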
3.2 Conjectures Speculated by the System
When generalizing statements by machine, a disprover is needed to detect
over-generalizations. E.g., XeriFun's generalization heuristic [1] tries to generalize
φ = ∀k, l : list[@A] half(|k <> l|) = half(|l <> k|) to φ′ = ∀k, l : list[@A] |k <> l| =
|l <> k| and then φ′ to φ″ = ∀k, l : list[@A] k <> l = l <> k. Our disprover quickly
fails on φ′ and succeeds on φ″ (see above), hence generalization φ′ is a good
candidate for a proof by induction, whereas φ″ is recognized as an over-generalization
of φ.
⁸ All timing details refer to our single-threaded Java implementation on a 3.2 GHz
hyper-threading CPU, where the Java VM was assigned 300 MB of main memory.
⁹ The old disprover examined the conjecture for lists of length ≤ 2 and natural numbers
between 0 and 2.
Another example of such a generate-and-test cycle is recursion elimination:
For user-defined procedures, XeriFun synthesizes so-called difference and do-
main procedures which represent information that is useful for automated anal-
ysis of termination [13, 17] and for proving absence of “exceptions” [18] (caused
by division by 0, for example). Both kinds of procedures may contain unneces-
sary recursive calls, which complicate subsequent proofs. Therefore the system
generates recursion elimination formulas [13] justifying a sound replacement of
some recursive calls with truth values. For those formulas that the system could
not prove, the user has to decide whether to support the system either by inter-
actively constructing a proof or by giving a witness to disprove the conjecture.
He can also ignore the often unreadable conjectures (which most users do), not
being aware that missing a true recursion elimination formula means much more
work in subsequent proofs.
For example, for the domain procedure of a tautology checker (cf. procedure 0
in [14]), XeriFun generates 62 recursion elimination formulas. Our disprover
falsifies all of them within 33 s. Without a disprover, we wasted four times longer
on futile proof attempts from which we cannot conclude anything. With the old
disprover, it took more than five times longer to disprove 59 formulas; it failed
on the others. For other domain or difference procedures, the disprover performs
equally well, so in the vast majority of cases the user does not need to worry
about recursion elimination any more. This is a tremendous improvement in
user-friendliness.
4 Related Work
The problem of automatically disproving statements in the context of program
verification has been tackled in various research projects.
Protzen [8] describes a calculus to disprove universal conjectures in the INKA
system [4]. While it apparently performs quite well on false system-generated
conjectures, it has a rather poor performance on true ones; if the input conjecture
is true, it searches until it reaches an explicit limit of the search depth.
A disprover for KIV is presented in [9]. The existing proof calculus is modified
so that it is able to construct disproofs. This interleaves the incremental instan-
tiation of variables and simplifying proof steps. For solvable cases “good results”
are reported, whereas performance on unsolvable problems is not communicated.
Ahrendt has developed a complete disprover for free data type specifica-
tions [2]. Since the interpretation of function symbols is left open in this loose
semantics approach, one needs to consider all models satisfying the axioms when
proving the non-consequence of a conjecture φ. Similarly, the Alloy modeling
system [7] can investigate properties of under-specified models. The correspond-
ing constraint analyzer checks only models with a bounded number of elements
in each primitive type, so (like our disprover) it is incomplete. Differently from
these approaches, we consider only a fixed interpretation of function symbols
(given by the interpreter $\mathit{eval}_P$) in our setting.
Isabelle supports a “quickcheck” command [3] to test a conjecture by substi-
tuting random values for the variables several times. A comparison of the success
rates and the performance of this approach with our results is planned as future
work.
Coral [11] is a system designed to find non-trivial flaws in security protocols.
It is based on the Comon-Nieuwenhuis method for proof by consistency and uses
a parallel architecture for consistency checking and so-called induction derivation
to ensure termination. Finding an attack on a protocol may take several hours
with Coral.
5 Conclusion
In the design of our disprover we tried to minimize the time wasted on true
conjectures. We achieved this by limiting the application of the paramodulation
rule. Apart from this, we do not need any explicit depth limits. In particular,
there is no explicit limit on the size of a witness. We also reduce the cost of
simplifications by restricting them to selector and constructor calls. By
incorporating a constraint solver [5] for inequalities on the predefined data type N
for $\mathbb{N}$, we further improved the performance.
We identified several applications of our disprover that considerably improve
the productivity when working with the XeriFun system. The main application
of our disprover is bulk processing (such as recursion elimination) or automatic
generalization. While it is possible to approximate completeness arbitrarily well
to find deeper flaws in a program (conjecture), this would tremendously increase
the time wasted on true conjectures. The advantage of our disprover is that it
is successful in most solvable cases and quickly gives up in unsolvable cases, as
practical experiments reveal.
In future work, we intend to investigate further heuristics for the paramodu-
lation rule, which primarily controls the power of the disprover. We also intend
to examine whether the use of verified lemmas supports the disproving pro-
cess. Finally, it would be interesting to look at combinations of various disprov-
ing strategies. When we are aware of the strengths and weaknesses of different
strategies, we could possibly decide beforehand which one is most suitable for a
specific problem.
References
1. Markus Aderhold. Formula generalization in XeriFun. Diploma thesis, Technische
Universität Darmstadt, 2004.
2. Wolfgang Ahrendt. Deductive search for errors in free data type specifications
using model generation. In A. Voronkov, editor, Proc. of the 18th International
Conference on Automated Deduction, volume 2392 of LNCS. Springer, 2002.
3. Stefan Berghofer and Tobias Nipkow. Random testing in Isabelle/HOL. In Software
Engineering and Formal Methods, pages 230–239. IEEE Computer Society, 2004.
4. Susanne Biundo, Birgit Hummel, Dieter Hutter, and Christoph Walther. The
Karlsruhe induction theorem proving system. In J. Siekmann, editor, Proc. of
CADE-8, volume 230 of LNCS, pages 672–674. Springer, 1986.
5. Alan Borning, Richard Anderson, and Bjorn N. Freeman-Benson. Indigo: A local
propagation algorithm for inequality constraints. In ACM Symposium on User
Interface Software and Technology, pages 129–136, 1996.
6. Hubert Comon and Pierre Lescanne. Equational problems and disunification. Jour-
nal of Symbolic Computation, 7:371–425, 1989.
7. Daniel Jackson. Alloy: A lightweight object modelling notation. ACM Transactions
on Software Engineering and Methodology, 11(2):256–290, 2002.
8. Martin Protzen. Disproving conjectures. In D. Kapur, editor, Proc. of CADE-11,
volume 607 of LNAI, pages 340–354. Springer, 1992.
9. Wolfgang Reif, Gerhard Schellhorn, and Andreas Thums. Flaw detection in formal
specifications. In Proc. of IJCAR-1, pages 642–657. Springer, 2001.
10. Stephan Schweitzer. Symbolische Auswertung und Heuristiken zur Verifikation
funktionaler Programme. Doctoral dissertation, TU Darmstadt, to appear 2006.
11. Graham Steel and Alan Bundy. Attacking group protocols by refuting incorrect
inductive conjectures. Journal of Automated Reasoning, pages 1–28, 2005.
12. Daniel Szallies. Ein Werkzeug zur automatischen Widerlegung von Aussagen in
XeriFun. Diplomarbeit, Technische Universität Darmstadt, 2006.
13. Christoph Walther. On proving the termination of algorithms by machine. Artifi-
cial Intelligence, 71(1):101–157, 1994.
14. Christoph Walther, Markus Aderhold, and Andreas Schlosser. The L1.0 Primer.
Technical Report VFR 06/01, Technische Universität Darmstadt, 2006.
15. Christoph Walther and Stephan Schweitzer. About XeriFun. In F. Baader, editor,
Proc. of CADE-19, volume 2741 of LNCS, pages 322–327. Springer, 2003.
16. Christoph Walther and Stephan Schweitzer. Verification in the classroom. Journal
of Automated Reasoning, 32(1):35–73, 2004.
17. Christoph Walther and Stephan Schweitzer. Automated termination analysis for
incompletely defined programs. In F. Baader and A. Voronkov, editors, Proc. of
LPAR-11, volume 3452 of LNAI, pages 332–346. Springer, 2005.
18. Christoph Walther and Stephan Schweitzer. Reasoning about incompletely defined
programs. In G. Sutcliffe and A. Voronkov, editors, Proc. of LPAR-12, volume 3835
of LNAI, pages 427–442. Springer, 2005.