Who Finds the Short Proof?
An Exploration of Boolos’ Curious Inference using
Higher-order Automated Theorem Proving
Christoph Benzmüller1,2*, David Fuenmayor1,2,
Alexander Steen3 and Geoff Sutcliffe4
1AI Systems Engineering, Otto-Friedrich-University Bamberg,
An der Weberei 5, Bamberg, 96049, Germany.
2Department of Mathematics and Computer Science, FU Berlin,
Arnimallee 7, Berlin, 14195, Germany.
3Institute of Mathematics and Computer Science, University of
Greifswald, Walter-Rathenau-Str. 47, Greifswald, 17489, Germany.
4Department of Computer Science, University of Miami,
1365 Memorial Drive, Coral Gables, FL 33124-4245, USA.
*Corresponding author.
This paper reports on an exploration of Boolos’ Curious Inference, using
higher-order automated theorem provers (ATPs). Surprisingly, only suit-
able shorthand notations had to be provided by hand for ATPs to find a
short proof. The higher-order lemmas required for constructing a short
proof are automatically discovered by the ATPs. Given the observa-
tions and suggestions in this paper, full proof automation of Boolos’ and
related examples now seems to be within reach of higher-order ATPs.
Keywords: Speedup of proofs; Boolos’ Curious Inference; Higher-order
automated theorem proving; Cut-introduction; Comprehension
1 Which Automated Theorem Prover to
Choose for Really Difficult Problems?
Consider the following thought experiment: Shortly after their death, Folbert
and Holly arrive at the gates of heaven, and both want to enter. Unfortunately
only one of the two can be admitted, and they have to settle this in a little
contest: Folbert and Holly are both asked to (i) choose either a first-order
(FO) automated theorem prover (ATP), or, alternatively, a higher-order (HO)
ATP; and then (ii) pose a FO proof problem, encode it in FO logic, and give
it to the other’s ATP. The one whose ATP is faster in solving the proof problem
posed to it will be admitted to heaven. The contest runs
until midnight the same day. If the battle ends in a draw because neither
ATP finds a proof by midnight, the contest is repeated the next day, and so
on. Which ATP should Folbert and Holly choose, and which proof problem?
Is there a winning strategy?
FOlbert turns out to be a FO logic enthusiast, and announces that he
will use a FO ATP. HOlly likes HO logic better, and chooses an HO ATP.
Which exact system they choose is not revealed to the other person, nor how
it functions internally. At first glance, Folbert seems to be in a more promising
position because the proof problems they must provide have to be encoded in
FO logic. However, as will be seen, only Holly has a realistic chance to win this
contest (maybe not on the first day though), provided she chooses a suitable
proof problem.
Key to Holly’s advantage are the (hyper-)exponentially shorter proofs that
are possible as one moves up the ladder of expressiveness from first-order logic
to second-order logic, to third-order logic, and so on [1]. The fact that the
proof problems are stated in FO logic does not matter. When stating the same
problem in the same FO way but in higher-order logic, much shorter proofs
are possible, some of which might be (hyper-)exponentially shorter than the
proofs that can be found with FO ATPs. A very prominent example of such a
short proof is that of Boolos’ Curious Inference [2].
The key to the shorter proofs in HO logic is the possibility of introducing
powerful shorthand notations in which structural aspects of the problem at
hand can be expressed in ways that are not possible in FO logic. The use of
HO variables and quantifiers in these shorthand notations provides an oppor-
tunity to shorten proof arguments by explicitly talking about and working
with the mathematical structures involved. In Boolos’ Curious Inference this
concerns an inductively defined, very fast growing, Ackermann-style function.
A fascinating perspective on meta-level aspects of an otherwise mathemati-
cally boring, repetitive FO inference can be provided and exploited this way.
A most interesting aspect of this paper is that the required “curious” lemmas,
supplied by hand by the ingenious Boolos, can now be synthesized by modern
HO ATPs.
Based on our experiments we argue that only Holly has a winning strategy in
the contest. Holly is the only person who is able to reliably prevent
her opponent from winning, provided she chooses Boolos’ challenge problem
to be given to Folbert. Folbert can of course try something similar, but this
cannot prevent Holly’s victory in the long run, since full proof automation of
Boolos’ challenge problem is now within reach of HO ATPs. As the experiments
reported in this paper show, significant progress has been achieved in HO ATP
systems in recent years, which enables the automatic exploration of variants
of Boolos’ Curious Inference. In particular, powerful and intuitive lemmas can
be automatically discovered by HO ATPs, and used in short proofs. The only
thing left for the human to do has been to suggest suitable shorthand notations.
In Section 3 the proofs produced by the E ATP [3], which are based on
three lemmas, are discussed in some detail (we chose E because its proofs
were the most readable and because E also performed best). Further proofs
are reported by the HO ATPs cvc5 [4], Ehoh [5], Leo-III [6], and
Zipperposition [7]; these contributions will be studied in more depth in future
work. This paper also discusses the remaining key challenge that needs to be
addressed by HO ATPs to enable full automation of finding a solution to Boolos’
and similar problems: constructing the required shorthand notations using
controlled cut/comprehension introduction, as was already hinted at in [8].
Paper outline: Section 2 briefly recaps Boolos’ challenge problem, and
points to some formalisations that have been verified with interactive HO
proof assistants. Section 3 demonstrates that HO ATPs can easily solve
Boolos’ problem and produce short proofs when suitable shorthand notations are
provided, and it presents and discusses the proofs found by E. Section 4
provides a survey of the results from several HO ATPs. Section 5 explains why
the required shorthand notation cannot (yet) be generated with state-of-the-art
HO ATPs, and points to interesting future research on controlled
cut/comprehension introduction. Section 6 concludes the paper: Holly is the
only one who will one day enter heaven.
2 Boolos’ Curious Inference
In his article “A Curious Inference” [2], Boolos presents the following proof
problem, consisting of axioms A1-A5 with the conjecture C. This challenge is
referred to as BCP (for Boolos’ Curious Problem) in the rest of this paper:
$\forall n.\ f(n, e) = s(e)$ (A1)
$\forall y.\ f(e, s(y)) = s(s(f(e, y)))$ (A2)
$\forall x\, y.\ f(s(x), s(y)) = f(x, f(s(x), y))$ (A3)
$d(e)$ (A4)
$\forall x.\ d(x) \rightarrow d(s(x))$ (A5)
$d(f(s(s(s(s(e)))), s(s(s(s(e))))))$ (C)
In the first three axioms there are three uninterpreted FO constant symbols: a
nullary FO constant symbol $e$ (intuitively, think of it as the number one, if
it helps), a unary FO function symbol $s$ (think of it as the successor
function), and a binary FO function symbol $f$, whose semantics is inductively
characterised in the axioms A1–A3 (where A2 and A3 are recursive; note that A3
is actually doubly recursive). These axioms capture the fact that $f$ belongs
to a class of extremely fast growing functions, also known as Ackermann(-style)
functions [9]. Exhaustive evaluation of the term
$f(s(s(s(s(e)))), s(s(s(s(e)))))$ with these recursive equations unfolds it to
a term that contains more $s$’s than there are atoms in the universe. The
conjecture C formulated by Boolos is to prove that $d$ (think of it as the
is-a-natural-number predicate) holds for $f(s(s(s(s(e)))), s(s(s(s(e)))))$.
Here $d$ is actually an uninterpreted FO predicate symbol that holds for $e$
(A4), and is propagated from any $x$ to $sx$ (A5). A formalisation of BCP in
the TPTP THF language [10,11] can be found in
Appendix A.
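To get a feel for how quickly $f$ grows, the recursion in A1–A3 can be evaluated directly for small arguments. The following Python sketch is not part of the formalisation in Appendix A; it assumes the intuitive reading suggested above ($e$ as the number 1, $s$ as the successor on positive integers), and the function name `f` and the loop-based evaluation order are illustrative choices.

```python
def f(x: int, y: int) -> int:
    """Evaluate Boolos' f on positive integers, reading e as 1 and s as +1."""
    if x < 1 or y < 1:
        raise ValueError("arguments must be positive integers (e is read as 1)")
    acc = 2  # A1: f(x, e) = s(e), i.e. f(x, 1) = 2
    # Invariant: after i iterations, acc == f(x, i + 1).
    for _ in range(y - 1):
        if x == 1:
            acc += 2             # A2: f(e, s(y)) = s(s(f(e, y)))
        else:
            acc = f(x - 1, acc)  # A3: f(s(x), s(y)) = f(x, f(s(x), y))
    return acc


if __name__ == "__main__":
    for x, y in [(1, 5), (2, 5), (3, 3), (3, 4), (4, 3)]:
        print(f"f({x}, {y}) = {f(x, y)}")
    # f(4, 4) = f(3, 65536) is far beyond what can be evaluated;
    # the term in conjecture C corresponds to f(5, 5) under this reading.
```

Under these assumptions the printed values are 10, 32, 16, 65536, and 65536; any attempt to go one step further, to `f(4, 4)`, does not terminate in practice.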
It is easy to see that this proof problem is solvable in FO logic by applying
an astronomically large number of modus ponens steps to A4 and (instances
of) A5. Boolos showed that for a cut-free FO calculus the magnitude of this
number is comparable to $f(5,5)$ for the Ackermann function $f$ interpreted on
the naturals. Due to the enormous number of required inference steps it is
certain that no such proof can realistically be found, let alone represented,
within cut-free FO proof systems. However, Boolos also showed that there
exists a short proof of BCP in second-order logic. This proof has been encoded
and verified in the mathematical proof assistant systems Omega [12] and Mizar
[13] by Benzmüller and Brown [14], at essentially the same level of granularity
as that of Boolos’ proof. This earlier work was recently repeated by Ketland
[15] in the interactive proof assistant Isabelle/HOL [16].
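The size of this number can be made concrete by unfolding A1–A3 by hand. The derivation below is a sketch under the assumption that $e$ is read as 1 and $s$ as successor, so that $sssse$ denotes 5; the tower notation $2\uparrow\uparrow n$ (a tower of $n$ twos) is introduced here only for illustration.

```latex
\begin{align*}
f(1,1) &= 2, & f(1,y+1) &= f(1,y) + 2       & &\Rightarrow\ f(1,y) = 2y \\
f(2,1) &= 2, & f(2,y+1) &= f(1,f(2,y)) = 2\,f(2,y) & &\Rightarrow\ f(2,y) = 2^{y} \\
f(3,1) &= 2, & f(3,y+1) &= f(2,f(3,y)) = 2^{f(3,y)} & &\Rightarrow\ f(3,y) = 2\uparrow\uparrow y \\
f(4,1) &= 2, & f(4,2) &= f(3,2) = 4, & f(4,3) &= f(3,4) = 65536 \\
       &     & f(4,4) &= f(3,65536) = 2\uparrow\uparrow 65536 \\
f(5,1) &= 2, & f(5,2) &= f(4,2) = 4, & f(5,3) &= f(4,4) = 2\uparrow\uparrow 65536
\end{align*}
```

Two further applications of A3 lead from $f(5,3)$ to $f(5,5)$. Deriving $d$ for a numeral of value $v$ from A4 takes $v-1$ applications of A5, so under this reading a direct derivation of C needs at least $f(5,5)-1$ such steps, in addition to the equational steps that evaluate $f$.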
The short proof by Boolos makes use of specific instances of the
comprehension principle of second-order logic, an axiom schema given by
$\exists R.\ \forall x_1 \ldots x_n.\ R(x_1, \ldots, x_n) \leftrightarrow \varphi(x_1, \ldots, x_n)$ (COM)
where $\varphi(x_1, \ldots, x_n)$ is a second-order formula with
$x_1, \ldots, x_n$ among its free variables, and $R$ is a second-order variable
not free in $\varphi$. The comprehension principle postulates the existence of
relations (or predicates) that can be defined within the second-order language,
or, equivalently, that there exists some $R$ that can act as a shorthand
notation for the property expressed by $\varphi(x_1, \ldots, x_n)$. For
function symbols a similar axiom scheme exists. The idea is to choose the right
instances of COM such that helpful lemmas can be used to enable a short
(second-order) proof for BCP. Details are provided in Boolos’ article [2].
In a HO logic based on $\lambda$-terms, such as Church’s type theory [17–19],
the existence of $R$ in COM is generally guaranteed: simply choose
$R = \lambda x_1, \ldots, x_n.\ \varphi(x_1, \ldots, x_n)$. The COM instances
are easily provable, and axiomatisations of the COM principle are generally
avoided in HO ATPs. During HO proof search, $\lambda$-terms for $R$ are often
synthesised on the fly by HO unification. In less trivial cases, however, parts
of the term $R$ need to be “guessed” by the ATPs, e.g., by applying a technique
called primitive substitution. A well known case is Cantor’s theorem, where an
initial part of the diagonal set description needs to be guessed before the
rest can be synthesised; see, e.g., [20,21] for further details on primitive
substitution.
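That COM instances are easily provable once $\lambda$-terms are available can be illustrated in a proof assistant. The following Lean 4 snippet is a minimal sketch and is not taken from the paper: Lean's logic is a dependent type theory rather than Church's simple type theory, and the names `α` and `φ` are illustrative. It proves the unary COM instance by supplying the $\lambda$-term itself as the witness for $R$.

```lean
-- COM for a unary predicate: the witness for R is the λ-term `fun x => φ x`,
-- and the required equivalence then holds by β-reduction and reflexivity.
example {α : Type} (φ : α → Prop) : ∃ R : α → Prop, ∀ x, R x ↔ φ x :=
  ⟨fun x => φ x, fun _ => Iff.rfl⟩
```

The point of the sketch is that no comprehension axiom is invoked: the existential is discharged by writing down a term, which is what HO unification does automatically in the simple cases.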
3 Proof Variants
Without further support BCP cannot be solved by today’s HO ATP systems.
Quite surprisingly (to the authors, at least), with a little help provided in the
form of suitable shorthand notations, the ATPs can automatically discover
lemmas that enable a short proof.
In the remainder of this paper we use Boolos’ convention to write $sssse$
instead of $s(s(s(s(e))))$; analogously for $ssse$, $sse$, $se$, $sx$, etc.
Moreover, we generally assume HO notation, so that e.g. $f(sssse, sssse)$ is
represented as $((f\ sssse)\ sssse)$ or simply as $f\,sssse\,sssse$, avoiding
unnecessary parentheses and even spaces when the correct term and formula
structures can be easily inferred in context.
3.1 Variants using Two Shorthand Notations
When given the following two shorthand notations, various HO ATPs can find
a short proof for BCP. The HO representation of BCP is augmented with
two additional uninterpreted symbols $\mathit{ind}$ and $p$. They are governed
by two additional axioms, (Def ind) and (Def p), stating equalities between
constant symbols and $\lambda$-terms.
$\mathit{ind} = \lambda X.\ X\,e \wedge \forall x.\ X\,x \rightarrow X\,sx$ (Def ind)
$p = \lambda x\,y.\ (\lambda z.\ \forall X.\ \mathit{ind}\,X \rightarrow X\,z)\,(f\,x\,y)$ (Def p)
Recalling the discussion of COM in Section 2, note that $\mathit{ind}$ and $p$
are just shorthand notations for the HO $\lambda$-terms given on the right. The
$\lambda$-term abbreviated by $\mathit{ind}$ expresses what it means to be an
inductive set defined over $e$ and $s$. The $\lambda$-term abbreviated by $p$
expresses that for any $x$ and $y$ given as arguments, $f\,x\,y$ is in the
smallest inductive set defined over $e$ and $s$.¹ By adding these shorthand
notations HO ATPs can find short proofs for BCP. In Section 4 the experiments
conducted using the TPTP THF infrastructure for HO logic [11,22] are
summarized. Interesting lemmas are automatically
discovered, proven, and used. No mathematical or logical ingenuity is needed.
¹ It is acknowledged that the use of terminology might be slightly misleading
here: $s$ might not be injective, and $e$ might have predecessors; in
particular $s$-cycles have not been axiomatised away. There is thus an
alternative explanation and terminology that could be used: The declarations
of the constants $e :: i$ and $s :: i \rightarrow i$ express (semantically
speaking) that every model behaves as an algebra over the signature
$\Sigma = \{e/0, s/1\}$. Then $\mathit{ind}\,X$ means that $X$ is a
$\Sigma$-(sub)algebra of the model. Thus, the term “smallest inductive set
defined over $e$ and $s$” (suggestively called $N$ by Boolos) corresponds to
the infimum/intersection of all $\Sigma$-(sub)algebras. From universal algebra
it is known that the set of all subalgebras of any algebra forms a complete
lattice, where infimum corresponds to set-intersection, and thus
$N := \bigcap \mathit{ind}$ is guaranteed to exist. Hence $p$
can be alternatively written as $\lambda x\,y.\ N\,(f\,x\,y)$.
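The two shorthand notations, and the alternative reading via Boolos' $N$, can be written down directly in a proof assistant. The Lean 4 definitions below are a hedged sketch, not the TPTP THF encoding of Appendix A: the type name `ι` and the explicit parameters `e`, `s`, `f` are choices made here for illustration, and `p` is given in its β-reduced form.

```lean
-- Def ind: X contains e and is closed under s.
def ind {ι : Type} (e : ι) (s : ι → ι) (X : ι → Prop) : Prop :=
  X e ∧ ∀ x, X x → X (s x)

-- Def p (β-reduced): f x y lies in every inductive set.
def p {ι : Type} (e : ι) (s : ι → ι) (f : ι → ι → ι) (x y : ι) : Prop :=
  ∀ X : ι → Prop, ind e s X → X (f x y)

-- Boolos' N: membership in the intersection of all inductive sets.
def N {ι : Type} (e : ι) (s : ι → ι) (z : ι) : Prop :=
  ∀ X : ι → Prop, ind e s X → X z

-- p x y and N (f x y) unfold to the same formula.
example {ι : Type} (e : ι) (s : ι → ι) (f : ι → ι → ι) (x y : ι) :
    p e s f x y ↔ N e s (f x y) :=
  Iff.rfl
```

Nothing is assumed about `s` being injective or `e` having no predecessor, in line with the remark in footnote 1.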
The proof found by E can easily be converted into the following quite intuitive
informal proof, divided into four parts. The first three parts I–III introduce
and prove the key lemmas. This was done fully automatically by E. In the
last part IV, the final refutation argument is automatically constructed. The
new lemmas are used in this proof to derive a contradiction from the negated
proof statement. The clause names mentioned in the text below reference the
annotated formula names in the proofs shown in Appendices B and C, where
the proof graphs and the proofs generated by E are presented.²
Part I
From A4, A5, and Def ind, it follows that $d$ is an inductive set:
C48: $\mathit{ind}\,d$ (L1a)
From A1, Def ind, and Def p, it follows that:
C31: $p\,x\,e$ for all $x$ (L1b)
From A1, A2, Def ind, and Def p, it follows that $p\,e$ is an inductive set:
C59: $\mathit{ind}\,(p\,e)$ (L1c)
(Remember that $p$ is a binary predicate. Modulo currying and the
identification of sets with their characteristic functions, this means that
$p\,e$, and its $\eta$-expansion $\lambda z.\ p\,e\,z$, denotes a unary
predicate on entities.)
Proof. Fully formal proofs are provided in Appendix C.
The proof of L1a (see the derivation of clause C48 from A4, A5, and Def ind)
is obvious.
Proof of L1b (see the derivation of clause C31 from axiom A1, Def ind, and
Def p): From Def ind get C23: $\mathit{ind}\,U \rightarrow U\,e$ and
C18: $\mathit{ind}\,U \rightarrow (U\,x \rightarrow U\,sx)$, for all $U$ and
$x$. Moreover, from Def p infer that for all $x$ and $y$ there exists a
property $q_{xy}$ (depending on $x$ and $y$), encoded by epred1_2 @ X @ Y in
Appendix C, such that C16: $q_{xy}\,(f\,x\,y) \rightarrow p\,x\,y$ and
C19: $\mathit{ind}\,q_{xy} \vee p\,x\,y$. From C19 and C23 it follows
C25: $q_{xy}\,e \vee p\,x\,y$, for all $x$ and $y$. From C19 and C18, for all
$x$, $y$ and $z$, get C21: $(q_{xy}\,z \rightarrow q_{xy}\,sz) \vee p\,x\,y$.
From A1 and C16 get C28: $q_{ze}\,se \rightarrow p\,z\,e$,
for all $z$. Finally obtain L1b, i.e., clause C31, from C28, C25, and C21.
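The informal content of L1a and L1b can be replayed in a few lines of a proof assistant. The Lean 4 sketch below is an independent illustration and does not follow the clause-level derivation by E; it repeats the hypothetical definitions of `ind` and `p` (over an arbitrary type `ι`) so that it can be checked on its own, and it takes the axioms A1, A4, and A5 as explicit hypotheses.

```lean
def ind {ι : Type} (e : ι) (s : ι → ι) (X : ι → Prop) : Prop :=
  X e ∧ ∀ x, X x → X (s x)

def p {ι : Type} (e : ι) (s : ι → ι) (f : ι → ι → ι) (x y : ι) : Prop :=
  ∀ X : ι → Prop, ind e s X → X (f x y)

-- L1a: A4 and A5 together state exactly that d is inductive.
theorem L1a {ι : Type} (e : ι) (s : ι → ι) (d : ι → Prop)
    (A4 : d e) (A5 : ∀ x, d x → d (s x)) : ind e s d := by
  unfold ind
  exact ⟨A4, A5⟩

-- L1b: by A1, f x e = s e, and s e lies in every inductive set.
theorem L1b {ι : Type} (e : ι) (s : ι → ι) (f : ι → ι → ι)
    (A1 : ∀ n, f n e = s e) (x : ι) : p e s f x e := by
  unfold p
  intro X hX
  unfold ind at hX
  rw [A1 x]
  exact hX.2 e hX.1
```

The human-style argument for L1b uses the inductive set `X` directly, whereas the refutation found by E works with the Skolem predicate $q_{xy}$ that arises from clausifying Def p.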
Proof of L1c (see the derivation of clause C59 from A1, A2, Def ind,
and Def p): Instantiating $z$ with $s(f\,e\,z)$ in C21 and applying the
equation $f\,e\,sz = s\,s\,(f\,e\,z)$, which follows from A2, get
C50: $(q_{xy}\,(s\,(f\,e\,z)) \rightarrow q_{xy}\,(f\,e\,sz)) \vee p\,x\,y$,
for all $z$. From this and C16 obtain
C53: $q_{e\,sz}\,(s\,(f\,e\,z)) \rightarrow p\,e\,sz$ and also
C55: $q_{e\,sz}\,(f\,e\,z) \rightarrow p\,e\,sz$, both for all $z$. Now, from
A1 and C16 get C28: $q_{ze}\,se \rightarrow p\,z\,e$, for all $z$. Moreover,
Def ind implies that for all properties $U$ there exists some entity $k_U$
(depending on $U$), encoded by esk1_1 @ U in Appendix C, such that
C27: $U\,e \rightarrow (\mathit{ind}\,U \vee U\,k_U)$ and
C37: $(U\,e \wedge U\,(s\,k_U)) \rightarrow \mathit{ind}\,U$. From Def p it
follows C26: $(\mathit{ind}\,U \wedge p\,x\,y) \rightarrow U\,(f\,x\,y)$, for
all $x$, $y$ and $U$. Using C26 and C27 get that for all $x$ there exists
$k_{px}$ such that for all $U$ holds
C30: $(\mathit{ind}\,U \wedge p\,x\,e) \rightarrow (\mathit{ind}\,(p\,x) \vee U\,(f\,x\,k_{px}))$.
Apply L1b to infer that for all $x$ there exists $k_{px}$ such that for all $U$
holds C34: $\mathit{ind}\,U \rightarrow (\mathit{ind}\,(p\,x) \vee U\,(f\,x\,k_{px}))$.
Using C55 it then follows from C34 that
C57: $\mathit{ind}\,(p\,e) \vee p\,e\,(s\,k_{pe})$. From C57 and
C37 finally obtain L1c, i.e., clause C59.
² In order to improve readability, Ci is used as a label for the clause named
c_0_i in the formal proof by E. Labels are prepended to clauses with a colon,
as in Ci: $\varphi$, where $\varphi$ is some formula. Each clause reference is
a hyperlink pointing to the clause representation in the original proof
output of E (in a Github repository).
Part II
From A1, A2, A3, Def ind, and Def p, it follows:
C47: $\forall x.\ \forall Y.\ (\mathit{ind}\,(p\,x) \wedge \mathit{ind}\,(p\,sx) \wedge \mathit{ind}\,Y) \rightarrow Y\,(f\,sx\,sssse)$ (L2)
Informally, for all entities $x$ and sets $Y$, if $p\,x$, $p\,sx$, and $Y$ are
inductive sets (over $e$ and $s$), then $f\,sx\,sssse$ is in $Y$.
Proof. See the derivation of clause C47 from clauses C8–C11, corresponding to
Def p, A2, A1, and Def ind, in Appendix C: From A1 and A2 infer
C17: $f\,e\,se = ssse$, so that with C16 it follows
C20: $q_{e\,se}\,ssse \rightarrow p\,e\,se$, where $q$ is again encoded
by epred1_2 in Appendix C. From this and C21 obtain
C22: $q_{e\,se}\,sse \rightarrow p\,e\,se$ and
C24: $q_{e\,se}\,se \rightarrow p\,e\,se$, which together with C25 leads to
C29: $p\,e\,se$. Using C26, i.e., Def p, and $f\,e\,se = ssse$ obtain
C32: $\mathit{ind}\,U \rightarrow U\,ssse$, for all $U$. Further
application of C26 leads to
C36: $(\mathit{ind}\,U \wedge \mathit{ind}\,(p\,x)) \rightarrow U\,(f\,x\,ssse)$
and subsequently to
C43: $(\mathit{ind}\,U \wedge \mathit{ind}\,(p\,x) \wedge \mathit{ind}\,(p\,y)) \rightarrow U\,(f\,x\,(f\,y\,ssse))$.
Finally, since $f\,sx\,sy = f\,x\,(f\,sx\,y)$
by A3, obtain L2, i.e., clause C47.
Part III
From A1, A3, Def ind, and Def p, it follows:
C52: $\forall x.\ \mathit{ind}\,(p\,x) \rightarrow \mathit{ind}\,(p\,sx)$ (L3)
Informally, for any $x$: if $p\,x$ is an inductive set (over $e$ and $s$), then
so is $p\,sx$.
Proof. See the derivation of clause C52 from clauses C8, C10, C11, and
C35, corresponding to Def p, A1, Def ind, and A3, in Appendix C: From A1,
Def ind, and Def p lemma L1b has been established, given by C31: $p\,x\,e$, for
all $x$. From Def ind it follows that
C37: $(U\,e \wedge U\,(s\,k_U)) \rightarrow \mathit{ind}\,U$ for all
properties $U$. Clauses C26 and C34 have already been inferred above. Together
with A3, it follows that
C45: $(\mathit{ind}\,(p\,x) \wedge \mathit{ind}\,U) \rightarrow (U\,(f\,sx\,(s\,k_{p\,sx})) \vee \mathit{ind}\,(p\,sx))$
for every $x$ and every property $U$. From clause C45 and Def p it follows that
C49: $\mathit{ind}\,(p\,x) \rightarrow (p\,sx\,(s\,k_{p\,sx}) \vee \mathit{ind}\,(p\,sx))$
for all $x$, where $k$ is again the Skolem function called esk1_1 in
Appendix C. An application of C49 with C37 yields
$(\mathit{ind}\,(p\,x) \wedge p\,sx\,e) \rightarrow \mathit{ind}\,(p\,sx)$,
i.e., choose $U = p\,sx$. With a simple application of
lemma L1b, instantiated for $sx$, obtain L3, i.e., clause C52.
Part IV: Proof of C by reductio ad absurdum
To prove C assume ¬C and derive a contradiction. Assume
$\neg d\,(f\,sssse\,sssse)$ (clauses C42 and C46). From this, L1a, and L2
obtain that $p\,sssse$ is not an inductive set or that $p\,ssse$ is not an
inductive set, i.e., clause C51. Use L3 to show that $p\,ssse$, $p\,sse$ and
$p\,se$ are not inductive sets (clauses C54, C56, and C58).
From these, together with L1c and L3, derive a contradiction. By reductio ad
absurdum C holds.
Part IV′: Constructive proof of C
The refutation argument IV can be converted into the constructive argument IV′,
avoiding reductio ad absurdum: From L1c and L3, $p\,se$ is an inductive set. By
further applications of L3, $p\,sse$, $p\,ssse$ and $p\,sssse$ are inductive
sets. Use L1a and L2 to conclude $d\,(f\,sssse\,sssse)$.
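The instantiations behind the constructive argument can be written out as a chain. The display below only restates the final part of the proof with the instances of L1a, L1c, L2, and L3 made explicit; no new lemma is involved.

```latex
\begin{align*}
&\mathit{ind}\,(p\,e) && \text{(L1c)} \\
&\mathit{ind}\,(p\,e) \rightarrow \mathit{ind}\,(p\,se) && \text{(L3, } x := e\text{)} \\
&\mathit{ind}\,(p\,se) \rightarrow \mathit{ind}\,(p\,sse) && \text{(L3, } x := se\text{)} \\
&\mathit{ind}\,(p\,sse) \rightarrow \mathit{ind}\,(p\,ssse) && \text{(L3, } x := sse\text{)} \\
&\mathit{ind}\,(p\,ssse) \rightarrow \mathit{ind}\,(p\,sssse) && \text{(L3, } x := ssse\text{)} \\
&\mathit{ind}\,d && \text{(L1a)} \\
&(\mathit{ind}\,(p\,ssse) \wedge \mathit{ind}\,(p\,sssse) \wedge \mathit{ind}\,d)
   \rightarrow d\,(f\,sssse\,sssse) && \text{(L2, } x := ssse,\ Y := d\text{)}
\end{align*}
```

Four applications of L3 and one of L2 thus suffice, in contrast to the astronomically many applications of A5 in the direct FO derivation.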
Note on shorter proofs
By systematic experimentation using HO ATPs it is possible to further simplify
the proof found by E, and to reduce dependencies. For example (as pointed
out by an anonymous reviewer), the $\mathit{ind}\,(p\,x)$ in lemma L2 can be
avoided and L2 can be proved from just Def p and Def ind.
3.2 Variants using One Shorthand Notation
It turns out that the two shorthand notations, $\mathit{ind}$ and $p$ as
introduced above, are not both required. $\mathit{ind}$ can be avoided when
unfolded (but not $\beta$-reduced) in $p$.
The HO ATPs solve this alternative problem formulation even quicker than
before. Instead of $\mathit{ind}$ and $p$ use:
$p = \lambda x\,y.\ (\lambda z.\ \forall X.\ (\lambda Q.\ Q\,e \wedge \forall w.\ Q\,w \rightarrow Q\,sw)\,X \rightarrow X\,z)\,(f\,x\,y)$ (Def p′)
Alternatively, just the following lemma can be suggested, proven, and
subsequently used:
$\exists p.\ p = \lambda x\,y.\ (\lambda z.\ \forall X.\ (\lambda Q.\ Q\,e \wedge \forall w.\ Q\,w \rightarrow Q\,sw)\,X \rightarrow X\,z)\,(f\,x\,y)$ (L)
This corresponds to (and is equivalent to) the following instance L′ of COM:
$\exists p.\ \forall x\,y.\ p\,x\,y \leftrightarrow (\lambda z.\ \forall X.\ (\lambda Q.\ Q\,e \wedge \forall w.\ Q\,w \rightarrow Q\,sw)\,X \rightarrow X\,z)\,(f\,x\,y)$ (L′)
L and L′ illustrate the relationship to comprehension and cut-introduction.
Once introduced, these lemmas can be easily proven by HO ATPs using HO
unification, which simply instantiates the existentially quantified $p$ with
the $\lambda$-term
$\lambda x\,y.\ (\lambda z.\ \forall X.\ (\lambda Q.\ Q\,e \wedge \forall w.\ Q\,w \rightarrow Q\,sw)\,X \rightarrow X\,z)\,(f\,x\,y)$.
Solving BCP fully automatically using HO ATPs thus boils down to speculating L
(or L′), proving it, and then proving BCP using L (or L′). The HO ATPs can use
the shorthand notations to automatically discover the required cut-lemmas L1a,
L1b, L2, and L3. The encoding of BCP with only L as an axiom is presented
in Appendix F. A refutation proof for this formulation of BCP, found by E in
a few milliseconds, is shown in Appendix G.
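How little is needed to prove lemma L itself, once it has been speculated, can be seen in a proof assistant. The following Lean 4 snippet is a sketch under the assumption of an arbitrary type `ι` with `e`, `s`, and `f` as parameters; it is not the THF encoding of Appendix F. The witness for $p$ is left as a placeholder `_` and is filled in by unification, mirroring the role of HO unification described above.

```lean
-- Lemma L: there is a p equal to the λ-term. The placeholder `_` is
-- instantiated with that λ-term when `rfl` is checked against the goal.
example {ι : Type} (e : ι) (s : ι → ι) (f : ι → ι → ι) :
    ∃ p : ι → ι → Prop,
      p = fun x y =>
        (fun z => ∀ X : ι → Prop,
          (fun Q : ι → Prop => Q e ∧ ∀ w, Q w → Q (s w)) X → X z) (f x y) :=
  ⟨_, rfl⟩
```

The difficulty of BCP therefore lies in proposing L, not in proving it.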
4 Results of Experiments with HO ATPs
To assess the performance and robustness of HO ATPs in finding short proofs
for BCP, experiments with different problem encodings were done, using the
ATPs cvc5 1.0 [4], E 3.0 [3], Leo-III 1.7.0 [6], Vampire 4.7 [23], and Zipperpo-
sition 2.1 [7]. These systems were deployed on the StarExec [24] Miami cluster
running octa-core Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz CPUs, 128GB
memory, and CentOS Linux 7.4.1708 (Core) operating system. A CPU time
limit of 300s was set.
The range of CPU times reported below is indicative of the range of dif-
ficulty of the problems for ATP. Comparison of the proof inference statistics
between different ATPs is not meaningful, but for the E ATP the statistics
indicate that the proofs are of comparable complexity.
4.1 Using Two Shorthand Notations
When using the two shorthand notations $\mathit{ind}$ and $p$ discussed in
Section 3.1, several HO ATPs are able to prove the BCP encoding given in
Appendix A. It is solved by E (13.8s CPU time, 45 inferences in the
refutation, 16 inferences deep), Leo-III (231.0s CPU time, 445 inferences in
the refutation, 36 inferences deep), and Zipperposition (85.7s CPU time, 29
inferences in the refutation, 6 inferences deep).³ The proof by E, discussed
in Section 3, is presented in
Appendix C.
When encoding $\mathit{ind}$ and $p$ as comprehension instances (see
Appendix D) BCP is proven by only Leo-III (215.8s CPU time, 409 inferences in
the refutation, 37 inferences deep). The resolution proof generated by Leo-III
is presented
in Appendix E.
4.2 Using One Shorthand Notation
A problem encoding using the comprehension instance L from Section 3.2 as
an axiom is shown in Appendix F. This problem is solved by only E (0.2s
CPU time, 53 inferences in the refutation, 20 inferences deep). The resolution
proof is shown in Appendix G. The related encoding using Def p′ as an axiom
was also proven by only E (0.3s CPU time, 47 inferences in the refutation, 17
inferences deep).
³ An anonymous reviewer was able to find a proof also with Vampire 4.6, when
using a specific parameter setting.
5 Future Challenge: Shorthand Invention
Why are the shorthand notations $\mathit{ind}$ and $p$ not found automatically
by the HO ATPs? Or, alternatively, why is lemma L (or L′) not automatically
introduced, proven, and then used to obtain a short proof for BCP, by the HO
ATPs? The
answer is that the HO ATPs, in the tradition of FO ATPs, put a strong focus
on cut-elimination. They do not incorporate suitable support for controlled
cut/comprehension introduction in their proof search. While cut-freedom is
clearly a desirable property of calculi (a kind of quality seal), things become
really interesting, from a mathematical, cognitive, and AI perspective, when
cut-elimination is given up at least partially, and some forms of controlled cut-
introduction are applied. Once controlled cut-introduction steps are applied,
powerful further key lemmas can be synthesised automatically within a cut-
free resolution-style calculus. In this sense HO ATPs somehow still neglect a
crucial expressivity advantage that they enjoy over FO ATPs. The situation
has already been discussed in a prior paper [8], but so far not much attention
has been paid to this aspect in the HO ATP community.
The experiments with HO ATPs conducted for this paper, and the
discussion above, illustrate the following:
1. Cut-elimination, which has attracted the interest of many ATP researchers
and theoreticians, is important for achieving robust proof automation. The
price of cut-elimination is, however, that some short proofs are eliminated
from the search space; see also [25]. This includes short proofs that can be
found by today’s HO ATPs when the right cut/comprehension introduction
steps are applied. The complete avoidance of cut/comprehension introduc-
tion turns certain solvable problems into unsolvable ones; BCP is one such
example. A hybrid approach seems to be required.
2. Controlled cut/comprehension introduction should be considered to be a
challenge for the 21st century, expanding on the progress that has been
made with regards to cut-elimination since the last century. Machine
learning may have an interesting role to play here.
3. To some extent statement 2 is not entirely new. Earlier research on
induction theorem proving, proof planning and proof methods [26–28] can be seen
as pioneering work on controlled cut/comprehension introduction.
Unfortunately this line of research did not receive the attention that it
deserved at the time, due to the lack of robustness and coverage of the proof
planning approaches, especially when compared with the much more technically
advanced FO ATPs of the time; cf. the discussions by Bundy [29] and
Benzmüller et al. [30]. There is, however, successful recent related work in
this area, including, e.g., work on lemma discovery for induction [31].
The shorthand notations ind and p, resp. lemma L, introduced to solve
BCP with HO ATPs, are actually less out of the blue than they might seem
at first glance. Rather, they are indicative of a general proof method:
1. Introducing shorthand notation for set predicates such as ind seems a good
idea in general when certain inductive definitions are found in the input
problem, as exemplified here by axioms A1–A3 of the Ackermann function.
The formulation of such predicates is quite straightforward.
2. The systematic introduction of related p-predicates also seems quite
practicable. The idea is to express that results of applying an inductively
defined function are always contained in the smallest ind-set. As noted in
footnote 1, this sort of operation can also be viewed from a compositional
(algebraic) perspective.
3. Alternatively, these two steps can be combined and a respective analogue
to lemma L can be used.
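As a rough illustration of steps 1 and 2, the shorthand axioms could be generated mechanically once the base constant, the successor-like function, and the recursively defined function have been identified in the input problem. The Python sketch below is purely hypothetical: the function `shorthand_axioms` and the axiom names it emits are inventions for this example, it is not part of any HO ATP, and the emitted strings are only intended to follow TPTP THF syntax in the style of Def ind and Def p (they have not been checked against the encoding in Appendix A). It assumes that the symbols passed in are already declared over the individual type `$i`.

```python
def shorthand_axioms(base: str, succ: str, func: str) -> list[str]:
    """Return THF lines declaring and defining the shorthands `ind` and `p`.

    `base`, `succ`, and `func` name the constant, the unary function, and the
    binary recursively defined function of the input problem (e, s, f for BCP).
    """
    ind_type = "thf(ind_type, type, ind: ( $i > $o ) > $o )."
    p_type = "thf(p_type, type, p: $i > $i > $o )."
    # Def ind: Q contains `base` and is closed under `succ`.
    ind_def = (
        "thf(ind_def, definition, ind = ( ^ [Q: $i > $o] : "
        f"( ( Q @ {base} ) & ( ! [X: $i] : "
        f"( ( Q @ X ) => ( Q @ ( {succ} @ X ) ) )"
        " ) ) ) )."
    )
    # Def p (beta-reduced): func(X, Y) lies in every ind-set.
    p_def = (
        "thf(p_def, definition, p = ( ^ [X: $i, Y: $i] : "
        "( ! [Q: $i > $o] : "
        f"( ( ind @ Q ) => ( Q @ ( {func} @ X @ Y ) ) )"
        " ) ) )."
    )
    return [ind_type, p_type, ind_def, p_def]


if __name__ == "__main__":
    for line in shorthand_axioms("e", "s", "f"):
        print(line)
```

The hard part, which this sketch does not address, is deciding when such shorthands are worth introducing and for which symbols; that is the controlled cut/comprehension introduction problem discussed above.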
In future work we plan to experiment with the implementation and assess-
ment of such a general method for controlled cut/comprehension introduction
in HO ATPs, in order to enable HO ATPs to find short proofs for problems like
BCP. This will include further investigation of the surprise effect revealed in
this paper: By proposing appropriate shorthand notation, inspirational lemma
introduction steps that would normally require further cut-introductions
are now synthesized by the cut-free search procedures in HO ATPs. This
unexpected observation deserves further attention and clarification.
The ability to find or construct short proofs plays an important role not
only for FO and HO ATPs, but also for SAT and SMT solvers; see, e.g., the
techniques presented by Heule et al. [32]. Such techniques are orthogonal to
the findings reported in this paper, which exploit the gain in expressivity when
moving from a less expressive logic to a more expressive one.
6 Conclusion
Holly has a strategy to never lose the contest, provided she chooses BCP as
her challenge problem. Whatever FO ATP Folbert chooses, it will never be
able to find and express a short proof for BCP in its FO calculus. On the
flip side, if Holly chooses a state of the art HO ATP, while it might not be
able to solve Folbert’s challenge problem on the first days of the contest, one
day it will integrate the controlled cut/comprehension introduction techniques
(and include further related techniques exploiting HO expressiveness), so that
it will be able to speculate the necessary lemmas, and be able to find a short
proof for BCP (and other related challenge problems). The core observation
is that Holly’s HO ATP can steadily be improved regarding (i) its traditional
FO proof search capabilities, and (ii) clever exploitation of its HO expressivity
advantage. In contrast, Folbert’s FO ATP is stuck with only (i), which will
never lead to a solution of BCP. Even if the contest might (for the time being)
end up in numerous draws, only Holly ever has a chance of entering heaven.
We thank the anonymous reviewers for their valuable feedback.
[1] Gödel, K.: Über die Länge von Beweisen. In: Menger, K., Gödel, K., Wald,
A. (eds.) Ergebnisse Eines Mathematischen Kolloquiums: Heft X7: 1934-
1935, pp. 23–24. Franz Deuticke, Vienna (1936)
[2] Boolos, G.: A curious inference. J. Philos. Log. 16(1), 1–12 (1987). https:
[3] Schulz, S., Cruanes, S., Vukmirovic, P.: Faster, higher, stronger: E 2.3.
In: Fontaine, P. (ed.) CADE 2019. Lecture Notes in Computer Science,
vol. 11716, pp. 495–507. Springer, Cham (2019).
978-3-030-29436-6 29
[4] Barbosa, H., Barrett, C.W., Brain, M., Kremer, G., Lachnitt, H., Mann,
M., Mohamed, A., Mohamed, M., Niemetz, A., Nötzli, A., Ozdemir, A.,
Preiner, M., Reynolds, A., Sheng, Y., Tinelli, C., Zohar, Y.: cvc5: A ver-
satile and industrial-strength SMT solver. In: Fisman, D., Rosu, G. (eds.)
Tools and Algorithms for the Construction and Analysis of Systems -
28th International Conference, TACAS 2022, Held as Part of the Euro-
pean Joint Conferences on Theory and Practice of Software, ETAPS 2022,
Munich, Germany, April 2-7, 2022, Proceedings, Part I. Lecture Notes
in Computer Science, vol. 13243, pp. 415–442. Springer, Cham (2022). 24
[5] Vukmirovic, P., Blanchette, J., Cruanes, S., Schulz, S.: Extending a
brainiac prover to lambda-free higher-order logic. International Journal
on Software Tools for Technology Transfer 24(1), 67–87 (2022). https:
[6] Steen, A., Benzmüller, C.: Extensional higher-order paramodulation in
Leo-III. Journal of Automated Reasoning 65(6), 775–807 (2021). https:
[7] Bentkamp, A., Blanchette, J., Tourret, S., Vukmirovic, P., Waldmann,
U.: Superposition with lambdas. Journal of Automated Reasoning 65(7),
893–940 (2021).
[8] Benzmüller, C., Kerber, M.: A lost proof. In: Proceedings of the
IJCAR 2001 Workshop: Future Directions in Automated Reasoning,
Siena, Italy, pp. 13–24 (2001).
[9] Ackermann, W.: Zum Hilbertschen Aufbau der reellen Zahlen. Math-
ematische Annalen 99(1), 118–133 (1928).
[10] Sutcliffe, G.: The logic languages of the TPTP world. Logic Journal of
the IGPL (2022).
[11] Sutcliffe, G., Benzmüller, C.: Automated reasoning in higher-order logic
using the TPTP THF infrastructure. Journal of Formalized Reasoning
3(1), 1–27 (2010).
[12] Autexier, S., Benzmüller, C., Dietrich, D., Siekmann, J.: OMEGA:
Resource-adaptive processes in an automated reasoning systems. In:
Crocker, M.W., Siekmann, J. (eds.) Resource-Adaptive Cognitive Pro-
cesses. Cognitive Technologies, pp. 389–423. Springer, Berlin, Heidelberg
(2010). 17
[13] Bancerek, G., Bylinski, C., Grabowski, A., Kornilowicz, A., Matuszewski,
R., Naumowicz, A., Pak, K., Urban, J.: Mizar: State-of-the-art and
beyond. In: Kerber, M., Carette, J., Kaliszyk, C., Rabe, F., Sorge,
V. (eds.) Intelligent Computer Mathematics - International Conference,
CICM 2015, Washington, DC, USA, July 13-17, 2015, Proceedings. Lec-
ture Notes in Computer Science, vol. 9150, pp. 261–279. Springer, Cham
(2015). 17
[14] Benzm¨uller, C., Brown, C.: The curious inference of Boolos in MIZAR and
OMEGA. In: Matuszewski, R., Zalewska, A. (eds.) From Insight to Proof
Festschrift in Honour of Andrzej Trybulec. Studies in Logic, Gram-
mar, and Rhetoric, vol. 10(23), pp. 299–388. The University of Bialystok,
Poland (2007).
[15] Ketland, J.: Boolos’s curious inference in Isabelle/HOL.
Archive of Formal Proofs, 1–19 (2022). https://isa- Curious Inference.html
[16] Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL - A Proof Assis-
tant for Higher-Order Logic. Lecture Notes in Computer Science,
vol. 2283. Springer, Berlin, Heidelberg (2002).
[17] Church, A.: A formulation of the simple theory of types. Journal of
Symbolic Logic 5(2), 56–68 (1940).
[18] Andrews, P.B.: An Introduction to Mathematical Logic and Type Theory.
Applied Logic Series. Springer, Netherlands (2002).
[19] Benzm¨uller, C., Andrews, P.: Church’s type theory. In: Zalta, E.N. (ed.)
The Stanford Encyclopedia of Philosophy, Summer 2019 edn., pp. 1–
62. Metaphysics Research Lab, Stanford University, CA, USA (2019).
[20] Andrews, P.B.: On connections and higher-order logic. Journal of
Automated Reasoning 5(3), 257–291 (1989).
[21] Benzmüller, C., Sultana, N., Paulson, L.C., Theiss, F.: The higher-order prover LEO-II. Journal of Automated Reasoning 55(4), 389–404 (2015).
[22] Sutcliffe, G.: The TPTP problem library and associated infrastructure - from CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning 59(4), 483–502 (2017).
[23] Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) Computer Aided Verification - 25th International Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013. Proceedings. Lecture Notes in Computer Science, vol. 8044, pp. 1–35. Springer, Berlin, Heidelberg (2013). 978-3-642-39799-8 1
[24] Stump, A., Sutcliffe, G., Tinelli, C.: StarExec: A cross-community infrastructure for logic solving. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) Automated Reasoning - 7th International Joint Conference, IJCAR 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 19-22, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8562, pp. 367–373. Springer, Cham (2014). 978-3-319-08587-6 28
[25] Boolos, G.: Don’t eliminate cut. Journal of Philosophical Logic 13(4), 373–378 (1984).
[26] Boyer, R.S., Moore, J.S.: A Computational Logic. ACM monograph series. Academic Press, London (1979)
[27] Bundy, A.: A science of reasoning. In: Lassez, J., Plotkin, G.D. (eds.) Computational Logic - Essays in Honor of Alan Robinson, pp. 178–198. The MIT Press, Cambridge, MA, USA (1991).
[28] Melis, E., Siekmann, J.H.: Knowledge-based proof planning. Artificial Intelligence 115(1), 65–105 (1999).
[29] Bundy, A.: A critique of proof planning. In: Kakas, A.C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond, Essays in Honour of Robert A. Kowalski, Part II. Lecture Notes in Computer Science, vol. 2408, pp. 160–177. Springer, Berlin, Heidelberg (2002). 10.1007/3-540-45632-5 7
[30] Benzmüller, C., Meier, A., Melis, E., Pollet, M., Sorge, V.: Proof planning: A fresh start? In: Proceedings of the IJCAR 2001 Workshop: Future Directions in Automated Reasoning, Siena, Italy, pp. 25–37 (2001).
[31] Johansson, M.: Lemma discovery for induction - A survey. In: Kaliszyk, C., Brady, E.C., Kohlhase, A., Coen, C.S. (eds.) Intelligent Computer Mathematics - 12th International Conference, CICM 2019, Prague, Czech Republic, July 8-12, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11617, pp. 125–139. Springer, Cham (2019). 1007/978-3-030-23250-4 9
[32] Heule, M.J.H., Kiesl, B., Biere, A.: Short proofs without new variables. In: de Moura, L. (ed.) Automated Deduction - CADE 26 - 26th International Conference on Automated Deduction, Gothenburg, Sweden, August 6-11, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10395, pp. 130–147. Springer, Cham (2017). 978-3-319-63046-5 9
A BCP with ind and p encoded in TPTP THF
Encoding of BCP with ind and p in TPTP THF syntax [11]; see also the online
sources at
% Declarations
thf(e_decl,type, e: $i ).
thf(s_decl,type, s: $i > $i ).
thf(d_decl,type, d: $i > $o ).
thf(f_decl,type, f: $i > $i > $i ).
% Auxiliary declarations
thf(ind_decl,type, ind: ( $i > $o ) > $o ).
thf(p_decl,type, p: $i > $i > $o ).
% Axioms
! [N: $i] :
! [Y: $i] :
! [X: $i,Y: $i] :
d @ e ).
! [X: $i] :
% Shorthand notations
( ind
= ( ^ [Q: $i > $o] :
& ! [X: $i] :
( p
= ( ^ [X: $i,Y: $i] :
( ^ [X2: $i] :
! [X3: $i > $o] :
( ( ind @ X3 )
=> ( X3 @ X2 ) )
% Conjecture
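The function f axiomatised in this encoding grows very quickly, which is what makes a short proof hard to find. As a rough illustration, the following Python sketch evaluates f on small arguments. It assumes the usual reading of Boolos' equations, f(n, e) = s(e), f(e, s(y)) = s(s(f(e, y))) and f(s(x), s(y)) = f(x, f(s(x), y)), and it interprets e as the integer 1 and s as the successor function. The name `boolos_f` and this integer model are illustrative assumptions, not part of the TPTP problem.

```python
from functools import lru_cache

E = 1  # assumed interpretation of the constant e


@lru_cache(maxsize=None)
def boolos_f(x: int, y: int) -> int:
    """Evaluate f in the assumed integer model (e = 1, s = successor)."""
    if x < E or y < E:
        raise ValueError("arguments must be >= 1 in this model")
    if y == E:
        # f(n, e) = s(e)
        return E + 1
    if x == E:
        # f(e, s(y)) = s(s(f(e, y))), unrolled as a loop to avoid deep recursion
        value = E + 1
        for _ in range(y - E):
            value += 2
        return value
    # f(s(x), s(y)) = f(x, f(s(x), y))
    return boolos_f(x - 1, boolos_f(x, y - 1))


if __name__ == "__main__":
    for x, y in [(1, 4), (2, 4), (3, 3), (3, 4)]:
        print(f"f({x}, {y}) = {boolos_f(x, y)}")
```

In this model f(1, y) doubles y, f(2, y) is a power of two, and f(3, y) is a tower of y twos, so the sketch should only be called on small arguments: f(4, 4) would already be a tower of 65536 twos.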
B IDV proof by E for BCP with ind and p
IDV display of the resolution proof by E for BCP with ind and p (highlighted
in red are the dependencies of clause c_0_52); the resolution proof by E is
presented in full detail in Appendix C.
C Plain proof by E for BCP with ind and p
See also the online sources at
BoolosCuriousInference-ATP/tree/main/Boolos1.proof. The exact call to
prover E was (where /tmp/AjzF3EheOy/SOT_78srga refers to the input file for E as
presented in Appendix A):
/home/tptp/Systems/E---3.0/eprover-ho --delete-bad-limit=2000000000 --definitional-cnf=24
-s --print-statistics -R --print-version --proof-object --auto-schedule=8 --cpu-limit=60
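For readers who want to repeat the run from a script, the call can be assembled as in the following sketch. The flags are copied from the command shown here; everything else is an assumption made for illustration: that an `eprover-ho` binary is available on the PATH, the helper names `build_e_command` and `run_e`, and the problem file name passed by the caller.

```python
import shutil
import subprocess
from typing import List, Optional

# Flags copied verbatim from the call reported in this appendix.
E_FLAGS = [
    "--delete-bad-limit=2000000000",
    "--definitional-cnf=24",
    "-s",
    "--print-statistics",
    "-R",
    "--print-version",
    "--proof-object",
    "--auto-schedule=8",
]


def build_e_command(problem_file: str, cpu_limit: int = 60,
                    binary: str = "eprover-ho") -> List[str]:
    """Return the argument vector for one E run on the given problem file."""
    return [binary, *E_FLAGS, f"--cpu-limit={cpu_limit}", problem_file]


def run_e(problem_file: str, cpu_limit: int = 60) -> Optional[str]:
    """Run E if the (assumed) binary is installed; return its stdout, else None."""
    if shutil.which("eprover-ho") is None:
        return None
    completed = subprocess.run(
        build_e_command(problem_file, cpu_limit),
        capture_output=True,
        text=True,
        check=False,  # the caller decides how to treat a non-zero exit status
    )
    return completed.stdout
```

Keeping the flag list separate from the function makes it easy to compare a local run with the configuration reported here; whether a local run reproduces the same proof depends on the installed E version and hardware.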
% File : E---3.0
% Problem : SOT_78srga : v?.?.?
% Transfm : none
% Format : tptp:raw
% Command : run_E %s %d THM
% Computer :
% Model : x86_64 x86_64
% CPU : Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz
% Memory : 128720MB
% OS : Linux 3.10.0-1160.71.1.el7.x86_64
% CPULimit : 60s
% WCLimit : 0s
% DateTime : Fri Jul 29 06:14:27 EDT 2022
% Result : Theorem 14.38s 1.87s
% Output : CNFRefutation 14.38s
% Verified :
% SZS Type : Refutation
% Derivation depth : 16
% Number of leaves : 16
% Syntax : Number of formulae : 69 ( 26 unt; 8 typ; 0 def)
% Number of atoms : 141 ( 12 equ; 0 cnn)
% Maximal formula atoms : 7 ( 2 avg)
% Number of connectives : 545 ( 54 ~; 61 |; 7 &; 416 @)
% ( 2 <=>; 5 =>; 0 <=; 0 <~>)
% Maximal formula depth : 13 ( 6 avg)
% Number of types : 2 ( 0 usr)
% Number of type conns : 33 ( 33 >; 0 *; 0 +; 0 <<)
% Number of symbols : 10 ( 8 usr; 2 con; 0-3 aty)
% Number of variables : 77 ( 4 ^ 73 !; 0 ?; 77 :)
% Comments :
e: $i ).
s: $i > $i ).
d: $i > $o ).
f: $i > $i > $i ).
ind: ( $i > $o ) > $o ).
p: $i > $i > $o ).
esk1_1: ( $i > $o ) > $i ).
epred1_2: $i > $i > $i > $o ).
( p
= ( ^ [X3: $i,X2: $i] :
( ^ [X5: $i] :
! [X6: $i > $o] :
( ( ind @ X6 )
=> ( X6 @ X5 ) )
file('/tmp/AjzF3EheOy/SOT_78srga',p_def) ).
! [X2: $i] :
file('/tmp/AjzF3EheOy/SOT_78srga',a2) ).
! [X1: $i] :
file('/tmp/AjzF3EheOy/SOT_78srga',a1) ).
( ind
= ( ^ [X4: $i > $o] :
& ! [X3: $i] :
( ( X4 @ X3 )
file('/tmp/AjzF3EheOy/SOT_78srga',ind_def) ).
! [X3: $i] :
file('/tmp/AjzF3EheOy/SOT_78srga',a5) ).
! [X3: $i,X2: $i] :
file('/tmp/AjzF3EheOy/SOT_78srga',a3) ).
d @ e,
file('/tmp/AjzF3EheOy/SOT_78srga',a4) ).
file('/tmp/AjzF3EheOy/SOT_78srga',conj_0) ).
! [X13: $i,X14: $i] :
( ( p @ X13 @ X14 )
<=> ! [X6: $i > $o] :
( ( ind @ X6 )
=> ( X6 @ ( f @ X13 @ X14 ) ) ) ),
[inference(fof_simplification,[status(thm)],[p_def])]) ).
! [X16: $i] :
inference(variable_rename,[status(thm)],[a2]) ).
! [X15: $i] :
( ( f @ X15 @ e )
inference(variable_rename,[status(thm)],[a1]) ).
! [X12: $i > $o] :
( ( ind @ X12 )
<=> ( ( X12 @ e )
& ! [X3: $i] :
( ( X12 @ X3 )
[inference(fof_simplification,[status(thm)],[ind_def])]) ).
! [X24: $i,X25: $i,X26: $i > $o,X27: $i,X28: $i] :
( ( ~ ( p @ X24 @ X25 )
| ~ ( ind @ X26 )
& ( ( ind @ ( epred1_2 @ X27 @ X28 ) )
| ( p @ X27 @ X28 ) )
& ( ~ ( epred1_2 @ X27 @ X28 @ ( f @ X27 @ X28 ) )
| ( p @ X27 @ X28 ) ) ),
[inference(fof_nnf,[status(thm)],[c_0_8])])])])])]) ).
! [X1: $i] :
inference(split_conjunct,[status(thm)],[c_0_9]) ).
! [X1: $i] :
inference(split_conjunct,[status(thm)],[c_0_10]) ).
! [X20: $i > $o,X21: $i,X22: $i > $o] :
| ~ ( ind @ X20 ) )
& ( ~ ( X20 @ X21 )
| ( X20 @ ( s @ X21 ) )
| ~ ( ind @ X20 ) )
& ( ( X22 @ ( esk1_1 @ X22 ) )
| ( ind @ X22 ) )
& ( ~ ( X22 @ ( s @ ( esk1_1 @ X22 ) ) )
| ( ind @ X22 ) ) ),
[inference(fof_nnf,[status(thm)],[c_0_11])])])])])]) ).
! [X1: $i,X2: $i] :
( ( p @ X1 @ X2 )
| ~ ( epred1_2 @ X1 @ X2 @ ( f @ X1 @ X2 ) ) ),
inference(split_conjunct,[status(thm)],[c_0_12]) ).
inference(spm,[status(thm)],[c_0_13,c_0_14]) ).
! [X1: $i,X4: $i > $o] :
( ( X4 @ ( s @ X1 ) )
| ~ ( ind @ X4 ) ),
inference(split_conjunct,[status(thm)],[c_0_15]) ).
! [X1: $i,X2: $i] :
( ( ind @ ( epred1_2 @ X1 @ X2 ) )
| ( p @ X1 @ X2 ) ),
inference(split_conjunct,[status(thm)],[c_0_12]) ).
inference(spm,[status(thm)],[c_0_16,c_0_17]) ).
! [X1: $i,X2: $i,X3: $i] :
( ( epred1_2 @ X1 @ X2 @ ( s @ X3 ) )
| ( p @ X1 @ X2 )
| ~ ( epred1_2 @ X1 @ X2 @ X3 ) ),
inference(spm,[status(thm)],[c_0_18,c_0_19]) ).
inference(spm,[status(thm)],[c_0_20,c_0_21]) ).
! [X4: $i > $o] :
( ( X4 @ e )
| ~ ( ind @ X4 ) ),
inference(split_conjunct,[status(thm)],[c_0_15]) ).
| ~ ( epred1_2 @ e @ ( s @ e ) @ ( s @ e ) ) ),
inference(spm,[status(thm)],[c_0_22,c_0_21]) ).
! [X1: $i,X2: $i] :
( ( epred1_2 @ X1 @ X2 @ e )
| ( p @ X1 @ X2 ) ),
inference(spm,[status(thm)],[c_0_23,c_0_19]) ).
! [X1: $i,X2: $i,X4: $i > $o] :
| ~ ( ind @ X4 ) ),
inference(split_conjunct,[status(thm)],[c_0_12]) ).
! [X4: $i > $o] :
( ( X4 @ ( esk1_1 @ X4 ) )
| ( ind @ X4 )
| ~ ( X4 @ e ) ),
inference(split_conjunct,[status(thm)],[c_0_15]) ).
! [X1: $i] :
| ~ ( epred1_2 @ X1 @ e @ ( s @ e ) ) ),
inference(spm,[status(thm)],[c_0_16,c_0_14]) ).
inference(csr,[status(thm)],[inference(spm,[status(thm)],[c_0_24,c_0_21]),c_0_25]) ).
! [X1: $i,X4: $i > $o] :
( ( X4 @ ( f @ X1 @ ( esk1_1 @ ( p @ X1 ) ) ) )
| ( ind @ ( p @ X1 ) )
| ~ ( ind @ X4 ) ),
inference(spm,[status(thm)],[c_0_26,c_0_27]) ).
! [X1: $i] : ( p @ X1 @ e ),
inference(csr,[status(thm)],[inference(spm,[status(thm)],[c_0_28,c_0_21]),c_0_25]) ).
! [X4: $i > $o] :
| ~ ( ind @ X4 ) ),
inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_26,c_0_29]),c_0_17]) ).
! [X19: $i] :
inference(variable_rename,[status(thm)],[inference(fof_nnf,[status(thm)],[a5])]) ).
! [X1: $i,X4: $i > $o] :
( ( X4 @ ( f @ X1 @ ( esk1_1 @ ( p @ X1 ) ) ) )
| ( ind @ ( p @ X1 ) )
| ~ ( ind @ X4 ) ),
inference(cn,[status(thm)],[inference(rw,[status(thm)],[c_0_30,c_0_31])]) ).
! [X17: $i,X18: $i] :
inference(variable_rename,[status(thm)],[a3]) ).
! [X1: $i,X4: $i > $o] :
| ~ ( ind @ X4 ) ),
inference(spm,[status(thm)],[c_0_26,c_0_32]) ).
! [X4: $i > $o] :
( ( ind @ X4 )
| ~ ( X4 @ ( s @ ( esk1_1 @ X4 ) ) )
| ~ ( X4 @ e ) ),
inference(split_conjunct,[status(thm)],[c_0_15]) ).
! [X1: $i] :
| ~ ( d @ X1 ) ),
inference(split_conjunct,[status(thm)],[c_0_33]) ).
d @ e,
inference(split_conjunct,[status(thm)],[a4]) ).
! [X1: $i,X2: $i,X4: $i > $o] :
| ( ind @ ( p @ X2 ) )
| ~ ( ind @ X4 ) ),
inference(spm,[status(thm)],[c_0_26,c_0_34]) ).
! [X1: $i,X2: $i] :
inference(split_conjunct,[status(thm)],[c_0_35]) ).
inference(fof_simplification,[status(thm)],[inference(assume_negation,[status(cth)],[conj_0])]) ).
! [X1: $i,X2: $i,X4: $i > $o] :
| ~ ( ind @ X4 ) ),
inference(spm,[status(thm)],[c_0_26,c_0_36]) ).
( ( ind @ d )
| ~ ( d @ ( esk1_1 @ d ) ) ),
[inference(spm,[status(thm)],[c_0_37,c_0_38]),c_0_39])]) ).
! [X1: $i,X4: $i > $o] :
| ~ ( ind @ X4 ) ),
inference(spm,[status(thm)],[c_0_40,c_0_41]) ).
inference(split_conjunct,[status(thm)],[c_0_42]) ).
! [X1: $i,X4: $i > $o] :
| ~ ( ind @ X4 ) ),
inference(spm,[status(thm)],[c_0_43,c_0_41]) ).
ind @ d,
[inference(spm,[status(thm)],[c_0_44,c_0_27]),c_0_39])]) ).
! [X1: $i] :
inference(csr,[status(thm)],[inference(spm,[status(thm)],[c_0_16,c_0_45]),c_0_19]) ).
! [X1: $i,X2: $i,X3: $i] :
( ( epred1_2 @ X1 @ X2 @ ( f @ e @ ( s @ X3 ) ) )
| ( p @ X1 @ X2 )
| ~ ( epred1_2 @ X1 @ X2 @ ( s @ ( f @ e @ X3 ) ) ) ),
inference(spm,[status(thm)],[c_0_21,c_0_13]) ).
[inference(spm,[status(thm)],[c_0_46,c_0_47]),c_0_48])]) ).
! [X1: $i] :
[inference(spm,[status(thm)],[c_0_37,c_0_49]),c_0_31])]) ).
! [X1: $i] :
inference(spm,[status(thm)],[c_0_16,c_0_50]) ).
inference(spm,[status(thm)],[c_0_51,c_0_52]) ).
! [X1: $i] :
| ~ ( epred1_2 @ e @ ( s @ X1 ) @ ( f @ e @ X1 ) ) ),
inference(spm,[status(thm)],[c_0_53,c_0_21]) ).
inference(spm,[status(thm)],[c_0_54,c_0_52]) ).
inference(csr,[status(thm)],[inference(spm,[status(thm)],[c_0_55,c_0_34]),c_0_19]) ).
inference(spm,[status(thm)],[c_0_56,c_0_52]) ).
ind @ ( p @ e ),
[inference(spm,[status(thm)],[c_0_37,c_0_57]),c_0_31])]) ).
[inference(spm,[status(thm)],[c_0_58,c_0_52]),c_0_59])]),[proof] ).
% Running higher-order theorem proving
% Running: /home/tptp/Systems/E---3.0/eprover-ho --delete-bad-limit=2000000000
% --definitional-cnf=24 -s --print-statistics -R --print-version --proof-object
% --auto-schedule=8 --cpu-limit=60 /tmp/AjzF3EheOy/SOT_78srga
% # Version: 3.0pre003-ho
% # partial match(1): HSMSSMSSSSSNFFN
% # Preprocessing class: HSMSSMSSSSSNHFN.
% # Scheduled 4 strats onto 8 cores with 60 seconds (480 total)
% # Starting new_ho_10 with 300s (5) cores
% # Starting new_bool_1 with 60s (1) cores
% # Starting full_lambda_5 with 60s (1) cores
% # Starting new_ho_10_unif with 60s (1) cores
% # new_ho_10 with pid 63439 completed with status 0
% # Result found by new_ho_10
% # partial match(1): HSMSSMSSSSSNFFN
% # Preprocessing class: HSMSSMSSSSSNHFN.
% # Scheduled 4 strats onto 8 cores with 60 seconds (480 total)
% # Starting new_ho_10 with 300s (5) cores
% # No SInE strategy applied
% # Search class: HGUSS-FFSF21-MHFFFSBN
% # partial match(2): HGUSS-FFSF11-MHHFFSBN
% # Scheduled 6 strats onto 5 cores with 300 seconds (300 total)
% # Starting new_ho_10_unif with 163s (1) cores
% # Starting new_ho_10 with 31s (1) cores
% # Starting lpo8_s with 28s (1) cores
% # Starting sh5l with 28s (1) cores
% # Starting sh2lt with 28s (1) cores
% # new_ho_10 with pid 63446 completed with status 0
% # Result found by new_ho_10
% # partial match(1): HSMSSMSSSSSNFFN
% # Preprocessing class: HSMSSMSSSSSNHFN.
% # Scheduled 4 strats onto 8 cores with 60 seconds (480 total)
% # Starting new_ho_10 with 300s (5) cores
% # No SInE strategy applied
% # Search class: HGUSS-FFSF21-MHFFFSBN
% # partial match(2): HGUSS-FFSF11-MHHFFSBN
% # Scheduled 6 strats onto 5 cores with 300 seconds (300 total)
% # Starting new_ho_10_unif with 163s (1) cores
% # Starting new_ho_10 with 31s (1) cores
% # Preprocessing time : 0.004 s
% # Presaturation interreduction done
% # Proof found!
% # SZS status Theorem
% # SZS output start CNFRefutation
% See solution above
% # Parsed axioms : 14
% # Removed by relevancy pruning/SinE : 0
% # Initial clauses : 19
% # Removed in clause preprocessing : 6
% # Initial clauses in saturation : 13
% # Processed clauses : 4451
% # ...of these trivial : 952
% # ...subsumed : 2217
% # ...remaining for further processing : 1282
% # Other redundant clauses eliminated : 0
% # Clauses deleted for lack of memory : 0
% # Backward-subsumed : 77
% # Backward-rewritten : 139
% # Generated clauses : 54638
% # ...of the previous two non-redundant : 49054
% # ...aggressively subsumed : 0
% # Contextual simplify-reflections : 95
% # Paramodulations : 54638
% # Factorizations : 0
% # NegExts : 0
% # Equation resolutions : 0
% # Propositional unsat checks : 0
% # Propositional check models : 0
% # Propositional check unsatisfiable : 0
% # Propositional clauses : 0
% # Propositional clauses after purity: 0
% # Propositional unsat core size : 0
% # Propositional preprocessing time : 0.000
% # Propositional encoding time : 0.000
% # Propositional solver time : 0.000
% # Success case prop preproc time : 0.000
% # Success case prop encoding time : 0.000
% # Success case prop solver time : 0.000
% # Current number of processed clauses : 1053
% # Positive orientable unit clauses : 443
% # Positive unorientable unit clauses: 10
% # Negative unit clauses : 4
% # Non-unit-clauses : 596
% # Current number of unprocessed clauses: 44213
% # ...number of literals in the above : 85482
% # Current number of archived formulas : 0
% # Current number of archived clauses : 229
% # Clause-clause subsumption calls (NU) : 180166
% # Rec. Clause-clause subsumption calls : 178829
% # Non-unit clause-clause subsumptions : 2351
% # Unit Clause-clause subsumption calls : 4024
% # Rewrite failures with RHS unbound : 0
% # BW rewrite match attempts : 9295
% # BW rewrite match successes : 149
% # Condensation attempts : 4451
% # Condensation successes : 0
% # Termbank termtop insertions : 4271812
% # -------------------------------------------------
% # User time : 1.728 s
% # System time : 0.068 s
% # Total time : 1.795 s
% # Maximum resident set size: 1844 pages
% # -------------------------------------------------
% # User time : 8.644 s
% # System time : 0.424 s
% # Total time : 9.067 s
% # Maximum resident set size: 1732 pages
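The statistics block above is easier to compare across runs once it is turned into a table of numbers. The sketch below is one way to do that, assuming the output keeps the `% # key : value` layout shown in this appendix. The function names and regular expressions are illustrative and are not part of E or the TPTP tooling.

```python
import re
from typing import Dict, Optional

# Matches lines such as "% # Processed clauses : 4451" or "% # User time : 1.728 s".
STAT_LINE = re.compile(
    r"^%?\s*#\s*(?P<key>[^:]+?)\s*:\s*(?P<value>-?\d+(?:\.\d+)?)\s*(?:s|pages)?\s*$"
)
SZS_LINE = re.compile(r"SZS status\s+(\w+)")


def parse_e_statistics(output: str) -> Dict[str, float]:
    """Collect the numeric statistics lines of an E run into a dictionary."""
    stats: Dict[str, float] = {}
    for line in output.splitlines():
        match = STAT_LINE.match(line)
        if match:
            # Keep the first occurrence when a key is reported more than once.
            stats.setdefault(match.group("key"), float(match.group("value")))
    return stats


def szs_status(output: str) -> Optional[str]:
    """Return the SZS status word (for example "Theorem"), if one is present."""
    match = SZS_LINE.search(output)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "\n".join([
        "% # SZS status Theorem",
        "% # Processed clauses : 4451",
        "% # Generated clauses : 54638",
        "% # User time : 1.728 s",
    ])
    print(szs_status(sample), parse_e_statistics(sample))
```

The listing reports the timing block twice; keeping the first occurrence of each key is an arbitrary choice in this sketch, and a caller who needs both blocks would have to split the output first. Non-numeric lines such as the version string and the strategy schedule are skipped.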
D BCP with ind and p as comprehension
instances encoded in TPTP THF
See also the online sources at
% Declarations
thf(e_decl,type, e: $i ).
thf(s_decl,type, s: $i > $i ).
thf(d_decl,type, d: $i > $o ).
thf(f_decl,type, f: $i > $i > $i ).
% Auxiliary declarations
thf(ind_decl,type, ind: ( $i > $o ) > $o ).
thf(p_decl,type, p: $i > $i > $o ).
% Axioms
! [N: $i] :
! [Y: $i] :
! [X: $i,Y: $i] :
d @ e ).
! [X: $i] :
% Shorthand notations
! [Q: $i > $o] :
( ( ind @ Q )
& ! [X: $i] :
! [X: $i,Y: $i] :
= ( ^ [Z: $i] : ( ! [Q: $i > $o] :
( ( ind @ Q )
% Conjecture
E Plain proof by Leo-III for BCP with ind and
p as comprehension instances
Leo-III’s proof is provided online at:
F BCP with L encoded in TPTP THF
Encoding of BCP with L in TPTP THF syntax; cf. also the online sources:
% Declarations
thf(e_decl,type, e: $i ).
thf(s_decl,type, s: $i > $i ).
thf(d_decl,type, d: $i > $o ).
thf(f_decl,type, f: $i > $i > $i ).
% Axioms
! [N: $i] :
! [Y: $i] :
! [X: $i,Y: $i] :
d @ e ).
! [X: $i] :
% Shorthand notation p as comprehension instance
? [P: $i > $i > $o] :
( P
= ( ^ [X: $i,Y: $i] :
( ^ [Z: $i] :
! [R: $i > $o] :
( ( ^ [Q: $i > $o] :
& ! [X: $i] :
@ R )
% Conjecture
G Plain proof by E for BCP with L
E’s proof is provided online at: