Who Finds the Short Proof?

An Exploration of Boolos’ Curious Inference using

Higher-order Automated Theorem Proving

Christoph Benzmüller1,2*, David Fuenmayor1,2, Alexander Steen3 and Geoff Sutcliffe4

1AI Systems Engineering, Otto-Friedrich-University Bamberg,

An der Weberei 5, Bamberg, 96049, Germany.

2Department of Mathematics and Computer Science, FU Berlin,

Arnimallee 7, Berlin, 14195, Germany.

3Institute of Mathematics and Computer Science, University of

Greifswald, Walter-Rathenau-Str. 47, Greifswald, 17489,

Germany.

4Department of Computer Science, University of Miami,

1365 Memorial Drive, Coral Gables, FL 33124-4245, USA.

*Corresponding author(s). E-mail(s):

christoph.benzmueller@uni-bamberg.de;

Contributing authors: david.fuenmayor@uni-bamberg.de;

alexander.steen@uni-greifswald.de; geoff@cs.miami.edu

Abstract

This paper reports on an exploration of Boolos’ Curious Inference, using higher-order automated theorem provers (ATPs). Surprisingly, only suitable shorthand notations had to be provided by hand for ATPs to find a short proof. The higher-order lemmas required for constructing a short proof are automatically discovered by the ATPs. Given the observations and suggestions in this paper, full proof automation of Boolos’ and related examples now seems to be within reach of higher-order ATPs.

Keywords: Speedup of proofs; Boolos’ Curious Inference; Higher-order

automated theorem proving; Cut-introduction; Comprehension


1 Which Automated Theorem Prover to Choose for Really Difficult Problems?

Consider the following thought experiment: Shortly after their death, Folbert and Holly arrive at the gates of heaven, and both want to enter. Unfortunately only one of the two can be admitted, and they have to settle this in a little contest: Folbert and Holly are both asked to (i) choose either a first-order (FO) automated theorem prover (ATP) or a higher-order (HO) ATP; and then (ii) pose a FO proof problem, encode it in FO logic, and give it to the other’s ATP. The one whose ATP is faster at solving the proof problem posed to it will be admitted to heaven. The contest has a time limit: midnight the same day. If the battle ends in a draw because neither ATP finds a proof by midnight, the contest is repeated the next day, and so on. Which ATP should Folbert and Holly choose, and which proof problem? Is there a winning strategy?

FOlbert turns out to be a FO logic enthusiast, and announces that he will use a FO ATP. HOlly likes HO logic better, and chooses an HO ATP. Which exact system they choose is not revealed to the other person, nor how it functions internally. At first glance, Folbert seems to be in a more promising position because the proof problems they must provide have to be encoded in FO logic. However, as will be seen, only Holly has a realistic chance to win this contest (maybe not on the first day though), provided she chooses a suitable proof problem.

Key to Holly’s advantage are the (hyper-)exponentially shorter proofs that are possible as one moves up the ladder of expressiveness from first-order logic to second-order logic, to third-order logic, and so on [1]. The fact that the proof problems are stated in FO logic does not matter. When stating the same problem in the same FO way but in higher-order logic, much shorter proofs are possible, some of which might be (hyper-)exponentially shorter than the proofs that can be found with FO ATPs. A very prominent example of such a short proof is that of Boolos’ Curious Inference [2].

The key to the shorter proofs in HO logic is the possibility of introducing powerful shorthand notations in which structural aspects of the problem at hand can be expressed in ways that are not possible in FO logic. The use of HO variables and quantifiers in these shorthand notations provides an opportunity to shorten proof arguments by explicitly talking about and working with the mathematical structures involved. In Boolos’ Curious Inference this concerns an inductively defined, very fast growing, Ackermann-style function. A fascinating perspective on meta-level aspects of an otherwise mathematically boring, repetitive FO inference can be provided and exploited this way. A most interesting aspect of this paper is that the required “curious” lemmas, supplied by hand by the ingenious Boolos, can now be synthesized by modern HO ATPs.

Based on our experiments we argue that only Holly has a winning strategy in the contest. Holly is the only person who is able to reliably prevent her opponent from winning, provided she chooses Boolos’ challenge problem to be given to Folbert. Folbert can of course try something similar, but this cannot prevent Holly’s victory in the long run, since full proof automation of Boolos’ challenge problem is now within reach of HO ATPs. As the experiments reported in this paper show, significant progress has been achieved in HO ATP systems in recent years, which enables the automatic exploration of variants of Boolos’ Curious Inference. In particular, powerful and intuitive lemmas can be automatically discovered by HO ATPs, and used in short proofs. The only thing left for the human to do has been to suggest suitable shorthand notations.

In Section 3 the proofs produced by the E ATP [3], which are based on three lemmas, are discussed in some detail (we chose E because its proofs were the most readable and because E also performed best). Further proofs are reported by the HO ATPs cvc5 [4], Ehoh [5], Leo-III [6], and Zipperposition [7]; these contributions will be studied in more depth in future work. This paper also discusses the remaining key challenge that needs to be addressed by HO ATPs to enable full automation of finding a solution to Boolos’ and similar problems: constructing the required shorthand notations using controlled cut/comprehension introduction, as was already hinted at in [8].

Paper outline: Section 2 briefly recaps Boolos’ challenge problem, and points to some formalisations that have been verified with interactive HO proof assistants. Section 3 demonstrates that HO ATPs can easily solve Boolos’ problem and produce short proofs when suitable shorthand notations are provided, and presents and discusses the proofs found by E. Section 4 provides a survey of the results from several HO ATPs. Section 5 explains why the required shorthand notation cannot (yet) be generated with state-of-the-art HO ATPs, and points to interesting future research on controlled cut/comprehension introduction. Section 6 concludes the paper: Holly is the only one who will one day enter heaven.

2 Boolos’ Curious Inference

In his article “A Curious Inference” [2], Boolos presents the following proof

problem, consisting of axioms A1-A5 with the conjecture C. This challenge is

referred to as BCP (for Boolos’ Curious Problem) in the rest of this paper:

∀n. f(n, e) = s(e)   (A1)

∀y. f(e, s(y)) = s(s(f(e, y)))   (A2)

∀x y. f(s(x), s(y)) = f(x, f(s(x), y))   (A3)

d(e)   (A4)

∀x. d(x) → d(s(x))   (A5)

d(f(s(s(s(s(e)))), s(s(s(s(e))))))   (C)

In the first three axioms there are three uninterpreted FO constant symbols: a nullary FO constant symbol e (intuitively, think of it as the number one, if it helps), a unary FO function symbol s (think of it as the successor function), and a binary FO function symbol f, whose semantics is inductively characterised in the axioms A1-A3 (where A2 and A3 are recursive; note that A3 is actually doubly recursive). These axioms capture the fact that f belongs to a class of extremely fast growing functions, also known as Ackermann(-style) functions [9]. Exhaustive evaluation of the term f(s(s(s(s(e)))), s(s(s(s(e))))) with these recursive equations unfolds it to a term that contains more s’s than there are atoms in the universe. The conjecture C formulated by Boolos is to prove that d (think of it as the is-a-natural-number predicate) holds for f(s(s(s(s(e)))), s(s(s(s(e))))). d is actually an uninterpreted FO predicate symbol that holds for e (A4), and is propagated from any x to sx (A5). A formalisation of BCP in the TPTP THF language [10,11] can be found in Appendix A.
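To get a feeling for how quickly f grows, axioms A1-A3 can be evaluated directly on the positive integers. The following is a minimal sketch, assuming the informal reading suggested above (e as the number one, s as the successor function); only the name f mirrors the axioms, and the integer encoding is an illustration that is not part of the paper's formalisation.

```python
def f(x: int, y: int) -> int:
    """Evaluate Boolos' f on positive integers, reading e as 1 and s as +1.

    A1: f(n, 1)     = 2
    A2: f(1, y+1)   = f(1, y) + 2
    A3: f(x+1, y+1) = f(x, f(x+1, y))
    """
    if x < 1 or y < 1:
        raise ValueError("arguments must be positive integers")
    result = 2                          # A1: the value at y = 1
    for _ in range(y - 1):              # climb from y = 1 up to the requested y
        if x == 1:
            result += 2                 # A2
        else:
            result = f(x - 1, result)   # A3
    return result


if __name__ == "__main__":
    for x, y in [(1, 5), (2, 5), (3, 3), (3, 4), (4, 3)]:
        print(f"f({x}, {y}) = {f(x, y)}")
```

Under this reading f(1, y) = 2y and f(2, y) = 2^y, and f(3, 4) is already 65536. The term in conjecture C corresponds to f(5, 5), and even f(4, 4) = f(3, 65536) is a tower of 65536 twos, so neither call returns in practice.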

It is easy to see that this proof problem is solvable in FO logic by applying an astronomically large number of modus ponens steps to A4 and (instances of) A5. Boolos showed that for a cut-free FO calculus the magnitude of this number is comparable to f(5,5) for the Ackermann function f interpreted on the naturals. Due to the enormous number of required inference steps it is certain that no such proof can realistically be found, let alone represented, within cut-free FO proof systems. However, Boolos also showed that there exists a short proof of BCP in second-order logic. This proof has been encoded and verified in the mathematical proof assistant systems Omega [12] and Mizar [13] by Benzmüller and Brown [14], at essentially the same level of granularity as that of Boolos’ proof. This earlier work was recently repeated by Ketland [15] in the interactive proof assistant Isabelle/HOL [16].

The short proof by Boolos makes use of specific instances of the comprehension principle of second-order logic, an axiom schema given by

∃R. ∀x₁ … xₙ. R(x₁, …, xₙ) ↔ φ(x₁, …, xₙ)   (COM)

where φ(x₁, …, xₙ) is a second-order formula with x₁, …, xₙ among its free variables, and R is a second-order variable not free in φ. The comprehension principle postulates the existence of relations (or predicates) that can be defined within the second-order language, or, equivalently, that there exists some R that can act as a shorthand notation for the property expressed by φ(x₁, …, xₙ). For function symbols a similar axiom scheme exists. The idea is to choose the right instances of COM such that helpful lemmas can be used to enable a short (second-order) proof for BCP. Details are provided in Boolos’ paper.

In a HO logic based on λ-terms, such as Church’s type theory [17–19], the existence of R in COM is generally guaranteed: simply choose R = λx₁, …, xₙ. φ(x₁, …, xₙ). The COM instances are easily provable, and axiomatisations of the COM principle are generally avoided in HO ATPs. During HO proof search, λ-terms for R are often synthesised on the fly by HO unification. In less trivial cases, however, parts of the term R need to be “guessed” by the ATPs, e.g., by applying a technique called primitive substitution. A well known case is Cantor’s theorem, where an initial part of the diagonal set description needs to be guessed before the rest can be synthesised; see, e.g., [20,21] for further details on primitive substitution.
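The remark that COM instances are easily provable once λ-abstraction is available can be made concrete in a proof assistant. The following Lean 4 sketch covers the unary case only. The names com_instance, α, and φ are illustrative, and Lean's type theory is not Church's type theory, so this is an analogy and not a rendering of the calculi used by the HO ATPs.

```lean
-- Unary instance of COM: the witness for R is simply the λ-term for φ.
theorem com_instance {α : Type} (φ : α → Prop) :
    ∃ R : α → Prop, ∀ x, R x ↔ φ x :=
  ⟨fun x => φ x, fun _ => Iff.rfl⟩
```

The witness is written out by hand here. In HO proof search the same term would typically be found by HO unification, which is why the hard part is not proving a COM instance but deciding which φ is worth abbreviating.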

3 Proof Variants

Without further support BCP cannot be solved by today’s HO ATP systems.

Quite surprisingly (to the authors, at least), with a little help provided in the

form of suitable shorthand notations, the ATPs can automatically discover

lemmas that enable a short proof.

In the remainder of this paper we use Boolos’ convention to write sssse

instead of s(s(s(s(e)))); analogously for ssse, sse, se, sx, etc. Moreover, we

generally assume HO notation, so that e.g. f(sssse, sssse) is represented as

((f sssse)sssse) or simply as fssssesssse, avoiding unnecessary parentheses

and even spaces when the correct term and formula structures can be easily

inferred in context.

3.1 Variants using Two Shorthand Notations

When given the following two shorthand notations, various HO ATPs can find a short proof for BCP. The HO representation of BCP is augmented with two additional uninterpreted symbols ind and p. They are governed by two additional axioms, (Def ind) and (Def p), stating equalities between constant symbols and λ-terms.

ind = λX. Xe ∧ ∀x. Xx → Xsx   (Def ind)

p = λx y. (λz. ∀X. ind X → Xz) fxy   (Def p)

Recalling the discussion of COM in Section 2, note that ind and p are just shorthand notations for the HO λ-terms given on the right. The λ-term abbreviated by ind expresses what it means to be an inductive set defined over e and s. The λ-term abbreviated by p expresses that for any x and y given as arguments, fxy is in the smallest inductive set defined over e and s.¹ By adding these shorthand notations HO ATPs can find short proofs for BCP. In Section 4 the experiments conducted using the TPTP THF infrastructure for HO logic [11,22] are summarized. Interesting lemmas are automatically discovered, proven, and used. No mathematical or logical ingenuity is needed.

¹ It is acknowledged that the use of terminology might be slightly misleading here: s might not be injective, and e might have predecessors; in particular s-cycles have not been axiomatised away. There is thus an alternative explanation and terminology that could be used: The declarations of the constants e :: i and s :: i → i express (semantically speaking) that every model behaves as an algebra over the signature Σ = {e/0, s/1}. Then ind X means that X is a Σ-(sub)algebra of the model. Thus, the term “smallest inductive set defined over e and s” (suggestively called “N” by Boolos) corresponds to the infimum/intersection of all Σ-(sub)algebras. From universal algebra it is known that the set of all subalgebras of any algebra forms a complete lattice, where infimum corresponds to set-intersection, and thus N := ⋂ ind is guaranteed to exist. Hence p can be alternatively written as λx y. N (fxy).
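The reading of ind and of the smallest inductive set N given in footnote 1 can be checked on a small finite model. The sketch below is illustrative only: the six-element carrier, the choice e = 0, and the saturating successor s are assumptions made for this example and are not part of BCP. It enumerates all subsets X with ind X, intersects them, and compares the result with the closure of {e} under s.

```python
from itertools import combinations

DOMAIN = range(6)   # hypothetical finite carrier {0, ..., 5}
E = 0               # hypothetical interpretation of the constant e


def s(x: int) -> int:
    """Hypothetical successor: counts up to 3 and stays there; 4 and 5 map to 3."""
    return min(x + 1, 3)


def ind(X: frozenset) -> bool:
    """Def ind: X contains e and is closed under s."""
    return E in X and all(s(x) in X for x in X)


def smallest_inductive_set() -> frozenset:
    """N as the intersection of all inductive subsets of the carrier."""
    subsets = (frozenset(c)
               for r in range(len(DOMAIN) + 1)
               for c in combinations(DOMAIN, r))
    n = frozenset(DOMAIN)
    for X in subsets:
        if ind(X):
            n &= X
    return n


def closure_of_e() -> frozenset:
    """The same set, computed bottom-up by iterating s starting from e."""
    seen, x = set(), E
    while x not in seen:
        seen.add(x)
        x = s(x)
    return frozenset(seen)


if __name__ == "__main__":
    print(sorted(smallest_inductive_set()))
    print(smallest_inductive_set() == closure_of_e())
```

In this model the elements 4 and 5 are not reachable from e, so they lie outside N even though the whole carrier is inductive. This is the situation the footnote's caveat describes: p x y only claims that f x y lands in N, not that every element of the model does.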


The proof found by E can easily be converted into the following quite intuitive informal proof, divided into four parts. The first three parts I–III introduce and prove the key lemmas. This was done fully automatically by E. In the last part IV, the final refutation argument is automatically constructed. The new lemmas are used in this proof to derive a contradiction from the negated proof statement. The clause names mentioned in the text below reference the annotated formula names in the proofs shown in Appendices B and C, where the proof graphs and the proofs generated by E are presented.²

Part I

From A4, A5, and Def ind, it follows that d is an inductive set:

C48: ind d   (L1a)

From A1, Def ind, and Def p, it follows that:

C31: pxe for all x   (L1b)

From A1, A2, Def ind, and Def p, it follows that pe is an inductive set:

C59: ind pe   (L1c)

(Remember that p is a binary predicate. Modulo currying and the identification of sets with their characteristic functions, this means that pe, and its η-expansion λz. pez, denotes a unary predicate on entities.)

Proof. Fully formal proofs are provided in Appendix C.

The proof of L1a (see the derivation of clause C48 from A4, A5, and Def ind) is obvious.

Proof of L1b (see the derivation of clause C31 from axiom A1, Def ind, and Def p): From Def ind get C23: ind U → Ue and C18: ind U → (Ux → Usx), for all U and x. Moreover, from Def p infer that for all x and y there exists a property q_{xy} (depending on x and y), encoded by epred1_2 @ X @ Y in Appendix C, such that C16: q_{xy}(fxy) → pxy and C19: ind q_{xy} ∨ pxy. From C19 and C23 it follows C25: q_{xy} e ∨ pxy, for all x and y. From C19 and C18, for all x, y and z, get C21: (q_{xy} z → q_{xy} sz) ∨ pxy. From A1 and C16 get C28: q_{ze} se → pze, for all z. Finally obtain L1b, i.e., clause C31, from C28, C25, and C21.

Proof of L1c (see the derivation of clause C59 from A1, A2, Def ind, and Def p): Instantiating z with s(fez′) in C21 and applying the equation fesz′ = ssfez′, which follows from A2, get C50: (q_{xy}(sfez′) → q_{xy}(fesz′)) ∨ pxy, for all z′. From this and C16 obtain C53: q_{esz′}(sfez′) → pesz′ and also C55: q_{esz′}(fez′) → pesz′, both for all z′. Now, from A1 and C16 get C28: q_{ze} se → pze, for all z. Moreover, Def ind implies that for all properties U there exists some entity k_U (depending on U), encoded by esk1_1 @ U in Appendix C, such that C27: Ue → (ind U ∨ Uk_U) and C37: (Ue ∧ Usk_U) → ind U. From Def p it follows C26: (ind U ∧ pxy) → U fxy, for all x, y and U. Using C26 and C27 get that for all x there exists k_{px} such that for all U holds C30: (ind U ∧ pxe) → (ind px ∨ U fx(k_{px})). Apply L1b to infer that for all x there exists k_{px} such that for all U holds C34: ind U → (ind px ∨ U fx(k_{px})). Using C55 it then follows from C34 that C57: ind pe ∨ pes(k_{pe}). From C57 and C37 finally obtain L1c, i.e., clause C59. □

² In order to improve readability, Ci is used as a label for the clause named c_0_i in the formal proof by E. Labels are prepended to clauses with a colon, as in Ci: φ, where φ is some formula. Each clause reference is a hyperlink pointing to the clause representation in the original proof output of E (in a GitHub repository).

Part II

From A1, A2, A3, Def ind, and Def p, it follows:

C47: ∀x. ∀Y. (ind px ∧ ind psx ∧ ind Y) → Y fsxsssse   (L2)

Informally, for all entities x and sets Y, if px, psx, and Y are inductive sets (over e and s), then f sx sssse is in Y.

Proof. See the derivation of clause C47 from clauses C8–C11, corresponding to Def p, A2, A1, and Def ind, in Appendix C: From A1 and A2 infer C17: fese = ssse, so that with C16 it follows C20: q_{ese} ssse → pese, where q is again encoded by epred1_2 in Appendix C. From this and C21 obtain C22: q_{ese} sse → pese and C24: q_{ese} se → pese, which together with C25 leads to C29: pese. Using C26, i.e., Def p, and fese = ssse obtain C32: ind U → U(ssse), for all U. Further application of C26 leads to C36: (ind U ∧ ind px) → U fxssse and subsequently to C43: (ind U ∧ ind px ∧ ind py) → U fxfyssse. Finally, since fsxsy = fxfsxy by A3, obtain L2, i.e., clause C47. □

Part III

From A1, A3, Def ind, and Def p, it follows:

C52: ∀x. ind px → ind psx   (L3)

Informally, for any x: if px is an inductive set (over e and s), then so is psx.

Proof. See the derivation of clause C52 from clauses C8, C10, C11, and C35, corresponding to Def p, A1, Def ind, and A3, in Appendix C: From A1, Def ind, and Def p lemma L1b has been established, given by C31: pxe, for all x. From Def ind it follows that C37: (Ue ∧ Usk_U) → ind U for all properties U. Clauses C26 and C34 have already been inferred above. Together with A3, it follows that C45: (ind px ∧ ind U) → (U fsx s(k_{psx}) ∨ ind psx) for every x and every property U. From clause C45 and Def p it follows that C49: ind px → (psx s(k_{psx}) ∨ ind psx) for all x, where k is again the Skolem function called esk1_1 in Appendix C. An application of C49 with C37 yields (ind px ∧ psxe) → ind psx, i.e., choose U = psx. With a simple application of lemma L1b, instantiated for sx, obtain L3, i.e., clause C52. □


Part IV: Proof of C by reductio ad absurdum

To prove C assume ¬C and derive a contradiction. Assume ¬d fssssesssse (clauses C42 and C46). From this and L1 obtain that psssse is not an inductive set or that pssse is not an inductive set, i.e., clause C51. Use L3 to show that pssse, psse and pse are not inductive sets (clauses C54, C56, and C58). From these, together with L1 and L3, derive a contradiction. By reductio ad absurdum C holds. □

Part IV′: Constructive proof of C

The refutation argument IV can be converted into the constructive argument IV′, avoiding reductio ad absurdum: From L1 and L3, pse is an inductive set. By further applications of L3, psse, pssse and psssse are inductive sets. Use L1a and L2 to conclude d fssssesssse. □
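The constructive argument IV′ is short enough to be replayed step by step. The following Lean 4 sketch does so under stated assumptions: ind and p are kept abstract, and the lemmas L1a, L1c, L2, and L3 are taken as hypotheses in the form stated in Parts I–III, so nothing about how E derives them is claimed here. The type ι and the hypothesis names are illustrative choices.

```lean
-- Part IV′ replayed: the lemmas are assumed, only their combination is checked.
example {ι : Type} (e : ι) (s : ι → ι) (f : ι → ι → ι) (d : ι → Prop)
    (ind : (ι → Prop) → Prop) (p : ι → ι → Prop)
    (L1a : ind d)
    (L1c : ind (p e))
    (L2 : ∀ (x : ι) (Y : ι → Prop),
      ind (p x) ∧ ind (p (s x)) ∧ ind Y → Y (f (s x) (s (s (s (s e))))))
    (L3 : ∀ x : ι, ind (p x) → ind (p (s x))) :
    d (f (s (s (s (s e)))) (s (s (s (s e))))) :=
  -- four applications of L3, starting from L1c
  have h1 : ind (p (s e)) := L3 e L1c
  have h2 : ind (p (s (s e))) := L3 (s e) h1
  have h3 : ind (p (s (s (s e)))) := L3 (s (s e)) h2
  have h4 : ind (p (s (s (s (s e))))) := L3 (s (s (s e))) h3
  -- L2 with x := sss e and Y := d, closed off by L1a
  L2 (s (s (s e))) d ⟨h3, h4, L1a⟩
```

Only six lemma applications are needed at this level, which is the sense in which the proof is short: all of the astronomically long FO reasoning is absorbed into the lemmas about inductive sets.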

Note on shorter proofs

By systematic experimentation using HO ATPs it is possible to further simplify the proof found by E, and to reduce dependencies. For example (as pointed out by an anonymous reviewer), the ind px in lemma L2 can be avoided and L2 can be proved from just Def p and Def ind.

3.2 Variants using One Shorthand Notation

It turns out that two separate shorthand notations, ind and p as introduced above, are not needed: ind can be avoided by unfolding it (without β-reducing) in p. The HO ATPs solve this alternative problem formulation even quicker than before. Instead of ind and p use:

p′ = λx y. (λz. ∀X. (λQ. Qe ∧ ∀w. Qw → Qsw) X → Xz) fxy   (Def p′)

Alternatively, just the following lemma can be suggested, proven, and subsequently used:

∃p′. p′ = λx y. (λz. ∀X. (λQ. Qe ∧ ∀w. Qw → Qsw) X → Xz) fxy   (L)

This corresponds to (and is equivalent to) the following instance L′ of COM:

∃p′. ∀x y. p′xy ↔ (λz. ∀X. (λQ. Qe ∧ ∀w. Qw → Qsw) X → Xz) fxy   (L′)

L and L′ illustrate the relationship to comprehension and cut-introduction. Once introduced, these lemmas can be easily proven by HO ATPs using HO unification, which simply instantiates the existentially quantified p′ with the λ-term λx y. (λz. ∀X. (λQ. Qe ∧ ∀w. Qw → Qsw) X → Xz) fxy. Solving BCP fully automatically using HO ATPs thus boils down to speculating L (or L′), proving it, and then proving BCP using L (or L′). The HO ATPs can use the shorthand notations to automatically discover the required cut-lemmas L1a, L1b, L2, and L3. The encoding of BCP with only L as an axiom is presented in Appendix F. A refutation proof for this formulation of BCP, found by E in a fraction of a second (0.2s CPU time; see Section 4.2), is shown in Appendix G.

4 Results of Experiments with HO ATPs

To assess the performance and robustness of HO ATPs in finding short proofs for BCP, experiments with different problem encodings were done, using the ATPs cvc5 1.0 [4], E 3.0 [3], Leo-III 1.7.0 [6], Vampire 4.7 [23], and Zipperposition 2.1 [7]. These systems were deployed on the StarExec [24] Miami cluster running octa-core Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz CPUs, 128GB memory, and CentOS Linux 7.4.1708 (Core) operating system. A CPU time limit of 300s was set.

The range of CPU times reported below is indicative of the range of difficulty of the problems for ATP. Comparison of the proof inference statistics between different ATPs is not meaningful, but for the E ATP the statistics indicate that the proofs are of comparable complexity.

4.1 Using Two Shorthand Notations

When using the two shorthand notations ind and p discussed in Section 3.1, several HO ATPs are able to prove the BCP encoding given in Appendix A. It is solved by E (13.8s CPU time, 45 inferences in the refutation, 16 inferences deep), Leo-III (231.0s CPU time, 445 inferences in the refutation, 36 inferences deep), and Zipperposition (85.7s CPU time, 29 inferences in the refutation, 6 inferences deep).³ The proof by E, discussed in Section 3, is presented in Appendix C.

When encoding ind and p as comprehension instances (see Appendix D) BCP is proven by only Leo-III (215.8s CPU time, 409 inferences in the refutation, 37 inferences deep). The resolution proof generated by Leo-III is presented in Appendix E.

4.2 Using One Shorthand Notation

A problem encoding using the comprehension instance L from Section 3.2 as an axiom is shown in Appendix F. This problem is solved by only E (0.2s CPU time, 53 inferences in the refutation, 20 inferences deep). The resolution proof is shown in Appendix G. The related encoding using Def p′ as an axiom was also proven by only E (0.3s CPU time, 47 inferences in the refutation, 17 inferences deep).

5 Future Challenge: Shorthand Invention

Why are the shorthand notations ind and p not found automatically by the HO ATPs? Or, alternatively, why is lemma L (or L′) not automatically introduced, proven, and then used to obtain a short proof for BCP, by the HO ATPs? The answer is that the HO ATPs, in the tradition of FO ATPs, put a strong focus on cut-elimination. They do not incorporate suitable support for controlled cut/comprehension introduction in their proof search. While cut-freedom is clearly a desirable property of calculi (a kind of quality seal), things become really interesting, from a mathematical, cognitive, and AI perspective, when cut-elimination is given up at least partially, and some forms of controlled cut-introduction are applied. Once controlled cut-introduction steps are applied, powerful further key lemmas can be synthesised automatically within a cut-free resolution-style calculus. In this sense HO ATPs still neglect a crucial expressivity advantage that they enjoy over FO ATPs. The situation has already been discussed in a prior paper [8], but so far not much attention has been paid to this aspect in the HO ATP community.

³ An anonymous reviewer was also able to find a proof with Vampire 4.6, when using a specific parameter setting.

The experiments with HO ATPs conducted for this paper, and the

discussion above, illustrate the following:

1. Cut-elimination, which has attracted the interest of many ATP researchers and theoreticians, is important for achieving robust proof automation. The price of cut-elimination is, however, that some short proofs are eliminated from the search space; see also [25]. This includes short proofs that can be found by today’s HO ATPs when the right cut/comprehension introduction steps are applied. The complete avoidance of cut/comprehension introduction turns certain solvable problems into unsolvable ones; BCP is one such example. A hybrid approach seems to be required.

2. Controlled cut/comprehension introduction should be considered to be a challenge for the 21st century, expanding on the progress that has been made with regard to cut-elimination since the last century. Machine learning may have an interesting role to play here.

3. To some extent statement 2 is not entirely new. Earlier research on induction theorem proving, proof planning and proof methods [26–28] can be seen as pioneering work on controlled cut/comprehension introduction. Unfortunately this line of research did not receive the attention that it deserved at the time, due to the lack of robustness and coverage of the proof planning approaches, especially when compared with the much more technically advanced FO ATPs of the time; cf. the discussions by Bundy [29] and Benzmüller et al. [30]. There is, however, successful recent related work in this area, including, e.g., work on lemma discovery for induction [31].

The shorthand notations ind and p, resp. lemma L, introduced to solve BCP with HO ATPs, are actually less out of the blue than they might seem at first glance. Rather, they are indicative of a general proof method:

1. Introducing shorthand notation for set predicates such as ind seems a good idea in general when certain inductive definitions are found in the input problem, as exemplified here by axioms A1–A3 of the Ackermann function. The formulation of such predicates is quite straightforward.

2. The systematic introduction of related p-predicates also seems quite practicable. The idea is to express that results of applying an inductively defined function are always contained in the smallest ind-set. As noted in footnote 1, this sort of operation can also be viewed from a compositional (algebraic) perspective.

3. Alternatively, these two steps can be combined and a respective analogue to lemma L can be used.

In future work we plan to experiment with the implementation and assessment of such a general method for controlled cut/comprehension introduction in HO ATPs, in order to enable HO ATPs to find short proofs for problems like BCP. This will include further investigation of the surprise effect revealed in this paper: By proposing appropriate shorthand notation, inspirational lemma introduction steps that would normally require further cut-introductions are now synthesized by the cut-free search procedures in HO ATPs. This unexpected observation deserves further attention and clarification.

The ability to find or construct short proofs plays an important role not only for FO and HO ATPs, but also for SAT and SMT solvers; see, e.g., the techniques presented by Heule et al. [32]. Such techniques are orthogonal to the findings reported in this paper, which exploit the gain in expressivity when moving from a less expressive logic to a more expressive one.

6 Conclusion

Holly has a strategy to never lose the contest, provided she chooses BCP as her challenge problem. Whatever FO ATP Folbert chooses, it will never be able to find and express a short proof for BCP in its FO calculus. On the flip side, if Holly chooses a state-of-the-art HO ATP, while it might not be able to solve Folbert’s challenge problem on the first days of the contest, one day it will integrate the controlled cut/comprehension introduction techniques (and include further related techniques exploiting HO expressiveness), so that it will be able to speculate the necessary lemmas, and be able to find a short proof for BCP (and other related challenge problems). The core observation is that Holly’s HO ATP can steadily be improved regarding (i) its traditional FO proof search capabilities, and (ii) clever exploitation of its HO expressivity advantage. In contrast, Folbert’s FO ATP is stuck with only (i), which will never lead to a solution of BCP. Even if the contest might (for the time being) end up in numerous draws, only Holly ever has a chance of entering heaven.

Acknowledgements

We thank the anonymous reviewers for their valuable feedback.

References

[1] Gödel, K.: Über die Länge von Beweisen. In: Menger, K., Gödel, K., Wald, A. (eds.) Ergebnisse Eines Mathematischen Kolloquiums: Heft X7: 1934-1935, pp. 23–24. Franz Deuticke, Vienna (1936)

[2] Boolos, G.: A curious inference. J. Philos. Log. 16(1), 1–12 (1987). https://doi.org/10.1007/BF00250612

[3] Schulz, S., Cruanes, S., Vukmirovic, P.: Faster, higher, stronger: E 2.3. In: Fontaine, P. (ed.) CADE 2019. Lecture Notes in Computer Science, vol. 11716, pp. 495–507. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29436-6_29

[4] Barbosa, H., Barrett, C.W., Brain, M., Kremer, G., Lachnitt, H., Mann, M., Mohamed, A., Mohamed, M., Niemetz, A., Nötzli, A., Ozdemir, A., Preiner, M., Reynolds, A., Sheng, Y., Tinelli, C., Zohar, Y.: cvc5: A versatile and industrial-strength SMT solver. In: Fisman, D., Rosu, G. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 28th International Conference, TACAS 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022, Munich, Germany, April 2-7, 2022, Proceedings, Part I. Lecture Notes in Computer Science, vol. 13243, pp. 415–442. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-99524-9_24

[5] Vukmirovic, P., Blanchette, J., Cruanes, S., Schulz, S.: Extending a brainiac prover to lambda-free higher-order logic. International Journal on Software Tools for Technology Transfer 24(1), 67–87 (2022). https://doi.org/10.1007/s10009-021-00639-7

[6] Steen, A., Benzmüller, C.: Extensional higher-order paramodulation in Leo-III. Journal of Automated Reasoning 65(6), 775–807 (2021). https://doi.org/10.1007/s10817-021-09588-x

[7] Bentkamp, A., Blanchette, J., Tourret, S., Vukmirovic, P., Waldmann, U.: Superposition with lambdas. Journal of Automated Reasoning 65(7), 893–940 (2021). https://doi.org/10.1007/s10817-021-09595-y

[8] Benzmüller, C., Kerber, M.: A lost proof. In: Proceedings of the IJCAR 2001 Workshop: Future Directions in Automated Reasoning, Siena, Italy, pp. 13–24 (2001). https://www.inf.ed.ac.uk/publications/online/0046/b040.pdf

[9] Ackermann, W.: Zum Hilbertschen Aufbau der reellen Zahlen. Mathematische Annalen 99(1), 118–133 (1928). https://doi.org/10.1007/BF01459088

[10] Sutcliffe, G.: The logic languages of the TPTP world. Logic Journal of the IGPL (2022). https://doi.org/10.1093/jigpal/jzac068

Who Finds the Short Proof? 13

[11] Sutcliffe, G., Benzmüller, C.: Automated reasoning in higher-order logic using the TPTP THF infrastructure. Journal of Formalized Reasoning 3(1), 1–27 (2010). https://doi.org/10.6092/issn.1972-5787/1710

[12] Autexier, S., Benzmüller, C., Dietrich, D., Siekmann, J.: OMEGA: Resource-adaptive processes in an automated reasoning systems. In: Crocker, M.W., Siekmann, J. (eds.) Resource-Adaptive Cognitive Processes. Cognitive Technologies, pp. 389–423. Springer, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-540-89408-7_17

[13] Bancerek, G., Bylinski, C., Grabowski, A., Kornilowicz, A., Matuszewski, R., Naumowicz, A., Pak, K., Urban, J.: Mizar: State-of-the-art and beyond. In: Kerber, M., Carette, J., Kaliszyk, C., Rabe, F., Sorge, V. (eds.) Intelligent Computer Mathematics - International Conference, CICM 2015, Washington, DC, USA, July 13-17, 2015, Proceedings. Lecture Notes in Computer Science, vol. 9150, pp. 261–279. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20615-8_17

[14] Benzmüller, C., Brown, C.: The curious inference of Boolos in MIZAR and OMEGA. In: Matuszewski, R., Zalewska, A. (eds.) From Insight to Proof – Festschrift in Honour of Andrzej Trybulec. Studies in Logic, Grammar, and Rhetoric, vol. 10(23), pp. 299–388. The University of Bialystok, Poland (2007). http://mizar.org/trybulec65/20.pdf

[15] Ketland, J.: Boolos’s curious inference in Isabelle/HOL. Archive of Formal Proofs, 1–19 (2022). https://isa-afp.org/entries/Boolos_Curious_Inference.html

[16] Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL - A Proof Assistant for Higher-Order Logic. Lecture Notes in Computer Science, vol. 2283. Springer, Berlin, Heidelberg (2002). https://doi.org/10.1007/3-540-45949-9

[17] Church, A.: A formulation of the simple theory of types. Journal of Symbolic Logic 5(2), 56–68 (1940). https://doi.org/10.2307/2266170

[18] Andrews, P.B.: An Introduction to Mathematical Logic and Type Theory. Applied Logic Series. Springer, Netherlands (2002). https://doi.org/10.1007/978-94-015-9934-4

[19] Benzmüller, C., Andrews, P.: Church’s type theory. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, Summer 2019 edn., pp. 1–62. Metaphysics Research Lab, Stanford University, CA, USA (2019). https://plato.stanford.edu/entries/type-theory-church/

[20] Andrews, P.B.: On connections and higher-order logic. Journal of Automated Reasoning 5(3), 257–291 (1989). https://doi.org/10.1007/BF00248320

[21] Benzmüller, C., Sultana, N., Paulson, L.C., Theiss, F.: The higher-order prover LEO-II. Journal of Automated Reasoning 55(4), 389–404 (2015). https://doi.org/10.1007/s10817-015-9348-y

[22] Sutcliffe, G.: The TPTP problem library and associated infrastructure - from CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning 59(4), 483–502 (2017). https://doi.org/10.1007/s10817-017-9407-7

[23] Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) Computer Aided Verification - 25th International Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013. Proceedings. Lecture Notes in Computer Science, vol. 8044, pp. 1–35. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_1

[24] Stump, A., Sutcliffe, G., Tinelli, C.: StarExec: A cross-community infrastructure for logic solving. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) Automated Reasoning - 7th International Joint Conference, IJCAR 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 19-22, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8562, pp. 367–373. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08587-6_28

[25] Boolos, G.: Don’t eliminate cut. Journal of Philosophical Logic 13(4), 373–378 (1984). https://doi.org/10.1007/BF00247711

[26] Boyer, R.S., Moore, J.S.: A Computational Logic. ACM monograph series. Academic Press, London (1979)

[27] Bundy, A.: A science of reasoning. In: Lassez, J., Plotkin, G.D. (eds.) Computational Logic - Essays in Honor of Alan Robinson, pp. 178–198. The MIT Press, Cambridge, MA, USA (1991). https://mitpress.mit.edu/9780262121569/

[28] Melis, E., Siekmann, J.H.: Knowledge-based proof planning. Artificial Intelligence 115(1), 65–105 (1999). https://doi.org/10.1016/S0004-3702(99)00076-4

[29] Bundy, A.: A critique of proof planning. In: Kakas, A.C., Sadri, F. (eds.) Computational Logic: Logic Programming and Beyond, Essays in Honour of Robert A. Kowalski, Part II. Lecture Notes in Computer Science, vol. 2408, pp. 160–177. Springer, Berlin, Heidelberg (2002). https://doi.org/10.1007/3-540-45632-5_7

[30] Benzmüller, C., Meier, A., Melis, E., Pollet, M., Sorge, V.: Proof planning: A fresh start? In: Proceedings of the IJCAR 2001 Workshop: Future Directions in Automated Reasoning, Siena, Italy, pp. 25–37 (2001). https://www.researchgate.net/publication/319393675

[31] Johansson, M.: Lemma discovery for induction - A survey. In: Kaliszyk, C., Brady, E.C., Kohlhase, A., Coen, C.S. (eds.) Intelligent Computer Mathematics - 12th International Conference, CICM 2019, Prague, Czech Republic, July 8-12, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11617, pp. 125–139. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23250-4_9

[32] Heule, M.J.H., Kiesl, B., Biere, A.: Short proofs without new variables. In: de Moura, L. (ed.) Automated Deduction - CADE 26 - 26th International Conference on Automated Deduction, Gothenburg, Sweden, August 6-11, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10395, pp. 130–147. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63046-5_9

A BCP with ind and p encoded in TPTP THF

Encoding of BCP with ind and p in TPTP THF syntax [11]; see also the online sources at https://github.com/cbenzmueller/BoolosCuriousInference-ATP/tree/main/Boolos1.p.

% Declarations

thf(e_decl,type, e: $i ).

thf(s_decl,type, s: $i > $i ).

thf(d_decl,type, d: $i > $o ).

thf(f_decl,type, f: $i > $i > $i ).

% Auxiliary declarations

thf(ind_decl,type, ind: ( $i > $o ) > $o ).

thf(p_decl,type, p: $i > $i > $o ).

% Axioms

thf(a1,axiom,

! [N: $i] :

((f@N@e)

=(s@e))).

thf(a2,axiom,

! [Y: $i] :

((f@e@(s@Y))

=(s@(s@(f@e@Y))))).

thf(a3,axiom,

! [X: $i,Y: $i] :

((f@(s@X)@(s@Y))

=(f@X@(f@(s@X)@Y)))).

thf(a4,axiom,

d @ e ).

thf(a5,axiom,

! [X: $i] :

((d@X)

=>(d@(s@X)))).

% Shorthand notations

thf(ind_def,axiom,

( ind

= ( ^ [Q: $i > $o] :

((Q@e)

& ! [X: $i] :

((Q@X)

=>(Q@(s@X))))))).

thf(p_def,axiom,

( p

= ( ^ [X: $i,Y: $i] :

( ^ [X2: $i] :

! [X3: $i > $o] :

( ( ind @ X3 )

=> ( X3 @ X2 ) )

@(f@X@Y))))).

% Conjecture

thf(conj_0,conjecture,

d@(f@(s@(s@(s@(s@e))))@(s@(s@(s@(s@e)))))).
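To give a feel for how quickly the function axiomatized by a1, a2 and a3 grows, the following Python sketch evaluates it on small arguments. It is an illustration added to this appendix, not part of the online sources. It assumes that e is read as 0 and s as the successor on natural numbers, so a term is represented by the number of s applications in its normal form; the name `f_value` is hypothetical.

```python
def f_value(x: int, y: int) -> int:
    """Count the s-applications in the normal form of f(s^x(e), s^y(e)).

    Assumed reading for this sketch: e is 0 and s is the successor function.
    """
    value = 1  # a1: f(n, e) = s(e)
    for _ in range(y):
        if x == 0:
            # a2: f(e, s(y)) = s(s(f(e, y)))
            value += 2
        else:
            # a3: f(s(x), s(y)) = f(x, f(s(x), y))
            value = f_value(x - 1, value)
    return value


if __name__ == "__main__":
    # Only small arguments are feasible with this naive evaluation.
    for x, y in [(0, 1), (1, 3), (2, 3)]:
        print(f"f({x}, {y}) = {f_value(x, y)}")
```

Under this reading, `f_value(0, 1)` is 3, which agrees with the equation f@e@(s@e) = s@(s@(s@e)) that E derives as clause c_0_17, and `f_value(2, 3)` is already 65535. The conjecture conj_0 concerns the arguments (4, 4), far beyond what unfolding the axioms step by step can reach, which is why the shorthand notations ind and p matter.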

C Plain proof by E for BCP with ind and p

See also the online sources at https://github.com/cbenzmueller/BoolosCuriousInference-ATP/tree/main/Boolos1.proof. The exact call to prover E was as follows, where /tmp/AjzF3EheOy/SOT_78srga refers to the input file for E as presented in Appendix A:

/home/tptp/Systems/E---3.0/eprover-ho --delete-bad-limit=2000000000 --definitional-cnf=24

-s --print-statistics -R --print-version --proof-object --auto-schedule=8 --cpu-limit=60

/tmp/AjzF3EheOy/SOT_78srga
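For readers who want to repeat the run locally, the call above can be assembled programmatically. The Python sketch below only builds the argument list; the flags are copied from the call shown above, while the function name `build_eprover_command`, the default binary name and the example problem path are assumptions made for illustration.

```python
from typing import List


def build_eprover_command(problem_path: str,
                          cpu_limit: int = 60,
                          binary: str = "eprover-ho") -> List[str]:
    """Return the argument list for the E call reported in this appendix.

    `binary` and `problem_path` are placeholders: adjust them to the local
    installation and to the location of the problem file.
    """
    return [
        binary,
        "--delete-bad-limit=2000000000",
        "--definitional-cnf=24",
        "-s",
        "--print-statistics",
        "-R",
        "--print-version",
        "--proof-object",
        "--auto-schedule=8",
        f"--cpu-limit={cpu_limit}",
        problem_path,
    ]


command = build_eprover_command("Boolos1.p")
```

The resulting list can be passed to `subprocess.run(command, capture_output=True, text=True)`, provided a higher-order build of E is installed; whether a different E version accepts exactly these options is not checked here.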

%------------------------------------------------------------------------------

% File : E---3.0

% Problem : SOT_78srga : v?.?.?

% Transfm : none

% Format : tptp:raw

% Command : run_E %s %d THM

% Computer : quoll.cs.miami.edu

% Model : x86_64 x86_64

% CPU : Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz

% Memory : 128720MB

% OS : Linux 3.10.0-1160.71.1.el7.x86_64

% CPULimit : 60s

% WCLimit : 0s

% DateTime : Fri Jul 29 06:14:27 EDT 2022

% Result : Theorem 14.38s 1.87s

% Output : CNFRefutation 14.38s

% Verified :

% SZS Type : Refutation

% Derivation depth : 16

% Number of leaves : 16

% Syntax : Number of formulae : 69 ( 26 unt; 8 typ; 0 def)

% Number of atoms : 141 ( 12 equ; 0 cnn)

% Maximal formula atoms : 7 ( 2 avg)

% Number of connectives : 545 ( 54 ~; 61 |; 7 &; 416 @)

% ( 2 <=>; 5 =>; 0 <=; 0 <~>)

% Maximal formula depth : 13 ( 6 avg)

% Number of types : 2 ( 0 usr)

% Number of type conns : 33 ( 33 >; 0 *; 0 +; 0 <<)

% Number of symbols : 10 ( 8 usr; 2 con; 0-3 aty)

% Number of variables : 77 ( 4 ^ 73 !; 0 ?; 77 :)

% Comments :

%------------------------------------------------------------------------------

thf(decl_22,type,

e: $i ).

thf(decl_23,type,

s: $i > $i ).

thf(decl_24,type,

d: $i > $o ).

thf(decl_25,type,

f: $i > $i > $i ).

thf(decl_26,type,

ind: ( $i > $o ) > $o ).

thf(decl_27,type,

p: $i > $i > $o ).

thf(decl_28,type,

esk1_1: ( $i > $o ) > $i ).

thf(decl_29,type,

epred1_2: $i > $i > $i > $o ).

thf(p_def,axiom,

( p

= ( ^ [X3: $i,X2: $i] :

( ^ [X5: $i] :

! [X6: $i > $o] :

( ( ind @ X6 )

=> ( X6 @ X5 ) )

@(f@X3@X2)))),

file('/tmp/AjzF3EheOy/SOT_78srga',p_def) ).

thf(a2,axiom,

! [X2: $i] :

((f@e@(s@X2))

=(s@(s@(f@e@X2)))),

file('/tmp/AjzF3EheOy/SOT_78srga',a2) ).

thf(a1,axiom,

! [X1: $i] :

((f@X1@e)

=(s@e)),

file('/tmp/AjzF3EheOy/SOT_78srga',a1) ).

thf(ind_def,axiom,

( ind

= ( ^ [X4: $i > $o] :

((X4@e)

& ! [X3: $i] :

( ( X4 @ X3 )

=>(X4@(s@X3)))))),

file('/tmp/AjzF3EheOy/SOT_78srga',ind_def) ).

thf(a5,axiom,

! [X3: $i] :

((d@X3)

=>(d@(s@X3))),

file('/tmp/AjzF3EheOy/SOT_78srga',a5) ).

thf(a3,axiom,

! [X3: $i,X2: $i] :

((f@(s@X3)@(s@X2))

=(f@X3@(f@(s@X3)@X2))),

file('/tmp/AjzF3EheOy/SOT_78srga',a3) ).

thf(a4,axiom,

d @ e,

file('/tmp/AjzF3EheOy/SOT_78srga',a4) ).

thf(conj_0,conjecture,

d@(f@(s@(s@(s@(s@e))))@(s@(s@(s@(s@e))))),

file('/tmp/AjzF3EheOy/SOT_78srga',conj_0) ).

thf(c_0_8,plain,

! [X13: $i,X14: $i] :

( ( p @ X13 @ X14 )

<=> ! [X6: $i > $o] :

( ( ind @ X6 )

=> ( X6 @ ( f @ X13 @ X14 ) ) ) ),

inference(fof_simplification,[status(thm)],

[inference(fof_simplification,[status(thm)],[p_def])]) ).

thf(c_0_9,plain,

! [X16: $i] :

((f@e@(s@X16))

=(s@(s@(f@e@X16)))),

inference(variable_rename,[status(thm)],[a2]) ).

thf(c_0_10,plain,

! [X15: $i] :

( ( f @ X15 @ e )

=(s@e)),

inference(variable_rename,[status(thm)],[a1]) ).

thf(c_0_11,plain,

! [X12: $i > $o] :

( ( ind @ X12 )

<=> ( ( X12 @ e )

& ! [X3: $i] :

( ( X12 @ X3 )

=>(X12@(s@X3))))),

inference(fof_simplification,[status(thm)],

[inference(fof_simplification,[status(thm)],[ind_def])]) ).

thf(c_0_12,plain,

! [X24: $i,X25: $i,X26: $i > $o,X27: $i,X28: $i] :

( ( ~ ( p @ X24 @ X25 )

| ~ ( ind @ X26 )

|(X26@(f@X24@X25)))

& ( ( ind @ ( epred1_2 @ X27 @ X28 ) )

| ( p @ X27 @ X28 ) )

& ( ~ ( epred1_2 @ X27 @ X28 @ ( f @ X27 @ X28 ) )

| ( p @ X27 @ X28 ) ) ),

inference(distribute,[status(thm)],[inference(shift_quantors,[status(thm)],

[inference(skolemize,[status(esa)],

[inference(variable_rename,[status(thm)],[inference(shift_quantors,[status(thm)],

[inference(fof_nnf,[status(thm)],[c_0_8])])])])])]) ).

thf(c_0_13,plain,

! [X1: $i] :

((f@e@(s@X1))

=(s@(s@(f@e@X1)))),

inference(split_conjunct,[status(thm)],[c_0_9]) ).

thf(c_0_14,plain,

! [X1: $i] :

((f@X1@e)

=(s@e)),

inference(split_conjunct,[status(thm)],[c_0_10]) ).

thf(c_0_15,plain,

! [X20: $i > $o,X21: $i,X22: $i > $o] :

(((X20@e)

| ~ ( ind @ X20 ) )

& ( ~ ( X20 @ X21 )

| ( X20 @ ( s @ X21 ) )

| ~ ( ind @ X20 ) )

& ( ( X22 @ ( esk1_1 @ X22 ) )

|~(X22@e)

| ( ind @ X22 ) )

& ( ~ ( X22 @ ( s @ ( esk1_1 @ X22 ) ) )

|~(X22@e)

| ( ind @ X22 ) ) ),

inference(distribute,[status(thm)],

[inference(shift_quantors,[status(thm)],[inference(skolemize,[status(esa)],

[inference(variable_rename,[status(thm)],[inference(shift_quantors,[status(thm)],

[inference(fof_nnf,[status(thm)],[c_0_11])])])])])]) ).

thf(c_0_16,plain,

! [X1: $i,X2: $i] :

( ( p @ X1 @ X2 )

| ~ ( epred1_2 @ X1 @ X2 @ ( f @ X1 @ X2 ) ) ),

inference(split_conjunct,[status(thm)],[c_0_12]) ).

thf(c_0_17,plain,

((f@e@(s@e))

=(s@(s@(s@e)))),

inference(spm,[status(thm)],[c_0_13,c_0_14]) ).

thf(c_0_18,plain,

! [X1: $i,X4: $i > $o] :

( ( X4 @ ( s @ X1 ) )

|~(X4@X1)

| ~ ( ind @ X4 ) ),

inference(split_conjunct,[status(thm)],[c_0_15]) ).

thf(c_0_19,plain,

! [X1: $i,X2: $i] :

( ( ind @ ( epred1_2 @ X1 @ X2 ) )

| ( p @ X1 @ X2 ) ),

inference(split_conjunct,[status(thm)],[c_0_12]) ).

thf(c_0_20,plain,

((p@e@(s@e))

|~(epred1_2@e@(s@e)@(s@(s@(s@e))))),

inference(spm,[status(thm)],[c_0_16,c_0_17]) ).

thf(c_0_21,plain,

! [X1: $i,X2: $i,X3: $i] :

( ( epred1_2 @ X1 @ X2 @ ( s @ X3 ) )

| ( p @ X1 @ X2 )

| ~ ( epred1_2 @ X1 @ X2 @ X3 ) ),

inference(spm,[status(thm)],[c_0_18,c_0_19]) ).

thf(c_0_22,plain,

((p@e@(s@e))

|~(epred1_2@e@(s@e)@(s@(s@e)))),

inference(spm,[status(thm)],[c_0_20,c_0_21]) ).

thf(c_0_23,plain,

! [X4: $i > $o] :

( ( X4 @ e )

| ~ ( ind @ X4 ) ),

inference(split_conjunct,[status(thm)],[c_0_15]) ).

thf(c_0_24,plain,

((p@e@(s@e))

| ~ ( epred1_2 @ e @ ( s @ e ) @ ( s @ e ) ) ),

inference(spm,[status(thm)],[c_0_22,c_0_21]) ).

thf(c_0_25,plain,

! [X1: $i,X2: $i] :

( ( epred1_2 @ X1 @ X2 @ e )

| ( p @ X1 @ X2 ) ),

inference(spm,[status(thm)],[c_0_23,c_0_19]) ).

thf(c_0_26,plain,

! [X1: $i,X2: $i,X4: $i > $o] :

((X4@(f@X1@X2))

|~(p@X1@X2)

| ~ ( ind @ X4 ) ),

inference(split_conjunct,[status(thm)],[c_0_12]) ).

thf(c_0_27,plain,

! [X4: $i > $o] :

( ( X4 @ ( esk1_1 @ X4 ) )

| ( ind @ X4 )

| ~ ( X4 @ e ) ),

inference(split_conjunct,[status(thm)],[c_0_15]) ).

thf(c_0_28,plain,

! [X1: $i] :

((p@X1@e)

| ~ ( epred1_2 @ X1 @ e @ ( s @ e ) ) ),

inference(spm,[status(thm)],[c_0_16,c_0_14]) ).

thf(c_0_29,plain,

p@e@(s@e),

inference(csr,[status(thm)],[inference(spm,[status(thm)],[c_0_24,c_0_21]),c_0_25]) ).

thf(c_0_30,plain,

! [X1: $i,X4: $i > $o] :

( ( X4 @ ( f @ X1 @ ( esk1_1 @ ( p @ X1 ) ) ) )

| ( ind @ ( p @ X1 ) )

|~(p@X1@e)

| ~ ( ind @ X4 ) ),

inference(spm,[status(thm)],[c_0_26,c_0_27]) ).

thf(c_0_31,plain,

! [X1: $i] : ( p @ X1 @ e ),

inference(csr,[status(thm)],[inference(spm,[status(thm)],[c_0_28,c_0_21]),c_0_25]) ).

thf(c_0_32,plain,

! [X4: $i > $o] :

((X4@(s@(s@(s@e))))

| ~ ( ind @ X4 ) ),

inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_26,c_0_29]),c_0_17]) ).

thf(c_0_33,plain,

! [X19: $i] :

(~(d@X19)

|(d@(s@X19))),

inference(variable_rename,[status(thm)],[inference(fof_nnf,[status(thm)],[a5])]) ).

thf(c_0_34,plain,

! [X1: $i,X4: $i > $o] :

( ( X4 @ ( f @ X1 @ ( esk1_1 @ ( p @ X1 ) ) ) )

| ( ind @ ( p @ X1 ) )

| ~ ( ind @ X4 ) ),

inference(cn,[status(thm)],[inference(rw,[status(thm)],[c_0_30,c_0_31])]) ).

thf(c_0_35,plain,

! [X17: $i,X18: $i] :

((f@(s@X17)@(s@X18))

=(f@X17@(f@(s@X17)@X18))),

inference(variable_rename,[status(thm)],[a3]) ).

thf(c_0_36,plain,

! [X1: $i,X4: $i > $o] :

((X4@(f@X1@(s@(s@(s@e)))))

|~(ind@(p@X1))

| ~ ( ind @ X4 ) ),

inference(spm,[status(thm)],[c_0_26,c_0_32]) ).

thf(c_0_37,plain,

! [X4: $i > $o] :

( ( ind @ X4 )

| ~ ( X4 @ ( s @ ( esk1_1 @ X4 ) ) )

| ~ ( X4 @ e ) ),

inference(split_conjunct,[status(thm)],[c_0_15]) ).

thf(c_0_38,plain,

! [X1: $i] :

((d@(s@X1))

| ~ ( d @ X1 ) ),

inference(split_conjunct,[status(thm)],[c_0_33]) ).

thf(c_0_39,plain,

d @ e,

inference(split_conjunct,[status(thm)],[a4]) ).

thf(c_0_40,plain,

! [X1: $i,X2: $i,X4: $i > $o] :

((X4@(f@X1@(f@X2@(esk1_1@(p@X2)))))

| ( ind @ ( p @ X2 ) )

|~(ind@(p@X1))

| ~ ( ind @ X4 ) ),

inference(spm,[status(thm)],[c_0_26,c_0_34]) ).

thf(c_0_41,plain,

! [X1: $i,X2: $i] :

((f@(s@X1)@(s@X2))

=(f@X1@(f@(s@X1)@X2))),

inference(split_conjunct,[status(thm)],[c_0_35]) ).

thf(c_0_42,negated_conjecture,

~(d@(f@(s@(s@(s@(s@e))))@(s@(s@(s@(s@e)))))),

inference(fof_simplification,[status(thm)],[inference(assume_negation,[status(cth)],[conj_0])]) ).

thf(c_0_43,plain,

! [X1: $i,X2: $i,X4: $i > $o] :

((X4@(f@X1@(f@X2@(s@(s@(s@e))))))

|~(ind@(p@X2))

|~(ind@(p@X1))

| ~ ( ind @ X4 ) ),

inference(spm,[status(thm)],[c_0_26,c_0_36]) ).

thf(c_0_44,plain,

( ( ind @ d )

| ~ ( d @ ( esk1_1 @ d ) ) ),

inference(cn,[status(thm)],[inference(rw,[status(thm)],

[inference(spm,[status(thm)],[c_0_37,c_0_38]),c_0_39])]) ).

thf(c_0_45,plain,

! [X1: $i,X4: $i > $o] :

((X4@(f@(s@X1)@(s@(esk1_1@(p@(s@X1))))))

|(ind@(p@(s@X1)))

|~(ind@(p@X1))

| ~ ( ind @ X4 ) ),

inference(spm,[status(thm)],[c_0_40,c_0_41]) ).

thf(c_0_46,negated_conjecture,

~(d@(f@(s@(s@(s@(s@e))))@(s@(s@(s@(s@e)))))),

inference(split_conjunct,[status(thm)],[c_0_42]) ).

thf(c_0_47,plain,

! [X1: $i,X4: $i > $o] :

((X4@(f@(s@X1)@(s@(s@(s@(s@e))))))

|~(ind@(p@(s@X1)))

|~(ind@(p@X1))

| ~ ( ind @ X4 ) ),

inference(spm,[status(thm)],[c_0_43,c_0_41]) ).

thf(c_0_48,plain,

ind @ d,

inference(cn,[status(thm)],[inference(rw,[status(thm)],

[inference(spm,[status(thm)],[c_0_44,c_0_27]),c_0_39])]) ).

thf(c_0_49,plain,

! [X1: $i] :

((p@(s@X1)@(s@(esk1_1@(p@(s@X1)))))

|(ind@(p@(s@X1)))

|~(ind@(p@X1))),

inference(csr,[status(thm)],[inference(spm,[status(thm)],[c_0_16,c_0_45]),c_0_19]) ).

thf(c_0_50,plain,

! [X1: $i,X2: $i,X3: $i] :

( ( epred1_2 @ X1 @ X2 @ ( f @ e @ ( s @ X3 ) ) )

| ( p @ X1 @ X2 )

| ~ ( epred1_2 @ X1 @ X2 @ ( s @ ( f @ e @ X3 ) ) ) ),

inference(spm,[status(thm)],[c_0_21,c_0_13]) ).

thf(c_0_51,negated_conjecture,

(~(ind@(p@(s@(s@(s@(s@e))))))

|~(ind@(p@(s@(s@(s@e)))))),

inference(cn,[status(thm)],[inference(rw,[status(thm)],

[inference(spm,[status(thm)],[c_0_46,c_0_47]),c_0_48])]) ).

thf(c_0_52,plain,

! [X1: $i] :

((ind@(p@(s@X1)))

|~(ind@(p@X1))),

inference(cn,[status(thm)],[inference(rw,[status(thm)],

[inference(spm,[status(thm)],[c_0_37,c_0_49]),c_0_31])]) ).

thf(c_0_53,plain,

! [X1: $i] :

((p@e@(s@X1))

|~(epred1_2@e@(s@X1)@(s@(f@e@X1)))),

inference(spm,[status(thm)],[c_0_16,c_0_50]) ).

thf(c_0_54,negated_conjecture,

~(ind@(p@(s@(s@(s@e))))),

inference(spm,[status(thm)],[c_0_51,c_0_52]) ).

thf(c_0_55,plain,

! [X1: $i] :

((p@e@(s@X1))

| ~ ( epred1_2 @ e @ ( s @ X1 ) @ ( f @ e @ X1 ) ) ),

inference(spm,[status(thm)],[c_0_53,c_0_21]) ).

thf(c_0_56,negated_conjecture,

~(ind@(p@(s@(s@e)))),

inference(spm,[status(thm)],[c_0_54,c_0_52]) ).

thf(c_0_57,plain,

((p@e@(s@(esk1_1@(p@e))))

|(ind@(p@e))),

inference(csr,[status(thm)],[inference(spm,[status(thm)],[c_0_55,c_0_34]),c_0_19]) ).

thf(c_0_58,negated_conjecture,

~(ind@(p@(s@e))),

inference(spm,[status(thm)],[c_0_56,c_0_52]) ).

thf(c_0_59,plain,

ind @ ( p @ e ),

inference(cn,[status(thm)],[inference(rw,[status(thm)],

[inference(spm,[status(thm)],[c_0_37,c_0_57]),c_0_31])]) ).

thf(c_0_60,negated_conjecture,

$false,

inference(cn,[status(thm)],[inference(rw,[status(thm)],

[inference(spm,[status(thm)],[c_0_58,c_0_52]),c_0_59])]),[proof] ).

%------------------------------------------------------------------------------

%----ORIGINAL SYSTEM OUTPUT

% Running higher-order theorem proving

% Running: /home/tptp/Systems/E---3.0/eprover-ho --delete-bad-limit=2000000000

% --definitional-cnf=24 -s --print-statistics -R --print-version --proof-object

% --auto-schedule=8 --cpu-limit=60 /tmp/AjzF3EheOy/SOT_78srga

% # Version: 3.0pre003-ho

% # partial match(1): HSMSSMSSSSSNFFN

% # Preprocessing class: HSMSSMSSSSSNHFN.

% # Scheduled 4 strats onto 8 cores with 60 seconds (480 total)

% # Starting new_ho_10 with 300s (5) cores

% # Starting new_bool_1 with 60s (1) cores

% # Starting full_lambda_5 with 60s (1) cores

% # Starting new_ho_10_unif with 60s (1) cores

% # new_ho_10 with pid 63439 completed with status 0

% # Result found by new_ho_10

% # partial match(1): HSMSSMSSSSSNFFN

% # Preprocessing class: HSMSSMSSSSSNHFN.

% # Scheduled 4 strats onto 8 cores with 60 seconds (480 total)

% # Starting new_ho_10 with 300s (5) cores

% # No SInE strategy applied

% # Search class: HGUSS-FFSF21-MHFFFSBN

% # partial match(2): HGUSS-FFSF11-MHHFFSBN

% # Scheduled 6 strats onto 5 cores with 300 seconds (300 total)

% # Starting new_ho_10_unif with 163s (1) cores

% # Starting new_ho_10 with 31s (1) cores

% # Starting lpo8_s with 28s (1) cores

% # Starting sh5l with 28s (1) cores

% # Starting sh2lt with 28s (1) cores

% # new_ho_10 with pid 63446 completed with status 0

% # Result found by new_ho_10

% # partial match(1): HSMSSMSSSSSNFFN

% # Preprocessing class: HSMSSMSSSSSNHFN.

% # Scheduled 4 strats onto 8 cores with 60 seconds (480 total)

% # Starting new_ho_10 with 300s (5) cores

% # No SInE strategy applied

% # Search class: HGUSS-FFSF21-MHFFFSBN

% # partial match(2): HGUSS-FFSF11-MHHFFSBN

% # Scheduled 6 strats onto 5 cores with 300 seconds (300 total)

% # Starting new_ho_10_unif with 163s (1) cores

% # Starting new_ho_10 with 31s (1) cores

% # Preprocessing time : 0.004 s

% # Presaturation interreduction done

%

% # Proof found!

% # SZS status Theorem

% # SZS output start CNFRefutation

% See solution above

% # Parsed axioms : 14

% # Removed by relevancy pruning/SinE : 0

% # Initial clauses : 19

% # Removed in clause preprocessing : 6

% # Initial clauses in saturation : 13

% # Processed clauses : 4451

% # ...of these trivial : 952

% # ...subsumed : 2217

% # ...remaining for further processing : 1282

% # Other redundant clauses eliminated : 0

% # Clauses deleted for lack of memory : 0

% # Backward-subsumed : 77

% # Backward-rewritten : 139

% # Generated clauses : 54638

% # ...of the previous two non-redundant : 49054

% # ...aggressively subsumed : 0

% # Contextual simplify-reflections : 95

% # Paramodulations : 54638

% # Factorizations : 0

% # NegExts : 0

% # Equation resolutions : 0

% # Propositional unsat checks : 0

% # Propositional check models : 0

% # Propositional check unsatisfiable : 0

% # Propositional clauses : 0

% # Propositional clauses after purity: 0

% # Propositional unsat core size : 0

% # Propositional preprocessing time : 0.000

% # Propositional encoding time : 0.000

% # Propositional solver time : 0.000

% # Success case prop preproc time : 0.000

% # Success case prop encoding time : 0.000

% # Success case prop solver time : 0.000

% # Current number of processed clauses : 1053

% # Positive orientable unit clauses : 443

% # Positive unorientable unit clauses: 10

% # Negative unit clauses : 4

% # Non-unit-clauses : 596

% # Current number of unprocessed clauses: 44213

% # ...number of literals in the above : 85482

% # Current number of archived formulas : 0

% # Current number of archived clauses : 229

% # Clause-clause subsumption calls (NU) : 180166

% # Rec. Clause-clause subsumption calls : 178829

% # Non-unit clause-clause subsumptions : 2351

% # Unit Clause-clause subsumption calls : 4024

% # Rewrite failures with RHS unbound : 0

% # BW rewrite match attempts : 9295

% # BW rewrite match successes : 149

% # Condensation attempts : 4451

% # Condensation successes : 0

% # Termbank termtop insertions : 4271812

%

% # -------------------------------------------------

% # User time : 1.728 s

% # System time : 0.068 s

% # Total time : 1.795 s

% # Maximum resident set size: 1844 pages

%

% # -------------------------------------------------

% # User time : 8.644 s

% # System time : 0.424 s

% # Total time : 9.067 s

% # Maximum resident set size: 1732 pages

%

%------------------------------------------------------------------------------
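When comparing runs of this kind, the SZS status line and a few counters are usually all that is needed from the raw system output. The Python sketch below extracts them with regular expressions; it assumes the line layout shown in the output above, and the function names `szs_status` and `statistic` are hypothetical helpers, not part of E or of the TPTP tooling.

```python
import re
from typing import Optional

# Matches lines such as "% # SZS status Theorem".
SZS_STATUS = re.compile(r"SZS status (\w+)")


def szs_status(output: str) -> Optional[str]:
    """Return the first SZS status word in the output, or None if absent."""
    match = SZS_STATUS.search(output)
    return match.group(1) if match else None


def statistic(output: str, label: str) -> Optional[int]:
    """Return the integer reported for a line of the form '# <label> : <n>'."""
    pattern = re.compile(r"#\s*" + re.escape(label) + r"\s*:\s*(\d+)")
    match = pattern.search(output)
    return int(match.group(1)) if match else None


# A few lines copied from the system output above.
sample = """\
% # Proof found!
% # SZS status Theorem
% # Processed clauses : 4451
% # Generated clauses : 54638
% # Current number of processed clauses : 1053
"""

status = szs_status(sample)
processed = statistic(sample, "Processed clauses")
generated = statistic(sample, "Generated clauses")
```

The label match is case-sensitive and anchored directly after the `#`, so "Processed clauses" does not pick up the later "Current number of processed clauses" line.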

D BCP with ind and p as comprehension instances encoded in TPTP THF

See also the online sources at https://github.com/cbenzmueller/BoolosCuriousInference-ATP/tree/main/Boolos1alt.p.

% Declarations

thf(e_decl,type, e: $i ).

thf(s_decl,type, s: $i > $i ).

thf(d_decl,type, d: $i > $o ).

thf(f_decl,type, f: $i > $i > $i ).

% Auxiliary declarations

thf(ind_decl,type, ind: ( $i > $o ) > $o ).

thf(p_decl,type, p: $i > $i > $o ).

% Axioms

thf(a1,axiom,

! [N: $i] :

((f@N@e)

=(s@e))).

thf(a2,axiom,

! [Y: $i] :

((f@e@(s@Y))

=(s@(s@(f@e@Y))))).

thf(a3,axiom,

! [X: $i,Y: $i] :

((f@(s@X)@(s@Y))

=(f@X@(f@(s@X)@Y)))).

thf(a4,axiom,

d @ e ).

thf(a5,axiom,

! [X: $i] :

((d@X)

=>(d@(s@X)))).

% Shorthand notations

thf(ind_def,definition,

! [Q: $i > $o] :

( ( ind @ Q )

=((Q@e)

& ! [X: $i] :

((Q@X)

=>(Q@(s@X)))))).

thf(p_def,definition,

! [X: $i,Y: $i] :

((p@X@Y)

= ( ^ [Z: $i] : ( ! [Q: $i > $o] :

( ( ind @ Q )

=>(Q@Z)))@(f@X@Y)))).

% Conjecture

thf(conj_0,conjecture,

d@(f@(s@(s@(s@(s@e))))@(s@(s@(s@(s@e)))))).

E Plain proof by Leo-III for BCP with ind and p as comprehension instances

Leo-III’s proof is provided online at: https://github.com/cbenzmueller/BoolosCuriousInference-ATP/tree/main/Boolos1Alt.proof.

F BCP with L encoded in TPTP THF

Encoding of BCP with L in TPTP THF syntax; cf. also the online sources:

% Declarations

thf(e_decl,type, e: $i ).

thf(s_decl,type, s: $i > $i ).

thf(d_decl,type, d: $i > $o ).

thf(f_decl,type, f: $i > $i > $i ).

% Axioms

thf(a1,axiom,

! [N: $i] :

((f@N@e)

=(s@e))).

thf(a2,axiom,

! [Y: $i] :

((f@e@(s@Y))

=(s@(s@(f@e@Y))))).

thf(a3,axiom,

! [X: $i,Y: $i] :

((f@(s@X)@(s@Y))

=(f@X@(f@(s@X)@Y)))).

thf(a4,axiom,

d @ e ).

thf(a5,axiom,

! [X: $i] :

((d@X)

=>(d@(s@X)))).

% Shorthand notation p as comprehension instance

thf(p_def,axiom,

? [P: $i > $i > $o] :

( P

= ( ^ [X: $i,Y: $i] :

( ^ [Z: $i] :

! [R: $i > $o] :

( ( ^ [Q: $i > $o] :

((Q@e)

& ! [X: $i] :

((Q@X)

=>(Q@(s@X))))

@ R )

=>(R@Z))

@(f@X@Y))))).

% Conjecture

thf(conj_0,conjecture,

d@(f@(s@(s@(s@(s@e))))@(s@(s@(s@(s@e)))))).

G Plain proof by E for BCP with L

E’s proof is provided online at: https://github.com/cbenzmueller/BoolosCuriousInference-ATP/tree/main/BoolosComp.proof.