Fermat, Euler, Wilson -

Three Case Studies in Number Theory

Christoph Walther and Nathan Wasser

Fachbereich Informatik

Technische Universität Darmstadt *

Abstract. We report on computer assisted proofs of three theorems from Number Theory, viz. Fermat's Little Theorem, Euler's generalization of Fermat's statement and Wilson's Theorem. Common to the formal proofs is that permutation of certain number lists has to be proved, which causes the main effort in the development. We give a short survey of the VeriFun system used in this experiment and illustrate the proofs before presenting them formally. We also discuss alternative solutions, report on the required effort and conclude with some experiences gained from this experiment.

1 Introduction

Case studies and experiments with reasoning systems demonstrate the state of the art and provide a valuable source for comparing systems and their further development. In this paper, we report on computer assisted proofs of three theorems found in the first chapters of textbooks of Number Theory, viz. Fermat's Little Theorem, Euler's generalization of Fermat's statement and Wilson's Theorem. Common to these proofs is that permutation of certain number lists has to be shown. This causes the main effort in developing the formal proofs instead of (contrary to what some might expect) the work of proving "deep" number theoretic facts about prime numbers, divisibility, residue classes etc.

The formal proofs presented here are performed with the VeriFun system [1].¹ This system was designed and developed as an easy to learn and easy to use tool for teaching Automated Reasoning, Semantics, Verification and similar subjects and has been used in beginner courses about Formal Methods as well as in practical courses about Program Verification for about 15 years [11].

The system's object language consists of principles for defining polymorphic data structures, procedures operating on them, and for statements (called "lemmas") about the data structures and procedures. Fig. 1 displays some examples. The data structure bool and the data structure $\mathbb{N}$ for natural numbers built with the constructors $0$ and $^{+}(\ldots)$ for the successor function are predefined in the system. $^{-}(\ldots)$ is the selector of $^{+}(\ldots)$ thus representing the predecessor function. list is the only user defined data structure required for the case studies of this paper. Identifiers preceded by @ denote type variables, and therefore polymorphic lists are defined here. Lists are built with the constructors ø for the empty list and :: (given in infix-notation). The functions hd and tl (for head and tail) are the selectors of :: yielding the leftmost list element and the list with the leftmost element removed respectively.

    structure bool <= true, false

    structure ℕ <= 0, ⁺(⁻ : ℕ)

    structure list[@I] <= ø, [infixr,10] ::(hd : @I, tl : list[@I])

    function Π₂(k : list[ℕ], p : ℕ) : ℕ <=
    if k = ø
      then 1
      else if tl(k) = ø
             then (hd(k) mod p)
             else Π₂(tl(tl(k)), p) · (hd(k) · hd(tl(k)) mod p)
           end_if
    end_if

    lemma ℙ(p) ∧ p ∤ x → I(I(x, p), p) = (x mod p) <= ∀ p, x : ℕ
      if{ℙ(p), if{¬(x mod p) = 0, I(I(x, p), p) = (x mod p), true}, true}

Fig. 1. Data structures, procedures and lemmas in VeriFun

* Technical Report VFR 16/01, September 15, 2016. Accepted for publication in Journal of Automated Reasoning. The final publication is available at Springer via http://dx.doi.org/10.1007/s10817-016-9387-z.

¹ An acronym for "A Verifier for Functional Programs".

Procedures are defined with conditionals $if : bool \times \tau \times \tau \rightarrow \tau$, functional composition and recursion (case-conditionals and let-expressions may be used as well). E.g., procedure $\Pi_2$ of Fig. 1 (which will be used for proving Wilson's Theorem in Sec. 4) computes the product of a number list $k$ by pairwise multiplication modulo $p$ of the list elements (where the procedures $\cdot$ and mod for multiplication and the remainder operation are defined elsewhere).
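The recursion of the pairwise product can be mirrored in a few lines of Python. This is only an illustrative sketch: the name `prod2` is hypothetical, Python lists stand in for `list[ℕ]`, and the pair product is assumed to be reduced as `(hd(k) · hd(tl(k))) mod p`, which is one reading of the procedure body in Fig. 1.

```python
def prod2(k, p):
    """Pairwise product modulo p, following the case structure of Fig. 1."""
    if not k:                 # k = ø: neutral element of multiplication
        return 1
    if len(k) == 1:           # tl(k) = ø: a single element is reduced modulo p
        return k[0] % p
    # Two elements are consumed per step; only their product is reduced.
    return prod2(k[2:], p) * ((k[0] * k[1]) % p)


# (4 * 3 mod 5) = 2 and (2 * 1 mod 5) = 2, so the result is 2 * 2 = 4.
assert prod2([4, 3, 2, 1], 5) == 4
```

As in the original procedure, the outer product is not reduced again, and a call with `p = 0` has no meaningful result (Python raises `ZeroDivisionError`, which loosely corresponds to the underspecified `x mod 0`).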

Undefinedness is implemented by underspecification, i.e. by using incomplete conditionals. Such a feature is required when working with polymorphic data structures because results cannot be defined for certain function applications. For instance, no result can be assigned to last(ø) for a procedure last computing the last element of a polymorphic list, and hd(ø) remains undefined for the same reason. This feature is useful for monomorphic data structures too as it avoids the need for stipulating artificial results, e.g. for $x \bmod 0$.

Lemmas are defined with conditionals $if : bool \times bool \times bool \rightarrow bool$ as the only connective (but negation $\neg$, case-conditionals and let-expressions may be used as well). Only universal quantification is allowed for the variables of a lemma. Fig. 1 displays a lemma about procedure I which will be used in Section 4. The string in the headline is just an identifier assigning a name to the lemma for reference and must not be confused with the lemma body which is a term of type bool representing the real statement of the lemma. For easing readability, we will subsequently refer to lemmas by their names only.

Lemmas are proved with the HPL-calculus (abbreviating Hypotheses, Programs and Lemmas) [11]. The most relevant proof rules of this calculus are Induction, Use Lemma, Apply Equation, Unfold Procedure, Case Analysis and Simplification. Formulas are given as sequents of form $H, IH \vdash goal$, where $H$ is a finite set of hypotheses given as literals, i.e. negated or unnegated if-free boolean terms, $IH$ is a finite set of induction hypotheses given as possibly quantified boolean terms and $goal$ is a boolean term, called the goalterm of the sequent. A deduction in the HPL-calculus is represented by a tree whose nodes are given by sequents. A lemma $\ell$ is developed iff $goal$ equals true for each sequent at a leaf of the proof tree associated with $\ell$, and is verified if additionally each lemma applied by Use Lemma or Apply Equation when building the proof tree is verified.²

The Induction rule creates the base and step cases for a sequent from an induction axiom, where such an axiom is obtained from the recursion structure of some terminating procedure. For proving Fermat's Little Theorem in Section 2.2, for instance, we define a procedure

    function ℜ(k : list[ℕ]) : bool <=
    if k = ø then false else ℜ(k \ max(k)) end_if

(where $k \setminus n$ deletes the leftmost occurrence of $n$ in $k$ if $n \in k$ and $max(k)$ computes a maximal element of $k \neq$ ø) and instruct the system with Induction to prove a certain lemma with body $\forall k{:}list[\mathbb{N}],\, n{:}\mathbb{N}\;\, goal[n, k]$ by induction upon ℜ.³ The system then expands the proof tree of the lemma at the root node $\{\}, \{\} \vdash goal[n, k]$ by adding the successor nodes $\{k = ø\}, \{\} \vdash goal[n, k]$ and $\{k \neq ø\}, \{\forall n'{:}\mathbb{N}\;\, goal[n', k \setminus max(k)]\} \vdash goal[n, k]$.

By choosing Simplification, the system's first-order theorem prover, called the Symbolic Evaluator, is started for rewriting a sequent's goalterm by using the hypotheses and induction hypotheses of the sequent, the definitions of the data structures and procedures as well as the lemmas already verified [12]. Since non-valid proof obligations are frequently encountered in the system's domain, this reasoner is necessarily incomplete as a potentially looping system would result otherwise. It is guided by heuristics, e.g. for deciding whether to use a procedure definition, for supporting goal directed equality reasoning, for speeding up proof search by filtering out useless lemmas, etc. In particular, a good compromise between theorem proving performance, run time efficiency and usefulness of results had to be found, as a poorly performing prover necessitates frequent user interactions, greatly delayed answer time disturbs the concentration of the impatient user and results different from true must enable the user to interactively continue with the proof. The Symbolic Evaluator is implemented as a completely automated tool over which the user has no control, thus leaving the HPL-proof rules as the only means to control the system's theorem proving behavior.

² The base of this recursive definition is given by lemmas being proved without using other lemmas.

³ In almost all cases, the induction axiom required for proving a lemma $\ell$ is provided by one of the procedures used in $\ell$. However, in rare cases the user must define an additional procedure like ℜ for stipulating a specific induction axiom, either because this is needed or (as in the present case) eases a proof considerably.

The HPL-calculus is controlled by heuristics as well. By applying the Verify command to a lemma, the system starts to compute a proof tree by choosing appropriate HPL-proof rules heuristically. If a proof attempt gets stuck, the user has to step in by applying a proof rule to some leaf of the proof tree (sometimes after pruning some unwanted branch of the tree), and the system then takes over control again. It may also happen that a further lemma must be formulated by the user before the proof under consideration can be completed.

Upon the definition of a procedure, VeriFun's automated termination analysis (based on the method of Argument-Bounded Functions [10]) is invoked for generating termination hypotheses which are sufficient for the procedure's termination and are proved like lemmas. If the system fails to compute termination hypotheses, the user must provide an appropriate termination function (which is not required for the procedures of the case studies discussed here). Afterwards induction axioms are computed from the terminating procedures' recursion structure and are kept in stock for future use.

Proof development is supported by further tools: The system may generalize

proof obligations to enable induction proofs, procedures can be tested by running

them in an interpreter, and a disprover is automatically invoked for checking

system generated conjectures, like termination hypotheses or generalizations, or

is called by the user to test a lemma before a proof attempt is started.

Great effort has been spent upon system development to ease the use of the system so that even students in the 1st half of the 2nd year were able to solve verification challenges for basic arithmetic, sorting and searching algorithms, pattern matching, string and list processing, tautology checkers, etc. The most important design principles have been Adequate Terms of Interaction, allowing a user to support the system by only suggesting useful lemmas and HPL-proof rule applications when an automated proof attempt gets stuck, Transparency, which means that the HPL-proofs as well as the proofs computed by the Symbolic Evaluator can be inspected and are presented in a user friendly and understandable way, and Hidden Implementation to enable a user to succeed in a verification challenge without knowing the system's implementation and heuristics.⁴ Usability is supported in addition by a high degree of automation, see [11] for more details. VeriFun is implemented in Java and installers for running the system under Windows, Unix/Linux or Mac are available from the web [1].

When working with the system, we use proof libraries which have been set up over the years by extending them with definitions and lemmas that are of general interest and therefore may be useful when working on a new case study. When importing a definition or a lemma from a library into a case study, all program elements and proofs the imported item depends on are imported as well. For instance, the proof of Wilson's Theorem depends on 27 procedures and 156 proven lemmas ranging from simple statements like the associativity of addition up to "deep" theorems like Fermat's Little Theorem. In the sequel we will only list the lemmas which are essential to understand the proofs and refer to [1] for a complete account of all used lemmas and their proofs.

⁴ Unfortunately, we did not succeed completely in hiding system internals, as there are two "tricks" in the formulation of lemmas which support automated equality reasoning, see Section 2.1 for a discussion.

2 Fermat’s Little Theorem

Fermat's Little Theorem states that $a^{p-1} \bmod p = 1$ for numbers $a$ not divisible by a prime $p$. It belongs to the standard repertoire of Number Theory and is useful to prove other statements like, for instance, the correctness of RSA encryption and Wilson's Theorem.

2.1 The Binomial Theorem

One proof quite often found in textbooks of Number Theory uses the Binomial Theorem⁵

$$\forall x, n{:}\mathbb{N}\;\; x \neq 0 \rightarrow (x+1)^n = \sum_{i=0}^{n} \binom{n}{i} x^i \qquad (1)$$

for proving⁶

$$\forall p, a{:}\mathbb{N}\;\; \mathbb{P}(p) \rightarrow a^p \bmod p = a \bmod p \qquad (2)$$

by induction upon $a$: The proofs of the base cases $a = 0$ and $a = 1$ are trivial, and we prove the step case $(a+1)^p \bmod p = (a+1) \bmod p$ under the induction hypothesis $a^p \bmod p = a \bmod p$, where $\mathbb{P}(p)$ and $a \neq 0$ are assumed. Then by the Binomial Theorem (1)

$$(a+1)^p = \sum_{i=0}^{p} \binom{p}{i} a^i = \binom{p}{0} a^0 + \binom{p}{p} a^p + \sum_{i=1}^{p-1} \binom{p}{i} a^i = 1 + a^p + \sum_{i=1}^{p-1} \binom{p}{i} a^i$$

hence $(a+1)^p \bmod p = (a^p + 1) \bmod p$ as $p$ divides $\sum_{i=1}^{p-1} \binom{p}{i} a^i$. Therefore $(a+1)^p \bmod p = (a+1) \bmod p$ as $(a^p + 1) \bmod p = (a+1) \bmod p$ follows directly from the induction hypothesis, and (2) is proved. Now if $p$ does not divide $a$, division by $a$ at both sides of the equation in (2) yields Fermat's statement $a^{p-1} \bmod p = 1$.
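The individual steps of this textbook argument can be checked numerically for small values. The following Python sketch is not part of the formal development; the helper names `is_prime` and `middle_sum` are hypothetical, and the ranges are arbitrary small bounds chosen for illustration.

```python
from math import comb


def is_prime(n):
    """Trial division, sufficient for the small range used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def middle_sum(p, a):
    """Sum of C(p, i) * a^i for i = 1 .. p-1 (the part divisible by p)."""
    return sum(comb(p, i) * a ** i for i in range(1, p))


for p in filter(is_prime, range(2, 30)):
    for a in range(0, 40):
        # Binomial expansion with the first and last summand split off.
        assert (a + 1) ** p == 1 + a ** p + middle_sum(p, a)
        # p divides the middle sum because p divides C(p, i) for 0 < i < p.
        assert middle_sum(p, a) % p == 0
        # Statement (2), and Fermat's statement if p does not divide a.
        assert pow(a, p, p) == a % p
        if a % p != 0:
            assert pow(a, p - 1, p) == 1
```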

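To obtain a formal proof, we define a procedure

    function S(m:ℕ, n:ℕ, h:ℕ, x:ℕ) : ℕ <=
    if h ≤ 1
      then if m = 0
             then 0
             else S(⁻(m), n, h, x) + n _ (m - h) · x^m
           end_if
    end_if

such that $S(m, n, h, x)$ computes $\sum_{i=1}^{m} \binom{n}{i-h} x^i$ if $h \leq 1$, and formulate the auxiliary lemmas⁷

$$\forall m, n, h, x{:}\mathbb{N}\;\; h \leq 1 \wedge x = 0 \rightarrow \sum_{i=1}^{m} \binom{n}{i-h} x^i = 0 \qquad (3)$$

$$\forall m, n, x, y{:}\mathbb{N}\;\; \sum_{i=1}^{m} \binom{n}{i} x^i \cdot x + x + y = \sum_{i=1}^{m+1} \binom{n}{i-1} x^i + y \qquad (4)$$

$$\forall m, n, x{:}\mathbb{N}\;\; n \neq 0 \rightarrow \sum_{i=1}^{m} \binom{n-1}{i-1} x^i + \sum_{i=1}^{m} \binom{n-1}{i} x^i = \sum_{i=1}^{m} \binom{n}{i} x^i \qquad (5)$$

which the system proves by induction upon $m$ with the help of one user interaction for (5).

⁵ $x \neq 0$ is demanded in (1) as otherwise $0^0$ has to be defined.

⁶ $\mathbb{P}(p)$ denotes that $p$ is prime.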
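To see what the auxiliary lemmas say, procedure S can be transcribed into Python and the lemmas (3)-(5) tested on small arguments. This is a sketch under assumptions: the binomial coefficient of the procedure body is taken to be `math.comb`, the unspecified case `h > 1` is modelled by raising an exception, and the tested ranges are arbitrary.

```python
from math import comb


def S(m, n, h, x):
    """Sum over i = 1..m of C(n, i - h) * x^i, specified only for h <= 1."""
    if h > 1:
        raise ValueError("S is left unspecified for h > 1")
    if m == 0:
        return 0
    return S(m - 1, n, h, x) + comb(n, m - h) * x ** m


R = range(0, 6)
for m in R:
    for n in R:
        for h in (0, 1):
            assert S(m, n, h, 0) == 0                                  # lemma (3)
        for x in R:
            for y in R:
                assert S(m, n, 0, x) * x + x + y == S(m + 1, n, 1, x) + y  # lemma (4)
            if n != 0:
                assert S(m, n - 1, 1, x) + S(m, n - 1, 0, x) == S(m, n, 0, x)  # lemma (5)
```

Lemma (5) is Pascal's rule summed up termwise, and lemma (4) is an index shift; the test only confirms the statements on a finite range and is no substitute for the induction proofs.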

Obviously, addition of $y$ in lemma (4) can be cancelled, and the system does so in fact before starting the induction proof. However, the equation in (4) is embedded into a (right) context $\ldots + y = \ldots + y$ for supporting equality reasoning: VeriFun uses AC(I)-Matching and conditional term rewriting, where the orientation of equations is heuristically established [12]. Now consider e.g. the associative operator $+$, some left-to-right oriented equation (E) $\forall x{:}\mathbb{N}\;\, f(x) + f(x) = g(x)$ and a term (T) $a + (f(b) + (f(b) + c))$. To enable the use of (E), associativity given by $\forall x, y, z{:}\mathbb{N}\;\, (x + y) + z = x + (y + z)$ has to be applied interactively in (T). This yields (T') $a + ((f(b) + f(b)) + c)$ and then the system uses (E) for computing (T'') $a + (g(b) + c)$. This interaction could be avoided if the system considered all rearrangements of (T) modulo associativity for finding a redex for (E). However, the set of all associative rearrangements of a term $t$ grows exponentially with the number of $+$-arguments in $t$, so this approach is infeasible as it would slow down system performance considerably. We therefore sometimes formulate an equation in form of (E') $\forall x, y{:}\mathbb{N}\;\, f(x) + (f(x) + y) = g(x) + y$ instead of (E) as this enables an automatic replacement of (T) by (T''). The use of (E') is foolproof in the sense that it applies automatically to an initially given (T') as well. Since associativity is used as a left-to-right oriented equation too, an equation embedding into a left context is needless. The embedding "trick" is applicable to all associative operators which are right cancelable, but is very rarely used (e.g. lemmas (4) and (7) are the only examples in this paper).

But there is another "trick" which is used more often. Lemma (3) provides an example as it uses a variable $x$ instead of the constant $0$ at the left-hand side of the equation. Abstracting a term by a variable and making the binding explicit by a condition supports automated equality reasoning as well: Consider e.g. the library lemma (L) $\forall x{:}\mathbb{N}\;\, x - x = 0$ and some proof obligation (P) $a = b \rightarrow c = a - b$. The system orients the equation in (L) from left to right, but it cannot be used as $x - x$ does not match $a - b$. We therefore use (L') $\forall x, y{:}\mathbb{N}\;\, x = y \rightarrow x - y = 0$ instead of (L). Now $x - y$ matches $a - b$, and the Symbolic Evaluator simplifies (P) to $a = b \rightarrow (a = b \rightarrow c = 0) \wedge (a \neq b \rightarrow c = a - b)$ by using (L') in an intermediate step and to $a = b \rightarrow c = 0$ in the next step. The use of (L') is foolproof as an unconditional proof obligation (P') $c = a - a$ is replaced by $(a = a \rightarrow c = 0) \wedge (a \neq a \rightarrow c = a - a)$ and by $c = 0$ in turn. However, a conditional equation is only applied if the resulting conditions can be established in fact, and therefore (L') fails on (P'') $c = a - b$.

⁷ Here we use mathematical notation like $\sum_{i=1}^{m+1} \binom{n}{i-1} x^i$ instead of $S(m+1, n, 1, x)$ for sake of readability.

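We continue with a variant

$$\forall x, n{:}\mathbb{N}\;\; (x+1)^n = 1 + \sum_{i=1}^{n} \binom{n}{i} x^i \qquad (6)$$

of the Binomial Theorem (1) which is enough for our purposes and is automatically proved with the use of lemmas (3)-(5).⁸

Next we prove some facts about the binomial coefficients, viz.

$$\forall m, n, k{:}\mathbb{N}\;\; n \geq k \rightarrow \binom{n}{k} \cdot k! \cdot (n-k)! \cdot m = n! \cdot m \qquad (7)$$

$$\forall n, k{:}\mathbb{N}\;\; n \geq k \rightarrow \binom{n}{k} = n! / (k! \cdot (n-k)!) \qquad (8)$$

$$\forall n, k{:}\mathbb{N}\;\; n \geq k \rightarrow k! \cdot (n-k)! \mid n! \qquad (9)$$

which require 6 user interactions in total. The proof of lemma (7), which is used to prove (8) and (9) with first-order reasoning only, is by induction upon $n$. Using the latter two lemmas as well as the library lemmas

$$\forall n, x{:}\mathbb{N}\;\; \mathbb{P}(n) \wedge n > x \rightarrow n \nmid x!$$

$$\forall n, x{:}\mathbb{N}\;\; \mathbb{P}(n) \wedge n \nmid x \rightarrow \gcd(n, x) = 1$$

$$\forall x, y, z{:}\mathbb{N}\;\; z \neq 0 \wedge z \mid x \cdot y \wedge \gcd(x, z) = 1 \rightarrow z \mid y$$

the system proves

$$\forall p, j{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge 0 < j < p \rightarrow p \mid \binom{p}{j} \qquad (10)$$

by first-order reasoning with the help of 7 HPL-proof rule suggestions. Finally, lemma

$$\forall p, j, a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge j < p \rightarrow p \mid \sum_{i=1}^{j} \binom{p}{i} a^i \qquad (11)$$

is proved by induction upon $j$, where the system has to be instructed to use lemma (10) as well as the library lemma

$$\forall x, y, z{:}\mathbb{N}\;\; z \neq 0 \wedge z \mid x \rightarrow (y + x) \bmod z = y \bmod z . \qquad (12)$$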

We now have all prerequisites available to prove lemma (2): We call the system to use Peano Induction upon $a$, and the system responds by proving the base case. For proving the step case,⁹ we instruct the system to replace $a$ at the left-hand side of the equation with $(a-1)+1$ and the system responds by using the Binomial Theorem (6) to replace $((a-1)+1)^p$ yielding

$$\mathbb{P}(p) \wedge a \neq 0 \rightarrow \Big(1 + \sum_{i=1}^{p} \binom{p}{i} (a-1)^i\Big) \bmod p = a \bmod p . \qquad (13)$$

Then we command the system to unfold procedure S and it computes

$$\mathbb{P}(p) \wedge a \neq 0 \rightarrow \Big(1 + (a-1)^p + \sum_{i=1}^{p-1} \binom{p}{i} (a-1)^i\Big) \bmod p = a \bmod p \qquad (14)$$

which in turn is replaced with

$$\mathbb{P}(p) \wedge a \neq 0 \rightarrow \Big((a-1)^p + \sum_{i=1}^{p-1} \binom{p}{i} (a-1)^i\Big) \bmod p = (a-1) \bmod p \qquad (15)$$

after a call for using library lemma

$$\forall x, y, z, n{:}\mathbb{N}\;\; z \neq 0 \wedge (x \equiv y) \bmod z \rightarrow (x + n \equiv y + n) \bmod z$$

(with $1$ substituted for $n$).¹⁰ Finally, we instruct the system to use lemma (12) for replacing (15) with

$$\mathbb{P}(p) \wedge a \neq 0 \rightarrow (a-1)^p \bmod p = (a-1) \bmod p . \qquad (16)$$

The system then responds by discharging the resulting proof obligation $p \mid \sum_{i=1}^{p-1} \binom{p}{i} (a-1)^i$ with lemma (11) and uses the induction hypothesis for replacing $(a-1)^p \bmod p$ in (16) with $(a-1) \bmod p$, so that lemma (2) is proved.

Finally we prove Fermat's Little Theorem

$$\forall p, a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid a \rightarrow a^{p-1} \bmod p = 1 \qquad (17)$$

by instructing the system to use lemma (2) and to unfold procedure ^ yielding

$$\mathbb{P}(p) \wedge p \nmid a \wedge a^{p-1} \cdot a \bmod p = a \bmod p \rightarrow a^{p-1} \bmod p = 1 . \qquad (18)$$

Then we use the library lemma

$$\forall n, x, y, z{:}\mathbb{N}\;\; \mathbb{P}(n) \wedge n \nmid z \wedge (x \cdot z \equiv y \cdot z) \bmod n \rightarrow (x \equiv y) \bmod n \qquad (19)$$

(with $a^{p-1}$ substituted for $x$, $1$ for $y$, $a$ for $z$ and $p$ for $n$) for replacing $a^{p-1} \bmod p$ at the left-hand side of the equation in (18) with $1 \bmod p$ which then simplifies to $1$, thus proving Fermat's Little Theorem.

⁸ The use of (6) is not a "hack" for supporting the system, as lemma (1) has an automatic proof as well. But it is good practice in mathematics to formulate proofs as simply as possible, and this principle applies to formal reasoning in particular (saving a lemma in the present case).

⁹ Instead of constructor induction with a step case $\forall n{:}\mathbb{N}\;\, \varphi(n) \rightarrow \varphi(n+1)$, VeriFun uses destructor induction with a step case $\forall n{:}\mathbb{N}\;\, n \neq 0 \wedge \varphi(n-1) \rightarrow \varphi(n)$ as the latter is more general. For instance, there is no constructive counterpart for the induction schema obtained from procedure ℜ of Section 1.

¹⁰ As usual, $(x \equiv y) \bmod z$ stands for $x \bmod z = y \bmod z$.


2.2 The Pigeon Hole Principle

The proof by the Binomial Theorem is rather technical in the sense that it provides more insight into the nature of binomial coefficients than into the nature of primes. We used 15 lemmas for the proof, where 10 are related to binomial coefficients or to basic arithmetic, but only 5 lemmas were related to some properties of primes. In this section we present another proof of Fermat's Theorem which will illustrate more interesting properties of primes. And unlike the proof of Sec. 2.1, the mathematical concepts used in this proof can later be generalized to prove Euler's Theorem, which is a generalization of Fermat's Little Theorem.

The key fact used for the alternative proof is that a permutation of the number list $(p-1, \ldots, 1)$ is obtained if each element in that list is replaced by the residue mod $p$ of its product with $a$, provided that prime $p$ does not divide $a$. For $p = 5$ and $a = 8$, for instance, we obtain $(32, 24, 16, 8)$ by multiplying each element in $(4, 3, 2, 1)$ with $8$, and replacement of these list elements by their residue mod $5$ yields $(2, 4, 1, 3)$, i.e. a permutation of the initial list. Now we conclude $\prod_{i=1}^{p-1} i = \prod_{i=1}^{p-1} (a \cdot i \bmod p)$ from $(p-1, \ldots, 1) \approx (a \cdot (p-1) \bmod p, \ldots, a \cdot 1 \bmod p)$. Hence

$$\Big(\prod_{i=1}^{p-1} i\Big) \bmod p = \Big(\prod_{i=1}^{p-1} (a \cdot i \bmod p)\Big) \bmod p = \Big(\prod_{i=1}^{p-1} a \cdot i\Big) \bmod p = \Big(a^{p-1} \cdot \prod_{i=1}^{p-1} i\Big) \bmod p ,$$

because the inner mod's in the product can be cancelled. Finally we divide both sides of the resulting equation by $\prod_{i=1}^{p-1} i$ so that $1 = a^{p-1} \bmod p$ is obtained.
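The example with $p = 5$ and $a = 8$ can be replayed directly. The Python sketch below is illustrative only; `scaled_residues` is a hypothetical helper name, and comparing sorted lists serves as a stand-in for the permutation relation on lists.

```python
from math import prod


def scaled_residues(a, p):
    """The list (a*(p-1) mod p, ..., a*1 mod p)."""
    return [(a * i) % p for i in range(p - 1, 0, -1)]


p, a = 5, 8
base = list(range(p - 1, 0, -1))          # (4, 3, 2, 1)
image = scaled_residues(a, p)             # (2, 4, 1, 3)

assert image == [2, 4, 1, 3]
assert sorted(image) == sorted(base)      # image is a permutation of base

# Both products agree modulo p, and cancelling the common factor
# leaves Fermat's statement.
assert prod(base) % p == (a ** (p - 1) * prod(base)) % p
assert pow(a, p - 1, p) == 1
```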

For proving the permutation property we use the Pigeon Hole Principle (21), which is formulated in [3] for verifying Fermat's statement, by proving

$$\forall k{:}list[\mathbb{N}],\, n{:}\mathbb{N}\;\; k \neq ø \wedge purged(k) \wedge n \unrhd k \wedge 0 \notin k \wedge |k| = n \rightarrow max(k) = n \qquad (20)$$

$$\forall k{:}list[\mathbb{N}],\, n{:}\mathbb{N}\;\; purged(k) \wedge n \unrhd k \wedge 0 \notin k \wedge |k| = n \rightarrow (n \ldots 1) \approx k . \qquad (21)$$

Here $purged(k)$ is true iff list $k$ does not contain multiple occurrences of identical elements, $n \unrhd k$ holds iff $n$ is an upper bound of the number list $k$, and $(n \ldots m)$ computes the number list $(n, n-1, \ldots, m)$ if $n \geq m$ and the empty list ø otherwise. The system computes a proof of (20) after being called to use the induction axiom given by procedure ℜ from Section 1. Thus the hypothesis $k \neq ø$ and the induction hypothesis

$$\forall n'{:}\mathbb{N}\;\; k \setminus max(k) \neq ø \wedge purged(k \setminus max(k)) \wedge n' \unrhd k \setminus max(k) \wedge 0 \notin k \setminus max(k) \wedge |k \setminus max(k)| = n' \rightarrow max(k \setminus max(k)) = n'$$

is obtained which is useful as $n$ is replaced by $n-1$ in the induction conclusion and then the induction hypothesis can be applied because list $k$ (being purged) has exactly one maximal element and therefore $n \unrhd k$ entails $n-1 \unrhd k \setminus max(k)$. The proof of (21) is by induction upon $n$ yielding the hypothesis $n \neq 0$ and the induction hypothesis

$$\forall k'{:}list[\mathbb{N}]\;\; purged(k') \wedge n-1 \unrhd k' \wedge 0 \notin k' \wedge |k'| = n-1 \rightarrow (n-1 \ldots 1) \approx k' .$$

The system now instantiates $k'$ in the induction hypothesis with $k \setminus n$ and completes the proof after being instructed to replace $k \setminus n$ with $k \setminus max(k)$ by using lemma (20).
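The content of the Pigeon Hole Principle (21) can be illustrated by an exhaustive test over short lists. The Python sketch below is an assumption-laden illustration, not the formal statement: `purged` and `bounds` are hypothetical transcriptions of $purged(k)$ and $n \unrhd k$, and the permutation relation is modelled by comparing sorted lists.

```python
from itertools import product


def purged(k):
    """True iff k contains no element more than once."""
    return len(set(k)) == len(k)


def bounds(n, k):
    """True iff n is an upper bound of all elements of k."""
    return all(i <= n for i in k)


def pigeon_hole_holds(k, n):
    """Lemma (21) for one instance: the premises imply (n ... 1) is a permutation of k."""
    premises = purged(k) and bounds(n, k) and 0 not in k and len(k) == n
    return (not premises) or sorted(k, reverse=True) == list(range(n, 0, -1))


# Exhaustive check for all lists of length <= 5 over the numbers 0..6.
assert all(pigeon_hole_holds(list(k), n)
           for n in range(0, 5)
           for length in range(0, 6)
           for k in product(range(0, 7), repeat=length))
```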


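Next it has to be proved that the prerequisites of lemma (21) are satisfied for a number list $(a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p)$, where list $m \cdot k$ is obtained by multiplication of each $i \in k$ with $m$ and list $(k \;\mathrm{MOD}\; n)$ emerges from $k$ by replacing each $i \in k$ with $i \bmod n$. Using these definitions, the system proves the lemmas

$$\forall p, m, n, a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p > m > n \wedge p \nmid a \rightarrow (m \cdot a \bmod p) \notin (a \cdot (n \ldots 1) \;\mathrm{MOD}\; p) \qquad (22)$$

$$\forall p, n, a{:}\mathbb{N}\;\; p \neq 0 \rightarrow p-1 \unrhd (a \cdot (n \ldots 1) \;\mathrm{MOD}\; p) \qquad (23)$$

$$\forall p, n, a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p > n \wedge p \nmid a \rightarrow 0 \notin (a \cdot (n \ldots 1) \;\mathrm{MOD}\; p) \qquad (24)$$

$$\forall p, n, a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p > n \wedge p \nmid a \rightarrow purged((a \cdot (n \ldots 1) \;\mathrm{MOD}\; p)) \qquad (25)$$

by induction upon $n$ using some basic number theoretic lemmas imported from the library and requiring 8 user interactions in total. With these results we prove

$$\forall p, a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid a \rightarrow (p-1 \ldots 1) \approx (a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p) \qquad (26)$$

by instructing the system to use the Pigeon Hole Principle of lemma (21) with $p-1$ substituted for $n$ and $(a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p)$ substituted for $k$. The system then completes the proof by discharging the resulting proof obligations using the auxiliary lemmas (22)-(25) as well as the library lemmas

$$\forall k{:}list[\mathbb{N}],\, n{:}\mathbb{N}\;\; |(k \;\mathrm{MOD}\; n)| = |k| \qquad (27)$$

$$\forall k{:}list[\mathbb{N}],\, n{:}\mathbb{N}\;\; |(n \cdot k)| = |k| \qquad (28)$$

$$\forall n{:}\mathbb{N}\;\; |(n \ldots 1)| = n . \qquad (29)$$

Next we prove the auxiliary lemma

$$\forall p, a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid a \rightarrow (a^{p-1} \cdot \Pi((p-1 \ldots 1)) \equiv \Pi((p-1 \ldots 1))) \bmod p \qquad (30)$$

(where $\Pi(k)$ computes the product of the numbers in list $k$) by calling the system to replace $\Pi((p-1 \ldots 1))$ at the right-hand side of the equation with $\Pi((a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p))$ which is justified by lemma (26) and the library lemma

$$\forall k, l{:}list[\mathbb{N}]\;\; k \approx l \rightarrow \Pi(k) = \Pi(l) . \qquad (31)$$

The system then responds by using the library lemma

$$\forall k{:}list[\mathbb{N}],\, n{:}\mathbb{N}\;\; n \neq 0 \rightarrow \Pi((k \;\mathrm{MOD}\; n)) \bmod n = \Pi(k) \bmod n \qquad (32)$$

for replacing $\Pi((a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p)) \bmod p$ with $\Pi(a \cdot (p-1 \ldots 1)) \bmod p$ which in turn is replaced by $(a^{p-1} \cdot \Pi((p-1 \ldots 1))) \bmod p$ by using the library lemmas (29) and

$$\forall k{:}list[\mathbb{N}],\, n{:}\mathbb{N}\;\; n \neq 0 \rightarrow \Pi(n \cdot k) = n^{|k|} \cdot \Pi(k) . \qquad (33)$$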


This completes the proof of (30) (with the help of 2 user interactions) as now both sides of the equation are identical. Fermat's Little Theorem (17) now is easily proved by calling the system to use the instance

$$\mathbb{P}(p) \wedge p \nmid \Pi((p-1 \ldots 1)) \wedge (a^{p-1} \cdot \Pi((p-1 \ldots 1)) \equiv 1 \cdot \Pi((p-1 \ldots 1))) \bmod p \rightarrow (a^{p-1} \equiv 1) \bmod p$$

of library lemma (19). The system then uses lemma (30) for simplifying the proof goal by symbolic evaluation to

$$\mathbb{P}(p) \wedge p \nmid a \rightarrow (p \nmid \Pi((p-1 \ldots 1)) \vee a^{p-1} \bmod p = 1) .$$

Finally, it uses the library lemma

$$\forall n, x{:}\mathbb{N}\;\; \mathbb{P}(n) \wedge n > x \rightarrow n \nmid \Pi((x \ldots 1)) \qquad (34)$$

to establish $p \nmid \Pi((p-1 \ldots 1))$, and Fermat's Theorem is proved with one user interaction.¹¹

2.3 An Easier Proof of Fermat’s Statement

Unfortunately, the Pigeon Hole Principle cannot be used for number lists which do not consist of all numbers in an interval $[n, \ldots, m]$. But the permutation property needs to be proved for those lists as well, in particular when proving Euler's generalization of Fermat's statement. Therefore we had to develop another approach for proving permutation when considering Euler's Theorem, and this approach also yields an easier proof of Fermat's Theorem: Instead of the Pigeon Hole Principle (21) we use the library lemma

$$\forall k, l{:}list[@I]\;\; k \subseteq l \wedge |k| = |l| \rightarrow k \approx l \qquad (35)$$

for proving permutation. $\subseteq$ denotes the subbag relation which does not consider the order of list elements, but is sensitive to the number of element occurrences when comparing lists. For instance, $(3, 2) \subseteq (2, 2, 3)$ but not vice versa.
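The subbag relation and lemma (35) can be illustrated with multisets. In the following Python sketch, `collections.Counter` models bags; the function names `subbag` and `is_permutation` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import product


def subbag(k, l):
    """True iff every element occurs in l at least as often as in k."""
    ck, cl = Counter(k), Counter(l)
    return all(cl[x] >= n for x, n in ck.items())


def is_permutation(k, l):
    """True iff k and l contain the same elements with the same multiplicities."""
    return Counter(k) == Counter(l)


# The example from the text: (3, 2) is a subbag of (2, 2, 3) but not vice versa.
assert subbag([3, 2], [2, 2, 3])
assert not subbag([2, 2, 3], [3, 2])

# Lemma (35) on all pairs of lists of length <= 3 over {0, 1, 2}.
lists = [list(t) for n in range(4) for t in product(range(3), repeat=n)]
assert all(is_permutation(k, l)
           for k in lists for l in lists
           if subbag(k, l) and len(k) == len(l))
```

The point of lemma (35) is that only one inclusion plus a length comparison has to be shown, which avoids reasoning about upper bounds and duplicate-free lists.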

Instead of the lemmas (23)-(25) we now prove

$$\forall p, n, a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p > n \wedge p \nmid a \rightarrow (a \cdot (n \ldots 1) \;\mathrm{MOD}\; p) \subseteq (p-1 \ldots 1) \qquad (36)$$

which requires auxiliary lemma (22). Both proofs are by induction upon $n$ and need 7 hints to the system for completing the proofs as well as the presence of library lemma (19). Now instead of the Pigeon Hole Principle, we instruct the system to use lemma (35) for proving the permutation property (26). The system then completes the proof by discharging the resulting proof obligations $|(p-1 \ldots 1)| = |(a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p)|$ and $(a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p) \subseteq (p-1 \ldots 1)$ with (27), (28) and (36). We now continue as before and obtain an easier proof.

¹¹ As VeriFun does not provide a data structure set, we had to use lists in our formal proof. If sets were present, $purged(\ldots)$ had to be replaced by true and $\approx$ by $=$ in the lemmas, so that the lemmas (25) and (31) would become needless. This means that the use of lists does not complicate the proof, as only 1 user interaction for proving (25) and 2 user interactions for proving (31) could be saved. In the subsequent proofs of sections 2.3 and 3, only lemma (31) would become needless, reducing the savings even more.

                    Proc.  Lem.  Rules  User  Sys.     %    Steps  mm:ss
    Fermat (Bin Th)    12    81    397    60   337  84.9    6,324  01:28
    Fermat (PHP)       21   102    454    57   397  87.4    7,818  00:42
    Fermat             18    87    394    48   346  87.8    6,196  00:32
    Fermat (Euler)     20   102    473    62   411  86.9    7,495  00:35
    Euler              17    93    433    49   384  88.7    6,601  00:30
    Wilson             27   157    751   128   623  83.0   14,090  04:19
    Wilson (PHP)       27   155    753   134   619  82.2   14,629  05:09

Fig. 2. Proof Statistics

Fig. 2 illustrates the total effort required for proving Fermat's Little Theorem (including all procedures and lemmas which had to be imported from the library). Column Proc. gives the number of user defined procedures, Lem. is the number of user defined lemmas, and Rules counts the total number of HPL-proof rule applications, separated into user invoked (User) and system initiated (Sys.) ones. Column % gives the automation degree, i.e. the ratio between Sys. and Rules, Steps lists the number of first-order proof steps performed by the Symbolic Evaluator and mm:ss displays the runtime of the Symbolic Evaluator needed for a case study.¹²

Row Fermat (Bin Th) gives the values when using the Binomial Theorem, row Fermat (PHP) displays the values when using the Pigeon Hole Principle and the row below shows the statistics for the easier solution. As the numbers reveal, the easier proof requires the least effort for the user as well as for the system and the proof by the Binomial Theorem is most costly. It is striking that the runtime for the easier proof is only about a third and the runtime for the proof using the Pigeon Hole Principle is less than half of the runtime spent when using the Binomial Theorem, although the number of proof steps is almost identical or even higher. We suspect that the latter proof is most costly because more arithmetic is involved, which makes it more expensive. In contrast, the proofs using lists avoid some arithmetic reasoning in favour of reasoning about lists which is less complicated, thus easing the proofs. And finally, the easier proof requires least effort as it avoids certain lemmas (and proof steps in turn) necessitated by the Pigeon Hole Principle.

¹² Time refers to running VeriFun 3.5 under Windows 7 Enterprise with an INTEL Core i7-2640M 2.80 GHz CPU using Java 1.8.0_45.
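The automation degree reported in Fig. 2 can be recomputed from the Rules and Sys. columns. The short Python sketch below copies the values of Fig. 2; the dictionary layout and the function name `automation_degree` are illustrative choices, not part of the original tooling.

```python
def automation_degree(sys_rules, all_rules):
    """Percentage of system initiated rule applications, rounded to one decimal."""
    return round(100 * sys_rules / all_rules, 1)


# (Rules, User, Sys., %) per case study, taken from Fig. 2.
rows = {
    "Fermat (Bin Th)": (397, 60, 337, 84.9),
    "Fermat (PHP)":    (454, 57, 397, 87.4),
    "Fermat":          (394, 48, 346, 87.8),
    "Fermat (Euler)":  (473, 62, 411, 86.9),
    "Euler":           (433, 49, 384, 88.7),
    "Wilson":          (751, 128, 623, 83.0),
    "Wilson (PHP)":    (753, 134, 619, 82.2),
}

for name, (rules, user, system, percent) in rows.items():
    assert user + system == rules, name          # User and Sys. add up to Rules
    assert automation_degree(system, rules) == percent, name
```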


3 Euler’s Theorem

Euler's Theorem generalizes Fermat's Little Theorem by considering also non-prime moduli. It states that $a^{\varphi(n)} \bmod n = 1$ for numbers $a$ relatively prime to some $n \geq 2$, where $\varphi$ denotes Euler's phi-function (also called totient-function in the literature). The mathematical concept used in the proof of Euler's Theorem is called a reduced residue system modulo $n$, where $n \neq 0$ is demanded. This is the set of all numbers from $n-1$ to $1$ being relatively prime to $n$. We represent this set by a list $rrs(n)$ and define the phi-function by the length of this list, i.e. $\varphi(n) := |rrs(n)|$. Since $rrs(p) = (p-1, \ldots, 1)$ and therefore $\varphi(p) = p-1$ for prime numbers $p$, Euler's Theorem is a true generalization of Fermat's Little Theorem.

Similar to the proof of Fermat's Little Theorem, the key fact used in the proof of Euler's Theorem is that a permutation of $rrs(n)$ is obtained if each element in that list is replaced by the residue mod $n$ of its product with $a$, provided $a$ is relatively prime to $n$. E.g., for $n = 10$ and $a = 7$ we obtain $(63, 49, 21, 7)$ by multiplying each element in $rrs(10) = (9, 7, 3, 1)$ with $7$, and replacement of these list elements by their residue mod $10$ yields $(3, 9, 1, 7)$, i.e. a permutation of $rrs(10)$. Now we conclude $\Pi(rrs(n)) = \Pi((a \cdot rrs(n) \;\mathrm{MOD}\; n))$ from $rrs(n) \approx (a \cdot rrs(n) \;\mathrm{MOD}\; n)$, and therefore

$$\Pi(rrs(n)) \bmod n = \Pi((a \cdot rrs(n) \;\mathrm{MOD}\; n)) \bmod n = \Pi(a \cdot rrs(n)) \bmod n = (a^{|rrs(n)|} \cdot \Pi(rrs(n))) \bmod n ,$$

because $\mathrm{MOD}\; n$ in the product can be eliminated by (32). Finally we divide both sides of the resulting equation by $\Pi(rrs(n))$ so that $1 = a^{\varphi(n)} \bmod n$ eventually is obtained.

For proving Euler's Theorem, we define a procedure $rrs(n, y)$ by recursion upon $y$, viz.

    function rrs(n:ℕ, y:ℕ) : list[ℕ] <=
    if n > y
      then if y = 0
             then ø
             else if gcd(n, y) = 1
                    then y :: rrs(n, ⁻(y))
                    else rrs(n, ⁻(y))
                  end_if
           end_if
      else ø
    end_if

which computes a partial reduced residue system modulo $n$, i.e. a list of minimal length containing all numbers from $\{y, \ldots, 1\}$ being relatively prime to $n$.
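The procedure translates almost literally into Python, which makes it easy to replay the example with $n = 10$ and $a = 7$ and to test Euler's statement on small moduli. This is an illustrative sketch only: Python lists replace `list[ℕ]`, `phi` is a hypothetical name for the phi-function defined via the list length, and the guard for `n < 1` reflects the demand $n \neq 0$.

```python
from math import gcd


def rrs(n, y):
    """Partial reduced residue system: numbers in {y, ..., 1} coprime to n, descending."""
    if n > y:
        if y == 0:
            return []
        rest = rrs(n, y - 1)
        return [y] + rest if gcd(n, y) == 1 else rest
    return []


def phi(n):
    """Euler's phi-function as the length of rrs(n) = rrs(n, n - 1)."""
    if n < 1:
        raise ValueError("a reduced residue system requires n != 0")
    return len(rrs(n, n - 1))


assert rrs(10, 9) == [9, 7, 3, 1]
image = [(7 * r) % 10 for r in rrs(10, 9)]
assert image == [3, 9, 1, 7]
assert sorted(image) == sorted(rrs(10, 9))     # permutation of rrs(10)

# Euler's statement for small moduli and all a coprime to n.
for n in range(2, 60):
    for a in range(1, 100):
        if gcd(n, a) == 1:
            assert pow(a, phi(n), n) == 1
```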

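Now we formulate the required lemmas in terms of $\mathit{rrs}(n, y)$, which allows us to prove them by induction upon $y$. By instantiating $y$ with $n-1$, the wanted propositions about $\mathit{rrs}(n)$ are obtained afterwards as $\mathit{rrs}(n) = \mathit{rrs}(n, n-1)$. In particular, the main statement

$$\forall n, y, a{:}\mathbb{N}\;\; \gcd(n, a) = 1 \rightarrow (a \cdot \mathit{rrs}(n, y) \;\mathrm{MOD}\; n) \sqsubseteq \mathit{rrs}(n, n-1) \tag{37}$$

about $\mathit{rrs}$ is required in the sequel, which necessitates the auxiliary lemmas

$$\forall n, y{:}\mathbb{N}\;\; n > y \geq 1 \rightarrow \mathit{rrs}(n, y) \neq ø$$

$$\forall n, x, y{:}\mathbb{N}\;\; x > y \rightarrow x \notin \mathit{rrs}(n, y)$$

$$\forall n, x, y{:}\mathbb{N}\;\; n > y \geq x \geq 1 \wedge \gcd(n, x) = 1 \rightarrow x \in \mathit{rrs}(n, y)$$

$$\forall n, x, y, a{:}\mathbb{N}\;\; n > x > y \wedge \gcd(n, a) = 1 \rightarrow (a \cdot x \bmod n) \notin (a \cdot \mathit{rrs}(n, y) \;\mathrm{MOD}\; n).$$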

All proofs are by induction upon $y$ and require 8 user interactions in total. The library lemma

$$\forall x, y, z{:}\mathbb{N}\;\; \gcd(x, y) = 1 \rightarrow \gcd(x, y \cdot z) = \gcd(x, z) \tag{38}$$

is required in particular for proving (37) and is needed as well for the proof of (42) subsequently. For proving the permutation property

$$\forall n, a{:}\mathbb{N}\;\; n \neq 0 \wedge \gcd(n, a) = 1 \rightarrow (a \cdot \mathit{rrs}(n, n-1) \;\mathrm{MOD}\; n) \approx \mathit{rrs}(n, n-1) \tag{39}$$

we instruct the system to use library lemma (35). It then completes the proof by discharging the resulting subgoals $(a \cdot \mathit{rrs}(n, n-1) \;\mathrm{MOD}\; n) \sqsubseteq \mathit{rrs}(n, n-1)$ and $|\mathit{rrs}(n, n-1)| = |(a \cdot \mathit{rrs}(n, n-1) \;\mathrm{MOD}\; n)|$ with the library lemmas (27) and (28) and the auxiliary lemma (37) just proved. Next we prove the auxiliary lemma

$$\forall n, a{:}\mathbb{N}\;\; n \neq 0 \wedge \gcd(n, a) = 1 \rightarrow (\prod(\mathit{rrs}(n, n-1)) \equiv a^{|\mathit{rrs}(n, n-1)|} \cdot \prod(\mathit{rrs}(n, n-1))) \bmod n \tag{40}$$

by instructing the system to replace $\prod(\mathit{rrs}(n, n-1))$ at the right-hand side of the equation with $\prod((a \cdot \mathit{rrs}(n, n-1) \;\mathrm{MOD}\; n))$, which is justified by the lemmas (39) and (31). The system then responds by using the library lemmas (32) and (33) for replacing $\prod((a \cdot \mathit{rrs}(n, n-1) \;\mathrm{MOD}\; n)) \bmod n$ at the right-hand side with $a^{|\mathit{rrs}(n, n-1)|} \cdot \prod(\mathit{rrs}(n, n-1)) \bmod n$. This completes the proof of (40) (with the help of 2 user interactions) as now both sides of the equation are identical. With the library lemma

$$\forall n, m, x, y{:}\mathbb{N}\;\; \gcd(m, n) = 1 \wedge (m \cdot x \equiv m \cdot y) \bmod n \rightarrow (x \equiv y) \bmod n \tag{41}$$

and the auxiliary lemma (with an automated induction upon $y$ but needing one interactive case analysis)

$$\forall n, y{:}\mathbb{N}\;\; \gcd(\prod(\mathit{rrs}(n, y)), n) = 1 \tag{42}$$

we now have all required statements available for proving Euler’s Theorem

$$\forall n, a{:}\mathbb{N}\;\; n \geq 2 \wedge \gcd(n, a) = 1 \rightarrow a^{\varphi(n)} \bmod n = 1.$$

Classes      12  11  10   9   8   7   6   5   4   3   2   1
$\Im_{13}$   12   6   4   3   5   2  11   8  10   9   7   1
sort          -  11   6  10   4   9   3   8   5   7   2   -

Fig. 3. Inverses of the residue classes $\neq 0$ modulo prime 13

We instruct the system to use the instance

$$\gcd(\prod(\mathit{rrs}(n, n-1)), n) = 1 \wedge (\prod(\mathit{rrs}(n, n-1)) \cdot a^{|\mathit{rrs}(n, n-1)|} \equiv \prod(\mathit{rrs}(n, n-1)) \cdot 1) \bmod n \rightarrow (a^{|\mathit{rrs}(n, n-1)|} \equiv 1) \bmod n \tag{43}$$

of library lemma (41), and the system then responds by using lemma (40) for simplifying the proof goal to

$$n \geq 2 \wedge \gcd(n, a) = 1 \rightarrow (\gcd(\prod(\mathit{rrs}(n, n-1)), n) = 1 \vee a^{\varphi(n)} \bmod n = 1).$$

Finally, it applies auxiliary lemma (42) to establish $\gcd(\prod(\mathit{rrs}(n, n-1)), n) = 1$, and Euler’s Theorem is proved with one user interaction.
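The statements involved in this last step can be spot-checked on small numbers. The sketch below is a finite test and no substitute for the inductive proofs; the function names and the bound `LIMIT` are assumptions made for this illustration.

```python
from math import gcd, prod

def rrs(n, y):
    """Partial reduced residue system: numbers in {y, ..., 1} coprime to n, empty unless n > y."""
    return [x for x in range(y, 0, -1) if gcd(n, x) == 1] if n > y else []

def lemma_42(n, y):
    """gcd(prod(rrs(n, y)), n) = 1."""
    return gcd(prod(rrs(n, y)), n) == 1

def lemma_40(n, a):
    """prod(rrs) is congruent to a^|rrs| * prod(rrs) modulo n."""
    k = rrs(n, n - 1)
    return prod(k) % n == (pow(a, len(k)) * prod(k)) % n

def euler(n, a):
    """a^phi(n) mod n = 1, with phi(n) = |rrs(n, n - 1)|."""
    return pow(a, len(rrs(n, n - 1)), n) == 1

LIMIT = 40  # arbitrary bound for this finite spot check
for n in range(2, LIMIT):
    assert all(lemma_42(n, y) for y in range(LIMIT))
    for a in range(LIMIT):
        if gcd(n, a) == 1:
            assert lemma_40(n, a) and euler(n, a)
```

The hypothesis $\gcd(n, a) = 1$ cannot be dropped: for $n = 10$ and $a = 2$ one gets $2^4 \bmod 10 = 6$.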

Fig. 2 shows the effort for proving Euler’s Theorem, which is similar to the effort needed for the (easier) proof of Fermat’s Little Theorem. This seems to be an odd result as Euler’s Theorem implies Fermat’s statement, hence one might expect more work. But we have to reason about primes when proving Fermat’s Theorem (which is not required for Euler’s Theorem), and this causes additional work. In fact, when proving Fermat’s Theorem as a corollary of Euler’s Theorem, certain lemmas about primes are needed in addition, which requires more effort than the direct proof of Fermat’s Little Theorem, cf. row Fermat (Euler) in Fig. 2.

4 Wilson’s Theorem

Wilson’s Theorem states that $(p-1)! \bmod p = p-1$ for each prime $p$.¹³ The key observation for proving this theorem is that the residue classes modulo a prime $p$ (minus the class for 0) form a group.¹⁴ For our purposes it is enough to consider the numbers in $\{p-1, \ldots, 1\}$ as canonical representatives of the residue classes. The group operation is given by multiplication mod $p$, $1$ is the neutral element and the inverse $\Im_p(x)$ of some $x \in \{p-1, \ldots, 1\}$ is defined by $x^{p-2} \bmod p$. Figure 3 gives an example for $p = 13$, and we find, for instance, $8 \cdot \Im_{13}(8) \bmod 13 = 8 \cdot 5 \bmod 13 = 40 \bmod 13 = 1$.
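The inverses for $p = 13$ can be recomputed directly from the definition $x^{p-2} \bmod p$. A minimal sketch; the function name `inverse` is chosen here for illustration.

```python
def inverse(x, p):
    """Inverse of x modulo a prime p, computed as x^(p-2) mod p."""
    return pow(x, p - 2, p)

p = 13
classes = list(range(p - 1, 0, -1))           # 12, 11, ..., 1
inverses = [inverse(x, p) for x in classes]

# reproduces the second row of the table for p = 13
assert inverses == [12, 6, 4, 3, 5, 2, 11, 8, 10, 9, 7, 1]
# every class multiplied by its inverse yields the neutral element 1
assert all((x * inverse(x, p)) % p == 1 for x in classes)
```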

Two facts about $\Im_p$ are essential for proving Wilson’s Theorem: first, if $p \nmid x$ then $\Im_p(\Im_p(x)) = x \bmod p$, therefore $\Im_p$ is injective in $\{p-1, \ldots, 1\}$ and consequently list $(\Im_p(p-1), \ldots, \Im_p(1))$ is a permutation of list $(p-1, \ldots, 1)$; second, $1$ and $p-1$ represent the only residue classes inverse to themselves, i.e. $\Im_p(x) = x$ entails $x \bmod p = 1$ or $x \bmod p = p-1$ and $\Im_p(1) = 1$ as well as $\Im_p(p-1) = p-1$. Consequently list $(\Im_p(p-2), \ldots, \Im_p(2))$ is a permutation of list $(p-2, \ldots, 2)$.

¹³ $p$ being prime is not only sufficient but also necessary for the statement to hold as $(n-1)! \bmod n \neq n-1$ for all non-primes $n \neq 1$. See [1] for a computer assisted proof.

¹⁴ As it happens, the residue classes modulo a prime form a field, however this more general fact is not needed here.

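The following statements about $\Im_p$ (written as $I(x, p)$ in the lemmas) are required for a formal proof of Wilson’s Theorem:

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge p \nmid x \rightarrow I(I(x, p), p) = x \bmod p \tag{44}$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge p \nmid x \rightarrow x \cdot I(x, p) \bmod p = 1$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge p \nmid x \rightarrow I(x, p) \neq 0$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge p \nmid x \wedge I(x, p) = x \rightarrow (x \bmod p = 1 \vee x \bmod p = p-1)$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \rightarrow I(x, p) \bmod p = I(x, p)$$

$$\forall p, x, y{:}\mathbb{N}\;\; P(p) \wedge x \cdot y \bmod p = 1 \rightarrow x \bmod p = I(y, p)$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge x \bmod p = 1 \rightarrow I(x, p) = 1$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge x \bmod p = p-1 \rightarrow I(x, p) = p-1$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge p \nmid x \wedge I(x, p) = 1 \rightarrow x \bmod p = 1$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge p \nmid x \wedge I(x, p) = p-1 \rightarrow x \bmod p = p-1$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge p-2 \geq x \rightarrow p-2 \geq I(x, p).$$

All lemmas are proved by first-order reasoning requiring 26 user interactions in total, where Fermat’s Little Theorem (17), lemma (19) and the library lemmas

$$\forall p, x, y{:}\mathbb{N}\;\; P(p) \wedge p \mid x^{y} \rightarrow p \mid x$$

$$\forall x, y, z{:}\mathbb{N}\;\; z \neq 0 \rightarrow x \cdot (y \bmod z) \bmod z = x \cdot y \bmod z \tag{45}$$

$$\forall x, y, z{:}\mathbb{N}\;\; y \neq 0 \wedge z \neq 0 \rightarrow (x \bmod z)^{y} \bmod z = x^{y} \bmod z$$

$$\forall p, x{:}\mathbb{N}\;\; P(p) \wedge x^{2} \bmod p = 1 \rightarrow x \bmod p = 1 \vee x \bmod p = p-1$$

$$\forall n, x{:}\mathbb{N}\;\; n \geq 2 \wedge (x \bmod n = 1 \vee x \bmod n = n-1) \rightarrow x^{2} \bmod n = 1$$

have been used.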
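Several of the lemmas about $I(x, p)$ lend themselves to a quick finite test. The sketch below assumes $I(x, p) = x^{p-2} \bmod p$ as defined in the text; the checker `check_inverse_lemmas` and its bound parameter are introduced here for illustration and cover only a selection of the lemmas.

```python
def I(x, p):
    """Inverse of x modulo a prime p, computed as x^(p-2) mod p."""
    return pow(x, p - 2, p)

def check_inverse_lemmas(p, bound):
    """Spot-check a selection of the lemmas for a prime p and all x, y below bound."""
    for x in range(bound):
        inv = I(x, p)
        assert inv % p == inv                      # I(x, p) mod p = I(x, p)
        if x % p == 1:
            assert inv == 1
        if x % p == p - 1:
            assert inv == p - 1
        if x % p != 0:                             # p does not divide x
            assert I(inv, p) == x % p              # lemma (44)
            assert (x * inv) % p == 1
            assert inv != 0
            if inv == x:
                assert x % p in (1, p - 1)
            if inv == 1:
                assert x % p == 1
            if inv == p - 1:
                assert x % p == p - 1
        for y in range(bound):
            if (x * y) % p == 1:
                assert x % p == I(y, p)
    return True

assert all(check_inverse_lemmas(p, 3 * p) for p in (2, 3, 5, 7, 11, 13))
```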

Wilson’s Theorem obviously is true for $p = 2$ and $p = 3$, so let us assume $p \geq 5$. Then list $k := (p-2, \ldots, 2)$ has (even) length $p-3 \geq 2$ and by the facts about $\Im_p$ given above we may define a permutation $k_{sort}$ of list $k$ by $(a_1, \Im_p(a_1), \ldots, a_n, \Im_p(a_n))$, where $n := (p-3)/2$. The row labelled with sort in Fig. 3 gives an example of such a list. Now we calculate $\prod_{i=1}^{n} (a_i \cdot \Im_p(a_i) \bmod p) = 1$, because $a_i \cdot \Im_p(a_i) \bmod p = 1$, hence $\prod(k_{sort}) \bmod p = (\prod_{i=1}^{n} a_i \cdot \Im_p(a_i)) \bmod p = (\prod_{i=1}^{n} (a_i \cdot \Im_p(a_i) \bmod p)) \bmod p = 1$ as the inner mod’s in the product can be cancelled by (45). Therefore $(p-2)! \bmod p = \prod(k) \bmod p = \prod(k_{sort}) \bmod p = 1$, because $k \approx k_{sort}$ entails $\prod(k) = \prod(k_{sort})$. Finally, we calculate $(p-1)! \bmod p = (p-1) \cdot (p-2)! \bmod p = (p-1) \cdot ((p-2)! \bmod p) \bmod p = (p-1) \cdot 1 \bmod p = p-1$ by using (45) and $(p-2)! \bmod p = 1$, and Wilson’s Theorem is proved.
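The calculation can be retraced numerically. The following sketch checks the pairing for $p = 13$, the two steps of the final calculation for small primes, and the converse mentioned in footnote 13; the helpers `is_prime` and `wilson_steps` are names assumed for this illustration.

```python
from math import factorial

def is_prime(n):
    """Trial-division primality test, adequate for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def wilson_steps(p):
    """Return ((p-2)! mod p, (p-1) * ((p-2)! mod p) mod p) as in the final calculation."""
    inner = factorial(p - 2) % p
    return inner, ((p - 1) * inner) % p

# pairs (a_i, inverse of a_i) taken from the row labelled sort in Fig. 3, p = 13
pairs = [(11, 6), (10, 4), (9, 3), (8, 5), (7, 2)]
assert all((a * b) % 13 == 1 for a, b in pairs)

for p in filter(is_prime, range(2, 100)):
    assert wilson_steps(p) == (1, p - 1)
    assert factorial(p - 1) % p == p - 1

# footnote 13: the statement fails for every non-prime n other than 1
for n in range(4, 100):
    if not is_prime(n):
        assert factorial(n - 1) % n != n - 1
```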

For defining the above number list $k_{sort}$, we use a procedure sort given by

  function sort(k:list[ℕ], p:ℕ) : list[ℕ] <=
    if k = ø
      then ø
      else hd(k) :: I(hd(k), p) :: sort(tl(k) \ I(hd(k), p), p)
    end_if

and aim to prove

$$\forall p{:}\mathbb{N}\;\; P(p) \rightarrow (p-2 \ldots 2) \approx \mathit{sort}((p-2 \ldots 2), p) \tag{46}$$

by using library lemma (35). To this effect we formulate the lemmas

$$\forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; k \sqsubseteq \mathit{sort}(k, n) \tag{47}$$

$$\forall p{:}\mathbb{N}\;\; P(p) \rightarrow |(p-2 \ldots 2)| = |\mathit{sort}((p-2 \ldots 2), p)| \tag{48}$$

to verify the subgoals resulting from the use of (35). An induction proof of (47) is easily computed, but a proof of (48) is more challenging. This is because the permutation property is implicitly present in the length requirement: If $|(p-2 \ldots 2)| < |\mathit{sort}((p-2 \ldots 2), p)|$, then $I(x, p) \notin (p-2 \ldots 2)$ for some $x \in (p-2 \ldots 2)$ and consequently (46) must be false.
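Procedure sort can be transcribed into Python to observe (46) and (48) on concrete primes. This is a sketch under assumptions: the list difference is taken to remove the first occurrence of an element, which is immaterial for the duplicate-free list $(p-2 \ldots 2)$, and the helper names `remove_first` and `interval` are introduced here.

```python
def I(x, p):
    """Inverse of x modulo a prime p, computed as x^(p-2) mod p."""
    return pow(x, p - 2, p)

def remove_first(k, x):
    """List difference, here assumed to drop the first occurrence of x (if any)."""
    k = list(k)
    if x in k:
        k.remove(x)
    return k

def sort(k, p):
    """Mirror of procedure sort: each head is followed directly by its inverse."""
    if not k:
        return []
    head, inv = k[0], I(k[0], p)
    return [head, inv] + sort(remove_first(k[1:], inv), p)

def interval(p):
    """The list (p-2 ... 2)."""
    return list(range(p - 2, 1, -1))

# reproduces the row labelled sort in Fig. 3
assert sort(interval(13), 13) == [11, 6, 10, 4, 9, 3, 8, 5, 7, 2]

for p in (5, 7, 11, 13, 17, 19, 23):
    k, ks = interval(p), sort(interval(p), p)
    assert len(k) == len(ks)           # length requirement (48)
    assert sorted(k) == sorted(ks)     # permutation property (46)
```

For an argument $p$ that is not prime the two lists can differ in length, which is why (48) needs the hypothesis $P(p)$.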

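To prove the length requirement (48), we modify the Pigeon Hole Principle (21) of Section 2.2 for our purposes and assert

$$\forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; \mathit{purged}(k) \wedge n \neq 0 \wedge 2 \trianglelefteq k \wedge n \trianglerighteq k \wedge |k| \geq n-1 \rightarrow |k| = n-1 \tag{49}$$

where $m \trianglelefteq k$ means that $m$ is a lower bound of $k$. Having instructed the system to use the $k \setminus \max(k)$-induction we had chosen for proving lemma (20) in Section 2.2, the system automatically completes the proof. Now in order to use lemma (49) for proving (48), we have to verify that the requirements of (49) are satisfied if $\mathit{sort}(k, p)$ is substituted for $k$ and $p-2$ for $n$. This is expressed by the lemmas

$$\forall k{:}\mathit{list}[\mathbb{N}], p{:}\mathbb{N}\;\; P(p) \wedge p-2 \trianglerighteq k \rightarrow p-2 \trianglerighteq \mathit{sort}(k, p) \tag{50}$$

$$\forall k{:}\mathit{list}[\mathbb{N}], p{:}\mathbb{N}\;\; P(p) \wedge 2 \trianglelefteq k \wedge p-2 \trianglerighteq k \rightarrow 2 \trianglelefteq \mathit{sort}(k, p) \tag{51}$$

$$\forall k{:}\mathit{list}[\mathbb{N}], n, p{:}\mathbb{N}\;\; P(p) \wedge p-1 \trianglerighteq k \wedge n \in \mathit{sort}(k, p) \rightarrow (n \in k \vee I(n, p) \in k) \tag{52}$$

$$\forall k{:}\mathit{list}[\mathbb{N}], p{:}\mathbb{N}\;\; P(p) \wedge \mathit{purged}(k) \wedge 2 \trianglelefteq k \wedge p-2 \trianglerighteq k \rightarrow \mathit{purged}(\mathit{sort}(k, p)). \tag{53}$$

All these lemmas are proved by induction corresponding to the recursion structure of procedure sort and with an extensive use of the above lemmas about $I(x, p)$, where 13 user interactions are required to obtain the proofs.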
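The modified Pigeon Hole Principle (49) can be tested exhaustively on short lists. In the sketch below, purged is assumed to mean that a list contains no element twice, and the predicate names are chosen for this illustration only.

```python
from itertools import product

def purged(k):
    """Assumed meaning of purged: k contains no element twice."""
    return len(set(k)) == len(k)

def lower_bound(m, k):
    """m is a lower bound of k."""
    return all(m <= x for x in k)

def upper_bound(n, k):
    """n is an upper bound of k."""
    return all(n >= x for x in k)

def lemma_49(k, n):
    """If all premises of (49) hold, then |k| = n - 1."""
    premise = (purged(k) and n != 0 and lower_bound(2, k)
               and upper_bound(n, k) and len(k) >= n - 1)
    return (not premise) or len(k) == n - 1

# exhaustive check over all lists of length <= 5 with entries in 0..5, and n in 0..5
assert all(lemma_49(k, n)
           for length in range(6)
           for k in product(range(6), repeat=length)
           for n in range(6))
```

The intuition is the usual counting argument: a duplicate-free list whose elements lie between 2 and $n$ cannot have more than $n-1$ elements.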

The use of (50)–(53) with $(p-2 \ldots 2)$ substituted for $k$ creates further proof obligations. For instance, $p-2 \trianglerighteq (p-2 \ldots 2)$ has to be proved if $p-2 \trianglerighteq \mathit{sort}((p-2 \ldots 2), p)$ shall be verified with lemma (50). But these proofs are trivial given obvious and easily verified library lemmas such as $\forall h, m, n{:}\mathbb{N}\;\; h \geq n \rightarrow h \trianglerighteq (n \ldots m)$.

The length requirement (48) now is proven by pure first-order reasoning (requiring 5 user interactions) by instructing the system to use (49) with $p-2$ substituted for $n$ and $\mathit{sort}((p-2 \ldots 2), p)$ for $k$. The system then discharges the resulting proof obligations by the just proven lemmas (47), (50)–(53) and the library lemma $\forall k, l{:}\mathit{list}[@I]\;\; k \sqsubseteq l \rightarrow |k| \leq |l|$ for discharging the proof obligation $|\mathit{sort}((p-2 \ldots 2), p)| \geq p-3$. Now for proving the permutation requirement (46), it is enough to instruct the system to apply (35). The system then completes the proof by discharging the resulting proof obligations using (47) and (48).

For modeling the pairwise multiplication of list elements modulo $p$ we use procedure $\prod_2$ from Fig. 1 and command the system to prove the lemmas

$$\forall k{:}\mathit{list}[\mathbb{N}], p{:}\mathbb{N}\;\; P(p) \wedge 2 \trianglelefteq k \wedge p-2 \trianglerighteq k \rightarrow \prod\nolimits_2(\mathit{sort}(k, p), p) = 1 \tag{54}$$

$$\forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; n \neq 0 \rightarrow (\prod\nolimits_2(k, n) \equiv \prod(k)) \bmod n \tag{55}$$

which requires 2 user interactions for the proof of (54) and the presence of the library lemma

$$\forall x, y, z, n{:}\mathbb{N}\;\; z \neq 0 \wedge (x \equiv y) \bmod z \rightarrow (n \cdot x \equiv n \cdot y) \bmod z$$

for the proof of (55). Next we prove the auxiliary lemma

$$\forall p{:}\mathbb{N}\;\; P(p) \rightarrow (p-2)! \bmod p = 1 \bmod p \tag{56}$$

by instructing the system to use lemma (54) for replacing $1$ at the right-hand side of the equation with $\prod_2(\mathit{sort}((p-2 \ldots 2), p), p)$. The system then responds by discharging the resulting proof obligations $2 \trianglelefteq (p-2 \ldots 2)$ and $p-2 \trianglerighteq (p-2 \ldots 2)$ and applies lemma (55) yielding $(p-2)! \bmod p = \prod(\mathit{sort}((p-2 \ldots 2), p)) \bmod p$. Then we instruct the system to use library lemma (31) to replace $\prod(\mathit{sort}((p-2 \ldots 2), p))$ by $\prod(p-2 \ldots 2)$, which is justified by lemma (46). Subsequently the system replaces $\prod(p-2 \ldots 2)$ by $(p-2)!$, and lemma (56) is proved with the support of 3 user interactions as identical terms are obtained on both sides of the equation.
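Lemmas (54) and (55) can be illustrated with a hypothetical pairwise product. The definition of the procedure in Fig. 1 is not reproduced in this section, so `prod2` below is only an assumed reading (multiply adjacent pairs modulo $n$, then multiply the results modulo $n$) and need not coincide with the procedure actually used; `sort` is likewise a sketch of the procedure given above.

```python
from math import prod

def I(x, p):
    """Inverse of x modulo a prime p, computed as x^(p-2) mod p."""
    return pow(x, p - 2, p)

def sort(k, p):
    """Sketch of procedure sort: each head is followed directly by its inverse."""
    if not k:
        return []
    inv = I(k[0], p)
    rest = list(k[1:])
    if inv in rest:
        rest.remove(inv)
    return [k[0], inv] + sort(rest, p)

def prod2(k, n):
    """Hypothetical pairwise product: adjacent pairs are multiplied modulo n."""
    if not k:
        return 1
    if len(k) == 1:
        return k[0] % n
    return ((k[0] * k[1]) % n * prod2(k[2:], n)) % n

for p in (5, 7, 11, 13, 17):
    k = list(range(p - 2, 1, -1))                   # the list (p-2 ... 2)
    assert prod2(sort(k, p), p) == 1                # lemma (54)
    assert prod2(k, p) % p == prod(k) % p           # lemma (55)
    assert prod(k) % p == 1                         # (p-2)! mod p = 1, cf. (56)
```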

We now have all prerequisites available to prove Wilson’s Theorem

$$\forall p{:}\mathbb{N}\;\; P(p) \rightarrow (p-1)! \bmod p = p-1.$$

Having instructed the system to unfold procedure $!$ and to use lemma (45), we obtain $(p-1) \cdot ((p-2)! \bmod p) \bmod p$ at the left-hand side of the equation. The system then responds by applying lemma (56) yielding $(p-1) \cdot (1 \bmod p) \bmod p$, which in turn simplifies to $p-1$ by the definitions of $\cdot$ and mod. As both sides of the equation now are identical, Wilson’s Theorem is proved with the support of 2 user interactions.

Fig. 2 displays the effort required for proving Wilson’s Theorem. As list $(p-2 \ldots 2)$ consists of consecutive numbers, a slight generalization of the Pigeon Hole Principle (21) where requirement $|k| = n$ is replaced by $|k| \geq n$ can be used alternatively for proving the permutation requirement (46). The statistics for this proof are displayed in the row labelled Wilson (PHP). Both solutions use the proof of Fermat’s Little Theorem as illustrated in Section 2.3, and their costs do not differ as much as they do for the Fermat proof. This is because the Pigeon Hole Principle is used in both proofs of Wilson’s Theorem, either in form of our modification (49) or in its slightly generalized form.

5 Related Work

The theorems considered here have attracted mathematicians from the first days of Number Theory, and several proofs based on different mathematical concepts have been published over the centuries (see [6] for a presentation and comparison of the variety of methods that were used to prove these theorems and their generalizations). Likewise, these proofs attracted developers and users of interactive proof systems, proof assistants or proof checkers respectively, to challenge their systems by redoing the proofs.

The proofs performed with the interactive proof assistant Coq [13] use the theory of cyclic groups and finite rings to prove Euler’s Theorem, and proof scripts for Wilson’s and Fermat’s Little Theorem exist as well. Euler’s Theorem is proved with the interactive proof assistant HOL Light [14], and Fermat’s Theorem then is proved as a consequence of Euler’s Theorem. Another proof uses the Binomial Theorem for proving Fermat’s statement. The proof of Wilson’s Theorem is based on quadratic residues.

Different proofs are also computed with the interactive proof assistant Isabelle [15]: One approach uses the theory of unit groups for proving the theorems of Euler and Wilson, and Fermat’s Little Theorem then is proved as an instance of Euler’s Theorem. A second approach uses the framework of bijection relations developed in [7] to prove the theorems of Fermat, Euler and Wilson. Here permutation is shown by proving bijectivity of $f_{a,n}(x) := a \cdot x \bmod n$ where $\mathit{rrs}(n)$ is the domain of $f_{a,n}$, prime $n$ as well as $n \nmid a$ is assumed for Fermat’s Little Theorem and $n \neq 0$ as well as $\gcd(n, a) = 1$ is assumed for Euler’s Theorem. In case of Wilson’s Theorem, bijectivity of $\Im_p$ has to be proved where $\{2, \ldots, p-2\}$ is the domain of $\Im_p$ and $p$ is prime. Also the proof of Wilson’s Theorem presented in [9] is redone in Isabelle, which necessitates similar effort as the bijection relation approach. However, once this framework has been established, it can be reused for proving theorems depending on similar permutation properties, thus reducing user effort.

As VeriFun does not provide higher-order logic, such a framework cannot be formulated. Moreover, interactive proof systems like HOL Light, Coq and Isabelle support the creation of theory hierarchies, e.g. the theory of fields evolves from the theory of rings which in turn uses the theory of groups etc. This allows the development of a comprehensive mathematical apparatus supporting a user when working on a new proof challenge. As our system lacks such a feature, we cannot redo proofs based on abstract mathematical concepts directly. For instance, although we have proved in Section 4 that multiplication mod $p$, $1$ and $\Im_p$ satisfy the axioms of a group, we do not have the concept of a group. Therefore we cannot inherit theorems proved elsewhere about groups, e.g. that $\Im_p$ is involutory, but must prove such theorems additionally, cf. lemma (44).

Rather than being comparable with the interactive proof assistants, our system is quite similar to the Boyer-Moore induction theorem prover [2], [4] in the sense that mathematical notions are algorithmically defined, induction is the main inference rule, and various heuristics guide the system for automating induction and for proving the base and step cases of an induction proof by first-order means. The object language of the Boyer-Moore prover does not allow polymorphism, but uses Lisp which is extended by a principle for defining data structures by constructors and selectors (as we do). When working with the system, a user prepares a file with the definitions of the procedures, data structures and lemmas (given as Lisp-expressions) which then is executed by the system. If unsuccessful, the user analyzes the computed protocol and inserts so-called “hints” into the file which then is given to the prover for another try. Some of these hints correspond to the interactive calls of HPL-proof rules in our system: the Use-Lemma hint instructs the system to use a specific instance of a certain lemma, the Induction hint stipulates the induction axiom and the variables to induct upon, etc. Other user hints like annotations stating how a lemma is to be used in a proof (e.g. as a rewrite rule) or the Disable feature, which excludes procedure calls from being executed as well as lemmas from being used, have no correspondence in our system as they are implemented by appropriate heuristics.¹⁵

Fermat’s Little Theorem has been proved with the Boyer-Moore theorem prover some time ago [3], and the proof presented in Sec. 2.2 is quite similar to this proof. Also Wilson’s Theorem has been proved with the Boyer-Moore prover [9]. This proof is quite similar to our proof in Sec. 4 up to the point where permutation is to be proved: Instead of reordering the list $(p-2 \ldots 2)$ with procedure sort, the list of adjacent inverses is computed directly by a procedure INVERSE.LIST$(n, m)$, where the correspondence to our approach is given by $\text{INVERSE.LIST}(p-2, p) \setminus 1 = \mathit{sort}((p-2 \ldots 2), p)$ for primes $p$. Now for proving permutation, a modified version of the Pigeon Hole Principle is used where requirement $|k| = n$ is replaced by $(n \ldots 1) \subseteq k$.¹⁶ This represents an elegant solution for proving the permutation property, although the motivation for doing so, “This theorem [i.e. (21)] is not applicable to the problem at hand because it contains a hypotheses concerning the length of the list.”, can be refuted, cf. Section 4.

¹⁵ But we may ban the execution of procedure calls in a case term or in an instance of a lemma upon use of Case Analysis or Use Lemma respectively. We may also use the HPL-rule Normalization working like Simplification, except that procedure calls in the goal term are not executed.

¹⁶ $\subseteq$ denotes the subset relation which neither considers the order of list elements nor the number of element occurrences when comparing lists. For instance, $(3, 2) \subseteq (2, 2, 3)$ and vice versa.

When concerned with automated reasoning, the demand for user support is of particular interest. One criterion is the number of user suggested proof rule applications for solving a certain theorem proving challenge: The number of user calls for Induction relative to the number of all induction rule applications is a measure for the induction heuristic’s quality, the number of calls for Apply Equation provides insight into the system’s automated equality reasoning capability, and the frequency of the other proof rule applications is inversely proportional to the first-order theorem prover’s performance. A further measure is the number of lemmas which the user has to submit to the system for guiding it to success: A first-order provable lemma may be needed in one system, but is obsolete in another one as a stronger first-order prover is available, and a lemma provable by induction might be required in one system, whereas another system spots this lemma automatically by generalization, say, or does not need it at all.

However, little is known from the literature about these figures. It has been reported in [5] that the induction heuristic of the Boyer-Moore prover has a success rate of 79% where 70% of the proven lemmas are first-order provable,¹⁷ whereas we count an average success rate of 95% for our induction heuristic, where the share of first-order provable lemmas is less than 35% on average.¹⁸ But unfortunately, the proofs presented in [3] and [9] do not allow a more detailed comparison with our system in terms of user effort, and the proof scripts of the proof assistants HOL Light, Coq and Isabelle found on the web also do not provide insight in this respect. But it seems obvious at least that the solution illustrated in Sec. 2.3 also eases the proof of Fermat’s statement when using the Boyer-Moore prover.
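The two percentages quoted for the Boyer-Moore prover can be recomputed from the counts given in footnote 17 (16,000 theorems, of which 3800 were proved by automated induction and 1000 by user suggested induction). The reading used below, in particular that every theorem not proved by induction counts as first-order provable, is an assumption; it does reproduce the reported figures:

```latex
\[
  \frac{3800}{3800 + 1000} = \frac{3800}{4800} \approx 79\,\%,
  \qquad
  \frac{16000 - (3800 + 1000)}{16000} = \frac{11200}{16000} = 70\,\%.
\]
```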

6 Conclusion

Proving theorems with pencil and paper is work, and using a reasoning tool instead comes with additional burden. This is because the intention for developing formal proofs does not allow gaps by omitting “obvious” proof steps. Central to the proofs of the three theorems considered here is the permutation property of certain lists, but a search for the respective proofs in textbooks of Number Theory will be in vain. For instance, it is proved in [8] that $(a \cdot \mathit{rrs}(n) \;\mathrm{MOD}\; n)$ is a reduced residue system modulo $n$ (if $a$ is relatively prime to $n$), and it is concluded that “the integers in $(a \cdot \mathit{rrs}(n) \;\mathrm{MOD}\; n)$ must be the integers in $\mathit{rrs}(n)$ in some order.” Although this fact is obvious, the proof for it is not.

A further burden involved with formal proofs is caused by the boundaries of the used logic. Mathematical notions are algorithmically specified in VeriFun, e.g. a reduced residue system modulo $n$ is defined by procedure rrs. This approach allows for powerful heuristics for the proof search, thus avoiding frequent user interactions upon the computation of proofs. However, this benefit does not come for free as it may necessitate the invention of non-obvious lemmas corresponding to the formulation of loop invariants when verifying loops of some imperative programming language. This effect is noticeable in particular in Number Theory as certain mathematical concepts cannot be defined directly by recursion here. For instance, primes cannot be defined in terms of primes (like divisibility can be defined in terms of divisibility), but must be defined by some kind of a loop instead. Now for proving a statement about primes, auxiliary lemmas corresponding to loop invariants might be required which sometimes are not that easy to spot.¹⁹

¹⁷ Proofs of 16,000 theorems had been reported in [5], where 3800 were proven by automated induction and 1000 by user suggested induction.

¹⁸ For the case studies in Fig. 2, the success rate of the induction heuristic / the share of first-order provable lemmas ranges from 93% / 32% in the Fermat proof using the Pigeon Hole Principle to 97% / 35% when using the Binomial Theorem, and is 94%–96% / 31%–33% for the remaining proofs.

And finally, the need for user support strongly corresponds to the difficulty of the mathematics present in a case study. While we may achieve an automation degree up to 100% in mathematically simple domains, e.g. when sorting lists, proofs in Number Theory require significantly more user interventions. This is because quite often elaborate ideas for developing a proof are needed here which are beyond the ability of the heuristics guiding the proof search. An example is the proof of lemma (38)

$$\forall x, y, z{:}\mathbb{N}\;\; \gcd(x, y) = 1 \rightarrow \gcd(x, y \cdot z) = \gcd(x, z)$$

where we call the system with Apply Equation to replace $x$ in $\gcd(x, y \cdot z)$ by $x \cdot \gcd(1, z)$ and to use the times-gcd distributivity yielding $\gcd(\gcd(x \cdot 1, x \cdot z), y \cdot z)$. VeriFun responds by using the definition and the commutativity of multiplication as well as the associativity of gcd and the times-gcd distributivity for computing $\gcd(x, z \cdot \gcd(x, y))$, and then completes the proof by using the hypothesis for replacing $\gcd(x, y)$ by $1$. The idea for the first two replacement steps is crucial for the proof but is non-obvious, at least for a machine. User interaction is needed in such a case as the system’s heuristics do not support proof steps of this kind.
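Written out as a chain of equations, the replacement steps described above read as follows. The annotations paraphrase the text; the intermediate term in the third line is filled in here and is not quoted from the system’s output.

```latex
\begin{gather*}
\gcd(x,\, y \cdot z) = \gcd(x \cdot \gcd(1, z),\; y \cdot z)
  \quad \text{(as } \gcd(1, z) = 1\text{)} \\
= \gcd(\gcd(x \cdot 1,\, x \cdot z),\; y \cdot z)
  \quad \text{(times-gcd distributivity)} \\
= \gcd(x,\; \gcd(x \cdot z,\, y \cdot z))
  \quad \text{(definition of multiplication, associativity of gcd)} \\
= \gcd(x,\; z \cdot \gcd(x, y))
  \quad \text{(commutativity of multiplication, times-gcd distributivity)} \\
= \gcd(x,\, z)
  \quad \text{(hypothesis } \gcd(x, y) = 1\text{)}
\end{gather*}
```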

¹⁹ See e.g. the definition of primes and the required loop invariant called prime1.basic in [2].

References

1. http://verifun.de
2. Boyer, R.S., Moore, J S.: A Computational Logic. Academic Press, New York (1979)
3. Boyer, R.S., Moore, J S.: Proof Checking the RSA Public Key Encryption Algorithm. American Mathematical Monthly 91(3), 181–189 (1984)
4. Boyer, R.S., Moore, J S.: A theorem prover for a computational logic (Keynote Address). Proc. 10th Intern. Conf. on Automated Deduction (CADE-10), vol. 449 of Lecture Notes in Comp. Science, pp. 1–15, Kaiserslautern (1990)
5. Boyer, R.S., Moore, J S.: On the Difficulty of Automating Inductive Reasoning. Remarks made at a CADE-11 workshop on inductive reasoning, Saratoga Springs (1992) (available from the web)
6. Dickson, L.E.: History of the Theory of Numbers, Vol 1: Divisibility and Primality. Carnegie Institution of Washington, Publication No. 256, Washington (1919)
7. Rasmussen, T.M.: An Inductive Approach to Formalizing Notions of Number Theory Proofs. Computer Mathematics: Proc. of the 5th Asian Symposium (ASCM 2001), Matsuyama, Japan, 131–140 (2001)
8. Rosen, K.H.: Elementary Number Theory and Its Applications, 5th edn. Pearson Addison Wesley, Boston (2005)
9. Russinoff, D.M.: An Experiment with the Boyer-Moore Theorem Prover: A Proof of Wilson’s Theorem. Journal of Automated Reasoning 1(2), 121–139 (1985)
10. Walther, C.: On Proving the Termination of Algorithms by Machine. Artificial Intelligence 71(1), 101–157 (1994)
11. Walther, C., Schweitzer, S.: Verification in the Classroom. Journal of Automated Reasoning - Special Issue on Automated Reasoning and Theorem Proving in Education 32(1), 35–73 (2004)
12. Walther, C., Schweitzer, S.: A Pragmatic Approach to Equality Reasoning. Technical Report VFR 06/02, Technische Universität Darmstadt, 1–19 (2006) (available from [1])
13. https://coq.inria.fr
14. https://github.com/jrh13/hol-light
15. https://isabelle.in.tum.de