
Fermat, Euler, Wilson - Three Case Studies in Number Theory



[Accepted for publication in Journal of Automated Reasoning. The final publication is available at Springer via http://dx.doi.org/10.1007/s10817-016-9387-z.]
Christoph Walther and Nathan Wasser
Fachbereich Informatik
Technische Universität Darmstadt *
Abstract. We report on computer assisted proofs of three theorems from Number Theory, viz. Fermat’s Little Theorem, Euler’s generalization of Fermat’s statement and Wilson’s Theorem. Common to the formal proofs is that permutation of certain number lists has to be proved, which causes the main effort in the development. We give a short survey of the VeriFun system used in this experiment and illustrate the proofs before presenting them formally. We also discuss alternative solutions, report on the required effort and conclude with some experiences gained from this experiment.
1 Introduction
Case studies and experiments with reasoning systems demonstrate the state of the art and provide a valuable source for comparing systems and their further development. In this paper, we report on computer assisted proofs of three theorems found in the first chapters of textbooks of Number Theory, viz. Fermat’s Little Theorem, Euler’s generalization of Fermat’s statement and Wilson’s Theorem. Common to these proofs is that permutation of certain number lists has to be shown. This causes the main effort in developing the formal proofs instead of (contrary to what some might expect) the work of proving “deep” number theoretic facts about prime numbers, divisibility, residue classes etc.
The formal proofs presented here are performed with the VeriFun system [1].¹ This system was designed and developed as an easy to learn and easy to use tool for teaching Automated Reasoning, Semantics, Verification and similar subjects and has been used in beginner courses about Formal Methods as well as in practical courses about Program Verification for about 15 years [11].

The system’s object language consists of principles for defining polymorphic data structures, procedures operating on them, and for statements (called “lemmas”) about the data structures and procedures. Fig. 1 displays some examples. The data structure bool and the data structure ℕ for natural numbers built with the constructors 0 and ⁺(…) for the successor function are predefined in the system. ⁻(…) is the selector of ⁺(…), thus representing the predecessor function. list is the only user defined data structure required for the case studies of this paper. Identifiers preceded by @ denote type variables, and therefore polymorphic lists are defined here. Lists are built with the constructors ø for the empty list and :: (given in infix notation). The functions hd and tl (for head and tail) are the selectors of ::, yielding the leftmost list element and the list with the leftmost element removed, respectively.

  structure bool <= true, false
  structure ℕ <= 0, ⁺(⁻:ℕ)
  structure list[@I] <= ø, [infixr,10] ::(hd : @I, tl : list[@I])

  function Π₂(k : list[ℕ], p : ℕ) : ℕ <=
    if k = ø
      then 1
      else if tl(k) = ø
             then (hd(k) mod p)
             else Π₂(tl(tl(k)), p) · (hd(k) · hd(tl(k)) mod p)

  lemma ℙ(p) ∧ p∤x → I(I(x, p), p) = (x mod p) <= ∀p, x : ℕ
    if{ℙ(p), if{¬(x mod p) = 0, I(I(x, p), p) = (x mod p), true}, true}

Fig. 1. Data structures, procedures and lemmas in VeriFun

* Technical Report VFR 16/01, September 15, 2016. Accepted for publication in Journal of Automated Reasoning. The final publication is available at Springer via http://dx.doi.org/10.1007/s10817-016-9387-z.
¹ An acronym for “A Verifier for Functional Programs”.

Procedures are defined with conditionals $\mathit{if} : \mathit{bool} \times \alpha \times \alpha \to \alpha$, functional composition and recursion (case-conditionals and let-expressions may be used as well). E.g., procedure Π₂ of Fig. 1 (which will be used for proving Wilson’s Theorem in Sec. 4) computes the product of a number list k by pairwise multiplication modulo p of the list elements (where the procedures · and mod for multiplication and the remainder operation are defined elsewhere).
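The recursion of the pairwise product procedure of Fig. 1 can be mirrored in a few lines of Python. This is only an illustrative sketch: the name `prod2` and the use of Python lists are assumptions made here, not part of the VeriFun development.

```python
from math import prod

def prod2(k, p):
    """Pairwise product modulo p, following the recursion shown in Fig. 1."""
    if not k:                      # k = ø
        return 1
    if len(k) == 1:                # tl(k) = ø
        return k[0] % p
    # multiply the first two elements modulo p, then recurse on the rest
    return prod2(k[2:], p) * ((k[0] * k[1]) % p)

# The result agrees with the plain product modulo p (hypothetical sample data).
sample = [4, 3, 2, 1]
agrees = prod2(sample, 5) % 5 == prod(sample) % 5
```

Note that only the pairwise factors are reduced modulo p, so the overall result is congruent to the plain product but not necessarily smaller than p.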
Undefinedness is implemented by underspecification, i.e. by using incomplete conditionals. Such a feature is required when working with polymorphic data structures because results cannot be defined for certain function applications. For instance, no result can be assigned to last(ø) for a procedure last computing the last element of a polymorphic list, and hd(ø) remains undefined for the same reason. This feature is useful for monomorphic data structures too as it avoids the need for stipulating artificial results, e.g. for x mod 0.
Lemmas are defined with conditionals $\mathit{if} : \mathit{bool} \times \mathit{bool} \times \mathit{bool} \to \mathit{bool}$ as the only connective (but negation ¬, case-conditionals and let-expressions may be used as well). Only universal quantification is allowed for the variables of a lemma. Fig. 1 displays a lemma about procedure I which will be used in Section 4. The string in the headline is just an identifier assigning a name to the lemma for reference and must not be confused with the lemma body, which is a term of type bool representing the real statement of the lemma. For easing readability, we will subsequently refer to lemmas by their names only.
Lemmas are proved with the HPL-calculus (abbreviating Hypotheses, Programs and Lemmas) [11]. The most relevant proof rules of this calculus are Induction, Use Lemma, Apply Equation, Unfold Procedure, Case Analysis and Simplification. Formulas are given as sequents of form $H, IH \vdash \mathit{goal}$, where H is a finite set of hypotheses given as literals, i.e. negated or unnegated if-free boolean terms, IH is a finite set of induction hypotheses given as possibly quantified boolean terms and goal is a boolean term, called the goalterm of the sequent. A deduction in the HPL-calculus is represented by a tree whose nodes are given by sequents. A lemma ℓ is developed if goal equals true for each sequent at a leaf of the proof tree associated with ℓ, and is verified if additionally each lemma applied by Use Lemma or Apply Equation when building the proof tree is verified.²
The Induction rule creates the base and step cases for a sequent from an induction axiom, where such an axiom is obtained from the recursion structure of some terminating procedure. For proving Fermat’s Little Theorem in Section 2.2, for instance, we define a procedure

  function <(k : list[ℕ]) : bool <=
    if k = ø then false else <(k ∖ max(k)) end_if

(where $k \setminus n$ deletes the leftmost occurrence of n in k if $n \in k$ and max(k) computes a maximal element of $k \neq$ ø) and instruct the system with Induction to prove a certain lemma with body $\forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\, \mathit{goal}[n,k]$ by induction upon <.³ The system then expands the proof tree of the lemma at the root node $\{\},\{\} \vdash \mathit{goal}[n,k]$ by adding the successor nodes $\{k = ø\},\{\} \vdash \mathit{goal}[n,k]$ and $\{k \neq ø\},\{\forall n'{:}\mathbb{N}\;\, \mathit{goal}[n', k \setminus \max(k)]\} \vdash \mathit{goal}[n,k]$.
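The recursion structure behind this induction axiom can be made concrete with a small Python sketch. The helper names `remove_first` and `descent` are hypothetical; the sketch only shows that repeatedly deleting a maximal element reaches the empty list after exactly |k| steps, which is what makes the induction well-founded.

```python
def remove_first(k, n):
    """Delete the leftmost occurrence of n in k (k is returned unchanged if n is absent)."""
    if n in k:
        i = k.index(n)
        return k[:i] + k[i + 1:]
    return k

def descent(k):
    """Lists visited by the recursion: k, k minus max(k), ... down to the empty list."""
    chain = [k]
    while k:                       # the step case applies only for non-empty k
        k = remove_first(k, max(k))
        chain.append(k)
    return chain
```

For a list of length n the chain has n + 1 entries, one for each induction step plus the base case.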
By choosing Simplification, the system’s first-order theorem prover, called the Symbolic Evaluator, is started for rewriting a sequent’s goalterm by using the hypotheses and induction hypotheses of the sequent, the definitions of the data structures and procedures as well as the lemmas already verified [12]. Since non-valid proof obligations are frequently encountered in the system’s domain, this reasoner is necessarily incomplete as a potential looping system would result otherwise. It is guided by heuristics, e.g. for deciding whether to use a procedure definition, for supporting goal directed equality reasoning, for speeding up proof search by filtering out useless lemmas, etc. In particular, a good compromise between theorem proving performance, run time efficiency and usefulness of results had to be found, as a poor performing prover necessitates frequent user interactions, greatly delayed answer time disturbs the concentration of the impatient user and results different from true must enable the user to interactively continue with the proof. The Symbolic Evaluator is implemented as a completely automated tool over which the user has no control, thus leaving the HPL-proof rules as the only means to control the system’s theorem proving behavior.

² The base of this recursive definition is given by lemmas being proved without using other lemmas.
³ In almost all cases, the induction axiom required for proving a lemma ℓ is provided by one of the procedures used in ℓ. However, in rare cases the user must define an additional procedure like < for stipulating a specific induction axiom, either because this is needed or (as in the present case) eases a proof considerably.
The HPL-calculus is controlled by heuristics as well. By applying the Verify command to a lemma, the system starts to compute a proof tree by choosing appropriate HPL-proof rules heuristically. If a proof attempt gets stuck, the user has to step in by applying a proof rule to some leaf of the proof tree (sometimes after pruning some unwanted branch of the tree), and the system then takes over control again. It may also happen that a further lemma must be formulated by the user before the proof under consideration can be completed.
Upon the definition of a procedure, VeriFun’s automated termination analysis (based on the method of Argument-Bounded Functions [10]) is invoked for generating termination hypotheses which are sufficient for the procedure’s termination and are proved like lemmas. If the system fails to compute termination hypotheses, the user must provide an appropriate termination function (which is not required for the procedures of the case studies discussed here). Afterwards induction axioms are computed from the terminating procedures’ recursion structure to be in stock for future use.
Proof development is supported by further tools: The system may generalize
proof obligations to enable induction proofs, procedures can be tested by running
them in an interpreter, and a disprover is automatically invoked for checking
system generated conjectures, like termination hypotheses or generalizations, or
is called by the user to test a lemma before a proof attempt is started.
Great effort was spent on system development to ease the use of the system so that even students in the first half of their second year were able to solve verification challenges for basic arithmetic, sorting and searching algorithms, pattern matching, string and list processing, tautology checkers, etc. The most important design principles were Adequate Terms of Interaction, allowing a user to support the system by only suggesting useful lemmas and HPL-proof rule applications when an automated proof attempt gets stuck, Transparency, which means that the HPL-proofs as well as the proofs computed by the Symbolic Evaluator can be inspected and are presented in a user friendly and understandable way, and Hidden Implementation, to enable a user to succeed in a verification challenge without knowing the system’s implementation and heuristics.⁴ Usability is supported in addition by a high degree of automation, see [11] for more details. VeriFun is implemented in Java and installers for running the system under Windows, Unix/Linux or Mac are available from the web [1].

When working with the system, we use proof libraries which have been set up over the years by extending them with definitions and lemmas that are of general interest and therefore may be useful when working on a new case study. When importing a definition or a lemma from a library into a case study, all program elements and proofs the imported item depends on are imported as well. For instance, the proof of Wilson’s Theorem depends on 27 procedures and 156 proven lemmas ranging from simple statements like the associativity of addition up to “deep” theorems like Fermat’s Little Theorem. In the sequel we will only list the lemmas which are essential to understand the proofs and refer to [1] for a complete account of all used lemmas and their proofs.

⁴ Unfortunately, we did not succeed completely in hiding system internals, as there are 2 “tricks” in the formulation of lemmas which support automated equality reasoning, see Section 2.1 for a discussion.
2 Fermat’s Little Theorem
Fermat’s Little Theorem states that $a^{p-1} \bmod p = 1$ for numbers a not divisible by a prime p. It belongs to the standard repertoire of Number Theory and is useful to prove other statements like, for instance, the correctness of RSA encryption and Wilson’s Theorem.
2.1 The Binomial Theorem
One proof quite often found in textbooks of Number Theory uses the Binomial Theorem⁵
\[ \forall x,n{:}\mathbb{N}\;\; x \neq 0 \to (x+1)^n = \sum_{i=0}^{n} \binom{n}{i} x^i \tag{1} \]
for proving⁶
\[ \forall p,a{:}\mathbb{N}\;\; \mathbb{P}(p) \to a^p \bmod p = a \bmod p \tag{2} \]
by induction upon a: The proofs of the base cases $a = 0$ and $a = 1$ are trivial, and we prove the step case $(a+1)^p \bmod p = (a+1) \bmod p$ under the induction hypothesis $a^p \bmod p = a \bmod p$, where $\mathbb{P}(p)$ and $a \neq 0$ are assumed. Then by the Binomial Theorem (1)
\[ (a+1)^p = \sum_{i=0}^{p} \binom{p}{i} a^i = 1 + \sum_{i=1}^{p} \binom{p}{i} a^i = 1 + a^p + \sum_{i=1}^{p-1} \binom{p}{i} a^i , \]
hence $(a+1)^p \bmod p = (a^p + 1) \bmod p$ as p divides $\sum_{i=1}^{p-1} \binom{p}{i} a^i$. Therefore $(a+1)^p \bmod p = (a+1) \bmod p$, as $(a^p + 1) \bmod p = (a+1) \bmod p$ follows directly from the induction hypothesis, and (2) is proved. Now if p does not divide a, division by a at both sides of the equation in (2) yields Fermat’s statement $a^{p-1} \bmod p = 1$.
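The arithmetic behind this textbook argument can be spot-checked numerically. The following Python sketch is only a sanity check over small values, not a proof; the helper name `middle_sum` is an assumption made for this illustration.

```python
from math import comb

def middle_sum(p, a):
    """Sum of C(p, i) * a**i for 1 <= i <= p - 1."""
    return sum(comb(p, i) * a**i for i in range(1, p))

for p in (2, 3, 5, 7, 11):
    for a in range(0, 20):
        # binomial expansion of (a + 1)**p, split as in the text
        assert (a + 1)**p == 1 + a**p + middle_sum(p, a)
        # p divides the middle sum because p is prime
        assert middle_sum(p, a) % p == 0
        # statement (2)
        assert pow(a, p, p) == a % p
```

For a composite exponent the divisibility step fails in general, for instance `middle_sum(4, 1) % 4` is 2.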
To obtain a formal proof, we define a procedure

  function S(m:ℕ, n:ℕ, h:ℕ, x:ℕ) : ℕ <=
    if h ≤ 1
      then if m = 0
             then 0
             else S(⁻(m), n, h, x) + \binom{n}{m - h} · x^m

⁵ $x \neq 0$ is demanded in (1) as otherwise $0^0$ has to be defined.
⁶ $\mathbb{P}(p)$ denotes that p is prime.

such that $S(m,n,h,x)$ computes $\sum_{i=1}^{m} \binom{n}{i-h} x^i$ if $h \le 1$, and formulate the auxiliary lemmas⁷
\[ \forall m,n,h,x{:}\mathbb{N}\;\; h \le 1 \wedge x = 0 \to \sum_{i=1}^{m} \binom{n}{i-h} x^i = 0 \tag{3} \]
8m; n; x; y:Nm
i=1 n
i=1 n
8m; n; x:Nn6= 0 !
i=1 n1
i=1 n1
i=1 n
which the system proves by induction upon m with the help of one user interaction for (5).
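To make the role of procedure S concrete, here is a hedged Python transcription. It assumes that the guard of S reads h ≤ 1 (as in lemma (3)) and that subtraction is never applied below zero, which holds because m ≥ 1 in the recursive branch; raising `ValueError` for h > 1 is merely this sketch’s stand-in for the undefined case.

```python
from math import comb

def S(m, n, h, x):
    """Sum of C(n, i - h) * x**i for i = 1..m; only defined for h <= 1."""
    if h > 1:
        raise ValueError("undefined for h > 1")   # stand-in for the incomplete conditional
    if m == 0:
        return 0
    return S(m - 1, n, h, x) + comb(n, m - h) * x**m

def closed_form(m, n, h, x):
    """The same sum written directly, used only for comparison."""
    return sum(comb(n, i - h) * x**i for i in range(1, m + 1))
```

With this reading, `S(m, n, h, 0)` is 0 for every m, n and h ≤ 1, matching lemma (3), and `1 + S(n, n, 0, x)` equals `(x + 1)**n`.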
Obviously, addition of y in lemma (4) can be cancelled, and the system does so in fact before starting the induction proof. However, the equation in (4) is embedded into a (right) context $\ldots + y = \ldots + y$ for supporting equality reasoning: VeriFun uses AC(I)-Matching and conditional term rewriting, where the orientation of equations is heuristically established [12]. Now consider e.g. the associative operator +, some left-to-right oriented equation (E) $\forall x{:}\mathbb{N}\;\, f(x) + f(x) = g(x)$ and a term (T) $a + (f(b) + (f(b) + c))$. To enable the use of (E), associativity given by $\forall x,y,z{:}\mathbb{N}\;\, (x+y)+z = x+(y+z)$ has to be applied interactively in (T). This yields (T′) $a + ((f(b) + f(b)) + c)$ and then the system uses (E) for computing (T″) $a + (g(b) + c)$. This interaction could be avoided if the system considered all rearrangements of (T) modulo associativity for finding a redex for (E). However, the set of all associative rearrangements of a term t grows exponentially with the number of +-arguments in t, so this approach is infeasible as it would slow down system performance considerably. We therefore sometimes formulate an equation in the form (E′) $\forall x,y{:}\mathbb{N}\;\, f(x) + (f(x) + y) = g(x) + y$ instead of (E), as this enables an automatic replacement of (T) by (T″). The use of (E′) is foolproof in the sense that it applies automatically to an initially given (T′) as well. Since associativity is used as a left-to-right oriented equation too, embedding an equation into a left context is needless. The embedding “trick” is applicable to all associative operators which are right cancelable, but is very rarely used (e.g. lemmas (4) and (7) are the only examples in this paper).
But there is another “trick” which is used more often. Lemma (3) provides an example as it uses a variable x instead of the constant 0 at the left-hand side of the equation. Abstracting a term by a variable and making the binding explicit by a condition supports automated equality reasoning as well: Consider e.g. the library lemma (L) $\forall x{:}\mathbb{N}\;\, x - x = 0$ and some proof obligation (P) $a = b \to c = a - b$. The system orients the equation in (L) from left to right, but it cannot be used as $x - x$ does not match $a - b$. We therefore use (L′) $\forall x,y{:}\mathbb{N}\;\, x = y \to x - y = 0$ instead of (L). Now $x - y$ matches $a - b$, and the Symbolic Evaluator simplifies (P) to $a = b \to (a = b \to c = 0) \wedge (a \neq b \to c = a - b)$ by using (L′) in an intermediate step and to $a = b \to c = 0$ in the next step. The use of (L′) is foolproof as an unconditional proof obligation (P′) $c = a - a$ is replaced by $(a = a \to c = 0) \wedge (a \neq a \to c = a - a)$ and by $c = 0$ in turn. However, a conditional equation is only applied if the resulting conditions can in fact be established, and therefore (L′) fails on (P″) $c = a - b$.

⁷ Here we use mathematical notation like $\sum_{i=1}^{m+1} \binom{n}{i-1} x^i$ instead of $S(m+1, n, 1, x)$ for the sake of readability.
We continue with a variant
\[ \forall x,n{:}\mathbb{N}\;\; (x+1)^n = 1 + \sum_{i=1}^{n} \binom{n}{i} x^i \tag{6} \]
of the Binomial Theorem (1) which is enough for our purposes and is automatically proved with the use of lemmas (3)-(5).⁸
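Next we prove some facts about the binomial coefficients, viz.
\[ \forall m,n,k{:}\mathbb{N}\;\; n \ge k \to \binom{n}{k} \cdot (k)! \cdot (n-k)! \cdot m = (n)! \cdot m \tag{7} \]
\[ \forall n,k{:}\mathbb{N}\;\; n \ge k \to \binom{n}{k} = (n)! \,/\, ((k)! \cdot (n-k)!) \tag{8} \]
\[ \forall n,k{:}\mathbb{N}\;\; n \ge k \to (k)! \cdot (n-k)! \mid (n)! \tag{9} \]
which require 6 user interactions in total. The proof of lemma (7), which is used to prove (8) and (9) with first-order reasoning only, is by induction upon n. Using the latter two lemmas as well as the library lemmas
\[ \forall n,x{:}\mathbb{N}\;\; \mathbb{P}(n) \wedge n > x \to n \nmid (x)! \]
\[ \forall n,x{:}\mathbb{N}\;\; \mathbb{P}(n) \wedge n \nmid x \to \gcd(n,x) = 1 \]
\[ \forall x,y,z{:}\mathbb{N}\;\; z \neq 0 \wedge z \mid x \cdot y \wedge \gcd(x,z) = 1 \to z \mid y \]
the system proves
\[ \forall p,j{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge 0 < j < p \to p \mid \binom{p}{j} \tag{10} \]
by first-order reasoning with the help of 7 HPL-proof rule suggestions. Finally,
\[ \forall p,j,a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge j < p \to p \mid \sum_{i=1}^{j} \binom{p}{i} a^i \tag{11} \]
is proved by induction upon j, where the system has to be instructed to use lemma (10) as well as the library lemma
\[ \forall x,y,z{:}\mathbb{N}\;\; z \neq 0 \wedge z \mid x \to (y + x) \bmod z = y \bmod z . \tag{12} \]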
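The divisibility facts about binomial coefficients stated above lend themselves to a quick numerical check. The sketch below tests them over small ranges only; `is_prime` is a naive helper written for this illustration and is not taken from the VeriFun library.

```python
from math import comb, factorial

def is_prime(n):
    """Naive trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

for n in range(0, 12):
    for k in range(0, n + 1):
        denom = factorial(k) * factorial(n - k)
        assert factorial(n) % denom == 0                 # divisibility, cf. (9)
        assert comb(n, k) == factorial(n) // denom       # quotient form, cf. (8)

for p in filter(is_prime, range(2, 30)):
    for j in range(1, p):
        assert comb(p, j) % p == 0                       # cf. (10)
    for j in range(0, p):
        for a in range(0, 6):
            partial = sum(comb(p, i) * a**i for i in range(1, j + 1))
            assert partial % p == 0                      # cf. (11)
```

The bound j < p matters: for j = p the sum would include the term a**p, which is in general not divisible by p.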
We now have all prerequisites available to prove lemma (2): We call the system to use Peano Induction upon a, and the system responds by proving the base case. For proving the step case,⁹ we instruct the system to replace a at the left-hand side of the equation with $(a-1)+1$ and the system responds by using the Binomial Theorem (6) to replace $((a-1)+1)^p$, yielding
\[ \mathbb{P}(p) \wedge a \neq 0 \to \Big(1 + \sum_{i=1}^{p} \binom{p}{i} (a-1)^i\Big) \bmod p = a \bmod p . \tag{13} \]
Then we command the system to unfold procedure S and it computes
\[ \mathbb{P}(p) \wedge a \neq 0 \to \Big(1 + (a-1)^p + \sum_{i=1}^{p-1} \binom{p}{i} (a-1)^i\Big) \bmod p = a \bmod p \tag{14} \]
which in turn is replaced with
\[ \mathbb{P}(p) \wedge a \neq 0 \to \Big((a-1)^p + \sum_{i=1}^{p-1} \binom{p}{i} (a-1)^i\Big) \bmod p = (a-1) \bmod p \tag{15} \]
after a call for using library lemma
\[ \forall x,y,z,n{:}\mathbb{N}\;\; z \neq 0 \wedge (x \equiv y) \bmod z \to (x + n \equiv y + n) \bmod z \]
(with 1 substituted for n).¹⁰ Finally, we instruct the system to use lemma (12) for replacing (15) with
\[ \mathbb{P}(p) \wedge a \neq 0 \to (a-1)^p \bmod p = (a-1) \bmod p . \tag{16} \]
The system then responds by discharging the resulting proof obligation $p \mid \sum_{i=1}^{p-1} \binom{p}{i} (a-1)^i$ with lemma (11) and uses the induction hypothesis for replacing $(a-1)^p \bmod p$ in (16) with $(a-1) \bmod p$, so that lemma (2) is proved.

Finally we prove Fermat’s Little Theorem
\[ \forall p,a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid a \to a^{p-1} \bmod p = 1 \tag{17} \]
by instructing the system to use lemma (2) and to unfold procedure ^, yielding
\[ \mathbb{P}(p) \wedge p \nmid a \wedge a^{p-1} \cdot a \bmod p = a \bmod p \to a^{p-1} \bmod p = 1 . \tag{18} \]
Then we use the library lemma
\[ \forall n,x,y,z{:}\mathbb{N}\;\; \mathbb{P}(n) \wedge n \nmid z \wedge (x \cdot z \equiv y \cdot z) \bmod n \to (x \equiv y) \bmod n \tag{19} \]
(with $a^{p-1}$ substituted for x, 1 for y, a for z and p for n) for replacing $a^{p-1} \bmod p$ at the left-hand side of the equation in (18) with $1 \bmod p$, which then simplifies to 1, thus proving Fermat’s Little Theorem.

⁸ The use of (6) is not a “hack” for supporting the system, as lemma (1) has an automatic proof as well. But it is good practice in mathematics to formulate proofs as simply as possible, and this principle applies to formal reasoning in particular (saving a lemma in the present case).
⁹ Instead of constructor induction with a step case $\forall n{:}\mathbb{N}\;\, \varphi(n) \to \varphi(n+1)$, VeriFun uses destructor induction with a step case $\forall n{:}\mathbb{N}\;\, n \neq 0 \wedge \varphi(n-1) \to \varphi(n)$ as the latter is more general. For instance, there is no constructive counterpart for the induction schema obtained from procedure < of Section 1.
¹⁰ As usual, $(x \equiv y) \bmod z$ stands for $x \bmod z = y \bmod z$.
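The last step of this proof rests on the cancellation lemma (19). As a sanity check on small numbers, the lemma and Fermat’s statement (17) can be written as Python predicates; the function names are hypothetical and `is_prime` is a naive helper written for this sketch.

```python
def is_prime(n):
    """Naive trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))

def cancel_holds(n, x, y, z):
    """Instance of (19): n prime, n does not divide z, x*z = y*z (mod n) implies x = y (mod n)."""
    premise = is_prime(n) and z % n != 0 and (x * z) % n == (y * z) % n
    return (not premise) or x % n == y % n

def fermat_holds(p, a):
    """Instance of (17): p prime and p does not divide a implies a**(p-1) mod p = 1."""
    premise = is_prime(p) and a % p != 0
    return (not premise) or pow(a, p - 1, p) == 1
```

Both predicates return True vacuously whenever the premise fails, mirroring the implication in the lemmas.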
2.2 The Pigeon Hole Principle
The proof by the Binomial Theorem is rather technical in the sense that it provides more insight into the nature of binomial coefficients than into the nature of primes. We used 15 lemmas for the proof, where 10 are related to binomial coefficients or to basic arithmetic, but only 5 lemmas were related to some properties of primes. In this section we present another proof of Fermat’s Theorem which will illustrate more interesting properties of primes. And unlike the proof of Sec. 2.1, the mathematical concepts used in this proof can later be generalized to prove Euler’s Theorem, which is a generalization of Fermat’s Little Theorem.
The key fact used for the alternative proof is that a permutation of the number list $(p-1, \ldots, 1)$ is obtained if each element in that list is replaced by the residue mod p of its product with a, provided that prime p does not divide a. For $p = 5$ and $a = 8$, for instance, we obtain $(32, 24, 16, 8)$ by multiplying each element in $(4, 3, 2, 1)$ with 8, and replacement of these list elements by their residue mod 5 yields $(2, 4, 1, 3)$, i.e. a permutation of the initial list. Now we conclude $\prod_{i=1}^{p-1} i = \prod_{i=1}^{p-1} (a \cdot i \bmod p)$ from $(p-1, \ldots, 1) \approx (a \cdot (p-1) \bmod p, \ldots, a \cdot 1 \bmod p)$. Hence
\[ \Big(\prod_{i=1}^{p-1} i\Big) \bmod p = \Big(\prod_{i=1}^{p-1} (a \cdot i \bmod p)\Big) \bmod p = \Big(\prod_{i=1}^{p-1} a \cdot i\Big) \bmod p = \Big(a^{p-1} \cdot \prod_{i=1}^{p-1} i\Big) \bmod p , \]
because the inner mods in the product can be cancelled. Finally we divide both sides of the resulting equation by $\prod_{i=1}^{p-1} i$ so that $1 = a^{p-1} \bmod p$ is obtained.
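The permutation property and the resulting chain of congruences can be replayed on concrete numbers. The Python sketch below uses a hypothetical helper `residues`; it reproduces the example p = 5, a = 8 from the text and checks the product argument.

```python
from math import prod

def residues(p, a):
    """The list (a*(p-1) mod p, ..., a*1 mod p)."""
    return [(a * i) % p for i in range(p - 1, 0, -1)]

def is_permutation_of_interval(p, a):
    """True iff residues(p, a) is a permutation of (p-1, ..., 1)."""
    return sorted(residues(p, a)) == list(range(1, p))

p, a = 5, 8
r = residues(p, a)                      # [2, 4, 1, 3]
base = prod(range(1, p))                # product of (p-1, ..., 1)
same_product = prod(r) == base          # equal products, since r is a permutation
congruent = (a**(p - 1) * base) % p == base % p
```

For a composite modulus the property can fail, e.g. `residues(6, 2)` is `[4, 2, 0, 4, 2]`.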
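For proving the permutation property we use the Pigeon Hole Principle (21), which is formulated in [3] for verifying Fermat’s statement, by proving
\[ \forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; k \neq ø \wedge \mathit{purged}(k) \wedge n \trianglerighteq k \wedge 0 \notin k \wedge |k| = n \to \max(k) = n \tag{20} \]
\[ \forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; \mathit{purged}(k) \wedge n \trianglerighteq k \wedge 0 \notin k \wedge |k| = n \to (n \ldots 1) \approx k . \tag{21} \]
Here $\mathit{purged}(k)$ is true iff list k does not contain multiple occurrences of identical elements, $n \trianglerighteq k$ holds iff n is an upper bound of the number list k, and $(n \ldots m)$ computes the number list $(n, n-1, \ldots, m)$ if $n \ge m$ and the empty list ø otherwise. The system computes a proof of (20) after being called to use the induction axiom given by procedure < from Section 1. Thus the hypothesis $k \neq ø$ and the induction hypothesis
\[ \forall n'{:}\mathbb{N}\;\; k \setminus \max(k) \neq ø \wedge \mathit{purged}(k \setminus \max(k)) \wedge n' \trianglerighteq k \setminus \max(k) \wedge 0 \notin k \setminus \max(k) \wedge |k \setminus \max(k)| = n' \to \max(k \setminus \max(k)) = n' \]
are obtained, which is useful as n is replaced by $n-1$ in the induction conclusion and then the induction hypothesis can be applied because list k (being purged) has exactly one maximal element and therefore $n \trianglerighteq k$ entails $n-1 \trianglerighteq k \setminus \max(k)$. The proof of (21) is by induction upon n, yielding the hypothesis $n \neq 0$ and the induction hypothesis
\[ \forall k'{:}\mathit{list}[\mathbb{N}]\;\; \mathit{purged}(k') \wedge n-1 \trianglerighteq k' \wedge 0 \notin k' \wedge |k'| = n-1 \to (n-1 \ldots 1) \approx k' . \]
The system now instantiates $k'$ in the induction hypothesis with $k \setminus n$ and completes the proof after being instructed to replace $k \setminus n$ with $k \setminus \max(k)$ by using lemma (20).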
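The Pigeon Hole Principle in the form of lemmas (20) and (21) is easy to test exhaustively on short lists. The following Python sketch encodes the premises with hypothetical helper names (`purged`, `bounded`, `down_to_one`) and uses sorting as a stand-in for the permutation relation.

```python
def purged(k):
    """True iff k contains no element twice."""
    return len(set(k)) == len(k)

def bounded(n, k):
    """True iff n is an upper bound of the number list k."""
    return all(i <= n for i in k)

def down_to_one(n):
    """The list (n, n-1, ..., 1), empty for n = 0."""
    return list(range(n, 0, -1))

def premise(k, n):
    return purged(k) and bounded(n, k) and 0 not in k and len(k) == n

def lemma20_holds(k, n):
    return (not (k and premise(k, n))) or max(k) == n

def lemma21_holds(k, n):
    return (not premise(k, n)) or sorted(k) == sorted(down_to_one(n))
```

For example, `lemma21_holds([2, 4, 1, 3], 4)` holds with a true premise, while `[2, 2, 1, 3]` fails the `purged` premise and `[5, 4, 1, 3]` fails the bound.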
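Next it has to be proved that the prerequisites of lemma (21) are satisfied for a number list $(a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p)$, where list $m \cdot k$ is obtained by multiplication of each $i \in k$ with m and list $(k \;\mathrm{MOD}\; n)$ emerges from k by replacing each $i \in k$ with $i \bmod n$. Using these definitions, the system proves the lemmas
\[ \forall p,m,n,a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p > m > n \wedge p \nmid a \to (m \cdot a \bmod p) \notin (a \cdot (n \ldots 1) \;\mathrm{MOD}\; p) \tag{22} \]
\[ \forall p,n,a{:}\mathbb{N}\;\; p \neq 0 \to p-1 \trianglerighteq (a \cdot (n \ldots 1) \;\mathrm{MOD}\; p) \tag{23} \]
\[ \forall p,n,a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p > n \wedge p \nmid a \to 0 \notin (a \cdot (n \ldots 1) \;\mathrm{MOD}\; p) \tag{24} \]
\[ \forall p,n,a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p > n \wedge p \nmid a \to \mathit{purged}((a \cdot (n \ldots 1) \;\mathrm{MOD}\; p)) \tag{25} \]
by induction upon n using some basic number theoretic lemmas imported from the library and requiring 8 user interactions in total. With these results we prove
\[ \forall p,a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid a \to (p-1 \ldots 1) \approx (a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p) \tag{26} \]
by instructing the system to use the Pigeon Hole Principle of lemma (21) with $p-1$ substituted for n and $(a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p)$ substituted for k. The system then completes the proof by discharging the resulting proof obligations using the auxiliary lemmas (22)-(25) as well as the library lemmas
\[ \forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; |(k \;\mathrm{MOD}\; n)| = |k| \tag{27} \]
\[ \forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; |(n \cdot k)| = |k| \tag{28} \]
\[ \forall n{:}\mathbb{N}\;\; |(n \ldots 1)| = n . \tag{29} \]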
Next we prove the auxiliary lemma
\[ \forall p,a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid a \to (a^{p-1} \cdot \Pi((p-1 \ldots 1)) \equiv \Pi((p-1 \ldots 1))) \bmod p \tag{30} \]
(where $\Pi(k)$ computes the product of the numbers in list k) by calling the system to replace $\Pi((p-1 \ldots 1))$ at the right-hand side of the equation with $\Pi((a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p))$, which is justified by lemma (26) and the library lemma
\[ \forall k,l{:}\mathit{list}[\mathbb{N}]\;\; k \approx l \to \Pi(k) = \Pi(l) . \tag{31} \]
The system then responds by using the library lemma
\[ \forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; n \neq 0 \to \Pi((k \;\mathrm{MOD}\; n)) \bmod n = \Pi(k) \bmod n \tag{32} \]
for replacing $\Pi((a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p)) \bmod p$ with $\Pi(a \cdot (p-1 \ldots 1)) \bmod p$, which in turn is replaced by $(a^{p-1} \cdot \Pi((p-1 \ldots 1))) \bmod p$ by using the library lemmas (29) and
\[ \forall k{:}\mathit{list}[\mathbb{N}], n{:}\mathbb{N}\;\; n \neq 0 \to \Pi(n \cdot k) = n^{|k|} \cdot \Pi(k) . \tag{33} \]
This completes the proof of (30) (with the help of 2 user interactions) as now both sides of the equation are identical. Fermat’s Little Theorem (17) now is easily proved by calling the system to use the instance
\[ \mathbb{P}(p) \wedge p \nmid \Pi((p-1 \ldots 1)) \wedge (a^{p-1} \cdot \Pi((p-1 \ldots 1)) \equiv 1 \cdot \Pi((p-1 \ldots 1))) \bmod p \to (a^{p-1} \equiv 1) \bmod p \]
of library lemma (19). The system then uses lemma (30) for simplifying the proof goal by symbolic evaluation to
\[ \mathbb{P}(p) \wedge p \nmid a \to (p \nmid \Pi((p-1 \ldots 1)) \vee a^{p-1} \bmod p = 1) . \]
Finally, it uses the library lemma
\[ \forall n,x{:}\mathbb{N}\;\; \mathbb{P}(n) \wedge n > x \to n \nmid \Pi((x \ldots 1)) \tag{34} \]
to establish $p \nmid \Pi((p-1 \ldots 1))$, and Fermat’s Theorem is proved with one user interaction.
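The list operations used in this section and the library lemmas about them can be imitated in Python for experimentation. The names `scale`, `MOD` and `down_to_one` are assumptions of this sketch standing for scalar multiplication of a list, elementwise residues and the list (n … 1); `math.prod` plays the role of the product procedure.

```python
from math import prod

def scale(m, k):
    """Multiply each element of k with m."""
    return [m * i for i in k]

def MOD(k, n):
    """Replace each element of k by its residue mod n (n must be non-zero)."""
    return [i % n for i in k]

def down_to_one(n):
    """The list (n, n-1, ..., 1)."""
    return list(range(n, 0, -1))

def lemma30_holds(p, a):
    """For a prime p not dividing a: a**(p-1) * prod = prod (mod p), cf. (30)."""
    base = prod(down_to_one(p - 1))
    return (a**(p - 1) * base) % p == base % p

k, n = [9, 4, 7, 7, 12], 5
same_length = len(MOD(k, n)) == len(k) == len(scale(n, k))      # cf. (27), (28)
mods_cancel = prod(MOD(k, n)) % n == prod(k) % n                # cf. (32)
factor_out = prod(scale(n, k)) == n**len(k) * prod(k)           # cf. (33)
```

The sample list and modulus are arbitrary illustrative values.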
2.3 An Easier Proof of Fermat’s Statement
Unfortunately, the Pigeon Hole Principle cannot be used for number lists which do not consist of all numbers in an interval $[n, \ldots, m]$. But the permutation property needs to be proved for those lists as well, in particular when proving Euler’s generalization of Fermat’s statement. Therefore we had to develop another approach for proving permutation when considering Euler’s Theorem, and this approach also yields an easier proof of Fermat’s Theorem: Instead of the Pigeon Hole Principle (21) we use the library lemma
\[ \forall k,l{:}\mathit{list}[@I]\;\; k \sqsubseteq l \wedge |k| = |l| \to k \approx l \tag{35} \]
for proving permutation. $\sqsubseteq$ denotes the subbag relation which does not consider the order of list elements, but is sensitive to the number of element occurrences when comparing lists. For instance, $(3, 2) \sqsubseteq (2, 2, 3)$ but not vice versa.
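A subbag test and lemma (35) can be modelled with multisets. This Python sketch uses `collections.Counter`; the function names `subbag` and `is_perm` are chosen for this illustration only.

```python
from collections import Counter

def subbag(k, l):
    """True iff every element occurs in l at least as often as in k."""
    need, have = Counter(k), Counter(l)
    return all(have[x] >= count for x, count in need.items())

def is_perm(k, l):
    """True iff k and l contain the same elements with the same multiplicities."""
    return Counter(k) == Counter(l)

def lemma35_holds(k, l):
    """Subbag plus equal length implies permutation, cf. (35)."""
    return (not (subbag(k, l) and len(k) == len(l))) or is_perm(k, l)
```

The advantage over the Pigeon Hole Principle is that nothing is assumed about the shape of the lists, so the same argument also applies to lists that are not intervals.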
Instead of the lemmas (23)-(25) we now prove
\[ \forall p,n,a{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p > n \wedge p \nmid a \to (a \cdot (n \ldots 1) \;\mathrm{MOD}\; p) \sqsubseteq (p-1 \ldots 1) \tag{36} \]
which requires auxiliary lemma (22). Both proofs are by induction upon n and need 7 hints to the system for completing the proofs as well as the presence of library lemma (19). Now instead of the Pigeon Hole Principle, we instruct the system to use lemma (35) for proving the permutation property (26). The system then completes the proof by discharging the resulting proof obligations $|(p-1 \ldots 1)| = |(a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p)|$ and $(a \cdot (p-1 \ldots 1) \;\mathrm{MOD}\; p) \sqsubseteq (p-1 \ldots 1)$ with (27), (28) and (36). We now continue as before and obtain an easier proof.

¹¹ As VeriFun does not provide a data structure set, we had to use lists in our formal proof. If sets were present, $\mathit{purged}(\ldots)$ had to be replaced by true and $\approx$ by $=$ in the lemmas, so that the lemmas (25) and (31) would become needless. This means that the use of lists does not complicate the proof, as only 1 user interaction for proving (25) and 2 user interactions for proving (31) could be saved. In the subsequent proofs of sections 2.3 and 3, only lemma (31) would become needless, reducing the savings even more.

                  Proc.  Lem.  Rules  User  Sys.     %   Steps  mm:ss
Fermat (Bin Th)      12    81    397    60   337  84.9    6324  01:28
Fermat (PHP)         21   102    454    57   397  87.4    7818  00:42
Fermat               18    87    394    48   346  87.8    6196  00:32
Fermat (Euler)       20   102    473    62   411  86.9    7495  00:35
Euler                17    93    433    49   384  88.7    6601  00:30
Wilson               27   157    751   128   623  83.0   14090  04:19
Wilson (PHP)         27   155    753   134   619  82.2   14629  05:09

Fig. 2. Proof Statistics
Fig. 2 illustrates the total effort required for proving Fermat’s Little Theorem (including all procedures and lemmas which had to be imported from the library). Column Proc. gives the number of user defined procedures, Lem. is the number of user defined lemmas, and Rules counts the total number of HPL-proof rule applications, separated into user invoked (User) and system initiated (Sys.) ones. Column % gives the automation degree, i.e. the ratio between Sys. and Rules, Steps lists the number of first-order proof steps performed by the Symbolic Evaluator and mm:ss displays the runtime of the Symbolic Evaluator needed for a case study.¹²

Row Fermat (Bin Th) gives the values when using the Binomial Theorem, row Fermat (PHP) displays the values when using the Pigeon Hole Principle and the row below shows the statistics for the easier solution. As the numbers reveal, the easier proof requires the least effort for the user as well as for the system and the proof by the Binomial Theorem is most costly. It is striking that the runtime for the easier proof is only about a third and the runtime for the proof using the Pigeon Hole Principle is less than half of the runtime spent when using the Binomial Theorem, although the number of proof steps is almost identical or even higher. We suspect that the latter proof is most costly because more arithmetic is involved, which makes it more expensive. In contrast, the proofs using lists avoid some arithmetic reasoning in favour of reasoning about lists, which is less complicated, thus easing the proofs. And finally, the easier proof requires the least effort as it avoids certain lemmas (and proof steps in turn) necessitated by the Pigeon Hole Principle.

¹² Time refers to running VeriFun 3.5 under Windows 7 Enterprise with an Intel Core i7-2640M 2.80 GHz CPU using Java 1.8.0_45.
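The derived figures quoted in this discussion can be recomputed from the raw columns of Fig. 2. The Python sketch below copies the Rules, User, Sys. and runtime values (converted to seconds) from the table; the dictionary layout and function names are assumptions of this illustration.

```python
# (rules, user, system, runtime in seconds), copied from Fig. 2
STATS = {
    "Fermat (Bin Th)": (397, 60, 337, 88),
    "Fermat (PHP)":    (454, 57, 397, 42),
    "Fermat":          (394, 48, 346, 32),
    "Fermat (Euler)":  (473, 62, 411, 35),
    "Euler":           (433, 49, 384, 30),
    "Wilson":          (751, 128, 623, 259),
    "Wilson (PHP)":    (753, 134, 619, 309),
}

def automation_degree(row):
    """Percentage of system initiated rule applications, rounded to one decimal."""
    rules, user, system, _ = STATS[row]
    assert user + system == rules          # the two counts partition Rules
    return round(100 * system / rules, 1)

def runtime_ratio(row, baseline="Fermat (Bin Th)"):
    """Runtime of a row relative to the baseline row."""
    return STATS[row][3] / STATS[baseline][3]
```

With these numbers, `runtime_ratio("Fermat")` is 32/88 (roughly 0.36) and `runtime_ratio("Fermat (PHP)")` is 42/88 (roughly 0.48), consistent with the "about a third" and "less than half" statements.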
3 Euler’s Theorem
Euler’s Theorem generalizes Fermat’s Little Theorem by considering also non-prime moduli. It states that $a^{\varphi(n)} \bmod n = 1$ for numbers a relatively prime to some $n \ge 2$, where $\varphi$ denotes Euler’s phi-function (also called totient function in the literature). The mathematical concept used in the proof of Euler’s Theorem is called a reduced residue system modulo n, where $n \neq 0$ is demanded. This is the set of all numbers from $n-1$ to 1 being relatively prime to n. We represent this set by a list $\mathit{rrs}(n)$ and define the phi-function by the length of this list, i.e. $\varphi(n) := |\mathit{rrs}(n)|$. Since $\mathit{rrs}(p) = (p-1, \ldots, 1)$ and therefore $\varphi(p) = p-1$ for prime numbers p, Euler’s Theorem is a true generalization of Fermat’s Little Theorem.

Similar to the proof of Fermat’s Little Theorem, the key fact used in the proof of Euler’s Theorem is that a permutation of $\mathit{rrs}(n)$ is obtained if each element in that list is replaced by the residue mod n of its product with a, provided a is relatively prime to n. E.g., for $n = 10$ and $a = 7$ we obtain $(63, 49, 21, 7)$ by multiplying each element in $\mathit{rrs}(10) = (9, 7, 3, 1)$ with 7, and replacement of these list elements by their residue mod 10 yields $(3, 9, 1, 7)$, i.e. a permutation of $\mathit{rrs}(10)$. Now we conclude $\Pi(\mathit{rrs}(n)) = \Pi((a \cdot \mathit{rrs}(n) \;\mathrm{MOD}\; n))$ from $\mathit{rrs}(n) \approx (a \cdot \mathit{rrs}(n) \;\mathrm{MOD}\; n)$, and therefore
\[ \Pi(\mathit{rrs}(n)) \bmod n = \Pi((a \cdot \mathit{rrs}(n) \;\mathrm{MOD}\; n)) \bmod n = \Pi(a \cdot \mathit{rrs}(n)) \bmod n = (a^{|\mathit{rrs}(n)|} \cdot \Pi(\mathit{rrs}(n))) \bmod n , \]
because MOD n in the product can be eliminated by (32). Finally we divide both sides of the resulting equation by $\Pi(\mathit{rrs}(n))$ so that $1 = a^{\varphi(n)} \bmod n$ eventually is obtained.
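The example with n = 10 and a = 7 can be replayed directly. The following Python sketch defines the reduced residue system by a list comprehension rather than by the recursive procedure used in the formal development; all function names are assumptions of this illustration.

```python
from math import gcd, prod

def rrs(n):
    """Reduced residue system modulo n, listed from n-1 down to 1."""
    return [y for y in range(n - 1, 0, -1) if gcd(n, y) == 1]

def phi(n):
    """Euler's phi-function, defined as the length of rrs(n)."""
    return len(rrs(n))

def residues(n, a):
    """Each element of rrs(n) multiplied by a and reduced mod n."""
    return [(a * i) % n for i in rrs(n)]

def euler_holds(n, a):
    """For n >= 2 and gcd(n, a) = 1: a**phi(n) mod n = 1."""
    return (not (n >= 2 and gcd(n, a) == 1)) or pow(a, phi(n), n) == 1

n, a = 10, 7
permuted = sorted(residues(n, a)) == sorted(rrs(n))
congruent = (a**phi(n) * prod(rrs(n))) % n == prod(rrs(n)) % n
```

For a prime p, `rrs(p)` is the full list (p-1, ..., 1), so `phi(p)` is p - 1 and Fermat’s Little Theorem appears as the special case.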
For proving Euler’s Theorem, we define a procedure $\mathit{rrs}(n, y)$ by recursion upon y, viz.

  function rrs(n:ℕ, y:ℕ) : list[ℕ] <=
    if n > y
      then if y = 0
             then ø
             else if gcd(n, y) = 1
                    then y :: rrs(n, ⁻(y))
                    else rrs(n, ⁻(y))
      else ø

which computes a partial reduced residue system modulo n, i.e. a list of minimal length containing all numbers from $\{y, \ldots, 1\}$ being relatively prime to n.
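The recursion of the two-argument procedure translates almost literally into Python. This is a sketch for experimentation only; Python lists stand in for the list data structure and `y - 1` for the predecessor of y.

```python
from math import gcd

def rrs(n, y):
    """Partial reduced residue system: numbers in {y, ..., 1} relatively prime to n."""
    if n > y:
        if y == 0:
            return []
        if gcd(n, y) == 1:
            return [y] + rrs(n, y - 1)
        return rrs(n, y - 1)
    return []                      # the result is empty whenever n <= y

full = rrs(10, 9)                  # the complete system rrs(10) = rrs(10, 9)
partial = rrs(10, 5)               # only the candidates 5, 4, 3, 2, 1
```

Having y as a separate argument is what allows statements about the list to be proved by induction upon y and then instantiated with n - 1.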
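Now we formulate the required lemmas in terms of $\mathrm{rrs}(n, y)$ which allows
us to prove them by induction upon $y$. By instantiating $y$ with $n-1$, the wanted
propositions about $\mathrm{rrs}(n)$ are obtained afterwards as $\mathrm{rrs}(n) = \mathrm{rrs}(n, n-1)$. In
particular, the main statement
    $\forall n, y, a{:}\mathbb{N}\;\; \gcd(n, a) = 1 \rightarrow (a \cdot \mathrm{rrs}(n, y) \;\mathrm{MOD}\; n) \sqsubseteq \mathrm{rrs}(n, n-1)$    (37)
about rrs is required in the sequel which necessitates the auxiliary lemmas
    $\forall n, y{:}\mathbb{N}\;\; n > y \geq 1 \rightarrow \mathrm{rrs}(n, y) \neq ø$
    $\forall n, x, y{:}\mathbb{N}\;\; x > y \rightarrow x \notin \mathrm{rrs}(n, y)$
    $\forall n, x, y{:}\mathbb{N}\;\; n > y \geq x \geq 1 \wedge \gcd(n, x) = 1 \rightarrow x \in \mathrm{rrs}(n, y)$
    $\forall n, x, y, a{:}\mathbb{N}\;\; n > x > y \wedge \gcd(n, a) = 1 \rightarrow (a \cdot x \bmod n) \notin (a \cdot \mathrm{rrs}(n, y) \;\mathrm{MOD}\; n)$.
All proofs are by induction upon $y$ and require 8 user interactions in total. The
library lemma
    $\forall x, y, z{:}\mathbb{N}\;\; \gcd(x, y) = 1 \rightarrow \gcd(x, y \cdot z) = \gcd(x, z)$    (38)
is required in particular for proving (37) and is needed as well for the proof of
(42) subsequently. For proving the permutation property
    $\forall n, a{:}\mathbb{N}\;\; n \neq 0 \wedge \gcd(n, a) = 1 \rightarrow (a \cdot \mathrm{rrs}(n, n-1) \;\mathrm{MOD}\; n) \approx \mathrm{rrs}(n, n-1)$    (39)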
we call the system to use library lemma (35). It then completes the proof by
discharging the resulting subgoals $(a \cdot \mathrm{rrs}(n, n-1) \;\mathrm{MOD}\; n) \sqsubseteq \mathrm{rrs}(n, n-1)$
and $|\mathrm{rrs}(n, n-1)| = |(a \cdot \mathrm{rrs}(n, n-1) \;\mathrm{MOD}\; n)|$ with the library lemmas
(27) and (28) and the auxiliary lemma (37) just proved. Next we prove the
auxiliary lemma
    $\forall n, a{:}\mathbb{N}\;\; n \neq 0 \wedge \gcd(n, a) = 1$
    $\rightarrow (\prod(\mathrm{rrs}(n, n-1)) \cdot a^{|\mathrm{rrs}(n, n-1)|} \equiv \prod(\mathrm{rrs}(n, n-1))) \bmod n$    (40)
by calling the system to replace $\prod(\mathrm{rrs}(n, n-1))$ at the right-hand side of the
equation with $\prod((a \cdot \mathrm{rrs}(n, n-1) \;\mathrm{MOD}\; n))$ which is justified by the lemmas
(39) and (31). The system then responds by using the library lemmas (32) and
(33) for replacing $\prod((a \cdot \mathrm{rrs}(n, n-1) \;\mathrm{MOD}\; n)) \bmod n$ at the right-hand side
with $a^{|\mathrm{rrs}(n, n-1)|} \cdot \prod(\mathrm{rrs}(n, n-1)) \bmod n$. This completes the proof of
(40) (with the help of 2 user interactions) as now both sides of the equation are
identical. With the library lemma
    $\forall n, m{:}\mathbb{N}\;\; \gcd(m, n) = 1 \wedge (m \cdot x \equiv m \cdot y) \bmod n \rightarrow (x \equiv y) \bmod n$    (41)
and the auxiliary lemma (with an automated induction upon $y$ but needing one
interactive case analysis)
    $\forall n, y{:}\mathbb{N}\;\; \gcd(\prod(\mathrm{rrs}(n, y)), n) = 1$    (42)
we now have all required statements available for proving Euler’s Theorem
    $\forall n, a{:}\mathbb{N}\;\; n \geq 2 \wedge \gcd(n, a) = 1 \rightarrow a^{\varphi(n)} \bmod n = 1$.
    Classes    12  11  10   9   8   7   6   5   4   3   2   1
    $I_{13}$   12   6   4   3   5   2  11   8  10   9   7   1
    sort       11   6  10   4   9   3   8   5   7   2
Fig. 3. Inverses of the residue classes $\neq 0$ modulo prime 13
We call the system to use the instance
    $\gcd(\prod(\mathrm{rrs}(n, n-1)), n) = 1$
    $\wedge\; (\prod(\mathrm{rrs}(n, n-1)) \cdot a^{|\mathrm{rrs}(n, n-1)|} \equiv \prod(\mathrm{rrs}(n, n-1)) \cdot 1) \bmod n$
    $\rightarrow (a^{|\mathrm{rrs}(n, n-1)|} \equiv 1) \bmod n$
of library lemma (41), and the system then responds by using lemma (40) for
simplifying the proof goal to
    $n \geq 2 \wedge \gcd(n, a) = 1 \rightarrow (\gcd(\prod(\mathrm{rrs}(n, n-1)), n) = 1 \vee a^{\varphi(n)} \bmod n = 1)$.
Finally, it applies auxiliary lemma (42) to establish $\gcd(\prod(\mathrm{rrs}(n, n-1)), n) = 1$,
and Euler’s Theorem is proved with one user interaction.
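The role of the side condition $\gcd(m, n) = 1$ in the cancellation lemma (41), and hence the need for lemma (42), can be seen from a small brute-force experiment. The Python sketch below is illustrative only; the function name `cancellation_holds` is introduced here and does not occur in the formal development.

```python
from math import gcd

def cancellation_holds(m, n):
    """Check for all x, y < n: (m*x ≡ m*y) mod n implies (x ≡ y) mod n."""
    return all((x - y) % n == 0
               for x in range(n) for y in range(n)
               if (m * x - m * y) % n == 0)

# Cancelling m = 3 modulo 10 is sound because gcd(3, 10) = 1 ...
assert cancellation_holds(3, 10)
# ... but m = 2 cannot be cancelled: 2*1 ≡ 2*6 (mod 10), yet 1 and 6 differ mod 10.
assert not cancellation_holds(2, 10)
# For small moduli, cancellation is sound exactly when m is coprime to n.
assert all(cancellation_holds(m, n) == (gcd(m, n) == 1)
           for n in range(2, 13) for m in range(1, n + 1))
```

This is why the product of the reduced residue system has to be shown coprime to $n$ before both sides of (40) can be divided by it.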
Fig. 2 shows the effort for proving Euler’s Theorem which is similar to the
effort needed for the (easier) proof of Fermat’s Little Theorem. This seems to be
an odd result as Euler’s Theorem implies Fermat’s statement, hence one might
expect more work. But we have to reason about primes when proving Fermat’s
Theorem (which is not required for Euler’s Theorem), causing additional work.
And in fact, when proving Fermat’s Theorem as a corollary of Euler’s Theorem,
certain lemmas about primes are needed in addition, requiring greater effort than
the direct proof of Fermat’s Little Theorem, cf. row Fermat (Euler) in Fig. 2.
4 Wilson’s Theorem
Wilson’s Theorem states that $(p-1)! \bmod p = p-1$ for each prime $p$.¹³ The
key observation for proving this theorem is that the residue classes modulo a
prime $p$ (minus the class for $0$) form a group.¹⁴ For our purposes it is enough
to consider the numbers in $\{p-1, \ldots, 1\}$ as canonical representatives of the
residue classes. The group operation is given by multiplication mod $p$, $1$ is the
neutral element and the inverse $I_p(x)$ of some $x \in \{p-1, \ldots, 1\}$ is defined by
$x^{p-2} \bmod p$. Figure 3 gives an example for $p = 13$, and we find, for instance,
$8 \cdot I_{13}(8) \bmod 13 = 8 \cdot 5 \bmod 13 = 40 \bmod 13 = 1$.
¹³ $p$ being prime is not only sufficient but also necessary for the statement to hold as
$(n-1)! \bmod n \neq n-1$ for all non-primes $n \neq 1$. See [1] for a computer assisted
proof.
¹⁴ As it happens, the residue classes modulo a prime form a field, however this more
general fact is not needed here.
Two facts about $I_p$ are essential for proving Wilson’s Theorem: If $p \nmid x$ then
• $I_p(I_p(x)) = x \bmod p$, therefore $I_p$ is injective in $\{p-1, \ldots, 1\}$ and
  consequently list $(I_p(p-1), \ldots, I_p(1))$ is a permutation of list $(p-1, \ldots, 1)$;
• $1$ and $p-1$ represent the only residue classes inverse to themselves, i.e.
  $I_p(x) = x$ entails $x \bmod p = 1$ or $x \bmod p = p-1$ and $I_p(1) = 1$ as well as
  $I_p(p-1) = p-1$. Consequently list $(I_p(p-2), \ldots, I_p(2))$ is a permutation
  of list $(p-2, \ldots, 2)$.
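Both facts are easy to observe for $p = 13$. The following Python sketch recomputes the row of inverses shown in Fig. 3; the function name `inverse` is chosen here for illustration, and Python’s built-in three-argument `pow` is used for modular exponentiation.

```python
def inverse(x, p):
    """Inverse of x modulo a prime p, computed as x^(p-2) mod p."""
    return pow(x, p - 2, p)

p = 13
classes = list(range(p - 1, 0, -1))            # 12, 11, ..., 1
inverses = [inverse(x, p) for x in classes]    # second row of Fig. 3

assert inverses == [12, 6, 4, 3, 5, 2, 11, 8, 10, 9, 7, 1]
# x times its inverse is 1 modulo p
assert all(x * inverse(x, p) % p == 1 for x in classes)
# taking the inverse twice gives x back, so the inverses permute the classes
assert all(inverse(inverse(x, p), p) == x for x in classes)
assert sorted(inverses) == sorted(classes)
# only 1 and p-1 are inverse to themselves
assert [x for x in classes if inverse(x, p) == x] == [p - 1, 1]
```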
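The following statements about $I_p$ (written as $I(x, p)$ in the lemmas) are
required for a formal proof of Wilson’s Theorem:
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid x \rightarrow I(I(x, p), p) = x \bmod p$    (44)
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid x \rightarrow x \cdot I(x, p) \bmod p = 1$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid x \rightarrow I(x, p) \neq 0$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid x \wedge I(x, p) = x \rightarrow (x \bmod p = 1 \vee x \bmod p = p-1)$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \rightarrow I(x, p) \bmod p = I(x, p)$
    $\forall p, x, y{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge x \cdot y \bmod p = 1 \rightarrow x \bmod p = I(y, p)$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge x \bmod p = 1 \rightarrow I(x, p) = 1$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge x \bmod p = p-1 \rightarrow I(x, p) = p-1$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid x \wedge I(x, p) = 1 \rightarrow x \bmod p = 1$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \nmid x \wedge I(x, p) = p-1 \rightarrow x \bmod p = p-1$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p-2 \geq x \geq 2 \rightarrow p-2 \geq I(x, p) \geq 2$.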
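All lemmas are proved by first-order reasoning requiring 26 user interactions
in total, where Fermat’s Little Theorem (17), lemma (19) and the library lemmas
    $\forall p, x, y{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p \mid x^y \rightarrow p \mid x$
    $\forall x, y, z{:}\mathbb{N}\;\; z \neq 0 \rightarrow x \cdot (y \bmod z) \bmod z = x \cdot y \bmod z$    (45)
    $\forall x, y, z{:}\mathbb{N}\;\; y \neq 0 \wedge z \neq 0 \rightarrow (x \bmod z)^y \bmod z = x^y \bmod z$
    $\forall p, x{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge x^2 \bmod p = 1 \rightarrow x \bmod p = 1 \vee x \bmod p = p-1$
    $\forall n, x{:}\mathbb{N}\;\; n \geq 2 \wedge (x \bmod n = 1 \vee x \bmod n = n-1) \rightarrow x^2 \bmod n = 1$
have been used.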
Wilson’s Theorem obviously is true for $p = 2$ and $p = 3$, so let us assume
$p \geq 5$. Then list $k := (p-2, \ldots, 2)$ has (even) length $p-3 \geq 2$ and by the facts
about $I_p$ given above we may define a permutation $k_{sort}$ of list $k$ by
$(a_1, I_p(a_1), \ldots, a_n, I_p(a_n))$, where $n := (p-3)/2$. The row labelled with sort in Fig. 3
gives an example of such a list. Now we calculate $\prod_{i=1}^{n} (a_i \cdot I_p(a_i) \bmod p) = 1$,
because $a_i \cdot I_p(a_i) \bmod p = 1$, hence
$\prod(k_{sort}) \bmod p = (\prod_{i=1}^{n} a_i \cdot I_p(a_i)) \bmod p = (\prod_{i=1}^{n} (a_i \cdot I_p(a_i) \bmod p)) \bmod p = 1$
as the inner mods in the product can be cancelled by (45). Therefore
$(p-2)! \bmod p = \prod(k) \bmod p = \prod(k_{sort}) \bmod p = 1$,
because $k \approx k_{sort}$ entails $\prod(k) = \prod(k_{sort})$. Finally, we calculate
$(p-1)! \bmod p = (p-1) \cdot (p-2)! \bmod p = (p-1) \cdot ((p-2)! \bmod p) \bmod p = (p-1) \cdot 1 \bmod p = p-1$
by using (45) and $(p-2)! \bmod p = 1$, and Wilson’s Theorem is proved.
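The statement of the theorem, the intermediate result $(p-2)! \bmod p = 1$ and the converse direction for non-primes can all be confirmed numerically for small numbers. The Python sketch below is an illustration only; `is_prime` and `wilson_holds` are helper names introduced here, and primality is decided by simple trial division.

```python
from math import factorial, isqrt

def is_prime(n):
    """Trial division; adequate for the small numbers checked here."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

def wilson_holds(n):
    """True iff (n-1)! mod n = n-1."""
    return factorial(n - 1) % n == n - 1

for n in range(2, 200):
    # Wilson's criterion holds exactly for the primes (for all n different from 1).
    assert wilson_holds(n) == is_prime(n)
    if is_prime(n):
        # The key intermediate fact of the proof: (p-2)! mod p = 1.
        assert factorial(n - 2) % n == 1
```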
For defining the above number list $k_{sort}$, we use a procedure sort given by
    function sort(k:list[N], p:N) : list[N] <=
    if k = ø
      then ø
      else hd(k) :: I(hd(k), p) :: sort(tl(k) \ I(hd(k), p), p)
and aim to prove
    $\forall p{:}\mathbb{N}\;\; \mathbb{P}(p) \rightarrow (p-2 \ldots 2) \approx \mathrm{sort}((p-2 \ldots 2), p)$    (46)
by using library lemma (35). To this effect we formulate the lemmas
    $\forall k{:}\mathrm{list}[\mathbb{N}], n{:}\mathbb{N}\;\; k \sqsubseteq \mathrm{sort}(k, n)$    (47)
    $\forall p{:}\mathbb{N}\;\; \mathbb{P}(p) \rightarrow |(p-2 \ldots 2)| = |\mathrm{sort}((p-2 \ldots 2), p)|$    (48)
to verify the subgoals resulting from the use of (35). An induction proof of
(47) is easily computed, but a proof of (48) is more challenging. This is because
the permutation property is implicitly present in the length requirement:
If $|(p-2 \ldots 2)| < |\mathrm{sort}((p-2 \ldots 2), p)|$, then $I(x, p) \notin (p-2 \ldots 2)$ for some
$x \in (p-2 \ldots 2)$ and consequently (46) must be false.
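The behaviour of procedure sort, and the reason why the length requirement is the delicate part, can be explored with the following Python sketch. It is a transcription under assumptions: `inverse` plays the role of $I(x, p)$, and `remove_first` models the list difference in the recursive call as removal of the first occurrence of an element, which is only one possible reading of that operator.

```python
def inverse(x, p):
    """Stand-in for I(x, p): x^(p-2) mod p."""
    return pow(x, p - 2, p)

def remove_first(x, k):
    """List k without the first occurrence of x (unchanged if x is absent)."""
    k = list(k)
    if x in k:
        k.remove(x)
    return k

def sort(k, p):
    """Place each list element next to its inverse modulo p."""
    if not k:
        return []
    head, inv = k[0], inverse(k[0], p)
    return [head, inv] + sort(remove_first(inv, k[1:]), p)

k = list(range(11, 1, -1))                    # the list (11 ... 2) for p = 13
assert sort(k, 13) == [11, 6, 10, 4, 9, 3, 8, 5, 7, 2]   # row "sort" of Fig. 3
assert sorted(sort(k, 13)) == sorted(k)       # a permutation, cf. (46)

# If the inverse of an element is missing from the input, the result grows:
assert sort([3], 13) == [3, 9]
```

The last assertion illustrates why every element of the input trivially reappears in the output, whereas equality of lengths depends on the input being closed under inverses.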
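To prove the length requirement (48), we modify the Pigeon Hole Principle
(21) of Section 2.2 for our purposes and assert
    $\forall k{:}\mathrm{list}[\mathbb{N}], n{:}\mathbb{N}\;\; \mathrm{purged}(k) \wedge n \neq 0 \wedge 2 \trianglelefteq k \wedge n \trianglerighteq k \wedge |k| \geq n-1 \rightarrow |k| = n-1$    (49)
where $m \trianglelefteq k$ means that $m$ is a lower bound of $k$. Having instructed the system to
use the $k \setminus \max(k)$-induction we had chosen for proving lemma (20) in Section 2.2,
the system automatically completes the proof. Now in order to use lemma (49)
for proving (48), we have to verify that the requirements of (49) are satisfied if
$\mathrm{sort}(k, p)$ is substituted for $k$ and $p-2$ for $n$. This is expressed by the lemmas
    $\forall k{:}\mathrm{list}[\mathbb{N}], p{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p-2 \trianglerighteq k \rightarrow p-2 \trianglerighteq \mathrm{sort}(k, p)$    (50)
    $\forall k{:}\mathrm{list}[\mathbb{N}], p{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge 2 \trianglelefteq k \wedge p-2 \trianglerighteq k \rightarrow 2 \trianglelefteq \mathrm{sort}(k, p)$    (51)
    $\forall k{:}\mathrm{list}[\mathbb{N}], n, p{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge p-1 \trianglerighteq k \wedge n \in \mathrm{sort}(k, p) \rightarrow (n \in k \vee I(n, p) \in k)$    (52)
    $\forall k{:}\mathrm{list}[\mathbb{N}], p{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge \mathrm{purged}(k) \wedge 2 \trianglelefteq k \wedge p-2 \trianglerighteq k \rightarrow \mathrm{purged}(\mathrm{sort}(k, p))$.    (53)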
All these lemmas are proved by induction corresponding to the recursion
structure of procedure sort and with an extensive use of the above lemmas about
$I(x, p)$, where 13 user interactions are required to obtain the proofs.
The use of (50)–(53) with $(p-2 \ldots 2)$ substituted for $k$ creates further
proof obligations. For instance, $p-2 \trianglerighteq (p-2 \ldots 2)$ has to be proved if
$p-2 \trianglerighteq \mathrm{sort}((p-2 \ldots 2), p)$ shall be verified with lemma (50). But these proofs are
trivial having the obvious and easily verified library lemmas like e.g.
$\forall h, m, n{:}\mathbb{N}\;\; h \geq n \rightarrow h \trianglerighteq (n \ldots m)$ at hand.
The length requirement (48) now is proven by pure first-order reasoning
(requiring 5 user interactions) by calling the system to use (49) with $p-2$
substituted for $n$ and $\mathrm{sort}((p-2 \ldots 2), p)$ for $k$. The system then discharges the
resulting proof obligations by the just proven lemmas (47), (50)–(53) and the
library lemma $\forall k, l{:}\mathrm{list}[@I]\;\; k \sqsubseteq l \rightarrow |k| \leq |l|$ for discharging the proof obligation
$|\mathrm{sort}((p-2 \ldots 2), p)| \geq p-3$. Now for proving the permutation requirement (46),
it is enough to instruct the system to apply (35). The system then completes
the proof by discharging the resulting proof obligations using (47) and (48).
For modeling the pairwise multiplication of list elements modulo $p$ we use
procedure $\prod_2$ from Fig. 1 and command the system to prove the lemmas
    $\forall k{:}\mathrm{list}[\mathbb{N}], p{:}\mathbb{N}\;\; \mathbb{P}(p) \wedge 2 \trianglelefteq k \wedge p-2 \trianglerighteq k \rightarrow \prod_2(\mathrm{sort}(k, p), p) = 1$    (54)
    $\forall k{:}\mathrm{list}[\mathbb{N}], n{:}\mathbb{N}\;\; n \neq 0 \rightarrow (\prod_2(k, n) \equiv \prod(k)) \bmod n$    (55)
which requires 2 user interactions for the proof of (54) and the presence of the
library lemma
    $\forall x, y, z, n{:}\mathbb{N}\;\; z \neq 0 \wedge (x \equiv y) \bmod z \rightarrow (n \cdot x \equiv n \cdot y) \bmod z$
for the proof of (55). Next we prove the auxiliary lemma
    $\forall p{:}\mathbb{N}\;\; \mathbb{P}(p) \rightarrow (p-2)! \bmod p = 1 \bmod p$    (56)
by calling the system to use lemma (54) for replacing $1$ at the right-hand side
of the equation with $\prod_2(\mathrm{sort}((p-2 \ldots 2), p), p)$. The system then responds by
discharging the resulting proof obligations $2 \trianglelefteq (p-2 \ldots 2)$ and $p-2 \trianglerighteq (p-2 \ldots 2)$
and applies lemma (55) yielding $(p-2)! \bmod p = \prod(\mathrm{sort}((p-2 \ldots 2), p)) \bmod p$.
Then we instruct the system to use library lemma (31) to replace
$\prod(\mathrm{sort}((p-2 \ldots 2), p))$ by $\prod(p-2 \ldots 2)$ which is justified by lemma (46).
Subsequently the system replaces $\prod(p-2 \ldots 2)$ by $(p-2)!$, and lemma (56) is proved
with the support of 3 user interactions as identical terms are obtained on both
sides of the equation.
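Lemmas (54) and (55) can be illustrated on the sorted list of Fig. 3. Since the procedure from Fig. 1 is not repeated in this section, the Python function `prod2` below is a hypothetical reconstruction: it is assumed to multiply adjacent list elements in pairs, to reduce each pair modulo $n$, and to multiply the reduced pairs without a further reduction.

```python
from math import factorial, prod

def prod2(k, n):
    """Hypothetical pairwise product: (k[0]*k[1] mod n) * (k[2]*k[3] mod n) * ..."""
    if len(k) < 2:
        return prod(k)            # 1 for the empty list, x for a single element
    return (k[0] * k[1] % n) * prod2(k[2:], n)

k_sort = [11, 6, 10, 4, 9, 3, 8, 5, 7, 2]    # row "sort" of Fig. 3, p = 13

# (54): every adjacent pair multiplies to 1 modulo 13, so the result is exactly 1
assert prod2(k_sort, 13) == 1
# (55): the pairwise product is congruent to the ordinary product
assert prod2(k_sort, 13) % 13 == prod(k_sort) % 13
# (56): the ordinary product of a permutation of (11 ... 2) is 11!
assert prod(k_sort) == factorial(11)
assert factorial(11) % 13 == 1
```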
We now have all prerequisites available to prove Wilson’s Theorem
    $\forall k{:}\mathrm{list}[\mathbb{N}], p{:}\mathbb{N}\;\; \mathbb{P}(p) \rightarrow (p-1)! \bmod p = p-1$.
Having called the system to unfold procedure $!$ and to use lemma (45), we obtain
$(p-1) \cdot ((p-2)! \bmod p) \bmod p$ at the left-hand side of the equation. The system
then responds by applying lemma (56) yielding $(p-1) \cdot (1 \bmod p) \bmod p$ which
in turn simplifies to $p-1$ by the definitions of $\cdot$ and mod. As both sides of the
equation now are identical, Wilson’s Theorem is proved with the support of 2
user interactions.
Fig. 2 displays the effort required for proving Wilson’s Theorem. As list
$(p-2 \ldots 2)$ consists of consecutive numbers, a slight generalization of the Pigeon
Hole Principle (21) where requirement $|k| = n$ is replaced by $|k| \geq n$ can be used
alternatively for proving the permutation requirement (46). The statistics for
this proof are displayed in the row labelled Wilson (PHP). Both solutions use
the proof of Fermat’s Little Theorem as illustrated in Section 2.3 and their costs
do not differ as much as they do for the Fermat proof. This is because the Pigeon
Hole Principle is used in both proofs of Wilson’s Theorem, either in form of our
modification (49) or in its slightly generalized form.
5 Related Work
The theorems considered here have attracted mathematicians since the first
days of Number Theory, and several proofs based on different mathematical
concepts have been published over the centuries (see [6] for a presentation and
comparison of the variety of methods that were used to prove these theorems
and their generalizations). Likewise, these proofs attracted developers and users
of interactive proof systems, proof assistants or proof checkers respectively, to
challenge their systems by redoing the proofs.
The proofs performed with the interactive proof assistant Coq [13] use the
theory of cyclic groups and finite rings to prove Euler’s Theorem, and proof
scripts for Wilson’s Theorem and Fermat’s Little Theorem exist as well. Euler’s
Theorem is proved with the interactive proof assistant HOL Light [14], and
Fermat’s Theorem then is proved as a consequence of Euler’s Theorem. Another
proof uses the Binomial Theorem for proving Fermat’s statement. The proof of
Wilson’s Theorem is based on quadratic residues.
Different proofs are also computed with the interactive proof assistant
Isabelle [15]: One approach uses the theory of unit groups for proving the theorems
of Euler and Wilson, and Fermat’s Little Theorem then is proved as an instance
of Euler’s Theorem. A second approach uses the framework of bijection relations
developed in [7] to prove the theorems of Fermat, Euler and Wilson. Here
permutation is shown by proving bijectivity of $f_{a,n}(x) := a \cdot x \bmod n$ where $\mathrm{rrs}(n)$
is the domain of $f_{a,n}$, prime $n$ as well as $n \nmid a$ is assumed for Fermat’s Little
Theorem and $n \neq 0$ as well as $\gcd(n, a) = 1$ is assumed for Euler’s Theorem. In case
of Wilson’s Theorem, bijectivity of $I_p$ has to be proved where $\{2, \ldots, p-2\}$ is
the domain of $I_p$ and $p$ is prime. Also the proof of Wilson’s Theorem presented
in [9] is redone in Isabelle which necessitates similar effort as the bijection
relation approach. However, once this framework has been established, it can be
reused for proving theorems depending on similar permutation properties thus
reducing user effort.
As VeriFun does not provide higher-order logic, such a framework cannot
be formulated. Moreover, interactive proof systems like HOL Light, Coq and
Isabelle support the creation of theory hierarchies, e.g. the theory of fields
evolves from the theory of rings which in turn uses the theory of groups etc. This
allows the development of a comprehensive mathematical apparatus supporting a
user when working on a new proof challenge. As our system lacks such a feature,
we cannot redo proofs based on abstract mathematical concepts directly. For
instance, although we have proved in Section 4 that multiplication mod $p$, $1$
and $I_p$ satisfy the axioms of a group, we do not have the concept of a group.
Therefore we cannot inherit theorems proved elsewhere about groups, e.g. that
$I_p$ is involutory, but must prove such theorems additionally, cf. lemma (44).
Rather than being comparable with the interactive proof assistants, our system is
quite similar to the Boyer-Moore induction theorem prover [2][4] in the sense
that mathematical notions are algorithmically defined, induction is the main
inference rule, and various heuristics guide the system for automating induction
and for proving the base and step cases of an induction proof by first-order means.
The object language of the Boyer-Moore prover does not allow polymorphism,
but uses Lisp which is extended by a principle for defining data structures by
constructors and selectors (as we do). When working with the system, a user
prepares a file with the definitions of the procedures, data structures and lemmas
(given as Lisp-expressions) which then is executed by the system. If unsuccessful,
the user analyzes the computed protocol and inserts so-called “hints” into the file
which then is given to the prover for another try. Some of these hints correspond
to the interactive calls of HPL-proof rules in our system, e.g. the Use-Lemma hint
instructs the system to use a specific instance of a certain lemma, the Induction
hint stipulates the induction axiom and the variables to induct upon, etc. Other
user hints like annotations stating how a lemma is to be used in a proof (e.g.
as a rewrite rule) or the Disable feature which excludes procedure calls from
being executed as well as lemmas from being used have no correspondence in
our system as they are implemented by appropriate heuristics.¹⁵
Fermat’s Little Theorem has been proved with the Boyer-Moore theorem
prover some time ago [3], and the proof presented in Sec. 2.2 is quite similar
to this proof. Also Wilson’s Theorem has been proved with the Boyer-Moore
prover [9]. This proof is quite similar to our proof in Sec. 4 up to the point
where permutation is to be proved: Instead of reordering the list $(p-2 \ldots 2)$ with
procedure sort, the list of adjacent inverses is computed directly by a procedure
INVERSE.LIST$(n, m)$, where the correspondence to our approach is given by
INVERSE.LIST$(p-2, p) \setminus 1 = \mathrm{sort}((p-2 \ldots 2), p)$ for primes $p$. Now for proving
permutation, a modified version of the Pigeon Hole Principle is used where
requirement $|k| = n$ is replaced by $(n \ldots 1) \subseteq k$.¹⁶ This represents an elegant solution
for proving the permutation property, although the motivation for doing so, “This
theorem [i.e. (21)] is not applicable to the problem at hand because it contains
a hypotheses concerning the length of the list.”, can be refuted, cf. Section 4.
¹⁵ But we may ban the execution of procedure calls in a case term or in an instance
of a lemma upon use of Case Analysis or Use Lemma respectively. We may also use
the HPL-rule Normalization working like Simplification, except that procedure calls
in the goalterm are not executed.
¹⁶ $\subseteq$ denotes the subset relation which neither considers the order of list elements
nor the number of element occurrences when comparing lists. For instance,
$(3, 2) \subseteq (2, 2, 3)$ and vice versa.
When concerned with automated reasoning, the demand for user support
is of particular interest. One criterion is the number of user suggested proof
rule applications for solving a certain theorem proving challenge: The number
of user calls for Induction relative to the number of all induction rule
applications is a measure for the induction heuristic’s quality, the amount of calls for
Apply Equation provides insight into the system’s automated equality reasoning
capability, and the frequency of the other proof rule applications is inversely
proportional to the first-order theorem prover’s performance. A further measure
is the number of lemmas which the user has to submit to the system for guiding
it to success: A first-order provable lemma may be needed in one system, but is
obsolete in another one as a stronger first-order prover is available, and a lemma
provable by induction might be required in one system, whereas another system
spots this lemma automatically by generalization, say, or does not need it at all.
However, little is known from the literature about these figures. It has been
reported in [5] that the induction heuristic of the Boyer-Moore prover has a
success rate of 79% where 70% of the proven lemmas are first-order provable,¹⁷
whereas we count an average success rate of 95% for our induction heuristic,
where the share of first-order provable lemmas is less than 35% on
average.¹⁸ But unfortunately, the proofs presented in [3] and [9] do not allow a more
detailed comparison with our system in terms of user effort, and the proof scripts
of the proof assistants HOL Light, Coq and Isabelle found on the web also do
not provide insight in this respect. But it seems obvious at least that the solution
illustrated in Sec. 2.3 also eases the proof of Fermat’s statement when using the
Boyer-Moore prover.
6 Conclusion
Proving theorems with pencil and paper is work, and using a reasoning tool
instead comes with additional burden. This is because the intention for developing
formal proofs does not allow gaps by omitting “obvious” proof steps. Central
to the proofs of the three theorems considered here is the permutation property
of certain lists, but a search for the respective proofs in textbooks of Number
Theory will be in vain. For instance, it is proved in [8] that $(a \cdot \mathrm{rrs}(n) \;\mathrm{MOD}\; n)$
is a reduced residue system modulo $n$ (if $a$ is relatively prime to $n$), and it is
concluded that “the integers in $(a \cdot \mathrm{rrs}(n) \;\mathrm{MOD}\; n)$ must be the integers in
$\mathrm{rrs}(n)$ in some order.” Although this fact is obvious, the proof for it is not.
A further burden involved with formal proofs is caused by the boundaries of
the used logic. Mathematical notions are algorithmically specified in VeriFun,
like e.g. a reduced residue system modulo $n$ is defined by procedure rrs. This
approach allows for powerful heuristics for the proof search, thus avoiding
frequent user interactions upon the computation of proofs. However, this benefit
does not come for free as it may necessitate the invention of non-obvious lemmas
corresponding to the formulation of loop invariants when verifying loops of some
imperative programming language. This effect is noticeable in particular in
Number Theory as certain mathematical concepts cannot be straightforwardly defined by
recursion here. For instance, primes cannot be defined in terms of primes (like
divisibility can be defined in terms of divisibility), but must be defined by some
kind of a loop instead. Now for proving a statement about primes, auxiliary
lemmas corresponding to loop invariants might be required which sometimes are
not that easy to spot.¹⁹
¹⁷ Proofs of 16,000 theorems had been reported in [5], where 3800 were proven by
automated induction and 1000 by user suggested induction.
¹⁸ For the case studies in Fig. 2, the success rate of the induction heuristic / the share
of first-order provable lemmas ranges from 93% / 32% in the Fermat proof using the
Pigeon Hole Principle to 97% / 35% when using the Binomial Theorem, and is
94%–96% / 31%–33% for the remaining proofs.
And finally, the incidence for user support strongly corresponds to the
difficulty of the mathematics present in a case study. While we may achieve an
automation degree up to 100 % in mathematically simple domains, e.g. when
sorting lists, proofs in Number Theory require significantly more user
interventions. This is because quite often elaborate ideas for developing a proof are
needed here which are beyond the ability of the heuristics guiding the proof
search. An example is the proof of lemma (38)
    $\forall x, y, z{:}\mathbb{N}\;\; \gcd(x, y) = 1 \rightarrow \gcd(x, y \cdot z) = \gcd(x, z)$
where we call the system with Apply Equation to replace $x$ in $\gcd(x, y \cdot z)$ by
$x \cdot \gcd(1, z)$ and to use the times-gcd distributivity yielding
$\gcd(\gcd(x \cdot 1, x \cdot z), y \cdot z)$. VeriFun responds by using the definition and the
commutativity of multiplication as well as the associativity of gcd and the times-gcd
distributivity for computing $\gcd(x, z \cdot \gcd(x, y))$, and then completes the proof by using the
hypothesis for replacing $\gcd(x, y)$ by $1$. The idea for the first two replacement
steps is crucial for the proof but is non-obvious, at least for a machine. User
interaction is needed in such a case as the system’s heuristics do not support
proof steps of this kind.
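The chain of rewrite steps in this proof can be checked numerically. The Python sketch below evaluates the successive terms for small natural numbers; it is an illustration only, and the function name `rewrite_chain` is introduced here.

```python
from math import gcd

def rewrite_chain(x, y, z):
    """Values of the successive terms in the proof of lemma (38)."""
    return [
        gcd(x, y * z),                    # left-hand side of (38)
        gcd(x * gcd(1, z), y * z),        # x replaced by x * gcd(1, z)
        gcd(gcd(x * 1, x * z), y * z),    # times-gcd distributivity
        gcd(x, z * gcd(x, y)),            # associativity of gcd, distributivity again
    ]

for x in range(30):
    for y in range(30):
        for z in range(30):
            terms = rewrite_chain(x, y, z)
            assert len(set(terms)) == 1          # each rewrite step preserves the value
            if gcd(x, y) == 1:
                assert terms[0] == gcd(x, z)     # lemma (38)
```

Note that the first three steps hold without the hypothesis $\gcd(x, y) = 1$, which is only used in the final replacement.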
¹⁹ See e.g. the definition of primes and the required loop invariant called prime1.basic
in [2].
References
2. Boyer, R.S., Moore, J S.: A Computational Logic. Academic Press, New York,
3. Boyer, R.S., Moore, J S.: Proof Checking the RSA Public Key Encryption
Algorithm. American Mathematical Monthly 91(3), 181–189 (1984)
4. Boyer, R.S., Moore, J S.: A theorem prover for a computational logic (Keynote
Address). Proc. 10th Intern. Conf. on Automated Deduction (CADE-10), vol. 449
of Lecture Notes in Comp. Science, pp. 1–15, Kaiserslautern, (1990)
5. Boyer, R.S., Moore, J S.: On the Difficulty of Automating Inductive Reasoning.
Remarks made at a CADE-11 workshop on inductive reasoning, Saratoga Springs,
(1992) (available from the web)
6. Dickson, L.E.: History of the Theory of Numbers, Vol 1: Divisibility and Primality.
Carnegie Institution of Washington, Publication No. 256, Washington (1919)
7. Rasmussen, T.M.: An Inductive Approach to Formalizing Notions of Number
Theory Proofs. Computer Mathematics: Proc. of the 5th Asian Symposium (ASCM
2001), Matsuyama, Japan, 131–140 (2001)
8. Rosen, K.H.: Elementary Number Theory and Its Applications, 5th edn. Pearson
Addison Wesley, Boston (2005)
9. Russinoff, D.M.: An Experiment with the Boyer-Moore Theorem Prover: A Proof
of Wilson’s Theorem. Journal of Automated Reasoning 1(2), 121–139 (1985)
10. Walther, C.: On Proving the Termination of Algorithms by Machine. Artificial
Intelligence 71(1), 101–157 (1994)
11. Walther, C., Schweitzer, S.: Verification in the Classroom. Journal of Automated
Reasoning - Special Issue on Automated Reasoning and Theorem Proving in
Education 32(1), 35–73 (2004)
12. Walther, C., Schweitzer, S.: A Pragmatic Approach to Equality Reasoning.
Technical Report VFR 06/02, Technische Universität Darmstadt, 1–19 (2006) (available
from [1])