Half-GCD, Fast Rational Recovery, and Planar Lattice Reduction
Daniel Lichtblau
Wolfram Research, Inc.
100 Trade Center Dr.
Champaign IL 61820
danl@wolfram.com
ABSTRACT. Over the past few decades several variations on a "half GCD" algorithm for obtaining the pair of
terms in the middle of a Euclidean sequence have been proposed. In the integer case algorithm design and proof
of correctness are complicated by the effect of carries. This paper will demonstrate a variant with a relatively
simple proof of correctness. We then apply this to rational recovery for a linear algebra solver. After showing
how this same task might be accomplished by lattice reduction, albeit more slowly, we proceed to use the half
GCD to obtain asymptotically fast planar lattice reduction.
This is an extended version of a paper presented at ISSAC 2005 [17]. It also contains minor changes.
Categories and Subject Descriptors
F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems: Number-theoretic computations; I.1.2 [Symbolic and Algebraic Manipulation]: Algorithms: Algebraic Algorithms; G.4 [Mathematical Software]: Algorithm design and analysis
General Terms
Algorithms, Performance
Keywords
integer gcd, subquadratic arithmetic, rational recovery
1. INTRODUCTION AND RELATED WORK
The "half GCD" (HGCD) algorithm, as described in [19] and [1], works by taking the high parts of a pair of inputs in a Euclidean domain, first recursively finding the pair of elements in the Euclidean sequence for that pair that straddles the middle of the larger input, as well as the $2\times 2$ matrix that converts to this middle pair. It then uses these values to convert the original pair to something sufficiently smaller. This is repeated once, along with Euclidean steps at appropriate points, in such a way that one obtains the corresponding middle values for the original pair of inputs. Various analyses explain why this is asymptotically fast compared to direct computation of the full Euclidean sequence (we refer below to this latter as the "standard" Euclidean algorithm). The method itself is loosely based on an earlier asymptotically fast algorithm presented for continued fractions in [22]. As is indicated in that work, it in turn can be adapted to find a GCD, although there appears to be some extra bookkeeping. That work was in turn an improvement on a slower though still subquadratic method presented in [14].
Since its introduction in the early 1970s, the asymptotically fast HGCD idea has given rise to several variants and descriptions thereof. This state of affairs has come to pass because of difficulties encountered in proofs of correctness. It turns out that the integer case is particularly troublesome due to the possibility of carries that may cause intermediate values to be too large or too small relative to what the algorithm requires. Several papers ([3], [25], and [20]) redress this with fix-up steps that involve a limited number of Euclidean steps or reversals thereof. These papers tend to have proofs that involve analysis of many detailed cases, thus making them difficult to follow, let alone implement. (To be fair, they strive for greater generality in some respects.) The main contribution of this paper is to provide a simple formulation with straightforward proofs. We should mention that the method of [22] is not known to this author to suffer from issues of correctness, though for GCD purposes it is likely to be a bit slower and is also not as convenient as HGCD for purposes of rational recovery.
As testimony to its relative simplicity, the gcd method we present is now implemented as of version 5.1 of Mathematica
(TM) [29]. It is an improved version of that which appeared in version 5.0. The prior work was coded by David Terr,
with assistance from Mark Sofroniou and the author, in early 2001. It could be described as a "Las Vegas" approach
insofar as it is always correct but only probabilistically fast; in practice we have never noticed it to falter. The fully
deterministic method of this paper was coded by the author and Mark Sofroniou.
Some important uses of asymptotically fast gcd to date are in finding greatest common divisors of pairs of large univariate polynomials or integers. An important advantage it enjoys is that, with little loss in efficiency, it finds corresponding cofactors when needed (that is, it computes the extended gcd). This is required, for example, in Hermite normal form computations. Moreover, in finding cofactors for steps that take us half the distance to the gcd, the HGCD is ideally
suited to fast recovery of rationals from p-adic images (as we will see, the code involved is trivial). The second contribution of this paper is to show this as applied to linear equation solving. This will give some indication of speed improvement over a recovery method based on the standard Euclidean algorithm. We will also describe a method of rational recovery based on planar lattice reduction, and, reversing the process, show how to do fast planar lattice reduction via HGCD.
In another recent paper, the authors of [24] take a different direction by operating on the low (rather than high) end of the inputs. This has the advantage that carries are no longer an issue. A possible drawback is that rational recovery becomes slightly less transparent, though they show how it may still be done. They present a timing comparison that clearly demonstrates the efficacy of their code. At present it is not simple to compare directly to ours, due to different installations of the underlying GMP bignum arithmetic facility [12] as well as possible differences in memory management and timing thereof, but they appear to be in the same ballpark. I have also learned that recent work described in [18] and [21] is similar to the present work in regard to asymptotically fast GCD computation. The former is quite promising insofar as there is efficient code written for comparison of several related approaches. It is expected that the best will eventually go into public domain software [12].
I thank two anonymous referees for detailed remarks and suggestions that improved the exposition of this paper, and
thank the second referee as well for bringing several errors in the draft to my attention. I thank Erich Kaltofen for
posing questions that caused me to look more closely at the earlier work of Schönhage in [22]. I thank Damien Stehlé,
Niels Möller, and Fritz Eisenbrand for email correspondence that helped to clarify some points about their related work.
2. A QUICK REVIEW OF EUCLIDEAN REMAINDER SEQUENCES
Much of what we discuss in this section applies to general Euclidean domains once one adjusts definitions (e.g. of the floor function) as needed, but we restrict our attention to integers as the case of interest. We are given a pair of positive integers $\binom{m}{n}$ (we will use column vector notation throughout, as we frequently multiply on the left by a matrix) with $m > n$. We are interested in elements in the Euclidean remainder sequence $m = m_0, n = m_1, \ldots, m_k$. The integer quotients are the floors of the quotients of successive terms in this sequence: $q_j = \lfloor m_{j-1}/m_j \rfloor$. We define the matrix $R_j$ such that
$$R_j \binom{m}{n} = \binom{m_j}{m_{j+1}}.$$
For example, $R_1 = \begin{pmatrix} 0 & 1 \\ 1 & -q_1 \end{pmatrix}$ (matrices of this form are called "elementary") and $R_j = \begin{pmatrix} 0 & 1 \\ 1 & -q_j \end{pmatrix} R_{j-1}$. From this last it is clear that the top row of $R_{j+1}$ is just the bottom row of $R_j$. Hence we may write $R_j = \begin{pmatrix} s_j & t_j \\ s_{j+1} & t_{j+1} \end{pmatrix}$. We state a few basic facts about these quantities.
LEMMA 1. Assume $R_j$ is a nontrivial product of elementary matrices.
(i) $s_{j-1} = s_{j+1} + q_j s_j$ and $t_{j-1} = t_{j+1} + q_j t_j$.
(ii) If $R_j \binom{m}{n} = \binom{m_j}{m_{j+1}}$ then $m_{j-1} = m_{j+1} + q_j m_j$.
(iii) The signs in $R_j$ alternate in both rows and columns: $s_j t_j < 0$ and $s_j s_{j+1} < 0$.
(iv) The sizes grow top to bottom and left to right: $|s_{j+1}| > |s_j|$, $|t_{j+1}| > |t_j|$, and $|t_j| > |s_j|$.
(v) $q_j = \lfloor |s_{j+1}/s_j| \rfloor = \lfloor |t_{j+1}/t_j| \rfloor$.
PROOF.
(i)-(iv) Quickly proven by writing out the product $R_j = \begin{pmatrix} 0 & 1 \\ 1 & -q_j \end{pmatrix} R_{j-1}$.
(v) As $q_j s_j = s_{j-1} - s_{j+1}$, parts (iii) and (iv) together imply that $|s_{j+1}| \ge q_j |s_j| > |s_{j+1}| - |s_j|$. This suffices to give the first floor equality for $q_j$. The second is done similarly. □
This lemma shows how to compute $R_{j-1}$ and $\binom{m_{j-1}}{m_j}$ given $R_j$ and the remainder sequence pair $\binom{m_j}{m_{j+1}}$. The significance is that we can "go back" in the Euclidean sequence should we happen to overshoot (this will be discussed later). Note also that in the special case of $R_0$, which is an elementary matrix, we can obtain $q_0$ immediately. We easily recognize this case, as it arises if and only if the first matrix element is zero. We also use part (iii) to prove the next lemma.
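To make the bookkeeping concrete, the matrices $R_j$ can be accumulated alongside the remainder sequence and checked against the defining relation $R_j \binom{m}{n} = \binom{m_j}{m_{j+1}}$. The following is an illustrative sketch in Python (ours, not the paper's implementation):

```python
def remainder_sequence_matrices(m, n):
    """Return the remainder sequence [m0, m1, ...] and the list of
    matrices R_j (as flat 2x2 tuples) with R_j * (m, n)^T = (m_j, m_{j+1})^T."""
    seq = [m, n]
    R = (1, 0, 0, 1)               # identity; each R_j is a product of elementary matrices
    mats = []
    while seq[-1] != 0:
        a, b = seq[-2], seq[-1]
        q = a // b                 # integer quotient q_j = floor(m_{j-1} / m_j)
        # premultiply by the elementary matrix ((0, 1), (1, -q))
        R = (R[2], R[3], R[0] - q * R[2], R[1] - q * R[3])
        mats.append(R)
        seq.append(a - q * b)
    return seq, mats

seq, mats = remainder_sequence_matrices(13, 5)
# Each R_j maps the input pair to the consecutive pair (m_j, m_{j+1}):
for j, (s, t, s1, t1) in enumerate(mats, start=1):
    assert (s * 13 + t * 5, s1 * 13 + t1 * 5) == (seq[j], seq[j + 1])
```

Running this on the pair $(13, 5)$ gives the remainder sequence $13, 5, 3, 2, 1, 0$, with $R_1 = \begin{pmatrix} 0 & 1 \\ 1 & -2 \end{pmatrix}$.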
LEMMA 2. Assume $R_j$ is a nontrivial product of elementary matrices. Then $|s_j| \le n/m_{j-1}$ and $|t_j| \le m/m_{j-1}$.
PROOF. This is done by induction. The base case gives equalities. For the inductive step we will show that $|s_{j+1}| \le n/m_j$; the case for $t_{j+1}$ is handled similarly. By lemma 1(v) we know $\lfloor |s_{j+1}/s_j| \rfloor = q_j$. Hence $|s_{j+1}| \le q_j |s_j|$. By the inductive hypothesis $|s_j| \le n/m_{j-1}$. Hence $|s_{j+1}| \le q_j \, n/m_{j-1} = \lfloor m_{j-1}/m_j \rfloor \, n/m_{j-1} \le n/m_j$. □
This lemma is used to bound various quantities in the lemmas of the next section.
LEMMA 3. For $m > n > 0$, suppose we are given a product of elementary matrices times $\binom{m}{n}$ such that the result, $\binom{u}{v}$, satisfies $u > v > 0$. Then $\binom{u}{v}$ is a consecutive pair in the remainder sequence for $\binom{m}{n}$.
This is presented as Fact 1 in [25]. It is important because it tells us that we may take a product of elementary matrices of the form $R_j$ above, computed with respect to a new pair of integers, and still arrive at a consecutive pair in the remainder sequence for the original pair.
Finally we remark that there is obviously an index $k$ for which the pair $\binom{m_k}{m_{k+1}}$ straddles $\sqrt{m}$, i.e. $m_k \ge \sqrt{m} > m_{k+1}$. These together, and in order, are referred to as the "middle" pair in the remainder sequence (regardless of where the index occurs in the sequence of such indices).
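For reference, the middle pair can of course be found by running the standard Euclidean algorithm until the remainder drops below $\sqrt{m}$; the point of HGCD is to compute the same pair subquadratically. A naive Python sketch of the specification (our own illustration):

```python
def middle_pair(m, n):
    """Walk the Euclidean remainder sequence of m > n > 0 and return the
    consecutive pair (m_k, m_{k+1}) with m_k >= sqrt(m) > m_{k+1}.
    Quadratic in the bit size; HGCD computes the same pair faster."""
    a, b = m, n
    while b * b >= m:          # stop once the remainder drops below sqrt(m)
        a, b = b, a % b
    return a, b

u, v = middle_pair(13, 5)
assert u * u >= 13 > v * v     # (5, 3) straddles sqrt(13)
```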
3. BASIC THEORY FOR THE HGCD ALGORITHM
Again we begin with a pair of positive integers $\binom{m}{n}$ with $m > n$. Take $k$ to be a positive integer less than the size of $m$ in bits (initially it will be $\lceil \log_2(m)/2 \rceil$, but we do not use that until the next section). We write $\binom{m}{n} = \binom{2^k f_0 + f_1}{2^k g_0 + g_1}$ with $\{f_1, g_1\} < 2^k$, and recursively compute the middle pair, and corresponding multiplier matrix, for the pair $\binom{f_0}{g_0}$. This gives a matrix $R_i$ and pair $\binom{r_i}{r_{i+1}}$ with $R_i \binom{f_0}{g_0} = \binom{r_i}{r_{i+1}}$ and $r_{i+1} < \sqrt{f_0} \le r_i$. We want to use $R_i$, or a close relative, on the original pair $\binom{m}{n}$. We have
$$R_i \binom{2^k f_0 + f_1}{2^k g_0 + g_1} = 2^k \binom{r_i}{r_{i+1}} + R_i \binom{f_1}{g_1}.$$
We will call this product $\binom{v_i}{v_{i+1}}$. The next two lemmas will find bounds, one upper and one lower, for these elements. We first handle $v_{i+1}$. We bound the absolute value and, under certain circumstances, we place a tighter bound on how negative it may become. This is important because, in order to invoke lemma 3, we will need a way to correct for the negative case.
LEMMA 4.
(i) $|v_{i+1}| < 2^{k+1}\sqrt{f_0}$.
(ii) Suppose $r_i > 2\sqrt{f_0}$. Then $v_{i+1} > -2^{k-1}\sqrt{f_0}$.
PROOF. Note that $\{|s_{i+1}|, |t_{i+1}|\} < f_0/r_i < \sqrt{f_0}$. We have $v_{i+1} = 2^k r_{i+1} + s_{i+1} f_1 + t_{i+1} g_1$.
(i) Using the upper bound of $2^k$ on $\{f_1, g_1\}$ and the alternating signs in the matrix $R_i$, the absolute value is bounded by $|v_{i+1}| < 2^k \sqrt{f_0} + 2^k \sqrt{f_0} = 2^{k+1}\sqrt{f_0}$.
(ii) Since $r_i > 2\sqrt{f_0}$ we have $v_{i+1} > -2^k \frac{f_0}{2\sqrt{f_0}} = -2^{k-1}\sqrt{f_0}$. □
We will use these same notions in subsequent lemmas (particularly the sign alternation, in effect to ignore one of the three terms) without further mention. We now look at $v_i = 2^k r_i + s_i f_1 + t_i g_1$.
LEMMA 5.
(i) Suppose $r_i > 2\sqrt{f_0}$. Then $v_i > 2^{k-1}\sqrt{f_0}$.
(ii) Suppose $r_i \le 2\sqrt{f_0}$. Then $v_i > 2^{k-1}$.
PROOF. The sign alternation and size bounds of lemmas 1 and 2, applied to $R_i$, give $v_i = 2^k r_i + s_i f_1 + t_i g_1 > 2^k r_i - 2^k \frac{f_0}{r_{i-1}}$.
(i) The hypothesis and the fact that $r_{i-1} > r_i$ yield $v_i > 2^k \left( r_i - \frac{f_0}{2\sqrt{f_0}} \right) > 2^k \frac{\sqrt{f_0}}{2} = 2^{k-1}\sqrt{f_0}$.
(ii) Now we write the lower bounding value as $\frac{2^k}{r_{i-1}} \left( r_i r_{i-1} - f_0 \right)$. Since $\sqrt{f_0} \le r_i < r_{i-1}$ and the latter two are integers, there is an $a \ge 1$ with $r_{i-1} = \sqrt{f_0} + a$. So we have $v_i > \frac{2^k}{\sqrt{f_0} + a} \left( r_i (\sqrt{f_0} + a) - f_0 \right)$. This in turn is larger than $\frac{2^k}{\sqrt{f_0} + a} \left( \sqrt{f_0}(\sqrt{f_0} + a) - f_0 \right) = \frac{2^k a \sqrt{f_0}}{\sqrt{f_0} + a}$, which is bounded below by $2^{k-1}$. □
For the pair $\binom{v_i}{v_{i+1}} = R_i \binom{m}{n}$, lemmas 4 and 5 give an upper bound on one element and a lower bound on the other. There will be situations in which we must backtrack a Euclidean step to use $R_{i-1}$, that is, the multiplier matrix preceding $R_i$ in the remainder sequence for $\binom{f_0}{g_0}$. In this case we need to bound $\binom{v_{i-1}}{v_i} = R_{i-1} \binom{m}{n}$.
LEMMA 6.
(i) $v_{i-1} > 2^{k-1}\sqrt{f_0}$.
(ii) Suppose $r_i \le 2\sqrt{f_0}$. Then $v_i < 3 \cdot 2^k \sqrt{f_0}$.
PROOF.
(i) $v_{i-1} = 2^k r_{i-1} + s_{i-1} f_1 + t_{i-1} g_1 > 2^k r_{i-1} - 2^k \frac{f_0}{r_{i-2}} = 2^k r_{i-1} - 2^k \frac{f_0}{q_{i-1} r_{i-1} + r_i} > 2^k \left( r_{i-1} - \frac{f_0}{2 r_i} \right) > 2^k \left( \sqrt{f_0} - \frac{\sqrt{f_0}}{2} \right) = 2^{k-1}\sqrt{f_0}$.
(ii) $v_i = 2^k r_i + s_i f_1 + t_i g_1 \le 2^k \cdot 2\sqrt{f_0} + 2^k \frac{f_0}{r_{i-1}} < 2^k \cdot 2\sqrt{f_0} + 2^k \frac{f_0}{\sqrt{f_0}} = 3 \cdot 2^k \sqrt{f_0}$. □
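These bounds are easy to probe numerically. The sketch below (Python, our own illustration) splits a pair as in the text, computes the middle pair and multiplier matrix for $(f_0, g_0)$ by plain Euclidean iteration, and checks the resulting $v_i$ and $v_{i+1}$ against the weaker of the lemma bounds:

```python
from math import isqrt

def middle_pair_with_matrix(x, y):
    """Middle pair of the remainder sequence for x > y > 0 straddling sqrt(x),
    with the multiplier matrix R_i = ((s_i, t_i), (s_{i+1}, t_{i+1}))."""
    a, b = x, y
    R = (1, 0, 0, 1)
    while b * b >= x:
        q = a // b
        a, b = b, a - q * b
        R = (R[2], R[3], R[0] - q * R[2], R[1] - q * R[3])
    return (a, b), R

# Split m = 2^k f0 + f1 and n = 2^k g0 + g1 as in the text.
m, n, k = 1000003, 700001, 10
f0, f1 = divmod(m, 2 ** k)
g0, g1 = divmod(n, 2 ** k)
(ri, ri1), (s, t, s1, t1) = middle_pair_with_matrix(f0, g0)
vi = 2 ** k * ri + s * f1 + t * g1       # v_i     = 2^k r_i     + s_i f1     + t_i g1
vi1 = 2 ** k * ri1 + s1 * f1 + t1 * g1   # v_{i+1} = 2^k r_{i+1} + s_{i+1} f1 + t_{i+1} g1
assert abs(vi1) < 2 ** (k + 1) * (isqrt(f0) + 1)   # lemma 4(i), with isqrt(f0)+1 >= sqrt(f0)
assert vi > 2 ** (k - 1)                           # the weaker of the lemma 5 bounds
```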
Given a pair $\binom{m}{n}$ with $m > n > 0$, we will see that the above lemmas allow us to find a pair $\binom{v_i}{v_{i+1}}$ with magnitudes in the desired ranges (this will be explained more carefully in the next section). Two problems may arise. One is that we require both to be nonnegative; the lemmas will only guarantee that $v_i > 0$. Second, we require that $v_i > v_{i+1}$. These requirements are in order to meet the hypotheses of lemma 3 and thus assert that we have a consecutive pair in the remainder sequence for our inputs. We now provide a lemma to assist in repairing our intermediate pair, should either of these possible flaws arise.
LEMMA 7. Given an elementary matrix $\begin{pmatrix} 0 & 1 \\ 1 & -q_j \end{pmatrix}$ (this implies $q_j$ is a positive integer), for any integer $h < q_j$ the product $\begin{pmatrix} 1 & 0 \\ h & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -q_j \end{pmatrix}$ is also an elementary matrix. In particular this holds for any negative integer $h$.
PROOF. The product is simply $\begin{pmatrix} 0 & 1 \\ 1 & h - q_j \end{pmatrix}$, and by definition this is elementary precisely when $h - q_j < 0$. □
We will use such products to repair deficiencies in sign or order of a pair $\binom{v_i}{v_{i+1}}$.
4. THE HGCD ALGORITHM
Input: A pair of nonnegative integers $m > n$.
Output: A pair $\binom{v_i}{v_{i+1}}$ of consecutive integers in the Euclidean remainder sequence for $\binom{m}{n}$ with $v_i \ge \sqrt{m} > v_{i+1}$, and a matrix $R_i$, a product of elementary transformation matrices, such that $R_i \binom{m}{n} = \binom{v_i}{v_{i+1}}$.
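The contract above can be pinned down by a deliberately slow reference implementation (a Python sketch of ours, using direct Euclidean iteration); any fast HGCD must reproduce its output:

```python
def hgcd_reference(m, n):
    """Slow reference for the HGCD contract: returns (R, (vi, vi1)) with
    R a product of elementary matrices, R * (m, n)^T = (vi, vi1)^T, and
    vi >= sqrt(m) > vi1.  The fast algorithm computes the same data in
    O(M(n) log n) bit operations instead of quadratic time."""
    a, b = m, n
    R = (1, 0, 0, 1)
    while b > 0 and b * b >= m:
        q = a // b
        a, b = b, a - q * b
        R = (R[2], R[3], R[0] - q * R[2], R[1] - q * R[3])
    return R, (a, b)

R, (vi, vi1) = hgcd_reference(10**6 + 3, 10**6 - 1)
assert R[0] * (10**6 + 3) + R[1] * (10**6 - 1) == vi
assert R[2] * (10**6 + 3) + R[3] * (10**6 - 1) == vi1
assert vi * vi >= 10**6 + 3 > vi1 * vi1
```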
Step 1: With the same input specification as in the previous sections, we begin by choosing $k = \lceil \log_2(m)/2 \rceil$. Thus, as above, we write $\binom{m}{n} = \binom{2^k f_0 + f_1}{2^k g_0 + g_1}$ with $\{f_1, g_1\} < 2^k$. Moreover the choice of $k$ gives $2^k \ge \sqrt{m} > 2^{k-1}$ and $g_0 \le f_0 < 2\sqrt{m}$.
Step 2: Recursively compute $\mathrm{HGCD}\binom{f_0}{g_0}$. With notation as in the last section, the result is a matrix $R_i$ and pair $\binom{r_i}{r_{i+1}}$ with $R_i \binom{f_0}{g_0} = \binom{r_i}{r_{i+1}}$ and $0 \le r_{i+1} < \sqrt{f_0} \le r_i$.
Step 3: Compute $\binom{v_i}{v_{i+1}} = R_i \binom{m}{n}$. Note that we already have the "upper part" of the resulting vector computed as $\binom{r_i}{r_{i+1}}$; this can be used to reduce the size of the multiplications in this step.
Step 4: The bounds presented in the lemmas do not rule out the possibility that $v_{i+1}$ may be negative, or that $v_i < v_{i+1}$. If $v_i > v_{i+1} > 0$ then we set $\binom{u}{v} = \binom{v_i}{v_{i+1}}$ and move to step 5 at this point. Otherwise we must repair the pair in such a way that the transformation matrix remains a product of elementary matrices. This is necessary so that we may invoke lemma 3 to know the resulting vector is a consecutive pair in the remainder sequence for $\binom{m}{n}$. We split into three cases that together comprise all possibilities.
Case (i). Suppose $v_{i+1} > v_i$. Take the matrix $H = \begin{pmatrix} 1 & 0 \\ -h & 1 \end{pmatrix}$ where $h = \lfloor v_{i+1}/v_i \rfloor \ge 1$. By lemma 7, $H R_i$ is a product of elementary matrices. The new pair thus obtained is $\binom{u}{v} = H \binom{v_i}{v_{i+1}} = \binom{v_i}{v_{i+1} - h v_i}$, which satisfies the requirement that $u > v > 0$. For purposes of notation we continue to call the resulting matrix $R_i$. Note that the value of $u$ is unchanged (hence lemma bounds still apply), while the absolute value of $v$ has diminished. We now move on to step 5.
Case (ii). Suppose $v_{i+1} < 0$ and $v_{i+1} + v_i \ge 0$.
Subcase (a). First assume $q_i > 1$. Then we use the matrix $H = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and proceed, as we did in case (i) above, to obtain a positive pair via $\binom{u}{v} = H \binom{v_i}{v_{i+1}}$. This is appropriate because the product $\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -q_i \end{pmatrix}$ is an elementary matrix, so we may invoke lemma 3. Again we call the resulting pair $\binom{u}{v}$, and continue to call the transformation matrix $R_i$. Note that $u - v = |v_{i+1}| < 2^{k+1}\sqrt{f_0}$. This means that a Euclidean step will bring the pair into the range claimed in step 6 below. As it also shows that $u > v$, we have a consecutive pair in the remainder sequence.
Subcase (b). If $q_i = 1$ the situation is a bit more subtle. Again we use the matrix $H$ as defined above, and again we obtain a positive pair in the correct order; unfortunately the product of $H$ with the leading elementary factor of $R_i$ is $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, which is not an elementary matrix. To correct for this we multiply by $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ again, giving as product the identity matrix. This has the effect of flipping $\binom{u}{v}$. Thus we have used premultipliers to take us from $R_i$ to $R_{i-1}$, which we know is also a product of elementary matrices. We have also obtained as our vector $\binom{v}{u}$; it has appropriate components except that they are in the wrong order. As this is exactly the situation of case (i) above, we proceed there to correct it.
Case (iii). If $v_{i+1} < 0$ and $v_{i+1} + v_i < 0$ then either $v_i \le 2^{k-1}\sqrt{f_0}$ or $v_{i+1} \le -2^{k-1}\sqrt{f_0}$. In either case, lemmas 5(i) and 4(ii) respectively guarantee that $r_i \le 2\sqrt{f_0}$. We will perform a reversal of a Euclidean step, obtaining the pair $\binom{u}{v} = \binom{v_{i-1}}{v_i} = R_{i-1} \binom{m}{n}$. As $r_i \le 2\sqrt{f_0}$, lemma 5(ii) guarantees that $v > 2^{k-1}$, and furthermore $v < 3 \cdot 2^k \sqrt{f_0}$ by lemma 6, so again the bounds given in step 6 will hold. If $u < v$ then we go to case (i) above.
We remark that cases (ii-b) and (iii) are identical in terms of actual treatment. We separated them in the way we did in order to explicate the rationale. But since $q_i = 1$ in case (ii-b), and we adjusted via the matrix $H = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, we have simply done nearly a reversal of a Euclidean step. The only difference in the outcome is that we also reversed the order, to $\binom{v_i}{v_{i-1}}$. The next adjustment in (ii-b), to get a valid elementary matrix, flipped the order to get $\binom{v_{i-1}}{v_i}$; thus we have now indeed done a Euclidean step reversal, just as is used in case (iii). From the hypotheses of case (ii-b) we know $v_{i-1} < v_i$, hence we must proceed to case (i) to correct this. Again, this is something we check for in case (iii). The upshot is that in actual code, cases (ii-b) and (iii) will be handled as one.
Step 5: Perform a Euclidean reduction on $\binom{u}{v}$. We obtain the next consecutive pair $\binom{v}{w}$ in the remainder sequence for $\binom{m}{n}$, with elementary transformation matrix $Q = \begin{pmatrix} 0 & 1 \\ 1 & -q \end{pmatrix}$, where $q = \lfloor u/v \rfloor$ and $w = u - q v$. We form the corresponding transformation matrix $R = Q R_i$.
Step 6: At this point we examine the values of our pair $\binom{v}{w}$. Lemmas 4, 5, 6, the remarks from the step 4 cases, and our choice of $k$ (implying $f_0 < 2\sqrt{m}$ and $2^{k-1} < \sqrt{m}$) guarantee that $0 < v < 3 \cdot 2^k \sqrt{f_0} < 3 \cdot 2^{k + 1/2}\, m^{1/4} < 3 \cdot 2^{3/2}\, m^{3/4}$ and $u > 2^{k-1} > \sqrt{m}/4$.
Case (i). $w < \sqrt{m}$. If $v \ge \sqrt{m}$ we have our pair straddling $\sqrt{m}$. We return it along with the transforming matrix $R$. If $v < \sqrt{m}$ we do reverse Euclidean steps, updating our remainder sequence pair and transformation matrix using the formulas in lemma 1. Since $u > \sqrt{m}/4$ and it immediately precedes $v$ in the remainder sequence, we have at most five such steps before an element exceeds $\sqrt{m}$ (possibly one could decrease this upper bound by constructing tighter bounds in the lemmas). We perform as many such steps as needed to obtain the pair straddling $\sqrt{m}$, returning it and the corresponding transformation matrix.
Case (ii). $\sqrt{m} \le w < v < 3 \cdot 2^{3/2}\, m^{3/4}$ (in typical examples, $w$ and $v$ will both be close to $m^{3/4}$). Similarly to step 1, we take $l = \lfloor \log_2 m \rfloor - \lfloor \log_2 v \rfloor$ (so $2^l$ is within a factor of 2 of $m/v$; we will soon see why this is the appropriate value). Observe that $l$ is roughly between one fourth and one half the bit length of $m$. Specifically, we have $\lfloor \log_2 m \rfloor/4 - 3 < l < \lfloor \log_2 m \rfloor/2 + 3$. We proceed to step 7.
Step 7: This time we write $\binom{v}{w} = \binom{2^l f_2 + f_3}{2^l g_2 + g_3}$ with $\lfloor \log_2 f_2 \rfloor = \lfloor \log_2 v \rfloor - l$. The upper bound on $\log_2 v$ and the lower bound on $l$ show that $f_2$ and $g_2$ are no larger than $O(\sqrt{m})$. This fact is required for the claim of asymptotic speed (though not for correctness).
As in step 2, recursively compute $\mathrm{HGCD}\binom{f_2}{g_2}$. As in steps 3 and 4 we obtain a transformation matrix $S$ and a consecutive pair $\binom{v_j}{v_{j+1}}$ in the remainder sequence for $\binom{v}{w}$, with $v_j > v_{j+1} \ge 0$. If $v_j \le 2^{l-2}\sqrt{f_2}$ then the condition of lemma 5(i) cannot hold, and thus lemma 6(ii) applies. So we do a single reverse Euclidean step to get the previous consecutive pair in the sequence. At this point we have a consecutive pair, call it $\binom{x}{y}$, wherein lemma 6 guarantees that $y < 2^{l+2}\sqrt{f_2}$ and $x > 2^{l-2}\sqrt{f_2}$.
Step 8: From step 7 we know that $f_2$ is within a factor of 2 of $2^{-l} v$, and hence
$$2^l \sqrt{f_2} \approx 2^{l/2} \sqrt{v} \approx \sqrt{\frac{m}{v}} \, \sqrt{v} = \sqrt{m},$$
where the approximation from first to last is within a factor of 2 because each intermediate approximation is within a factor of $\sqrt{2}$. The inequalities at the end of step 7 therefore imply $y < 8\sqrt{m}$ and $x > \sqrt{m}/8$; this was the point in selecting $l$ as we did. Thus with a limited number of Euclidean steps, or reversals thereof, we obtain the consecutive pair in the remainder sequence that straddles $\sqrt{m}$, and the transformation matrix that gives this pair. Possibly with care we might tighten the bound on the number of forward or reverse Euclidean steps. In practice this is unimportant. One simply codes a while loop for the iterations; that it terminates in a fixed number of steps suffices to prove the claim of asymptotic speed.
5. APPLICATIONS OF THE HGCD ALGORITHM
First note that the asymptotic complexity is $O(M(n) \log n)$, where $n$ is the bit size of the inputs and $M(n)$ is the complexity of multiplying a pair of numbers of that size. This is well known (see the various references) and follows from the fact that we do two recursive steps on numbers no larger than roughly $n/2$ (see steps 1 and 7 above), along with a bounded number of multiplications, Euclidean steps, and reverses thereof. It is this speed that motivates the various applications mentioned below.
The HGCD algorithm is used recursively in gcd computations. An HGCD computation followed by a Euclidean step is guaranteed to reduce the size of the inputs (in bits) by at least half. Another advantage is that one gets the corresponding multiplier matrix for free, so computation of the extended gcd is not much more costly than that of the ordinary gcd. This is important for e.g. matrix Hermite normal form or integer Gröbner basis computations [16], where speed of extended gcds is paramount. As a standard benchmark example we will find the gcd of a pair of consecutive large Fibonacci numbers. This and all other timings are from runs using version 5.1 of Mathematica under Linux on a 1.4 GHz Athlon processor.
fibs = {Fibonacci[10^7], Fibonacci[10^7 + 1]};

Each has about two million digits. We compute both the regular and the extended gcd, and check that the results are plausible.

Timing[gcd = Apply[GCD, fibs];]
Timing[{gcd2, mults} = Apply[ExtendedGCD, fibs];]
mults.fibs == gcd == gcd2 == 1

{31.04 Second, Null}
{40.21 Second, Null}
True
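For readers without Mathematica, the same experiment can be mocked up at a much smaller scale. The sketch below (Python, ours) builds a consecutive Fibonacci pair, a worst case for the standard Euclidean algorithm since every quotient is 1, and checks the extended gcd identity:

```python
from math import gcd

def ext_gcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        q, (a, b) = a // b, (b, a % b)
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

# Consecutive Fibonacci numbers: every Euclidean quotient is 1.
a, b = 1, 1
for _ in range(100):
    a, b = b, a + b
g, x, y = ext_gcd(a, b)
assert g == gcd(a, b) == 1          # consecutive Fibonaccis are coprime
assert a * x + b * y == 1           # the multipliers from the extended gcd
```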
A particularly nice application of the HGCD is in recovering rational numbers from p-adic approximations. This is explained in some detail in chapter 5 of [10]. Given a prime power $p^k$ and a smaller nonnegative integer $x$ not divisible by $p$, we can obtain a rational $a/b$ equivalent to $x$ modulo $p^k$ with both numerator and denominator smaller than the square root of the prime power. It is obtained directly from the HGCD matrix and middle pair given by $\mathrm{HGCD}\binom{p^k}{x}$. In brief, we have a matrix $R_j = \begin{pmatrix} s_j & t_j \\ s_{j+1} & t_{j+1} \end{pmatrix}$ with $R_j \binom{p^k}{x} = \binom{u}{v}$. Moreover $\{|v|, |t_{j+1}|\} < \sqrt{p^k}$ and $\frac{v}{t_{j+1}} \equiv x \pmod{p^k}$, because $s_{j+1} p^k + t_{j+1} x = v$. Thus we have our desired rational. The code below will do this recovery given the input pair $\{x, p^k\}$.

rationalRecover[x_, pk_] :=
  ((#[[2, 2]] / #[[1, 2, 2]]) &)[Internal`HGCD[pk, x]]
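In Python the same recovery can be sketched with a plain (quadratic) middle-pair computation standing in for Internal`HGCD; only the remainder and the cofactor of $x$ need to be tracked:

```python
from fractions import Fraction

def rational_recover(x, pk):
    """Recover a/b with a/b == x (mod pk) and |a|, |b| < sqrt(pk), via the
    middle pair of the Euclidean sequence for (pk, x).  Quadratic-time
    stand-in for an HGCD-based recovery."""
    a, b = pk, x
    t0, t1 = 0, 1                  # cofactors of x: s_j*pk + t_j*x = m_j
    while a * a >= pk:
        q = a // b
        a, b = b, a - q * b
        t0, t1 = t1, t0 - q * t1
    return Fraction(a, t0)

# Round trip: encode 13/47 modulo the prime 10^9 + 7, then recover it.
p = 10**9 + 7
x = (13 * pow(47, -1, p)) % p      # modular inverse (Python 3.8+)
assert rational_recover(x, p) == Fraction(13, 47)
```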
For contrast we also give the standard Euclidean sequence method, as well as a simple method based on lattice reduction.

rationalRecover2[a_, b_] := Module[
  {mat, aa = a, bb = b, cc = 1, dd = 0, quo},
  mat = {{aa, cc}, {bb, dd}};
  While[Abs[aa] >= Sqrt[b],
    quo = Quotient[bb, aa];
    {{aa, cc}, {bb, dd}} =
      {{bb, dd} - quo*{aa, cc}, {aa, cc}};];
  aa/cc]

rationalRecover3[n_, pq_] :=
  (#[[1]] / #[[2]] &)[First[LatticeReduce[{{n, 1}, {pq, 0}}]]]
We illustrate this application by solving linear systems over the rationals, using a simple p-adic linear solver based on the method presented in [6] (the code for pAdicSolve is in the appendix). To get some idea of speed we will compare to the built-in LinearSolve function. The latter at this time uses Gaussian elimination via one-step row reduction [2]. The tests we use will involve creating random linear systems of a given dimension and coefficient size in decimal digits. In the results we will show timings, a check that the results agree, and the size in decimal digits of the largest denominator in the result.
testPAdicSolver[dim_Integer, csize_Integer, recoveryfunc_] := Module[
  {ls1, ls2, mat, b},
  mat = Table[Random[Integer, {-10^csize, 10^csize}], {dim}, {dim}];
  b = Table[Random[Integer, {-10^csize, 10^csize}], {dim}];
  {First[Timing[ls1 = pAdicSolve[mat, b, recoveryfunc]]],
   First[Timing[ls2 = LinearSolve[mat, b]]],
   ls1 === ls2, Max[Log[10., Denominator[ls1]]]}]
In the set of tests below, input data will consist of 10-digit integers. First we try a $50\times 50$ system.

testPAdicSolver[50, 10, rationalRecover]
{1.35 Second, 0.62 Second, True, 517.912}
The built-in method was faster by a factor between 2 and 3. The standard Euclidean algorithm of rationalRecover2 makes it about three times slower still, indicating that even at this low dimension most of the time might be spent in rational recovery if we use a pedestrian approach. This example takes about 17 seconds using rationalRecover3. We now double the dimension.

testPAdicSolver[100, 10, rationalRecover]
{11.79 Second, 9.04 Second, True, 1053.32}

This time the speeds are quite close. Doubling again will show the p-adic solver well ahead.

testPAdicSolver[200, 10, rationalRecover]
{93.26 Second, 173.27 Second, True, 2136.99}
We remark that most of the time is spent in finding the p-adic approximate solutions. The utility of fast rational recovery is indirectly witnessed in the above computations; were we to use a less efficient method, it would become more prominent in the timings. As it stands, the overwhelming component is now in the improvement iterations.
We should note that one can use a very different iterative approach when the input matrix is well conditioned. One can solve the system numerically to sufficiently high precision using the iterative method of [11]. Then the exact result may be recovered by rationalizing the high precision approximate result. Interestingly, the technology underlying this type of rational recovery involves continued fractions; efficient computation of these is similar to the divide-and-conquer approach of HGCD. A potential drawback to this method (in addition to the conditioning requirement) is that it requires an a priori precision estimate that might be quite large, or else an expensive check of correctness that could outweigh the cost of the actual construction of a solution.
Quite recently a related method based on iterative refinement of numerical solutions was described in [28]. It uses rescaling of residuals and stepwise rational approximations to construct its result. At this time it appears to be the state of the art in solving linear systems over the rationals. That said, clearly there remains a need for fast rational recovery from a p-adic approximation. For example, other recent methods requiring rational recovery have been discussed in [8] and [5]. Both derive speed by clever use of level 3 BLAS for modulo-prime linear algebra as described in [7]. While [5] describes ways to speed this process considerably even when using the standard Euclidean method, it remains true that an asymptotically fast rational recovery is a desirable further improvement.
Another application of HGCD is in the Smith-Cornacchia algorithm [4] for solving $x^2 + d\,y^2 = n$ with $(d, n)$ relatively prime. This can be used to factor primes of the form $4k+1$ into products of Gaussian primes. Yet another application, which we present later, is to fast planar lattice reduction.
6. RATIONAL RECOVERY VIA LATTICE REDUCTION
We now show that the method based on integer lattice reduction is more than just a heuristic. While not particularly fast, we feel this is of interest in its own right as yet another simple application of lattice methods. We first set up the problem. Given integers $m > n > 0$, here is a simple approach to finding a "small" rational $\frac{r}{s}$ with $s\,n \equiv r \pmod{m}$. We form a $2\times 2$ lattice $\begin{pmatrix} m & 0 \\ n & 1 \end{pmatrix}$ and reduce it via LLL [15], say to $\begin{pmatrix} r & s \\ t & u \end{pmatrix}$, where the top row is smaller in Euclidean norm than the bottom. We then take $\frac{r}{s}$ as our reconstructed rational. Heuristically we expect this to work frequently because typically we will have $\{|r|, |s|\} < \sqrt{m}$, and this is roughly what we require for the rational recovery procedure. It turns out that under mild hypotheses (that in essence amount to lifting half a bit more than we otherwise might), we can guarantee that we obtain the correct value.
Remark: Lattice reduction via LLL assumes a value for a certain parameter, often called $\alpha$ in the literature and taken to be $\frac{3}{4}$ as in the original paper. But it can be any value in the open interval $\left(\frac{1}{4}, 1\right)$. If it is not the standard $\frac{3}{4}$ then one must modify accordingly the lifting bounds and the proof of the theorem below.
THEOREM. Suppose we have a bound $k$ on the numerator and denominator of a rational, and moreover we have a power of a prime $p^q$ and a $p$-adic image $n$ of the value (obtained, say, as in the linear algebra examples in the previous section). Suppose moreover that $p^q > 2\sqrt{2}\, k^2$ and that $\frac{r}{s}$ is a rational equal to $n$ modulo $p^q$ with $\{|r|, s\} < \sqrt{p^q}$ and $r$ relatively prime to $s$ (that is, it is the value we seek). Form the lattice $L$ with row vectors given (in matrix form) as $\begin{pmatrix} p^q & 0 \\ n & 1 \end{pmatrix}$. Reduce it to $\begin{pmatrix} t & u \\ v & w \end{pmatrix}$ with $t^2 + u^2 \le v^2 + w^2$ (that is, rows ordered by norm). Then $(t, u) = \pm(r, s)$, and hence we recover our rational from the reduced lattice.
PROOF.
(i) First we show that $(r, s) \in L$. By assumption there is an integer $j$ with $s n + j p^q = r$. Thus $j (p^q, 0) + s (n, 1) = (r, s)$.
(ii) Next we claim that $(r, s)$ is a minimal vector in $L$. Its squared Euclidean norm is $r^2 + s^2 < 2 k^2 < \frac{p^q}{\sqrt{2}}$. If $(x, y) \in L$ is any vector independent of $(r, s)$ then we must have $x^2 + y^2 > \sqrt{2}\, p^q$, because the product of the norms of any independent pair must be at least as large as the lattice determinant, $p^q$. If instead $(x, y)$ is a scalar multiple of $(r, s)$ then the scalar must be at least 1 in absolute value, by the assumption that $r$ and $s$ are relatively prime.
(iii) Finally we show that $\pm(r, s)$ is in the LLL-reduced basis. This follows from the fact that the smallest vector $(t, u)$ in the reduced basis has $t^2 + u^2 \le 2 (r^2 + s^2)$ by 1.11(iii) of [15], and we know this is smaller than $\sqrt{2}\, p^q$ by (ii) above. Again from (ii) we know that any vector in $L$ independent of $(r, s)$ cannot satisfy this inequality. If instead $(t, u)$ were a nontrivial multiple of $(r, s)$, we would not have a correct basis because $L$ contains $(r, s)$ by (i) above. □
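The theorem is easy to exercise. The Python standard library has no LLL, so the sketch below (ours) substitutes Lagrange-Gauss reduction, which in the plane returns an actual shortest vector and hence satisfies the LLL inequality used in part (iii) a fortiori:

```python
from fractions import Fraction

def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a planar lattice basis (u, v): returns a
    basis whose first vector is a shortest nonzero lattice vector."""
    def norm2(w):
        return w[0] * w[0] + w[1] * w[1]
    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        # subtract the nearest-integer multiple of u from v
        mu = round(Fraction(u[0] * v[0] + u[1] * v[1], norm2(u)))
        v = (v[0] - mu * u[0], v[1] - mu * u[1])
        if norm2(v) >= norm2(u):
            return u, v
        u, v = v, u

p = 10**9 + 7                        # prime power p^q with q = 1
x = (13 * pow(47, -1, p)) % p        # p-adic image of 13/47 (Python 3.8+)
(a, b), _ = gauss_reduce((p, 0), (x, 1))
assert (abs(a), abs(b)) == (13, 47)  # the short vector encodes the rational
```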
We remark that this is a sort of opposite extreme to the method for finding extended greatest common divisors presented in [13]. They use a form of column-weighted LLL to obtain extended gcds of more than two integers, such that the multipliers are small (moreover, by considering multiple columns and geometrically scaled weights, they derive an Hermite normal form algorithm based on LLL). For the sort of lattice we construct above, such weighting would counter the tendency of LLL to take one toward the middle pair in the remainder sequence.
We give a quick demonstration of this method using one of our earlier tests.
testPAdicSolver[50, 10, rationalRecover3]

{17.5 Second, 0.62 Second, True, 517.912}
Not surprisingly, this is not competitive in speed with the asymptotically fast HGCD method.
7. PLANAR LATTICE REDUCTION VIA HGCD
Having seen that rational recovery can be effected by planar lattice reduction, it should be no surprise that such reduction can be handled by means of HGCD computations. We demonstrate how this might be done; relevant theory may be found in [9]. Earlier asymptotically fast methods were presented in [] and [26].
Suppose we have an integral matrix

  M = ( a  b )
      ( c  d )

where we regard the rows as generating a lattice in Z^2. The goal is to find a reduced form, that is, a unimodular multiplier matrix A such that A M = L, where L is the lattice reduced form of M. We compute the reduction as follows.
Step 1. Put M into Hermite normal form. As is well known, this uses the extended gcd algorithm, hence (for large inputs) amounts to a few HGCD invocations. We obtain

  M1 = ( g  j )
       ( 0  k )

where g = gcd(a, c), and a unimodular transformation matrix A1 with A1 M = M1.
Step 2. See if this is lattice reduced. If so, we are finished. If not, we now have a "small" element in the upper left and a
zero beneath it. We now work on the second column.
Step 3. Find HGCD(j, k).
Step 4. Use the multiplier matrix A2 to form

  A2 A1 M = A2 M1 = ( s  m )
                    ( t  n )
Step 5. We now have a short vector. If the second vector is not short we can reduce it using Euclidean steps (this
method of reducing planar vectors is due to Gauss). Since we divide by an element in the short vector, the number of
such steps is bounded.
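The steps above can be sketched in Python, with a plain extended gcd and Gauss reduction standing in for the asymptotically fast HGCD (this gives the same reduced basis but not the subquadratic running time); all helper names here are ours.

```python
def ext_gcd(a, b):
    # returns (g, x, y) with x*a + y*b == g == gcd(a, b), g >= 0
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, x, y = ext_gcd(b, a % b)
    return (g, y, x - (a // b) * y)

def norm2(w):
    return w[0] * w[0] + w[1] * w[1]

def gauss_reduce(u, v):
    # Step 5: shorten the longer vector by nearest-integer multiples
    # of the shorter one (Gauss's planar reduction).
    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        m = (2 * (v[0] * u[0] + v[1] * u[1]) + norm2(u)) // (2 * norm2(u))
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if norm2(v) >= norm2(u):
            return [list(u), list(v)]
        u, v = v, u

def planar_reduce(M):
    # M = [[a, b], [c, d]] with rows generating the lattice.
    (a, b), (c, d) = M
    # Step 1: Hermite form; the rows become (g, j) and (0, k).
    g, x, y = ext_gcd(a, c)
    r1 = (g, x * b + y * d)
    r2 = (0, (a * d - b * c) // g)  # second diagonal entry, up to sign
    # Steps 2-5 collapse to Gauss reduction when HGCD is replaced
    # by ordinary Euclidean steps.
    return gauss_reduce(r1, r2)
```

On the small input [[4, 7], [2, 5]] (determinant 6) this returns a basis whose first vector, (2, -1), is a shortest vector of the lattice.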
Here is a simple example. We work with a lattice of two row vectors. We construct it in such a way as to be quite far
from reduced. Specifically, the second row is (with high probability) a small offset of the first. We first reduce using
LLL in order to find the expected lengths of the resulting elements.
SeedRandom[1111];
row = Table[Random[Integer, {-10^100, 10^100}], {2}];
lat = {row, row + {10^10, 10^20}};
redlat = LatticeReduce[lat];
We check the sizes of the initial and reduced vectors.
Log[10., Abs[lat]]
Log[10., Abs[redlat]]

{{99.9937, 99.8255}, {99.9937, 99.8255}}

{{10., 20.}, {99.9937, 89.9937}}
We now do step 1 and again check sizes (the zero corresponds to an entry of unity, and the -∞ corresponds to an entry of zero).

{a0, hnf} = Developer`HermiteNormalForm[lat];
Log[10., Abs[{a0, hnf}]]

{{{98.8304, 98.8304}, {99.9937, 99.9937}}, {{0., 118.83}, {-∞, 119.994}}}
Each row in the Hermite form has a large entry so we deduce it is not reduced. We now do step 3.
{a1, col2} = Internal`HGCD[Apply[Sequence, hnf[[All, 2]]]];
We check that this is correct.
a1.hnf[[All, 2]] == col2

True
We’ll now recover a short vector using step 4.
lattoo = a1.a0.lat;
Log[10., Abs[lattoo]]

{{9.22796, 109.994}, {10., 20.}}
It is clear that we can use the second vector to reduce the magnitude of the first by making the second component much
smaller. We do so as per step 5. Specifically, we can take a quotient, form a multiplier matrix similar to that used in
HGCD, and obtain a reduction of the larger vector.
q = Quotient[lattoo[[1, 2]], lattoo[[2, 2]]];
a2 = {{1, -q}, {0, 1}};
latthree = a2.lattoo

{{-9 856 220 730 047 694 401 632 388 898 359 410 889 450 006 931 893 281 121 833 193 315 680 250 563 903 159 129 824 369 221 690 297 741, 75 868 793 220 380 069 225}, {10 000 000 000, 100 000 000 000 000 000 000}}
We check that the transformations are unimodular and get the sizes of the elements in the reduced lattice.
Det[a2.a1.a0]
Log[10., Abs[latthree]]

-1

{{99.9937, 19.8801}, {10., 20.}}
It is straightforward to verify that these row norms are comparable to those of
redlat
and indeed they have the same
small vector.
Short code that does this, without checking for the special cases, is in the appendix. An expanded version is used in [27] in order to compute Frobenius numbers for sets of three elements. It appears to be, on average, far faster than previously known methods, as discussed in [27] and references therein.
8. SUMMARY
We have demonstrated a correct, asymptotically fast integer gcd algorithm based on the classical Half−GCD method.
The various correction steps needed to address deficiencies caused by integer carries are, we believe, relatively simple
both from the standpoint of theory and practical implementation. We apply this to solving large linear systems over the
rationals, obtaining results that scale well with dimension.
After demonstrating that a similar rational recovery result can be attained via planar lattice reduction, we then proceed
to do such reduction using HGCD.
9. APPENDIX: P-ADIC SOLVER AND PLANAR REDUCTION CODE
Below is code for a simple p-adic solver for linear systems with integer coefficients; the extension to rationals is straightforward. The code computes a solution modulo a power of a particular prime. In production code one would make sure the system was solvable modulo that prime or else resort to another tactic.
vectorNorm[vec_] := Apply[Plus, Map[Abs, vec]]

matrixNorm[mat_] := Apply[Times, Map[Sqrt[N[#].N[#]] &, mat]]

powerUp[vals_, mod_] := Map[FromDigits[#, mod] &, Transpose[Reverse[vals]]]

pAdicSolve[mat_?MatrixQ, rhs_?VectorQ, recoveryfunc_] :=
 Module[{len = Length[mat], b, mod = Prime[2222], mnorm, lud,
   sol = {}, sol2, corr, power, j = 0, logpow = 0, logm},
  logm = Log[N[mod]];
  lud = LUDecomposition[mat, Modulus -> mod];
  b = rhs;
  power = 1;
  mnorm = 2.*Log[2.*matrixNorm[mat]*vectorNorm[rhs]];
  While[logpow < mnorm + .5,
   j++;
   corr = LUBackSubstitution[lud, b, Modulus -> mod];
   b = (b - mat.corr)/mod;
   sol = {sol, corr};
   logpow += logm];
  power = mod^j;
  sol2 = Partition[Flatten[sol], len];
  sol2 = powerUp[sol2, mod];
  Map[recoveryfunc[#, power] &, sol2]]
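For comparison, here is a rough Python transcription of the same Dixon-style lifting loop. It is a naive sketch: we invert the whole matrix mod p with our own `inv_mod` helper rather than reusing an LU decomposition, we take the number of lifting steps as an argument instead of computing it from norm bounds, and we omit the final rational recovery step.

```python
def inv_mod(A, p):
    # Gauss-Jordan inverse of a square integer matrix modulo a prime p
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % p != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [(M[r][j] - f * M[col][j]) % p for j in range(2 * n)]
    return [row[n:] for row in M]

def padic_solve(A, b, p, iters):
    # One base-p digit of the solution per pass: solve A.corr == b (mod p),
    # then b <- (b - A.corr)/p, as in the Mathematica loop above.
    n = len(A)
    Ainv = inv_mod(A, p)
    digits, cur = [], list(b)
    for _ in range(iters):
        corr = [sum(Ainv[i][j] * cur[j] for j in range(n)) % p
                for i in range(n)]
        digits.append(corr)
        cur = [(cur[i] - sum(A[i][j] * corr[j] for j in range(n))) // p
               for i in range(n)]
    # reassemble: x ≡ sum_k digits[k] * p^k (mod p^iters)
    x = [sum(d[i] * p ** k for k, d in enumerate(digits)) for i in range(n)]
    return x, p ** iters
```

With enough lifting steps, the returned vector agrees with the exact rational solution modulo p^iters, at which point a recovery function (HGCD-based or lattice-based) converts each entry to a rational.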
Below is a version of planar reduction that will tend to find a reduced lattice with smallest vector. It is based loosely on
the exposition in [9].
planarReduce[{{a_Integer, b_Integer}, {c_Integer, d_Integer}}] := Module[
  {hgcd, mult, lat, g, u11, u12, col2, c22, r1, r2, k, n = 0},
  {g, {u11, u12}} = ExtendedGCD[a, c];
  col2 = {{u11, u12}, {-c, a}/g}.{b, d};
  c22 = col2[[2]];
  col2[[1]] = Mod[col2[[1]], c22, Ceiling[-c22/2]];
  {mult, hgcd} = Apply[Internal`HGCD, col2];
  lat = Transpose[{mult[[All, 1]]*g, hgcd}];
  {r1, r2} = Sort[lat, Norm[N[#], 2] &];
  While[k =!= 0 && n <= 3, n++;
   k = Round[(r1.r2)/(r1.r1)];
   r2 = r2 - k*r1;
   {r1, r2} = Sort[{r1, r2}, (#1.#1 < #2.#2) &];];
  {r1, r2}]
A more elaborate version is used in [27] to recover several small lattice vectors. This appears to give the fastest currently known method for computing Frobenius numbers of sets of three large integers.
10. REFERENCES
[1]
A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison−Wesley Publishing
Company, Reading, Massachusetts, 1974.
[2]
E. H. Bareiss. Sylvester’s identity and multistep integer−preserving Gaussian elimination. Math. Comp. 22(103):565−578.
1968.
[3]
R. P. Brent, F. G. Gustavson, and D. Y. Y. Yun. Fast solution of Toeplitz systems of equations and computation of Padé approximants. Journal of Algorithms 1:259-295. 1980.
[4]
J. Buhler and S. Wagon. Basic number theory algorithms. Surveys in Algorithmic Number Theory, J. P. Buhler and P.
Stevenhagen, eds. Mathematical Sciences Research Institute Publications vol. 44. Cambridge University Press. To appear.
[5]
Z. Chen and A. Storjohann. A BLAS based C library for exact linear algebra on integer matrices. Proceedings of the 2005
International Symposium on Symbolic and Algebraic Computation (ISSAC 2005), M. Kauers, ed. 92−99. ACM Press, New
York City, 2005.
[6]
J. D. Dixon. Exact solutions of linear equations using p−adic expansions. Numerische Math. 40:137−141. 1982.
[7]
J. G. Dumas, T. Gautier, and C. Pernet. Finite field linear algebra subroutines. Proceedings of the 2002 International
Symposium on Symbolic and Algebraic Computation (ISSAC 2002), T. Mora, ed.. 63−74. ACM Press, New York City,
2002.
[8]
J. G. Dumas, P. Giorgi, and C. Pernet. Finite field linear algebra package. Proceedings of the 2004 International Symposium
on Symbolic and Algebraic Computation (ISSAC 2004), J. Gutierrez, ed. 119−126. ACM Press, New York City, 2004.
[9]
F. Eisenbrand. Short vectors of planar lattices via continued fractions. Information Processing Letters 79:121−126, 2001
[10]
J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 1999.
[11]
K. O. Geddes and W. W. Zheng. Exploiting fast hardware floating point in high precision computation. Proceedings of the
2003 International Symposium on Symbolic and Algebraic Computation (ISSAC 2003), J. R. Sendra, ed. 111−118. ACM
Press, New York City, 2003.
[12]
GMP: The Gnu Multiprecision Bignum Library. Web site: http://www.swox.com/gmp/
[13]
G. Havas, B. S. Majewski, K. R. Matthews. Extended GCD and Hermite normal form algorithms via lattice basis reduction.
Experimental Mathematics 7(2): 125−126. A. K. Peters, Ltd., 1998.
[14]
D. Knuth. The analysis of algorithms. Proceedings of the 1970 International Congress of Mathematicians (Nice, France),
3:269−274, 1970.
[15]
A. K. Lenstra, H. W. Lenstra, Jr., L. Lovász. Factoring polynomials with rational coefficients. Mathematische Annalen
261:515−534. 1982.
[16]
D. Lichtblau. Revisiting strong Gröbner bases over Euclidean domains. Manuscript, 2003.
[17]
D. Lichtblau. Half−GCD and Fast Rational Recovery. Proceedings of the 2005 International Symposium on Symbolic and
Algebraic Computation (ISSAC 2005), M. Kauers, ed. 231−236. ACM Press, New York City, 2005.
[18]
N. Möller. On Schönhage’s algorithm and subquadratic integer gcd computation. Manuscript, 2005.
[19]
R. T. Moenck. Fast computation of GCDs. Proceedings of the 5th ACM Annual Symposium on Theory of Computing.
142−151. ACM Press, New York City, 1973.
[20]
V. Y. Pan and X. Wang. Acceleration of Euclidean algorithm and extensions. Proceedings of the 2002 International
Symposium on Symbolic and Algebraic Computation (ISSAC 2002), T. Mora, ed. 207−213. ACM Press, New York City,
2002.
[21]
A divide−and−conquer method for integer−to−rational conversion. Proceedings of the Symposium in Honor of Bruno
Buchberger’s 60th Birthday (Logic, Mathematics and Computer Science: Interactions). October 20−22, 2002, RISC−Linz,
Castle of Hagenberg, Austria. K. Nakagawa, ed. 231−243. 2002.
[22]
A. Schönhage. Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informatica 1:139−144, 1971.
[23]
A. Schönhage. Fast reduction and composition of binary quadratic forms. Proceedings of the 1991 International Symposium
on Symbolic and Algebraic Computation (ISSAC 1991), S. Watt, ed. 128-133. ACM Press, New York City, 1991.
[24]
D. Stehlé and P. Zimmermann. A binary recursive GCD algorithm. Proceedings of the Algorithmic Number Theory 6th
International Symposium (ANTS−VI), Lecture Notes in Computer Science 3076, D. Buell, ed. 411−425. Springer, Berlin,
2004.
Draft appearing as: Rapport de recherche INRIA 5050. 2003.
[25]
K. Thull and C. K. Yap. A unified approach to HGCD algorithms for polynomials and integers. Manuscript, 1990. Avail−
able at: http://cs.nyu.edu/cs/faculty/yap/allpapers.html
[26]
C. K. Yap. Fast unimodular reduction: planar lattices. Proceedings of the 33rd Annual Symposium on Foundations of
Computer Science (Pittsburgh USA), 437−446. IEEE Computer Society Press, 1992.
[27]
S. Wagon, D. Einstein, D. Lichtblau, A. Strzebonski. Frobenius numbers by lattice point enumeration. Submitted, 2006.
[28]
Z. Wan. An algorithm to solve integer linear systems exactly using numerical methods. To appear, Journal of Symbolic
Computation.
Earlier draft: Exactly solve integer linear systems using numerical methods (2004), available at: http://www.eecis.udel.edu/~wan/
[29]
S. Wolfram. The Mathematica Book. Fifth edition. Wolfram Media, Cambridge, 2003.