We accelerate the extended Euclidean algorithm for integers, rational number reconstruction, and, consequently, the stage of recovering the solution of a nonsingular integer system of linear equations via Hensel's lifting. The acceleration is by an order of magnitude and yields nearly optimal randomized algorithms. In the highly important case of Toeplitz, Hankel, and Toeplitz/Hankel-like linear systems, the acceleration is potentially practical.

... Such a denominator bound may be much too large and may be expensive to compute. Some other applications where rational reconstruction is used, central to Computer Algebra, include solving linear systems over Q (see [13]), early detection of factors in the Berlekamp-Hensel procedure (see [18]), and Gröbner basis computation over Q (see [1] and [8]). Let n, d ∈ Z with d > 0 and gcd(n, d) = 1. ...

... with the same complexity as the asymptotically fast integer GCD algorithm of Schönhage ([14]), is described by Pan and Wang in [13]. Thus the theoretical bit complexity of rational reconstruction is $O(n \log^2 n \log\log n)$, where $n = \log m$ here. ...

... For modular algorithms where $m = p_1 \times p_2 \times \cdots \times p_k$ the improvement is a reduction of the number of primes needed by up to a factor of 2. Algorithm MQRR is based on the classical extended Euclidean algorithm and thus has time complexity $O(\log^2 m)$. We expect that the efficiency improvements made to Wang's algorithm by the authors in [2] and [13] should be applicable to our algorithm because both algorithms are based on the Euclidean algorithm. The new algorithm can be applied to the problem of rational function reconstruction in one parameter t over a finite field GF(q) with q elements. ...

Let n/d ∈ Q, let m be a positive integer, and let u = n/d mod m. Thus $u$ is the image of a rational number modulo m. The rational reconstruction problem is: given u and m, find n/d. A solution was first given by Wang in 1981. Wang's algorithm outputs n/d when $m > 2M^2$, where M = max(|n|, d). Because of the wide application of this algorithm in computer algebra, several authors have investigated its practical efficiency and asymptotic time complexity. In this paper we present a new solution which is almost optimal in the following sense: with controllable high probability, our algorithm will output n/d when m is a modest number of bits longer than 2|n|d. This means that in a modular algorithm where m is a product of primes, the modular algorithm will need one or two primes more than the minimum necessary to reconstruct n/d; thus if |n| ≪ d or d ≪ |n| the new algorithm saves up to half the number of primes. Further, our algorithm will fail with high probability when m ...
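
As an illustration of the problem statement above, here is a minimal Python sketch of rational reconstruction with the classical extended Euclidean algorithm under Wang's bound $m > 2M^2$. It is not the almost optimal algorithm of the abstract; the function name `rational_reconstruct` and the choice to return `None` on failure are assumptions made for this example.

```python
from math import gcd, isqrt

def rational_reconstruct(u, m):
    """Return (n, d) with n ≡ u*d (mod m), d > 0 and 2*max(|n|, d)**2 < m,
    or None if no such coprime pair is found (classical quadratic-time method)."""
    bound = isqrt((m - 1) // 2)      # largest M with 2*M*M < m
    r0, r1 = m, u % m                # remainder sequence
    t0, t1 = 0, 1                    # invariant: t_i * u ≡ r_i (mod m)
    while r1 > bound:                # stop at the first remainder <= bound
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    n, d = r1, t1
    if d < 0:                        # normalise the sign of the denominator
        n, d = -n, -d
    if d == 0 or d > bound or gcd(n, d) != 1:
        return None
    return n, d
```

For example, with m = 101 the image of -2/3 is u = 33 (since 3 · 33 = 99 ≡ -2), and `rational_reconstruct(33, 101)` returns `(-2, 3)`. The image u = 10 of -1/10 is rejected because the denominator exceeds the bound.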

... Use rational reconstruction -see [5,18,22,27]. Otherwise a denominator bound would be necessary, but such bounds are generally too large. The defect bound, usually the (reduced [2]) discriminant, which is part of the denominator bound, is usually also too large. ...

... Although our goal was to construct an algorithm which is practical in the sense that the complexity is comparable to polynomial multiplication and exact division, there are other possible designs for a modular GCD algorithm for L which may be asymptotically fast. If we apply reconstruction after 1, 2, 4, 8, 16, ..., $2^n$ primes, we can reduce the reconstruction cost to $m \log^2 m \log\log m$ per integer coefficient by using the fast Chinese remaindering algorithm (see 10.3 in [7]) and fast rational reconstruction algorithm (see (1.9) in [22]). This, however, has no asymptotic benefit unless we can also reduce the coefficients in the inputs modulo the primes more rapidly than in $O(m^2)$ operations in $\mathbb{Z}_p$ per coefficient. ...

We consider the problem of computing the monic gcd of two polynomials over a
number field L = Q(alpha_1,...,alpha_n). Langemyr and McCallum have already
shown how Brown's modular GCD algorithm for polynomials over Q can be modified
to work for Q(alpha) and subsequently, Langemyr extended the algorithm to L[x].
Encarnacion also showed how to use rational number reconstruction to make the algorithm for
Q(alpha) output sensitive, that is, the number of primes used depends on the
size of the integers in the gcd and not on bounds based on the input
polynomials.
Our first contribution is an extension of Encarnacion's modular GCD algorithm
to the case n>1, which, like Encarnacion's algorithm, is output sensitive.
Our second contribution is a proof that it is not necessary to test if p
divides the discriminant. This simplifies the algorithm; it is correct without
this test.
Our third contribution is a modification to the algorithm to treat the case
of reducible extensions. Such cases arise when solving systems of polynomial
equations.
Our fourth contribution is an implementation of the modular GCD algorithm in
Maple and in Magma. Both implementations use a recursive dense polynomial data
structure for representing polynomials over number fields with multiple field
extensions.
Our fifth contribution is a primitive fraction-free algorithm. This is the
best non-modular approach. We present timing comparisons of the Maple and Magma
implementations demonstrating various optimizations and comparing them with the
monic Euclidean algorithm and our primitive fraction-free algorithm.

... Proof. To support the theorem, it is sufficient to apply the algorithms in any of the papers [PW02], [WP03], [PW04], or [M04]. ...

... [MC79], [D82] were the first papers on Hensel's and Newton's lifting for solving integer linear systems of equations and integer matrix inversion. Nearly optimal exact solution of Toeplitz and Toeplitz-like linear systems of equations based on the lifting and MBA algorithms was first sketched in the proceedings paper [PW02] with the focus on fast reconstruction of a rational number from its modular and numerical approximations. The papers [WP03] and [M04] cover the latter topic in some detail and include earlier bibliography. ...

Our subject is the solution of a structured linear system of equations, which is closely linked to computing a shortest displacement
generator for the inverse of its structured coefficient matrix. We consider integer matrices with the displacement structure
of Toeplitz, Hankel, Vandermonde, and Cauchy types and combine the unified divide-and-conquer MBA algorithm (due to Morf 1974,
1980 and Bitmead and Anderson 1980) with the Chinese remainder algorithm to solve both computational problems within nearly
optimal randomized Boolean and word time bounds. The bounds cover the cost of both solution and its correctness verification.
The algorithms and nearly optimal time bounds are extended to the computation of the determinant of a structured integer matrix,
its rank and a basis for its null space and further to some fundamental computations with univariate polynomials that have
integer coefficients.
Mathematics Subject Classification (2000): 68W30, 68W20, 68Q25, 68W40

... If one uses the ordinary Euclidean algorithm, the complexity of Wang's algorithm is $O(\log^2 m)$. In 2002 Pan and Wang in [13] modified the fast Euclidean algorithm of Schönhage [14] to solve the rational number reconstruction problem in time $O(M(k) \log k)$, where $k = \log m$ is the length of the modulus m and M(k) is the cost of multiplying integers of length k. The authors did not implement their algorithm and remarked during their presentation at ISSAC 2002 that the algorithm might not be practical. ...

... In 1971 Schönhage in [14] presented a fast integer GCD algorithm with time complexity $O(n \log^2 n \log\log n)$. An asymptotically fast rational number reconstruction algorithm based on Schönhage's algorithm was presented by Pan and Wang in [13]. Before that Allan Steel had implemented in Magma a fast rational number reconstruction algorithm based on the half-gcd algorithm presented in Montgomery's PhD thesis [12] for polynomials in F[x]. Currently, Mathematica v. 5.0 and Magma v. 2.10 both have a fast GCD and fast rational number reconstruction. ...

Let F be a field and let f and g be polynomials in F[t] satisfying deg f > deg g. Recall that on input of f and g the extended Euclidean algorithm computes a sequence of polynomials $(s_i, t_i, r_i)$ satisfying $s_i f + t_i g = r_i$. Thus for i with $\gcd(t_i, f) = 1$, we obtain rational functions $r_i/t_i \in F(t)$ satisfying $r_i/t_i \equiv g \pmod{f}$. In this paper we modify the fast extended Euclidean algorithm to compute the smallest $r_i/t_i$, that is, an $r_i/t_i$ minimizing $\deg r_i + \deg t_i$. This means that in an output-sensitive modular algorithm, when we are recovering rational functions in F(t) from their images modulo f(t) where f(t) is increasing in degree, we can recover them as soon as the degree of f is large enough, and we can do this fast. We have implemented our modified fast Euclidean algorithm for $F = \mathbb{Z}_p$, p a word-sized prime, in Java. Our fast algorithm beats the ordinary Euclidean algorithm around degree 200. This has application to polynomial gcd computation and linear algebra over $\mathbb{Z}_p(t)$.

... The rational number reconstruction of Step 5 is done at the cost of $O(\log^{1+o(1)} N) = O(n^{1+o(1)})$ bit operations [17]. Since $r < 2^{\ell}$, Step 7 is done using $O(\ell n^{1+o(1)} \log^{1+o(1)} q)$ bit operations. ...

We present a randomized quantum algorithm for polynomial factorization over finite fields. For polynomials of degree $n$ over a finite field $\mathbb{F}_q$, the average-case complexity of our algorithm is an expected $O(n^{1 + o(1)} \log^{2 + o(1)}q)$ bit operations. Only for a negligible subset of polynomials of degree $n$ does our algorithm have a higher complexity of $O(n^{4 / 3 + o(1)} \log^{2 + o(1)}q)$ bit operations. This breaks the classical $3/2$-exponent barrier for polynomial factorization over finite fields \cite{guo2016alg}.

... One can further refine this algorithm by incorporating and refining generalized Hensel's lifting (proposed and elaborated upon in [77]-[80]) and by applying the lifting algorithms directly to block Toeplitz/Hankel linear systems, versus their auxiliary representation as Toeplitz/Hankel-like linear systems in [73]. ...

Multiplicative preconditioning is a popular tool for handling linear systems of equations provided the relevant information about the associated singular values is available. We propose using additive preconditioners,
which are readily available for both general and structured ill conditioned input matrices and which preserve matrix structure. We introduce primal and dual additive preconditioning and combine it with two aggregation
techniques. Our extensive analysis and numerical experiments show the efficiency of the resulting numerical algorithms for solving linear systems of equations and some other fundamental matrix computations. Our study provides some new insights into preconditioning, links it to various related subjects of matrix computations, and leads to some results of independent interest.

... Schönhage [22] improved this time complexity in 1971: his GCD algorithm runs in $O(n \log^2 n \log\log n)$ time, which is, until now, the fastest sequential GCD algorithm. Many fast sequential GCD algorithms reach this performance with a similar divide-and-conquer approach [24,18,20,21]. ...

In this paper, we consider a gcd algorithm based on subtraction, defined by the sequence
$u_{n+2}=|u_{n+1}-u_n|$ where $u_0,u_1 \geq 1$ are two integers. The sequence stops at
the first zero; the previous nonzero integer is the gcd of $u_0$ and $u_1$.
This procedure is similar to an early GCD algorithm described by Euclid more than two
thousand years ago. See, for example, the paper "Subtractive Algorithm for GCD" by
D.E. Knuth and A.C. Yao for the average-case analysis of this algorithm. The sequence in
Euclid's algorithm is defined by $(u,v) \, \longrightarrow (|u-v|,\min\,\{u,v\})$,
while our algorithm is based on a different transformation,
$(u,v) \, \longrightarrow (v,|u-v|)$. The aim of this paper is to study
the worst case of this new algorithm, which
we call the Sub-like gcd algorithm.
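
The transformation $(u,v) \longrightarrow (v,|u-v|)$ described above can be sketched in a few lines of Python. This is only an illustration of the recurrence as stated in the abstract; the function name `sub_like_gcd` is an assumption, and the sketch makes no attempt to reproduce the paper's worst-case analysis.

```python
def sub_like_gcd(u0, u1):
    """Iterate (u, v) -> (v, |u - v|) until the first zero appears and
    return the term just before it, which is gcd(u0, u1)."""
    if u0 < 1 or u1 < 1:
        raise ValueError("both inputs must be integers >= 1")
    u, v = u0, u1
    while v != 0:
        # gcd(v, |u - v|) == gcd(u, v), so the gcd is preserved at each step
        u, v = v, abs(u - v)
    return u
```

For inputs 4 and 6 the pairs visited are (4, 6), (6, 2), (2, 4), (4, 2), (2, 2), (2, 0), so the result is 2. The number of steps can grow linearly with the size of the inputs, so the sketch is only suitable for small values.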

... Schönhage's algorithm uses a divide-and-conquer approach to recursively compute the quotient sequence (Thull and Yap, 1990). Another similar algorithm is described tersely in Pan and Wang (2002). The algorithm is straightforward to apply to polynomial GCD, but both analysis and actual implementation are quite difficult and error prone. ...

We propose a novel algorithm for integer greatest common
divisor (GCD) computation that hybridises the Euclidean and binary
algorithms according to a new schema, in order to accelerate the GCD
computation, especially when there is a large bit difference between the two
inputs. The proposed algorithm runs slightly faster than existing algorithms
and is much easier to implement. We provide a simple proof of
correctness for the algorithm, and we show that the best performance is
achieved for large integers when the cost of subtraction is relatively
low compared with that of division.

... It turns out that the integer case is particularly troublesome due to the possibility of carries that may cause intermediate values to be too large or too small relative to what the algorithm requires. Several papers ([3], [25], and [20]) redress this with fix-up steps that involve a limited number of Euclidean steps or reversals thereof. These papers tend to have proofs that involve analysis of many detailed cases, thus making them difficult to follow, let alone implement. ...

Over the past few decades several variations on a "half GCD" algorithm for obtaining the pair of terms in the middle of a Euclidean sequence have been proposed. In the integer case algorithm design and proof of correctness are complicated by the effect of carries. This paper will demonstrate a variant with a relatively simple proof of correctness. We then apply this to rational recovery for a linear algebra solver. After showing how this same task might be accomplished by lattice reduction, albeit more slowly, we proceed to use the half GCD to obtain asymptotically fast planar lattice reduction. This is an extended version of a paper presented at ISSAC 2005 [17]. It also contains minor changes.

... For high precision computations, it is also recommended to speed the LLL-algorithm up using a similar dichotomic algorithm as for fast g.c.d. computations [Moe73,PW02]. ...

Let L∈K(z)[∂] be a linear differential operator, where K is an effective algebraically closed subfield of C. It can be shown that the differential Galois group of L is generated (as a closed algebraic group) by a finite number of monodromy matrices, Stokes matrices and matrices in local exponential groups. Moreover, there exist fast algorithms for the approximation of the entries of these matrices. In this paper, we present a numeric–symbolic algorithm for the computation of the closed algebraic subgroup generated by a finite number of invertible matrices. Using the above results, this yields an algorithm for the computation of differential Galois groups, when computing with a sufficient precision. Even though there is no straightforward way to find a “sufficient precision” for guaranteeing the correctness of the end result, it is often possible to check a posteriori whether the end result is correct. In particular, we present a non-heuristic algorithm for the factorization of linear differential operators.

... It turns out that the integer case is particularly troublesome due to the possibility of carries that may cause intermediate values to be too large or too small relative to what the algorithm requires. Several papers [3, 15, 13] redress this with fix-up steps that involve a limited number of Euclidean steps or reversals thereof. These papers tend to have proofs that involve analysis of many detailed cases, thus making them difficult to follow, let alone implement. ...

Over the past few decades several variations on a "half GCD" algorithm for obtaining the pair of terms in the middle of a Euclidean sequence have been proposed. In the integer case algorithm design and proof of correctness are complicated by the effect of carries. This paper will demonstrate a variant with a relatively simple proof of correctness. We then apply this to the task of rational recovery for a linear algebra solver.

... 3. Compute result with error probability at most ε/2, by [11, Algorithm: LargestInvariantFactor] or [16, Algorithm Leading Smith Factor]. In EGV the error probability is 2/3, but we can achieve any error probability by repeating the loop [11, steps 3-6 in Algorithm: LargestInvariantFactor] enough times. ...

We present a variation of the fast Monte Carlo algorithm of Eberly, Giesbrecht and Villard for computing the Smith form of an integer matrix. It is faster in practice, but with the same asymptotic complexity, and it also handles the singular case. Then we will apply the key principle to improve Storjohann's algorithm and Iliopoulos' algorithm. We have a soft linear time algorithm for the special case of a diagonal matrix. A local Smith form algorithm is also considered. We offer analysis and experimental results regarding these algorithms, with a view to the construction of an adaptive algorithm exploiting each algorithm in its best range of performance. Finally, based on this information, we sketch the proposed structure of an adaptive Smith form algorithm for matrices over the integers. Our experiments use implementations in LinBox, a library for exact computational linear algebra available at linalg.org.

... A clear description of the algorithm details, for both the integer and polynomial case, can be found in a paper by Klaus Thull and Chee K. Yap [12]. A similar algorithm is described tersely in [6]. Schönhage's 1971 algorithm is straightforward to apply to polynomial gcd, but, to quote [12]: "The integer hgcd algorithm turns out to be rather intricate". ...

We describe a new subquadratic left-to-right GCD algorithm, inspired by Schoenhage's algorithm for reduction of binary quadratic forms, and compare it to the first subquadratic GCD algorithm discovered by Knuth and Schoenhage, and to the binary recursive GCD algorithm of Stehle and Zimmermann. The new GCD algorithm runs slightly faster than earlier algorithms, and it is much simpler to implement. The key idea is to use a stop condition for HGCD that is based not on the size of the remainders, but on the size of the next difference. This subtle change is sufficient to eliminate the back-up steps that are necessary in all previous subquadratic left-to-right GCD algorithms. The subquadratic GCD algorithms all have the same asymptotic running time, $O(n (\log n)^2 \log\log n)$.

The extended GCD (XGCD) calculation, which computes Bézout coefficients ba, bb such that ba ∗ a0 + bb ∗ b0 = GCD(a0, b0), is a critical operation in many cryptographic applications. In particular, large-integer XGCD is computationally dominant for two applications of increasing interest: verifiable delay functions that square binary quadratic forms within a class group and constant-time modular inversion for elliptic curve cryptography. Most prior work has focused on fast software implementations. The few works investigating hardware acceleration build on variants of Euclid’s division-based algorithm, following the approach used in optimized software. We show that adopting variants of Stein’s subtraction-based algorithm instead leads to significantly faster hardware. We quantify this advantage by performing a large-integer XGCD accelerator design space exploration comparing Euclid- and Stein-based algorithms for various application requirements. This exploration leads us to an XGCD hardware accelerator that is flexible and efficient, supports fast average and constant-time evaluation, and is easily extensible for polynomial GCD. Our 16nm ASIC design calculates 1024-bit XGCD in 294ns (8× faster than the state-of-the-art ASIC) and constant-time 255-bit XGCD for inverses in the field of integers modulo the prime $2^{255}-19$ in 85ns (31× faster than state-of-the-art software). We believe our design is the first high-performance ASIC for the XGCD computation that is also capable of constant-time evaluation. Our work is publicly available at https://github.com/kavyasreedhar/sreedhar-xgcd-hardware-ches2022.
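
To make the Bézout identity ba ∗ a0 + bb ∗ b0 = GCD(a0, b0) concrete, the following Python sketch uses the textbook division-based (Euclid) recurrence. It is a software illustration only: it is neither the Stein-based design nor constant-time, and the function name `xgcd` is an assumption for this example. Inputs are assumed to be non-negative integers.

```python
def xgcd(a0, b0):
    """Return (g, ba, bb) such that ba*a0 + bb*b0 == g == gcd(a0, b0)."""
    r0, r1 = a0, b0
    s0, s1 = 1, 0          # coefficients of a0
    t0, t1 = 0, 1          # coefficients of b0
    while r1 != 0:
        q = r0 // r1
        # invariant: s_i*a0 + t_i*b0 == r_i for both tracked rows
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return r0, s0, t0
```

For example, `xgcd(240, 46)` returns `(2, -9, 47)`, and indeed -9 · 240 + 47 · 46 = 2. A modular inverse follows directly: with `p = 2**255 - 19` and `g, ba, bb = xgcd(3, p)`, the value `ba % p` is the inverse of 3 modulo p.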

Let n/d ∈ Q, let m be a positive integer, and let u = n/d mod m. Thus u is the image of a rational number modulo m. The rational reconstruction problem is: given u and m, find n/d. The classical Euclidean algorithm outputs n/d when $m > 2M^2$, where M = max(|n|, d). The rational reconstruction problem has generally been solved by the classical Euclidean algorithm. In this paper, we present an extended Euclidean algorithm for rational reconstruction and obtain a solvability criterion and, hence, the number of solutions. Its computational complexity is $O(m \log^2 m \log\log m)$.

Given polynomials with floating-point number coefficients, one can now compute the approximate GCD stably, except in ill-conditioned cases where the GCD has small or large leading coefficient/constant term. The cost is $O(m^2)$, where m is the maximum of the degrees of the given polynomials. On the other hand, for polynomials with integer coefficients, one can compute the polynomial GCD faster by using the half-GCD method, with cost less than $O(m^2)$. In this paper, we aim to compute the approximate GCD faster, with cost less than $O(m^2)$. Our idea is to use the displacement technique and the half-GCD method.

We extend Hensel lifting for solving general and structured linear systems of equations to the rings of integers modulo nonprimes, e.g. modulo a power of two. This enables significant saving of word operations. We elaborate upon this approach in the case of Toeplitz linear systems. In this case, we initialize lifting with the MBA superfast algorithm, estimate that the overall bit operation (Boolean) cost of the solution is optimal up to roughly a logarithmic factor, and prove that the degeneration is unlikely even where the basic prime is fixed but the input matrix is random. We also comment on the extension of our algorithm to some other fundamental computations with (possibly singular) general and structured matrices and univariate polynomials as well as to the computation of the sign and the value of the determinant of an integer matrix.

The classical and intensively studied problem of solving a Toeplitz/Hankel linear system of equations is omnipresent in computations in sciences, engineering and communication. Its equivalent formulations include computing polynomial gcd and lcm, Padé approximation, and Berlekamp-Massey's problem of recovering the linear recurrence coefficients. To improve the current record asymptotic bit operation cost of the solution, we rely on Hensel's p-adic lifting. We accelerate its recovery stage by exploiting randomization and the correlation between lifting and the computation of Smith's invariant factors of the input matrix. Furthermore, for the average input, the 2-adic version of lifting is sufficient, allowing entire computation in binary form, which promises to be valuable for practical computations. Our resulting algorithms solve a nonsingular Toeplitz/Hankel linear system of n equations by using $O(m(n)\, n\, \mu(\log n))$ bit operations (versus the information lower bound of the order of $n^2 \log n$), where $m(n)$ and $\mu(d)$ bound the arithmetic and Boolean cost of multiplying polynomials of degree n and integers modulo $2^d + 1$, respectively, and where the input coefficients are in $n^{O(1)}$. Our algorithms can be applied to a larger class of Toeplitz/Hankel-like linear systems.

We reexamine the Wiedemann-Coppersmith-Kaltofen-Villard algorithm for randomized computation of the determinant of an integer
matrix and substantially simplify and accelerate its bottleneck stage of computing the minimum generating matrix polynomial,
to make the algorithm practically promising while keeping it asymptotically fast. Bibliography: 58 titles.

In this paper, we present a new algorithm for the exact solutions of linear systems with integer coefficients using numerical methods. It terminates with the correct answer in well-conditioned cases or quickly aborts in ill-conditioned cases. Success of this algorithm on a linear equation requires that the linear system must be sufficiently well-conditioned for the numeric linear algebra method being used to compute a solution with sufficient accuracy. Our method is to find an initial approximate solution by using a numerical method, then amplify the approximate solution by a scalar, and adjust the amplified solution and corresponding residual to integers so that they can be computed without large integer arithmetic involved and can be stored exactly. Then we repeat these steps to refine the solution until sufficient accuracy is achieved, and finally reconstruct the rational solution. Our approximating, amplifying, and adjusting idea enables us to compute the solutions without involving high precision software floating point operations in the whole procedure or large integer arithmetic except at the final rational reconstruction step. We will expose the theoretical cost and show some experimental results.

Let L be an algebraic function field in k ≥ 0 parameters $t_1, \ldots, t_k$. Let $f_1, f_2$ be non-zero polynomials in L[x]. We give two algorithms for computing their gcd. The first, a modular GCD algorithm, is an extension of the modular GCD algorithm of Brown for $\mathbb{Z}[x_1, \ldots, x_n]$ and Encarnacion for Q(α)[x] to function fields. It uses rational number and rational function reconstruction and trial division. The second, a fraction-free algorithm, is a modification of the Moreno Maza and Rioboo algorithm for computing gcds over triangular sets. The modification reduces coefficient growth in L to be linear. We show how to extend the modular GCD algorithm to work when the minimal polynomial for L is not irreducible. We give an empirical comparison of the two algorithms using implementations in Maple.

An iterative refinement approach is taken to rational linear system solving. Such methods produce, for each entry of the solution vector, a rational approximation with denominator a power of 2. From this the correct rational entry can be reconstructed. Our iteration is a numeric-symbolic hybrid in that it uses an approximate numeric solver at each step together with a symbolic (exact arithmetic) residual computation and symbolic rational reconstruction. The rational solution may be checked symbolically (exactly). However, there is some possibility of failure of convergence, usually due to numeric ill-conditioning. Alternatively, the algorithm may be used to obtain an extended precision floating point approximation of any specified precision. In this case we cannot guarantee the result by rational reconstruction and an exact solution check, but the approach gives evidence (not proof) that the probability of error is extremely small. The chief contributions of the method and implementation are (1) confirmed continuation, (2) improved rational reconstruction, and (3) faster and more robust performance.

Subquadratic divide-and-conquer algorithms for computing the greatest common divisor have been studied for a couple of decades.
The integer case has been notoriously difficult, with the need for “backup steps” in various forms. This paper explains why
backup steps are necessary for algorithms based directly on the quotient sequence, and proposes a robustness criterion that
can be used to construct a “half-gcd” algorithm without any backup steps.

We accelerate the known algorithms for computing a selected entry of the extended Euclidean algorithm for integers and, consequently, for the modular and numerical rational number reconstruction problems. The acceleration is from quadratic to nearly linear time, matching the known complexity bound for the integer gcd, which our algorithm computes as a special case.

We combine our novel SVD-free additive preconditioning with aggregation and other relevant techniques to facilitate the solution of a linear system of equations and other fundamental matrix computations. Our analysis and experiments show the power of our algorithms, guide us in selecting most effective policies of preconditioning and aggregation, and provide some new insights into these and related subjects. Compared to the popular SVD-based multiplicative preconditioners, our additive preconditioners are generated more readily and for a much larger class of matrices. Furthermore, they better preserve matrix structure and sparseness and have a wider range of applications (e.g., they facilitate the solution of a consistent singular linear system of equations and of the eigenproblem).

A probabilistic algorithm is exhibited that calculates the gcd of many integers using gcds of pairs of integers; the expected number of pairwise gcds required is less than two.
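
The idea of reducing a many-integer gcd to gcds of pairs can be illustrated with random linear combinations. The sketch below is an assumption-laden illustration, not the algorithm analysed in the abstract: the coefficient range, the divisibility check used as a stopping rule, and the function name `gcd_many` are all choices made for this example, and no claim is made that it achieves the stated expected number of pairwise gcds.

```python
import random
from math import gcd

def gcd_many(values, rng=random):
    """gcd of a list of integers via pairwise gcds of random combinations.

    Each round forms two random positive combinations x and y of the inputs
    and computes g = gcd(x, y). Since g is always a multiple of the true gcd,
    g is the true gcd as soon as it divides every input.
    """
    nonzero = [abs(v) for v in values if v != 0]
    if not nonzero:
        return 0
    bound = max(2, max(nonzero))     # assumed coefficient range [1, bound]
    while True:
        x = sum(rng.randint(1, bound) * a for a in nonzero)
        y = sum(rng.randint(1, bound) * a for a in nonzero)
        g = gcd(x, y)                # one pairwise gcd per round
        if all(a % g == 0 for a in nonzero):
            return g
```

The result is deterministic even though the number of rounds is random: the loop only exits once the candidate divides every input, at which point it must equal the gcd.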

In this paper we consider deterministic computation of the exact determinant of a dense matrix M of integers. We present a new algorithm with worst case complexity $O(n^4(\log n + \log \|M\|) + n^3 \log^2 \|M\|)$, where n is the dimension of the matrix and $\|M\|$ is a bound on the entries in M, but with average expected complexity $O(n^4 + n^3(\log n + \log \|M\|)^2)$, assuming some plausible properties about the distribution of M. We will also describe a practical version of the algorithm and include timing data to compare this algorithm with existing ones. Our result does not depend on "fast" integer or matrix techniques. 1 Introduction. One of the most fundamental characteristics of a square matrix is its determinant. Its being 0 expresses the non-invertibility of the matrix, i.e. the fact that it has a non-trivial kernel. For a real matrix, its absolute value is the volume of the multi-dimensional parallelepiped with generating ed...

Computer algebra systems are now ubiquitous in all areas of science and engineering. This highly successful textbook, widely regarded as the 'bible of computer algebra', gives a thorough introduction to the algorithmic basis of the mathematical engine in computer algebra systems. Designed to accompany one- or two-semester courses for advanced undergraduate or graduate students in computer science or mathematics, its comprehensiveness and reliability have also made it an essential reference for professionals in the area. Special features include: detailed study of algorithms including time analysis; implementation reports on several topics; complete proofs of the mathematical underpinnings; and a wide variety of applications (among others, in chemistry, coding theory, cryptography, computational logic, and the design of calendars and musical scales). A great deal of historical information and illustration enlivens the text. In this third edition, errors have been corrected and much of the Fast Euclidean Algorithm chapter has been renovated.

We estimate parallel complexity of several matrix computations under both Boolean and arithmetic machine models using deterministic and probabilistic approaches. Those computations include the evaluation of the inverse, the determinant, and the characteristic polynomial of a matrix. Recently, processor efficiency of the previous parallel algorithms for numerical matrix inversion has been substantially improved in (Pan and Reif, 1987), reaching optimum estimates up to within a logarithmic factor; that work, however, applies neither to the evaluation of the determinant and the characteristic polynomial nor to exact matrix inversion nor to the numerical inversion of ill-conditioned matrices. We present four new approaches to the solution of those latter problems (having several applications to combinatorial computations) in order to extend the suboptimum time and processor bounds of (Pan and Reif, 1987) to the case of computing the inverse, determinant, and characteristic polynomial of an arbitrary integer input matrix. In addition, processor efficient algorithms using polylogarithmic parallel time are devised for some other matrix computations, such as triangular and QR-factorizations of a matrix and its reduction to Hessenberg form.

An algorithm is given for computing the product of two N-digit binary numbers in O(N lg N lg lg N) steps. Two ways of implementing the algorithm are considered: multitape Turing machines and logical nets (built from two-input logical elements, each counted as one step).

A method is described for computing the exact rational solution to a regular system $Ax = b$ of linear equations with integer coefficients. The method involves: (i) computing the inverse (mod $p$) of $A$ for some prime $p$; (ii) using successive refinements to compute an integer vector $\bar{x}$ such that $A\bar{x} \equiv b \pmod{p^m}$ for a suitably large integer $m$; and (iii) deducing the rational solution $x$ from the $p$-adic approximation $\bar{x}$. For matrices $A$ and $b$ with entries of bounded size and dimensions $n \times n$ and $n \times 1$, this method can be implemented in time $O(n^3 (\log n)^2)$, which is better than methods previously used.
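The three steps (i) to (iii) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mat_inv_mod`, `rational_reconstruct`, and `dixon_solve`, the default prime 10007, and the fixed number of lifting steps are all assumptions made for illustration, and the rational reconstruction uses the classical quadratic-time Euclidean algorithm rather than any accelerated variant.

```python
from fractions import Fraction
from math import gcd, isqrt


def mat_inv_mod(A, p):
    """Step (i): inverse of a square integer matrix modulo a prime p (Gauss-Jordan)."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c]), None)
        if piv is None:
            raise ValueError("A is singular modulo p; choose another prime")
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)          # modular inverse (Python 3.8+)
        M[c] = [(v * inv) % p for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def rational_reconstruct(u, m):
    """Step (iii): find n/d with n = u*d (mod m) and |n|, d <= sqrt(m/2)."""
    bound = isqrt(m // 2)
    r0, r1 = m, u % m
    t0, t1 = 0, 1                          # invariant: r_i = t_i * u (mod m)
    while r1 > bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if t1 == 0 or abs(t1) > bound or gcd(r1, abs(t1)) != 1:
        raise ValueError("no rational number within the bound")
    return Fraction(r1, t1)                # Fraction moves the sign to the numerator


def dixon_solve(A, b, p=10007, steps=10):
    """Solve Ax = b over the rationals; assumes p**steps exceeds 2*|n|*d for every entry n/d."""
    n = len(A)
    C = mat_inv_mod(A, p)
    x_bar, residual, pk = [0] * n, list(b), 1
    for _ in range(steps):                 # step (ii): one p-adic digit per pass
        y = [sum(C[i][j] * residual[j] for j in range(n)) % p for i in range(n)]
        x_bar = [xi + pk * yi for xi, yi in zip(x_bar, y)]
        Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
        residual = [(ri - ai) // p for ri, ai in zip(residual, Ay)]  # exact division
        pk *= p
    return [rational_reconstruct(xi, pk) for xi in x_bar]
```

For instance, `dixon_solve([[2, 1], [1, 3]], [1, 2])` lifts the solution modulo powers of 10007 and reconstructs the rational entries 1/5 and 3/5. In practice the number of lifting steps would be derived from a bound on the numerators and denominators (for example Hadamard's bound) rather than fixed in advance.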

We apply a new parametrized version of Newton's iteration in order to compute (over any field F of constants) the solution, or a least-squares solution, to a linear system Bx = v with an n × n Toeplitz or Toeplitz-like matrix B, as well as the determinant of B and the coefficients of its characteristic polynomial, det(λI − B), dramatically improving the processor efficiency of the known fast parallel algorithms. Our algorithms, together with some previously known and some recent results of [1–5], as well as with our new techniques for computing polynomial gcd's and lcm's, imply respective improvement of the known estimates for parallel arithmetic complexity of several fundamental computations with polynomials, and with both structured and general matrices.

We present two new algorithms, ADT and MDT, for solving order-n Toeplitz systems of linear equations Tz = b in time $O(n \log^2 n)$ and space $O(n)$. The fastest algorithms previously known, such as Trench's algorithm, require time $\Omega(n^2)$ and require that all principal submatrices of T be nonsingular. Our algorithm ADT requires only that T be nonsingular. Both our algorithms for Toeplitz systems are derived from algorithms for computing entries in the Padé table for a given power series. We prove that entries in the Padé table can be computed by the Extended Euclidean Algorithm. We describe an algorithm EMGCD (Extended Middle Greatest Common Divisor) which is faster than the algorithm HGCD of Aho, Hopcroft and Ullman, although both require time $O(n \log^2 n)$, and we generalize EMGCD to produce PRSDC (Polynomial Remainder Sequence Divide and Conquer) which produces any iterate in the PRS, not just the middle term, in time $O(n \log^2 n)$. Applying PRSDC to the polynomials $U_0(x) = x^{2n+1}$ and $U_1(x) = a_0 + a_1 x + \dots + a_{2n} x^{2n}$ gives algorithm AD (Anti-Diagonal), which computes any (m, p) entry along the antidiagonal m + p = 2n of the Padé table for $U_1$ in time $O(n \log^2 n)$. Our other algorithm, MD (Main-Diagonal), computes any diagonal entry (n, n) in the Padé table for a normal power series, also in time $O(n \log^2 n)$. MD is related to Schönhage's fast continued fraction algorithm. A Toeplitz matrix T is naturally associated with $U_1$, and the (n, n) Padé approximation to $U_1$ gives the first column of $T^{-1}$. We show how a formula due to Trench can be used to compute the solution z of Tz = b in time $O(n \log n)$ from the first row and column of $T^{-1}$. Thus, the Padé table algorithms AD and MD give $O(n \log^2 n)$ Toeplitz algorithms ADT and MDT.
Trench's formula breaks down in certain degenerate cases, but in such cases a companion formula, the discrete analog of the Christoffel-Darboux formula, is valid and may be used to compute z in time $O(n \log^2 n)$ via the fast computation (by algorithm AD) of at most four Padé approximants. We also apply our results to obtain new complexity bounds for the solution of banded Toeplitz systems and for BCH decoding via Berlekamp's algorithm.
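The link between the Padé table and the Extended Euclidean Algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it runs the classical quadratic-time Euclidean scheme on $x^{N+1}$ and a truncated series with exact rational coefficients, not the fast EMGCD or PRSDC algorithms of the abstract, and the names `trim`, `poly_divmod`, `poly_sub_mul`, and `pade` are hypothetical. Polynomials are coefficient lists, lowest degree first.

```python
from fractions import Fraction


def trim(a):
    """Drop trailing zero coefficients in place and return the list."""
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_divmod(a, b):
    """Quotient and remainder of a by b (b nonzero, no trailing zeros)."""
    a = [Fraction(c) for c in a]
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        c, k = a[-1] / b[-1], len(a) - len(b)
        q[k] = c
        for i, bc in enumerate(b):
            a[i + k] -= c * bc
        trim(a)                            # the leading term has cancelled
    return q, a


def poly_sub_mul(a, q, b):
    """Return a - q*b."""
    prod_len = len(q) + len(b) - 1 if q and b else 0
    out = [Fraction(0)] * max(len(a), prod_len)
    for i, c in enumerate(a):
        out[i] += c
    for i, qc in enumerate(q):
        for j, bc in enumerate(b):
            out[i + j] -= qc * bc
    return trim(out)


def pade(series, m):
    """Numerator (degree <= m) and denominator (degree <= N - m) agreeing with
    the series modulo x^(N+1), where N = len(series) - 1."""
    N = len(series) - 1
    r0 = [Fraction(0)] * (N + 1) + [Fraction(1)]      # x^(N+1)
    r1 = trim([Fraction(c) for c in series])
    t0, t1 = [], [Fraction(1)]                        # invariant: r_i = t_i * series (mod x^(N+1))
    while len(r1) - 1 > m:                            # stop once deg r1 <= m
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, poly_sub_mul(t0, q, t1)
    if t1[0] != 0:                                    # normalize denominator to constant term 1
        c = t1[0]
        r1, t1 = [a / c for a in r1], [a / c for a in t1]
    return r1, t1
```

For the truncated exponential series 1 + x + x²/2, `pade([1, 1, Fraction(1, 2)], 1)` returns the familiar (1, 1) approximant (1 + x/2)/(1 − x/2). When the denominator's constant term vanishes the entry is degenerate and the sketch returns the unnormalized pair.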

We combine Cramer's rule, p-adic lifting, and rational interpolation in an unusual way, in order to reduce the problems of computing the determinant, det A, and all other coefficients of the characteristic polynomial, det(λI - A), of a matrix A to inverting A and/or to solving linear systems of equations. Such a reduction enables us to apply Hensel's effective p-adic lifting to the evaluation of det A and det(λI - A); from that practically important point of view, no comparable alternative is known. On the theoretical side, no other ways of reduction of computing det A to solving few linear systems are known.

An integer greatest common divisor (GCD) algorithm due to Schönhage is generalized to hold in all Euclidean domains which possess a fast multiplication algorithm. It is shown that if two N-precision elements can be multiplied in $O(N \log^a N)$, then their GCD can be computed in $O(N \log^{a+1} N)$. As a consequence, a new faster algorithm for multivariate polynomial GCD's can be derived and with that new bounds for rational function manipulation.

Parallel randomized algorithms are presented that solve n-dimensional systems of linear equations and compute inverses of n × n non-singular matrices over a field in $O((\log n)^2)$ time, where each time unit represents an arithmetic operation in the field generated by the matrix entries. The algorithms utilize within an O(log n) factor as many processors as are needed to multiply two n × n matrices. The algorithms avoid zero divisions with controllably high probability provided the O(n) random elements used are selected uniformly from a sufficiently large set. For fields of small positive characteristic, the processor count measures of our solutions are somewhat higher.

A method, given by D. E. Knuth for the computation of the greatest common divisor of two integers u, v and of the continued fraction for u/v, is modified in such a way that only $O(n (\lg n)^2 (\lg\lg n))$ elementary steps are used for $u, v < 2^n$.
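To make the two outputs concrete, here is a minimal Python sketch of the classical Euclidean scheme that produces both the greatest common divisor and the partial quotients of the continued fraction for u/v. It takes a quadratic number of bit operations in the worst case, so it illustrates the problem being solved and not the fast method of the abstract; the function name `continued_fraction` is an assumption.

```python
def continued_fraction(u, v):
    """Return (gcd(u, v), partial quotients of u/v) for u >= 0 and v > 0."""
    if u < 0 or v <= 0:
        raise ValueError("expected u >= 0 and v > 0")
    quotients = []
    while v:
        q, r = divmod(u, v)    # one Euclidean division step
        quotients.append(q)
        u, v = v, r
    return u, quotients
```

For example, `continued_fraction(415, 93)` yields gcd 1 and the partial quotients 4, 2, 6, 7, that is, 415/93 = 4 + 1/(2 + 1/(6 + 1/7)).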

In this paper we generalize the well-known Schönhage-Strassen algorithm for multiplying large integers to an algorithm for multiplying polynomials with coefficients from an arbitrary, not necessarily commutative, not necessarily associative, algebra A. Our main result is an algorithm to multiply polynomials of degree < n in ...

Computations with Toeplitz and Toeplitz-like matrices are fundamental for many areas of algebraic and numerical computing. The list of computational problems reducible to Toeplitz and Toeplitz-like computations includes, in particular, the evaluation of the gcd, the lcm, and the resultant of two polynomials, computing Padé approximation and the Berlekamp-Massey recurrence coefficients, as well as numerous problems reducible to these. Transition to Toeplitz and Toeplitz-like computations is currently the basis for the design of the fastest known parallel (RNC) algorithms for these computational problems. Our main result is in constructing nearly optimal randomized parallel algorithms for Toeplitz and Toeplitz-like computations and, consequently, for numerous related computational problems (including the computational problems listed above), where all the input values are integers and all the output values are computed exactly. This includes randomized parallel algorithms for computing...

Can We Optimize Computations with Structured Matrices? preprint

- V Y Pan