## About

Publications: 93

Reads: 5,042

**How we measure 'reads'**

A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text.

Citations: 6,651

## Publications

Publications (93)

While tremendously useful, automated techniques for tuning the precision of floating-point programs face important scalability challenges. We present Blame Analysis, a novel dynamic approach that speeds up precision tuning. Blame Analysis performs floating-point instructions using different levels of accuracy for their operands. The analysis determ...

Given the variety of numerical errors that can occur, floating-point programs are difficult to write, test and debug. One common practice employed by developers without an advanced background in numerical analysis is using the highest available precision. While more robust, this can degrade program performance significantly. In this paper we presen...

Matrix Riccati Differential Equations (MRDEs) $X' = A_{21} - XA_{11} + A_{22}X - XA_{12}X$, $X(0) = X_0$, where $A_{ij} \equiv A_{ij}(t)$, appear frequently throughout applied mathematics, science, and engineering. Naturally, the existing conventional Runge-Kutta methods and linear multi-step methods can be adapted to solve MRDEs numerically. Indeed, they have been ada...
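The abstract notes that conventional Runge-Kutta methods can be adapted to MRDEs. As a minimal sketch of that idea (not the method the paper proposes), the code below applies classical fourth-order Runge-Kutta to the scalar 1 × 1 case with constant coefficients. The function names and the constant-coefficient restriction are assumptions made for illustration. In the genuine matrix case the products $XA_{11}$, $A_{22}X$ and $XA_{12}X$ must keep their order, because matrices do not commute.

```python
import math

def riccati_rhs(x, a11, a12, a21, a22):
    # Scalar (1 x 1) instance of X' = A21 - X A11 + A22 X - X A12 X
    return a21 - x * a11 + a22 * x - x * a12 * x

def rk4_riccati(x0, t_end, steps, a11, a12, a21, a22):
    """Integrate the scalar Riccati equation on [0, t_end] with classical RK4.

    Coefficients are held constant here (an assumption); the MRDEs in the
    abstract allow A_ij to depend on t.
    """
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = riccati_rhs(x, a11, a12, a21, a22)
        k2 = riccati_rhs(x + 0.5 * h * k1, a11, a12, a21, a22)
        k3 = riccati_rhs(x + 0.5 * h * k2, a11, a12, a21, a22)
        k4 = riccati_rhs(x + h * k3, a11, a12, a21, a22)
        x += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

# X' = 1 - X^2 with X(0) = 0 has the exact solution X(t) = tanh(t).
approx = rk4_riccati(0.0, 1.0, 100, a11=0.0, a12=1.0, a21=1.0, a22=0.0)
print(approx, math.tanh(1.0))
```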

The eigenvectors of an Hermitian matrix H are the columns of some complex unitary matrix Q. For any diagonal unitary matrix Ω the columns of $Q \cdot \Omega$ are eigenvectors too. Among all such $Q \cdot \Omega$ at least one has a skew-Hermitian Cayley transform $S := (I + Q \cdot \Omega)^{-1} \cdot (I - Q \cdot \Omega)$ with just zeros on its diagonal. Why? The proof is unobvious, as is the further observation t...

New releases of the widely used LAPACK and ScaLAPACK numerical linear algebra libraries are planned. Based on an ongoing user survey (www.netlib.org/lapack-dev) and research by many people, we are proposing the following improvements: faster algorithms, including better numerical methods, memory hierarchy optimizations, parallelism, and automatic...

We present the design and testing of an algorithm for iterative refinement of the solution of linear equations, where the residual is computed with extra precision. This algorithm was originally proposed in the 1960s [6, 22] as a means to compute very accurate solutions to all but the most ill-conditioned linear systems of equations. However, two ob...
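A minimal sketch of the idea in pure Python, assuming a small dense system: factor once in working (double) precision, then repeatedly compute the residual r = b - Ax in higher precision and correct x using the existing factors. Here exact rational arithmetic from the standard `fractions` module stands in for "extra precision", and the fixed iteration count is a hypothetical choice; the paper's stopping criteria and error bounds are not reproduced, and all function names are illustrative.

```python
from fractions import Fraction

def lu_factor(A):
    """LU factorization with partial pivoting, P A = L U (L unit lower)."""
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                  # multiplier stored in L's slot
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm

def lu_solve(LU, perm, b):
    """Solve A x = b in working precision using the factors from lu_factor."""
    n = len(LU)
    y = [float(b[perm[i]]) for i in range(n)]
    for i in range(n):                            # forward substitution, unit L
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):                  # back substitution, U
        y[i] = (y[i] - sum(LU[i][j] * y[j] for j in range(i + 1, n))) / LU[i][i]
    return y

def residual_extra(A, x, b):
    """r = b - A x evaluated exactly with Fractions, then rounded once to float."""
    return [float(Fraction(bi) - sum(Fraction(aij) * Fraction(xj)
                                     for aij, xj in zip(row, x)))
            for row, bi in zip(A, b)]

def refine(A, b, iterations=3):
    LU, perm = lu_factor(A)                       # factor once
    x = lu_solve(LU, perm, b)
    for _ in range(iterations):
        d = lu_solve(LU, perm, residual_extra(A, x, b))
        x = [xi + di for xi, di in zip(x, d)]     # correction step
    return x

# 3 x 3 Hilbert matrix; b is chosen so that the solution is close to [1, 1, 1].
A = [[1.0 / (i + j + 1) for j in range(3)] for i in range(3)]
b = [sum(row) for row in A]
x = refine(A, b)
print(x)
```

Reusing the factors for every correction is the point of the scheme: each refinement step costs only a residual and two triangular solves.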

IEEE 754, a standard for binary floating-point arithmetic, has revolutionized the portability and reliability of programs that use binary floating-point arithmetic. Floating point is almost universally implemented with special-purpose hardware that tucks into a small corner of the CPU chip and runs in the hundreds of Mflops to Gflops range. Single-st...

We consider the efficient and accurate computation of Givens rotations. When f and g are positive real numbers, this simply amounts to computing the values of $c = f/\sqrt{f^2 + g^2}$, $s = g/\sqrt{f^2 + g^2}$, and $r = \sqrt{f^2 + g^2}$. This apparently trivial computation merits closer consideration for the following three reasons. First, while the definitions of c, s and r s...
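For the positive real case spelled out in the abstract, the formulas can be transcribed directly. The sketch below (the function name is hypothetical) shows one point a careful implementation must handle: forming $f^2 + g^2$ can overflow or underflow even when c, s and r are representable, which Python's `math.hypot` avoids. It does not attempt the further considerations the paper goes on to discuss.

```python
import math

def givens_positive(f, g):
    """Return (c, s, r) for positive real f, g, where r = sqrt(f**2 + g**2)."""
    # math.hypot computes sqrt(f*f + g*g) without the spurious overflow or
    # underflow that squaring f and g directly can cause.
    r = math.hypot(f, g)
    return f / r, g / r, r

print(givens_positive(3.0, 4.0))        # (0.6, 0.8, 5.0)

# Naively, 1e200 * 1e200 overflows to inf; hypot does not.
c, s, r = givens_positive(1e200, 1e200)
print(math.isfinite(r), math.isclose(c, math.sqrt(0.5)))   # True True
```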

Suppose U is an upper-triangular matrix, and D a nonsingular diagonal matrix whose diagonal entries appear in nondescending order of magnitude down the diagonal. It is proved that $\|D^{-1}UD\| \ge \|U\|$ for any matrix norm that is reduced by a pinching. In addition to known examples (weakly unitarily invariant norms), we show that any matrix norm defi...

This article describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subroutines): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions allows us to implement some algorithms that are simpler, more ac...

This paper describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subroutines): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions allows us to implement some algorithms that are simpler, more accu...

Suppose U is an upper-triangular matrix, and D a nonsingular diagonal matrix whose diagonal entries appear in nondescending order of magnitude down the diagonal. It is proved that $\|D^{-1}UD\| \ge \|U\|$ for any matrix norm that is reduced by a pinching. In addition to known examples (weakly unitarily invariant norms), we show that any matrix norm defined b...

Programs in symbolic algebraic manipulation systems can compute certain classes of symbolic indefinite integrals in closed form. Although these answers are ordinarily formally correct algebraic anti-derivatives, their form is often unsuitable for further numerical or even analytical processing. In particular, we address cases in which such "exact a...

Divided differences are enormously useful in developing stable and accurate numerical formulas. For example, programs to compute f(x) - f(y), as might occur in integration, can be notoriously inaccurate. Such problems can be cured by approaching these computations through divided difference formulations. This paper provides a guide to divided differen...
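As an illustration of the cancellation the abstract describes, take f = log (our choice of example, not necessarily one treated in the paper). Computed as written, the divided difference (log x - log y)/(x - y) subtracts two nearly equal logarithms when x is close to y and both are large; rewriting the difference as log1p((x - y)/y) avoids that. The function names are hypothetical, and the sketch assumes x, y > 0.

```python
import math

def log_dd_naive(x, y):
    """(log x - log y) / (x - y) as written: the logs nearly cancel when x ~ y."""
    return (math.log(x) - math.log(y)) / (x - y)

def log_dd(x, y):
    """Divided difference of log at x, y > 0, rewritten to avoid the cancellation."""
    if x == y:
        return 1.0 / y              # the limit is the derivative 1/y
    d = x - y                       # exact when y/2 <= x <= 2y (Sterbenz's lemma)
    return math.log1p(d / y) / d    # log x - log y == log1p((x - y)/y)

x, y = 1e10 + 1.0, 1e10
print(log_dd_naive(x, y), log_dd(x, y))   # the true value is about 1e-10
```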

With the growing demands from disciplinary and interdisciplinary fields of science and engineering for the numerical solution of the nonsymmetric eigenvalue problem, competitive new techniques have been developed for solving the problem. In this paper, we examine the state of the art of the algorithmic techniques and the software scene for the prob...

By far the majority of computers in use today are AMD/Cyrix/Intel-based PCs, and a big fraction of the rest are old 680x0-based Apple Macintoshes. Owners of these machines are mostly unaware that their floating-point arithmetic hardware is capable of delivering routinely better results than can be expected from the more prestigious and more expens...

Many models of physical and chemical processes give rise to ordinary differential equations with special structural properties that go unexploited by general-purpose software designed to solve numerically a wide range of differential equations. If those properties are to be exploited fully for the sake of better numerical stability, accuracy and/or...

An unconventional numerical method for solving a restrictive yet often-encountered class of ordinary differential equations is proposed. The method has a crucial property, which we call reflexive; it requires solving one linear system per time-step yet is second-order accurate. A systematic and easily implementable scheme is proposed to enhan...

A Language Compatible Arithmetic Standard (LCAS) has been proposed as International Standard ISO/IEC 10967:1991 for Language Compatible Arithmetic, Project JTC.22.28, Version 3.1 (1 March 1991), by Drs. Mary Payne and Brian Wichmann. An earlier version appeared in both ACM SIGNUM and ACM SIGPLAN 25 1 (Jan. 1990). The following remonstrance has been...

Computing the singular values of a bidiagonal matrix is the final phase of the standard algorithm for the singular value decomposition of a general matrix. We present a new algorithm which computes all the singular values of a bidiagonal matrix to high relative accuracy independent of their magnitudes. In contrast, the standard algorithm for bidiag...

A program to solve a real cubic equation efficiently and as accurately as the data deserve is not yet an entirely cut-and-dried affair. An iterative method is the best found so far. This method plus some other issues, like accuracy, scaling, preconditioning and testing, are discussed in these notes in enough detail to convey an impression of what N...

The IBM ACRITH package of numerical software is advertised as reliable and easy to use; but sometimes its results must astonish or confuse a naive user. This report exhibits a few of the surprises. For instance, a finite continued fraction, easy to evaluate in two dozen keystrokes on a handheld calculator, causes ACRITH to overflow either exponent...

The Microprocessor Standards Committee of the IEEE Computer Society sponsors two groups drafting proposed standards for floating-point arithmetic. The first, Task P754, reported Draft 10.0 of a Proposed Standard for Binary Floating-point Arithmetic out of committee in December 1982. That document is now a de facto standard and is progressing slowly...

The Microprocessor Standards Committee of the IEEE Computer Society sponsors two groups drafting proposed standards for floating-point arithmetic. The first, Task P754, reported Draft 10.0 of a Proposed Standard for Binary Floating-point Arithmetic out of committee in December, 1982. The document is now a de facto standard and is progressing slowly...

The problem is, given A, B, C, to find D such that $\left\| \begin{pmatrix} A & C \\ B & D \end{pmatrix} \right\| \le \mu$; here we deal with Hilbert-space operators, A, B, and C are given, and $\mu$ is a given positive number. We give explicit formulas for all solutions D. In case the D sought is a finite-dimensional matrix, we ex...

For nonnormal matrices the norms of the residuals of approximate eigenvectors are not by themselves sufficient information to bound the error in the approximate eigenvalue. It is sufficient however to give a bound on the distance to the nearest matrix for which the given approximations are exact. This result is extended to cover approximate invaria...

... the programmer must be able to state which properties he requires... Usually programmers don't do so because, for lack of tradition as to what properties can be taken for granted, this would require more explicitness than is otherwise desirable. The proliferation of machines with lousy floating-point hardware, together with the misapprehensi...

Implementors of the proposed standard, described in a special issue of the SIGNUM Newsletter (1) and in an article by J. Coonen in the IEEE Journal “COMPUTER” (2), are encouraged to provide two “directed rounding modes” by which the endpoints of intervals may be rounded outward automatically without unnecessarily spreading degenerate (one-point) in...
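Python exposes no way to switch the hardware rounding mode, so the sketch below emulates round-down and round-up for a single addition instead; this is a swapped-in technique, not the directed rounding modes the standard specifies. It uses Knuth's TwoSum to recover the exact rounding error of a + b and steps one float outward only when the sum was inexact, so one-point intervals whose sums are exact are not spread. Assumptions: finite operands, no overflow, IEEE binary64 round-to-nearest (which CPython floats use on mainstream platforms), and Python 3.9+ for `math.nextafter`. The function names are hypothetical.

```python
import math

def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and e such that a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def add_round_down(a, b):
    s, e = two_sum(a, b)
    # If the exact sum lies below s, the next float toward -inf is its round-down.
    return math.nextafter(s, -math.inf) if e < 0 else s

def add_round_up(a, b):
    s, e = two_sum(a, b)
    # If the exact sum lies above s, the next float toward +inf is its round-up.
    return math.nextafter(s, math.inf) if e > 0 else s

def interval_add(x, y):
    """[x_lo, x_hi] + [y_lo, y_hi] with endpoints rounded outward."""
    return (add_round_down(x[0], y[0]), add_round_up(x[1], y[1]))

print(interval_add((0.1, 0.1), (0.2, 0.2)))   # (0.3, 0.30000000000000004)
print(interval_add((1.0, 1.0), (2.0, 2.0)))   # (3.0, 3.0): exact sums stay one-point
```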

A standard for binary floating-point arithmetic is being proposed and there is a very real possibility that it will be adopted by many manufacturers and implemented on a wide range of computers. This development matters to all of us concerned with numerical software. One of the principal motivations for the standard is to distribute more evenly the...

This standard is a product of the Floating Point Working Group of the Microprocessor Standards Subcommittee of the IEEE Computer Society Computer Standards Committee. It is intended that the standard embody the essence of "Specifications For a Proposed Standard for Floating Point Arithmetic" by Jerome Coonen.

A standard for binary floating point arithmetic is briefly described. There is a very real possibility that it will be adopted by many manufacturers and implemented on a wide range of computers.

The Lanczos algorithm can be used to approximate both the largest and smallest eigenvalues of a symmetric matrix whose order is so large that similarity transformations are not feasible. The algorithm builds up a tridiagonal matrix row by row and the key question is when to stop. An analysis leads to a stopping criterion which is inspired by a usef...
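A minimal sketch of the recurrence that builds the tridiagonal matrix row by row, assuming a small dense symmetric matrix stored as nested lists. It stops after a caller-chosen number of steps m, or earlier if the Krylov subspace becomes invariant; the stopping criterion the paper derives is not reproduced, and no reorthogonalization is performed. The eigenvalues of the resulting T (alphas on the diagonal, betas beside it) would then approximate extreme eigenvalues of A. The function name is illustrative.

```python
import math

def lanczos(A, v0, m):
    """Run up to m steps of the plain Lanczos recurrence on symmetric A.

    Returns the diagonal (alphas) and off-diagonal (betas) of the tridiagonal
    matrix T, built up one row per step. No reorthogonalization is done.
    """
    n = len(A)
    norm = math.sqrt(sum(t * t for t in v0))
    q = [t / norm for t in v0]
    q_prev = [0.0] * n
    beta = 0.0
    alphas, betas = [], []
    for k in range(m):
        w = [sum(A[i][j] * q[j] for j in range(n)) for i in range(n)]   # w = A q
        alpha = sum(wi * qi for wi, qi in zip(w, q))
        alphas.append(alpha)
        if k == m - 1:
            break
        # Three-term recurrence: remove components along q and q_prev.
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        beta = math.sqrt(sum(t * t for t in w))
        if beta == 0.0:
            break                   # Krylov subspace is invariant; T is complete
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas

A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
print(lanczos(A, [1.0, 0.0, 0.0], 3))   # ([2.0, 2.0, 2.0], [1.0, 1.0])
```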

This report responds to repeated requests for explanations of the arithmetic paradoxes perpetrated by the T.I. SR-52. (Author)

The problem concerns four variables a, b, c, d to be interpreted as centre (c, d) and principal semiaxes a, b of an ellipse $E: (x - c)^2/a^2 + (y - d)^2/b^2 \le 1$. We wish to know when E lies inside the unit disk $D: x^2 + y^2 \le 1$. More precisely, we seek a set of polynomials $\{P_j(a^2, b^2, c^2, d^2)\}$ for $j = 1, 2, \ldots, n$ with the property that $E \subseteq D$ if and only if all $P_j(a^2, b^2, c^2, d^2) \le 0$. It...

When properly ordered, the respective eigenvalues of an n×n Hermitian matrix A and of a nearby non-Hermitian matrix A + B cannot differ by more than a bound stated in the paper; moreover, for all n ≥ 4, examples A and B exist for which this bound is in excess by at most about a factor 3. This bound is contrasted with other previously published overestimates th...

The Lanczos algorithm is presented as a way of generating bases for a sequence of Krylov subspaces. Explicit expressions are given for the departure of the bases from orthogonality. These relations enable one to comprehend the behavior of the algorithm in practice with a minimum of conventional error analysis. In particular this approach sheds ligh...

Solving a linear system Ax = b by Gaussian Elimination usually entails pivotal interchanges designed to inhibit that explosive growth of intermediate results which would otherwise, through roundoff, vitiate the calculation. But these interchanges, motivated by numerical desiderata, frequently conflict with combinatorial desiderata like "Sparsity"....

The title's inequality is proved for the operator bound-norm in a unitary space. An example is exhibited to show that the inequality cannot be improved by more than about 8% when n is large. The numerical range, of an n × n matrix Z with real spectrum, is then shown to be not arbitrarily different in shape from the spectrum.

The title’s inequality is proved for the operator bound-norm in a unitary space. An example is exhibited to show that the inequality cannot be improved by more than about 8%...

Certain problems are ill-conditioned, in the sense that their solutions are hypersensitive to small changes in data, only because a slight change in data could cause those solutions to exhibit singular behaviour associated with various kinds of confluence. For example, an over- or underdetermined linear system solved by least-squares can be ill-c...

Computer professionals everywhere will have occasion to share with Hirondo's family and friends that sense of grievous loss evoked by his premature death. Almost all of us who do scientific calculations on IBM equipment benefit unknowingly from his craftsmanship, unknowingly because Hirondo was too self-effacing to secure the public recognition his...

The first annual George Forsythe Memorial Lecture will be presented by W. Kahan, University of California at Berkeley, on the topic of dealing with ill-defined numerical problems.

Rounding error is just one kind of error, and an easier kind to analyze than some others. Error and uncertainty in data is a more important kind, and not so easy to estimate nor analyze; here is where error analysts are currently busiest. The most refractory kind of error is attributable to flaws in the design of computer systems, both hardware and...

The QR iteration for the eigenvalues of a symmetric tridiagonal matrix can be accelerated by incorporating a sequence of origin shifts. The origin shift may be either subtracted directly from the diagonal elements of the matrix or incorporated by means of an implicit algorithm. Both methods have drawbacks: the direct method can unnecessarily degrad...

Given an invariant subspace of a Hermitian matrix and the corresponding invariant subspace of a perturbed matrix, the object is to bound the amount by which the two subspaces differ as a function of the magnitudes of the perturbation and of the gaps between appropriate parts of the spectra. This paper centers on four theorems of that sort.

Examples are presented, and sometimes analysed in detail, to reveal the unpleasant implications for scientific computation of flaws in the design of the arithmetic unit and in the supervisory software associated with it. Attempts to axiomatize floating-point arithmetic are discussed, together with the reasons why they are irrelevant. It is shown that Interval...

This case study drawn from an elementary numerical analysis course is aimed at computer language designers and implementors who took no competent course on the subject or forgot it and consequently subscribe to principles of language design inimical to the best interests of writers and users of software with a little floating-point arithmetic in it...

Simplicity is a Virtue; yet we continue to cram ever more complicated circuits ever more densely into silicon chips, hoping all the while that their internal complexity will promote simplicity of use. This paper exhibits how well that hope has been fulfilled by several inexpensive devices widely used nowadays for numerical computation. One of them...

Research on ways to organize a body of numerical procedures in such a way that they may be invoked automatically by processes which accept symbolic and algebraic specifications from a user, and produce combined symbolic, numeric and graphical output is described. Efforts are made to make these algebraic systems as flexible and useful as possible in...

One of the three main processes associated with polynomials is evaluation, the other two being interpolation and root finding. Higham [1, chap. 5] devotes an entire chapter to polynomials, and especially to polynomial evaluation. The small backward error that the Horner scheme introduces when a polynomial is evaluated in floating-point arithmetic justifies its...
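For reference, the plain Horner scheme evaluates a degree-n polynomial by nesting, with n multiplications and n additions. The sketch below (coefficients listed from the highest degree down, a convention chosen here; the function name is illustrative) shows only this basic scheme whose backward error the abstract mentions; whatever refinements the paper develops are not reproduced.

```python
def horner(coeffs, x):
    """Evaluate coeffs[0]*x**n + coeffs[1]*x**(n-1) + ... + coeffs[n] by nesting.

    coeffs must be non-empty and ordered from the highest degree down.
    """
    result = coeffs[0]
    for a in coeffs[1:]:
        result = result * x + a     # one multiply and one add per coefficient
    return result

# p(x) = 2x^3 - 6x^2 + 2x - 1 = ((2x - 6)x + 2)x - 1
print(horner([2.0, -6.0, 2.0, -1.0], 3.0))   # 5.0
```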

Techniques are introduced to help decide whether roundoff errors will abrogate the monotonicity properties of a function when it is computed. Those techniques are applied to several expressions, among them $z/(1+z)$, $z + z/(1+z)$, $2y - y^2$, $w + 1/w$, $t - t/(1 + 4/t^2)$, ..., that have turned up during the calculation of certain elementary transcendental f...
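The paper introduces analytic techniques; the sketch below is only a brute-force empirical probe, not those techniques. It steps through consecutive doubles with `math.nextafter` (Python 3.9+) and reports the first point where a computed function decreases. Finding nothing in a sampled range proves nothing about other ranges, and the function names are hypothetical.

```python
import math

def first_decrease(func, start, count):
    """Step through `count` consecutive floats upward from `start` and return the
    first z at which func(z) < func(previous float), or None if none is found."""
    z = start
    prev = func(z)
    for _ in range(count):
        z = math.nextafter(z, math.inf)
        cur = func(z)
        if cur < prev:
            return z
        prev = cur
    return None

def ratio(z):
    return z / (1.0 + z)      # one of the expressions listed in the abstract

print(first_decrease(ratio, 1.0, 100_000))
```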

When a Hermitian linear operator A is slightly perturbed, by how much can its invariant subspaces change? Given some approximations to a cluster of neighboring eigenvalues and to the corresponding eigenvectors of a real symmetric matrix, and given a lower bound S > 0 for the gap that separates the cluster from all other eigenvalues, how much can...

An ellipsoid G is associated uniquely with a positive definite matrix A via $G = \{x : x'Ax \le 1\}$. Note that all ellipsoids discussed here are centred at 0. Given $G_1$ and $G_2$, we seek another ellipsoid circumscribed about $G_1 \cap G_2$. It is easy to see that the ellipsoid $\{x : x'Hx \le 1\}$ contains $G_1 \cap G_2$ if and only if $x'Hx \le \max_i x'A_i x$ for all vectors x.

Given an Nth degree polynomial $P(z)$ one may use Laguerre’s method to generate a sequence of complex numbers $x_0, x_1, x_2, \ldots$ which usually converges to a zero of $P(z)$. This note shows that each circle $|z - x_n| \le \sqrt{N}\,|x_{n+1} - x_n|$ contains at least one zero of $P(z)$. If N is not a perfect...
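A minimal sketch of Laguerre's iteration in Python, using the usual form $x_{n+1} = x_n - N/(G \pm \sqrt{(N-1)(NH - G^2)})$ with $G = P'/P$ and $H = G^2 - P''/P$, the sign chosen to maximize the denominator's magnitude. After each step it records the circle from the abstract, centre $x_n$ and radius $\sqrt{N}\,|x_{n+1} - x_n|$, and checks that a known zero of a test polynomial lies inside. The check allows a small slack (1e-12, an arbitrary choice) because the theorem concerns exact arithmetic and roundoff dominates near convergence. Function names and the test polynomial are illustrative assumptions.

```python
import cmath

def eval_with_derivs(coeffs, x):
    """Horner evaluation of p(x), p'(x), p''(x); coeffs are highest degree first."""
    p = dp = ddp = 0j
    for a in coeffs:
        ddp = ddp * x + 2 * dp
        dp = dp * x + p
        p = p * x + a
    return p, dp, ddp

def laguerre_step(coeffs, x):
    """One Laguerre iteration x_n -> x_{n+1} for a polynomial of degree N."""
    n = len(coeffs) - 1
    p, dp, ddp = eval_with_derivs(coeffs, x)
    if p == 0:
        return x                                  # x is already a zero
    G = dp / p
    H = G * G - ddp / p
    root = cmath.sqrt((n - 1) * (n * H - G * G))
    # Choose the sign that makes the denominator larger in magnitude.
    denom = G + root if abs(G + root) >= abs(G - root) else G - root
    return x - n / denom

def laguerre_with_circles(coeffs, x0, steps):
    """Iterate from x0; record (x_n, sqrt(N) * |x_{n+1} - x_n|) for each step."""
    n = len(coeffs) - 1
    x = complex(x0)
    circles = []
    for _ in range(steps):
        x_next = laguerre_step(coeffs, x)
        circles.append((x, n ** 0.5 * abs(x_next - x)))
        if x_next == x:
            break
        x = x_next
    return x, circles

# p(z) = (z - 1)(z - 2)(z - 3) = z^3 - 6z^2 + 11z - 6
coeffs = [1.0, -6.0, 11.0, -6.0]
zero, circles = laguerre_with_circles(coeffs, 0.0, 20)
# Each recorded disk should contain at least one of the zeros 1, 2, 3.
ok = all(min(abs(c - r) for r in (1, 2, 3)) <= rad + 1e-12 for c, rad in circles)
print(zero, ok)
```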

Having established tight bounds for the quotient of two different lub-norms of the same tri-diagonal matrix J, the author observes that these bounds could be of use in an error-analysis provided a suitable algorithm were found. Such an algorithm is exhibited, and its errors are thoroughly accounted for, including the effects of scaling, over/underf...

The primordial problems of linear algebra are the solution of a system of linear equations $Ax = b$ and the solution of the eigenvalue problem $Ax_k = \lambda_k x_k$ for the eigenvalues $\lambda_k$ and corresponding eigenvectors $x_k$ of a given matrix A.

A numerically stable and fairly fast scheme is described to compute the unitary matrices U and V which transform a given matrix A into a diagonal form Σ = U* AV, thus exhibiting A's singular values on Σ's diagonal. The scheme first transforms A to a bidiagonal matrix J, then diagonalizes J. The scheme described here is complicated but does not suff...

A technique for generating normally distributed random numbers is described. It is faster than those currently in general use and is readily applicable to both binary and decimal computers.

A selection of the definitions prepared by the ACM Standards Committee's Subcommittee on Programming Terminology is presented for review by the ACM membership.

An abstract is not available.

## Projects

Projects (2)