IEEE 754: An Interview with William Kahan

Computer, March 1998, pp. 114-115

Editor: Charles Severance, Michigan State University, Department of Computer Science, 1338 Engineering Bldg., East Lansing, MI 48824; voice (517) 353-2268; fax (517) 355-7516
If you were a programmer using floating-point computations in the 1960s and 1970s, you had to cope with a wide variety of configurations, with each computer supporting a different range and accuracy for floating-point numbers. While most of these differences were merely annoying, some were very serious. One computer, for example, might have values that behaved as nonzero for additions but behaved as zero for division. Sometimes a programmer had to multiply all values by 1.0 or execute a statement such as X = (X + X) - X to make a program work reliably. These factors made it extremely difficult to write portable and reliable numerical software.
In 1976, Intel began to plan for a floating-point coprocessor for the Intel i8086/8 and i432 microprocessors. John Palmer convinced Intel that they needed to develop a thorough standard to specify the arithmetic operations for their coprocessor so that all Intel processors would produce the same results. Because William Kahan had extensive experience with the IBM, Cray, and Control Data Corp. (CDC) floating point, he was one of the few who understood the challenges of writing accurate numerical code. In 1976, Kahan's influence on floating-point processing escalated when Intel hired him as a consultant to help design the arithmetic for the 8087 processor. As a result, he had a hand in the birth of the IEEE 754 specification for floating-point computations.

—Charles Severance
Charles Severance: When Intel hired you as a consultant in 1976, what did they want you to do?
William Kahan: The folks at Intel decided that they wanted really good arithmetic. The DEC VAX was really not that bad, so my reasoning went: Why not copy the VAX? Intel wanted the best arithmetic, so Palmer and I got together to think about what the best arithmetic should be. One of the things Palmer told me was that Intel anticipated selling these coprocessors in very large numbers. The best arithmetic was what was best for a large market, which subsequently started to frighten Silicon Valley because of rumors that Intel was building floating point on a single chip, the i8087. And when they heard rumors of what was going to be on that chip, they were alarmed.
CS: Out of this thinking grew IEEE 754?
WK: People have said from time to time (as a joke) that the other Silicon Valley companies got worried and joined the IEEE 754 working group. I realized at this first meeting that the members of the committee were very serious. CDC didn't bother to attend that meeting in November 1977 because it was a microprocessor committee—they had no idea that microprocessors would mean anything at all. Cray felt the same way. IBM was only there in an observer capacity—they knew microprocessors were coming but they couldn't say much.
CS: What were the meetings like?
WK: One of my friends said that attending one of these meetings was like a visit to the Grand Canyon: just awesome. In the usual standards meeting everybody wants to grandfather in his own product. I think that it is nice to have at least one example—and the floating-point standard is one—where sleaze did not triumph. Cray, CDC, and IBM could have weighed in, if they wanted to, and destroyed the whole thing. But CDC and Cray must have thought, “Microprocessors. Why worry?”
CS: What happened next?
WK: After the first meeting, I went back to Intel and asked to participate in the standards effort. Then Jerome Coonen, Harold Stone, and I prepared a draft document of the Intel specification in the format of an IEEE standard and brought it back to an IEEE 754 meeting.
CS: Were there any complications?
WK: I got Palmer's verbal permission to disclose the specifications for the non-transcendental functions on the chip, but not the specifications for the architecture. I could describe the precision, exponent ranges, special values, and storage formats. I could also disclose some of the reasoning behind the decisions. We didn't say a word about the i8087's transcendental functions—I had to bite my tongue. [Commonly used transcendental functions include sine, cosine, logarithms, and exponentials. —CS] We were going to put the transcendental functions on the 8087 chip, and it was going to have an interesting architecture. We really didn't want to give away the whole ball of wax. Intel was going to spring a real surprise on the world. We were going to have a chip that had most of the essentials of a math library using only 40,000 transistors.
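The precision, exponent ranges, special values, and storage formats Kahan was allowed to describe became the familiar IEEE 754 interchange formats. As an illustrative sketch in modern Python (not anything from the interview itself), the 64-bit binary64 layout can be pulled apart into its sign, exponent, and fraction fields:

```python
import math
import struct

def decode_binary64(x: float):
    """Split an IEEE 754 binary64 value into its sign, biased-exponent,
    and fraction fields."""
    bits, = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)    # 52-bit significand field
    return sign, exponent, fraction

# 1.0 is stored with biased exponent 1023 (the binary64 bias) and zero fraction.
assert decode_binary64(1.0) == (0, 1023, 0)
# Special values: infinity and NaN both use the all-ones exponent 0x7FF.
assert decode_binary64(math.inf) == (0, 0x7FF, 0)
assert decode_binary64(float("nan"))[1] == 0x7FF
# Subnormals (gradual underflow) use biased exponent 0.
assert decode_binary64(5e-324) == (0, 0, 1)
```

The field names and the bias follow the modern standard's terminology; the 1977-era draft used slightly different vocabulary but the same layout.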
CS: So you brought the draft back to the IEEE 754 group, but there were multiple proposals being put forward. DEC was suggesting that their format be adopted and there were other proposals as well. The initial reaction to your document was mixed, wasn't it?
WK: Initially, it looked pretty complicated. But what distinguished our proposal from the others was that we had reasoned out the details. What we had to do was enhance the likelihood that the code would get correct results, and we had to arrange it so that the people who were really experts in floating point could write portable software and prove that it worked. Also, the design had to be feasible. I had to be reasonably confident that when floating-point arithmetic was built into hardware it would still run at a competitive speed. At the same time I had to be careful. There were things going on at Intel that I couldn't talk about with the committee. This was particularly the case with gradual underflow—the subnormal numbers. I had in mind a way to support gradual underflow at high speeds, but I couldn't talk about that.
CS: What happened with the proposals?
WK: The existing DEC VAX format had the advantage of a broadly installed base. Originally, the DEC double-precision format had the same number of exponent bits as its single-precision values, which turned out to be too few exponent bits for some double-precision computations. DEC addressed this by introducing its G double-precision format, which supported an 11-bit exponent and which was the same as the CDC floating-point format. With the G format, the major remaining difference between the Intel format and the VAX format was gradual underflow.
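The exponent-range problem Kahan describes is easy to see in modern terms: keeping single precision's 8-bit exponent caps magnitudes near 3.4e38, while an 11-bit exponent reaches about 1.8e308. A small illustrative Python check (the constants describe today's IEEE binary64, not the VAX formats themselves):

```python
import sys

# IEEE binary64 uses an 11-bit exponent: exponents run up to 2**1023,
# so the largest finite value is about 1.8e308.
assert sys.float_info.radix == 2
assert sys.float_info.max_exp == 1024
assert sys.float_info.max > 1.7e308

# A format with only 8 exponent bits (like binary32, or a double format
# that reuses single precision's exponent field) tops out near
# 2**128 ~ 3.4e38.  This square is routine with an 11-bit exponent but
# would overflow an 8-bit one:
x = 2.0 ** 125
assert x * x == 2.0 ** 250       # exact and comfortably in range for binary64
assert 2.0 ** 250 > 2.0 ** 128   # far beyond an 8-bit exponent's reach
```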
[Gradual underflow provides a number of advantages over abrupt underflow. Without it, the gap between zero and the smallest floating-point number is much larger than the gap between successive small floating-point numbers. Without gradual underflow one can find two values, X and Y (such that X is not equal to Y), and yet when you subtract them their result is zero. While a skilled numerical analyst could work around this limitation in many situations, this anomaly would tend to cause problems for less skilled programmers. —CS]
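Severance's point can be demonstrated directly on any IEEE 754 machine, where gradual underflow guarantees that X != Y implies X - Y != 0. A minimal Python sketch (math.nextafter requires Python 3.9 or later):

```python
import math
import sys

# Smallest positive normal double (2**-1022) and its immediate successor.
y = sys.float_info.min
x = math.nextafter(y, math.inf)   # y plus one ulp, i.e. y + 2**-1074

assert x != y
d = x - y                         # exactly 2**-1074, a subnormal number
assert d == 2.0 ** -1074
assert d != 0.0  # gradual underflow: distinct values never subtract to zero
# Under abrupt (flush-to-zero) underflow, d would have been flushed to 0.0,
# so a test like "x - y == 0" could not be trusted to mean "x == y".
```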
CS: Given the advantages of underflow, why was anyone opposed to it?
WK: The primary reason that some committee members were opposed to gradual underflow was the claim that it would slow performance. After my confidentiality obligations expired, I could talk about ways of doing gradual underflow in hardware without slowing down all floating-point operations.

At one of the meetings in the late 1970s, DEC came in with a hardware engineer who said that it was going to be impossible to build fast hardware to support the proposed standard. It just so happened that we had a student, George Taylor, who had taken up the task of producing a new floating-point board for a VAX. We were going to remove the floating-point boards and substitute our own with IEEE standard arithmetic. Otherwise, it conformed to the DEC VAX instruction set. We were going to compare a good arithmetic (the VAX arithmetic) with the IEEE arithmetic and see what it was going to be like. So George came to a meeting, showed how it was going to work, and it was perfectly clear to everyone there that this was eminently feasible.
CS: Wasn’t there also an attempt to
prove that gradual underflow was bad
from a theoretical viewpoint?
WK: Yes, DEC had been struggling to persuade us that gradual underflow was a bad thing. If they could prove it was unnecessary, there was no reason not to use DEC's exponent bias. The exceptional handling and other details could be done with small tweaks. DEC finally commissioned one of the most prominent error analysts in the east, G.W. (Pete) Stewart, to perform the study. He was to look into the error analysis aspects to demonstrate that gradual underflow was not all that I had cracked it up to be.
CS: And what happened?
WK: At a meeting in Boston in 1981, Stewart reported that, on balance, he thought gradual underflow was the right thing to do. The DEC folk who had commissioned the report were rather disappointed, and they said, “OK, we'll publish this later.” They were really annoyed because this was on their home turf. Having suffered that rather substantial defeat, they got disheartened.
CS: With all the success of IEEE 754, what's missing?
WK: Compilers and programming languages new and old—from Java to Fortran—still lack competent support for features of IEEE 754 so painstakingly provided by practically all hardware nowadays. SANE, the Standard Apple Numerical Environment, on old Motorola 68K-based Macs is the main exception. Programmers seem unaware that IEEE 754 is a standard for their programming environment, not just for hardware.

The new C9X proposal before ANSI X3J11 is a fragile attempt to insinuate such support into the C and C++ language standards. It deserves the informed consideration of the programmers it tries to serve, not indifference.
As new microprocessor-based computers have become widespread, we have clearly benefited from the widely available floating-point standard. Users and programmers alike need to thank William Kahan and the others involved in IEEE 754 for their efforts. For more detail on the subject, see http://
William Kahan won the ACM Turing Award in 1989 and is currently professor of computer science at the University of California, Berkeley. He can be contacted at