Elisardo Antelo

University of Santiago de Compostela | USC · Department of Computers and Electronics

PhD

About

64
Publications
6,465
Reads
964
Citations
Citations since 2016
5 Research Items
283 Citations
Additional affiliations
March 1998 - present
University of Santiago de Compostela
Position
  • Professor (Associate)

Publications

Preprint
Full-text available
ExaScale systems will be a key driver for simulations that are essential for the advance of science and economic growth. We aim to present a new microprocessor concept for floating-point computations, suitable as a basic building block of ExaScale systems and beyond. The proposed microprocessor architecture has a frontend for programming interfa...
Article
Hardware signatures based on Bloom filters are used to support and accelerate membership query in a set of items. They use modest hardware at the cost of false positives, but never produce false negatives. Signatures were traditionally used in different distributed and network applications, but in recent years their use has been extended to other f...
Article
We present the algorithm and architecture of a BCD parallel multiplier that exploits some properties of two different redundant BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3), and the overloaded BCD representation (ODDS). In addition, new techniques are developed to significantly reduce the latency and area of previous...
Technical Report
Full-text available
ExaScale systems will be a key driver for simulations that are essential for the advance of science and economic growth. Current technology trends indicate that there might be a big energy wall by the end of the decade. Different reports call for strong changes at all levels for ExaScale computer systems. This academic position paper addresses this pro...
Article
Full-text available
The four articles in this special section focus on the topic of computer arithmetic and its applications.
Article
Full-text available
With the advent of chip multiprocessors, new techniques have been developed to make parallel programming easier and more reliable. New parallel programming paradigms and new methods of making the execution of programs more efficient and more reliable have been developed. Usually, these improvements require hardware support to avoid a system slowdown....
Article
Two's complement multipliers are important for a wide range of applications. In this paper, we present a technique to reduce by one row the maximum height of the partial product array generated by a radix-4 Modified Booth Encoded multiplier, without any increase in the delay of the partial product generation stage. This reduction may allow for a fa...
Article
In this work we propose a new decimal redundant CORDIC algorithm to manage transcendental functions, using floating-point representation. The algorithms determine the direction of the elementary rotation using sign estimations. Unlike binary redundant CORDIC, repetition of iterations is not required to ensure convergence since novel decimal codes...
Conference Paper
We present a novel method for hardware design of combined binary/decimal multi-operand adders. More specifically, we apply this method to architectures based on binary CSA (carry-save adder) trees, which are of interest for VLSI implementation of high performance multipliers and other low latency arithmetic units. A remarkable feature of the propos...
Article
The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multipli...
Conference Paper
Full-text available
In this paper we propose a simple Cache Filtering Mechanism (CFM-TM) for TM systems that are decoupled from caches, with the aim of reducing the use of the transactional memory baseline system. We propose to use CFM-TM with LogTM-SE as the baseline system, because it uses signatures for conflict detection (a resource that might be used for other purpo...
Article
A recent work proposed to simplify fat-trees with adaptive routing by means of a load-balancing deterministic routing algorithm. The resultant network has performance figures comparable to the more complex adaptive routing fat-trees when packets need to be delivered in order. In a second work by the same authors published in IEEE CAL, they propose...
Conference Paper
Adders are critical for microprocessor design. Current designs use variations of parallel prefix schemes. A method introduced by Ling [7] may improve this kind of adder. However, as recent research publications demonstrate, the use of the Ling scheme in prefix adders is not a mature and clear concept. In this work we show how to easily extend any...
Article
The unfolded and pipelined CORDIC is a high-performance hardware element that produces a wide variety of one and two argument functions with high throughput. The reduction in delay, power, and area (cost) are of significant interest regarding this module due to its high demand for resources. The linear approximation to rotation has been proposed to...
Conference Paper
Full-text available
In this paper we present the algorithm and architecture of a radix-10 floating-point divider based on an SRT non-restoring digit-by-digit algorithm. The algorithm uses conventional techniques developed to speed up radix-2^k division such as signed-digit (SD) redundant quotient and digit selection by constant comparison using a carry-save est...
Conference Paper
Full-text available
This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry-save multioperand addition that uses a novel BCD-4221 recoding for decimal digits. It significantly improves the area and latency of the partial product reduction tree with respect to previous proposals. We...
Chapter
We present a high-radix Cordic rotation algorithm, which results in a reduction of the number of iterations. Carry-save representation is used and the selection function is performed by rounding, except for i=0 where a small table is necessary. The scale factor is not constant, but is efficiently computed in logarithmic form and compensated by a hi...
Article
Full-text available
In this paper, we propose a class of division algorithms with the aim of reducing the delay of the selection of the quotient digit by introducing more concurrency and flexibility in its computation. From the proposed class of algorithms, we select one that moves part of the selection function out of the critical path, with a corresponding reduction...
Conference Paper
Full-text available
The reciprocal and square-root reciprocal operations are important in several applications. For these operations, we present algorithms that combine a digit-by-digit module and one iteration of a quadratic-convergence approximation. The latter is implemented by a digit-recurrence, which uses the digits produced by the digit-by-digit part. In this w...
Conference Paper
Full-text available
The pipelined CORDIC with linear approximation to rotation has been proposed to achieve reductions in delay, power and area; however, the schemes for rotation (multiplication) and vectoring (division) complicate implementation in a single unit. In this work, we improve the linear approximation scheme, leading to a unified implementation for rotatio...
Article
Graphics processors require strong arithmetic support to perform computational kernels over data streams. Because of the current implementation using the basic arithmetic operations, the algorithms are given in algebraic terms. However, since the operations are really of a geometric nature, it seems to us that more flexibility in the implementation...
Article
In this work, we present a reciprocal square root algorithm by digit recurrence and selection by a staircase function and the radix-4 implementation. As in similar algorithms for division and square root, the results are obtained correctly rounded in a straightforward manner (in contrast to existing methods to compute the reciprocal square root). A...
Article
In this work we present an implementation of the exponential function in double precision, in a unit that supports IEEE floating-point arithmetic. As in existing proposals, the implementation is based on the use of a floating-point multiplier and additional hardware. We decompose the computation into three subexponentials. The first and third subexpon...
Conference Paper
Since a large portion of the critical path in an implementation of radix-4 division corresponds to the delay of the quotient-digit selection module, it is of interest to reduce this delay. The proposal of this paper extends the approach presented recently of prestoring the selection constants corresponding to the actual value of the divisor and to...
Conference Paper
We present hardware primitives for 3D rotation and vector normalization for high-throughput 3D graphics and animation. The primitives are based on the 2D and 3D CORDIC algorithms, in contrast to more conventional mac-based engines. Also considered are conversions among rotation representations and rotation composition based on the same primitives.
Conference Paper
In this work we present a reciprocal square-root algorithm by digit recurrence and selection by a staircase function, and the radix-4 implementation. As in similar algorithms for division and square-root, the results are obtained correctly rounded in a straightforward manner (in contrast to existing methods to compute the reciprocal square-root). Alth...
Conference Paper
Full-text available
We present a reciprocal square-root algorithm by digit recurrence and selection by a staircase function, and the radix-4 implementation. As in similar algorithms for division and square-root, the results are obtained correctly rounded in a straightforward manner (in contrast to existing methods to compute the reciprocal square-root). Although apparent...
Article
The CORDIC algorithm has many applications, e.g. digital signal processing, and its high latency is an important problem to overcome. A 32-bit precision implementation of very-high radix circular vectoring CORDIC in a word-serial architecture is presented. It is restricted to angle calculation, and has been implemented using a VHDL-...
Article
Full-text available
A very-high radix algorithm and implementation for circular CORDIC in vectoring mode is presented. As for division, to simplify the selection function, the operands are pre-scaled. However, in the CORDIC algorithm the coordinate x varies during the execution so several scalings might be needed; we show that two scalings are sufficient. Moreover,...
Article
In this work we present a Cordic rotator, using carry-save arithmetic, based on the prediction of all the coefficients into which the rotation angle is decomposed. The prediction algorithm is based on the use of radix-2 microrotations with multiple shifts in the first iterations and the use of a redundant radix-2 and radix-4 representation for...
Article
In this paper we present a high-radix Cordic rotation algorithm, which results in a reduction of the number of iterations. Carry-save representation is used, leading to a fast iteration time. The selection function is performed by rounding, except for i = 0 where a small table is necessary. The scale factor is not constant, but is efficiently comp...
Article
CORDIC-based algorithms to compute cos⁻¹(t), sin⁻¹(t) and √(1 − t²) are proposed. The implementation requires a standard CORDIC module plus a module to compute the direction of rotation, this being the same hardware required for the extended CORDIC vectoring, recently proposed by the authors. Although these functions can be obtaine...
Article
In this work we present the VLSI implementation of an application specific processor that performs the angle calculation and rotation operation. This operation is important in matrix algebra and its hardware implementation is of interest for many real time applications. The computation of the angle and the rotation are performed by means of the red...
Article
A very-high radix algorithm and implementation for circular CORDIC is presented. We first present in depth the algorithm for the vectoring mode in which the selection of the digits is performed by rounding of the control variable. To assure convergence with this kind of selection, the operands are prescaled. However, in the CORDIC algorithm, the co...
Article
A very-high radix algorithm and implementation for CORDIC rotation in circular and hyperbolic coordinates is presented. The selection function consists of rounding the residual. It is shown that this assures convergence from the second iteration on. For the first iteration, the selection is done by table, using a lower radix than for the remainin...
Article
CORDIC-based algorithms to compute $\cos^{-1}(t)$, $\sin^{-1}(t)$ and $\sqrt{1 - t^2}$ are proposed. The implementation requires a standard CORDIC module plus a module to compute the direction of rotation, this being the same hardware required for the extended CORDIC vectoring, recently proposed by the authors [T. Lang and E. Antelo...
Article
Full-text available
Many applications require the evaluation of rotations at high speeds. However, there is a trade-off between the chip area and the latency. In this paper we develop a digit on-line pipelined array architecture based on the radix-4 CORDIC algorithm in rotation mode. The radix-4 CORDIC algorithm halves the number of microrotations with respect to the...
Article
In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode,...
Article
The computation of additional functions in the CORDIC module increases its flexibility. We consider here the extension of the vectoring mode (angle calculation) so that the vector is rotated until one of the coordinates (for instance y) attains a target value t (in contrast to the value 0, as in standard vectoring). The main problem in the algorith...
Article
Full-text available
This paper presents a new design for two operand normalization. The two operand normalization operation involves the normalization of at least one of two operands by left shifting both by the same amount. Our design performs the computation of the shift by making an OR of the bits of both operands in a tree network, encoding the position of the fir...
Article
A very-high radix digit-recurrence algorithm for the operation √(x/d) is developed, with residual scaling and digit selection by rounding. This is an extension of the division and square-root algorithms presented previously, and for which a combined unit was shown to provide a fast execution of these operations. The architecture of a combined unit...
Article
A very-high radix digit-recurrence algorithm for the operation $\sqrt{x/d}$ is developed, with residual scaling and digit selection by rounding. This is an extension of the division and square-root algorithms presented previously, and for which a combined unit was show...
Article
In this paper, we consider the errors appearing in angle computations with the CORDIC algorithm (circular and hyperbolic coordinate systems) using fixed-point arithmetic. We include errors arising not only from the finite number of iterations and the finite width of the data path, but also from the finite number of bits of the input. We show that t...
Article
Full-text available
Traditionally, CORDIC algorithms have employed radix-2 in the first n/2 microrotations (n is the precision in bits) in order to preserve a constant scale factor. The authors present a full radix-4 CORDIC algorithm in rotation mode and circular coordinates and its corresponding selection function, and propose an efficient technique for the compensat...
Article
Full-text available
In this work we present a new CORDIC algorithm for the vectoring mode, based on the use of radix-4, preserving a complexity in the microrotations that is similar to that of the conventional radix-2 CORDIC. The use of this radix, together with the inclusion in the CORDIC algorithm of the zero skipping technique, reduces by more than half the number...
Article
A very-high radix digit-recurrence algorithm for the operation √(x/d) is developed, with residual scaling and digit selection by rounding. This is an extension of the division and square-root algorithms presented previously, and for which a combined unit was shown to provide a fast execution of these operations. The architecture of a combined uni...
Conference Paper
CORDIC-based algorithms to compute cos⁻¹(t), sin⁻¹(t) and √(1 − t²) are proposed. The implementation requires a standard CORDIC module plus a module to compute the direction of rotation, this being the same hardware required for the extended CORDIC vectoring, recently proposed by the authors. Although these functions...
Conference Paper
The computation of additional functions in the CORDIC module increases its flexibility. We consider here the extension of the vectoring mode (angle calculation) so that the vector is rotated until one of the coordinates (for instance y) attains a target value t (in contrast to the value 0, as in standard vectoring). The main problem in the algorith...
Article
Full-text available
We present a unified mixed radix CORDIC algorithm with carry-save arithmetic with a constant scale factor. The pipelined architecture of the processor is determined by a unique sequence of microrotations for the two modes of operation (rotation and vectoring) in circular and hyperbolic coordinates. The combination of radix-2 and radix-4 microrotati...
Conference Paper
Full-text available
In this paper we present a new CORDIC algorithm for the vectoring mode, based on the use of radix-4 preserving a complexity in the microrotations that is similar to that of the conventional radix-2 CORDIC. The use of this radix, together with the inclusion in the CORDIC algorithm of the zero skipping technique, reduces by more than half the number...
Conference Paper
The compensation of scale factor imposes significant computation overhead on the CORDIC algorithm. In this paper we will propose two algorithms and architectures in order to perform the compensation of the scale factor in parallel with the computation of the CORDIC iterations. This way it is not necessary to carry out the final multiplication or ad...
Conference Paper
Many applications require the evaluation of rotations at high speeds. However, there is a trade-off between the chip area and the latency. In this paper we develop a digit on-line pipelined array architecture based on the radix-4 CORDIC algorithm in rotation mode. The radix-4 CORDIC algorithm halves the number of microrotations with respect to the tradi...
Conference Paper
We present a Cordic rotator, using carry-save arithmetic, based on the prediction of all the coefficients into which the rotation angle is decomposed. The prediction algorithm is based on the use of radix-2 microrotations with multiple shifts in the first iterations and the use of a redundant radix-2 and radix-4 representation for the coefficients...
Conference Paper
We present the design and implementation of the Sobel operator in an application specific integrated circuit. Systolic processor arrays were employed for an efficient exploitation of the advantages of VLSI technology. The architecture obtained is highly regular and simple. The performance of the architecture is improved by means of the use of carry...
Article
In this work we develop a generalization of the CORDIC algorithm for any radix in three coordinate systems, linear, circular and hyperbolic. We carry out a comparative study between different radixes at the number of additions level, due to the fact that the complexity in additions determines the total hardware associated with the implementation of...
Article
Full-text available
Floating-point implementations of the logarithm function require computing a fixed-point approximation with high accuracy when the result is close to zero. Thus, iterative methods with linear convergence for the logarithm introduce a significant latency penalty when the input argument X ≈ 1. Some solutions use a second order polynomial approximati...
