Figure 1 - uploaded by Florent De Dinechin
An 8 × 8-bit multiplier by 221, using the recoding 1 0 0 1̄ 0 0 1̄ 0 1 (where 1̄ denotes −1, i.e. 221 = 2^8 − 2^5 − 2^2 + 2^0; the plain string 100100101 loses the overbars of the signed digits)

Source publication
Conference Paper
Full-text available
This paper presents a survey of techniques to implement multiplications by constants on FPGAs. It shows in particular that a simple and well-known technique, canonical signed recoding, can help design smaller constant multiplier cores than those present in current libraries. An implementation of this idea in Xilinx JBits is detailed and discussed...

Contexts in source publication

Context 1
... least significant bits of this partial sum may be output directly, as they don't appear in any subsequent operation. Figure 1 shows for example a multiplier by 221. This core generator consists of fewer than 1000 lines of heavily commented Java. ...
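The canonical signed recoding the paper builds on can be sketched in a few lines. The routine below is a generic CSD recoder written in Python for illustration; the paper's actual generator is Java on top of JBits, and this sketch covers only the recoding step, not the slice placement.

```python
def csd_digits(n):
    """Canonical signed-digit (CSD) recoding of a non-negative integer.

    Returns digits in {-1, 0, 1}, least-significant first. No two
    adjacent digits are non-zero, which minimises the number of
    non-zero partial products (adder slices) in the multiplier.
    """
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d           # n is now divisible by 4, so next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

# 221 = 2^8 - 2^5 - 2^2 + 2^0: four non-zero digits instead of the
# six ones of plain binary 11011101.
digits = csd_digits(221)
print(digits)  # [1, 0, -1, 0, 0, -1, 0, 0, 1]
assert sum(d << i for i, d in enumerate(digits)) == 221
```

Each non-zero digit costs one adder/subtractor row in the rectangular architecture of Figure 1, which is why reducing the non-zero count shrinks the core.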
Context 2
... target Virtex chips, where the adders may be very efficiently implemented using the dedicated fast carry logic. On Fig. 1, each grey block is a LUT configured as a full adder, and on Fig. 2, each small square represents the output of a LUT (the LUTs are grouped by CLBs). In a column of CLBs it is thus possible to place two fast adder ...
Context 3
... the core, so we implemented another solution which has the same cost and other advantages: a first slice computes −x, and then all the slices are adders. Care must be taken however, when we know that the current partial sum is negative, to perform a sign extension of this partial sum, i.e. feed the free inputs with ones instead of zeroes (see Fig. 1). One advantage of this solution is that it makes the handling of two's complement signed numbers easy: as we already manipulate internally x and −x, operating on signed input and signed constants is only a matter of setting the sign extension bits properly (although this is not implemented ...
Context 4
... the least significant bits of the result are not routed to a side of the core (contrary to what Fig. 1 could lead one to believe). These outputs are JBits "ports", accessible to the router without necessarily having to worry about their actual location. ...

Similar publications

Article
Full-text available
A Fuzzy Logic Controller (FLC) comprises three operations: fuzzification of the inputs, the knowledge base (database and rule base), and defuzzification of the output. In this paper, our fuzzy controller has two inputs and one output, each with five membership functions. This fuzzy controller will pass through two operations; the first is...
Conference Paper
Full-text available
Configurable computing is emerging as an important new organizational structure for implementing computations. It combines the post-fabrication programmability of processors with the spatial computational style most commonly employed in hardware designs. The result changes the traditional "hardware" and "software" boundaries, providing an opportunity for g...
Article
Full-text available
Signal generation and measurement have been widely used in many engineering applications, such as for creating test signals in radar, communication, and software-defined radio. In field programmable gate array (FPGA) design, to generate and measure an analog signal, compatible software and hardware interfaces are required. Traditionally, hardware d...
Article
Full-text available
This paper is concerned with FPGA implementation of CORDIC schemes for fast and silicon-area-efficient computation of the sine and cosine functions. The results of a theoretical investigation into redundant CORDIC are presented. A summary of CORDIC synthesis results based on Actel and Xilinx FPGAs is given. Finally applications of CORDIC sine and cosin...
Article
Full-text available
During the last years, convolutional neural networks have been used for different applications, thanks to their potential to carry out tasks using a reduced number of parameters compared with other deep learning approaches. However, power consumption and memory footprint constraints, typical of edge and portable applications, usua...

Citations

... Due to this structural feature, when implementing arithmetic on an FPGA it is possible to use table methods based on different table algorithms. These are, for example, the constant-factor multiplier method based on canonical recoding, using special algorithms to find optimal chains of adders, subtractors, and shift elements [35]; the constant-factor multiplier construction method, using fine-grained FPGA memory resources and a special table search method [36]; and the method using pre-computation of partial products [37]. ...
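The table method alluded to for [36] can be illustrated with a KCM-style sketch in Python. The constant 221 and the 4-bit slice width are illustrative choices (4-bit slices match 4-input FPGA LUT memories); the cited method's actual table-search procedure is not reproduced here.

```python
CONSTANT = 221
K = 4  # LUT input width, matching a 4-input FPGA LUT memory
TABLE = [CONSTANT * v for v in range(1 << K)]  # one small ROM per K-bit slice

def kcm_multiply(x, width=8):
    """Table-based (KCM-style) constant multiplication: look up the
    product of each K-bit slice of x in a precomputed ROM, then sum
    the partial products with the appropriate shifts."""
    result = 0
    for i in range(0, width, K):
        chunk = (x >> i) & ((1 << K) - 1)
        result += TABLE[chunk] << i
    return result

assert kcm_multiply(173) == 221 * 173
```

The trade-off versus the adder-chain methods is that the cost here depends on the input width (number of slices) rather than on the number of non-zero digits of the constant.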
Article
Full-text available
Traditionally, generic multipliers are used to multiply signals by a constant, but multiplication by a constant can be considered a special operation requiring the development of specialized multipliers. Different methods are being developed to accelerate multiplications; many of them implement multiplication on a group of bits. The best known is Booth's algorithm, which implements two-digit multiplication. We propose a modification of the algorithm that multiplies by three digits at a time. This solution reduces the number of partial products and accelerates the operation of the multiplier. The paper presents the results of a comparative analysis of the characteristics of Booth's algorithm and the proposed algorithm. Additionally, a comparison with built-in FPGA multipliers is illustrated.
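The idea of scanning several multiplier bits per step can be sketched with a standard radix-8 modified-Booth recoder. This is an illustration of the general technique, not the authors' exact three-digit algorithm, and it is restricted to non-negative inputs with a zero top bit.

```python
def booth_radix8_digits(n, width):
    """Radix-8 modified Booth recoding of a non-negative integer
    (the MSB of the width-bit field must be 0). Scans three bits per
    step with one bit of overlap, producing digits in [-4, 4].
    Digits +/-3 need a precomputed 3x multiple in hardware.
    """
    bits = [(n >> i) & 1 for i in range(width)] + [0, 0, 0]  # zero-pad top
    prev = 0  # overlap bit just below the current group
    digits = []
    for i in range(0, width, 3):
        d = -4 * bits[i + 2] + 2 * bits[i + 1] + bits[i] + prev
        digits.append(d)
        prev = bits[i + 2]
    return digits

# 221 in 9 bits recodes into three radix-8 digits, so a multiplier
# generates 3 partial products instead of 8 or 9.
digits = booth_radix8_digits(221, 9)
print(digits)  # [-3, 4, 3]
assert sum(d * 8**i for i, d in enumerate(digits)) == 221
```

Each digit selects one shifted multiple of the multiplicand, so the partial-product count drops by roughly a factor of three compared with bit-serial generation.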
... For example, due to rounding errors (binary calculation), the initial index value of the Vancouver Stock Exchange, which began at 1000.000 (decimal), dropped to 574.081 after two years instead of the correct result of 1098.892 [10]. On the other hand, from the hardware-realization point of view, the use of generic multipliers to carry out constant multiplication is not advantageous in terms of operation speed, implementation area and power consumption [11]. ...
Article
Due to the increasing demand for decimal calculations in the business, financial and economic world, decimal arithmetic circuits have received much attention from system designers. This is mainly because these applications depend heavily on decimal arithmetic, since the results must match exactly those obtained by human calculations. While decimal multiplication is one of the most frequent and complex-to-implement decimal operations, the special case of constant decimal multiplication is widely used in economic and financial applications. In this paper, we propose two ideas, named "Constant Decimal TCSD" (CDT) and "Constant Decimal DDDS" (CDD), and their hardware implementations for realizing multiple constant decimal multiplication. In the CDT and CDD architectures, the partial products are generated using a set of positive multiplicand multiples coded in 2's complement signed-digit format (TCSD) and binary coded decimal (BCD), respectively. We also present two new (3:1) compressors to reduce the number of partial products in both designs, one based on a new Double Decimal Digit Set (DDDS), which is not only self-complementing but whose redundancy also allows carry-free addition. Finally, a redundant-to-non-redundant converter recodes the TCSD and DDDS product to BCD in the first and second schemes, respectively. Hardware synthesis evaluation shows that, compared to the most recent 16 × 16 decimal multipliers, the delay, area, power consumption and PDP of the proposed multiple constant multipliers improve by up to 57%, 89%, 93% and 97%, respectively.
... In the early days of microprocessors it was established that a specialized circuit was needed to perform this task [2]. Contemporary digital circuits that perform digital signal processing (DSP), error correction codes (ECC) and fast Fourier transforms (FFT) all implement this function [3], [6]. Specifically, FFT processing is one of the most critical components in orthogonal frequency division multiplexing (OFDM) [4]. ...
... From a hardware point of view, it is always a waste of space and time to implement a generic constant multiplier [6]. Considering this, we have realized our multiplier design using the simple functions of shifting and addition. ...
Conference Paper
Full-text available
The rising complexity of embedded digital applications and the growing importance of time-to-market require EDA tools to automate the design and implementation process of various IP blocks. One very important class of EDA tools is the generation of hardware descriptions for popular IP blocks. The multiplication by an integer constant is a special type of problem that is required in a plethora of situations. Here, we present an online tool that can generate HDL descriptions of constant-multiplication intellectual property blocks, using only elementary operations such as shifting and addition. Our synthesized circuits on the Xilinx Virtex 6 FPGA XC6VLX760 operate at up to 589 MHz.
... Thus, in order to effectively use embedded primitive and macro resources, the design entry needs to be modified. There has been subsequent work regarding the implementation of multipliers on FPGAs [17–33]. These mainly focus on modifying the multiplier architecture to achieve performance improvement. ...
Article
Full-text available
Modern Field Programmable Gate Arrays (FPGA) are fast moving into the consumer market and their domain has expanded from prototype designing to low and medium volume productions. FPGAs are proving to be an attractive replacement for Application Specific Integrated Circuits (ASIC) primarily because of the low Non-recurring Engineering (NRE) costs associated with FPGA platforms. This has prompted FPGA vendors to improve the capacity and flexibility of the underlying primitive fabric and include specialized macro support and intellectual property (IP) cores in their offerings. However, most of the work related to FPGA implementations does not take full advantage of these offerings. This is primarily because designers rely mainly on the technology-independent optimization to enhance the performance of the system and completely neglect the speed-up that is achievable using these embedded primitives and macro support. In this paper, we consider the technology-dependent optimization of fixed-point bit-parallel multipliers by carrying out their implementations using embedded primitives and macro support that are inherent in modern day FPGAs. Our implementation targets three different FPGA families viz. Spartan-6, Virtex-4 and Virtex-5. The implementation results indicate that a considerable speed up in performance is achievable using these embedded FPGA resources. Keywords— Fixed point arithmetic, FPGA primitives, VHDL, Instantiation based coding, Look-up table.
... This is simply because these algorithms do not consider the implementation cost of each addition/subtraction operation in terms of the main building blocks of FPGAs, e.g., look-up tables (LUTs) or configurable logic blocks (CLBs). Although there exist high-level techniques [8], [9] whose solutions are realized on FPGAs, to the best of our knowledge, there exists no high-level MCM algorithm that targets the optimization of MCM area on FPGAs considering its specifications. Hence, this paper introduces an approximate algorithm, called LUTOR, that initially applies the Hcub algorithm [5] on an MCM instance to find a solution with the fewest number of operations. ...
Article
Full-text available
The multiple constant multiplications (MCM) operation, which realizes the multiplication of a set of constants by a variable, has a significant impact on the complexity and performance of the digital finite impulse response (FIR) filters. Over the years, many high-level algorithms and design methods have been proposed for the efficient implementation of the MCM operation using only addition, subtraction, and shift operations. The main contribution of this paper is the introduction of a high-level synthesis algorithm that optimizes the area of the MCM operation and, consequently, of the FIR filter design, on field programmable gate arrays (FPGAs) by taking into account the implementation cost of each addition and subtraction operation in terms of the number of fundamental building blocks of FPGAs. It is observed from the experimental results that the solutions of the proposed algorithm yield less complex FIR filters on FPGAs with respect to those whose MCM part is implemented using prominent MCM algorithms and design methods.
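The sharing that MCM algorithms exploit can be seen in a toy example. The constants 5, 13 and 53 are hypothetical, chosen only for exposition; the point is that three products are obtained with three adders by reusing intermediate results, where shifts are free wiring in hardware.

```python
def mcm_example(x):
    """Multiple constant multiplication with shared intermediates:
    each line costs one adder, shifts are free (wired) in hardware."""
    t5 = (x << 2) + x     # 5x  = 4x + x
    t13 = (x << 3) + t5   # 13x = 8x + 5x   (reuses 5x)
    t53 = (t13 << 2) + x  # 53x = 52x + x   (reuses 13x)
    return t5, t13, t53

assert mcm_example(3) == (15, 39, 159)
```

Implementing each constant independently in CSD would need more adders in total; finding such shared adder graphs, while accounting for per-adder LUT cost, is exactly the optimization problem the abstract describes.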
... A single unoptimised multiplication by 4/π may account for about one third of the area of a dual sine/cosine operator [7]. The present article essentially reconciles two research directions that were so far treated separately: on the one side, the optimisation of multiplication by an integer constant, addressed in section 2, and on the other side the issue of correct rounding of multiplication or division by an arbitrary precision constant, addressed in section 4. Integer constant multiplication has been well studied, with many good heuristics published [3, 6, 13, 5, 1, 15]. Its theoretical complexity is still an open question: it was only recently proven sub-linear, although using an approach which is useless in practice [9, 15]. ...
... The CSD recoding of a constant may be translated into a rectangular architecture [5], an example of which is given by Figure 1. This architecture corresponds to the following parenthesization: 221X = X<<8 + (−X<<5 + (−X<<2 + X)). ...
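That parenthesization is easy to check numerically; the Python sketch below evaluates it slice by slice, with arbitrary-precision integers standing in for the hardware adder rows.

```python
def times_221(x):
    """221*x via the CSD parenthesization
    X<<8 + (-(X<<5) + (-(X<<2) + X)),
    i.e. the recoding 221 = 2^8 - 2^5 - 2^2 + 2^0."""
    partial = x - (x << 2)        # innermost slice: x - 4x = -3x
    partial = partial - (x << 5)  # -3x - 32x      = -35x
    return partial + (x << 8)     # 256x - 35x     = 221x

assert all(times_221(x) == 221 * x for x in range(256))
```

Each line corresponds to one adder/subtractor row of the rectangular architecture; the shifts are realized as wiring offsets between rows.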
Conference Paper
Reconfigurable circuits now have a capacity that allows them to be used as floating-point accelerators. They offer massive parallelism, but also the opportunity to design optimised floating-point hardware operators not available in microprocessors. Multiplication by a constant is an important example of such an operator. This article presents an architecture generator for the correctly rounded multiplication of a floating-point number by a constant. This constant can be a floating-point value, but also an arbitrary irrational number. The multiplication of the significands is an instance of the well-studied problem of constant integer multiplication, for which improvement to existing algorithms are also proposed and evaluated.
... For example, each architecture uses multipliers by constants. For these operators, various optimisation techniques can be used (Chapman, 1994; de Dinechin et al., 2000). A more general question is the design methodology for such operators. ...
Article
Reconfigurable FPGA circuits now have such capacity that they can be used to accelerate floating-point computation. The literature (and, more recently, the vendors) offers operators for the four basic operations. The next step is to offer operators for the most commonly used elementary functions. Among these, we propose dedicated architectures for evaluating the exponential, logarithm, sine and cosine functions, and we study the possible trade-offs. For each of these functions, a single one of our operators outperforms general-purpose processors by a factor of ten in terms of throughput, while occupying only a fraction of the FPGA's hardware resources. All these operators are freely available at http://www.ens-lyon.fr/LIP/Arenaire/.
... By definition, a constant has a fixed exponent, therefore the floating point is of little significance here: all the research that has been done on integer constant multiplication [4,26,6,14] can be used straightforwardly. To sum up this research, a constant multiplier will always be at most half the size of (and usually much smaller than) a standard multiplier implemented using only CLBs. ...
Article
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies re-implement in the FPGA the operators present in a processor. This is a safe and relatively straightforward approach, but it doesn't exploit the greater flexibility of the FPGA. This article is a survey of the many ways in which the FPGA implementation of a given floating-point computation can be not only faster, but also more accurate than its microprocessor counterpart. Techniques studied here include custom precision, specific accumulator design, dedicated architectures for coarser operators which have to be implemented in software in processors, and others. A real-world biomedical application illustrates these claims. This study also points to how current FPGA fabrics could be enhanced for better floating-point support.
... By definition, a constant has a fixed exponent, therefore the floating point is of little significance here: all the research that has been done on integer constant multiplication [4,26,6,14] can be used straightforwardly. To sum up this research, a constant multiplier will always be at most half the size of (and usually much smaller than) a standard multiplier implemented using only CLBs. ...
Conference Paper
Full-text available
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies re-implement in the FPGA the operators present in a processor. This conservative approach is relatively straightforward, but it doesn't exploit the greater flexibility of the FPGA. We survey the many ways in which the FPGA implementation of a given floating-point computation can be not only faster, but also more accurate than its microprocessor counterpart. Techniques studied here include custom precision, mixing and matching fixed- and floating-point, specific accumulator design, dedicated architectures for coarser operators implemented as software in processors (such as elementary functions or Euclidean norms), operator specialization such as constant multiplication, and others. The FloPoCo project (http://www.ens-lyon.fr/LIP/Arenaire/Ware/FloPoCo/) aims at providing such non-standard operators. As a conclusion, current FPGA fabrics could be enhanced to improve floating-point performance. However, these enhancements should not take the form of hard FPU blocks as others have suggested. Instead, what is needed is smaller building blocks more generally useful to the implementation of floating-point operators, such as cascadable barrel shifters and leading zero counters.