Figure 1 - uploaded by Florent De Dinechin
An 8 × 8-bit multiplier by 221, using the recoding 1 0 0 1̄ 0 0 1̄ 0 1 (where 1̄ denotes −1, i.e. 221 = 2^8 − 2^5 − 2^2 + 2^0; the plain string 100100101 loses the overbars of the signed digits)

Source publication
Conference Paper
Full-text available
This paper presents a survey of techniques to implement multiplications by constants on FPGAs. It shows in particular that a simple and well-known technique, canonical signed recoding, can help design smaller constant multiplier cores than those present in current libraries. An implementation of this idea in Xilinx JBits is detailed and discussed...

Contexts in source publication

Context 1
... least significant bits of this partial sum may be output directly, as they don't appear in any subsequent operation. Figure 1 shows for example a multiplier by 221. This core generator consists of fewer than 1000 lines of heavily commented Java. ...
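The canonical signed recoding the paper builds on can be sketched in a few lines. The routine below is a generic CSD recoder written in Python for illustration; the paper's actual generator is Java on top of JBits, and this sketch covers only the recoding step, not the slice placement.

```python
def csd_digits(n):
    """Canonical signed-digit (CSD) recoding of a non-negative integer.

    Returns digits in {-1, 0, 1}, least-significant first. No two
    adjacent digits are non-zero, which minimises the number of
    non-zero partial products (adder slices) in the multiplier.
    """
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)  # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d           # n is now divisible by 4, so next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

# 221 = 2^8 - 2^5 - 2^2 + 2^0: four non-zero digits instead of the
# six ones of plain binary 11011101.
digits = csd_digits(221)
print(digits)  # [1, 0, -1, 0, 0, -1, 0, 0, 1]
assert sum(d << i for i, d in enumerate(digits)) == 221
```

Each non-zero digit costs one adder/subtractor row in the rectangular architecture of Figure 1, which is why reducing the non-zero count shrinks the core.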
Context 2
... target Virtex chips, where the adders may be very efficiently implemented using the dedicated fast carry logic. On Fig. 1, each grey block is a LUT configured as a full adder, and on Fig. 2, each small square represents the output of a LUT (the LUTs are grouped by CLBs). In a column of CLBs it is thus possible to place two fast adder ...
Context 3
... the core, so we implemented another solution which has the same cost and other advantages: a first slice computes −x, and then all the slices are adders. Care must be taken however, when we know that the current partial sum is negative, to perform a sign extension of this partial sum, i.e. feed the free inputs with ones instead of zeroes (see Fig. 1). One advantage of this solution is that it makes the handling of two's complement signed numbers easy: as we already manipulate internally x and −x, operating on signed input and signed constants is only a matter of setting the sign extension bits properly (although this is not implemented ...
Context 4
... the least significant bits of the result are not routed to a side of the core (contrary to what Fig. 1 could lead one to believe). These outputs are JBits "ports", accessible to the router without necessarily having to worry about their actual location. ...

Similar publications

Article
Full-text available
A Fuzzy Logic Controller (FLC) comprises three operations: fuzzification of the inputs, the knowledge base (database and rule base), and defuzzification of the output. In this paper, our fuzzy controller has two inputs and one output, each with five membership functions. This fuzzy controller will pass through two operations; the first is...
Conference Paper
Full-text available
Configurable computing is emerging as an important new organizational structure for implementing computations. It combines the post-fabrication programmability of processors with the spatial computational style most commonly employed in hardware designs. The result changes the traditional "hardware" and "software" boundaries, providing an opportunity for g...
Article
Full-text available
Signal generation and measurement have been widely used in many engineering applications, such as for creating test signals in radar, communication, and software-defined radio. In field programmable gate array (FPGA) design, to generate and measure an analog signal, compatible software and hardware interfaces are required. Traditionally, hardware d...
Article
Full-text available
This paper is concerned with FPGA implementation of CORDIC schemes for fast and silicon-area-efficient computation of the sine and cosine functions. The results of a theoretical investigation into redundant CORDIC are presented. A summary of CORDIC synthesis results based on Actel and Xilinx FPGAs is given. Finally applications of CORDIC sine and cosin...
Article
Full-text available
During the last years, convolutional neural networks have been used for different applications, thanks to their potential to carry out tasks using a reduced number of parameters compared with other deep learning approaches. However, power consumption and memory footprint constraints, typical of edge and portable applications, usua...

Citations

... Due to this structural feature, when implementing arithmetic on an FPGA it is possible to use table methods based on different table algorithms. These are, for example, the constant-factor multiplier method based on canonical recoding, using special algorithms to find optimal chains of adders, subtractors, and shift elements [35]; the constant-factor multiplier construction method, using fine-grained FPGA memory resources and a special table search method [36]; and the method using pre-computation of partial products [37]. ...
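The table method alluded to for [36] can be illustrated with a KCM-style sketch in Python. The constant 221 and the 4-bit slice width are illustrative choices (4-bit slices match 4-input FPGA LUT memories); the cited method's actual table-search procedure is not reproduced here.

```python
CONSTANT = 221
K = 4  # LUT input width, matching a 4-input FPGA LUT memory
TABLE = [CONSTANT * v for v in range(1 << K)]  # one small ROM per K-bit slice

def kcm_multiply(x, width=8):
    """Table-based (KCM-style) constant multiplication: look up the
    product of each K-bit slice of x in a precomputed ROM, then sum
    the partial products with the appropriate shifts."""
    result = 0
    for i in range(0, width, K):
        chunk = (x >> i) & ((1 << K) - 1)
        result += TABLE[chunk] << i
    return result

assert kcm_multiply(173) == 221 * 173
```

The trade-off versus the adder-chain methods is that the cost here depends on the input width (number of slices) rather than on the number of non-zero digits of the constant.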
Article
Full-text available
Traditionally, generic multipliers are used to multiply signals by a constant, but multiplication by a constant can be considered a special operation requiring the development of specialized multipliers. Different methods are being developed to accelerate multiplications; many of them implement multiplication on a group of bits. The best known is Booth's algorithm, which implements two-digit multiplication. We propose a modification of the algorithm that multiplies by three digits at a time. This solution reduces the number of partial products and accelerates the operation of the multiplier. The paper presents the results of a comparative analysis of the characteristics of Booth's algorithm and the proposed algorithm. Additionally, a comparison with built-in FPGA multipliers is illustrated.
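The idea of scanning several multiplier bits per step can be sketched with a standard radix-8 modified-Booth recoder. This is an illustration of the general technique, not the authors' exact three-digit algorithm, and it is restricted to non-negative inputs with a zero top bit.

```python
def booth_radix8_digits(n, width):
    """Radix-8 modified Booth recoding of a non-negative integer
    (the MSB of the width-bit field must be 0). Scans three bits per
    step with one bit of overlap, producing digits in [-4, 4].
    Digits +/-3 need a precomputed 3x multiple in hardware.
    """
    bits = [(n >> i) & 1 for i in range(width)] + [0, 0, 0]  # zero-pad top
    prev = 0  # overlap bit just below the current group
    digits = []
    for i in range(0, width, 3):
        d = -4 * bits[i + 2] + 2 * bits[i + 1] + bits[i] + prev
        digits.append(d)
        prev = bits[i + 2]
    return digits

# 221 in 9 bits recodes into three radix-8 digits, so a multiplier
# generates 3 partial products instead of 8 or 9.
digits = booth_radix8_digits(221, 9)
print(digits)  # [-3, 4, 3]
assert sum(d * 8**i for i, d in enumerate(digits)) == 221
```

Each digit selects one shifted multiple of the multiplicand, so the partial-product count drops by roughly a factor of three compared with bit-serial generation.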
... For example, due to rounding errors (binary calculation), the initial index value of the Vancouver Stock Exchange, which began at 1000.000 (decimal), dropped to 574.081 after two years instead of the correct result of 1098.892 [10]. On the other hand, from the hardware-realization point of view, the use of generic multipliers to carry out constant multiplication is not advantageous in terms of operation speed, implementation area and power consumption [11]. ...
Article
Due to the increasing demand for decimal calculations in the business, financial and economic world, decimal arithmetic circuits have received much attention from system designers. This is mainly because these applications depend heavily on decimal arithmetic, since the results must match exactly those obtained by human calculations. While decimal multiplication is one of the most frequent and complex-to-implement decimal operations, the special case of constant decimal multiplication is widely used in economic and financial applications. In this paper, we propose two ideas, named "Constant Decimal TCSD" (CDT) and "Constant Decimal DDDS" (CDD), and their hardware implementations for realizing multiple constant decimal multiplication. In the CDT and CDD architectures, the partial products are generated using a set of positive multiplicand multiples coded in 2's complement signed-digit format (TCSD) and binary coded decimal (BCD), respectively. We also present two new (3:1) compressors to reduce the number of partial products in both designs, one based on a new Double Decimal Digit Set (DDDS), which is not only self-complementing but whose redundancy also allows carry-free addition. Finally, a redundant-to-non-redundant converter recodes the TCSD and DDDS product to BCD in the first and second schemes, respectively. Hardware synthesis evaluation shows that, compared to the most recent 16 × 16 decimal multipliers, the delay, area, power consumption and PDP of the proposed multiple constant multipliers improve by up to 57%, 89%, 93% and 97%, respectively.
... In the early days of microprocessors it was established that a specialized circuit was needed to perform this task [2]. Contemporary digital circuits that perform digital signal processing (DSP), error correction codes (ECC) and fast Fourier transforms (FFT) all implement this function [3], [6]. Specifically, FFT processing is one of the most critical components in orthogonal frequency division multiplexing (OFDM) [4]. ...
... From a hardware point of view, it is always a waste of space and time to implement a generic constant multiplier [6]. Considering this, we have realized our multiplier design using the simple functions of shifting and addition. ...
Conference Paper
Full-text available
The rising complexity of embedded digital applications and the growing importance of time-to-market require EDA tools to automate the design and implementation process of various IP blocks. One very important class of EDA tools is the generation of hardware descriptions for popular IP blocks. The multiplication by an integer constant is a special type of problem that is required in a plethora of situations. Here, we present an online tool that can generate HDL descriptions of constant-multiplication intellectual property blocks, using only elementary operations such as shifting and addition. Our synthesized circuits on the Xilinx Virtex 6 FPGA XC6VLX760 operate at up to 589 MHz.
... Thus, in order to effectively use embedded primitive and macro resources, the design entry needs to be modified. There has been subsequent work regarding the implementation of multipliers on FPGAs [17–33]. These mainly focus on modifying the multiplier architecture to achieve performance improvement. ...
Article
Full-text available
Modern Field Programmable Gate Arrays (FPGA) are fast moving into the consumer market and their domain has expanded from prototype designing to low and medium volume productions. FPGAs are proving to be an attractive replacement for Application Specific Integrated Circuits (ASIC) primarily because of the low Non-recurring Engineering (NRE) costs associated with FPGA platforms. This has prompted FPGA vendors to improve the capacity and flexibility of the underlying primitive fabric and include specialized macro support and intellectual property (IP) cores in their offerings. However, most of the work related to FPGA implementations does not take full advantage of these offerings. This is primarily because designers rely mainly on the technology-independent optimization to enhance the performance of the system and completely neglect the speed-up that is achievable using these embedded primitives and macro support. In this paper, we consider the technology-dependent optimization of fixed-point bit-parallel multipliers by carrying out their implementations using embedded primitives and macro support that are inherent in modern day FPGAs. Our implementation targets three different FPGA families viz. Spartan-6, Virtex-4 and Virtex-5. The implementation results indicate that a considerable speed up in performance is achievable using these embedded FPGA resources. Keywords— Fixed point arithmetic, FPGA primitives, VHDL, Instantiation based coding, Look-up table.
... This is simply because these algorithms do not consider the implementation cost of each addition/subtraction operation in terms of the main building blocks of FPGAs, e.g., look-up tables (LUTs) or configurable logic blocks (CLBs). Although there exist high-level techniques [8], [9] whose solutions are realized on FPGAs, to the best of our knowledge, there exists no high-level MCM algorithm that targets the optimization of MCM area on FPGAs considering its specifications. Hence, this paper introduces an approximate algorithm, called LUTOR, that initially applies the Hcub algorithm [5] on an MCM instance to find a solution with the fewest number of operations. ...
Article
Full-text available
The multiple constant multiplications (MCM) operation, which realizes the multiplication of a set of constants by a variable, has a significant impact on the complexity and performance of the digital finite impulse response (FIR) filters. Over the years, many high-level algorithms and design methods have been proposed for the efficient implementation of the MCM operation using only addition, subtraction, and shift operations. The main contribution of this paper is the introduction of a high-level synthesis algorithm that optimizes the area of the MCM operation and, consequently, of the FIR filter design, on field programmable gate arrays (FPGAs) by taking into account the implementation cost of each addition and subtraction operation in terms of the number of fundamental building blocks of FPGAs. It is observed from the experimental results that the solutions of the proposed algorithm yield less complex FIR filters on FPGAs with respect to those whose MCM part is implemented using prominent MCM algorithms and design methods.
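The sharing that MCM algorithms exploit can be seen in a toy example. The constants 5, 13 and 53 are hypothetical, chosen only for exposition; the point is that three products are obtained with three adders by reusing intermediate results, where shifts are free wiring in hardware.

```python
def mcm_example(x):
    """Multiple constant multiplication with shared intermediates:
    each line costs one adder, shifts are free (wired) in hardware."""
    t5 = (x << 2) + x     # 5x  = 4x + x
    t13 = (x << 3) + t5   # 13x = 8x + 5x   (reuses 5x)
    t53 = (t13 << 2) + x  # 53x = 52x + x   (reuses 13x)
    return t5, t13, t53

assert mcm_example(3) == (15, 39, 159)
```

Implementing each constant independently in CSD would need more adders in total; finding such shared adder graphs, while accounting for per-adder LUT cost, is exactly the optimization problem the abstract describes.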
... A single unoptimised multiplication by 4/π may account for about one third of the area of a dual sine/cosine operator [7]. The present article essentially reconciles two research directions that were so far treated separately: on the one side, the optimisation of multiplication by an integer constant, addressed in section 2, and on the other side the issue of correct rounding of multiplication or division by an arbitrary precision constant, addressed in section 4. Integer constant multiplication has been well studied, with many good heuristics published [3, 6, 13, 5, 1, 15]. Its theoretical complexity is still an open question: it was only recently proven sub-linear, although using an approach which is useless in practice [9, 15]. ...
... The CSD recoding of a constant may be translated into a rectangular architecture [5], an example of which is given by Figure 1. This architecture corresponds to the following parenthesization: 221X = X<<8 + (−X<<5 + (−X<<2 + X)). ...
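That parenthesization is easy to check numerically; the Python sketch below evaluates it slice by slice, with arbitrary-precision integers standing in for the hardware adder rows.

```python
def times_221(x):
    """221*x via the CSD parenthesization
    X<<8 + (-(X<<5) + (-(X<<2) + X)),
    i.e. the recoding 221 = 2^8 - 2^5 - 2^2 + 2^0."""
    partial = x - (x << 2)        # innermost slice: x - 4x = -3x
    partial = partial - (x << 5)  # -3x - 32x      = -35x
    return partial + (x << 8)     # 256x - 35x     = 221x

assert all(times_221(x) == 221 * x for x in range(256))
```

Each line corresponds to one adder/subtractor row of the rectangular architecture; the shifts are realized as wiring offsets between rows.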
Conference Paper
Reconfigurable circuits now have a capacity that allows them to be used as floating-point accelerators. They offer massive parallelism, but also the opportunity to design optimised floating-point hardware operators not available in microprocessors. Multiplication by a constant is an important example of such an operator. This article presents an architecture generator for the correctly rounded multiplication of a floating-point number by a constant. This constant can be a floating-point value, but also an arbitrary irrational number. The multiplication of the significands is an instance of the well-studied problem of constant integer multiplication, for which improvement to existing algorithms are also proposed and evaluated.
... For example, each architecture uses multipliers by constants. For these operators, various optimisation techniques can be used (Chapman, 1994; de Dinechin et al., 2000). A more general question is the design methodology for such operators. ...
Article
Reconfigurable FPGA circuits now have such capacity that they can be used to accelerate floating-point computation. The literature (and, more recently, the vendors) offers operators for the four basic operations. The next step is to offer operators for the most commonly used elementary functions. Among these, we propose dedicated architectures for evaluating the exponential, logarithm, sine and cosine functions, and we study the possible trade-offs. For each of these functions, a single one of our operators outperforms general-purpose processors by a factor of ten in terms of throughput, while occupying only a fraction of the FPGA's hardware resources. All these operators are freely available at http://www.ens-lyon.fr/LIP/Arenaire/.
... By definition, a constant has a fixed exponent, therefore the floating point is of little significance here: all the research that has been done on integer constant multiplication [4,26,6,14] can be used straightforwardly. To sum up this research, a constant multiplier will always be at most half the size of (and usually much smaller than) a standard multiplier implemented using only CLBs. ...
Article
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies re-implement in the FPGA the operators present in a processor. This is a safe and relatively straightforward approach, but it doesn't exploit the greater flexibility of the FPGA. This article is a survey of the many ways in which the FPGA implementation of a given floating-point computation can be not only faster, but also more accurate than its microprocessor counterpart. Techniques studied here include custom precision, specific accumulator design, dedicated architectures for coarser operators which have to be implemented in software in processors, and others. A real-world biomedical application illustrates these claims. This study also points to how current FPGA fabrics could be enhanced for better floating-point support.
... By definition, a constant has a fixed exponent, therefore the floating point is of little significance here: all the research that has been done on integer constant multiplication [4,26,6,14] can be used straightforwardly. To sum up this research, a constant multiplier will always be at most half the size of (and usually much smaller than) a standard multiplier implemented using only CLBs. ...
Conference Paper
Full-text available
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies re-implement in the FPGA the operators present in a processor. This conservative approach is relatively straightforward, but it doesn't exploit the greater flexibility of the FPGA. We survey the many ways in which the FPGA implementation of a given floating-point computation can be not only faster, but also more accurate than its microprocessor counterpart. Techniques studied here include custom precision, mixing and matching fixed- and floating-point, specific accumulator design, dedicated architectures for coarser operators implemented as software in processors (such as elementary functions or Euclidean norms), operator specialization such as constant multiplication, and others. The FloPoCo project (http://www.ens-lyon.fr/LIP/Arenaire/Ware/FloPoCo/) aims at providing such non-standard operators. As a conclusion, current FPGA fabrics could be enhanced to improve floating-point performance. However, these enhancements should not take the form of hard FPU blocks as others have suggested. Instead, what is needed is smaller building blocks more generally useful to the implementation of floating-point operators, such as cascadable barrel shifters and leading zero counters.