Improving FPGA Performance for Carry-Save Arithmetic

Processor Archit. Lab., Ecole Polytech. Fed. de Lausanne, Lausanne, Switzerland
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Impact Factor: 1.22). 05/2010; DOI: 10.1109/TVLSI.2009.2014380
Source: IEEE Xplore

ABSTRACT The selective use of carry-save arithmetic, where appropriate, can accelerate a variety of arithmetic-dominated circuits. Carry-save arithmetic occurs naturally in a variety of DSP applications, and further opportunities to exploit it can be exposed through systematic data flow transformations that can be applied by a hardware compiler. Field-programmable gate arrays (FPGAs), however, are not particularly well suited to carry-save arithmetic. To address this concern, we introduce the "field programmable counter array" (FPCA), an accelerator for carry-save arithmetic intended for integration into an FPGA as an alternative to DSP blocks. In addition to multiplication and multiply accumulation, the FPCA can accelerate more general carry-save operations, such as multi-input addition (e.g., add k > 2 integers) and multipliers that have been fused with other adders. Our experiments show that the FPCA accelerates a wider variety of applications than DSP blocks and improves performance, area utilization, and energy consumption compared with soft FPGA logic.
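
As a rough illustration of the multi-input addition mentioned in the abstract, the sketch below models carry-save reduction in software. It is not the FPCA design and is not taken from the paper; the function names `carry_save_step` and `multi_input_add` are hypothetical, and the operands are assumed to be non-negative integers. Each step compresses three operands into a sum word and a carry word without propagating carries, so only the final addition of the last two words needs a carry-propagate adder.

```python
def carry_save_step(a, b, c):
    """3:2 compression: reduce three operands to a (sum, carry) pair.

    The bitwise XOR gives the per-bit sum, and the bitwise majority,
    shifted left by one, gives the per-bit carry. No carry ripples
    between bit positions, and a + b + c == sum + carry.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def multi_input_add(operands):
    """Add k integers with carry-save steps and one final ordinary add."""
    values = list(operands)
    # Each pass replaces three operands with two, so the list shrinks by one.
    while len(values) > 2:
        s, c = carry_save_step(values[0], values[1], values[2])
        values = values[3:] + [s, c]
    # Final carry-propagate addition of the remaining (at most two) words.
    return sum(values)


if __name__ == "__main__":
    print(multi_input_add([3, 5, 7, 9]))
```

In hardware, the carry-save steps would be arranged as a tree of compressors rather than processed sequentially as in this sketch.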

  • ABSTRACT: Addition is the key arithmetic operation in most digital circuits and processors. Therefore, their performance and other parameters, such as area and power consumption, are highly dependent on the adders' features. In this paper, we present multispeculation as a way of increasing adders' performance with a low area penalty. In our proposed design, dividing an adder into several fragments and predicting the carry-in of each fragment enables computing every addition in two very short cycles at the most, with 99% or higher probability. Furthermore, based on multispeculation principles, we propose a new strategy for implementing addition chains and hiding most of the penalty cycles due to mispredictions, while at the same time keeping the resource sharing capabilities that are sought in high-level synthesis. Our results show that it is possible to build linear and logarithmic adders more than 4.7× and 1.7× faster than the nonspeculative case, respectively. Moreover, this is achieved with a low area penalty (38% for linear adders) or even an area reduction (-8% for logarithmic adders). Finally, applying multispeculation principles to signal processing benchmarks that use addition chains results in a 25% execution time reduction, with an additional 3% decrease in datapath area with respect to implementations with logarithmic fast adders.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 01/2012; 31(12):1817-1830. · 1.09 Impact Factor
  • ABSTRACT: Interpolation is a basic concept in all fields of science and technology. It involves calculating the neighboring weights of uninterpolated data. This can be done effectively by partial volume interpolation, because it produces smooth changes with small changes in transformation and improves subvoxel accuracy. A partial volume interpolator has multipliers as its main component. In this work, a partial volume interpolation unit is implemented using a Wallace multiplier and a carry-save adder multiplier, and the performance of these multipliers is compared on the basis of the synthesis report.
  • ABSTRACT: In this paper, we describe an energy-efficient Vedic multiplier structure using Energy Efficient Adiabatic Logic (EEAL). The power consumption of the proposed multiplier is significantly low because the energy transferred to the load capacitance is mostly recovered. The proposed 8×8 CMOS and adiabatic multiplier structures have been designed in a TSMC 0.18 μm CMOS process technology and verified with the Cadence Design Suite. Both simulation and measurement results verify the functionality of such logic, making it suitable for implementing energy-aware and performance-efficient very-large-scale integration (VLSI) circuitry.
    Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), 2013 International Multi-Conference on; 01/2013
