Improving FPGA Performance for Carry-Save Arithmetic

Processor Archit. Lab., Ecole Polytech. Fed. de Lausanne, Lausanne, Switzerland
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Impact Factor: 1.14). 05/2010; DOI: 10.1109/TVLSI.2009.2014380
Source: IEEE Xplore

ABSTRACT: The selective use of carry-save arithmetic, where appropriate, can accelerate a variety of arithmetic-dominated circuits. Carry-save arithmetic occurs naturally in a variety of DSP applications, and further opportunities to exploit it can be exposed through systematic data-flow transformations applied by a hardware compiler. Field-programmable gate arrays (FPGAs), however, are not particularly well suited to carry-save arithmetic. To address this concern, we introduce the "field programmable counter array" (FPCA), an accelerator for carry-save arithmetic intended for integration into an FPGA as an alternative to DSP blocks. In addition to multiplication and multiply-accumulation, the FPCA can accelerate more general carry-save operations, such as multi-input addition (e.g., adding k > 2 integers) and multipliers that have been fused with other adders. Our experiments show that the FPCA accelerates a wider variety of applications than DSP blocks and improves performance, area utilization, and energy consumption compared with soft FPGA logic.
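
A minimal word-level sketch of why carry-save arithmetic suits multi-input addition, under stated assumptions: operands are non-negative Python integers, and each 3:2 compressor is modeled as a row of independent full adders (sum bit = XOR of the inputs, carry bit = their majority, shifted one place left), so no carry ripples until a single carry-propagate addition at the end. Feeding extra addends into the same reduction as a multiplier's partial products models a multiplier "fused" with other adders. The function names `csa`, `multi_operand_add`, and `fused_mul_add` are hypothetical, and the code models the arithmetic only, not the FPCA's actual counter-based circuitry.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save compressor: one full adder per bit position.

    x + y + z == s + c, yet no carry travels between bit positions.
    """
    s = x ^ y ^ z                                  # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit majority = carry
    return s, c


def multi_operand_add(operands: list[int]) -> int:
    """Add k non-negative integers: carry-save tree, then ONE carry-propagate add."""
    rows = list(operands)
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3        # rows that form full groups of 3
        next_rows = []
        for i in range(0, grouped, 3):
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[grouped:])           # 0-2 leftover rows pass through
        rows = next_rows
    return sum(rows)                               # the only carry-propagate step


def fused_mul_add(a: int, b: int, addends: list[int]) -> int:
    """a*b + sum(addends) in a single carry-save reduction (hypothetical fused MAC)."""
    partial_products = [a << i for i in range(b.bit_length()) if (b >> i) & 1]
    return multi_operand_add(partial_products + addends)


if __name__ == "__main__":
    print(multi_operand_add([13, 7, 22, 5, 9]))    # 56
    print(fused_mul_add(12, 11, [3, 4]))           # 139 (= 12*11 + 3 + 4)
```

The point of the structure is that only one carry-propagate adder appears regardless of k; every carry-save level has constant depth, independent of the word width.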

Available from: Paolo Ienne, Jul 04, 2015
  • ABSTRACT: Image registration is the process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors. It geometrically aligns two images: the reference image and the floating image. Medical image registration concentrates on aligning two or more images that represent the same anatomy from different angles and that are obtained at different times. In recent efforts, image registration algorithms have been implemented in FPGA technology to improve performance while providing programmability and dynamic configurability. To map points from one image to another, a similarity metric is an important criterion; it can be calculated efficiently using a technique based on mutual information. When the floating image is transformed onto the reference image, the transformed location of a floating-image voxel may not coincide with the location of a reference-image voxel, so interpolation is needed. Partial volume interpolation does this effectively, because it produces smooth changes for small changes in the transformation and improves subvoxel accuracy. Multipliers are the main component of a partial volume interpolator. In this work, the interpolator is implemented using an array multiplier and a CSA (carry-save adder) multiplier, and the performance of the two multipliers is compared.
    Communication Control and Computing Technologies (ICCCCT), 2010 IEEE International Conference on; 11/2010
  • ABSTRACT: Interpolation is a basic concept in all fields of science and technology. It is used to calculate the neighboring weights of uninterpolated data. This can be done effectively by partial volume interpolation, because it produces smooth changes for small changes in the transformation and improves subvoxel accuracy. Multipliers are the main component of a partial volume interpolator. In this work, a partial volume interpolation unit is implemented using a Wallace multiplier and a carry-save adder multiplier, and the performance of the two multipliers is compared on the basis of the synthesis report.
  • ABSTRACT: In this work, we present a real-time implementation of a stereo algorithm on a field-programmable gate array (FPGA). The approach is a phase-based model that allows computation with sub-pixel accuracy. The algorithm uses a robust multi-scale and multi-orientation method that optimizes the extraction of the estimates with respect to the local image structure support. Compared with previous approaches, our work increases the on-chip computational power in order to obtain good accuracy over a large disparity range. In addition, our approach is especially suited to applications in unconstrained environments, thanks to the robustness of the phase information, which can deal with severe illumination changes and with small affine deformations between the image pair. This work also includes image-rectification circuitry in order to exploit the epipolar constraints on the chip. The dedicated circuit can rectify and process images of VGA resolution at a frame rate of 57 fps. The implementation uses a finely pipelined design (also with superscalar units) and multiple user-defined parameters, which lead to a high working frequency and good adaptability to different scenarios. We present several results and compare them with state-of-the-art approaches.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2011; 20(12):2208-2219. DOI: 10.1109/TVLSI.2011.2172007 · 1.14 Impact Factor
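
The image-registration abstracts above compare array, carry-save (CSA), and Wallace multipliers. As a hedged, word-level sketch (not taken from either paper), the code below reduces the same partial products two ways: a linear chain of carry-save adders, as in a carry-save array multiplier, and a Wallace-style tree that compresses disjoint groups of three rows in parallel. Both finish with one carry-propagate addition and return the same product; what differs is the number of carry-save levels on the critical path. All function names are hypothetical, and the multiplier `b` is assumed to fit in `width` bits.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder: bitwise full adders, no carry propagation."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def partial_products(a: int, b: int, width: int) -> list[int]:
    """One shifted copy of a per bit of the width-bit multiplier b (zero rows kept)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def array_multiply(a: int, b: int, width: int) -> tuple[int, int]:
    """Carry-save array: rows enter a linear chain of CSAs.

    Returns (product, number of CSA levels on the critical path).
    """
    rows = partial_products(a, b, width)
    if len(rows) <= 2:
        return sum(rows), 0
    s, c = rows[0], rows[1]
    levels = 0
    for row in rows[2:]:
        s, c = csa(s, c, row)          # each level waits for the previous one
        levels += 1
    return s + c, levels               # final carry-propagate addition


def wallace_multiply(a: int, b: int, width: int) -> tuple[int, int]:
    """Wallace-style tree: each level compresses groups of three rows in parallel."""
    rows = partial_products(a, b, width)
    levels = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, grouped, 3):
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[grouped:])   # 0-2 leftover rows pass through
        rows = next_rows
        levels += 1
    return sum(rows), levels           # final carry-propagate addition


if __name__ == "__main__":
    a, b = 183, 93                     # arbitrary 8-bit operands
    for width in (8, 16, 32):
        p_arr, lv_arr = array_multiply(a, b, width)
        p_tree, lv_tree = wallace_multiply(a, b, width)
        assert p_arr == p_tree == a * b
        print(f"{width}-bit: array {lv_arr} CSA levels, Wallace tree {lv_tree}")
```

Running it reports 6 versus 4 levels for 8-bit operands, 14 versus 6 for 16-bit, and 30 versus 8 for 32-bit: the array's depth grows linearly with the width, the tree's roughly logarithmically. A real Wallace tree works column by column on individual bits (with half adders and uneven column heights), so this word-level model only approximates its depth.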
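
The image-registration abstracts above also rely on partial volume (PV) interpolation and mutual information. A minimal 2-D sketch, assuming integer-valued images stored as nested lists and a geometric transform that has already mapped each floating-image sample to a real-valued position in the reference image: in the usual formulation, PV interpolation does not synthesize a new intensity; it spreads the bilinear weights of the neighboring reference pixels over the joint histogram, and mutual information is then computed from that histogram. Each weight is a product of fractional offsets, which is consistent with the abstracts' remark that multipliers dominate a PV unit. In 3-D the same idea uses eight trilinear weights per voxel. The names `pv_update` and `mutual_information` are hypothetical.

```python
import math
from collections import defaultdict


def pv_update(hist, float_value, ref, x, y):
    """Partial volume update of a joint histogram for one floating-image sample.

    (x, y) is the sample's transformed position in reference-pixel units.
    The four bilinear weights go to the bins (float_value, neighbour value).
    """
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    neighbours = (
        (x0, y0, (1 - dx) * (1 - dy)),
        (x0 + 1, y0, dx * (1 - dy)),
        (x0, y0 + 1, (1 - dx) * dy),
        (x0 + 1, y0 + 1, dx * dy),
    )
    for nx, ny, w in neighbours:
        # skip zero weights and neighbours outside the reference image
        if w > 0 and 0 <= ny < len(ref) and 0 <= nx < len(ref[0]):
            hist[(float_value, ref[ny][nx])] += w


def mutual_information(hist):
    """Mutual information (in nats) of a joint histogram {(f, r): weight}."""
    total = sum(hist.values())
    p_f, p_r = defaultdict(float), defaultdict(float)
    for (f, r), w in hist.items():        # marginal distributions
        p_f[f] += w / total
        p_r[r] += w / total
    return sum(
        (w / total) * math.log((w / total) / (p_f[f] * p_r[r]))
        for (f, r), w in hist.items()
        if w > 0
    )


if __name__ == "__main__":
    ref = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
    flt = [row[:] for row in ref]         # floating image identical to the reference
    for shift in (0.0, 0.5):              # pure x-translation, in pixels
        hist = defaultdict(float)
        for y, row in enumerate(flt):
            for x, value in enumerate(row):
                pv_update(hist, value, ref, x + shift, y)
        print(f"shift {shift}: MI = {mutual_information(hist):.3f} nats")
```

With the images aligned, the histogram is diagonal and the mutual information equals the image entropy (ln 4, about 1.386 nats); the half-pixel shift spreads weight off the diagonal and lowers it, which is the signal a registration search maximizes.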
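
The stereo abstract above describes a phase-based model with sub-pixel accuracy. Its simplest 1-D form can be sketched as follows, as an illustrative assumption (the paper's multi-scale, multi-orientation method is considerably richer): filter both image rows with a complex Gabor filter tuned to a spatial frequency `k0`, and divide the local phase difference between the two responses by `k0`. Because phase varies continuously, the estimate is not restricted to whole pixels. This single-scale version only works for shifts smaller than half the filter wavelength, which is one reason practical systems combine several scales. The function names and parameters are hypothetical.

```python
import cmath
import math


def gabor_response(signal, x, k0, sigma):
    """Complex Gabor response of a 1-D signal at integer position x.

    Computes sum_u signal[x - u] * exp(-u^2 / (2 sigma^2)) * exp(i k0 u),
    i.e. convolution with a Gaussian-windowed complex exponential.
    """
    half = int(3 * sigma)                     # truncate the window at 3 sigma
    acc = 0j
    for u in range(-half, half + 1):
        if 0 <= x - u < len(signal):
            window = math.exp(-u * u / (2 * sigma * sigma))
            acc += signal[x - u] * window * cmath.exp(1j * k0 * u)
    return acc


def phase_disparity(left, right, x, k0, sigma):
    """Disparity d at x, assuming right[i] ~ left[i - d] locally.

    The left response leads the right one in phase by about k0 * d, so the
    wrapped phase difference divided by k0 estimates d (valid for |d| < pi / k0).
    """
    r_left = gabor_response(left, x, k0, sigma)
    r_right = gabor_response(right, x, k0, sigma)
    return cmath.phase(r_left * r_right.conjugate()) / k0


if __name__ == "__main__":
    k0, sigma, true_d = 2 * math.pi / 16, 4.0, 1.3   # wavelength of 16 pixels
    left = [math.cos(k0 * i + 0.2) for i in range(64)]
    right = [math.cos(k0 * (i - true_d) + 0.2) for i in range(64)]
    print(f"estimated disparity: {phase_disparity(left, right, 32, k0, sigma):.2f} px")
```

A hardware version would replace the floating-point filter and `phase` computation with fixed-point filters and a CORDIC-style arctangent, but the estimate itself is the same ratio of phase difference to tuning frequency.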