Conference Paper

A faster distributed arithmetic architecture for FPGAs.

DOI: 10.1145/503048.503054
Conference: the 2002 ACM/SIGDA tenth international symposium
Source: DBLP

ABSTRACT Distributed Arithmetic (DA) is an important technique for implementing digital signal processing (DSP) functions in FPGAs. However, traditional lookup table (LUT) based DA architectures contain one or more carry-propagation chains in the critical path, which limits the maximum speed at which the entire design can run. In this paper, we describe a novel technique that can reduce or eliminate the carry-propagate chain from the critical path in LUT-based DA architectures on FPGAs. In the proposed scheme, the individual bits of a word do not have to be processed as a unit. Instead, the current iteration can start as soon as the least significant bit (LSB) of the previous iteration is available, without waiting for the entire word from the previous iteration to be fully computed. This technique has great potential for speeding up DSP applications based on DA. Designs are described for serial and parallel DALUT and accumulator structures in which an n-bit carry chain, where n is the word length, is broken into smaller r-bit chains, 1 ≤ r ≤ n. A cost-performance analysis of the designs is presented. The analysis shows that the designs proposed in this paper have a lower cost-performance ratio (indicating better performance) than traditional DA designs. We also show that the 8-bit (r = 8) designs offer a good compromise between cost and performance. The designs are implemented on a Xilinx XC4028XL-3-BG256 chip using Xilinx Foundation tools v3.1i. The results show that the proposed designs can achieve a speedup of at least 1.5 over traditional DA designs in some cases.
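For readers unfamiliar with the baseline, the following is a minimal software sketch of conventional bit-serial, LUT-based DA computing a dot product. It is not the architecture proposed in the paper, and the names `build_da_lut` and `da_dot` are hypothetical, chosen only for illustration. The sketch assumes integer coefficients and two's-complement inputs that fit in `n_bits`. The `acc +=` step stands in for the word-wide accumulation whose carry chain the abstract identifies as the critical path.

```python
def build_da_lut(coeffs):
    """Precompute the sum of coefficients selected by each address bit pattern."""
    k = len(coeffs)
    return [
        sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
        for addr in range(1 << k)
    ]


def da_dot(coeffs, xs, n_bits):
    """Bit-serial DA dot product of coeffs and xs (two's complement, n_bits wide)."""
    lut = build_da_lut(coeffs)
    mask = (1 << n_bits) - 1
    acc = 0
    for b in range(n_bits):
        # Gather bit b of every input word to form the LUT address.
        addr = 0
        for j, x in enumerate(xs):
            addr |= (((x & mask) >> b) & 1) << j
        term = lut[addr] << b
        # The sign bit of a two's-complement word has negative weight.
        if b == n_bits - 1:
            acc -= term
        else:
            acc += term
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]
    xs = [10, -3, 4, -8]
    print(da_dot(coeffs, xs, n_bits=8))
    print(sum(c * x for c, x in zip(coeffs, xs)))  # direct dot product for comparison
```

The multiplications are replaced by one table lookup and one addition per input bit, which is why the speed of that addition matters so much in hardware.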

  • ABSTRACT: In this paper, we present a comparison between two methods, the modified Loeffler algorithm (11 MUL and 29 ADD) and Distributed Arithmetic, for implementing the DCT/IDCT algorithm for MPEG or H.26x video compression using the VHDL description language. The implementation has been achieved on an Altera Stratix EP1S10 FPGA, which provides the dedicated DSP blocks required for common signal processing functions. A new solution based on these DSP blocks is used to implement the multipliers for the modified Loeffler algorithm in order to optimize speed and area.
    01/2006
  • ABSTRACT: In this paper, a highly area-efficient multiplier-less FIR filter is presented. Distributed Arithmetic (DA) has been used to implement a bit-serial scheme of a general asymmetric version of an FIR filter, taking optimal advantage of the 4-input LUT-based structure of FPGAs. Furthermore, we have introduced a modification in the accumulator stage to achieve further savings. The proposed filter has been designed and synthesized with Altera Quartus II, and implemented on a Stratix FPGA device. Our results show reduced area requirements in comparison to previous LUT-less DA architectures.
    Signal Processing and Information Technology, 2006 IEEE International Symposium on; 09/2006
  • ABSTRACT: In today’s proactive computing age, sensor networks monitor the environment, collect data, and execute tasks that affect our lives. The main ingredient to this process is a tiny sensor node that demands a long operating lifetime. Because of the sluggish growth of battery energy density, several research groups have developed technologies to power these sensors with scavenged energy from vibration and light. This highly variable supply mandates precision-on-demand processing. Distributed Arithmetic (DA), a bit-serial algorithm for dot product computation, possesses this capability to trade output quality for power consumption as demonstrated with the use of full-custom circuits. This thesis evaluates the energy scalability of a DA-based low-pass filter on a modern field programmable gate array (FPGA) and a standard-
