This paper introduces a new field-programmable architecture that
is targeted at compute-intensive applications. These applications are
important because of their use in the expanding multi-media markets in
signal and data processing. We explain the design methodology, layout
and implementation of the new architecture. A synthesis method has also
been developed with which we have mapped several circuits to the new
architecture. In this paper, we show that the invented architecture is
more area-efficient than traditional FPGAs by a factor of more than 2.5
Data provided are for informational purposes only. Although carefully collected, accuracy cannot be guaranteed. The impact factor represents a rough estimation of the journal's impact factor and does not reflect the actual current impact factor. Publisher conditions are provided by RoMEO. Differing provisions from the publisher's actual policy or licence agreement may be applicable.
"The vast majority of carry chains that have been proposed are for different types of adders     ; the carry chains on commercial FPGAs available from Xilinx and Altera also fall into this category. One interesting alternative is a carry chain that allows an Altera-style logic cell to be configured as a 7:2 compressor, which is used for multi-operand addition . "
[Show abstract][Hide abstract] ABSTRACT: Routing resources in modern FPGAs use 50% of the silicon real estate and are significant contributors to critical path delay and power consumption; the situation gets worse with each successive process generation, as transistors scale more effectively than wires. To cope with these challenges, FPGA architects have divided wires into local and global categories and introduced fast dedicated carry chains between adjacent logic cells, which reduce routing resource usage for certain arithmetic circuits (primarily adders and subtractors). Inspired by the carry chains, we generalize the idea to connect lookup tables (LUTs) in adjacent logic cells. By exploiting the fracturable structure of LUTs in current FPGA generations, we increase the utilization of the existing LUTs in the logic cell by providing new inputs along the logic chain, but without increasing the I/O bandwidth from the programmable interconnect. This allows us to increase the logic density of the configurable logic cells while reducing demand for routing resources, as long as the mapping tools are able to exploit the logic chains. Our experiments using the combinational MCNC benchmarks and comparing against an Altera Stratix-III FPGA show that the introduction of logic chains reduce the average usage of local routing wires by 37%, with a 12% reduction in total wiring (local and global); this translates to improvements in dynamic power consumption of 18% in the routing network and 10% overall, while utilizing 4% fewer logic cells, on average.
[Show abstract][Hide abstract] ABSTRACT: We propose a new DSP block for use in modern high-performance FPGAs. Current DSP blocks contain fixed-bitwidth multipliers that can be combined efficiently to form larger multipliers. Our approach is similar, but includes a bypass layer following the partial product generator that exposes the compressor tree used for partial product reduction directly to the user. As a consequence, the proposed DSP block can accelerate multi-input addition operations in addition to multiplication. To increase the flexibility of the device, the partial product reduction tree used within our DSP block uses a fixed-function compression logic along with a field programmable compressor tree (FPCT), the latter of which is user-configurable to meet the needs of the application at hand. Multi-input addition operations can be mapped directly onto the FPCT without compromising any of the other functionality of the DSP block.
"To improve arithmetic performance, several researchers proposed carry chains that could efficiently embed circuitry that could perform fast addition inside a series of adjacent logic blocks     . Carry chains have been adopted by commercial vendors: The Xilinx Virtex-4/5 CLBs can send propagate/generate signals to adjacent blocks  ; the Altera Stratix II/III Adaptiv Logic Modules (ALMs) implement ripple-carry addition   . "
[Show abstract][Hide abstract] ABSTRACT: The Field Programmable Counter Array (FPCA) was introduced to improve FPGA performance for arithmetic circuits. An FPCA is a reconfigurable IP core that can be integrated into an FPGA. To exploit the FPCA, a circuit is transformed by merging disparate addition and multiplication operations into large multi-input addition operations, which are synthesized as compressor trees on the FPCA; the remaining portion of the circuit is synthesized on the FPGA. This paper presents a series of architectural improvements to the FPCA that reduce routing delay, increase flexibility and component utilization, and simplify the integration process. Using an FPGA containing six FPCAs, we observed average and maximum speedups of 1.60x and 2.40x on a set of arithmetic benchmarks