Conference Proceeding

Scalable FPGA-array for high-performance and power-efficient computation based on difference schemes

Grad. Sch. of Inf. Sci., Tohoku Univ., Sendai
12/2008; DOI: 10.1109/HPRCTA.2008.4745679. In proceedings of: Second International Workshop on High-Performance Reconfigurable Computing Technology and Applications (HPRCTA 2008)
Source: IEEE Xplore

ABSTRACT For numerical computations that require a relatively high ratio of data accesses to arithmetic operations, scalable memory bandwidth is the key to improving performance. In this paper, we propose a scalable FPGA-array for building custom computing machines that run scientific simulations based on difference schemes with high performance and power efficiency. On the FPGA-array, we construct a systolic computational-memory array (SCMA) by partitioning it homogeneously among multiple tightly-coupled FPGAs. A large SCMA implemented on many FPGAs achieves high-performance computation because both its memory bandwidth and its arithmetic performance scale with the array size. To demonstrate feasibility and provide a quantitative evaluation, we design and implement an SCMA of 192 processing elements across two ALTERA StratixII FPGAs. The implemented SCMA, running at 106 MHz, achieves sustained performance of 32.8 to 36.5 GFlops in single precision for three benchmark computations, against a peak performance of 40.7 GFlops. Compared with a 3.4GHz Pentium4 processor, the SCMAs draw 70% to 87% of the power and require only 3% to 7% of the energy for the same computations. Based on a requirement model for inter-FPGA bandwidth, we show that SCMAs are completely scalable on currently available FPGAs, from high-end to low-end, and the SCMA implemented on two FPGAs delivers twice the performance of the single-FPGA SCMA.
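
The peak and sustained figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes each processing element completes 2 floating-point operations per cycle (for example one multiply and one add), which is not stated in the abstract but reproduces the quoted 40.7 GFlops peak. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
NUM_PES = 192                      # processing elements across two FPGAs
CLOCK_HZ = 106e6                   # 106 MHz
PEAK_GFLOPS = 40.7                 # quoted peak performance
SUSTAINED_GFLOPS = (32.8, 36.5)    # quoted sustained range

# Assumption (not stated in the abstract): 2 flops per PE per cycle.
FLOPS_PER_PE_PER_CYCLE = 2


def peak_gflops(num_pes, clock_hz, flops_per_cycle):
    """Peak rate in GFlops if every PE is busy on every cycle."""
    return num_pes * clock_hz * flops_per_cycle / 1e9


def utilization(sustained, peak):
    """Fraction of the peak rate that is actually sustained."""
    return sustained / peak


estimated_peak = peak_gflops(NUM_PES, CLOCK_HZ, FLOPS_PER_PE_PER_CYCLE)
utilizations = [utilization(s, PEAK_GFLOPS) for s in SUSTAINED_GFLOPS]

print(f"estimated peak: {estimated_peak:.1f} GFlops")
print("utilization: " + ", ".join(f"{u:.1%}" for u in utilizations))
```

Under this assumption the estimate matches the quoted peak, and the sustained results correspond to roughly 80% to 90% of peak.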

Keywords

192 processing elements
 
ALTERA StratixII FPGAs
 
available high-end
 
benchmark computations
 
data access
 
doubled performance
 
high-performance computation
 
implemented SCMA
 
large SCMA
 
low-end FPGAs
 
multiple tightly-coupled FPGAs
 
numerical computations
 
peak performance
 
power-efficient scientific simulations
 
requirement model
 
scalable arithmetic-performance
 
single precision
 
single-FPGA SCMA
 
systolic computational-memory array
 
two FPGAs
 

K Sano