Conference Paper

FPGA Accelerated Parallel Sparse Matrix Factorization for Circuit Simulations

DOI: 10.1007/978-3-642-19475-7_33
Conference: Reconfigurable Computing: Architectures, Tools and Applications - 7th International Symposium, ARC 2011, Belfast, UK, March 23-25, 2011. Proceedings
Source: DBLP


Sparse matrix factorization is a critical step in circuit simulation, since it is time-consuming and is computed
repeatedly in the simulation flow. To accelerate the factorization of sparse matrices, a parallel CPU+FPGA based
architecture is proposed in this paper. While the pre-processing of the matrix is implemented on the CPU, the parallelism of numeric
factorization is exploited by processing several columns of the sparse matrix simultaneously on a set of processing elements
(PEs) in the FPGA. To cater for the requirements of circuit simulation, we also modified the Gilbert/Peierls (G/P) algorithm and
considered the scalability of our architecture. Experimental results on circuit matrices from the University of Florida Sparse
Matrix Collection show that our architecture achieves speedups of 0.5x to 5.36x over KLU running on a CPU.
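
The column-level parallelism described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it uses dense Python lists instead of sparse storage, omits the partial pivoting that the Gilbert/Peierls algorithm performs, and reads column dependencies from the computed U factor, whereas a real solver would get them from symbolic analysis before the numeric phase. The function names `left_looking_lu` and `column_levels` are hypothetical.

```python
def left_looking_lu(A):
    """Factor A = L*U one column at a time (left-looking), without pivoting.

    Assumption: A is square and needs no pivoting (e.g. diagonally dominant).
    Dense lists are used for clarity only.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x = [A[i][j] for i in range(n)]  # working copy of column j
        # Triangular solve against the columns of L already computed.
        # Only columns k with x[k] != 0 contribute; these are the
        # columns that column j depends on.
        for k in range(j):
            if x[k] != 0.0:
                for i in range(k + 1, n):
                    x[i] -= L[i][k] * x[k]
        for i in range(j + 1):
            U[i][j] = x[i]
        pivot = x[j]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(j + 1, n):
            L[i][j] = x[i] / pivot
    return L, U


def column_levels(U):
    """Assign each column a level; columns on the same level are independent.

    Column j depends on column k < j when U[k][j] is nonzero.
    """
    n = len(U)
    level = [0] * n
    for j in range(n):
        deps = [k for k in range(j) if U[k][j] != 0.0]
        if deps:
            level[j] = 1 + max(level[k] for k in deps)
    return level


if __name__ == "__main__":
    A = [[4.0, 0.0, 1.0, 0.0],
         [0.0, 4.0, 0.0, 1.0],
         [1.0, 0.0, 4.0, 0.0],
         [0.0, 1.0, 0.0, 4.0]]
    L, U = left_looking_lu(A)
    print(column_levels(U))
```

For this example matrix the levels are `[0, 0, 1, 1]`: columns 0 and 1 have no dependencies and could be handled by two PEs at the same time, followed by columns 2 and 3.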



Available from: Yu Wang
    • "The approach detailed in [11] parallelizes the resulting dataflow operations by mapping them to a network of spatial floating-point operators. On the other hand, the approach followed in [12] maps the ensuing dataflow graph to a multi-PE shared-memory system. It is clear that both approaches primarily focus on extracting the fine-grained dataflow parallelism, whereas the approach we propose in this paper favors harnessing the medium-grained column parallelism without overlooking the finer-grained data operation parallelism."
    ABSTRACT: SPICE is the de facto standard for circuit simulation. However, accurate SPICE simulations of today’s sub-micron circuits can often take days or weeks on conventional processors. A SPICE simulation is an iterative process that consists of two phases per iteration: model evaluation followed by a matrix solution. The model evaluation phase has been found to be easily parallelizable, unlike the subsequent phase, which involves the solution of highly sparse and asymmetric matrices. In this paper, we present an FPGA implementation of a sparse matrix solver, geared towards matrices that arise in SPICE circuit simulations. Our approach combines static pivoting with symbolic analysis to compute an accurate task flow-graph which efficiently exploits parallelism at multiple granularities and sustains high floating-point data rates. We also present a quantitative comparison between the performance of our hardware prototype and state-of-the-art software packages running on a general-purpose PC. We report average speed-ups of 9.65×, 11.83×, and 17.21× against UMFPACK, KLU, and Kundert Sparse matrix packages, respectively.
    IEEE Transactions on Computers 04/2015; 64(4):1090-1103. DOI:10.1109/TC.2014.2308202 · 1.66 Impact Factor
    • "Another parallel hardware platform, FPGA, is featured with flexibility due to its reconfigurable architecture. It is usually deployed as accelerator for specific applications [12], [13]. Compared to multi-core CPUs and many-core GPUs, where friendly programming environments are developed, FPGAs are still programmed by low-level hardware description languages (HDL), such as Verilog HDL and VHDL. "
    ABSTRACT: Considering the increasing complexity of integrated circuit (IC) designs at Nano-Tera scale, multi-core CPUs and many-core GPUs have provided ideal hardware platforms for emerging parallel algorithm developments in electronic design automation (EDA). However, it has become extremely challenging to leverage parallel hardware platforms at extreme scale beyond 22nm and 60GHz where the EDA algorithms, such as circuit simulation, show strong data dependencies. This paper presents data dependency elimination in circuit simulation algorithms such as parasitic extraction, transient simulation and periodic-steady-state (PSS) simulation, which paves the way towards unleashing the underlying power of parallel hardware platforms.
    IEEE Design and Test of Computers 02/2013; 30(1):26-35. DOI:10.1109/MDT.2012.2226201 · 1.62 Impact Factor
  • ABSTRACT: Sparse LU decomposition is the core computation in the direct method that solves sparse systems of linear equations. Little work has been done on parallelizing it on FPGAs. In this paper, we study parallelization strategies for sparse LU decomposition on FPGAs. We first analyze how to parallelize the right-looking algorithm and find that this algorithm is not suitable for FPGAs. Then the left-looking algorithm is analyzed and found to be a better candidate than the right-looking version. Our design derived from the left-looking algorithm is based on a simple yet efficient parallel computational model for FPGAs. Our design mainly consists of multiple parallel processing elements (PEs). A total of 14 PEs can be integrated into a Xilinx Virtex-5 XC5VLX330. Unlike related designs, which target sparse matrices from particular application domains, our hardware design can be applied to any symmetric positive definite or diagonally dominant matrices.
    2012 International Conference on Field-Programmable Technology (FPT); 01/2012