Conference Paper

Multigrid on GPU: Tackling Power Grid Analysis on parallel SIMT platforms

DOI: 10.1109/ICCAD.2008.4681645 Conference: 2008 International Conference on Computer-Aided Design (ICCAD'08), November 10-13, 2008, San Jose, CA, USA
Source: DBLP

ABSTRACT The challenging task of analyzing on-chip power (ground) distribution networks with multi-million node complexity and beyond is key to today's large chip designs. For the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT) based graphics processing unit (GPU) platforms to tackle power grid analysis with promising performance. Several key enablers including GPU-specific algorithm design, circuit topology transformation, workload partitioning, performance tuning are embodied in our GPU-accelerated hybrid multigrid algorithm, GpuHMD, and its implementation. In particular, a proper interplay between algorithm design and SIMT architecture consideration is shown to be essential to achieve good runtime performance. Different from the standard CPU based CAD development, care must be taken to balance between computing and memory access, reduce random memory access patterns and simplify flow control to achieve efficiency on the GPU platform. Extensive experiments on industrial and synthetic benchmarks have shown that the proposed GpuHMD engine can achieve 100× runtime speedup over a state-of-the-art direct solver and be more than 15× faster than the CPU based multigrid implementation. The DC analysis of a 1.6 million-node industrial power grid benchmark can be accurately solved in three seconds with less than 50 MB memory on a commodity GPU. It is observed that the proposed approach scales favorably with the circuit complexity, at a rate about one second per million nodes.

  • Source
    • "Nowadays, the emerging multi-core and many-core platforms bring powerful computing resources and opportunities for parallel computing. Even more, cloud computing techniques [34] drive distributed systems scaling to thousands of computing nodes [35]–[37], etc. Distributed computing systems have been incorporated into products of many leading EDA companies and in-house simulators [38]–[42]. However, building scalable and efficient distributed algorithmic framework for transient linear circuit simulation framework is still a challenge to leverage these powerful computing tools. "
    ABSTRACT: In this work, we design an efficient and accurate algorithmic framework using matrix exponentials for time-domain simulation of power delivery network (PDN). Thanks to the explicit exponential time integration scheme with high order approximation of differential equation system, our framework can reuse factorized matrices for adaptive time stepping without loss of accuracy. The key operation of matrix exponential and vector product (MEVP) is computed by proposed efficient rational Krylov subspace method and helps achieve large stepping. With the enhancing capability of time marching and high-order approximation capability, we design R-MATEX, which outperforms the classical PDN simulation method using trapezoidal formulation with fixed step size (TR-FTS). We also propose a distributed computing framework, DR-MATEX, and highly accelerate the simulation speedup by reducing Krylov subspace generations caused by frequent breakpoints from the side of current sources. By virtue of the superposition property of linear system and scaling invariance property of Krylov subspace, DR-MATEX can divide the whole simulation task into subtasks based on the alignments of breakpoints among current sources. Then, the subtasks are processed in parallel at different computing nodes and summed up at the end of simulation to provide the accurate solutions. The experimental results show R-MATEX and DR-MATEX can achieve 11.4x and 68.0x runtime speedups on average over TR-FTS.
  • Source
    • "They used geometric and algebraic multigrid (aMG) for finite-difference type discretisations. More recent publications presenting applications that require multigrid solvers are supersonic flows (aMG, unstructured grids [6]), (interactive) flow simulations for feature film (aMG/gMG, structured [7] [8]), out-of core multigrid for gigapixel image stitching (gMG/aMG, structured [9]), image denoising and optical flow (gMG/aMG, structured [10]), power grid analysis (aMG, structured/unstructured [11]) and electric potential in the human heart (aMG, unstructured [12]). This last paper is similar in spirit to our work, since the authors also reduce (almost) the entire multigrid algorithm to sequences of sparse matrix-vector multiplications. "
    ABSTRACT: We describe our FE-gMG solver, a finite element geometric multigrid approach for problems relying on unstructured grids. We augment our GPU- and multicore-oriented implementation technique based on cascades of sparse matrix–vector multiplication by applying strong smoothers. In particular, we employ Sparse Approximate Inverse (SPAI) and Stabilised Approximate Inverse (SAINV) techniques. We focus on presenting the numerical efficiency of our smoothers in combination with low- and high-order finite element spaces as well as the hardware efficiency of the FE-gMG. For a representative problem and computational grids in 2D and 3D, we achieve a speedup of an average of 5 on a single GPU over a multithreaded CPU code in our benchmarks. In addition, our strong smoothers can deliver a speedup of 3.5 depending on the element space, compared to simple Jacobi smoothing. This can even be enhanced to a factor of 7 when combining the usage of approximate inverse-based smoothers with clever sorting of the degrees of freedom. In total the FE-gMG solver can outperform a simple (multicore-) CPU-based multigrid by a total factor of over 40.
    Computers & Fluids 07/2013; 80(1):327–332. DOI:10.1016/j.compfluid.2012.01.025 · 1.62 Impact Factor
  • Source
    • "Any changes beyond the system capability may incur architecture change, circuit redesign or even new chip fabrication with high cost. The application of programmable elements, such as GPU, mitigates the redesign cost, but achieving the system reconfigurability and power efficiency simultaneously still remains as a challenge [11]. "
    ABSTRACT: The invention of neuromorphic computing architecture is inspired by the working mechanism of human-brain. Memristor technology revitalized neuromorphic computing system design by efficiently executing the analog Matrix-Vector multiplication on the memristor-based crossbar (MBC) structure. However, programming the MBC to the target state can be very challenging due to the difficulty to real-time monitor the memristor state during the training. In this work, we quantitatively analyzed the sensitivity of the MBC programming to the process variations and input signal noise. We then proposed a noise-eliminating training method on top of a new crossbar structure to minimize the noise accumulation during the MBC training and improve the trained system performance, i.e., the pattern recall rate. A digital-assisted initialization step for MBC training is also introduced to reduce the training failure rate as well as the training time. Experimental results show that our noise-eliminating training method can improve the pattern recall rate. For the tested patterns with 128 x 128 pixels our technique can reduce the MBC training time by 12.6% ~ 14.1% for the same pattern recognition rate, or improve the pattern recall rate by 18.7% ~ 36.2% for the same training time.
    Proceedings of the 50th Annual Design Automation Conference; 05/2013