Conference Paper

Multigrid on GPU: tackling power grid analysis on parallel SIMT platforms.

DOI: 10.1109/ICCAD.2008.4681645 Conference: 2008 International Conference on Computer-Aided Design (ICCAD'08), November 10-13, 2008, San Jose, CA, USA
Source: DBLP

ABSTRACT Analyzing on-chip power (ground) distribution networks with multi-million-node complexity and beyond is key to today's large chip designs. For the first time, we show how to exploit recent massively parallel single-instruction multiple-thread (SIMT) based graphics processing unit (GPU) platforms to tackle power grid analysis with promising performance. Several key enablers, including GPU-specific algorithm design, circuit topology transformation, workload partitioning, and performance tuning, are embodied in our GPU-accelerated hybrid multigrid algorithm, GpuHMD, and its implementation. In particular, a proper interplay between algorithm design and SIMT architecture considerations is shown to be essential for good runtime performance. Unlike standard CPU-based CAD development, care must be taken to balance computation against memory access, reduce random memory access patterns, and simplify flow control to achieve efficiency on the GPU platform. Extensive experiments on industrial and synthetic benchmarks show that the proposed GpuHMD engine can achieve a 100× runtime speedup over a state-of-the-art direct solver and be more than 15× faster than the CPU-based multigrid implementation. The DC analysis of a 1.6-million-node industrial power grid benchmark can be accurately solved in three seconds with less than 50 MB of memory on a commodity GPU. The proposed approach is observed to scale favorably with circuit complexity, at a rate of about one second per million nodes.
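To make the DC analysis problem concrete, the following is a minimal CPU-side sketch in Python of Jacobi relaxation on a small regular resistive mesh, the kind of simple, regular update that multigrid methods commonly use as a smoother. It is not the GpuHMD algorithm: the function name `jacobi_dc_grid`, the uniform conductance `g`, the uniform load current, and the pad placement are all assumptions made for illustration only.

```python
def jacobi_dc_grid(n, g, load, vdd, pads, sweeps):
    """Jacobi relaxation for DC node voltages on an n x n resistive mesh.

    Hypothetical setup: every mesh edge has conductance g, every non-pad
    node draws the same current `load`, and pad nodes are held at vdd.
    """
    v = [[vdd] * n for _ in range(n)]
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    for _ in range(sweeps):
        new = [row[:] for row in v]
        for r in range(n):
            for c in range(n):
                if (r, c) in pads:
                    continue  # pad nodes stay fixed at vdd
                nbrs = [(r + dr, c + dc) for dr, dc in offsets
                        if 0 <= r + dr < n and 0 <= c + dc < n]
                # KCL at node i: g * sum(v_i - v_j) = -load
                # => v_i = (g * sum(v_j) - load) / (g * degree)
                total = sum(v[rr][cc] for rr, cc in nbrs)
                new[r][c] = (g * total - load) / (g * len(nbrs))
        v = new  # Jacobi: all updates use the previous sweep's values
    return v


if __name__ == "__main__":
    volts = jacobi_dc_grid(n=3, g=1.0, load=0.01, vdd=1.0,
                           pads={(0, 0)}, sweeps=2000)
    for row in volts:
        print(" ".join(f"{x:.4f}" for x in row))
```

Each node update reads only its four neighbors from the previous sweep, so every node can be updated independently. This regular access pattern and absence of branching is the sort of property the abstract identifies as important on SIMT hardware, although the sketch itself makes no claim about how GpuHMD organizes its computation.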

  • ABSTRACT: In this work, we design an efficient and accurate algorithmic framework using matrix exponentials for time-domain simulation of power delivery networks (PDNs). Thanks to an explicit exponential time integration scheme with a high-order approximation of the differential equation system, our framework can reuse factorized matrices for adaptive time stepping without loss of accuracy. The key operation, the matrix exponential and vector product (MEVP), is computed by the proposed efficient rational Krylov subspace method, which enables large time steps. Building on this time-marching and high-order approximation capability, we design R-MATEX, which outperforms the classical PDN simulation method using the trapezoidal formulation with fixed step size (TR-FTS). We also propose a distributed computing framework, DR-MATEX, which further accelerates simulation by reducing the Krylov subspace generations caused by frequent breakpoints in the current sources. By virtue of the superposition property of linear systems and the scaling invariance property of Krylov subspaces, DR-MATEX can divide the whole simulation task into subtasks based on the alignment of breakpoints among current sources. The subtasks are then processed in parallel on different computing nodes and summed at the end of the simulation to provide accurate solutions. The experimental results show that R-MATEX and DR-MATEX achieve 11.4x and 68.0x runtime speedups on average over TR-FTS.
  • ABSTRACT: Robust and efficient algorithms for power grid analysis are crucial for both VLSI design and optimization. Due to the increasing size of power grids, IR drop analysis has become more computationally challenging in both runtime and memory consumption. This paper presents a Fast Poisson Solver (FPS) preconditioned method for unstructured power grids with non-ideal boundary conditions. Unstructured power grids are transformed into structured grids, which can be modeled as Poisson blocks by an analytic formulation. The analytic formulation of the transformed structured grids is adopted as an analytic preconditioner for the original unstructured grids, and this preconditioner can be regarded as a sparse approximate inverse technique. By combining the analytic preconditioner with the robust conjugate gradient method, we demonstrate that the approach is robust for extremely large-scale power grid simulations. Theoretical proof and experimental results show that the iteration count of the proposed method hardly increases with grid size once the pad density and the distribution range of metal conductance values are fixed. We demonstrate that the runtime efficiency of our approach is much higher than that of the classical incomplete Cholesky factorization preconditioned conjugate gradient solver and the random walk-based hybrid solver.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 04/2014; 22(4):899-912. DOI:10.1109/TVLSI.2013.2252375 · 1.14 Impact Factor
  • ABSTRACT: Modern GPUs are increasingly used in cluster computing systems as high-performance computing units because of their outstanding computational power, which brings system-level architectural heterogeneity (among different nodes) to the cluster. In this paper, based on the MPI and CUDA programming models, we investigate task scheduling for a GPU heterogeneous cluster, taking into account its system-level heterogeneous characteristics and the weights of the processors (both CPUs and GPUs). First, based on our GPU heterogeneous cluster, we classify executing tasks into six major classes according to their degrees of parallelism, input data sizes, and processing workloads. Then, aiming at an approximately optimal mapping between tasks and computing resources, we present a task scheduling strategy: the WSLSA greedy heuristic, which accounts for the weights of the processors. We also define two measurement factors for the task assignments. One is the maximum value of the total workloads over all task assignments, which captures the maximum workload for the GPU heterogeneous cluster. The other is the distribution of task assignments, which determines the load balance of the task assignments for the GPU heterogeneous cluster.
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International; 01/2013

