3D Stacked Microprocessor: Are We There Yet?

Georgia Institute of Technology
IEEE Micro (Impact Factor: 2.39). 07/2010; DOI: 10.1109/MM.2010.45
Source: IEEE Xplore

ABSTRACT Editors' NoteWe live in a 3D world. It is hard to imagine a large city, such as New York City, with only single-level structures. There would be no skyscrapers, no mixed-use, no live-work. It would be a long walk (or drive) between everything, especially between dissimilar uses—all in all, very inefficient!Integrated circuits today are typically designed using single-level Manhattan geometries, nothing like the layout of the real city. In this prolegomenon, Gabriel Loh and Yuan Xie survey 3D integrated circuit technology, demonstrating the virtues, potentials, and challenges of applying three dimensions to future microprocessor designs and exploiting the locality and diversity of real-world Manhattan geometries.

  • [Show abstract] [Hide abstract]
    ABSTRACT: This column examines how GPU computing might affect the architecture of future exascale supercomputers. Specifically, the authors argue that a system with slower but better-balanced processors might yield higher performance and consume less energy than a system with very fast but imbalanced processors.
    IEEE Micro 09/2011; · 2.39 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: This paper demonstrates a fully functional hardware and software design for a 3D stacked multi-core system for the first time. Our 3D system is a low-power 3D Modular Multi-Core (3D-MMC) architecture built by vertically stacking identical layers. Each layer consists of cores, private and shared memory units, and communication infrastructures. The system uses shared memory communication and Through-Silicon-Vias (TSVs) to transfer data across layers. A serialization scheme is employed for inter-layer communication to minimize the overall number of TSVs. The proposed architecture has been implemented in HDL and verified on a test chip targeting an operating frequency of 400MHz with a vertical bandwidth of 3.2Gbps. The paper first evaluates the performance, power and temperature characteristics of the architecture using a set of software applications we have designed. We demonstrate quantitatively that the proposed modular 3D design improves upon the cost and performance bottlenecks of traditional 2D multi-core design. In addition, a novel resource pooling approach is introduced to efficiently manage the shared memory of the 3D stacked system. Our approach reduces the application execution time significantly compared to 2D and 3D systems with conventional memory sharing.
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013; 01/2013
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The performance and power consumption of mobile DRAMs (LPDDRs) depend on the configuration of system-level parameters, such as operating frequency, interface width, request size, and memory map. In mobile systems running both real-time and non-real-time applications, the memory configuration must satisfy bandwidth requirements of real-time applications, meet the power consumption budget, and offer the best average-case execution time to the non-real-time applications. There is currently no well-defined methodology for selecting a suitable memory configuration for real-time mobile systems. The worst-case bandwidth, average-case execution time, and power consumption of mobile DRAMs across generations have furthermore not been investigated. This paper has two main contributions. 1) We analyze the worst-case bandwidth, average-case execution time, and power consumption of mobile DRAMs across three generations: LPDDR, LPDDR2 and Wide-IO-based 3D-stacked DRAM. 2) Based on our analysis, we propose a methodology for selecting memory configurations in real-time mobile systems.We show that LPDDR (32-bit IO), LPDDR2 (32-bit IO) and 3D-DRAM (128-bit IO) provide worst-case bandwidth up to 0.75 GB/s, 1.6 GB/s and 3.1 GB/s, respectively. We furthermore show for an H.263 decoder that LPDDR2 and 3D-DRAM reduce power consumption with up to 25% and 67%, respectively, compared to LPDDR, and reduce the execution time with up to 18% and 25%.


Available from