Conference Proceeding

An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth

School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
02/2010; DOI: 10.1109/HPCA.2010.5416628. In Proceedings of: 2010 IEEE 16th International Symposium on High Performance Computer Architecture (HPCA)
Source: IEEE Xplore

ABSTRACT Memory bandwidth has become a major performance bottleneck as more and more cores are integrated onto a single die, demanding ever more data from system memory. Several prior studies have demonstrated that this bandwidth problem can be addressed with a 3D-stacked memory architecture, which provides a wide, high-frequency memory-bus interface. Although previous 3D proposals already provide as much bandwidth as a traditional L2 cache can consume, the dense through-silicon vias (TSVs) of 3D chip stacks can provide still more. In this paper, we contend that the memory hierarchy, including the L2 cache and the DRAM interface, must be re-architected to take full advantage of this massive bandwidth. Our technique, SMART-3D, is a new 3D-stacked memory architecture with a vertical L2 fetch/write-back network built from a large array of TSVs. Simply stated, we leverage the TSV bandwidth to hide latency behind very large data transfers. We analyze the design trade-offs for the DRAM arrays, taking care that TSV placement does not compromise DRAM density. Moreover, we propose an efficient mechanism to manage the false sharing problem when SMART-3D is implemented in a multi-socket system. For single-threaded memory-intensive applications, SMART-3D achieves speedups of 1.53 to 2.14 over planar designs and 1.27 to 1.72 over prior 3D designs. We achieve similar speedups for multi-program and multi-threaded workloads on multi-core and multi-socket processors. Furthermore, SMART-3D can even lower the energy consumption of the L2 cache and 3D DRAM because it reduces the total number of row buffer misses.
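To make "hide latency behind very large data transfers" concrete, the sketch below uses a simple first-order model that is our own assumption, not the paper's evaluation methodology: moving a block costs a fixed access latency plus the block size divided by the link bandwidth. The latency and the two bandwidth values are hypothetical placeholders, not figures from SMART-3D.

```python
def transfer_time_ns(size_bytes, latency_ns, bw_gb_per_s):
    # First-order model (assumption): fixed access latency + serialization time.
    # 1 GB/s (decimal) moves exactly 1 byte per nanosecond, so
    # size_bytes / bw_gb_per_s is already in nanoseconds.
    return latency_ns + size_bytes / bw_gb_per_s


def effective_bw(size_bytes, latency_ns, bw_gb_per_s):
    # Achieved GB/s once the fixed latency is included; approaches
    # bw_gb_per_s as size_bytes grows, because the latency is amortized.
    return size_bytes / transfer_time_ns(size_bytes, latency_ns, bw_gb_per_s)


# Hypothetical parameters for illustration only (not taken from the paper).
LATENCY_NS = 50.0
NARROW_BUS = 16.0   # GB/s, stand-in for a conventional off-chip channel
WIDE_TSV = 512.0    # GB/s, stand-in for a wide vertical TSV bus

if __name__ == "__main__":
    for size in (64, 512, 4096):
        t_narrow = transfer_time_ns(size, LATENCY_NS, NARROW_BUS)
        t_wide = transfer_time_ns(size, LATENCY_NS, WIDE_TSV)
        print(f"{size:5d} B  narrow {t_narrow:7.2f} ns  wide {t_wide:6.2f} ns")
```

Under these made-up numbers, growing a transfer from 64 B to 4 KB adds under 8 ns on the wide bus (50.125 ns to 58 ns) but 252 ns on the narrow one (54 ns to 306 ns). That asymmetry is the intuition for spending abundant TSV bandwidth on much larger L2 fetches and write-backs.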

  • ABSTRACT: As DRAM scaling becomes more challenging and its energy efficiency becomes a growing concern for data center operation, industry is pursuing an alternative approach to address these looming issues: stacking DRAM dies with through-silicon vias (TSVs) using 3-D integration technology. Furthermore, 3-D technology also enables heterogeneous die stacking within one DRAM package. In this paper, we study how to design such a heterogeneous DRAM chip to improve both performance and energy efficiency. In particular, we propose a novel floorplan and several architectural techniques that fully exploit the benefits of 3-D die stacking when integrating an SRAM row cache into a DRAM chip. Our multi-core simulation results show that, by tightly integrating a small row cache with its corresponding DRAM array, we can improve performance by 30% while saving 31% of dynamic energy for memory-intensive applications.
    2011 IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS); 09/2011
  • ABSTRACT: The growth of 3D technology has led to opportunities for stacked multiprocessor-accelerator computing platforms with high-bandwidth, low-latency TSV connections between them, resulting in high computing performance and better energy efficiency. This work evaluates the performance and energy benefits of such an advanced architecture and addresses the associated design problems. To better utilize the reconfigurable hardware resources and to explore opportunities for kernel sharing across applications, we propose a dedicated domain-specific computing platform. In particular, we chose medical image processing as the target domain because of its growing demand for real-time processing and the inadequate performance of conventional computing architectures on it. We propose a design flow for the 3D multiprocessor-accelerator platform and apply a number of methods to optimize the average performance of all applications in the target domain under area and bandwidth constraints. Experiments show that applications in this domain gain, on average, a 7.4× speedup and 18.8× energy savings on our platform, which combines CMP cores with domain-specific accelerators, compared with CPU-only implementations.
    2011 IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP); 10/2011
  • ABSTRACT: Three methods have been proposed to test through-silicon vias (TSVs) electrically prior to 3D integration: (1) sense amplification, (2) leakage current monitoring, and (3) capacitance bridge. These tests aim to detect one or both of two failure types: pin-holes and voids. The test circuits measure the capacitance and leakage current of the TSVs and generate a 1-bit pass/fail signal, which is streamed out through a scan chain. The test time is 10 μs for the leakage test and sense amplification methods and 15 μs for the capacitance bridge method. All of these methods can be implemented for test-before-stacking, which increases assembled yield. The resolution, power, and area of these TSV test circuits were compared, and the performance of each circuit was studied at PVT corners. The IMEC TSV technology was assumed, and the designs were simulated using the 32 nm predictive device model. Without any failure, the TSV capacitance has a mean value of 37 fF and a leakage resistance higher than 850 MΩ. With respect to the 37 fF standard capacitance, the resolution is 3.3 fF (8.9%) for the sense amplification method and 0.16 fF (0.4%) for the capacitance bridge method. Although the capacitance bridge method has better resolution, it requires 4x the area and 10x the power of the other two and is also more sensitive to PVT variation. The resolution of the leakage current monitor method is 10 MΩ (1.1%) with respect to its 850 MΩ threshold, and it uses 42.5 aJ of energy in the nominal case. The sense amplification circuit can be modified to detect an equivalent leakage resistance below 2 kΩ.
    Journal of Electronic Testing 01/2012; 28:27-38.
