ABSTRACT: 3-D stacking and integration can provide system advantages. This paper explores application drivers and computer-aided design (CAD) for 3-D integrated circuits (ICs). Interconnect-rich applications especially benefit, sometimes up to the equivalent of two technology nodes. This paper presents physical-design case studies of ternary content-addressable memories (TCAMs), first-in first-out (FIFO) memories, and an 8192-point fast Fourier transform (FFT) processor in order to quantify the benefit of the through-silicon vias in an available 180-nm 3-D process. The TCAM shows a 23% power reduction and the FFT shows a 22% reduction in cycle time, coupled with an 18% reduction in energy per transform.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 05/2009; · 1.22 Impact Factor
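The FFT figures quoted in the abstract above (22% shorter cycle time, 18% less energy per transform) can be combined into a rough average-power ratio. The sketch below is a back-of-the-envelope check, not a result from the paper: it assumes the number of cycles per transform is unchanged, so that time per transform scales with cycle time. The function name `relative_power` is hypothetical.

```python
def relative_power(energy_reduction: float, cycle_time_reduction: float) -> float:
    """Return the new average power as a fraction of the baseline power.

    Assumption (not from the source): cycles per transform stay constant,
    so time per transform shrinks by the same fraction as the cycle time.
    Average power is energy per transform divided by time per transform.
    """
    if not (0.0 <= energy_reduction < 1.0 and 0.0 <= cycle_time_reduction < 1.0):
        raise ValueError("reductions must be fractions in [0, 1)")
    energy_ratio = 1.0 - energy_reduction      # new energy / old energy
    time_ratio = 1.0 - cycle_time_reduction    # new time / old time
    return energy_ratio / time_ratio


if __name__ == "__main__":
    # Figures quoted in the abstract: 18% energy reduction, 22% cycle-time reduction.
    ratio = relative_power(0.18, 0.22)
    print(f"average power relative to baseline: {ratio:.3f}")
```

Under that assumption the ratio is 0.82 / 0.78, roughly 1.05: the design finishes each transform sooner and with less energy, while drawing slightly more average power during operation.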
ABSTRACT: 3D stacking and integration can provide system advantages. This paper explores an application driver for 3D ICs. Interconnect-rich applications especially benefit, sometimes up to the equivalent of two technology nodes. Another promising application area is that of logic-on-memory. This paper presents a case study of an 8192-point fast Fourier transform (FFT) processor in order to quantify the benefit of the through-silicon vias in an available 180 nm 3D process. The FFT shows a 22% reduction in cycle time, coupled with an 18% reduction in energy per transform.
ABSTRACT: 3D stacking and integration can provide system advantages equivalent to up to two technology nodes of scaling. This paper explores application drivers and computer-aided design (CAD) for 3D ICs.
Signals, Systems and Electronics, 2007. ISSSE '07. International Symposium on; 01/2007
ABSTRACT: 3DICs are motivated by the expectation of better performance over their 2D counterparts; however, non-idealities threaten to diminish the benefit of multiple tiers. Previous work has predicted the benefit of 3DICs, but has not taken into account the increased temperature and leakage power. This work develops an automated design flow with 2D CAD tools to design 3DICs with the MIT Lincoln Lab 0.18 μm three-tier fully depleted silicon on insulator (FDSOI) process (Suntharalingam et al., 2005). This flow uses carefully designed scripts to fill the gap between 2D methodologies and 3D designs. We examine wire length, timing, clock skew, and total power dissipation, along with temperature, of two benchmark circuits implemented in both 2D and 3D integration. We then extend our observations to the 90 nm and 45 nm technology nodes with the predictive technology model (PTM) and the BSIMSOI model. Experimental results show that the performance of a 3DIC, even with the non-idealities, shows up to a two-generation advantage over its 2D counterpart with only three tiers.
ABSTRACT: Three-dimensional integrated circuits (3DICs) have the potential to reduce interconnect lengths and improve digital system performance. However, heat removal is more difficult in 3DICs, and the higher temperatures increase delay and leakage power, potentially negating the performance improvement. Thermal vias can help to remove heat, but they create routing congestion, which also leads to longer interconnects. It is therefore very difficult to tell whether or not a particular system may benefit from 3D integration. In order to help understand this trade-off, physical design experiments were performed on a low-power and a high-performance design in an existing 3DIC technology. Each design was partitioned and routed with varying numbers of tiers and thermal-via densities. A thermal-analysis methodology is developed to predict the final performance. Results show that the lowest energy per operation and delay are achieved with 4 or 5 tiers. These results show a reduction in energy and delay of up to 27% and 20%, respectively, compared to a traditional 2DIC approach. In addition, it is shown that thermal vias offer no performance benefit for the low-power system and only marginal benefit for the high-performance system.
Proceedings of the 43rd Design Automation Conference, DAC 2006, San Francisco, CA, USA, July 24-28, 2006; 01/2006
ABSTRACT: Sphere decoding has become a popular implementation of MIMO detection due to its improved performance at lower hardware complexity. ASIC implementations have proven the feasibility of this method but fail to effectively address the issue of power efficiency. In this work, we propose an improved architecture that combines a deeper pipeline with single-port read and write memories to increase the energy efficiency (bits/sec/mW) of the implementation. We see 30% and 80% increases in memory and logic energy efficiency, respectively, when compared to an unpipelined version of the implementation in 0.18 μm technology.
ABSTRACT: This article provides a practical introduction to the design trade-offs of the currently available 3D IC technology options. It begins with an overview of techniques, such as wire bonding, microbumps, through vias, and contactless interconnection, comparing them in terms of vertical density and practical limits to their use. We then present a high-level discussion of the pros and cons of 3D technologies, with an analysis relating the number of transistors on a chip to the vertical interconnect density using estimates based on Rent's rule. Next, we provide a more detailed design example of inductively coupled interconnects, with measured results of a system fabricated in a 0.35-μm technology and an analysis of misalignment and crosstalk tolerances. Lastly, we present a case study of a fast Fourier transform (FFT) placed and routed in a 0.18-μm through-via silicon-on-insulator (SOI) technology, comparing the 3D design to a traditional 2D approach in terms of wire length and critical-path delay.
IEEE Design and Test of Computers 12/2005; · 1.62 Impact Factor
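The abstract above mentions estimates based on Rent's rule, which in its standard form relates the number of external terminals T of a logic block to its gate count N as T = k * N^p. The sketch below only illustrates that general formula; it is not the article's analysis. The coefficient `k`, the exponent `p`, the gate count, and the function names are illustrative assumptions, and the equal-split-per-tier model is a simplification introduced here.

```python
def rent_terminals(num_gates: float, k: float = 4.0, p: float = 0.6) -> float:
    """Rent's rule: estimated terminal count T = k * N**p for a block of N gates.

    k (average terminals per gate) and p (Rent exponent) are illustrative
    defaults, not values taken from the article.
    """
    if num_gates <= 0:
        raise ValueError("num_gates must be positive")
    return k * num_gates ** p


def per_tier_terminals(total_gates: float, tiers: int,
                       k: float = 4.0, p: float = 0.6) -> float:
    """Estimate terminals of one tier when the gates are split evenly.

    Simplifying assumption: each tier behaves like an independent block of
    total_gates / tiers gates obeying the same Rent parameters.
    """
    if tiers < 1:
        raise ValueError("tiers must be at least 1")
    return rent_terminals(total_gates / tiers, k, p)


if __name__ == "__main__":
    total = 1_000_000  # hypothetical gate count
    for tiers in (1, 2, 4):
        t = per_tier_terminals(total, tiers)
        print(f"{tiers} tier(s): about {t:,.0f} terminals per tier")
```

Some share of each tier's terminals must cross to neighbouring tiers, so a per-tier estimate like this gives a rough upper bound on the vertical connections a given technology option would need to supply.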
ABSTRACT: As processing technology continues to evolve, power minimization becomes more complex and crucial. Emerging technologies offer an array of different threshold voltages and gate oxide thicknesses. Along with choices of supply voltage, parallelism, and pipelining, these options complicate the search for energy-optimal architectures. This paper explores the possibility of using convex optimization to solve the multi-parameter optimization problem and presents a case study of an 8-bit multiply-accumulate block, which is optimized in 250 nm and 70 nm technologies.
VLSI, 2004. Proceedings. IEEE Computer Society Annual Symposium on; 03/2004
ABSTRACT: 3D-ICs promise to reduce wire delays, but non-idealities threaten to diminish the benefit. This paper presents an analysis of the performance improvement of a standard-cell implementation of an FFT when designed for the three-tier process from MIT Lincoln Labs. The methodology is presented, along with analyses of delay, routing congestion, and heat. The methodology uses commercial 2D CAD tools with very simple scripts to link them together, yet still achieves a 15% reduction in average wire length and a 23% reduction in total power.
ABSTRACT: Inter-wire coupling continues to become a more significant portion of the total wire capacitance in deep submicron designs. Coupling introduces noise that has two important aspects: functional noise and delay noise. Additional delay causes timing closure problems, and noise destroys signal integrity. Fixing functional and delay-noise violations is time-consuming, however, due to the iterative routing and analysis needed with current crosstalk-avoidance design flows. The ultimate goal of crosstalk-avoidance routing is to reduce delay uncertainty as well as coupling noise. In this paper, we present a method called "net classing" to avoid both types of noise during the global routing phase. The basic idea is to have a global view of interconnect behavior. We created a path analyzer to identify timing-critical nets and a coupling analyzer to identify noise-critical nets. We feed the net class information into Cadence tools to complete detail routing and noise analysis. Compared to a crosstalk-avoidance routing flow, CeltIC analysis of detail-routed results showed that net classes help reduce timing uncertainty by 6.47% and maximum noise peak by 4.5% Vdd on average in the 0.25 μm technology node, and by 3.76% in timing uncertainty and 2.95% Vdd in peak noise in the 0.13 μm technology node. We also compare the novel 3D design with a conventional 2D design and find an average 3.1% reduction in timing uncertainty and a 2.13% Vdd reduction in peak noise, with a 20% reduction in worst-case delay.
ABSTRACT: This paper presents a 3D IC case study in the design of first-in first-out (FIFO) buffers. The architecture presented uses single-ported memories while still allowing simultaneous write and read accesses to the FIFO without stalling any requests. A novel design for generating the read/write address pointers in the FIFO is also proposed. A 16KB FIFO is designed which can run at 800 MHz in a 3DIC 0.18 μm process. Analysis shows this to be 33% faster and use 15% less energy than a conventional double-ported FIFO. Further analysis shows this 16KB FIFO to be 8% faster and use 6% less energy than a similar FIFO implemented in a single tier.