Conference Paper

Geyser-1: A MIPS R3000 CPU core with fine grain runtime power gating

DOI: 10.1109/ASSCC.2009.5357257
Conference: IEEE Asian Solid-State Circuits Conference, 2009 (A-SSCC 2009)
Source: IEEE Xplore

ABSTRACT Geyser-1, a prototype MIPS R3000 CPU with fine grain runtime power gating (PG) for the major computational components in the execution stage, is available. Function units such as the CLU, shifter, multiplier, and divider are power-gated and controlled at runtime so that only the function unit in use is powered on, which minimizes leakage power. Evaluation of the real chip shows that the fine grain runtime PG mechanism works without electrical problems. It reduces leakage power by 7% at 25°C and by 24% at 80°C. Evaluation with benchmark programs shows that power consumption can be reduced by 3% at 25°C and by 30% at 80°C.
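
The control policy described in the abstract (power on only the function unit that the current instruction needs) can be illustrated with a minimal Python sketch. This is not the Geyser-1 implementation: the instruction-to-unit mapping `UNIT_FOR_OP`, the helper names, and the sample trace are all hypothetical, and the model ignores wakeup latency and the energy cost of switching a unit on or off.

```python
from enum import Enum


class Unit(Enum):
    CLU = "CLU"
    SHIFTER = "shifter"
    MULTIPLIER = "multiplier"
    DIVIDER = "divider"


# Hypothetical mapping from MIPS mnemonics to the function unit they use.
# The real assignment inside Geyser-1 is not given in the abstract.
UNIT_FOR_OP = {
    "add": Unit.CLU, "sub": Unit.CLU, "and": Unit.CLU, "or": Unit.CLU,
    "sll": Unit.SHIFTER, "srl": Unit.SHIFTER,
    "mult": Unit.MULTIPLIER,
    "div": Unit.DIVIDER,
}


def powered_units(op):
    """Return the set of function units powered on while `op` executes."""
    unit = UNIT_FOR_OP.get(op)
    # Instructions that use none of the four units leave all of them gated.
    return {unit} if unit is not None else set()


def sleep_fraction(trace):
    """For each unit, return the fraction of instructions during which it is gated."""
    if not trace:
        raise ValueError("trace must contain at least one instruction")
    active = {unit: 0 for unit in Unit}
    for op in trace:
        for unit in powered_units(op):
            active[unit] += 1
    return {unit: 1 - active[unit] / len(trace) for unit in Unit}


fractions = sleep_fraction(["add", "sll", "mult", "add"])
# CLU is gated for 50% of the trace, the divider for 100%.
```

In this simplified model, a unit's leakage saving grows with its sleep fraction; on a real chip the saving also depends on temperature and on how often the unit is switched.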

  • ABSTRACT: This paper proposes ultrafine-grained run-time power gating of on-chip routers, in which the power supply to each router component (e.g., virtual-channel buffer, virtual-channel multiplexer, crossbar multiplexer, and output latch) can be individually controlled based on the applied workload. Since only the router components that are transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, such techniques inherently increase the communication latency and degrade application performance, since a certain amount of wakeup latency is required to activate the sleeping components. To mitigate this wakeup latency, an early wakeup method that can detect the next packet arrival in advance and activate the corresponding components is essential. We designed and implemented an ultrafine-grained power-gating router using a commercial 65 nm process. We propose four early wakeup methods and combine them with the power-gating router. The proposed router with the early wakeup methods is evaluated in terms of its application performance, area overhead, and leakage power reduction, taking into account the on/off energy overhead. The simulation results show that it reduces the leakage power by 54.4-59.9% on average even when the application programs are fully running, at the cost of a 4.6% area overhead and a 0.7-3.7% performance overhead, assuming 1 GHz operation.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 05/2011; · 1.09 Impact Factor
  • ABSTRACT: This paper proposes ultra fine-grained run-time power gating of on-chip routers, in which the power supply to each router component (e.g., VC queue, crossbar MUX, and output latch) can be individually controlled in response to the applied workload. As only the router components that are currently transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, a certain amount of wakeup latency is required to activate the sleeping components, which degrades application performance. In this paper, we estimate the wakeup latency for each component based on circuit simulations using a 65 nm process. We then propose four early wakeup methods to overcome the wakeup latency. The proposed router with the early wakeup methods is evaluated in terms of application performance, area, and leakage power. As a result, it reduces the leakage power by 78.9%, at the cost of a 4.3% area overhead and a 4.0% performance overhead, assuming 1 GHz operation.
    NOCS 2010, Fourth ACM/IEEE International Symposium on Networks-on-Chip, Grenoble, France, May 3-6, 2010; 01/2010
  • ABSTRACT: Static power dissipation has been identified as a limiting factor in future microprocessor technologies. This paper presents Loop-Directed Mothballing (LDM) to reduce static power by power-gating execution units. The method accurately predicts the resource requirements and limits performance degradation by focussing on inner loops. In simulation, the energy-delay product (EDP) of the processor is reduced by 10.3%. Two prior methods show worse EDP despite having greater power savings.
    Index Terms: Microprocessor, static power, power-gating, execution unit, simulation, and EDP.
    Static power dissipation is a key issue for microprocessor technology scaling in the near future [5]. Increasing clock frequency and transistor count drive power consumption higher, but the related problems of heat dissipation, energy costs, and battery life could limit the practicality of such future technologies. Existing approaches [4], [8] address this problem by power-gating (switching off) execution units to reduce static power dissipation, which is present in all powered logic even if it exhibits no dynamic switching activity. The energy-delay product (EDP) takes both performance and energy into account, and is used in this work because power can often be saved at the expense of performance. Execution units are among the most power-hungry devices in the microprocessor [4], [6], but power-gating them is a nontrivial problem: the heterogeneity of the units requires more analysis to match the application's requirements to the resources, and a poor match could result in a costly loss in performance due to contention for units or stalls while required units are powered up. In this paper, a method that analyses innermost loop bodies is presented, which exploits similar resource utilisation across loop iterations and hides the cost of unit power-up in the branch misprediction from the loop exit.
    01/2011;
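
The last abstract above notes that a method can save more power and still have a worse energy-delay product (EDP). A minimal Python sketch of that arithmetic follows, assuming EDP is energy times delay and energy is average power times delay. The function names and the example percentages are hypothetical and are not results from any of the listed papers.

```python
def edp(power_watts, delay_seconds):
    """Energy-delay product: (power * delay) * delay."""
    energy_joules = power_watts * delay_seconds
    return energy_joules * delay_seconds


def relative_edp(power_saving, slowdown):
    """EDP relative to a baseline of 1.0.

    power_saving: fraction of average power saved (0.10 means 10% less power).
    slowdown: fractional increase in execution time (0.08 means 8% slower).
    """
    return (1 - power_saving) * (1 + slowdown) ** 2


# Hypothetical method A: modest power saving, no slowdown.
method_a = relative_edp(0.10, 0.0)   # 0.9
# Hypothetical method B: larger power saving, but 8% slower.
method_b = relative_edp(0.20, 0.08)  # about 0.933

# B saves more power than A, yet its EDP is worse (higher).
```

Because delay enters the product squared, even a small slowdown can cancel a sizeable power saving, which is why a gating scheme that stalls while units power up may lose on EDP.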