Fine-grained dynamic voltage and frequency scaling for precise energy and performance trade-off based on the ratio of off-chip access to on-chip computation times
ABSTRACT This paper presents an intra-process dynamic voltage and frequency scaling (DVFS) technique targeted toward non-real-time applications running on an embedded system platform. The key idea is to use runtime information about external memory access statistics to perform CPU voltage and frequency scaling, with the goal of minimizing energy consumption while transparently controlling the performance penalty. The proposed DVFS technique relies on dynamically constructed regression models that allow the CPU to calculate the expected workload and slack time for the next time slot and thus adjust its voltage and frequency to save energy while meeting soft timing constraints. This is in turn achieved by estimating and exploiting the ratio of the total off-chip access time to the total on-chip computation time. The proposed technique has been implemented on an XScale-based embedded system platform, and actual energy savings have been calculated by current measurements in hardware. For memory-bound programs, a CPU energy saving of more than 70% was achieved with a performance degradation of 12%. For CPU-bound programs, a 15-60% CPU energy saving was achieved at the cost of a 5-20% performance penalty.
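The core intuition behind the abstract can be sketched with a small calculation. Because off-chip (memory) stall time does not shrink when the CPU clock is raised, the lowest frequency that still meets a soft slowdown bound depends on the ratio of off-chip to on-chip time. The sketch below is illustrative only; the function and parameter names are hypothetical and the paper's actual method uses dynamically constructed regression models rather than fixed inputs.

```python
# Hedged sketch: pick the lowest CPU frequency that keeps the relative
# slowdown within a soft bound, assuming total execution time splits into
# frequency-dependent on-chip compute time and frequency-independent
# off-chip access time. All names here are illustrative, not the paper's.

def min_frequency(f_max, t_on, t_off, max_slowdown):
    """f_max: maximum CPU frequency (Hz).
    t_on: on-chip computation time measured at f_max (s).
    t_off: off-chip access time, independent of CPU frequency (s).
    max_slowdown: allowed relative performance loss (e.g. 0.12 for 12%).
    """
    beta = t_off / t_on  # ratio of off-chip to on-chip time
    # Time at frequency f:  T(f) = t_on * f_max / f + t_off
    # Constraint:           T(f) <= (1 + d) * (t_on + t_off)
    # Solving for f gives:  f >= f_max / (1 + d * (1 + beta))
    return f_max / (1.0 + max_slowdown * (1.0 + beta))

# A memory-bound program (large beta) tolerates much deeper scaling
# than a CPU-bound one (beta near zero) for the same slowdown bound.
f_mem = min_frequency(624e6, t_on=1.0, t_off=4.0, max_slowdown=0.12)
f_cpu = min_frequency(624e6, t_on=1.0, t_off=0.1, max_slowdown=0.12)
print(f_mem, f_cpu)
```

This illustrates why the reported savings differ so sharply between memory-bound and CPU-bound workloads: the larger the off-chip ratio, the further the clock (and, quadratically with voltage, the energy) can be lowered within the same performance budget.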