Conference Paper

Improving Energy-Efficiency by Bypassing Trivial Computations.

DOI: 10.1109/IPDPS.2005.253
Conference: 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), CD-ROM / Abstracts Proceedings, 4-8 April 2005, Denver, CO, USA
Source: DBLP

ABSTRACT We study the energy-efficiency benefits of bypassing trivial computations in high-performance processors. Trivial computations are those whose output can be determined without performing the computation. We show that bypassing trivial instructions reduces energy consumption while improving performance. For the subset of SPEC'2K benchmarks studied here, bypassing trivial instructions improves energy and energy-delay, on average, by up to 4.5% and 11.8%, respectively, over a conventional processor.
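As a rough illustration of what "determined without performing the computation" can mean, the sketch below checks a few operand patterns and returns the result directly when one matches. The rule set, the operation names, and the function `trivial_result` are assumptions made for this example; the abstract does not list the exact cases the paper bypasses.

```python
from typing import Optional


def trivial_result(op: str, a: int, b: int) -> Optional[int]:
    """Return the result if (op, a, b) is trivial, otherwise None.

    Hypothetical rule set for illustration only; the paper's own
    classification of trivial computations may differ.
    """
    if op == "add":
        if a == 0:
            return b          # 0 + b = b
        if b == 0:
            return a          # a + 0 = a
    elif op == "sub":
        if b == 0:
            return a          # a - 0 = a
        if a == b:
            return 0          # a - a = 0
    elif op == "mul":
        if a == 0 or b == 0:
            return 0          # anything times 0 is 0
        if a == 1:
            return b          # 1 * b = b
        if b == 1:
            return a          # a * 1 = a
    elif op == "div":
        if b == 1:
            return a          # a / 1 = a
        if a == 0 and b != 0:
            return 0          # 0 / b = 0 for nonzero b
    return None               # not trivial: the functional unit must run


def count_bypassed(trace):
    """Count how many (op, a, b) entries in a trace could be bypassed."""
    return sum(1 for op, a, b in trace if trivial_result(op, a, b) is not None)
```

In a processor the same check would be a small comparator on the operand values rather than software; the point is only that a match lets the result be forwarded without occupying the functional unit.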

  • ABSTRACT: This paper addresses the issue of improving the energy efficiency of processors by eliminating trivial operations. The paper provides a new classification of trivial operations and quantifies their relative frequency in desktop and embedded applications. It then presents a hardware technique to remove trivial operations as early as the decode stage of the pipeline to save energy. This paper shows that 13.6% and 8.6% of the instructions are identity-trivial in the selected applications in the SPEC CPU2000 and EEMBC 1.1 benchmark suites, respectively. Early detection and elimination of trivial operations reduce the average energy consumption of the core pipeline by 9% and 6%, respectively.
    Microprocessors and Microsystems 06/2008; 32:183-196. DOI:10.1016/j.micpro.2007.10.001 · 0.60 Impact Factor
  • ABSTRACT: Physics-based animation has enormous potential to improve the realism of interactive entertainment through dynamic, immersive content creation. Despite the massively parallel nature of physics simulation, fully exploiting this parallelism to reach interactive frame rates will require significant area to place the large number of cores. Fortunately, interactive entertainment requires believability rather than accuracy. Recent work shows that real-time physics has a remarkable tolerance for reduced precision of the significand in floating-point (FP) operations. In this paper, we describe an architecture with a hierarchical floating-point unit (FPU) that leverages dynamic precision reduction to enable efficient FPU sharing among multiple cores. This sharing reduces the area required by these cores, thereby allowing more cores to be packed into a given area and exploiting more parallelism.
    Microarchitecture, 2007. MICRO 2007. 40th Annual IEEE/ACM International Symposium on; 01/2008
  • ABSTRACT: Instruction reuse (IR) and trivial computation (TC) elimination are two architectural techniques that aim at eliminating redundant code to better exploit instruction-level parallelism. While they have been extensively studied in isolation, this paper is the first to compare their relative efficiency. This is done using applications from the embedded domain. This paper establishes the relationship between the two techniques by framing the arithmetic instructions detected by each of them. While TC can only eliminate instructions where one of the operands is zero or one, IR has a potentially wider scope, as it can eliminate any instruction that has been executed before with the same set of operand values. Despite the wider scope, we have found that IR and TC can eliminate about the same fraction of instructions even if an infinitely large instruction reuse buffer is assumed (IR and TC can eliminate 26% and 22% of the instructions, respectively). Another quite surprising finding is that the two techniques target quite different sets of instructions, suggesting that they can provide almost additive gains if combined. In combination, they can eliminate 40% of the instructions they target. In terms of energy efficiency, we finally find that with an instruction reuse buffer of 256 entries, IR uses 1% more energy than a processor without IR, whereas TC reduces energy consumption by 5.6%.
    IEEE Second International Symposium on Industrial Embedded Systems - SIES'2007, Hotel Costa da Caparica, Lisbon, Portugal, 4-6 July 2007; 01/2007
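The contrast the last abstract draws between IR and TC can be sketched on a toy instruction trace. This is a simplified model under stated assumptions, not the paper's method: the reuse buffer here is keyed by operation and operand values with LRU replacement (real reuse buffers are typically indexed by instruction address), the TC rule is reduced to "one operand is zero or one" for add and multiply, and the names `is_trivial` and `compare` are hypothetical.

```python
from collections import OrderedDict


def is_trivial(op, a, b):
    """Simplified TC rule (assumption): add with 0, or mul with 0 or 1."""
    if op == "add":
        return a == 0 or b == 0
    if op == "mul":
        return a in (0, 1) or b in (0, 1)
    return False


def compare(trace, buffer_entries=256):
    """Count instructions removable by TC, by IR, and by either technique."""
    reuse = OrderedDict()  # LRU reuse buffer keyed by (op, a, b)
    tc = ir = either = 0
    for op, a, b in trace:
        key = (op, a, b)
        hit_ir = key in reuse
        if hit_ir:
            reuse.move_to_end(key)         # refresh LRU position
        else:
            reuse[key] = True              # record first execution
            if len(reuse) > buffer_entries:
                reuse.popitem(last=False)  # evict least recently used
        hit_tc = is_trivial(op, a, b)
        tc += int(hit_tc)
        ir += int(hit_ir)
        either += int(hit_tc or hit_ir)
    return {"tc": tc, "ir": ir, "either": either, "total": len(trace)}


if __name__ == "__main__":
    trace = [("add", 0, 5), ("mul", 3, 4), ("mul", 3, 4),
             ("add", 0, 5), ("mul", 1, 9)]
    print(compare(trace))
```

On this made-up trace TC catches three instructions and IR two, with only one caught by both, which is the kind of partial overlap that would make the combined gain close to additive.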