Conference Paper

Improving Energy-Efficiency by Bypassing Trivial Computations.

DOI: 10.1109/IPDPS.2005.253
Conference: 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), CD-ROM / Abstracts Proceedings, 4-8 April 2005, Denver, CO, USA
Source: DBLP


We study the energy-efficiency benefits of bypassing trivial computations in high-performance processors. Trivial computations are those whose output can be determined without performing the computation. We show that bypassing trivial instructions reduces energy consumption while improving performance. For the subset of SPEC'2K benchmarks studied here, bypassing trivial instructions improves average energy and energy-delay by up to 4.5% and 11.8%, respectively, over a conventional processor.
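
As a rough illustration of the definition above (not the paper's hardware mechanism), the following Python sketch checks whether an integer operation is trivial and, if so, returns its result without performing the arithmetic. The function name `bypass_trivial`, the opcode strings, and the specific rule set are assumptions made for this example; the abstract does not list the exact instructions the authors treat as trivial.

```python
from typing import Optional


def bypass_trivial(op: str, a: int, b: int) -> Optional[int]:
    """Return the result if (op, a, b) is trivial, otherwise None.

    Hypothetical rule set for illustration only: each rule yields the
    result from the operand values alone, without doing the arithmetic.
    """
    if op == "add":
        if a == 0:
            return b          # 0 + b = b
        if b == 0:
            return a          # a + 0 = a
    elif op == "sub":
        if b == 0:
            return a          # a - 0 = a
        if a == b:
            return 0          # a - a = 0
    elif op == "mul":
        if a == 0 or b == 0:
            return 0          # anything times 0 is 0
        if a == 1:
            return b          # 1 * b = b
        if b == 1:
            return a          # a * 1 = a
    elif op == "div":
        if b == 1:
            return a          # a / 1 = a
        if a == 0 and b != 0:
            return 0          # 0 / b = 0 for non-zero b
    return None               # not trivial: the functional unit must execute it


# Example: a multiply by zero is bypassed, a general multiply is not.
print(bypass_trivial("mul", 5, 0))
print(bypass_trivial("mul", 3, 4))
```

In a real pipeline the check would be done by comparators on the operand values rather than in software; the sketch only shows which cases allow the result to be known in advance.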

Available from:
  • Source
    • "In this case, we can bypass the instruction by remapping R1 to R3 without waiting for the value of R3 to be known. This distinction of decode-trivial has not been done in the previous studies [2] [25]. The distinction between 1-Op and 2-Op makes a lot of sense if we review the results of SPEC CPU2000 and EEMBC1.1 benchmarks shown in Fig. 4a and b, respectively."
    ABSTRACT: This paper addresses the issue of improving the energy efficiency of processors by eliminating trivial operations. The paper provides a new classification of trivial operations and quantifies their relative frequency in desktop and embedded applications. It then presents a hardware technique to remove trivial operations as early as at the decode stage of the pipeline to save energy. This paper shows that 13.6% and 8.6% of the instructions are identity-trivial in the selected applications in the SPEC CPU2000 and EEMBC1.1 benchmark suites, respectively. Early detection and elimination of trivial operations reduce the average energy consumption of the core pipeline by 9% and 6%, respectively.
    Full-text · Article · Jun 2008 · Microprocessors and Microsystems
  • Source
    • "For SPEC 95/2000 and MediaBench benchmark suites, 13% and 6% of total dynamic instructions are trivial which enables an 8% average performance improvement. [5] studies the energy benefits of bypassing trivial operations. Bypassing trivial operations results in average energy and energy-delay improvement of 5% and 12% respectively."
    ABSTRACT: Physics-based animation has enormous potential to improve the realism of interactive entertainment through dynamic, immersive content creation. Despite the massively parallel nature of physics simulation, fully exploiting this parallelism to reach interactive frame rates will require significant area to place the large number of cores. Fortunately, interactive entertainment requires believability rather than accuracy. Recent work shows that real-time physics has a remarkable tolerance for reduced precision of the significand in floating-point (FP) operations. In this paper, we describe an architecture with a hierarchical floating-point unit (FPU) that leverages dynamic precision reduction to enable efficient FPU sharing among multiple cores. This sharing reduces the area required by these cores, thereby allowing more cores to be packed into a given area and exploiting more parallelism.
    Preview · Conference Paper · Jan 2008
  • Source
    • "On the contrary, the result of an issue-trivial must be broadcast on the result bus so that dependent instructions can be woken up and the dependency is resolved [11]. Thus, as shown in [1] [11], decode-trivial operations provide more opportunities of improving power and performance than issue-trivial operations do."
    ABSTRACT: Instruction reuse (IR) and trivial computation (TC) elimination are two architectural techniques that aim at eliminating redundant code to better exploit instruction-level parallelism. While they have been extensively studied in isolation, this paper is the first to compare their relative efficiency. This is done using applications from the embedded domain. This paper establishes the relationship between the two techniques by framing the arithmetic instructions detected by each of them. While TC can only eliminate instructions where one of the operands is zero or one, IR has a potentially wider scope, as it can eliminate any instruction that has been executed before with the same set of operand values. Despite the wider scope, we have found that IR and TC can eliminate about the same fraction of instructions even if an infinitely large instruction reuse buffer is assumed (IR and TC can eliminate 26% and 22% of the instructions, respectively). Another quite surprising finding is that the two techniques target quite different sets of instructions, suggesting that they can provide almost additive gains if combined. In combination, they can eliminate 40% of the instructions they target. In terms of energy efficiency, we finally find that a processor with a 256-entry instruction reuse buffer uses 1% more energy than a processor without IR, whereas TC reduces energy consumption by 5.6%.
    Preview · Conference Paper · Jul 2007
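
The last abstract above contrasts TC, which inspects only the operand values, with IR, which remembers earlier executions. A minimal Python sketch of the IR side is given here under stated assumptions: the class name `ReuseBuffer`, indexing by program counter plus operand values, and least-recently-used replacement are all choices made for illustration and are not taken from the cited paper. Only the default capacity of 256 entries echoes the buffer size mentioned in that abstract.

```python
from collections import OrderedDict
from typing import Optional


class ReuseBuffer:
    """Hypothetical instruction reuse buffer keyed by (pc, operand values)."""

    def __init__(self, capacity: int = 256) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[tuple, int]" = OrderedDict()

    def lookup(self, pc: int, a: int, b: int) -> Optional[int]:
        """Return the stored result if this instruction ran before with the same operands."""
        key = (pc, a, b)
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        return None                            # miss: the instruction must execute

    def insert(self, pc: int, a: int, b: int, result: int) -> None:
        """Record an executed instruction; evict the least recently used entry when full."""
        key = (pc, a, b)
        self.entries[key] = result
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # drop the oldest entry


# Example: 3 * 4 has no zero or one operand, so TC cannot remove it,
# but a second execution at the same pc can be served from the buffer.
buf = ReuseBuffer()
first = buf.lookup(0x40, 3, 4)     # miss on first execution
buf.insert(0x40, 3, 4, 12)
second = buf.lookup(0x40, 3, 4)    # hit on re-execution
print(first, second)
```

The example shows why the two techniques can target different instructions: a reuse hit depends on execution history, while a trivial-operation check depends only on the current operand values.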