N. Ranganathan

University of South Florida, Tampa, Florida, United States

Publications (318) · 104.03 Total impact

  • Matthew Morrison, Nagarajan Ranganathan
    ABSTRACT: Quantum mechanical principles that govern the basic laws of physics increasingly limit CMOS operation with transistor scaling. Traditional logic based CMOS circuits cannot achieve ultra-low power levels due to heat dissipated for a single bit loss of information as represented by the Landauer barrier. Reversible logic is a promising computing paradigm towards realization of ultra-low power computing circuits. Reducing average and peak power consumption is an effective strategy for mitigation of side-channel attacks, such as Differential Power Analysis. We present designs of Forward Body Biased Adiabatic Logic for reduction of average, peak, and differential power. HSPICE simulations with predictive 22nm technology are used to analyze performance metrics and exhaustive simulation results are presented for various reversible CMOS designs. Average power is improved upon by up to 91%, the peak power by up to 96%, and the differential power is improved by up to a factor of 128.57.
    Proceedings of the 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems; 01/2014
  • Saurabh Kotiyal, Himanshu Thapliyal, Nagarajan Ranganathan
    ABSTRACT: Reversible logic has emerged as a promising computing paradigm with applications in quantum computing, optical computing, dissipationless computing, low power computing, etc. In reversible logic there exists a one-to-one mapping between the input and output vectors. Reversible circuits require constant ancilla inputs for reconfiguration of gate functions and garbage outputs that help in keeping reversibility. Quantum circuits of many qubits are extremely difficult to realize; thus, reduction in the number of ancilla inputs and garbage outputs is the primary goal of optimization. In the existing literature, researchers have proposed several designs of reversible quantum multipliers based on reversible full adders and reversible half adders. The use of reversible full adders and half adders for the addition of partial products increases the overhead in terms of the number of ancilla inputs and the number of garbage outputs. This paper presents a binary tree based design methodology for an NxN reversible quantum multiplier. The proposed methodology performs the addition of partial products in parallel using reversible ripple quantum adders with no garbage output and no ancilla bit, thereby minimizing the number of ancilla and garbage bits used in the design. The proposed design methodology shows an improvement of 17.86% to 60.34% in terms of ancilla inputs, and 21.43% to 52.17% in terms of garbage outputs, compared to all the existing reversible quantum multiplier designs.
    Proceedings of the 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems; 01/2014
  • M. Morrison, N. Ranganathan
    ABSTRACT: Programmable reversible logic is emerging as a prospective logic design style for implementation in low power, low frequency applications where minimal impact on circuit heat generation is desirable, such as mitigation of differential power analysis attacks. Adiabatic logic is an implementation of reversible logic in CMOS where the current flow through the circuit is controlled such that the energy dissipation due to switching and capacitor dissipation is minimized. Recent advances in dual-rail adiabatic logic show reduction in average and differential power, making this design methodology advantageous in applications where security is the primary design metric and operating frequency is slower, such as Smart Cards. In this paper, we present an algorithm for synthesis of adiabatic circuits in CMOS. Then, using the ESPRESSO heuristic for minimization of Boolean functions method on each output node, we reduce the size of the synthesized circuit. Our approach correlates the horizontal offsets in the permutation matrix with the necessary switches required for synthesis instead of using a library of equivalent functions. The synthesis results show that, on average, the proposed algorithm represents an improvement of 36% over the best known reversible designs with the optimized dual-rail cell libraries. Then, we present an adiabatic S-box which significantly reduces energy imbalance compared to previous benchmarks. The design is capable of forward encryption and reverse decryption with minimal overhead, allowing for efficient hardware reuse.
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 01/2014; 33(7):975-988. · 1.09 Impact Factor
  • Saurabh Kotiyal, Himanshu Thapliyal, Nagarajan Ranganathan
    ABSTRACT: Reversible logic is a computing paradigm in which there is a one to one mapping between the input and the output vectors. Reversible logic gates are implemented in an optical domain as it provides high speed and low energy computations. In the existing literature there are two types of optical mapping of reversible logic gates: (i) based on a semiconductor optical amplifier (SOA) using a Mach–Zehnder interferometer (MZI) switch; (ii) based on linear optical quantum computation (LOQC) using linear optical quantum logic gates. In reversible computing, the NAND logic based reversible gates and design methodologies based on them are widely popular. The NOR logic based reversible gates and design methodologies based on them are still unexplored. In this work, we propose two NOR logic based n-input and n-output reversible gates one of which can be efficiently mapped in optical computing using the Mach–Zehnder interferometer (MZI) while the other one can be mapped efficiently in optical computing using the linear optical quantum gates. The proposed reversible NOR gates work as a corresponding NOR counterpart of NAND logic based Toffoli gates. The proposed optical reversible NOR logic gates can implement the reversible boolean logic functions with a reduced number of linear optical quantum logic gates or reduced optical cost and propagation delay compared to their implementation using existing optical reversible NAND gates. It is illustrated that an optical reversible gate library having both optical Toffoli gate and the proposed optical reversible NOR gate is superior compared to the library containing only the optical Toffoli gate: (i) in terms of number of linear optical quantum gates when implemented using linear optical quantum computing (LOQC), (ii) in terms of optical cost and delay when implemented using the Mach–Zehnder interferometer.
    Microelectronics Journal 01/2014; · 0.91 Impact Factor
  • Himanshu Thapliyal, Nagarajan Ranganathan
    ABSTRACT: Reversible logic is gaining significance in the context of emerging technologies such as quantum computing since reversible circuits do not lose information during computation and there is one-to-one mapping between the inputs and outputs. In this work, we present a class of new designs for reversible binary and BCD adder circuits. The proposed designs are primarily optimized for the number of ancilla inputs and the number of garbage outputs and are designed for possible best values for the quantum cost and delay. In reversible circuits, in addition to the primary inputs, some constant input bits are used to realize different logic functions which are referred to as ancilla inputs and are overheads that need to be reduced. Further, the garbage outputs which do not contribute to any useful computations but are needed to maintain reversibility are also overheads that need to be reduced in reversible designs. First, we propose two new designs for the reversible ripple carry adder: (i) one with no input carry c0 and no ancilla input bits, and (ii) one with input carry c0 and no ancilla input bits. The proposed reversible ripple carry adder designs with no ancilla input bits have less quantum cost and logic depth (delay) compared to their existing counterparts in the literature. In these designs, the quantum cost and delay are reduced by deriving designs based on the reversible Peres gate and the TR gate. Next, four new designs for the reversible BCD adder are presented based on the following two approaches: (i) the addition is performed in binary mode and correction is applied to convert to BCD when required through detection and correction, and (ii) the addition is performed in binary mode and the result is always converted using a binary to BCD converter. The proposed reversible binary and BCD adders can be applied in a wide variety of digital signal processing applications and constitute important design components of reversible computing.
    ACM Journal on Emerging Technologies in Computing Systems (JETC). 09/2013; 9(3).
  • M. Morrison, N. Ranganathan
    ABSTRACT: Programmable reversible logic is emerging as a prospective logic design style for implementation in modern nanotechnology and quantum computing with minimal impact on circuit heat generation. Recent advances in reversible logic and quantum computer algorithms allow for improved computer architecture and arithmetic logic unit designs. We present an optimization method for reversible logic synthesis based on the Integrated Qubit (IQ) library. This method works in conjunction with existing methods to further improve quantum cost and delay of a synthesized reversible logic circuit. The algorithm runs in O(N) time and reduces the quantum cost of the synthesized circuit by up to 45 percent. In addition, the process of replacing the gates in the synthesized circuits with IQ gates uses a locally optimal technique whose major benefits include reduction of cost as well as delay.
    VLSI (ISVLSI), 2013 IEEE Computer Society Annual Symposium on; 01/2013
  • H. Thapliyal, N. Ranganathan, S. Kotiyal
    ABSTRACT: In this paper, we propose the design of two-vector testable sequential circuits based on conservative logic gates. The proposed sequential circuits based on conservative logic gates outperform the sequential circuits implemented in classical gates in terms of testability. Any sequential circuit based on conservative logic gates can be tested for classical unidirectional stuck-at faults using only two test vectors: all 1's and all 0's. The designs of two-vector testable latches, master-slave flip-flops and double edge triggered (DET) flip-flops are presented. The importance of the proposed work lies in the fact that it provides the design of reversible sequential circuits completely testable for any stuck-at fault by only two test vectors, thereby eliminating the need for any type of scan-path access to internal memory cells. The reversible design of the DET flip-flop is proposed for the first time in the literature. We also show the application of the proposed approach toward 100% fault coverage for single missing/additional cell defects in the quantum-dot cellular automata (QCA) layout of the Fredkin gate. We also present a new conservative logic gate called the multiplexer conservative QCA gate (MX-cqca), which is not reversible in nature but, like the Fredkin gate, can work as a 2:1 multiplexer. The proposed MX-cqca gate surpasses the Fredkin gate in terms of complexity (the number of majority voters), speed, and area.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2013; 21(7):1201-1209. · 1.22 Impact Factor
  • R. Hyman, N. Ranganathan, T. Bingel, D. Tran Vo
    ABSTRACT: Peak power reduction has been a critical challenge in the design of integrated circuits impacting the chip's performance and reliability. The reduction of peak power also reduces the power density of integrated circuits. Due to large IR-voltage drops in circuits, transistor switching slows down giving rise to timing violations and logic failures. In this paper, we present a new clock control strategy for peak-power reduction in VLSI circuits. In the proposed method, the simultaneous switching of combinational paths is minimized by taking advantage of the delay slacks among the paths and clustering the paths with similar slack values. Once the paths are identified based on the path delays and their slack values, the clustering algorithm determines the ideal number of clusters for the given circuit and for each cluster the maximum possible phase shift that can be applied to the clock. The paths are assigned to clusters in a load balanced manner based on the slack values and each cluster will have a phase shift possible on its clock depending on the slack. Thus, the proposed register-transfer level (RTL) method takes advantage of the logic-path timing slack to re-schedule circuit activities at optimal intervals within the unaltered clock period. When switching activities are redistributed more evenly across the clock period, the IC supply-current consumption is also spread across a wider range of time within the clock period. This has the beneficial effect of reducing peak-current draw in addition to reducing RMS power draw without having to change the operating frequency and without utilizing additional power supply voltages as in dual or multi VT approaches. The proposed method is implemented and tested through simulations using an experimental setup with Synopsys Tools Suite and Cadence Tools on the ISCAS'85 benchmark circuits, OpenCore circuits and LEON processor multiplier circuit. Experimental results indicate that peak power can be reduced significantly to at least 72% depending on the number of clusters and the phase-shifted clock identified as suitable for the given circuit by the proposed algorithms. Although the proposed method incurs some power overhead compared to the traditional clocking method, the overhead can be made negligible compared to the peak-power reduction as seen in the experimental results presented.
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems 01/2013; 21(2):259-269. · 1.22 Impact Factor
  • H. Thapliyal, A. Bhatt, N. Ranganathan
    ABSTRACT: A conservative reversible logic gate is a reversible logic gate that also satisfies the property that there are an equal number of 1s in the outputs as in the inputs. In this work, we present a new class of n × n (n inputs and n outputs) conservative reversible logic gate named the SCRL (Super Conservative Reversible Logic) gate for the design of reversible quantum circuits. The proposed SCRL gate has 1 control input, depending on the value of which it can swap any two of the n - 1 data inputs; hence it is superior to the existing Fredkin gate. In reversible circuits, the constant input bits that are used to realize different logic functions are referred to as ancilla inputs, while the outputs that are neither primary inputs nor contribute to any useful computations are referred to as garbage outputs. As ancilla inputs and garbage outputs are overhead bits in a reversible circuit, they need to be minimized. The barrel shifter forms an integral component of many computing systems. As an example of using the proposed SCRL gate to design efficient reversible quantum circuits, the design of a reversible barrel shifter with zero ancilla inputs and zero garbage outputs is illustrated.
    Circuits and Systems (MWSCAS), 2013 IEEE 56th International Midwest Symposium on; 01/2013
  • Matthew Lewandowski, Nagarajan Ranganathan, Matthew Morrison
    ABSTRACT: Reversible logic is gaining significant consideration as the potential logic design style for implementation in modern nanotechnology and quantum computing with minimal impact on physical entropy. Recent advances in reversible logic allow schemes for computer architectures using improved quantum computer algorithms. We present a VHDL behavioral model for the design and simulation of the quantum interactions of qubits in theoretical reversible logic structures. Modeling IQ gates, as opposed to only Control-V gates or Toffoli gates, allows for a more robust model that more accurately reflects a theoretical reversible computing structure. This method is an extension to existing programming language and modeling method that allows for reversible logic structures to be designed, simulated, and verified. To the best of our knowledge, this is the first work in the behavioral model of integrated qubit gates.
    IEEE Computer Society International Symposium on VLSI; 01/2013
  • Venkataraman Mahalingam, Nagarajan Ranganathan, Ransford Hyman Jr
    ABSTRACT: In the nanometer era, process, voltage, and temperature variations are dominating circuit performance, power, and yield. Over the past few years, statistical optimization methods have been effective in improving yield in the presence of uncertainty due to process variations. However, statistical methods overconsume resources, even in the absence of variations. Hence, to facilitate a better performance-power-yield trade-off, techniques that can dynamically enable variation compensation are becoming necessary. In this article, we propose a dynamic technique that controls the instance of data capture in critical path memory flops by delaying the clock edge trigger. The methodology employs a dynamic delay detection circuit to identify the uncertainty in delay due to variations and stretches the clock in the destination flip-flops. The delay detection circuit uses a latch and a set of combinational gates to dynamically detect and create the slack needed to accommodate the delay due to variations. The Clock Stretching Logic (CSL) is added only to paths that have a high probability of failure in the presence of variations. The proposed methodology improves the timing yield of the circuit without significant overcompensation. The methodology was simulated using Synopsys design tools for circuit synthesis and Cadence tools for placement and routing of the design. Extracted parasitic and timing information was parsed using Perl scripts and simulated using a simulation program written in C++. Experimental results based on Monte-Carlo simulations on benchmark circuits indicate considerable improvement in timing yield with negligible area overhead.
    ACM Journal on Emerging Technologies in Computing Systems - JETC. 01/2012;
  • M. Morrison, N. Ranganathan
    ABSTRACT: Significant debate exists in the literature with regards to the permissibility of feedback in reversible computing nanotechnologies. Feedback allows for reuse of logical subroutines, which is a desired functionality of any computational device. Determining whether loop back is allowed is paramount to assessing the robustness of reversible logic in any quantum design. In this paper, the fundamental discoveries in entropy and quantum mechanics that serve as the foundations for reversible logic are reviewed. The fundamentals for implementation of reversibility in computing are shown. Then, definitions are presented for a sequential reversible logic structure. A sequential reversible logic structure is proven to have an identical number of feedback-dependent inputs and feedback-producing outputs, and new metrics for measuring the probability of each output state are presented. Using these metrics, the reversibility of each clock cycle of such a device is verified. Therefore, we demonstrate that any reversible logic structure with feedback is physically reversible.
    VLSI (ISVLSI), 2012 IEEE Computer Society Annual Symposium on; 01/2012
  • S. Kotiyal, H. Thapliyal, N. Ranganathan
    ABSTRACT: Reversible logic has promising applications in dissipationless optical computing, low power computing, quantum computing, etc. Reversible circuits do not lose information, and there is a one-to-one mapping between the input and the output vectors. In recent years researchers have implemented reversible logic gates in the optical domain as it provides high speed and low energy computations. The reversible gates can be easily fabricated at the chip level using optical computing. The all optical implementation of reversible logic gates is based on the semiconductor optical amplifier (SOA) based Mach-Zehnder interferometer (MZI). The Mach-Zehnder interferometer has advantages such as high speed, low power, easy fabrication and fast switching time. In the existing literature, the NAND logic based implementation is the only implementation available for reversible gates and functions. There is a lack of research in the direction of NOR logic based implementation of reversible gates and functions. In this work, we propose NOR logic based all optical reversible gates referred to as the all optical TNOR gate and the all optical PNOR gate. The proposed all optical reversible NOR logic gates can implement the reversible boolean logic functions with reduced optical cost and propagation delay compared to their implementation using existing all optical reversible NAND gates. The advantages in terms of optical cost and delay are illustrated by implementing 13 standard boolean functions that can represent all 256 possible combinations of three variable boolean functions.
    VLSI (ISVLSI), 2012 IEEE Computer Society Annual Symposium on; 01/2012
  • H. Thapliyal, N. Ranganathan
    ABSTRACT: Reversible circuits can generate a unique output vector from each input vector, and vice versa; that is, there is a one-to-one mapping between the input and the output vectors. The contributions of the dissertation include a novel reversible gate particularly suitable for reversible arithmetic; several designs for reversible arithmetic such as binary and BCD adders, subtractors and comparators; and a set of reversible sequential circuits such as latches, flip-flops, and shift registers. Unlike previous works, the above designs are optimized for multiple parameters such as ancilla and garbage bits, quantum cost and delay. Another important contribution is the application of conservative reversible logic towards online and offline testing of single as well as multiple faults in reversible as well as traditional logic VLSI circuits.
    VLSI (ISVLSI), 2012 IEEE Computer Society Annual Symposium on; 01/2012
  • Conference Paper: [Seven tutorials]
    ABSTRACT: These tutorials discuss the following: Operational Amplifiers: Theory, Design and Applications; Reversible Logic: Basics, Prospects in Emerging Nanotechnologies and Challenges in Future; Digital Signal Processing for Communications; Recent Advances on Nyquist and Oversampled Analog-to-Digital Converters; Audio Steganography for Watermarking, Data Embedding and Covert Communication; Delta-Sigma Analog-to-Digital Converters - From System Architecture to Transistor-Level Design; The Memristor: a Circuit Designer's Prospective.
    Circuits and Systems (MWSCAS), 2012 IEEE 55th International Midwest Symposium on; 01/2012
  • Saurabh Kotiyal, Himanshu Thapliyal, Nagarajan Ranganathan
    ABSTRACT: In recent years reversible logic has emerged as a promising computing model for applications in dissipationless optical computing, low power CMOS, quantum computing, etc. In reversible circuits there exists a one-to-one mapping between the inputs and the outputs, resulting in no loss of information. Researchers have implemented reversible logic gates in the optical computing domain as it can provide high speed and low energy requirements along with easy fabrication at the chip level [1]. The all optical implementation of reversible gates is based on the semiconductor optical amplifier (SOA) based Mach-Zehnder interferometer (MZI) due to its significant advantages such as high speed, low power, fast switching time and ease of fabrication. In this work we present the all optical implementation of an n bit reversible ripple carry adder for the first time in the literature. The all optical reversible adder design is based on two new optical reversible gates referred to as optical reversible gate I (ORG-I) and optical reversible gate II (ORG-II) and the existing all optical Feynman gate. The two new reversible gates ORG-I and ORG-II are proposed as they can implement a reversible adder with reduced optical cost, which is the measure of the number of MZI switches, and propagation delay, and with zero overhead in terms of the number of ancilla inputs and garbage outputs. The proposed all optical reversible adder design based on the ORG-I and ORG-II reversible gates is compared with, and shown to be better than, the other existing designs of reversible adders proposed in the non-optical domain in terms of number of MZIs, delay, number of ancilla inputs and garbage outputs. The proposed all optical reversible ripple carry adder will be a key component of an all optical reversible ALU that can be applied in a wide variety of optical signal processing applications.
    01/2012;
  • Himanshu Thapliyal, Nagarajan Ranganathan
    ABSTRACT: Reversible logic is emerging as a promising computing paradigm with applications in ultralow power nanocomputing and emerging nanotechnologies such as quantum computing, quantum dot cellular automata (QCA), optical computing, etc. Reversible circuits are similar to conventional logic circuits except that they are built from reversible gates. In reversible gates, there is a unique, one-to-one mapping between the inputs and outputs, which is not the case with conventional logic. One of the primary motivations for adopting reversible logic lies in the fact that it can provide a logic design methodology for designing ultra-low power circuits beyond the kT ln 2 limit for those emerging nanotechnologies in which the energy dissipated due to information destruction will be a significant factor of the overall heat dissipation. Further, logic circuits for quantum computers must be built from reversible logic components. Several important metrics need to be considered in the design of reversible circuits, and their importance needs to be discussed. Quantum computers of many qubits are extremely difficult to realize; thus, the number of qubits in the quantum circuits needs to be minimized. This sets the major objective of optimizing the number of ancilla inputs and the number of garbage outputs in reversible logic based quantum circuits. The constant input in the reversible quantum circuit is called the ancilla input, while the garbage output refers to an output which exists in the circuit just to maintain the one-to-one mapping but is not a primary or a useful output. The reversible circuit has other important parameters, quantum cost and delay, which also need to be optimized.
    Proceedings of the IEEE International Conference on VLSI Design 01/2012;
  • Yue Wang, Soumyaroop Roy, Nagarajan Ranganathan
    ABSTRACT: In this paper, we propose a novel microarchitectural technique for run-time power-gating caches of GPUs to save leakage energy. The L1 cache (private to a core) can be put in a low-leakage sleep mode when there are no ready threads to be scheduled, and the L2 cache can be put in sleep mode when there is no memory request. The sleep mode is state-retentive, which precludes the necessity to flush the caches after they are woken up. The primary reason for the effectiveness of our technique lies in the fact that the latency of detecting cache inactivity, putting a cache to sleep and waking it up before it is accessed, is completely hidden microarchitecturally. The technique incurs insignificant overheads in terms of power and area. Experiments were performed using the GPGPU-Sim simulator on benchmarks that were set up using the CUDA framework. The power and latency modeling of the cache arrays for measuring the wake-up latency and the break-even periods is performed using a 32-nm SOI IBM technology model. Based on experiments on 16 different GPU workloads, the average energy savings achieved by the proposed technique is 54%.
    01/2012;
  • Matthew Morrison, Matthew Lewandowski, Nagarajan Ranganathan
    ABSTRACT: Programmable reversible logic is gaining wide consideration as a logic design style for modern nanotechnology and quantum computing with minimal impact on circuit heat generation in improved computer architecture and arithmetic logic unit designs. In this paper, a 2*2 Swap gate is presented that is a reduced implementation, in terms of quantum cost and delay, of the previous Swap gate. Then, a novel 3*3 programmable UPG gate capable of performing the universal logic calculations is presented and verified, and its advantages over the Toffoli and Peres gates are discussed. The UPG is then implemented in a reduced design for calculating n-bit AND, n-bit OR and n-bit ZERO calculations. Then, two 3*3 RMUX gates capable of multiplexing two input values with reduced quantum cost and delay compared to the previously existing Fredkin gate are presented and verified. Next, a novel 4*4 reversible programmable RC gate capable of nine unique logical calculations at low cost and delay is presented and verified. The UPG and RC are implemented in the design of novel sequential and tree-based comparators. These designs are compared to previously existing designs, and their advantages in terms of cost and delay are analyzed. Then, the RMUX is used to improve a reversible SRAM cell we previously presented. The memory cell and comparator are implemented in the design of a Min/Max Comparator device.
    IEEE Computer Society Annual Symposium on VLSI; 01/2012
  • S. Roy, N. Ranganathan, S. Katkoori
    ABSTRACT: In this work, we investigate state-retentive power gating of register files for leakage reduction in multicore processors supporting multithreading. In an in-order core, when a thread gets blocked due to a memory stall, the corresponding register file can be placed in a low leakage state through power gating for leakage reduction. When the memory stall gets resolved, the register file is activated for being accessed again. Since the contents of the register file are not lost and restored on wakeup, this is referred to as state-retentive power gating of register files. While state-retentive power gating in single cores has been studied in the literature, it is being investigated for multicore architectures for the first time in this work. We propose specific techniques to implement state-retentive power gating for three different multicore processor configurations based on the multithreading model: 1) coarse-grained multithreading, 2) fine-grained multithreading, and 3) simultaneous multithreading. The proposed techniques can be implemented as design extensions within the control units of the in-order cores. Each technique uses two different modes of leakage states: low leakage savings with low wake-up latency, and high leakage savings with high wake-up latency. The overhead due to wake-up latency is completely avoided in two techniques while it is hidden for the most part in the third approach, either by overlapping the wake-up process with the thread context switching latency or by executing instructions from other threads ready for execution. The proposed techniques were evaluated through simulations with multiprogrammed workloads comprised of SPEC 2000 integer benchmarks. Experimental results show that in an 8-core processor executing 64 threads, the average leakage savings were 42 percent in coarse-grained multithreading, while they were between seven percent and eight percent for fine-grained and simultaneous multithreading.
    IEEE Transactions on Computers 12/2011; · 1.38 Impact Factor

Publication Stats

2k Citations
104.03 Total Impact Points

Institutions

  • 1989–2014
    • University of South Florida
      • Department of Computer Science & Engineering
      • Department of Electrical Engineering
      Tampa, Florida, United States
  • 2005–2007
    • University of North Texas
      • Department of Computer Sciences & Engineering
      Denton, TX, United States
  • 2004
    • Stevens Institute of Technology
      • Department of Electrical & Computer Engineering
      Hoboken, NJ, United States
  • 1999–2003
    • University of Texas at El Paso
      • Department of Electrical and Computer Engineering
      El Paso, TX, United States
  • 1988–2003
    • University of Central Florida
      • Department of Electrical Engineering & Computer Science
      Orlando, FL, United States
  • 2002
    • Florida Atlantic University
      Boca Raton, Florida, United States
  • 2001
    • Winter Haven Hospital
      Florida, United States
  • 1999–2001
    • AT&T Labs
      Austin, Texas, United States
  • 2000
    • Pennsylvania State University
      • Department of Computer Science and Engineering
      University Park, PA, United States
  • 1998
    • University of South Florida St. Petersburg
      St. Petersburg, Florida, United States
  • 1992–1996
    • University of Zagreb
      Zagreb, Grad Zagreb, Croatia
  • 1991–1994
    • University of Kentucky
      • Department of Computer Science
      Lexington, Kentucky, United States