Arijit Raychowdhury

Georgia Institute of Technology, Atlanta, Georgia, United States

Publications (100) · 64.14 Total impact

  • ABSTRACT: An original circuit-level model of two-terminal vanadium dioxide electron devices exhibiting electronic hysteresis is presented. Such devices allow realisation of very compact relaxation nano-oscillators that potentially can be used in bio-inspired neurocomputing. The proposed model is exploited to determine the parameter values that ensure stable periodic oscillations.
    Electronics Letters 05/2015; 51(11):819-820. DOI:10.1049/el.2015.0025 · 1.07 Impact Factor
  • Saad Bin Nasir, Arijit Raychowdhury
    ABSTRACT: With an increasing number of power-states, finer-grained power management and larger dynamic ranges of digital circuits, the integration of compact, scalable linear-regulators embedded deep within logic blocks has become important. While analog linear-regulators have traditionally been used in digital ICs, the need for digitally implementable designs that can be synthesized and embedded in digital functional units for ultra-fine-grained power management has emerged. This paper presents the circuit design and control models of an all-digital, discrete-time linear regulator and explores the parametric design space for transient response time and loop stability.
  • ABSTRACT: The need for fine-grained power management in digital ICs has led to the design and implementation of compact, scalable low-dropout regulators (LDOs) embedded deep within logic blocks. While analog LDOs have traditionally been used in digital ICs, the need for digitally implementable LDOs embedded in digital functional units for ultra-fine-grained power management is paramount. This paper presents a fully-digital, phase locked LDO implemented in 32 nm CMOS. The control model of the proposed design has been provided and limits of stability have been shown. Measurement results with a resistive load as well as a digital load exhibit peak current efficiency of 98%.
    IEEE Journal of Solid-State Circuits 11/2014; 49(11):2684-2693. DOI:10.1109/JSSC.2014.2353798 · 3.11 Impact Factor
  • Abhinav Parihar, Nikhil Shukla, Suman Datta, Arijit Raychowdhury
    ABSTRACT: Computing with networks of synchronous oscillators has attracted widespread attention as novel materials and device topologies have enabled realization of compact, scalable and low-power coupled oscillatory systems. Of particular interest are compact and low-power relaxation oscillators that have been recently demonstrated using MIT (metal-insulator-transition) devices using properties of correlated oxides. This paper presents an analysis of the dynamics and synchronization of a system of two such identical coupled relaxation oscillators implemented with MIT devices. We focus on two implementations of the oscillator: (a) a D-D configuration where complementary MIT devices (D) are connected in series to provide oscillations and (b) a D-R configuration where it is composed of a resistor (R) in series with a voltage-triggered state changing MIT device (D). The MIT device acts like a hysteresis resistor with different resistances in the two different states. The synchronization dynamics of such a system has been analyzed with purely charge based coupling using a resistive (Rc) and a capacitive (Cc) element in parallel. It is shown that, in a D-D configuration, a symmetric, identical and capacitively coupled relaxation oscillator system synchronizes to an anti-phase locking state, whereas when coupled resistively the system locks in phase. Further, we demonstrate that for a certain range of values of Rc and Cc, a bistable system is possible which can have potential applications in associative computing. In the D-R configuration, we demonstrate the existence of rich dynamics including non-monotonic flows and complex phase relationships governed by the ratios of the coupling impedance. Finally, the developed theoretical formulations have been shown to explain experimentally measured waveforms of such pairwise coupled relaxation oscillators.
    Journal of Applied Physics 08/2014; 117(5). DOI:10.1063/1.4906783 · 2.19 Impact Factor
  • ABSTRACT: Harnessing the computational capabilities of dynamical systems has attracted the attention of scientists and engineers from varied technical disciplines over decades. The time evolution of coupled, non-linear synchronous oscillatory systems has led to active research in understanding their dynamical properties and exploring their applications in brain-inspired, neuromorphic computational models. In this paper we present the realization of coupled and scalable relaxation-oscillators utilizing the metal-insulator-metal transition of vanadium-dioxide (VO2) thin films. We demonstrate the potential use of such a system in pattern recognition, as one possible computational model using such a system.
  • ABSTRACT: Strongly correlated phases exhibit collective carrier dynamics that if properly harnessed can enable novel functionalities and applications. In this article, we investigate the phenomenon of electrical oscillations in a prototypical MIT system, vanadium dioxide (VO2). We show that the key to such oscillatory behaviour is the ability to induce and stabilize a non-hysteretic and spontaneously reversible phase transition using a negative feedback mechanism. Further, we investigate the synchronization and coupling dynamics of such VO2 based relaxation oscillators and show, via experiment and simulation, that this coupled oscillator system exhibits rich non-linear dynamics including charge oscillations that are synchronized in both frequency and phase. Our approach of harnessing a non-hysteretic reversible phase transition region is applicable to other correlated systems exhibiting metal-insulator transitions and can be a potential candidate for oscillator based non-Boolean computing.
    Scientific Reports 05/2014; 4:4964. DOI:10.1038/srep04964 · 5.58 Impact Factor
  • Saad Bin Nasir, Youngtak Lee, Arijit Raychowdhury
    ABSTRACT: On-chip power delivery networks (PDNs) for today's microprocessors and systems-on-chip (SoCs), which are characterized by dynamic supply voltage, many embedded integrated VRs (IVRs), lower decoupling-capacitor, high current ranges, multiple power modes and fast transient loads, are designed to minimize AC load transients and supply noise. The close interaction of the VRs with the power grids creates multiple feedback paths in the overall network, compromising the resultant phase margin, and can even lead to system instabilities. The introduction of digital linear regulators operating in the low dropout (LDO) mode, with low power supply rejection, further exacerbates the problem. This paper provides a comprehensive methodology, based on Mason's Gain Formula applied to hybrid control, for modeling and analyzing distributed linear regulators and their interaction with the PDN.
    2014 15th International Symposium on Quality Electronic Design (ISQED); 03/2014
  • ABSTRACT: In this paper, we present a low-power graphics processing core that achieves a 40% improvement in peak energy efficiency using dual-VCC arrays, adaptive clocking for voltage droop mitigation, and state retention capability with an integrated retention clamping circuit for low-power sleep mode. The 22nm testchip includes a graphics execution core connected to an SRAM array and test controller used for storage and delivery of at-speed test vectors. Correct execution of the tests is validated through a multiple-input signature register (MISR), which accumulates key signals in the core and generates a 32b signature at test completion.
    2014 IEEE International Solid-State Circuits Conference (ISSCC); 02/2014
  • ABSTRACT: Discrete time digital linear regulators, including low dropout regulators (LDOs), have become competitive in multi-Vcc digital systems for fine-grained spatio-temporal voltage regulation and distribution. However, the wide dynamic current range of the digital load circuits poses serious problems in maintaining stability and high efficiency at all corners. In this paper we present a control model for discrete time LDOs and demonstrate how online adaptive control can be employed for consistent performance and high efficiency across the load current range.
    Design Automation and Test in Europe; 01/2014
  • A. Parihar, N. Shukla, S. Datta, A. Raychowdhury
    ABSTRACT: As complementary metal-oxide-semiconductor (CMOS) scaling continues to offer insurmountable challenges, questions about the performance capabilities of Boolean, digital machines based on the von Neumann architecture, when operated within a power budget, have also surfaced. Research has started in earnest to identify alternative computing paradigms that provide orders of magnitude improvement in power-performance for specific tasks such as graph traversal, image recognition, template matching, and so on. Further, post-CMOS device technologies have emerged that realize computing elements which are neither CMOS replacements nor suited to work as a binary switch. In this paper, we present the realization of coupled and scalable relaxation-oscillators utilizing the metal-insulator-metal transition of vanadium-dioxide (VO2) thin films. We demonstrate the potential use of such a system in a non-Boolean computing paradigm and demonstrate pattern recognition, as one possible application using such a system.
    01/2014; 4(4):450-459. DOI:10.1109/JETCAS.2014.2361069
  • ABSTRACT: Advanced human-machine interfaces require improved embedded sensors that can seamlessly interact with the user. Voice-based communication has emerged as a promising interface for next generation mobile, automotive and hands-free devices. Presented here is such an audio front-end with Voice Activity Detection (VAD) hardware targeted for low-power embedded SoCs, featuring a 512 pt FFT, programmable filters, noise floor estimator and a decision engine which has been fabricated in 32 nm CMOS. The dual-VCC, dual-frequency design allows the core datapath to scale to near-threshold voltage (NTV), where power consumption is less than 50 uW. At peak energy efficiency, the core can process audio data at 2.3 nJ/frame - a 9.4X improvement over nominal voltage conditions.
    IEEE Journal of Solid-State Circuits 08/2013; 48(8):1963-1969. DOI:10.1109/JSSC.2013.2258827 · 3.11 Impact Factor
  • A. Raychowdhury
    ABSTRACT: Ever larger on-die memory arrays for future processors in CMOS logic technology drive the need for dense and scalable embedded memory alternatives beyond SRAM and eDRAM. Recent advances in non-volatile STT-RAM technology, which stores data by the spin orientation of a soft ferromagnetic material and shows current induced switching, have created interest for its use as embedded memory [1-3]. When a spin-polarized current passes through a mono-domain ferromagnet, it attempts to polarize the current in its preferred direction of magnetic moment. As the ferromagnet absorbs some of the angular momentum of the electrons, it creates a torque that causes a flip in the direction of magnetization in the ferromagnet. This is used in magnetic tunneling junction (MTJ) based spin torque transfer (STT) RAM cells where a thin insulator (MgO) is sandwiched between a fixed ferromagnetic layer (polarizer) and the free layer (storage node). This can be integrated in the metal stack (Fig. 1) and hence provide high memory density. Depending on the direction of the current flow (perpendicular to these layers in our study), the magnetization of the free layer is switched to a parallel (P: low resistance state) or anti-parallel (AP: high resistance state) state. The minimum size cell (mincell) contains an access transistor (Tx) of width 2F (WTX = 2F, F: half-pitch of the process node) and a planar storage node of dimensions 2F×F. The area of the mincell is 3F×2F = 6F². In this paper, we examine the design space for key magnetic material properties and access transistor needed for embedded on-die memory with adequate scalability, density, read/write performance and robustness against various intrinsic variabilities and disturbances. New models and simulation methodologies, calibrated to existing measurements [1], for read, write and disturbance mechanisms are developed. Different storage node structures and materials are evaluated to reveal the most promising scaling options.
    Computer-Aided Design (ICCAD), 2013 IEEE/ACM International Conference on; 01/2013
  • ABSTRACT: Domain Wall Memory (DWM) is a recently developed spin-based memory technology in which several bits of data are densely packed into the domains of a ferromagnetic wire. DWM has shown great promise in enabling non-volatile memory with unprecedented density and high energy efficiency. In this work, we propose TapeCache, a first attempt to employ DWMs as last-level caches in general purpose computing platforms. DWMs enable much higher density compared to SRAM, DRAM, and other spin-based memory technologies such as STT-MRAM. However, they also pose unique challenges such as serial access to the bits stored in a DWM cell, leading to variable access latencies. We propose a novel circuit-architecture co-design for TapeCache, consisting of (i) a multi-port DWM macro-cell optimized for read operations considering the asymmetry in applications' read/write characteristics, and (ii) a new cache organization and suitable management policies that mitigate the performance penalty arising from serial access to bits in a macro-cell. Over a wide range of SPEC 2006 benchmarks, TapeCache achieves 7.8X improvement in area, an average energy improvement of 7.3X, and an average performance improvement of 1.2% compared to an iso-capacity SRAM cache. Compared to an iso-capacity STT-MRAM cache, TapeCache obtains 2.3X improvement in area and 1.4X average energy savings with virtually identical performance.
  • ABSTRACT: This session brings together specialists from the DfT, DfY and DfR domains that will address key problems together with their solutions for the 14nm node and beyond, dealing with extremely complex chips affected by high defect levels, unpredictable and heterogeneous timing behavior, circuit degradation over time, including extreme situations related with the ultimate CMOS nodes, where all processor nodes, routers and links of single-chip massively parallel tera-device processors could comprise timing faults (such as delay faults or clock skews); a large percentage of these parts are affected by catastrophic failures; all parts experience significant performance degradations over time; and new catastrophic failures occur at low MTBF.
    03/2012; DOI:10.1109/DATE.2012.6176556
  • A. Raychowdhury, D. Somasekhar, J. Tschanz, V. De
    ABSTRACT: A fully-digital phase-locked low dropout regulator (LDO) has been designed in 32nm CMOS for fine-grained power delivery to multi-Vcc digital circuits. Measurements across a wide range of input voltages and currents show that the LDO offers excellent load regulation and efficiency close to 97% of ideal efficiency at nominal load current conditions (ILOAD=3mA).
    VLSI Circuits (VLSIC), 2012 Symposium on; 01/2012
  • ABSTRACT: High leakage power in sub-100-nm memory technology nodes drives the need for nonvolatile memory devices to reduce power consumption and enhance battery life. Spin-transfer torque magnetic tunneling junction (STT-MTJ) is a promising nonvolatile memory device with comparable read and write performances as SRAM and eDRAM with almost zero standby power. In this paper, we present a simulation framework that can solve transport (using nonequilibrium Green's function (NEGF) formalism) and magnet dynamics (using Landau-Lifshitz-Gilbert (LLG) equation) self-consistently to study the read and write performances of STT-MTJ. Due to process variations, thermal disturbances, and stray fields, the performance of STT-MTJ degrades and results in one transistor-one STT-MTJ (1T-1STTMTJ) memory failures. A thorough memory design space investigation can help us to reduce such failures. Hence, we present a design space exploration framework for 1T-1STTMTJ memory, which consists of magnetic materials with different RAPA products, different genres of MTJ stacks, and a transistor. A comprehensive study based on critical memory performance metrics such as tunneling magnetoresistance, JC, and write cycle shows the relative merits and demerits of each MTJ stack for embedded memory applications. Finally, the benefits of synthetic antiferromagnet free layer in providing immunity against stray fields are shown illustrating the need for coupled free-layer stacks in scaled technology nodes. Index Terms: Dual-barrier, dual-free-layer, failures, magnetic memory, magnetic tunneling junction (MTJ), non-equilibrium Green's function (NEGF), spin-transfer torque (STT), variability, yield.
    IEEE Transactions on Electron Devices 12/2011; 58(12):4333-4343. DOI:10.1109/TED.2011.2169962 · 2.36 Impact Factor
  • ABSTRACT: A 45 nm microprocessor integrates an all-digital dynamic variation monitor (DVM) to continuously measure the impact of dynamic parameter variations on circuit-level performance to enhance silicon debug and adaptive clock control. The DVM consists of a tunable replica circuit, a time-to-digital converter, and multiplexers to measure circuit delay or frequency changes with less than a 1% measured resolution error while capturing clock-to-data correlations. In validating the DVM with microprocessor maximum clock frequency (FMAX) measurements, an on-die noise injector circuit induces a supply voltage (VCC) droop at a particular cycle in the test program. The FMAX measurement is then repeated for over a thousand iterations while shifting the droop placement to a different cycle per iteration. Silicon measurements demonstrate the DVM capability of tracking the worst case FMAX reduction to within 1% for a wide range of VCC droop profiles. Furthermore, silicon measurements reveal that FMAX is highly sensitive to the placement and magnitude of a high-frequency VCC droop during program execution, thus highlighting the value of the DVM for silicon debug. In addition, the DVM interfaces with an adaptive clock control circuit to dynamically adjust the clock frequency by changing the divide ratio in the phase-locked loop in response to persistent variations, enabling the microprocessor to adapt to the operating environment for maximum efficiency.
    Circuits and Systems I: Regular Papers, IEEE Transactions on 09/2011; 58(9):2017-2025. DOI:10.1109/TCSI.2011.2163893 · 2.30 Impact Factor
  • ABSTRACT: Built-in resiliency features enable a microprocessor to detect and correct errors due to fast dynamic voltage droop events as well as other types of dynamic variations. Timing errors in the microprocessor core as well as read (RD) and write (WR) errors in the 8T SRAM based cache can be detected. As a result, guardbands added for these variations are reduced or eliminated, improving performance and reducing power consumption. Measurements on a 45 nm research microprocessor core demonstrate 41% improvement in throughput or 22% reduction in energy at 0.8 V. Measurements on the cache demonstrate reduction of the minimum operating Vcc (VMIN) by 9% thereby resulting in a 7.5% reduction of net operating power.
    09/2011; 1(3):208-217. DOI:10.1109/JETCAS.2011.2167070
  • ABSTRACT: Infrequent dynamic events like VCC droops and temperature changes result in the use of a static VCC guardband in 8T SRAM arrays. This paper proposes the use of tunable replica bits (TRBs) as a potential solution to mitigating a part of the VCC guardband. Measured data on a 16 KB 8T array featuring tunable replica bits illustrate 9% reduction of the operating minimum VCC (VMIN) and correspondingly a 7.5% reduction in array power.
    IEEE Journal of Solid-State Circuits 05/2011; 46(4):797-805. DOI:10.1109/JSSC.2011.2108141 · 3.11 Impact Factor
  • ABSTRACT: A 45 nm microprocessor core integrates resilient error-detection and recovery circuits to mitigate the clock frequency (FCLK) guardbands for dynamic parameter variations to improve throughput and energy efficiency. The core supports two distinct error-detection designs, allowing a direct comparison of the relative trade-offs. The first design embeds error-detection sequential (EDS) circuits in critical paths to detect late timing transitions. In addition to reducing the FCLK guardbands for dynamic variations, the embedded EDS design can exploit path-activation rates to operate the microprocessor faster than infrequently-activated critical paths. The second error-detection design offers a less-intrusive approach for dynamic timing-error detection by placing a tunable replica circuit (TRC) per pipeline stage to monitor worst-case delays. Although the TRCs require a delay guardband to ensure the TRC delay is always slower than critical-path delays, the TRC design captures most of the benefits from the embedded EDS design with less implementation overhead. Furthermore, while core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage (VCC) measurements. Both error-detection designs interface with error-recovery techniques, enabling the detection and correction of timing errors from fast-changing variations such as high-frequency VCC droops. The microprocessor core also supports two separate error-recovery techniques to guarantee correct execution even if dynamic variations persist. The first technique requires clock control to replay errant instructions at 1/2 FCLK. In comparison, the second technique is a new multiple-issue instruction replay design that corrects errant instructions with a lower performance penalty and without requiring clock control. Silicon measurements demonstrate that resilient circuits enable a 41% throughput gain at equal energy or a 22% energy reduction at equal throughput, as compared to a conventional design when executing a benchmark program with a 10% VCC droop. In addition, the microprocessor includes a new adaptive clock control circuit that interfaces with the resilient circuits and a phase-locked loop (PLL) to track recovery cycles and adapt to persistent errors by dynamically changing FCLK for maximum efficiency.
    IEEE Journal of Solid-State Circuits 02/2011; DOI:10.1109/JSSC.2010.2089657 · 3.11 Impact Factor

Publication Stats

1k Citations
64.14 Total Impact Points

Institutions

  • 2010–2014
    • Georgia Institute of Technology
      • School of Electrical & Computer Engineering
      Atlanta, Georgia, United States
  • 2009–2014
    • Intel
      • Intel IT Labs
      Santa Clara, California, United States
  • 2003–2010
    • Purdue University
      • School of Electrical and Computer Engineering
      West Lafayette, IN, United States
  • 2008
    • Texas A&M University
      • Department of Computer Science and Engineering
      College Station, Texas, United States