V. Kamakoti

Indian Institute of Technology Madras, Chennai, Tamil Nadu, India

Publications (107) · 24.88 Total Impact

  • ABSTRACT: Scan-based testing is crucial to ensuring the correct functioning of chips. In this scheme, the scan and capture phases are interleaved. It is well known that for large designs, excessive switching activity during the launch-to-capture window leads to high voltage droop on the power grid, ultimately resulting in false delay failures during at-speed test. This article proposes a new design-for-testability (DFT) scheme for launch-on-shift (LOS) testing, which ensures that the combinational logic remains undisturbed between the interleaved capture phases, providing computer-aided design (CAD) tools with extra search space for minimizing launch-to-capture switching activity through test pattern ordering (TPO). We further propose a new TPO algorithm that keeps track of the don't cares during the ordering process, so that the don't-care filling step after ordering yields a better reduction in launch-to-capture switching activity than any other technique in the literature. The proposed DFT-assisted technique, when applied to circuits in the ITC'99 benchmark suite, produces an average reduction of 17.68% in peak launch-to-capture switching activity (CSA) compared to the best known low-power TPO technique. Even for circuits whose test cubes are not rich in don't-care bits, the proposed technique produces an average reduction of 15% in peak CSA, while for circuits with test cubes rich in don't-care bits (≥75%), the average reduction is 24%. The proposed technique also reduces the average power dissipation (considering both scan cells and combinational logic) during the scan phase by about 43.5% on average, compared to the adjacent filling technique.
    No preview · Article · Dec 2015 · ACM Transactions on Design Automation of Electronic Systems
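    A minimal sketch of the kind of don't-care-aware ordering-plus-filling described above, assuming test cubes are strings over {0, 1, X}. The greedy nearest-neighbour ordering, the cost function, and the adjacent-style fill are illustrative stand-ins, not the paper's actual TPO algorithm.

```python
# Illustrative only: greedy don't-care-aware test-pattern ordering followed
# by an adjacent-style X-fill. Cubes are strings over {'0', '1', 'X'}; only
# positions that are specified in both cubes and differ are forced to toggle.

def forced_toggles(a: str, b: str) -> int:
    """Bit positions that must switch between consecutive cubes a and b."""
    return sum(1 for x, y in zip(a, b) if x != 'X' and y != 'X' and x != y)

def greedy_order(cubes):
    """Nearest-neighbour ordering: each successor adds the fewest forced toggles."""
    remaining = list(cubes)
    order = [remaining.pop(0)]                      # arbitrary seed cube
    while remaining:
        nxt = min(remaining, key=lambda c: forced_toggles(order[-1], c))
        remaining.remove(nxt)
        order.append(nxt)
    return order

def adjacent_fill(order):
    """Resolve each X to the previous pattern's bit so filled bits never toggle."""
    filled = [order[0].replace('X', '0')]           # arbitrary fill for the first cube
    for cube in order[1:]:
        prev = filled[-1]
        filled.append(''.join(p if c == 'X' else c for c, p in zip(cube, prev)))
    return filled

if __name__ == "__main__":
    cubes = ["1XX0", "10X1", "0XX0", "X101"]
    ordered = greedy_order(cubes)
    print(ordered)
    print(adjacent_fill(ordered))
```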
  • ABSTRACT: With the increase in process variations and diversity in workloads, it is imperative to holistically explore power and temperature optimization techniques from the circuit layer right up to the compiler/operating-system (OS) layer. This article proposes one such holistic technique, called the proactive workload-aware temperature management framework for low-power chip multiprocessors (ProWATCh). At the compiler level, ProWATCh includes two techniques: (1) a novel compiler design for estimating the architectural parameters of a task at compile time; and (2) a model-based technique for dynamic estimation of architectural parameters at runtime. At the OS level, ProWATCh integrates two techniques: (1) a workload- and temperature-aware process manager for dynamic distribution of tasks to different cores; and (2) a model-predictive-control-based task scheduler for generating an efficient sequence of task execution. At the circuit level, ProWATCh implements either of two techniques: (1) a workload-aware voltage manager for dynamic supply and body-bias voltage assignment at a given frequency in processors that support adaptive body bias (ABB); or (2) a workload-aware frequency governor for efficient assignment of upper and lower frequency bounds for frequency scaling in processors that do not support ABB. Employing ProWATCh (with the voltage manager) on an ABB-compatible 3D OpenSPARC architecture using MiBench benchmarks resulted in an average 18% (19°C) reduction in peak temperature. Evaluating ProWATCh on an existing quad-core Intel Core i7 processor with the frequency governor alone (as the processor does not support an ABB interface) resulted in a 10% (8°C) reduction in peak temperature compared to the native Linux 3.0 completely fair scheduler (CFS). To study the effectiveness of the proposed framework across benchmark suites, ProWATCh was also evaluated on a quad-core Intel Core i7 processor using the SPEC CPU2006 benchmarks, resulting in a 7°C reduction in peak temperature compared to the native Linux 3.0 CFS.
    No preview · Article · Sep 2015 · ACM Journal on Emerging Technologies in Computing Systems
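    The following sketch illustrates the basic idea behind a workload- and temperature-aware dispatcher such as the one ProWATCh's process manager implements: pair the most power-hungry pending tasks with the coolest cores. The per-task power estimates, sensor readings, and the crude heating model are placeholders, not ProWATCh itself.

```python
# Illustrative only: a workload- and temperature-aware dispatcher that sends
# the most power-intensive pending task to the coolest core. Power estimates
# would come from the compile-time/runtime models and temperatures from
# on-chip sensors; the heating model below is a crude placeholder.

def dispatch(task_power, core_temps, heat_per_watt=0.5):
    """task_power: name -> estimated power; core_temps: core -> temperature (deg C)."""
    assignment = {}
    for task, power in sorted(task_power.items(), key=lambda kv: kv[1], reverse=True):
        coolest = min(core_temps, key=core_temps.get)
        assignment[task] = coolest
        core_temps[coolest] += heat_per_watt * power   # placeholder thermal response
    return assignment

if __name__ == "__main__":
    tasks = {"fft": 4.0, "sha": 2.5, "jpeg": 3.0, "logger": 0.5}
    cores = {"core0": 55.0, "core1": 62.0, "core2": 58.0, "core3": 49.0}
    print(dispatch(tasks, cores))
```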
  • ABSTRACT: With the possible end of Moore's Law on the horizon, approximate computing has gathered momentum over the past few years as a possible alternative for low-power designs. Approximate computing accepts tolerable inaccuracies at the output of a design in exchange for better energy efficiency. In this paper, we propose an application-independent automated flow that converts a given design into an approximate version using either voltage-scaling or power-gating based techniques. The proposed model is shown to be effective for designing low-power media-type IP (intellectual property) based ASICs (application-specific integrated circuits). The model encompasses various automated techniques to identify logic within a given design that can be leveraged for approximation. Following this identification, the model applies a series of physical optimizations that lead to a tunable approximate circuit capable of operating in both approximate and accurate modes depending on the environment and user constraints. The flow has been demonstrated to provide up to 35% power reduction in ASICs (operating in approximate mode) that implement imaging applications such as video decoding and image restoration IPs.
    No preview · Article · Jun 2015 · Journal of Low Power Electronics
  • Source
    S V S Suresh · R Krishna Kumar · V Kamakoti

    Full-text · Dataset · Mar 2015
  • PV Torvi · Devanathan V.R. · V. Kamakoti
    ABSTRACT: With the increasing adoption of newer technologies and architectures for automotive and aviation electronics, aimed at improving performance and/or reducing power and area, soft-error robustness is becoming an important issue for ensuring reliable operation over an extended lifetime and a wide range of operating conditions. In this paper, we propose a modeling and optimization framework to systematically improve the FIT (failure-in-time) rate of a design with minimal impact on power, performance, and area. We first propose a framework to model and evaluate the relative soft-error vulnerability of the standard master-slave flip-flops and dual interlocked storage cells (DICE) in the cell library. We then formulate a linear optimization problem using this information to selectively replace flip-flops so as to improve the FIT rate of the design with minimal impact on area and power. Employing the proposed technique on a popular industrial IP core shows a 32% relative improvement in design robustness with just a 2% increase in design area.
    No preview · Conference Paper · Jan 2015
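    A hedged stand-in for the selective flip-flop replacement step: the paper formulates it as a linear optimization, whereas the sketch below uses a simple greedy knapsack-style heuristic under an area budget. All vulnerability, FIT, and area numbers are placeholders.

```python
# Illustrative only: greedily pick the flip-flops whose DICE replacement buys
# the most FIT-rate improvement per unit of extra area, within an area budget.
# The paper solves this as a linear optimization; all numbers are placeholders.

def select_flops_for_dice(candidates, area_budget):
    """candidates: list of (flop_name, fit_reduction, extra_area). Returns flops to replace."""
    chosen, used_area = [], 0.0
    for name, fit_gain, d_area in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
        if used_area + d_area <= area_budget:
            chosen.append(name)
            used_area += d_area
    return chosen

if __name__ == "__main__":
    candidates = [("ff_ctrl0", 12.0, 1.8), ("ff_dpath3", 4.0, 1.8),
                  ("ff_fsm1", 9.5, 1.8), ("ff_cfg7", 1.2, 1.8)]
    print(select_flops_for_dice(candidates, area_budget=4.0))
```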
  • K. Shyamala · V. Kamakoti
    ABSTRACT: Logic minimization is an important aspect of the digital computer-aided design (CAD) flows for both application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs). Peephole optimization is one of the effective logic minimization techniques employed in the design of digital circuits. This paper presents a novel automated peephole-optimization-based approach to logic minimization that interlaces commercially available ASIC-based and FPGA-based synthesis tools in an alternating fashion. Applying the technique to standard benchmark circuits resulted in logic reductions of up to 59.71% and 73.40%, delay reductions of up to 36.64% and 57.78%, and power reductions of up to 54.12% and 57.14% compared with the output generated by current state-of-the-art commercial ASIC and FPGA synthesis tools, respectively. Importantly, this technique can be adopted by design houses at no extra cost. Using the addition operation as a case study, the paper also demonstrates how to use the proposed methodology to automatically design arithmetic circuits that meet different area and performance budgets.
    No preview · Article · Mar 2014 · Journal of Low Power Electronics
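    An illustrative driver loop for the interlaced synthesis idea described above. The tool command names (asic_synth, fpga_synth) and the file-size cost metric are hypothetical placeholders; a real flow would invoke the commercial tools and parse their area/delay reports.

```python
# Illustrative only: an alternating driver loop around hypothetical ASIC and
# FPGA synthesis commands ("asic_synth", "fpga_synth" are placeholders, not
# real tools), stopping when the alternation no longer improves the netlist.

import os
import subprocess

def run_tool(command, netlist_in, netlist_out):
    """Run one synthesis pass; the command-line shape is assumed, not a real CLI."""
    subprocess.run([command, netlist_in, "-o", netlist_out], check=True)

def cost_of(netlist):
    """Placeholder cost; a real flow would parse the tool's area/delay report."""
    return os.path.getsize(netlist)

def alternate_synthesis(netlist, max_iters=5):
    best = cost_of(netlist)
    for _ in range(max_iters):
        run_tool("asic_synth", netlist, "after_asic.v")         # ASIC-style pass
        run_tool("fpga_synth", "after_asic.v", "after_fpga.v")  # FPGA-style pass
        cost = cost_of("after_fpga.v")
        if cost >= best:                                        # no further improvement
            break
        best, netlist = cost, "after_fpga.v"
    return netlist
```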
  • ABSTRACT: Excessive power dissipation can cause high voltage droop on the power grid, leading to timing failures. Since test power dissipation is typically higher than functional power, minimizing test peak power is very important in order to avoid test-induced timing failures. Test cubes for large designs are usually dominated by don't-care bits, making X-leveraging algorithms promising for test power reduction. In this paper, we show that X-bit statistics can be used to reorder test vectors on scan-based architectures realized using toggle-masking flip-flops. Based on this, the paper also presents an algorithm, namely balanced X-filling, that when applied to the ITC'99 circuits reduced the peak capture power by 7.4% on average and 40.3% in the best case. Additionally, XStat improved the running time of the test vector ordering and X-filling phases compared to the best known techniques.
    No preview · Article · Mar 2014 · Journal of Low Power Electronics
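    A minimal sketch of one plausible reading of X-statistics-driven ordering and filling: alternate X-poor and X-rich vectors so adjacent pairs always have free bits, then resolve each X against its neighbour. This is illustrative only and not the paper's balanced X-filling algorithm.

```python
# Illustrative only: use per-vector don't-care statistics to interleave X-poor
# and X-rich vectors, then fill each X against its already-filled neighbour.
# This is one plausible "balancing" heuristic, not the paper's algorithm.

def x_count(vec: str) -> int:
    return vec.count('X')

def interleave_by_x(vectors):
    """Alternate vectors with few X bits and vectors with many X bits."""
    ranked = sorted(vectors, key=x_count)
    half = len(ranked) // 2
    poor, rich = ranked[:half], ranked[half:]
    order = [v for pair in zip(poor, rich) for v in pair]
    order.extend(rich[len(poor):])                    # leftover when counts are odd
    return order

def fill_against_neighbour(order):
    """Resolve each X to the previous filled vector's bit, avoiding extra toggles."""
    filled = [order[0].replace('X', '0')]
    for vec in order[1:]:
        prev = filled[-1]
        filled.append(''.join(p if v == 'X' else v for v, p in zip(vec, prev)))
    return filled

if __name__ == "__main__":
    vectors = ["1010", "1XX0", "XXXX", "0X10", "X0X1"]
    order = interleave_by_x(vectors)
    print(order)
    print(fill_against_neighbour(order))
```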
  • ABSTRACT: With the increasing integration of capabilities into mobile application processors, a host of imaging operations that were earlier performed in software are now implemented in hardware. Though imaging applications are inherently error resilient, the complexity of such designs has increased over time, and identifying logic that can be leveraged for energy-quality trade-offs has thus become difficult. This paper proposes a Progressive Configuration Aware (ProCA) criticality analysis framework, which is 10X faster than the state of the art, to identify logic that is functionally critical to output quality. This accounts for the various modes of operation of the design. Through such a framework, we demonstrate how a low-power tunable stochastic design can be derived. The proposed methodology uses layered synthesis and voltage-scaling mechanisms as the primary tools for power reduction. We demonstrate the proposed methodology on a production-quality imaging IP implemented in a 28nm low-leakage technology. For the tunable stochastic imaging IP, we gain up to 10.57% power reduction in exact mode and up to 32.53% power reduction in error-tolerant mode (30dB PSNR), with negligible design overhead.
    No preview · Conference Paper · Jan 2014
  • ABSTRACT: Many techniques are reported in the literature for controlling power and temperature on a chip. This paper presents a technique that uses optimal supply and body-bias voltage assignment to achieve the same. The technique is guided by novel analytical models proposed in this paper that yield accurate estimates of delay and power for multi-Vt designs in the presence of process-strength variations and temperature. These models incorporate baseline components of gate delay, interconnect delay, leakage power, and dynamic power per Vt class, which makes them accurate and scalable across designs. Further, to the best of our knowledge, the model proposed in this paper is the first reported in the literature that considers leakage through the body-bias pin, which results in highly accurate leakage-power estimation. Estimating timing and power with the proposed model on a post-layout ARM® processor block implemented with mixed-Vt cells (SVT and LVT), running at 1.25 GHz in TI® 28 nm technology, showed 3% RMS error compared with the results reported by a detailed timing and power analysis tool. Using these models together with a standard optimizer, it is shown that the optimized supply and body-bias voltage assignment for the ARM® processor block provides up to 39% (30%) run-time power reduction in the presence (absence) of on-chip temperature sensors compared with ASV (a static process-strength-aware adaptive supply voltage) conditions. Employing the proposed scheme on 3D chip-multiprocessor (CMP) designs at worst-case operating conditions (online system testing) shows >35% power reduction and >8 °C peak temperature reduction over and above what is achieved by the thermal and power management techniques reported in the literature.
    No preview · Article · Aug 2013 · Journal of Low Power Electronics
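    The sketch below evaluates a generic, textbook-style per-Vt-class power model (dynamic CV²f plus temperature- and body-bias-dependent leakage) to illustrate the kind of quantity the paper's calibrated analytical models estimate. All coefficients are arbitrary placeholders, not the paper's fitted model.

```python
# Illustrative only: a generic per-Vt-class power model in textbook form,
# dynamic power ~ a*C*V^2*f plus leakage that grows exponentially as the
# effective threshold drops with forward body bias and rises with temperature.
# Every coefficient below is an arbitrary placeholder, not the paper's model.

import math

def class_power(c_eff, activity, i_leak0, vt, vdd, freq, vbb, temp_c,
                k_bb=0.1, s=0.1, k_t=0.02):
    dynamic = activity * c_eff * vdd ** 2 * freq
    vt_eff = vt - k_bb * vbb                       # forward body bias lowers effective Vt
    leakage = i_leak0 * math.exp(-vt_eff / s) * math.exp(k_t * (temp_c - 25.0)) * vdd
    return dynamic + leakage

def total_power(vt_classes, vdd, freq, vbb, temp_c):
    """Sum the per-class baseline over Vt classes (e.g. SVT and LVT cells)."""
    return sum(class_power(vdd=vdd, freq=freq, vbb=vbb, temp_c=temp_c, **cls)
               for cls in vt_classes)

if __name__ == "__main__":
    classes = [dict(c_eff=1e-9, activity=0.2, i_leak0=1e-3, vt=0.45),   # SVT-like
               dict(c_eff=4e-10, activity=0.2, i_leak0=5e-3, vt=0.30)]  # LVT-like
    print(total_power(classes, vdd=0.9, freq=1.25e9, vbb=0.0, temp_c=85.0))
```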
  • ABSTRACT: High power consumption during test may lead to yield loss and premature aging. In particular, excessive peak power consumption during at-speed delay fault testing is an important issue. Several techniques have been proposed in the literature to reduce peak power consumption during at-speed LOC or LOS delay testing. On the other hand, limiting power consumption too much during test may reduce defect coverage. Hence, techniques for identifying upper and lower functional power limits are crucial for delay fault testing. Yet the task of computing the maximum functional peak power achievable by CPU cores is challenging, since the functional patterns with maximum peak power depend on the specific instruction execution order and operands. In this paper, we present a methodology combining neural networks and evolutionary computing for quickly estimating peak power consumption. The method is used within an algorithm for automatic functional program generation to identify test programs with maximal functional peak power consumption, which are suitable for defining peak power limits under test. The proposed approach was applied to the Intel 8051 CPU core synthesized with a 65 nm industrial technology, significantly reducing the required time with respect to previous methods.
    No preview · Article · Aug 2013 · Journal of Low Power Electronics
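    An illustrative skeleton of the evolutionary search with a fast surrogate fitness function, the combination the abstract describes. The instruction set, the mutation operator, and the hand-written surrogate below are toys; the paper trains a neural network on simulation data and targets the Intel 8051 instruction set.

```python
# Illustrative only: an evolutionary search over toy instruction sequences
# whose fitness comes from a fast surrogate estimator standing in for the
# trained neural network. The instruction set, weights, and operators are
# placeholders; the paper targets the Intel 8051 instruction set.

import random

OPS = ["nop", "mov", "add", "mul", "load", "store"]
WEIGHTS = {"nop": 0.1, "mov": 0.3, "add": 0.6, "mul": 1.0, "load": 0.8, "store": 0.8}

def surrogate_peak_power(program):
    """Stub for the neural-network estimator: peak activity over adjacent instruction pairs."""
    return max(WEIGHTS[a] + WEIGHTS[b] for a, b in zip(program, program[1:]))

def mutate(program):
    p = list(program)
    p[random.randrange(len(p))] = random.choice(OPS)
    return p

def evolve(length=20, pop_size=30, generations=50):
    population = [[random.choice(OPS) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=surrogate_peak_power, reverse=True)   # maximize estimated peak power
        survivors = population[:pop_size // 2]
        population = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(population, key=surrogate_peak_power)

if __name__ == "__main__":
    best = evolve()
    print(surrogate_peak_power(best), best)
```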
  • ABSTRACT: Conventional ATPG tools help in detecting only the equivalence class to which a fault belongs, and not the fault itself. This paper presents PinPoint, a technique that further divides the equivalence class into smaller sets based on the capture power consumed by the circuit under test in the presence of the different faults, thus helping to narrow down the fault. Applying the technique to the ITC benchmark circuits yielded a significant improvement in diagnostic resolution.
    No preview · Conference Paper · May 2013
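    A hedged illustration of the diagnostic idea: split a fault-equivalence class by the capture power each candidate fault would induce. The clustering below is a simple tolerance-based grouping with placeholder power values, not PinPoint's actual procedure.

```python
# Illustrative only: group candidate faults of one equivalence class by how
# close their (simulated) capture-power signatures are, so faults in different
# groups can be told apart. Power values and the tolerance are placeholders.

def partition_by_power(candidates, tolerance=0.05):
    """candidates: fault -> capture power. Returns groups of still-indistinguishable faults."""
    groups = []
    for fault, power in sorted(candidates.items(), key=lambda kv: kv[1]):
        if groups and abs(power - groups[-1][-1][1]) <= tolerance:
            groups[-1].append((fault, power))
        else:
            groups.append([(fault, power)])
    return [[fault for fault, _ in group] for group in groups]

if __name__ == "__main__":
    candidates = {"f1": 1.02, "f2": 1.04, "f3": 1.31, "f4": 1.30}
    print(partition_by_power(candidates))   # -> [['f1', 'f2'], ['f4', 'f3']]
```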
  • S. Srinivasan · V. Kamakoti · A. Bhattacharya
    ABSTRACT: DNA microarrays are used extensively for biochemical analyses, including genomics and drug discovery. This increased usage demands large microarrays, complicating their computer-aided design (CAD) and manufacturing methodologies. One such time-consuming design problem is minimizing the border length of the masks used during the manufacture of microarrays; from the manufacturing point of view, the mask border length is one of the crucial parameters determining the reliability of the microarray. This article presents a novel algorithm for the synthesis (placement and embedding) of microarrays that consumes significantly less time than the best algorithm reported in the literature, while maintaining the quality (mask border length) of the result. The proposed technique uses only part of each probe to decide on the placement and the remaining parts to decide on the embedding sequence, in contrast to earlier methods that considered the entire probe for both placement and embedding. The second novelty of the proposed technique is the pre-classification (prior to placement and embedding) of probes based on their prefixes. This reduces the problem of deciding the next probe to be placed from one involving the computation of Hamming distances between all probes (as in earlier approaches) to one involving a search of non-empty cells on a constant-size grid array. The proposed algorithm is 43x faster than the best reported in the literature when synthesizing a microarray with 250,000 probes, and exhibits linear behavior in computation time for larger microarrays.
    No preview · Article · Feb 2013 · ACM Journal on Emerging Technologies in Computing Systems
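    A minimal sketch of prefix-based pre-classification followed by neighbour-aware placement, assuming probes are short DNA strings and an arbitrary prefix length. It illustrates the bucketing idea only, not the paper's placement-and-embedding algorithm.

```python
# Illustrative only: pre-classify probes by a short prefix, then fill the grid
# so each new probe is drawn, when possible, from the bucket of its predecessor
# and neighbours share prefixes. Prefix length and probes are placeholders.

from collections import defaultdict

def place_probes(probes, rows, cols, prefix_len=3):
    buckets = defaultdict(list)
    for probe in probes:
        buckets[probe[:prefix_len]].append(probe)

    grid, prev_prefix = [], None
    for _ in range(rows):
        row = []
        for _ in range(cols):
            if prev_prefix in buckets and buckets[prev_prefix]:
                key = prev_prefix                     # stay in the predecessor's bucket
            else:
                key = next(k for k, v in buckets.items() if v)
            probe = buckets[key].pop()
            row.append(probe)
            prev_prefix = probe[:prefix_len]
        grid.append(row)
    return grid

if __name__ == "__main__":
    probes = ["ACGTACG", "ACGTTTG", "ACGAACC", "TTGCACG", "TTGCCCA",
              "TTGAAAC", "GGGTACG", "GGGTTTA", "GGGACGA"]
    for row in place_probes(probes, rows=3, cols=3):
        print(row)
```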
  • SVS Suresh · Krishna R Kumar · V Kamakoti

    No preview · Conference Paper · Jan 2013
  • ABSTRACT: With increasing computing power in mobile devices, conserving battery power (or extending battery life) has become crucial. This, together with the fact that most applications running on these mobile devices are increasingly error tolerant, has created immense interest in stochastic (or inexact) computing. In this paper, we present a framework wherein devices can operate at varying error-tolerant modes while significantly reducing the power dissipated. Further, in very deep sub-micron technologies, temperature plays a crucial role in both performance and power. The proposed framework presents a novel layered synthesis optimization coupled with temperature-aware supply and body-bias voltage scaling to operate the design at various “tunable” error-tolerant modes. We implement the proposed technique on an H.264 decoder block in an industrial 28nm low-leakage technology node, and demonstrate reductions in total power varying from 30% to 45% when changing the operating mode from exact computing to inaccurate/error-tolerant computing.
    No preview · Conference Paper · Jan 2013
  • Source
    ABSTRACT: System test and online test techniques are being used aggressively in today's SoCs for improved test quality and reliability (e.g., aging/soft-error robustness). With the growing popularity of vertical integration, such as 2.5D and 3D, in the semiconductor industry, ensuring the thermal safety of SoCs during these test modes poses a challenge. In this paper, we propose a dynamic test scheduling mechanism for system tests and/or online tests that uses dynamic feedback from on-chip thermal sensors to control temperature during shift (or scan) and capture, thereby ensuring thermally safe conditions while applying the test patterns. The proposed technique is a closed-loop test application scheme that eliminates the need for separate thermal simulation of test patterns at the design stage. The technique also enables granular field-level configuration of thermal limits, so that different units across multiple cores are subjected to customized thermal profiles. Results from the implementation of the proposed schemes on a 4-layer, 16-core, 12.8-million-gate OpenSPARC S1 processor subsystem are presented.
    Full-text · Article · Dec 2012 · Journal of Low Power Electronics
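    The sketch below shows the closed-loop shape of such a scheduler: before each burst of shift/capture activity, read the unit's thermal sensor and stall until it is below that unit's configured limit. Sensor reads and pattern application are stubbed; the per-unit limits illustrate the field-configurable thermal profiles mentioned above.

```python
# Illustrative only: a closed-loop test scheduler that reads each unit's
# thermal sensor before every shift/capture burst and stalls that unit while
# it is above its own configured limit. Sensor reads and pattern application
# are stubs; real values would come from on-chip sensors and the test setup.

import random
import time

def read_sensor(unit):
    """Stub for an on-chip thermal sensor read (deg C)."""
    return 60.0 + random.uniform(-5.0, 25.0)

def apply_burst(unit, pattern_block):
    """Stub for shifting in and capturing one block of test patterns."""
    pass

def thermal_safe_test(units, pattern_blocks, limits, cool_wait_s=0.01):
    """limits: per-unit temperature ceilings, allowing customized thermal profiles."""
    for block in pattern_blocks:
        for unit in units:
            while read_sensor(unit) > limits[unit]:   # dynamic feedback: stall while hot
                time.sleep(cool_wait_s)
            apply_burst(unit, block)

if __name__ == "__main__":
    thermal_safe_test(units=["core0", "core1"], pattern_blocks=range(3),
                      limits={"core0": 80.0, "core1": 75.0})
```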
  • ABSTRACT: In digital ICs, the energy consumed during scan test cycles is known to be higher than that consumed during functional cycles. Scan-cell reordering (SCR) is a popular technique for reducing test energy consumption. Conventional SCR techniques use the number of toggles in the scan flip-flops as the cost criterion for reordering. The energy consumed during the scan test cycles includes that consumed by the logic and that consumed by the scan chain, and interconnects contribute more than 50% of the scan-chain energy consumption. Motivated by this, the paper proposes an SCR technique that uses the wire capacitances, in addition to the toggle criterion, to perform the reordering. Results obtained by employing the technique on ISCAS89 benchmarks and OpenCores designs show a reduction in total scan-shift energy of up to 32% and an 11× reduction in total scan-chain wire length. Interestingly, applying SCR without considering the interconnect capacitances may in some cases increase scan-chain energy consumption. Additionally, we observe that a significant portion of the total scan-shift power comes from the first-level capacitance, contributed by both the interconnects and the input capacitances of gates at the first level of logic. Using this, we show that first-level capacitance gating, which gates the switching capacitance of the flop-logic interconnect and the first-level gates during scan shift, saves significantly more power than first-level supply gating. Combining the above two methods and applying them to the ISCAS89 and OpenCores benchmark circuits yields 62% total scan-shift energy savings, with an average delay penalty of 3% on the functional performance of the circuit, compared to the best known algorithm.
    No preview · Article · Aug 2012 · Journal of Low Power Electronics
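    An illustrative cost function and chain-building pass for capacitance-aware scan-cell reordering: the energy proxy for a link weights its toggle count by its wire capacitance, and a nearest-neighbour pass builds the chain. The toggle and capacitance figures are placeholders; the paper's actual SCR algorithm is not reproduced here.

```python
# Illustrative only: weight the toggle count of each candidate scan-chain link
# by its wire capacitance and build the chain with a nearest-neighbour pass.
# Toggle counts and capacitances are placeholders for pattern and layout data.

def link_cost(toggles, capacitance):
    """Energy proxy for one scan-chain link: switching activity times wire capacitance."""
    return toggles * capacitance

def order_chain(cells, toggles_between, cap_between):
    """cells: list of names; the two dicts are keyed by ordered (from, to) pairs."""
    remaining = list(cells)
    chain = [remaining.pop(0)]
    while remaining:
        last = chain[-1]
        nxt = min(remaining,
                  key=lambda c: link_cost(toggles_between[(last, c)], cap_between[(last, c)]))
        remaining.remove(nxt)
        chain.append(nxt)
    return chain

if __name__ == "__main__":
    cells = ["ff_a", "ff_b", "ff_c"]
    toggles = {("ff_a", "ff_b"): 10, ("ff_b", "ff_a"): 10,
               ("ff_a", "ff_c"): 3, ("ff_c", "ff_a"): 3,
               ("ff_b", "ff_c"): 7, ("ff_c", "ff_b"): 7}
    caps = {pair: 1.0 for pair in toggles}          # uniform capacitance for the example
    print(order_chain(cells, toggles, caps))        # -> ['ff_a', 'ff_c', 'ff_b']
```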

  • No preview · Conference Paper · May 2012
  • Source
    S V S Suresh · R Krishna Kumar · V Kamakoti
    ABSTRACT: In this paper, we present a novel design of a portable, low-cost, 3-lead wireless wearable ECG device. The device acquires the raw ECG signal using three electrodes placed on the subject's chest. An analog circuit board conditions this raw signal, which is then sent to the microcontroller for further filtering of artifacts. A novel method is used for phase compensation. A Bluetooth module receives the filtered data from the microcontroller and transmits it to the user's phone, where the ECG is displayed and simultaneously stored in text and JPEG formats. The recorded JPEG image can be transmitted to the doctor's mobile phone via MMS, and the latter can give instant feedback. The analog front-end (AFE) module is designed using low-cost, reliable components. TI's MSP430F47186 microcontroller runs the digital filters, which are written in C. The Bluetooth module "WT12" is from Bluegiga, and the system requires 3 volts for operation. The Android application was written in Java for acquiring, plotting, and storing the data on the phone's SD card in text and JPEG formats. A Samsung Galaxy Fit Android phone was used for the prototype. Throughout the system design, cost was kept to a minimum.
    Full-text · Conference Paper · May 2012
  • Source
    ABSTRACT: Buffers in on-chip networks constitute a significant proportion of the power consumption and area of the interconnect, so reducing them is an important problem. Application-specific designs have non-uniform network utilization, requiring a buffer-sizing approach that addresses this non-uniformity. Congestion effects that occur during network operation also need to be captured when sizing the buffers. Many NoCs are designed to operate in multiple voltage/frequency islands, with inter-island communication taking place through frequency converters. To this end, we propose a two-phase algorithm to size the switch buffers in networks-on-chip (NoCs) with support for multiple frequency islands. Our algorithm considers both static and dynamic effects when sizing buffers. We analyze the impact of placing frequency converters (FCs) on a link, as well as pack-and-send units that effectively utilize network bandwidth. Experiments on several realistic system-on-chip (SoC) benchmarks show that our algorithm yields a 42% reduction in the amount of buffering compared to a standard buffering approach.
    Full-text · Article · Feb 2012 · Journal of Electrical and Computer Engineering
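    A two-pass sketch in the spirit of the static-plus-dynamic sizing described above (not the paper's algorithm): a static pass sizes each buffer from average link utilization, and a refinement pass deepens any buffer whose simulated peak occupancy exceeded the static estimate. All numbers are placeholders.

```python
# Illustrative only: phase 1 sizes each switch buffer from its average link
# utilization; phase 2 deepens any buffer whose simulated peak occupancy
# exceeded the static estimate. Inputs are placeholders for traffic analysis
# and simulation results; this is not the paper's two-phase algorithm.

def static_sizes(link_utilization, base_depth=2, scale=6):
    """Phase 1: deeper buffers on more heavily utilized links."""
    return {link: base_depth + round(scale * util)
            for link, util in link_utilization.items()}

def refine_with_congestion(sizes, peak_occupancy):
    """Phase 2: grow any buffer whose observed peak occupancy exceeded its static size."""
    return {link: max(depth, peak_occupancy.get(link, 0))
            for link, depth in sizes.items()}

if __name__ == "__main__":
    utilization = {"r0->r1": 0.8, "r1->r2": 0.3, "r2->r3": 0.1}
    sizes = static_sizes(utilization)
    print(refine_with_congestion(sizes, peak_occupancy={"r1->r2": 5}))
```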

Publication Stats

406 Citations
24.88 Total Impact Points

Institutions

  • 1992-2014
    • Indian Institute of Technology Madras
      • Department of Computer Science and Engineering
      Chennai, Tamil Nadu, India
  • 2011
    • Chennai Institute Of Technology
      Chennai, Tamil Nadu, India
  • 2003
    • Indian Institute of Technology Ropar
      • Department of Computer Science and Engineering
      Rūpar, Punjab, India
  • 1997-1999
    • Chennai Mathematical Institute
      Chennai, Tamil Nādu, India