V. Kamakoti

Indian Institute of Technology Madras, Chennai, Tamil Nadu, India

Publications (81)

  • P. V. Torvi, V. R. Devanathan, V. Kamakoti
    2015 28th International Conference on VLSI Design (VLSID), 381-386; 01/2015
  • K. Shyamala, V. Kamakoti
    ABSTRACT: Logic minimization is an important aspect of the digital Computer Aided Design flows of both Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Peephole optimization is one of the effective logic minimization techniques employed in the design of digital circuits. This paper presents a novel automated peephole-optimization-based approach to logic minimization that interlaces commercially available ASIC-based and FPGA-based synthesis tools in an alternating fashion. Experiments with the technique on standard benchmark circuits yielded a logic reduction of up to 59.71% and 73.40%, a delay reduction of up to 36.64% and 57.78%, and a power reduction of up to 54.12% and 57.14% when compared with the output generated by current commercial state-of-the-art ASIC and FPGA synthesis tools, respectively. Importantly, this technique can be adopted by design houses at no extra cost. Using the addition operation as a case study, the paper also demonstrates how to use the proposed methodology to automatically design arithmetic circuits that meet different area and performance budgets. (An illustrative sketch follows this entry.)
    Journal of Low Power Electronics 03/2014; 10(1). DOI:10.1166/jolpe.2014.1291
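
The control loop implied by the interlacing flow above can be sketched as follows. The `asic_synth`, `fpga_synth` and `cost` callables are hypothetical stand-ins for commercial tool invocations and a gate/literal-count metric, not the authors' actual scripts.

```python
def alternating_minimization(netlist, asic_synth, fpga_synth, cost,
                             max_rounds=10):
    """Alternate ASIC- and FPGA-oriented synthesis, keeping only improving
    results, until a full round brings no further logic reduction.
    Illustrative sketch only; the tool wrappers must be supplied."""
    best, best_cost = netlist, cost(netlist)
    for _ in range(max_rounds):
        improved = False
        for synthesize in (asic_synth, fpga_synth):
            candidate = synthesize(best)
            if cost(candidate) < best_cost:
                best, best_cost = candidate, cost(candidate)
                improved = True
        if not improved:          # fixed point reached, stop early
            break
    return best
```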
  • ABSTRACT: Excessive power dissipation can cause a high voltage droop on the power grid, leading to timing failures. Since test power dissipation is typically higher than functional power, minimizing peak test power is very important in order to avoid test-induced timing failures. Test cubes for large designs are usually dominated by don't-care bits, making X-leveraging algorithms promising for test power reduction. In this paper, we show that X-bit statistics can be used to reorder test vectors on scan-based architectures realized using toggle-masking flip-flops. Based on this, the paper also presents an algorithm, balanced X-filling, that, when applied to the ITC'99 circuits, reduced the peak capture power by 7.4% on average and by 40.3% in the best case. Additionally, XStat improved the running time of the test-vector-ordering and X-filling phases compared with the best known techniques. (An illustrative sketch follows this entry.)
    Journal of Low Power Electronics 03/2014; 10(1). DOI:10.1166/jolpe.2014.1302
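
As context for the X-leveraging idea, the sketch below shows the classic adjacent-fill baseline: each don't-care bit takes the value that minimizes transitions against its already-specified neighbors, which lowers toggling during scan shift. This illustrates X-filling generally; it is not the paper's balanced X-filling algorithm.

```python
def adjacent_x_fill(cube):
    """Fill each 'X' in a scan test cube with the value that causes fewer
    transitions against its specified neighbors (adjacent-fill baseline;
    illustrative only, not the paper's balanced X-filling)."""
    bits = list(cube)
    for i, b in enumerate(bits):
        if b != 'X':
            continue
        def local_transitions(v):
            count = 0
            for j in (i - 1, i + 1):        # check both neighbors if present
                if 0 <= j < len(bits) and bits[j] in '01' and bits[j] != v:
                    count += 1
            return count
        bits[i] = '0' if local_transitions('0') <= local_transitions('1') else '1'
    return ''.join(bits)

print(adjacent_x_fill("1XX0X1"))   # -> "110001"
```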
  • ABSTRACT: With the increasing integration of capabilities into mobile application processors, a host of imaging operations that were earlier performed in software are now implemented in hardware. Though imaging applications are inherently error resilient, the complexity of such designs has increased over time, and identifying logic that can be leveraged for energy-quality trade-offs has become difficult. The paper proposes a Progressive Configuration Aware (ProCA) criticality analysis framework, 10X faster than the state of the art, to identify logic that is functionally critical to output quality; the analysis accounts for the various modes of operation of the design. Through such a framework, we demonstrate how a low-power tunable stochastic design can be derived. The proposed methodology uses layered synthesis and voltage-scaling mechanisms as the primary tools for power reduction. We demonstrate the methodology on a production-quality imaging IP implemented in a 28nm low-leakage technology. For the tunable stochastic imaging IP, we gain up to 10.57% power reduction in exact mode and up to 32.53% power reduction in error-tolerant mode (30dB PSNR), with negligible design overhead.
    Proceedings of the 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems; 01/2014
  • ABSTRACT: Many techniques have been reported in the literature for controlling power and temperature on a chip. This paper presents a technique that uses optimal supply and body-bias voltage assignment to achieve the same. The technique is guided by novel analytical models, proposed in this paper, that yield accurate estimates of delay and power for multi-Vt designs in the presence of process-strength variations and temperature. These models incorporate baseline components of gate delay, interconnect delay, leakage power and dynamic power per Vt class, which makes them accurate and scalable across designs. Further, to the best of our knowledge, the model proposed in this paper is the first reported in the literature that considers leakage through the body-bias pin, which results in high accuracy in leakage power estimation. Estimating timing and power with the proposed model on a post-layout ARM® processor block implemented using mixed-Vt cells (SVT and LVT), running at 1.25 GHz in TI® 28 nm technology, showed 3% RMS error when compared with a detailed timing and power analysis tool. With these models and a standard optimizer, the optimized supply and body-bias voltage assignment for the ARM® processor block provides a significant run-time power reduction of up to 39% (30%) in the presence (absence) of on-chip temperature sensors when compared with ASV (a static process-strength-aware adaptive supply voltage) conditions. Employing the proposed scheme on 3D Chip Multi-Processor (CMP) designs at worst-case operating conditions (on-line system testing) shows >35% power reduction and >8 °C peak temperature reduction over and above what is achieved by thermal and power management techniques reported in the literature. (An illustrative sketch of the optimization step follows this entry.)
    Journal of Low Power Electronics 08/2013; 9(2):207-228. DOI:10.1166/jolpe.2013.1254
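
To make the optimization step concrete, here is a toy sketch: generic first-order delay and power expressions in Vdd and Vbb (arbitrary illustrative constants, not the paper's calibrated per-Vt-class models) are handed to a standard optimizer that minimizes power subject to a delay constraint.

```python
import numpy as np
from scipy.optimize import minimize

def delay(x, vth0=0.35, k=0.15, alpha=1.3):
    vdd, vbb = x
    vth = vth0 - k * vbb            # forward body bias lowers the threshold
    return vdd / (vdd - vth) ** alpha

def power(x, ceff=1.0, f=1.25, i0=0.05, gamma=2.0):
    vdd, vbb = x
    dynamic = ceff * vdd ** 2 * f              # switching power
    leakage = i0 * vdd * np.exp(gamma * vbb)   # leakage rises with forward bias
    return dynamic + leakage

target_delay = delay(np.array([1.0, 0.0]))     # delay at the nominal corner

res = minimize(power, x0=np.array([1.0, 0.0]),
               bounds=[(0.7, 1.1), (-0.3, 0.3)],          # Vdd, Vbb ranges
               constraints=[{"type": "ineq",
                             "fun": lambda x: target_delay - delay(x)}])
print("Vdd=%.3f V, Vbb=%.3f V, power=%.3f" % (*res.x, res.fun))
```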
  • ABSTRACT: High power consumption during test may lead to yield loss and premature aging. In particular, excessive peak power consumption during at-speed delay fault testing is an important issue. In the literature, several techniques have been proposed to reduce peak power consumption during at-speed LOC or LOS delay testing. On the other hand, limiting the power consumption during test too much may reduce the defect coverage. Hence, techniques for identifying upper and lower functional power limits are crucial for delay fault testing. Yet, computing the maximum functional peak power achievable by CPU cores is challenging, since the functional patterns with maximum peak power depend on the specific instruction execution order and operands. In this paper, we present a methodology combining neural networks and evolutionary computing for quickly estimating peak power consumption. The method is used within an algorithm for automatic functional program generation that identifies test programs with maximal functional peak power consumption, which are suitable for defining peak power limits under test. The proposed approach was applied to the Intel 8051 CPU core synthesized with a 65 nm industrial technology, significantly reducing run time with respect to previous methods. (An illustrative sketch follows this entry.)
    Journal of Low Power Electronics 08/2013; 9(2):264-274. DOI:10.1166/jolpe.2013.1255
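
A compact sketch of the search side of such a flow: an evolutionary loop over instruction sequences scored by a surrogate power estimator. The `surrogate_peak_power` function below is a toy stand-in for the trained neural network, and the four-opcode ISA is invented for illustration.

```python
import random

def surrogate_peak_power(program):
    """Toy stand-in for a trained neural-network power estimator."""
    weights = {"MUL": 5.0, "ADD": 2.0, "MOV": 1.0, "NOP": 0.1}
    switching = sum(1 for a, b in zip(program, program[1:]) if a != b)
    return sum(weights[op] for op in program) + 0.5 * switching

ISA = ["MUL", "ADD", "MOV", "NOP"]

def evolve(pop_size=30, length=20, generations=50, mutation=0.1):
    pop = [[random.choice(ISA) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=surrogate_peak_power, reverse=True)   # maximize power
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)
            child = a[:cut] + b[cut:]                      # one-point crossover
            child = [random.choice(ISA) if random.random() < mutation else op
                     for op in child]                      # per-gene mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=surrogate_peak_power)

best = evolve()
print(" ".join(best), "->", surrogate_peak_power(best))
```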
  • ABSTRACT: Conventional ATPG tools help in detecting only the equivalence class to which a fault belongs, not the fault itself. This paper presents PinPoint, a technique that further divides the equivalence class into smaller sets based on the capture power consumed by the circuit under test in the presence of the different faults, thus aiding in narrowing down on the fault. Applying the technique to the ITC benchmark circuits yielded a significant improvement in diagnostic resolution. (An illustrative sketch follows this entry.)
    2013 18th IEEE European Test Symposium (ETS); 01/2013
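
The refinement idea can be illustrated in a few lines: faults in one ATPG equivalence class are split by a quantized capture-power signature. The fault names, power values and tolerance below are hypothetical.

```python
from collections import defaultdict

def refine_equivalence_class(faults, capture_power, tolerance=0.05):
    """Split an ATPG equivalence class into finer sets by capture power.

    `capture_power(fault)` stands in for power estimated via fault
    simulation of the circuit under test; faults with differing quantized
    power signatures land in different subsets. Illustrative only."""
    buckets = defaultdict(list)
    for f in faults:
        key = round(capture_power(f) / tolerance)  # quantize the signature
        buckets[key].append(f)
    return list(buckets.values())

# Hypothetical usage with made-up fault labels and power values:
powers = {"f1": 0.81, "f2": 0.82, "f3": 1.10}
print(refine_equivalence_class(["f1", "f2", "f3"], powers.get))
# -> [['f1', 'f2'], ['f3']]: f3 is now distinguishable from f1/f2
```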
  • ABSTRACT: With increasing computing power in mobile devices, conserving battery power (or extending battery life) has become crucial. This, together with the fact that most applications running on these mobile devices are increasingly error tolerant, has created immense interest in stochastic (or inexact) computing. In this paper, we present a framework wherein devices can operate at varying error-tolerant modes while significantly reducing the power dissipated. Further, in very deep sub-micron technologies, temperature plays a crucial role in both performance and power. The proposed framework couples a novel layered synthesis optimization with temperature-aware supply and body-bias voltage scaling to operate the design at various "tunable" error-tolerant modes. We implement the proposed technique on an H.264 decoder block in an industrial 28nm low-leakage technology node, and demonstrate reductions in total power varying from 30% to 45% while changing the operating mode from exact computing to inaccurate/error-tolerant computing.
    2013 5th Asia Symposium on Quality Electronic Design (ASQED); 01/2013
  • Journal of Low Power Electronics 12/2012; 8(5):684-695. DOI:10.1166/jolpe.2012.1226
  • ABSTRACT: In digital ICs, the energy consumed in scan test cycles is known to be higher than that consumed during functional cycles. Scan-cell reordering (SCR) is a popular technique to reduce test energy consumption. Conventional SCR techniques use the number of toggles in the scan flip-flops as the cost criterion for reordering. The energy consumed during scan test cycles includes that consumed by the logic and that consumed by the scan chain, and interconnects contribute more than 50% of the scan-chain energy consumption. Motivated by this, the paper proposes an SCR technique that uses the wire capacitances, in addition to the toggle criterion, to perform the reordering. Results obtained by employing the technique on the ISCAS89 and OpenCores benchmarks show a reduction in total scan-shift energy of up to 32% and an 11× reduction in total scan-chain wire length. It is interesting to note that applying SCR without considering the interconnect capacitances may, in some cases, increase scan-chain energy consumption. Additionally, we observe that a significant portion of the total scan-shift power comes from the first-level capacitance, contributed by both the interconnects and the input capacitances of gates at the first level of logic. Using this, we show that first-level capacitance gating, which gates the switching capacitances of the flop-logic interconnect and first-level gates during scan-shift, saves significantly more power than first-level supply gating. Combining the above two methods on the ISCAS89 and OpenCores benchmark circuits yields 62% total scan-shift energy savings with an average delay penalty of 3% on the functional performance of the circuit, compared to the best known algorithm. (An illustrative sketch of the reordering step follows this entry.)
    Journal of Low Power Electronics 08/2012; 8(4):516-525. DOI:10.1166/jolpe.2012.1212
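
A greedy sketch of capacitance-aware reordering: the next scan cell is chosen to minimize the driving cell's toggle count times the Manhattan wire length, a proxy for interconnect capacitance. The cell names, toggle counts and placements are invented, and the paper's actual algorithm may differ.

```python
def scan_cell_reorder(cells, toggles, positions):
    """Greedy scan-cell reordering weighted by toggles and wire length.

    cost(a, b) approximates the energy of the a->b scan link as the toggle
    activity of the driving cell times the Manhattan distance between the
    cells. Illustrative only."""
    def wirelen(a, b):
        (xa, ya), (xb, yb) = positions[a], positions[b]
        return abs(xa - xb) + abs(ya - yb)

    order = [cells[0]]
    remaining = set(cells[1:])
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda c: toggles[last] * wirelen(last, c))
        order.append(nxt)
        remaining.remove(nxt)
    return order

cells = ["ff0", "ff1", "ff2", "ff3"]
toggles = {"ff0": 10, "ff1": 3, "ff2": 7, "ff3": 5}
positions = {"ff0": (0, 0), "ff1": (1, 0), "ff2": (0, 2), "ff3": (3, 1)}
print(scan_cell_reorder(cells, toggles, positions))
```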
  • ABSTRACT: Buffers in on-chip networks constitute a significant proportion of the power consumption and area of the interconnect, so reducing them is an important problem. Application-specific designs have nonuniform network utilization, thereby requiring a buffer-sizing approach that tackles the nonuniformity. Also, congestion effects that occur during network operation need to be captured when sizing the buffers. Many NoCs are designed to operate in multiple voltage/frequency islands, with inter-island communication taking place through frequency converters. To this end, we propose a two-phase algorithm to size the switch buffers in networks-on-chip (NoCs) with support for multiple frequency islands. Our algorithm considers both static and dynamic effects when sizing buffers. We analyze the impact of placing frequency converters (FCs) on a link, as well as pack-and-send units that effectively utilize network bandwidth. Experiments on several realistic system-on-chip (SoC) benchmarks show that our algorithm yields a 42% reduction in the amount of buffering when compared to a standard buffering approach. (An illustrative sketch follows this entry.)
    01/2012; 2012. DOI:10.1155/2012/537286
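
A minimal sketch of a two-phase sizing loop of the kind described above: static seeding followed by simulation-guided deepening of congested switches. The toy `simulate` model below is an assumption for illustration, not the paper's simulator.

```python
def size_buffers(switches, static_depth, simulate, max_iters=5, step=1):
    """Two-phase buffer sizing sketch (illustrative only).

    Phase 1 seeds each switch buffer from static bandwidth/latency
    analysis; phase 2 simulates the network and deepens buffers only
    where congestion persists."""
    depths = {s: static_depth(s) for s in switches}      # phase 1: static
    for _ in range(max_iters):                           # phase 2: dynamic
        congested = simulate(depths)  # -> switches violating constraints
        if not congested:
            break
        for s in congested:
            depths[s] += step
    return depths

# Toy model: a switch stops being congested once its buffer depth
# reaches its offered load (hypothetical values).
load = {"s0": 2, "s1": 5, "s2": 3}
print(size_buffers(load, static_depth=lambda s: 1,
                   simulate=lambda d: {s for s in d if d[s] < load[s]}))
# -> {'s0': 2, 's1': 5, 's2': 3}
```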
  • ABSTRACT: We consider the problem of reducing active-mode leakage power by modifying the post-synthesis netlists of combinational logic blocks. The stacking effect is used to reduce leakage power, but instead of a separate sleep signal, one of the inputs to the gate itself is used. The approach is studied on multiplier blocks. It is found that a significant number of nets have high probabilities of being constant at 0 or 1. In specific applications with a high peak-to-average ratio, such as audio and other signal-processing applications, this effect is even more pronounced. We show how these signals can be used to put gates to sleep, thus saving significant leakage power. (An illustrative sketch follows this entry.)
    IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2011, 4-6 July 2011, Chennai, India; 01/2011
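
The net-profiling step can be sketched as follows: random-vector simulation estimates how often each internal net sits at a constant value, and nets near probability 0 or 1 become stacking candidates. The two-gate circuit below is a toy example, not from the paper.

```python
import random

def constant_probability(evaluate_nets, n_inputs, trials=10000):
    """Estimate how often each internal net evaluates to 1.

    `evaluate_nets(vector)` returns {net: 0/1} for one input vector; nets
    with probability near 0 or 1 are candidates for input-driven
    stacking. Illustrative framework only."""
    counts = {}
    for _ in range(trials):
        vec = [random.randint(0, 1) for _ in range(n_inputs)]
        for net, val in evaluate_nets(vec).items():
            counts[net] = counts.get(net, 0) + val
    return {net: c / trials for net, c in counts.items()}

# Toy 2-input circuit: n1 = a AND b sits at 0 about 75% of the time
# under uniform inputs, making it a plausible stacking candidate.
probs = constant_probability(
    lambda v: {"n1": v[0] & v[1], "n2": v[0] | v[1]}, n_inputs=2)
print(probs)   # approx {'n1': 0.25, 'n2': 0.75}
```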
  • Kavish Seth, V. Kamakoti, S. Srinivasan
    ABSTRACT: Error concealment in video communication is becoming increasingly important because of the growing interest in video delivery over unreliable channels such as wireless networks and the Internet. A subclass of error concealment in video communication is known as motion vector recovery (MVR). MVR techniques try to retrieve the lost motion information in compressed video streams from the available information in the locality (both spatial and temporal) of the lost data. This paper surveys the activity in the area of MVR-based error concealment over the last two decades and presents a performance comparison of the prominent MVR techniques.
    IETE Technical Review 01/2011; DOI:10.4103/0256-4602.74509
  • ABSTRACT: Workload-optimized systems consisting of a large number of general- and special-purpose cores, with support for shared-memory programming, are slowly becoming prevalent. One of the major impediments to effective parallel programming on these systems is lock-based synchronization. An alternative synchronization solution called Transactional Memory (TM) is currently being explored. We observe that most of the TM design proposals in the literature are catered to the constraints of general-purpose computing platforms. Given that workload-optimized systems utilize wider hardware design spaces and on-chip parallelism, we argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for these systems. We re-evaluate the criteria to be satisfied by an HTM and identify possible scope for relaxations in the context of workload-optimized systems. Based on the relaxed criteria, we demonstrate the scope for building HTM design variants, such that each variant caters to a specific workload requirement. We carry out suitable experiments to bring out the trade-offs between the design variants. Overall, we show how knowledge about the workload is extremely useful in making appropriate design choices in a workload-optimized HTM.
    Advanced Parallel Processing Technologies - 9th International Symposium, APPT 2011, Shanghai, China, September 26-27, 2011. Proceedings; 01/2011
  • ABSTRACT: This work presents a hardware implementation of an FIR filter that is self-adaptive, responds to arbitrary frequency-response landscapes, has built-in coefficient error-tolerance capabilities, and has minimal adaptation latency. The hardware design is based on a heuristic genetic algorithm. Experimental results show that the proposed design is more efficient than non-evolutionary designs, even for arbitrary-response filters. As a byproduct, the paper also presents a novel flow for the complete hardware design of what is termed an Evolutionary System on Chip (ESoC). With the inclusion of an evolutionary process, the ESoC is a new paradigm in modern System-on-Chip (SoC) design. The ESoC methodology could be a very useful structured FPGA/ASIC implementation alternative in many practical applications of FIR filters. (An illustrative sketch follows this entry.)
    Applied Soft Computing 01/2011; 11(1):842-854. DOI:10.1016/j.asoc.2010.01.004
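
A software analogue of the evolutionary FIR design loop: coefficient vectors are evolved toward a target magnitude response, with fitness taken as the negative mean-squared response error. This sketches the genetic-algorithm idea only, not the paper's hardware implementation; the population sizes and mutation scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def magnitude_response(h, n_points=128):
    """Magnitude of the filter's frequency response via the DFT."""
    return np.abs(np.fft.rfft(h, 2 * n_points))[:n_points]

def fitness(h, target):
    return -np.mean((magnitude_response(h, len(target)) - target) ** 2)

def evolve_fir(target, taps=16, pop_size=40, generations=300, sigma=0.05):
    pop = rng.normal(0, 0.3, (pop_size, taps))
    for _ in range(generations):
        scores = np.array([fitness(h, target) for h in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep best half
        children = parents + rng.normal(0, sigma, parents.shape)  # mutate
        pop = np.vstack([parents, children])
    return max(pop, key=lambda h: fitness(h, target))

# Target: ideal low-pass magnitude over 128 frequency bins.
target = np.where(np.arange(128) < 32, 1.0, 0.0)
best = evolve_fir(target)
print("final MSE:", -fitness(best, target))
```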
  • Kavish Seth, V. Kamakoti, S. Srinivasan
    ABSTRACT: H.264-encoded video is highly sensitive to the loss of motion vectors during transmission. Several statistical techniques have been proposed for recovering such lost motion vectors, but they use only the motion vectors belonging to the macroblocks horizontally or vertically adjacent to the lost macroblock. Intuitively, this is one of the main reasons why these techniques yield inferior results in scenarios with non-linear motion. This paper proposes B-spline-based statistical techniques that comprehensively address the motion vector recovery problem in the presence of different types of motion, including slow, fast/sudden, continuous and non-linear movements. Testing the proposed algorithms on different benchmark video sequences shows an average improvement of up to 2 dB in the Peak Signal-to-Noise Ratio of some of the recovered videos over existing techniques; a 2 dB improvement in PSNR is very significant from an application point of view. (An illustrative sketch follows this entry.)
    IEEE Transactions on Broadcasting 01/2011; 56(4):467-480. DOI:10.1109/TBC.2010.2058030
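
The interpolation idea in miniature: a spline through the motion vectors of surrounding macroblocks estimates the lost block's vector, capturing curvature that linear horizontal/vertical averaging misses. The sketch uses scipy's `CubicSpline` as a stand-in for the paper's B-spline formulation; the positions and motion vectors are invented.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def recover_motion_vector(neighbor_pos, neighbor_mvs, lost_pos):
    """Spline-interpolate a lost macroblock's motion vector.

    neighbor_pos: strictly increasing 1-D positions of available
    neighboring macroblocks along one direction; neighbor_mvs: their
    (dx, dy) motion vectors. Illustrative stand-in for the paper's
    B-spline formulation."""
    mvs = np.asarray(neighbor_mvs, dtype=float)
    sx = CubicSpline(neighbor_pos, mvs[:, 0])   # spline per MV component
    sy = CubicSpline(neighbor_pos, mvs[:, 1])
    return float(sx(lost_pos)), float(sy(lost_pos))

# MVs of macroblocks at positions 0, 1, 3, 4 around a lost block at 2:
print(recover_motion_vector([0, 1, 3, 4],
                            [(1, 0), (2, 1), (4, 5), (5, 8)], 2))
```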
  • [Show abstract] [Hide abstract]
    ABSTRACT: Buffers in on-chip networks constitute a significant proportion of the power consumption and area of the interconnect. Hence, reducing the buffering overhead of Networks on Chips (NoCs) is an important problem. For application-specific designs, the network utilization across the different links and switches is non-uniform, thereby requiring a buffer sizing approach that tackles the non uniformity. Moreover, congestion effects that occur during network operation needs to be captured when sizing the buffers. To this end, we propose a two-phase algorithm to size the switch buffers in NoCs. Our algorithm considers both the static (based on bandwidth and latency requirements) and dynamic (based on simulation) effects when sizing buffers. Our experiments show that the application of the algorithm results in 42% reduction in amount of buffering required to meet the application constraints when compared to a standard buffering approach.
    IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2011, 4-6 July 2011, Chennai, India; 01/2011
  • K. Seth, V. Kamakoti, S. Srinivasan
    ABSTRACT: This study proposes a novel motion vector recovery (MVR) algorithm for the H.264 video coding standard that takes into account the change in the motion vectors (MVs) in different directions. Existing MVR algorithms are confined to using the horizontal or vertical directions to recover the lost MVs; in the presence of non-linear movement or fast/sudden motion of an object in a scene, the MVs recovered by these algorithms turn out to be inaccurate. The proposed directional-interpolation-based technique can interpolate the MVs in any direction, based on the tendency of motion around the lost macroblock, making it suitable for handling non-linear or fast motion. Testing the proposed technique on different benchmark video sequences shows an average improvement of 1-2 dB in the peak signal-to-noise ratio of the recovered video over existing techniques. (An illustrative sketch follows this entry.)
    IET Image Processing 05/2010; DOI:10.1049/iet-ipr.2008.0228
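
One plausible reading of direction-aware interpolation, sketched below: a dominant motion direction is estimated from the neighboring MVs, and the two neighbors best aligned with that axis (on opposite sides of the lost macroblock) are interpolated. This is an assumption-laden illustration, not the paper's exact method; the angles and MV values are invented.

```python
import numpy as np

def directional_mv_recovery(neighbors):
    """Recover a lost MV along the dominant motion direction.

    `neighbors` maps the angle (radians) at which a neighbor sits
    relative to the lost macroblock to that neighbor's (dx, dy) MV.
    Illustrative only."""
    angles = np.array(list(neighbors.keys()))
    mvs = np.array(list(neighbors.values()), dtype=float)
    mean_mv = mvs.mean(axis=0)
    dom = np.arctan2(mean_mv[1], mean_mv[0])   # dominant motion angle

    def closest(target):
        # neighbor whose bearing is nearest to `target`, with wrap-around
        diff = np.abs(np.angle(np.exp(1j * (angles - target))))
        return mvs[np.argmin(diff)]

    # average the neighbors on opposite sides of the dominant axis
    return tuple((closest(dom) + closest(dom + np.pi)) / 2)

nbrs = {0.0: (3, 1), np.pi / 2: (1, 2), np.pi: (2, 1), -np.pi / 2: (2, 0)}
print(directional_mv_recovery(nbrs))   # -> (2.5, 1.0)
```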
  • Kavish Seth, V. Kamakoti, S. Srinivasan
    ABSTRACT: This paper proposes a fast statistical approach to recovering lost motion vectors in the H.264 video coding standard. Unlike in other video coding standards, the motion vectors of H.264 cover a smaller area of the video frame being encoded. This leads to a strong correlation between neighboring motion vectors, making the H.264 standard amenable to statistical analysis for recovering lost motion vectors. The paper proposes a Pearson-correlation-coefficient-based matching algorithm that speeds up the recovery of lost motion vectors with very little compromise in the visual quality of the recovered video. To the best of our knowledge, this is the first attempt to employ the correlation coefficient for motion vector recovery. Experimental results obtained by employing the proposed algorithm on standard benchmark video sequences show that it yields comparable recovered-video quality with significantly less computation than the best methods reported in the literature, making it suitable for real-time applications. (An illustrative sketch follows this entry.)
    Proceedings of SPIE - The International Society for Optical Engineering 02/2010; DOI:10.1117/12.843973
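
The correlation-based matching idea in miniature: candidate motion vectors are ranked by the Pearson correlation, via `np.corrcoef`, between the MV field each candidate would imply and the observed neighborhood. All names and values below are hypothetical.

```python
import numpy as np

def pearson_mv_recovery(neighborhood, candidates):
    """Pick the candidate MV whose implied MV field best correlates
    with the observed neighborhood.

    neighborhood: flattened MV components around the lost macroblock;
    candidates: {mv: flattened MV components that candidate would imply
    for the same positions}. Illustrative sketch only."""
    ref = np.asarray(neighborhood, dtype=float)

    def score(vals):
        return np.corrcoef(ref, np.asarray(vals, dtype=float))[0, 1]

    return max(candidates, key=lambda mv: score(candidates[mv]))

# Hypothetical candidate MVs and the neighborhoods they would predict:
neighborhood = [1.0, 2.0, 2.5, 3.0]
candidates = {(1, 0): [0.9, 2.1, 2.4, 3.2],   # tracks the neighborhood
              (0, 1): [3.0, 1.0, 0.5, 2.0]}   # does not
print(pearson_mv_recovery(neighborhood, candidates))   # -> (1, 0)
```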
  • ABSTRACT: The use of more advanced, less mature processes in the manufacturing of semiconductor devices has increased the need for unconventional types of testing, such as temperature testing, in order to maintain the same high quality levels. However, temperature testing is costly. This paper proposes a viable low-cost alternative to temperature testing that quantifies the impact of temperature variations on test quality and also determines optimal test conditions. The proposed test flow is empirically validated on an industry-standard die. The results show that the majority of the defects originally detected by temperature testing are also detected by the proposed test flow, thereby reducing the dependence on temperature testing to achieve zero-defect quality. Details of an interesting defect behavior under cold test conditions are also presented.
    VLSI Design 2010: 23rd International Conference on VLSI Design, 9th International Conference on Embedded Systems, Bangalore, India, 3-7 January 2010; 01/2010

Publication Stats

304 Citations
15.84 Total Impact Points

Institutions

  • 2002–2013
    • Indian Institute of Technology Madras
      • Department of Computer Science and Engineering
      Chennai, Tamil Nadu, India
  • 2003–2008
    • Indian Institute of Technology Ropar
      • Department of Computer Science and Engineering
      Rūpar, Punjab, India