V. Kamakoti

Indian Institute of Technology Madras, Chennai, Tamil Nādu, India

Publications (78) · 7.98 Total impact

  • ABSTRACT: With increasing integration of capabilities into mobile application processors, a host of imaging operations that were earlier performed in software are now implemented in hardware. Though imaging applications are inherently error resilient, the complexity of such designs has increased over time, and identifying logic that can be leveraged for energy-quality trade-offs has therefore become difficult. The paper proposes a Progressive Configuration Aware (ProCA) criticality analysis framework that is 10X faster than the state of the art at identifying logic that is functionally critical to output quality, while accounting for the various modes of operation of the design. Through this framework, we demonstrate how a low-power tunable stochastic design can be derived. The proposed methodology uses layered synthesis and voltage scaling as its primary tools for power reduction. We demonstrate the methodology on a production-quality imaging IP implemented in 28nm low-leakage technology. For the tunable stochastic imaging IP, we obtain up to 10.57% power reduction in exact mode and up to 32.53% power reduction in error-tolerant mode (30dB PSNR), with negligible design overhead.
    Proceedings of the 2014 27th International Conference on VLSI Design and 2014 13th International Conference on Embedded Systems; 01/2014
  • ABSTRACT: Excessive power dissipation can cause high voltage droop on the power grid, leading to timing failures. Since test power dissipation is typically higher than functional power, minimizing peak test power is very important in order to avoid test-induced timing failures. Test cubes for large designs are usually dominated by don't-care bits, making X-leveraging algorithms promising for test power reduction. In this paper, we show that X-bit statistics can be used to reorder test vectors on scan-based architectures realized using toggle-masking flip-flops. Based on this, the paper also presents an algorithm, namely balanced X-filling, that, when applied to ITC'99 circuits, reduced peak capture power by 7.4% on average and by 40.3% in the best case. Additionally, XStat improved the running time of the test vector ordering and X-filling phases compared to the best-known techniques.
    (An illustrative sketch of the X-statistics idea follows this entry.)
    Journal of Low Power Electronics 01/2014; 10(1).
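    The abstract does not spell out the algorithm, so the sketch below is only a rough, assumed illustration of the underlying idea (not the paper's XStat or balanced X-filling procedure): count the don't-care (X) bits in each test cube, use those counts to order the vectors, and fill each X to match the previously applied vector so that fewer bits toggle. The function names and toy cubes are invented for this example.

```python
# Illustrative sketch only: order test cubes by X-bit count and fill X bits
# to limit toggling between consecutive vectors. This is NOT the paper's
# XStat / balanced X-filling algorithm, just a simplified interpretation.

def x_count(cube: str) -> int:
    """Number of don't-care (X) bits in a test cube such as '1X0X1'."""
    return cube.upper().count("X")

def order_by_x_stats(cubes):
    """Order cubes so that X-rich cubes (most filling freedom) come last."""
    return sorted(cubes, key=x_count)

def fill_min_toggle(cube: str, previous: str) -> str:
    """Fill each X with the corresponding bit of the previous vector."""
    filled = []
    for pos, bit in enumerate(cube.upper()):
        if bit == "X":
            filled.append(previous[pos] if previous else "0")
        else:
            filled.append(bit)
    return "".join(filled)

if __name__ == "__main__":
    cubes = ["1XX0", "XXXX", "10X1", "0XX1"]
    shipped, prev = [], ""
    for cube in order_by_x_stats(cubes):
        prev = fill_min_toggle(cube, prev)
        shipped.append(prev)
    print(shipped)  # filled vectors with reduced adjacent toggling
```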
  • K. Shyamala, V. Kamakoti
    ABSTRACT: Logic minimization is an important aspect of the digital Computer Aided Design flows of both Application Specific Integrated Circuits and Field Programmable Gate Arrays. Peephole optimization is one of the effective logic minimization techniques employed in the design of digital circuits. This paper presents a novel automated peephole-optimization-based approach for logic minimization that interlaces commercially available ASIC-based and FPGA-based synthesis tools in an alternating fashion. Experiments with the technique on standard benchmark circuits resulted in logic reductions of up to 59.71% and 73.40%, delay reductions of up to 36.64% and 57.78%, and power reductions of up to 54.12% and 57.14% when compared with the output generated by current commercial state-of-the-art ASIC and FPGA synthesis tools, respectively. Importantly, the technique can be adopted by design houses at no extra cost. Using the addition operation as a case study, the paper also demonstrates how to use the proposed methodology to automatically design arithmetic circuits that meet different area and performance budgets.
    Journal of Low Power Electronics 01/2014; 10(1).
  • ABSTRACT: Conventional ATPG tools help in detecting only the equivalence class to which a fault belongs and not the fault itself. This paper presents PinPoint, a technique that further divides the equivalence class into smaller sets based on the capture power consumed by the circuit under test in the presence of the different faults, thus helping narrow down on the actual fault. Applying the technique to the ITC benchmark circuits yielded a significant improvement in diagnostic resolution.
    (An illustrative sketch of this partitioning idea follows this entry.)
    Test Symposium (ETS), 2013 18th IEEE European; 01/2013
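    The abstract only states the principle, so the toy sketch below is an assumed illustration (not the PinPoint implementation): faults that conventional ATPG places in the same equivalence class are separated by their capture-power signatures. The fault names and power values are hypothetical; in practice they would come from a power-aware fault simulator.

```python
# Toy sketch: refine a fault-equivalence class using capture-power estimates.
# Power values are hypothetical; a real flow would obtain them from a
# power-aware fault simulator.

def partition_by_capture_power(fault_power, tolerance=0.05):
    """Group faults whose estimated capture power differs by at most
    `tolerance` (relative) from the previous fault in sorted order."""
    groups = []
    for fault, power in sorted(fault_power.items(), key=lambda kv: kv[1]):
        if groups and abs(power - groups[-1][-1][1]) <= tolerance * power:
            groups[-1].append((fault, power))
        else:
            groups.append([(fault, power)])
    return groups

if __name__ == "__main__":
    # One conventional equivalence class, annotated with capture power (mW).
    equivalence_class = {"a/0": 1.20, "b/1": 1.22, "c/0": 1.48, "d/1": 1.91}
    for group in partition_by_capture_power(equivalence_class):
        print([fault for fault, _ in group])
```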
  • ABSTRACT: With increasing computing power in mobile devices, conserving battery power (or extending battery life) has become crucial. This, together with the fact that most applications running on these devices are increasingly error tolerant, has created immense interest in stochastic (or inexact) computing. In this paper, we present a framework wherein devices can operate in varying error-tolerant modes while significantly reducing the power dissipated. Further, in very deep sub-micron technologies, temperature plays a crucial role in both performance and power. The proposed framework couples a novel layered synthesis optimization with temperature-aware supply and body-bias voltage scaling to operate the design in various “tunable” error-tolerant modes. We implement the proposed technique on an H.264 decoder block in an industrial 28nm low-leakage technology node and demonstrate reductions in total power ranging from 30% to 45% as the operating mode changes from exact computing to inaccurate/error-tolerant computing.
    Quality Electronic Design (ASQED), 2013 5th Asia Symposium on; 01/2013
  • Journal of Low Power Electronics 12/2012; 8(5):684-695.
  • ABSTRACT: Buffers in on-chip networks constitute a significant proportion of the power consumption and area of the interconnect, so reducing them is an important problem. Application-specific designs have non-uniform network utilization, thereby requiring a buffer-sizing approach that tackles the non-uniformity. Also, congestion effects that occur during network operation need to be captured when sizing the buffers. Many NoCs are designed to operate in multiple voltage/frequency islands, with inter-island communication taking place through frequency converters. To this end, we propose a two-phase algorithm to size the switch buffers in networks-on-chip (NoCs) with support for multiple frequency islands. Our algorithm considers both static and dynamic effects when sizing buffers. We analyze the impact of placing frequency converters (FCs) on a link, as well as pack-and-send units that effectively utilize network bandwidth. Experiments on several realistic system-on-chip (SoC) benchmarks show that our algorithm results in a 42% reduction in the amount of buffering when compared to a standard buffering approach.
    Journal of Electrical and Computer Engineering. 01/2012; 2012.
  • Kavish Seth, V. Kamakoti, S. Srinivasan
    ABSTRACT: Error concealment in video communication is becoming increasingly important because of the growing interest in video delivery over unreliable channels such as wireless networks and the Internet. A subclass of error concealment in video communication is known as motion vector recovery (MVR). MVR techniques try to retrieve the lost motion information in compressed video streams based on the information available in the locality (both spatial and temporal) of the lost data. This article reviews the activities and practice in the area of MVR-based error concealment over the last two decades, and presents a performance comparison of the prominent MVR techniques.
    IETE Technical Review 01/2011; · 0.71 Impact Factor
  • K. Seth, V. Kamakoti, S. Srinivasan
    ABSTRACT: H.264-encoded video is highly sensitive to the loss of motion vectors during transmission. Several statistical techniques have been proposed for recovering such lost motion vectors. These use only the motion vectors of the macroblocks that are horizontally or vertically adjacent to the lost macroblock to recover the latter. Intuitively, this is one of the main reasons why these techniques yield inferior solutions in scenarios with non-linear motion. This paper proposes B-spline based statistical techniques that comprehensively address the motion vector recovery problem in the presence of different types of motion, including slow, fast/sudden, continuous and non-linear movements. Testing the proposed algorithms with different benchmark video sequences shows an average improvement of up to 2 dB in the Peak Signal-to-Noise Ratio of some of the recovered videos over existing techniques. A 2 dB improvement in PSNR is very significant from an application point of view.
    (An illustrative sketch of spline-based motion vector recovery follows this entry.)
    IEEE Transactions on Broadcasting 01/2011; · 2.09 Impact Factor
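    The paper's exact formulation is not given in the abstract, so the sketch below only illustrates the general idea of fitting a spline through the motion vectors of available neighbouring macroblocks and evaluating it at the lost macroblock's position. The setup (SciPy B-spline routines, per-component interpolation, the example positions and vectors) is assumed for illustration.

```python
# Illustrative sketch: recover a lost motion vector by fitting a cubic B-spline
# through the motion vectors of surrounding macroblocks along one direction.
# Assumed setup, not the paper's algorithm.
import numpy as np
from scipy.interpolate import splrep, splev

def recover_mv(positions, mvs, lost_pos):
    """positions: indices of available macroblocks; mvs: their (mvx, mvy) pairs."""
    mvs = np.asarray(mvs, dtype=float)
    recovered = []
    for component in range(2):  # interpolate mvx and mvy independently
        tck = splrep(positions, mvs[:, component], k=3)
        recovered.append(float(splev(lost_pos, tck)))
    return tuple(recovered)

if __name__ == "__main__":
    positions = [0, 1, 2, 4, 5, 6]                          # macroblock 3 is lost
    mvs = [(1, 0), (2, 1), (4, 1), (7, 3), (9, 4), (10, 4)]
    print(recover_mv(positions, mvs, lost_pos=3))
```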
  • Karthik Raghavan, V. Kamakoti
    ABSTRACT: In the nanometer era, there has been a steady decline in semiconductor chip manufacturing yield due to various contributing factors, such as wearout and defects arising from complex processes. One strategy to alleviate this issue is to recover and use faulty hardware at gracefully degraded performance. A common, though naive, recovery strategy followed in the context of general-purpose multicore systems is to disable the cores with faults and use only the fully functional cores. Such a coarse-granular solution is suboptimal, as the disabled cores would have many working modules that go unutilized. The Resurrecting Operating SYstem (ROSY) presented in this paper is a step towards the development of an operating system that can work on faulty cores by adapting itself to hardware faults using software workarounds, and thereby utilize their working components. We consider many realistic fault models and present software workarounds for them. We have developed a framework that can be trivially plugged into a fully featured x86-based OS kernel to demonstrate the feasibility of the proposed ideas. Performance evaluation using SPEC benchmarks and real-world applications shows that the performance degradation of the depleted cores executing ROSY is on average between 1.6x and 4x, depending on the fault type.
    Operating Systems Review. 01/2011; 45:82-84.
  • ABSTRACT: Buffers in on-chip networks constitute a significant proportion of the power consumption and area of the interconnect. Hence, reducing the buffering overhead of Networks on Chips (NoCs) is an important problem. For application-specific designs, the network utilization across the different links and switches is non-uniform, thereby requiring a buffer-sizing approach that tackles the non-uniformity. Moreover, congestion effects that occur during network operation need to be captured when sizing the buffers. To this end, we propose a two-phase algorithm to size the switch buffers in NoCs. Our algorithm considers both the static (based on bandwidth and latency requirements) and dynamic (based on simulation) effects when sizing buffers. Our experiments show that applying the algorithm results in a 42% reduction in the amount of buffering required to meet the application constraints when compared to a standard buffering approach.
    (A toy sketch of the two-phase idea follows this entry.)
    IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2011, 4-6 July 2011, Chennai, India; 01/2011
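    The abstract describes the two phases only at a high level, so the following is a toy sketch of how such a flow could look (assumed, not the paper's algorithm): a static pass sizes each buffer from its bandwidth demand, and a dynamic pass grows buffers on ports that a simulation reports as congested. All names, capacities and thresholds are invented for illustration.

```python
# Toy sketch of a two-phase buffer-sizing flow (assumed, not the paper's algorithm).
# Phase 1: static sizing proportional to the bandwidth demand on each switch port.
# Phase 2: dynamic refinement that adds buffers where simulation reports congestion.

def static_sizing(port_bandwidth_mbps, link_capacity_mbps=1000, min_depth=2):
    """Initial buffer depth per port: up to 4 flits, scaled by link-capacity share."""
    return {
        port: max(min_depth, round(4 * bw / link_capacity_mbps))
        for port, bw in port_bandwidth_mbps.items()
    }

def dynamic_refinement(depths, congestion_events, step=1, threshold=10):
    """Grow buffers on ports whose simulated congestion exceeds a threshold."""
    for port, events in congestion_events.items():
        if events > threshold:
            depths[port] = depths.get(port, 0) + step
    return depths

if __name__ == "__main__":
    bandwidth = {"sw0.east": 800, "sw0.west": 200, "sw1.north": 450}
    depths = static_sizing(bandwidth)
    # Congestion counts would come from a cycle-accurate NoC simulation.
    depths = dynamic_refinement(depths, {"sw0.east": 37, "sw1.north": 4})
    print(depths)
```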
  • ABSTRACT: Workload-optimized systems consisting of a large number of general and special purpose cores, with support for shared memory programming, are slowly becoming prevalent. One of the major impediments to effective parallel programming on these systems is lock-based synchronization. An alternative synchronization solution called Transactional Memory (TM) is currently being explored. We observe that most of the TM design proposals in the literature are tailored to the constraints of general-purpose computing platforms. Given that workload-optimized systems utilize wider hardware design spaces and on-chip parallelism, we argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for these systems. We re-evaluate the criteria to be satisfied by an HTM and identify possible scope for relaxations in the context of workload-optimized systems. Based on the relaxed criteria, we demonstrate the scope for building HTM design variants such that each variant caters to a specific workload requirement. We carry out suitable experiments to bring out the trade-offs between the design variants. Overall, we show how knowledge about the workload is extremely useful in making appropriate design choices in a workload-optimized HTM.
    Advanced Parallel Processing Technologies - 9th International Symposium, APPT 2011, Shanghai, China, September 26-27, 2011. Proceedings; 01/2011
  • ABSTRACT: We consider the problem of reducing active-mode leakage power by modifying the post-synthesis netlists of combinational logic blocks. The stacking effect is used to reduce leakage power, but, instead of a separate signal, one of the inputs to the gate itself is used. The approach is studied on multiplier blocks. It is found that a significant number of nets have high probabilities of being constant at 0 or 1. In specific applications with a high peak-to-average ratio, such as audio and other signal-processing applications, this effect is more pronounced. We show how these signals can be used to put gates to sleep, thus saving significant leakage power.
    (An illustrative sketch of identifying such nets follows this entry.)
    IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2011, 4-6 July 2011, Chennai, India; 01/2011
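    The abstract only outlines the observation, so the sketch below is an assumed illustration of the first step (not the paper's method): estimate, from simulation traces, how often each net is constant at 0 or 1, and flag gates with a highly skewed input as candidates for input-driven stacking. The traces, gates and skew threshold are invented for this example.

```python
# Illustrative sketch: find gates whose own inputs are almost always constant,
# making them candidates for stacking-based leakage reduction without a
# separate sleep signal. Assumed interpretation, not the paper's method.

def probability_of_one(trace):
    """Fraction of sampled cycles in which the net holds logic 1."""
    return sum(trace) / len(trace)

def stacking_candidates(net_traces, gate_inputs, skew=0.95):
    """Return gates that have at least one input which is almost always 0 or 1."""
    candidates = {}
    for gate, inputs in gate_inputs.items():
        for net in inputs:
            p1 = probability_of_one(net_traces[net])
            if p1 >= skew or p1 <= 1.0 - skew:
                candidates[gate] = (net, p1)
                break
    return candidates

if __name__ == "__main__":
    traces = {"n1": [0] * 98 + [1] * 2, "n2": [0, 1] * 50, "n3": [1] * 100}
    gates = {"g1": ["n1", "n2"], "g2": ["n2"], "g3": ["n2", "n3"]}
    print(stacking_candidates(traces, gates))  # {'g1': ('n1', 0.02), 'g3': ('n3', 1.0)}
```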
  • ABSTRACT: This work presents a hardware implementation of an FIR filter that is self-adaptive, responds to arbitrary frequency-response landscapes, has built-in coefficient error tolerance, and has minimal adaptation latency. The hardware design is based on a heuristic genetic algorithm. Experimental results show that the proposed design is more efficient than non-evolutionary designs, even for arbitrary-response filters. As a byproduct, the paper also presents a novel flow for the complete hardware design of what is termed an Evolutionary System on Chip (ESoC). With the inclusion of an evolutionary process, the ESoC is a new paradigm in modern System on Chip (SoC) designs. The ESoC methodology could be a very useful structured FPGA/ASIC implementation alternative in many practical applications of FIR filters.
    (A minimal software sketch of genetic coefficient search follows this entry.)
    Applied Soft Computing. 01/2011;
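    The paper evolves filter coefficients in hardware; the sketch below is a deliberately simplified, software-only illustration of the same principle, not the ESoC design: a small genetic algorithm with selection, crossover and mutation drives candidate FIR coefficient sets toward a target magnitude response. The population size, mutation strength and target response are arbitrary choices for the example.

```python
# Minimal software sketch: evolve FIR coefficients toward a target magnitude
# response with a simple genetic algorithm. Assumed illustration only, not the
# ESoC hardware flow described in the paper.
import numpy as np

rng = np.random.default_rng(0)
N_TAPS, POP, GENS = 16, 40, 200
freqs = np.linspace(0, np.pi, 64)
target = (freqs < np.pi / 2).astype(float)           # ideal low-pass magnitude

def magnitude(coeffs):
    """|H(e^jw)| of the FIR filter sampled on the frequency grid."""
    taps = np.arange(N_TAPS)
    return np.abs(np.exp(-1j * np.outer(freqs, taps)) @ coeffs)

def fitness(coeffs):
    return -np.mean((magnitude(coeffs) - target) ** 2)

population = rng.uniform(-0.5, 0.5, size=(POP, N_TAPS))
for _ in range(GENS):
    scores = np.array([fitness(c) for c in population])
    parents = population[np.argsort(scores)[-POP // 2:]]      # keep the fitter half
    idx_a = rng.integers(0, len(parents), POP // 2)
    idx_b = rng.integers(0, len(parents), POP // 2)
    cut = rng.integers(1, N_TAPS, POP // 2)
    mask = np.arange(N_TAPS) < cut[:, None]                   # single-point crossover
    children = np.where(mask, parents[idx_a], parents[idx_b])
    children = children + rng.normal(0.0, 0.02, children.shape)  # mutation
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(c) for c in population])]
print("mean squared response error:", -fitness(best))
```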
  • K. Seth, V. Kamakoti, S. Srinivasan
    ABSTRACT: This study proposes a novel motion vector recovery (MVR) algorithm for the H.264 video coding standard that takes into account the change in the motion vectors (MVs) in different directions. Existing MVR algorithms are confined to using the horizontal or vertical directions to recover the lost MVs. However, in the presence of non-linear movements or fast/sudden motion of an object in a scene of the input video, the MVs recovered by these algorithms turn out to be inaccurate. The proposed directional-interpolation-based technique can interpolate the MVs in any direction based on the tendency of motion around the lost macroblock, making it suitable for handling non-linear or fast motions. Testing the proposed technique with different benchmark video sequences shows an average improvement of 1-2 dB in the peak signal-to-noise ratio of the recovered video over existing techniques.
    (An illustrative sketch of direction selection follows this entry.)
    IET Image Processing 05/2010; · 0.90 Impact Factor
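    The abstract does not detail how the interpolation direction is chosen; the sketch below is one assumed interpretation, not the paper's algorithm: among the opposite-neighbour pairs around the lost macroblock, pick the pair whose motion vectors agree best and average them. The neighbour layout and example vectors are invented for illustration.

```python
# Illustrative sketch: choose the interpolation direction whose two opposite
# neighbours have the most consistent motion, then average their motion
# vectors for the lost macroblock. Assumed interpretation only.
import math

# Opposite-neighbour pairs around a lost macroblock, keyed by direction.
DIRECTIONS = {
    "horizontal": ("left", "right"),
    "vertical": ("top", "bottom"),
    "diagonal": ("top_left", "bottom_right"),
    "anti_diagonal": ("top_right", "bottom_left"),
}

def recover_mv(neighbour_mvs):
    """neighbour_mvs maps a neighbour name to its (mvx, mvy); lost neighbours are absent."""
    best = None
    for direction, (a, b) in DIRECTIONS.items():
        if a in neighbour_mvs and b in neighbour_mvs:
            (ax, ay), (bx, by) = neighbour_mvs[a], neighbour_mvs[b]
            disagreement = math.hypot(ax - bx, ay - by)
            if best is None or disagreement < best[0]:
                best = (disagreement, ((ax + bx) / 2, (ay + by) / 2), direction)
    return best[1:] if best else ((0.0, 0.0), "none")

if __name__ == "__main__":
    mvs = {"left": (3, 1), "right": (9, 5), "top_left": (5, 2), "bottom_right": (6, 3)}
    print(recover_mv(mvs))  # picks the diagonal pair, whose motion is most consistent
```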
  • Kavish Seth, V. Kamakoti, S. Srinivasan
    ABSTRACT: This paper proposes a fast statistical approach to recover lost motion vectors in the H.264 video coding standard. Unlike other video coding standards, the motion vectors in H.264 cover a smaller area of the video frame being encoded. This leads to a strong correlation between neighboring motion vectors, making the H.264 standard amenable to statistical analysis for recovering lost motion vectors. The paper proposes a Pearson Correlation Coefficient based matching algorithm that speeds up the recovery of lost motion vectors with very little compromise in the visual quality of the recovered video. To the best of our knowledge, this is the first attempt to employ the correlation coefficient for motion vector recovery. Experimental results obtained by applying the proposed algorithm to standard benchmark video sequences show that it yields comparable recovered-video quality with significantly less computation than the best techniques reported in the literature, making it suitable for real-time applications.
    (An illustrative sketch of correlation-based matching follows this entry.)
    Proc SPIE 02/2010;
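    The abstract names the matching criterion but not the full procedure, so the sketch below is an assumed illustration of correlation-based candidate selection, not the paper's algorithm: each neighbouring motion vector is a candidate for the lost block, and the candidate whose predicted boundary pixels correlate best (Pearson) with the observed boundary is chosen. The pixel strips and candidates are invented for the example.

```python
# Illustrative sketch of Pearson-correlation-based matching for motion vector
# recovery. Assumed interpretation only, not the paper's algorithm.
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length pixel strips."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def recover_mv(candidates, predicted_boundaries, observed_boundary):
    """candidates: list of (mvx, mvy); predicted_boundaries[i] is the boundary
    strip fetched from the reference frame using candidates[i]."""
    scores = [pearson(strip, observed_boundary) for strip in predicted_boundaries]
    return candidates[int(np.argmax(scores))]

if __name__ == "__main__":
    observed = [10, 12, 15, 20, 26, 30]                  # pixels bordering the lost block
    candidates = [(2, 0), (5, 3)]
    predicted = [[11, 13, 14, 21, 25, 31],               # follows the observed trend
                 [30, 25, 20, 15, 12, 10]]               # reversed trend
    print(recover_mv(candidates, predicted, observed))   # -> (2, 0)
```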
  • ABSTRACT: The use of more advanced, less mature processes in the manufacturing of semiconductor devices has increased the need for unconventional types of testing, such as temperature testing, in order to maintain the same high quality levels. However, performing temperature testing is costly. This paper proposes a viable low-cost alternative to temperature testing that quantifies the impact of temperature variations on test quality and also determines optimal test conditions. The proposed test flow is empirically validated on an industry-standard die. The results show that the majority of the defects originally detected by temperature testing are also detected by the proposed test flow, thereby reducing the dependence on temperature testing to achieve zero-defect quality. Details of an interesting defect behavior under cold test conditions are also presented.
    VLSI Design 2010: 23rd International Conference on VLSI Design, 9th International Conference on Embedded Systems, Bangalore, India, 3-7 January 2010; 01/2010
  • Lavanya Jagan, V. Kamakoti
    ABSTRACT: In a highly dynamic semiconductor manufacturing environment, migration to the next process technology node aims to create smaller, faster and cheaper chips. To achieve this, integrated circuit (IC) designs are becoming more complex and new manufacturing materials and processes are introduced. Additionally, demands for sustainable yield and shorter time to volume production have to be met, as they govern the profitability of the semiconductor industry. As a result, rapid yield improvement techniques are crucial to overcoming huge yield losses. Any yield improvement technique typically involves detecting the yield loss (testing) and identifying its root cause (diagnosis). This paper presents a survey of the existing research reported in the literature on test and diagnostics of nanometer chips and discusses interesting open issues.
    IETE Technical Review 01/2010; · 0.71 Impact Factor
  • ABSTRACT: Volume Yield Diagnostics (VYD) is crucial for diagnosing critical systematic yield issues from the reports obtained by testing thousands of chips. This paper presents an efficient clustering technique for VYD that has been shown to work successfully both in a simulation environment and on real industrial failure data.
    VLSI Design, 2009 22nd International Conference on; 02/2009
  • K. Shyamala, J. Vimalkumar, V. Kamakoti
    J. Low Power Electronics. 01/2009; 5:429-438.