About
19
Publications
2,345
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
188
Citations
Citations since 2017
Introduction
Publications
Publications (19)
Coarse Grain Reconfigurable Architectures(CGRA) support Spatial and Temporal computation to speedup execution and reduce reconfiguration
time. Thus compilation involves partitioning instructions spatially and scheduling them temporally. We extend Edge-Betweenness
Centrality scheme, originally used for detecting community structures in social and bi...
Coarse Grain Reconfigurable Architectures (CGRA) support spatial and temporal computation to speedup execution and reduce reconfiguration time. Thus compilation involves partitioning instructions spatially and scheduling them temporally. The task of partitioning is governed by the opposing forces of being able to expose as much parallelism as possi...
3GPP LongTerm Evolution (LTE) is targeted towards variable transmission bandwidths to improve universal mobile telecommunications. This requires support for variable point FFT/IFFT (128 through 4096). ASIC solutions, one for a particular FFT is not cost-effective, while DSP solutions are not performance-effective. We hence provide a solution on RED...
In the world of high performance computing huge efforts have been put to accelerate Numerical Linear Algebra (NLA) kernels like QR Decomposition (QRD) with the added advantage of reconfigurability and scalability. While popular custom hardware solution in form of systolic arrays can deliver high performance, they are not scalable, and hence not com...
Numerical Linear Algebra (NLA) kernels are at the heart of all computational problems. These kernels require hardware acceleration for increased throughput. NLA Solvers for dense and sparse matrices differ in the way the matrices are stored and operated upon although they exhibit similar computational properties. While ASIC solutions for NLA Solver...
In Dynamically Reconfigurable Processors (DRPs), compilation involves breaking an application into sub-tasks for piecewise execution on the fabric. These sub-tasks are sequenced based on data and control dependences. In DRPs, sub-task prefetching is used to hide the reconfiguration time while another sub-task executes. In REDEFINE, our target DRP,...
REDEFINE is a runtime reconfigurable hardware platform. In this paper, we trace the development of a runtime reconfigurable hardware from a general purpose processor, by eliminating certain characteristics such as: Register files and Bypass network. We instead allow explicit write backs to the reservation stations as in Transport Triggered Architec...
Emerging embedded applications are based on evolving standards (e.g., MPEG2/4, H.264/265, IEEE802.11a/b/g/n). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilizat...
This paper reports the design of an input-triggered polymorphic ASIC for H.264 baseline decoder. Hardware polymorphism is achieved by selectively reusing hardware resources at system and module level. Complete design is done using ESL design tools following a methodology that maintains consistency in testing and verification throughout the design f...
In this paper we develop compilation techniques for the realization of applications described in a High Level Language (HLL)
onto a Runtime Reconfigurable Architecture. The compiler determines Hyper Operations (HyperOps) that are subgraphs of a data
flow graph (of an application) and comprise elementary operations that have strong producer-consumer...
In this paper we explore an implementation of a high-throughput, streaming application on REDEFINE-v2, which is an enhancement of REDEFINE. REDEFINE is a polymorphic ASIC combining the flexibility of a programmable solution with the execution speed of an ASIC. In REDEFINE Compute Elements are arranged in an 8x8 grid connected via a Network on Chip...
Building flexible constraint length Viterbi decoders requires us to be able to realize de Bruijn networks of various sizes on the physically provided interconnection network. This paper considers the case when the physical network is itself a de Bruijn network and presents a scalable technique for realizing any n-node de Bruijn network on an N-node...
A polymorphic ASIC is a runtime reconfigurable hardware substrate comprising compute and communication elements. It is a ldquofuture proofrdquo custom hardware solution for multiple applications and their derivatives in a domain. Interoperability between application derivatives at runtime is achieved through hardware reconfiguration. In this paper...
Run-time interoperability between different applications based on H.264/AVC is an emerging need in networked infotainment, where media delivery must match the desired resolution and quality of the end terminals. In this paper, we describe the architecture and design of a polymorphic ASIC to support this. The H.264 decoding flow is partitioned into...
Application accelerators are predominantly ASICs. The cost of ASIC solutions are order of magnitudes higher than programmable processing cores. Despite this, ASIC solutions are preferred when both high performance and low power is the target. ASICs offer no flexibility in terms of it being able to cater to application derivatives, unless this has b...
In this paper we propose the architecture of a SoC fabric onto which applications described in a HLL are synthesized. The fabric is a homogeneous layout of computation, storage and communication resources on silicon. Through a process of composition of resources (as opposed to decomposition of applications), application specific computational struc...
We provide a hardware realization of motion compensation and reconstruction (MCR) module for H.264 baseline profile. We synthesize the MCR module using UMC library in 0.13 mu CMOS technology. Our implementation occupies an area of 94756 gates and operates at a frequency of 250 MHz.
H.264 video standard achieves high quality video along with high data compression when compared to other existing video standards. H.264 uses context-based adaptive variable length coding (CAVLC) to code residual data in baseline profile. In this paper, the authors describe a novel architecture for CAVLC decoder, including coeff_token decoder, leve...
Emerging embedded applications are based on evolving standards (ex. MPEG2/4, H.264/265, IEEE802.11a/b/g). Since most of these applications run on handheld devices, there is an increasing need for a single chip solution that can dynamically interoperate between different standards and their derivatives. In order to achieve high resource utilization,...