About
Publications: 63
Reads: 19,000
Citations: 1,026
Additional affiliations
May 2022 - January 2024: ST Engineering
Position: Head Research and Collaboration and Staff URS Tech Office
July 2018 - May 2022
September 2014 - December 2015
Education
January 2008 - August 2014
Publications (63)
Smart mobility is a key component of smart city initiatives that are currently being explored around the world by authorities, industry players, and academics alike. The term smart mobility encompasses many facets of mobility, including improving public transport services, providing guidance to commuters and motorists, and real-time traffic monitori...
The increasing size of modern FPGAs allows ever more complex applications to be mapped onto them. However, long design implementation times for large designs can severely affect design productivity. A modular design methodology can improve design productivity in a divide-and-conquer fashion, but at the expense of degraded performance and power...
The latest and upcoming mobile application processors, embedded in a myriad of consumer devices, are typically implemented as heterogeneous multiprocessor system-on-chips composed of various processing engines such as general-purpose cores with differing characteristics, GPUs, DSPs, nonprogrammable accelerators, and reconfigurable computing devic...
This work proposes a novel technique for hardware area-time estimation of applications on FPGAs. The application C code is first converted to the target-independent LLVM IR, and its basic blocks are then wrapped as functions using an LLVM transformation pass. The LegUp tool’s ‘LLVM IR functions to RTL modules’ conversion is carried out to facilitate RTL...
The latest and upcoming mobile application processors, embedded in a myriad of consumer devices, are typically implemented as heterogeneous multi-processor system-on-chips (MPSoCs) comprising various processing engines such as general-purpose cores with differing characteristics, GPUs, DSPs, non-programmable accelerators, and reconfigurable computi...
Heterogeneous Mobile System-on-Chips (SoCs) containing CPU and GPU cores are becoming prevalent in embedded computing, and they need to execute applications concurrently. However, existing run-time management approaches do not perform adaptive mapping and thread-partitioning of applications while exploiting both CPU and GPU cores at the same time....
Demand responsive transit (DRT) services have evolved significantly in the past few years owing to developments in information and communication technologies. Among the many forms of DRT services, demand responsive bus (DRB) services are gaining traction as a complementary mode to existing public transit services, especially to dynamically bridge t...
The popularity of real-time on-demand transit as a fast-evolving mobility service has paved the way for novel solutions to point-to-point transit requests. In addition, strict government regulations on greenhouse gas emissions call for energy-efficient transit solutions. To this end, we propose an on-demand public transit system using a fle...
Fast and accurate detection of vehicles in road traffic scenes captured by traffic surveillance cameras is essential for large-scale deployment of automated traffic surveillance systems. State-of-the-art techniques typically employ background modeling for low-complexity foreground detection. However, this is a challenging problem as these meth...
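As a generic illustration of the background modeling mentioned above (not the method of this paper, whose abstract is truncated), a running-average background model flags a pixel as foreground when it deviates from the background by more than a threshold. The function names, the learning rate `alpha`, and the threshold of 30 are illustrative assumptions.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Exponential running average: B <- (1 - alpha) * B + alpha * F."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame, threshold: float = 30.0) -> List[List[bool]]:
    """Mark a pixel as foreground when it differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


if __name__ == "__main__":
    background = [[100.0] * 4 for _ in range(3)]  # static, empty road
    frame = [row[:] for row in background]
    frame[1][2] = 200.0  # hypothetical bright vehicle pixel
    mask = foreground_mask(background, frame)
    print(sum(v for row in mask for v in row))  # → 1
    background = update_background(background, frame)
    print(round(background[1][2], 2))  # → 105.0
```

The per-pixel cost is a handful of arithmetic operations, which is why background models of this kind are favored for low-complexity foreground detection; the trade-off is sensitivity to illumination changes and slow-moving or stopped vehicles being absorbed into the background.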
The large number of embedded soft-core processors available today makes it tedious and time-consuming to select the best processor for a given application. This task is even more challenging due to the numerous configuration options available for a single soft-core processor while optimizing for conflicting design requirements such as performance...
Heterogeneous computing, materialized in the form of multiprocessor system-on-chips (MPSoCs) comprising various processing elements such as general-purpose cores with differing characteristics, GPUs, DSPs, non-programmable accelerators, and reconfigurable computing, is expected to dominate the current and future consumer device landscape. Th...
First/last-mile gaps are a significant hurdle to the large-scale adoption of public transit systems. Recently, demand responsive transit systems have emerged as a preferable solution to the first/last-mile problem. However, existing work requires significant computation time or advance bookings. Hence, we propose a public transit system linking the neighbo...
FPGA-based system-on-chip (SoC) devices for Internet of Things (IoT) applications require hardware-software (HW-SW) partitioning techniques to optimize for performance under stringent area and power constraints. To obtain an optimally partitioned design it is necessary to account for the data communication cost between hardware and software. Howeve...
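To show why the abstract above insists on accounting for HW-SW communication cost, here is a generic greedy partitioning sketch (not this paper's technique, which the truncated abstract does not describe). It assumes a simple sequential cost model in which every data edge crossing the hardware/software boundary adds a fixed transfer time; the `Task` fields, function names, and example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Task:
    name: str
    sw_time: float  # execution time on the processor
    hw_time: float  # execution time as an FPGA accelerator
    hw_area: float  # FPGA resources used if mapped to hardware (assumed > 0)


Edges = Dict[Tuple[str, str], float]  # (producer, consumer) -> transfer time


def total_time(tasks: Dict[str, Task], edges: Edges, hw: Set[str]) -> float:
    """Sequential cost model: execution time plus the transfer time of
    every edge whose endpoints sit on different sides of the HW/SW boundary."""
    exec_time = sum(t.hw_time if n in hw else t.sw_time for n, t in tasks.items())
    comm_time = sum(c for (a, b), c in edges.items() if (a in hw) != (b in hw))
    return exec_time + comm_time


def greedy_partition(tasks: Dict[str, Task], edges: Edges, area_budget: float) -> Set[str]:
    """Repeatedly move the task with the best time gain per unit area to hardware."""
    hw: Set[str] = set()
    used = 0.0
    while True:
        current = total_time(tasks, edges, hw)
        best = None  # (gain per unit area, task name)
        for name, t in tasks.items():
            if name in hw or used + t.hw_area > area_budget:
                continue
            gain = current - total_time(tasks, edges, hw | {name})
            if gain > 0 and (best is None or gain / t.hw_area > best[0]):
                best = (gain / t.hw_area, name)
        if best is None:
            return hw
        hw.add(best[1])
        used += tasks[best[1]].hw_area


if __name__ == "__main__":
    tasks = {
        "A": Task("A", sw_time=10, hw_time=2, hw_area=5),
        "B": Task("B", sw_time=8, hw_time=3, hw_area=5),
        "C": Task("C", sw_time=4, hw_time=1, hw_area=5),
    }
    edges = {("A", "B"): 6.0, ("B", "C"): 1.0}
    hw = greedy_partition(tasks, edges, area_budget=10)
    print(sorted(hw), total_time(tasks, edges, hw))  # → ['A', 'B'] 10.0
```

In this example, moving task B to hardware on its own would raise the total from 22 to 24, because both of its edges would then cross the boundary; B only pays off once its producer A is already in hardware. A partitioner that ignored communication would rank B as a clear win.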
Programmable Systems-on-Chips (SoCs) are expected to incorporate a larger number of application-specific hardware accelerators with tightly integrated memories in order to meet stringent performance-power requirements of embedded systems. As data sharing between the accelerator memories and the processor is inevitable, it is of paramount importance...
The graphics processing units (GPUs) in mobile devices have come a long way in terms of their capabilities to accelerate both graphics- and general-purpose applications. However, they also consume a significant amount of power when executing such applications. This necessitates sophisticated power management techniques for the GPUs in order to save...
Ever-evolving anomalies have become more sophisticated and complex in attacking defense schemes, leading to serious compromises. Existing software-based techniques aim to protect vulnerable software with other software that is itself prone to compromise, for example through obfuscation-based attacks. On the other hand, hardware perform...
Existing intrusion detection techniques that once detected malicious applications effectively now fail, since they are unable to keep up with rapidly evolving zero-day attacks. Currently, anomaly detection is a widely used technique to detect zero-day attacks. However, anomaly detection based on system calls, control flow, network traffic,...
First/last-mile transit using public transport has consistently been a bottleneck for commuters because these legs take up a relatively large share of the time compared with the overall journey. Recently, demand responsive transportation (DRT) services have been proposed for first/last-mile transit. However, in contrast to the requirements of a public...
Global motion estimation (GME) algorithms are typically applied to aerial videos captured by on-board UAV cameras to compensate for the artificial motion induced in these video frames by camera motion. However, existing methods for GME have high computational complexity and are therefore not suitable for on-board processing in UAVs with limit...
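A minimal sketch of what GME computes, assuming the simplest possible model: a pure translation between consecutive frames, found by exhaustive search over small shifts. This is not the paper's method (its abstract is truncated); practical GME usually fits affine or perspective models, and the exhaustive search shown here is exactly the kind of computational cost the abstract says is a problem on UAVs. The function name, `max_shift`, and the synthetic frames are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities


def global_shift(prev: Frame, curr: Frame, max_shift: int = 2) -> Tuple[int, int]:
    """Exhaustively search translations (dy, dx) and return the one minimising the
    mean absolute difference between curr[y][x] and prev[y - dy][x - dx]
    over the overlapping region of the two frames."""
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    total += abs(curr[y][x] - prev[y - dy][x - dx])
                    count += 1
            if count and total / count < best_cost:
                best, best_cost = (dy, dx), total / count
    return best


if __name__ == "__main__":
    prev = [[y * 6 + x for x in range(6)] for y in range(6)]
    # simulate camera motion: scene content moves down 1 row and right 2 columns
    curr = [[prev[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(6)] for y in range(6)]
    print(global_shift(prev, curr))  # → (1, 2)
```

Compensation then amounts to shifting the current frame by (-dy, -dx) so that static scene content lines up with the previous frame, leaving only independently moving objects as residual motion.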
Heterogeneous Multi-Processor Systems-on-Chips (MPSoCs) containing CPU and GPU cores are typically required to execute applications concurrently. However, as will be shown in this paper, existing approaches are not well suited for concurrent applications as they are developed either by considering only a single application or they do not exploit bo...
The embedded hard-core processors beside the traditional FPGA fabric in FPGA-based System-on-Chip (SoC) devices make them an attractive alternative for realizing the software portions of the application while using the FPGA fabric for hardware acceleration. However, several hard-core processor options are becoming available from different manufactu...
Modern FPGAs integrate millions of logic resources, allowing increasingly large designs to be realized. However, state-of-the-art simulated-annealing-based CAD tools for FPGAs suffer from long runtimes, poor performance, and sub-optimal routing and placement decisions, especially for large applications, leading to less energy-efficient designs....
Heterogeneous computing platforms combining general-purpose processing elements with different accelerators (such as GPUs or FPGAs) are ideally suited for efficient processing of compute-intensive data analytics kernels. In this chapter, we focus on the acceleration of data analytics kernels on heterogeneous computing systems with FPGAs. The introduc...
Applications containing compute-intensive kernels with nested loops can effectively leverage FPGAs to exploit fine- and coarse-grained parallelism. HLS tools used to translate these kernels from high-level languages (e.g., C/C++), however, are inefficient in exploiting multiple levels of parallelism automatically, thereby producing sub-optimal acce...
The increasing complexity of FPGA-based accelerators, coupled with time-to-market pressure, makes high-level synthesis (HLS) an attractive solution to improve designer productivity by abstracting the programming effort above register-transfer level (RTL). HLS offers various architectural design options with different trade-offs via pragmas (loop un...
State-of-the-art thermal management techniques independently throttle the frequencies of high-performance multi-core CPUs and powerful graphics processing units (GPUs) on heterogeneous multiprocessor system-on-chips deployed in the latest mobile devices. For graphics-intensive gaming applications, this approach is inadequate because both the CPU and the...
The large number of possible configurations in modern soft-core processors makes it tedious and time-consuming to select the optimal configuration for a given application. In this paper, we propose a framework for rapid area-efficient customization of soft-core processors that exploits the dependencies between the various configuration options to pr...
Advanced driver-assistance systems (ADAS) generally embrace heterogeneous platforms consisting of central processing units and field-programmable gate arrays (FPGAs) to achieve higher performance and energy efficiency. The multiple-target tracking (MTT) system is an important component in most ADAS and is particularly suited for heterogeneous imple...
State-of-the-art mobile system-on-chips (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs now include programmable cores such as CPUs and GPUs with very different functionality. The SoCs also integrate performance-heterogeneous cores with different power-perfo...
Games have emerged as one of the most popular applications on mobile platforms. Recent platforms are now equipped with Heterogeneous Multiprocessor System-on-Chips (HMPSoCs) tightly integrating CPUs and GPUs on the same chip. This configuration enables high-end gaming on the platform but at the cost of high power consumption rapidly draining the un...
Modern system-on-chips (SoCs) integrate a CPU and a GPU for an immersive 3D gaming experience. These games require both the CPU and the GPU to work in tandem, resulting in high power consumption. In the past, Dynamic Voltage and Frequency Scaling (DVFS) has been exploited for embedded CPUs to save power during game play, but it is only recently that embedded GPUs h...
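For readers unfamiliar with DVFS, a minimal utilization-driven frequency selector is sketched below, loosely in the spirit of threshold-based governors such as Linux's ondemand but not a reproduction of it, and not the CPU-GPU coordination studied in this paper. It scales a single device; the function name, the 0.8 threshold, and the frequency table are hypothetical.

```python
from typing import Sequence


def select_frequency(util: float, cur_freq: int, freqs: Sequence[int],
                     up_threshold: float = 0.8) -> int:
    """Utilisation-driven DVFS step for a single CPU or GPU.

    util is the busy fraction (0..1) observed at cur_freq during the last
    sampling window; freqs lists the available operating frequencies in MHz.
    """
    if util > up_threshold:
        return max(freqs)  # jump to the top frequency on a load spike
    demand = util * cur_freq  # busy cycles per second, in MHz
    for f in sorted(freqs):
        # lowest frequency that keeps projected utilisation under the threshold
        if demand <= up_threshold * f:
            return f
    return max(freqs)


if __name__ == "__main__":
    gpu_freqs = [200, 400, 600, 800]  # hypothetical GPU operating points (MHz)
    print(select_frequency(0.9, 400, gpu_freqs))  # → 800
    print(select_frequency(0.3, 800, gpu_freqs))  # → 400
    print(select_frequency(0.1, 400, gpu_freqs))  # → 200
```

Because dynamic power grows with frequency and with the square of supply voltage, dropping to the lowest frequency that still meets demand saves power. The limitation the abstract points to is that a per-device rule like this ignores the coupling between CPU and GPU in games: slowing one can leave the other waiting.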
Local memories increase the efficiency of hardware accelerators by enabling fast accesses to frequently used data. In addition, the access latencies of local memories are deterministic, which allows for more accurate evaluation of system performance during design exploration. We have previously proposed local memories with an un-cached memory sl...
Traditionally, instruction set extension (ISE) algorithms have treated memory and control flow as invalid operations during custom instruction identification to ensure deterministic latency of these extended instructions. To overcome these constraints, some work has been done to incorporate local memory for custom instructions with memory o...
Custom instructions are commonly used to meet the strict design constraints in high performance systems. This paper extends the application space of our previously proposed FPGA-aware custom instruction enumeration and selection technique for area-constrained designs that maximizes the logic utilization of the available FPGA space. Results indicate...
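One standard way to frame custom instruction selection under an FPGA area constraint is as a 0/1 knapsack problem; the sketch below shows that framing only as a generic illustration, not as this paper's enumeration and selection technique. It assumes the cycle savings of candidates are independent and additive, which real custom instructions that share operations may violate; the candidate names, LUT areas, and savings are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, int, int]  # (name, area in LUTs, cycles saved)


def select_instructions(cands: List[Candidate], area_budget: int) -> Tuple[int, List[str]]:
    """0/1 knapsack: maximise total cycles saved without exceeding area_budget."""
    # best[a] = (cycles saved, chosen names) using at most a LUTs
    best: List[Tuple[int, List[str]]] = [(0, [])] * (area_budget + 1)
    for name, area, saved in cands:
        # walk budgets downwards so each candidate is picked at most once
        for a in range(area_budget, area - 1, -1):
            prev_saved, prev_names = best[a - area]
            if prev_saved + saved > best[a][0]:
                best[a] = (prev_saved + saved, prev_names + [name])
    return best[area_budget]


if __name__ == "__main__":
    cands = [("mac", 40, 300), ("crc", 30, 200), ("sbox", 50, 350)]
    print(select_instructions(cands, area_budget=80))  # → (550, ['crc', 'sbox'])
```

Here the two smaller candidates fill the 80-LUT budget exactly and beat the pair containing the single most valuable-per-area instruction, which is the kind of logic-utilization trade-off an area-constrained selection has to make.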
Instruction set extension is becoming extremely popular for meeting the tight design constraints in embedded systems. This mechanism is now widely supported by commercially available FPGA (Field-Programmable Gate Array) based reconfigurable processors. In this paper, we present a design flow that automatically enumerates and selects custom instruct...
Off-the-shelf soft core processors are becoming increasingly popular in embedded systems design today as they provide for application specific customization, in particular through instruction subsetting. However, choosing the right processor configuration remains a challenge as the search space becomes prohibitively large when the configurable opti...
Mapping of applications onto a Multiprocessor System-on-Chip (MPSoC) can be performed either at design time or at run time. At any time, the number of tasks executing on an MPSoC platform can exceed the available resources, requiring efficient run-time mapping techniques to meet the real-time constraints of the applications. This paper presents two run-time...
The number of tasks executing on an MPSoC platform can exceed the available resources, requiring efficient run-time mapping strategies to meet the real-time constraints of the applications. This paper describes two new run-time mapping heuristics for mapping applications onto NoC-based Heterogeneous Multiprocessor Systems-on-Chip (MPSoC). The heuristi...
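The two heuristics themselves are not described in the truncated abstracts above, so the following is only a generic example of a run-time mapping heuristic for a NoC-based heterogeneous MPSoC: each task is placed on the nearest free processing element (PE) of the required type, measured in mesh hops from the PE running the task it communicates with. It assumes tasks arrive in an order where every parent precedes its children, and that root tasks are anchored at a hypothetical manager PE at (0, 0); all names and the mesh layout are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]  # (x, y) position of a processing element (PE) in the NoC mesh


def manhattan(a: Coord, b: Coord) -> int:
    """Hop count between two mesh nodes under minimal (e.g. XY) routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_tasks(tasks: List[Tuple[str, str, Optional[str]]],
              pes: Dict[Coord, str],
              root: Coord = (0, 0)) -> Dict[str, Coord]:
    """Greedy nearest-neighbour run-time mapping.

    tasks: (task, required PE type, communicating parent or None), listed so
    that every parent appears before its children.
    pes: mesh coordinate -> PE type.
    """
    free = set(pes)
    placement: Dict[str, Coord] = {}
    for name, pe_type, parent in tasks:
        candidates = [c for c in free if pes[c] == pe_type]
        if not candidates:
            raise RuntimeError(f"no free {pe_type} PE for task {name}")
        anchor = placement[parent] if parent is not None else root
        # closest free PE of the right type; ties broken by coordinate
        best = min(candidates, key=lambda c: (manhattan(c, anchor), c))
        placement[name] = best
        free.remove(best)
    return placement


if __name__ == "__main__":
    pes = {(x, y): "CPU" for x in range(3) for y in range(3)}
    pes[(1, 1)] = "DSP"
    pes[(2, 2)] = "DSP"
    tasks = [("t0", "CPU", None), ("t1", "DSP", "t0"), ("t2", "CPU", "t1")]
    print(map_tasks(tasks, pes))  # → {'t0': (0, 0), 't1': (1, 1), 't2': (0, 1)}
```

Keeping communicating tasks a few hops apart shortens NoC paths and reduces contention, and the greedy choice is cheap enough to run each time a new application arrives; the price is that early placements are never revisited, so later tasks may find only distant free PEs.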