February 2016
·
13 Reads
·
4 Citations
January 2016
·
24 Reads
December 2015
·
31 Reads
·
13 Citations
October 2015
·
15 Reads
·
3 Citations
September 2015
·
31 Reads
·
2 Citations
July 2015
·
65 Reads
·
8 Citations
Face recognition is now widely used in industries such as traffic, safety, and medical engineering. In this paper, we propose a power- and energy-efficient heterogeneous platform to accelerate face recognition applications. To achieve this efficiency, we propose a novel hybrid platform that consists of a Xilinx Zynq (ARM+FPGA) and an NVIDIA Jetson TK1 (ARM+GPU) coupled with a PCIe card. For this application, we optimized local binary pattern and eigenvalue based face detection and recognition, achieving a speedup of 69x over sequential execution on the ARM core, 4.8x over the Zynq platform (ARM+FPGA), and 3.2x over the NVIDIA platform (ARM+GPU), along with 40% better energy efficiency than sequential execution.
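The abstract above names local binary patterns (LBP) as one of the optimized kernels but does not give the variant used. As a minimal sketch, assuming the basic 3x3, 8-neighbor LBP operator on a grayscale image stored as a list of rows, the per-pixel code and its histogram could be computed as follows. The function names `lbp_codes` and `lbp_histogram` are hypothetical, and this is sequential reference code, not the authors' FPGA or GPU implementation.

```python
def lbp_codes(image):
    """Return the basic 3x3 LBP code for every interior pixel.

    `image` is a list of equal-length rows of grayscale intensities.
    Assumption: neighbors are visited clockwise from the top-left, and a
    neighbor sets its bit when it is >= the center pixel.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(image), len(image[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if image[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes):
    """Count how often each of the 256 possible LBP codes occurs."""
    histogram = [0] * 256
    for row in codes:
        for code in row:
            histogram[code] += 1
    return histogram
```

Each interior pixel is computed independently of the others, which is the property that makes this kind of kernel a natural candidate for FPGA or GPU offloading.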
May 2015
·
12 Reads
·
4 Citations
November 2014
·
46 Reads
·
15 Citations
The use of low-power symmetric multi-cores on FPGAs is becoming ubiquitous in embedded computing. This is due to the emergence of power and energy as key design metrics, as important as performance. It creates a need for powerful and reliable tools for power- and energy-based Design Space Exploration (DSE) at an early stage of the design flow. In this paper, we propose a simulation-based virtual platform power and energy estimation tool for heterogeneous Multiprocessor System-on-Chip (MPSoC) based platforms. The tool development involves two steps. The first step is power model generation: we used functional parameters to set up generic power models for different parts of the system. This is a one-time activity. In the second step, a simulation-based virtual platform framework is developed to accurately capture the activities used in the power models generated in the first step. The combination of the two steps leads to a hybrid power estimation, which gives a better trade-off between accuracy and speed. The proposed tool is automated and scales to complex embedded multi-core architectures. The efficiency of the proposed tool is validated on multi-core processors designed around FPGAs and extended to accommodate future multi-processor/multi-core designs for reliable energy-based DSE. Compared to real board measurements, the power/energy estimates show an error of less than 4% for a single-core processor, 8% for a dual-core processor, and 9% for heterogeneous MPSoC-based systems.
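The two-step flow described above (a one-time power model, then activities extracted from simulation) can be illustrated with a small sketch. It assumes a linear functional-level model in which power is a static term plus a weighted sum of activity values. The coefficient names, their values, and the helper names below are hypothetical placeholders, not figures or APIs from the paper.

```python
def estimate_power_mw(coefficients, activities, static_mw):
    """Step 1 output applied to step 2 input.

    `coefficients` maps an activity name to mW per unit of that activity
    (fitted once, offline). `activities` maps the same names to values
    observed during simulation. Assumption: the model is linear.
    """
    dynamic_mw = sum(coefficients[name] * activities[name]
                     for name in coefficients)
    return static_mw + dynamic_mw


def estimate_energy_mj(power_mw, exec_time_s):
    """Energy in millijoules: mW multiplied by seconds."""
    return power_mw * exec_time_s


def relative_error_percent(estimated, measured):
    """Estimation error relative to a board measurement, in percent."""
    return abs(estimated - measured) / measured * 100.0


# Hypothetical model and simulated activities for one core.
model = {"ipc": 100.0, "cache_miss_rate": 50.0}
observed = {"ipc": 0.8, "cache_miss_rate": 0.1}
power = estimate_power_mw(model, observed, static_mw=200.0)
energy = estimate_energy_mj(power, exec_time_s=2.0)
```

Keeping the fitted coefficients separate from the simulated activities is what lets the model be built once and reused across many simulated configurations during design space exploration.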
November 2014
·
32 Reads
·
8 Citations
This paper proposes DESSERT (DESign Space ExploRation Tool at System-Level), a novel simulation-based tool for heterogeneous multi-core processor based platforms. The tool supports power/energy estimation, comprehensive architectural exploration, and optimization of given embedded applications for multi-core processor architectures. The development of DESSERT consists of three steps. First, we developed generic functional-level power models for different parts of the multi-core system to estimate power/energy, and integrated them into the system-level simulation environment. Second, we built a SystemC-based virtual platform prototype of the processor architecture to accurately extract the functional activities needed by the power model. Third, we designed a runtime task-dependency management and optimization technique (workload or dynamic slack reclamation) based on programming models that support both the OpenMP and Pthread APIs for multi-core execution, to consider both data-level and thread-level parallelism. The combination of the above three steps leads to a novel Design Space Exploration (DSE) methodology. Power and energy estimates are validated against real board measurements. DESSERT's power/energy estimates show an error of less than 5% and offer reliable power/energy-based DSE for the given applications.
October 2014
·
31 Reads
·
4 Citations
Due to the growing computational requirements of mobile applications, using a heterogeneous Multiprocessor System-on-Chip has become an incontrovertible solution to meet service requirements. Today, Electronic System-Level design is considered vital for exploring design trade-offs for such devices in the early stage of the design flow. This paper proposes a novel system-level power/energy estimation methodology and optimization techniques for heterogeneous CPU-GPU based platforms. The methodology has two parts. First, we developed the power models by using functional parameters to set up generic power models for different parts of the platform. Second, we designed a simulation-based system-level prototype using SystemC (JIT) and cycle-accurate simulators to accurately evaluate the activities used in the power models. The combination of the two parts leads to a novel system-level power estimation methodology, which gives a good trade-off between accuracy and speed. Moreover, leveraging our methodology, we introduce novel power optimization techniques such as inter-task DVFS and workload balancing at the system level for CPU-GPU platforms. The efficiency of our proposed methodology and optimization techniques is validated on a CARMA kit, which consists of an ARM quad-core processor and an NVIDIA GPU (96 cores). Estimated power and energy values are compared to real board measurements. The power/energy estimates show an error of less than 2.5% for a single-core processor, 4% for a dual-core processor, 4% for a quad-core processor, 4% for the GPU, and 6% for multi-processor based systems. By using the proposed optimization techniques, we achieved significant power and energy savings of up to 45% and 70%, respectively, for various industrial benchmarks.
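The abstract mentions inter-task DVFS without detailing the policy. The following is a simplified sketch of the underlying arithmetic only, assuming the standard CMOS dynamic power relation P = C * V^2 * f and ignoring static power. Under that assumption, a task of a fixed cycle count consumes E = C * V^2 * cycles, so the lowest-voltage operating point that still meets the task's deadline minimizes dynamic energy. The operating points, capacitance value, and function names are hypothetical.

```python
def dynamic_energy_j(cycles, voltage_v, switched_capacitance_f=1e-9):
    """Dynamic energy of one task.

    E = P * t = (C * V^2 * f) * (cycles / f) = C * V^2 * cycles,
    so frequency cancels and only voltage and cycle count remain.
    """
    return switched_capacitance_f * voltage_v ** 2 * cycles


def pick_operating_point(cycles, deadline_s, operating_points):
    """Choose the (freq_hz, voltage_v) pair with the lowest dynamic
    energy among those that finish `cycles` within `deadline_s`.

    Returns (point, energy_j), or (None, None) if no point is feasible.
    """
    best_point, best_energy = None, None
    for freq_hz, voltage_v in operating_points:
        if cycles / freq_hz > deadline_s:
            continue  # this point would miss the deadline
        energy = dynamic_energy_j(cycles, voltage_v)
        if best_energy is None or energy < best_energy:
            best_point, best_energy = (freq_hz, voltage_v), energy
    return best_point, best_energy


# Hypothetical operating points: (frequency in Hz, supply voltage in V).
points = [(1.0e9, 1.0), (0.5e9, 0.8)]
relaxed = pick_operating_point(1.0e9, deadline_s=2.5, operating_points=points)
tight = pick_operating_point(1.0e9, deadline_s=1.5, operating_points=points)
```

With the relaxed deadline the slower, lower-voltage point is selected, and with the tight deadline only the faster point is feasible. A real policy would also account for static power and transition overheads, which this sketch leaves out.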
... Despite the aging Kepler architecture, previous research has demonstrated the potential of the TK1 for modern space applications. For example, the TK1 has been used for tasks such as 3D scanning [22], aircraft detection [23], autonomous robotics [24], collision avoidance [25], encryption speedup [26], fully convolutional networks [27], object detection [28,29], image processing [30], Synthetic Aperture Radar (SAR) imaging [31], and target tracking [32,33]. ...
May 2015
... Processors, SoCs [53,54] "bottom-up" low-level simulation is complexity, rendering this technique (i.e. simulating dynamics at register-transfer or gate level) unsuited to real-time estimation of power consumption. ...
October 2015
... It should be noted that several power and energy estimation algorithms and tools are available for CPUs, GPUs, and CPU+GPU systems, such as FAcET [30], which consists of power models that depict the functional activities of various system components as well as a simulation-based component for evaluating the parameters of the models. This allows easy application to various compute devices, such as the ARM multi-core CPUs and NVIDIA GPUs shown. ...
September 2015
... Challenges include managing data transfers and maximizing parallelism. Future research could optimize workload distribution and explore new high-performance applications for trigeneous platforms [7]. ...
December 2015
... There are two approaches to undervolting studies: i) simulation-based studies [89,108,127,132], or ii) direct implementation or testing on real hardware fabrics, mainly performed on CPUs, GPUs, ASICs, and DRAMs [9,18,50,78,81,138]. The simulation-based approach requires less engineering effort. ...
February 2016
... There have been many studies on scheduling schemes to improve energy efficiency and execution speed when executing specific tasks in consideration of task partition in the FPGA-GPU hybrid system [24], [25]. In particular, Rethinagiri et al. [25] have applied task splitting to face recognition applications based on LBHP algorithms and found a 40% increase in energy efficiency compared to traditional GPUs. ...
July 2015
... It also considers the Spatio-temporal correlation between the different input patterns. A hybrid system-level model for MPSoC is presented in [12]. ...
March 2013
... The Virtual Platform Power and Energy Estimation Tool (VPPET) [22,33] uses an OVP virtual platform with an attached power and energy monitor. This approach is based on measuring the power consumption on the real platform and relating it to performance counters (such as instructions per cycle and cache miss rate) to compute the overall power consumption. ...
November 2014
... A popular scheme for ICV central control is CPU+GPU [63], represented by NVIDIA Drive PX 2, which is announced to be able to support high-level automated driving. Intel applies another hardware approach for AI, which is the CPU+FPGA scheme. ...
October 2014
... DSE has already been addressed by a large set of studies. Earlier works introduced accurate tools based on simulation techniques, e.g., [2], [3], [4], [5]. However, with the advent of the many-core era and the explosion of the design space size, the simulation-based approach is no longer an efficient solution. ...
November 2014