
Wibheda: Framework for Data Dependency-Aware Multi-Constrained Hardware-Software Partitioning in FPGA-Based SoCs for IoT Devices

Authors:
Deshya Wijesundera, Alok Prakash, Thilina Perera, Kalindu Herath and Thambipillai Srikanthan
Nanyang Technological University, Singapore 639798
Email: {deshyase001,pere0004,kalindub001}@e.ntu.edu.sg, {alok,astsrikan}@ntu.edu.sg
Abstract—The increasing popularity of FPGA-based system-
on-chip (SoC) devices for Internet of Things (IoT) applications
calls for hardware-software partitioning solutions optimized for
performance under stringent area and power constraints. In
this work, we propose Wibheda, a heuristic-based framework
for data dependency-aware multi-constrained hardware-software
partitioning at fine granularity that can be employed to partition
designs for FPGA-based SoCs used in IoT. Evaluated on 6
applications from the popular CHStone benchmark suite, Wibheda
finds solutions with 98.7% accuracy within several milliseconds,
compared to several minutes for an existing state-of-the-art work
and hours for an exhaustive approach.
I. INTRODUCTION
Modern FPGAs are not only suited for accelerating critical
parts of an application, but also for realizing an entire System-
on-Chip (SoC), constituting processors, programmable logic,
memory subsystems, etc. However, efficient partitioning of the
application between the processor and programmable logic
is crucial to exploit the benefits offered by both worlds.
Partitioning decisions must typically be made early in the
design of a product. Finding the optimally partitioned
solution, however, is an NP-hard problem [1], which has encouraged
many researchers to explore heuristic-based approaches.
The selection of granularity at which to partition an ap-
plication poses another challenge in the partitioning process.
A coarse-grained approach implements large sections of code
in hardware, of which only a small fraction may actually execute
frequently enough to provide meaningful acceleration in execution
time, reducing the return on the investment of hardware area.
A fine-grained approach provides much greater control over the
accelerated code segments, resulting in higher acceleration in
execution time for the same hardware area, but leads to a much
more complex design space exploration. It also incurs a higher
data communication cost. Thus, accurate modelling of data
communication costs is important in a fine-grained approach.
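As a rough illustration of why communication cost matters at fine granularity, the sketch below computes the net gain of moving a single basic block to hardware. The cost model (cycles saved minus transfer cycles), the `Block` fields and all numbers are assumptions made for this example; they are not the model used in Wibheda.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical profile of one basic block (all fields are illustrative)."""
    sw_cycles: int    # cycles per execution on the processor
    hw_cycles: int    # cycles per execution in programmable logic
    exec_count: int   # number of times the block executes
    words_moved: int  # data words exchanged with software per execution


def net_gain(block: Block, cycles_per_word: int) -> int:
    """Cycles saved by hardware execution after paying for data transfers."""
    saved = (block.sw_cycles - block.hw_cycles) * block.exec_count
    comm = block.words_moved * cycles_per_word * block.exec_count
    return saved - comm


hot_block = Block(sw_cycles=40, hw_cycles=10, exec_count=1000, words_moved=4)
print(net_gain(hot_block, cycles_per_word=2))   # cheap transfers: 22000
print(net_gain(hot_block, cycles_per_word=10))  # costly transfers: -10000
```

Under these assumed numbers, the same block is profitable with cheap transfers and a net loss with costly ones, which is why an inaccurate communication estimate can flip a partitioning decision.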
At the same time, the increasing popularity of Internet
of Things (IoT) devices necessitates designs with extremely
tight constraints in terms of size (area), power consumption,
costs, etc. Thus, reconfigurable solutions provide a favorable
design platform for such devices. This has resulted in FPGA
vendors offering FPGAs specifically targeted for IoT [2] [3].
Such systems can benefit immensely from intelligent fine-grained
acceleration to improve performance in highly resource-constrained
environments. This is also evident from
Xilinx’s initiative for industrial IoT solutions focusing on
software programmability and hardware acceleration [4].
II. PROPOSED FRAMEWORK
We propose Wibheda, a framework for rapid data dependency-,
area- and power constraint-aware HW-SW partitioning at a
fine-grained (basic block) level that can be applied
to applications of varying size and complexity. The main
contributions of this work are a methodology for analysis of the
data communication cost between basic blocks and memory
components, and a scalable heuristic formulation that selects
the most profitable HW-SW partitioning considering (i) the data
communication cost of basic blocks and memory components,
(ii) area constraints in terms of look-up tables (LUTs), digital
signal processing (DSP) blocks and flip-flops (FFs), and (iii)
power constraints.
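The text above does not spell out the heuristic itself, so the following is only a minimal sketch of what a multi-constrained selection of this kind could look like. It uses a generic greedy ranking by gain per unit of normalized resource use, which is a stand-in technique and not Wibheda's formulation. The `Candidate` fields, the budget values and the block names are hypothetical, and each candidate's `gain` is assumed to already have its communication cost subtracted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical hardware candidate for one basic block."""
    name: str
    gain: float      # net cycles saved, communication cost already subtracted
    lut: int
    ff: int
    dsp: int
    power_mw: float


def greedy_partition(candidates, budget):
    """Greedily move blocks to hardware while all budgets are respected.

    `budget` maps 'lut', 'ff', 'dsp' and 'power_mw' to positive limits.
    """
    def cost(c):
        # Sum of each resource expressed as a fraction of its budget.
        return (c.lut / budget["lut"] + c.ff / budget["ff"]
                + c.dsp / budget["dsp"] + c.power_mw / budget["power_mw"])

    used = {"lut": 0, "ff": 0, "dsp": 0, "power_mw": 0.0}
    chosen = []
    profitable = [c for c in candidates if c.gain > 0]
    ranked = sorted(profitable, key=lambda c: c.gain / max(cost(c), 1e-9),
                    reverse=True)
    for c in ranked:
        need = {"lut": c.lut, "ff": c.ff, "dsp": c.dsp, "power_mw": c.power_mw}
        # Accept the block only if every constraint still holds.
        if all(used[k] + need[k] <= budget[k] for k in budget):
            for k in budget:
                used[k] += need[k]
            chosen.append(c.name)
    return chosen, used


budget = {"lut": 1000, "ff": 2000, "dsp": 4, "power_mw": 50.0}
candidates = [
    Candidate("bb1", 5000, 600, 800, 2, 20.0),
    Candidate("bb2", 3000, 300, 400, 1, 10.0),
    Candidate("bb3", 4000, 500, 500, 2, 25.0),
    Candidate("bb4", -200, 100, 100, 0, 5.0),  # net loss, never selected
]
chosen, used = greedy_partition(candidates, budget)
print(chosen, used)
```

With these made-up numbers, `bb2` and `bb1` are selected and `bb3` is rejected because it would exceed the LUT budget. A single greedy pass like this can miss better combinations, which is one reason practical heuristics add refinement steps.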
III. RESULTS AND DISCUSSION
The runtime of Wibheda is in the order of milliseconds,
while that of the state-of-the-art (SoA) [5] is in the order of
minutes. Across 6 applications from the CHStone
benchmark suite, Wibheda shows an average estimation error
of only 1.3%, in comparison to the SoA work, which has an
error of 16.5%. We also used 3 different sets of (LUT, FF, DSP and
power) constraints, each representing a recent FPGA device
targeted for IoT applications [2], [3], to validate Wibheda in
a system-level design. The average difference in performance
for the 6 applications across the 3 experiments is only 0.27%.
ACKNOWLEDGMENT
This research project is partially funded by the National
Research Foundation Singapore under its Campus for Re-
search Excellence and Technological Enterprise (CREATE)
programme with the Technical University of Munich at TUM-
CREATE.
REFERENCES
[1] P. Arato et al., “Hardware-software partitioning in embedded system
design,” in ISISP ’03, 2003.
[2] Lattice Semiconductor, “iCE40 Ultra/UltraLite/UltraPlus - Lattice
Semiconductor,” http://bit.ly/2n9SEhR, 2017.
[3] A. Shilov, “Intel Announces Cyclone 10 FPGAs for IoT Devices,”
http://bit.ly/2m30SUX, 2017.
[4] Xilinx, “Industrial IoT Solutions Powered by Xilinx,”
http://bit.ly/2Dj04Dl, 2018.
[5] A. Prakash et al., “Rapid Memory-Aware Selection of Hardware Accel-
erators in Programmable SoC Design,” TVLSI, 2017.