Conference Paper

RTL-to-Layout Implementation of an Embedded Coarse Grained Architecture for Dynamically Reconfigurable Computing in Systems-on-Chip

ST Microelectron., Agrate Brianza, Italy
DOI: 10.1109/SOCC.2009.5335665 Conference: System-on-Chip, 2009. SOC 2009. International Symposium on
Source: IEEE Xplore


This paper describes the RTL-to-layout implementation of the PACT XPP-III coarse-grained reconfigurable architecture (CGRA). The implementation was strictly based on a hierarchical approach in order to exploit performance optimization at all levels, guarantee maximum scalability, and provide a portfolio of IP blocks that could be reused to build different configurations and embodiments of the same CGRA template. The final result can be seamlessly introduced into any SoC design flow as an embedded accelerator. It is designed in STMicroelectronics 90 nm GP technology, occupies 42.5 mm², delivers 13 16-bit GOPS (0.8 GOPS/mW, 10 MOPS/mW), and has a measured maximum frequency of 150 MHz, with a measured dynamic power of 13 mW/MHz and a static power of 93 mW. A silicon prototype was also produced, embedding XPP-III in a complex system-on-chip that includes an ARM processor as system controller as well as different ASIC blocks.

  • ABSTRACT: This paper describes a System on Chip implementation of a reconfigurable digital signal processor. The device is suitable for executing a wide range of applications, exploiting a balanced mix of heterogeneous reconfigurable fabrics merged together by a flexible and efficient communication infrastructure based on a 64-bit Network on Chip. The SoC combines a fine-grain embedded FPGA, a mid-grain configurable processor, and a coarse-grain reconfigurable array. An ARM processor featuring a resident operating system is the SoC supervisor, managing communication, synchronization, and reconfiguration mechanisms. This computational model enables the programmer to manage the high-level synchronization and global data of complex signal processing applications through the ARM processor, while allocating the most critical computational kernels to the most suitable reconfigurable engines. The SoC has been fabricated in 90-nm technology, with a die area of 110 mm²; it integrates 97 million transistors and has a peak power consumption of 2.5 W. To demonstrate the proposed computational model and the capabilities of the reconfigurable signal processor in a real test case, a video surveillance motion detection application was implemented on the SoC. When running this application, the device proved able to deliver 120 GOPS while dissipating 1.45 W.
    IEEE Journal of Solid-State Circuits 09/2010; 45(8):1615-1626. DOI:10.1109/JSSC.2010.2048149
  • ABSTRACT: Integration of system components is a crucial challenge in the design of embedded real-time systems, as complex non-functional interdependencies may exist. [20] presented a framework enabling autonomous verification of timing properties in the system itself. The work presented in this paper takes that approach one step further, enabling autonomous assignment of execution priorities under timing constraints. We present a distributed heuristic algorithm for the constraint satisfaction problem (CSP) of finding feasible priority assignments in static priority preemptive (SPP) scheduled hard real-time systems. The proposed heuristic considers end-to-end path latency constraints in arbitrary task graphs mapped onto arbitrary platform graphs.
  • ABSTRACT: Predicting timing behaviour is essential for the design of embedded real-time systems that can switch between different operational modes at runtime. The settling time of a mode change, called the mode change transition latency, is an important system parameter. Known approaches to timing analysis for multi-mode real-time systems are restricted to applications without communicating tasks. They also assume that transitions are initiated only during a steady state, without indicating when a system is executing in a steady state. In this paper, we present an analysis algorithm that gives a maximum bound on each mode change transition latency of multi-mode distributed applications, thereby overcoming the limitations of previous work. We explain the algorithm, prove its correctness, illustrate the steps, and provide experimental data that show its usefulness.
    01/2011; DOI:10.1109/ETFA.2011.6059009