In contexts such as embedded and cyber-physical systems, the design of a desired functionality under constraints increasingly requires a parallel execution of different tasks on heterogeneous architectures. The nature of such parallel systems implies a huge complexity in understanding and predicting performance in terms of response time. Indeed, response time depends on many factors associated with the characteristics of both the functionality and the target architecture. State-of-the art strategies derive response time by examining the operations required by each task for both processing and accessing shared resources. This procedure is often followed by the addition or elimination of potential interferences due to task concurrency. However, such approaches require an advanced knowledge of the software and hardware details, rarely available in practice. This thesis provides an alternative "topdown" strategy aimed at extending the cases in which hardware and software response times can be analyzed and predicted. The proposed strategy leverages on dataflow-based application representations and focuses on the response time estimation of reconfigurable applications mapped on both general-purpose and specialized processing elements.
In the last few years, besides the concepts of embedded
and interconnected systems, also the notion of Cyber-
Physical Systems (CPS) has emerged: embedded computational
collaborating devices, capable of sensing and controlling physical
elements and, often, responding to humans. The continuous interaction
between physical and computing layers makes their design
and maintenance extremely complex. Uncertainty management
and runtime reconfigurability, to mention the most relevant ones,
are rarely tackled by available toolchains.
In this context, the Cross-layer modEl-based fRamework for
multi-oBjective dEsign of Reconfigurable systems in unceRtain
hybRid envirOnments (CERBERO) EU project aims at developing
a design environment for CPS based of two pillars:
1) a cross-layer model-based approach to describe, optimize,
and analyze the system and all its different views concurrently
and 2) an advanced adaptivity support based on a multi-layer
autonomous engine. In this work, we describe the components
and the required developments for seamless design of reusable
and reconfigurable CPS and System of Systems in uncertain
Demand for low-power data processing hardware continues to rise inexorably. Existing programmable and "general purpose" solutions (eg. SIMD, GPGPUs) are insufficient, as evidenced by the order-of-magnitude improvements and industry adoption of application and domain-specific accelerators in important areas like machine learning, computer vision and big data. The stark tradeoffs between efficiency and generality at these two extremes poses a difficult question: how could domain-specific hardware efficiency be achieved without domain-specific hardware solutions?
In this work, we rely on the insight that "acceleratable" algorithms have broad common properties: high computational intensity with long phases, simple control patterns and dependences, and simple streaming memory access and reuse patterns. We define a general architecture (a hardware-software interface) which can more efficiently expresses program with these properties called stream-dataflow. The dataflow component of this architecture enables high concurrency, and the stream component enables communication and coordination at very-low power and area overhead. This paper explores the hardware and software implications, describes its detailed microarchitecture, and evaluates an implementation. Compared to a state-of-the-art domain specific accelerator (DianNao), and fixed-function accelerators for MachSuite, Softbrain can match their performance with only 2x power overhead on average.
Applicable in different fields and markets, low energy High Efficiency Video Coding (HEVC) codecs and their constituting elements have been heavily studied. Fractional pixel interpolation is one of its most costly blocks. In this paper, a Field Programmable Gate Array (FPGA) implementation of HEVC fractional pixel interpolation, outperforming literature solutions, is proposed. Approximate computing, in conjunction with hardware reconfiguration, guarantees a tunable interpolation system offering an energy vs. quality trade-off to further reduce energy.
Combining processors with hardware accelerators has become a norm with systems-on-chip (SoCs) ever present in modern compute devices. Heterogeneous programmable system on chip platforms sometimes referred to as hybrid FPGAs, tightly couple general purpose processors with high performance reconfigurable fabrics, providing a more flexible alternative. We can now think of a software application with hardware accelerated portions that are reconfigured at runtime. While such ideas have been explored in the past, modern hybrid FPGAs are the first commercial platforms to enable this move to a more software oriented view, where reconfiguration enables hardware resources to be shared by multiple tasks in a bigger application. However, while the rapidly increasing logic density and more capable hard resources found in modern hybrid FPGA devices should make them widely deployable, they remain constrained within specialist application domains. This is due to both design productivity issues and a lack of suitable hardware abstraction to eliminate the need for working with platform-specific details, as server and desktop virtualization has done in a more general sense. To allow mainstream adoption of FPGA based accelerators in general purpose computing, there is a need to virtualize FPGAs and make them more accessible to application developers who are accustomed to software API abstractions and fast development cycles. In this paper, we discuss the role of overlay architectures in enabling general purpose FPGA application acceleration.