ABSTRACT: Next generation real-time applications demand big-data infrastructures to process huge and continuous data volumes under complex computational constraints. This type of application raises new issues for current big-data processing infrastructures. The first issue to be considered is that most current infrastructures for big-data processing were defined for general purpose applications. Thus, they set aside real-time performance, which is in some cases an implicit requirement. A second important limitation is the lack of clear computational models that could be supported by current big-data frameworks. In an effort to reduce this gap, this article contributes along several lines. First, it provides a set of improvements to a computational model called distributed stream processing in order to formalize it as a real-time infrastructure. Second, it proposes some extensions to Storm, one of the most popular stream processors. These extensions are designed to gain extra control over the resources used by the application in order to improve its predictability. Lastly, the article presents some empirical evidence on the performance that can be expected from this type of infrastructure. Index Terms: real-time, distributed stream processing, predictable infrastructure
Article · Jan 2016 · Future Generation Computer Systems
ABSTRACT: The JUNIPER project is developing a framework for the construction of large-scale distributed systems in which execution time bounds can be guaranteed. Part of this work involves the automatic implementation of input Java code on FPGAs, both for speed and predictability. An important focus of this work is to make the use of FPGAs transparent through runtime co-design and partial reconfiguration. Initial results show that the use of Java does not hamper hardware generation, and provides tight execution time estimates. This paper gives an overview of the approach taken, and presents some preliminary results that demonstrate the promise of the technique.
ABSTRACT: Real-time systems need time-predictable platforms to allow static analysis of the worst-case execution time (WCET). Standard multi-core processors are optimized for the average case and are hardly analyzable. Within the T-CREST project we propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time. The resulting time-predictable resources (processors, interconnect, memory arbiter, and memory controller) and tools (compiler, WCET analysis) are designed to ease WCET analysis and to optimize WCET performance. Compared to other processors the WCET performance is outstanding.
Article · Apr 2015 · Journal of Systems Architecture
ABSTRACT: Manufacturing variability is an increasingly significant problem. Silicon devices that are designed to be identical will display widely ranging characteristics after manufacture. Power use, supported clock frequencies and lifespan may all vary considerably. This is of particular concern for embedded systems because of their extensive use of complex system-on-chip (SoC)-based architectures. If this variability is not tolerated by the software, then manufacturing yields are reduced and devices are not used efficiently. This study discusses a novel approach to the integration of variability-mitigation techniques that uses model-driven engineering to explicitly consider variability as part of the development process. Developers can build systems that are much more resilient to variability effects, allowing systems to have higher yields, lower costs and greater reliability. The approach uses code generation and code transformation to simplify design-space exploration and reduce time-to-market. The approach is illustrated with an example of audio processing on a complex multiprocessor SoC with simulated variability, and it is shown to be increasingly effective as system variability becomes more significant.
Article · Jan 2015 · IET Computers & Digital Techniques
ABSTRACT: Existing programming models for distributed and cloud-based systems tend to abstract away from the architectures of individual target nodes, concentrating instead on higher-level issues of algorithm representation (MapReduce etc.). However, as programmers begin to tackle the issue of Big Data, increasing data volumes are forcing developers to reconsider this approach and to optimise their software heavily. JUNIPER is an EU-funded project which assists Big Data developers to create architecture-aware software in a way that is suitable for the target domain, and provides higher performance, portability, and real-time guarantees.
ABSTRACT: As modern embedded systems become increasingly complex, they also become susceptible to manufacturing variability. Variability causes otherwise identical hardware elements to exhibit large differences in dynamic and static power usage, maximum clock frequency, thermal resilience, and lifespan. There are currently no standard ways of handling this variability from the software developer's point of view, forcing the hardware vendor to discard devices that fall below a certain threshold. This chapter first presents a review of existing state-of-the-art techniques for mitigating the effects of variability. It then presents the toolflow developed as part of the ToucHMore project, which aims to build variability-awareness into the entire design process. In this approach, the platform is modelled in SysML, along with the expected variability and the monitoring and mitigation capabilities that the hardware presents. This information is used to automatically generate a customised variability-aware runtime, which is used by the programmer to perform operations such as offloading computation to another processing element, parallelising operations, and altering the energy use of operations (using voltage scaling, power gating, etc.). The variability-aware runtime adapts its behaviour according to modelled static manufacturing variability and measured dynamic variability (such as battery power, temperature, and hardware degradation). This is done by moving computation to different parts of the system, spreading computation load more efficiently, and by making use of the modelled capabilities of the system.
ABSTRACT: This chapter presents the EU-funded MADES FP7 project that aims to develop an effective model-driven methodology to improve the current practices in the development of real-time embedded systems for avionics and surveillance industries. MADES developed an effective SysML/MARTE language subset, and a set of new tools and technologies that support high-level design specifications, validation, simulation, and automatic code generation, while integrating aspects such as component re-use. This chapter illustrates the MADES methodology by means of a car collision avoidance system case study; it presents the underlying MADES language, the design phases, and the set of tools supporting on one hand model verification and validation and, on the other hand, automatic code generation, which enables the implementation on execution platforms such as state-of-the-art FPGAs.
ABSTRACT: The architectures of modern embedded systems tend to be highly application-specific, containing features such as heterogeneous multicore processors, non-uniform memory architectures, custom function accelerators and on-chip networks. Furthermore, these systems are resource-constrained and are often deployed as part of safety-related systems. This necessitates certification and the use of designs that meet stringent non-functional requirements (such as timing or power). This chapter focusses upon new tools for the generation of software and hardware for modern embedded systems implemented using Java. The approach promotes rapid deployment and design space exploration, and is integrated into a fully model-driven toolflow that supports existing industrial practices. The presented approach allows the automatic deployment of architecture-neutral Java code over complex embedded architectures, with minimal overheads and a run-time support that is amenable to real-time analysis.
ABSTRACT: FPGAs enable NoC architecture experimentation, although to be effective they need to be supported by tools and frameworks for construction of the NoC and effective software programming of the NoC. In this paper, we focus upon effective programming of the NoC using Java, complementing previous work which proposes the Blueshell framework for NoC generation for FPGAs. The approach taken is called Network-Chi, providing a number of key extensions to the Chi Java compiler. This includes provision of a networking API within Java giving a mesh based abstraction for network communication, allowing the programmer to send Java objects to other nodes without consideration for the underlying hardware topology or protocols; and a region-based memory management API that enables the definition of transient allocation contexts that discard all objects allocated within them when they reach the end of execution. Results show the approach taken to be efficient and effective.
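The transient allocation contexts mentioned in the abstract above can be illustrated with a minimal Java sketch. The names used here (`RegionSketch`, `Region`, `allocate`, `liveObjects`) are hypothetical and are not the Network-Chi API, which the abstract does not specify; the sketch only models the idea that everything allocated within a context is discarded together when the context ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative model only; all names are assumptions, not the Network-Chi API. */
class RegionSketch {

    /** A transient allocation context that owns every object created through it. */
    static final class Region implements AutoCloseable {
        // Objects allocated inside this region; released together on close().
        private final List<Object> allocated = new ArrayList<>();
        private boolean closed = false;

        /** Creates an object via the factory and records it as owned by this region. */
        <T> T allocate(Supplier<T> factory) {
            if (closed) {
                throw new IllegalStateException("region already closed");
            }
            T obj = factory.get();
            allocated.add(obj);
            return obj;
        }

        /** Number of objects currently owned by the region. */
        int liveObjects() {
            return allocated.size();
        }

        /** Ends the context: every reference held by the region is dropped at once. */
        @Override
        public void close() {
            allocated.clear();
            closed = true;
        }
    }

    public static void main(String[] args) {
        // try-with-resources marks the end of execution of the context.
        try (Region region = new Region()) {
            int[] buffer = region.allocate(() -> new int[16]);
            StringBuilder text = region.allocate(StringBuilder::new);
            buffer[0] = 1;
            text.append("temporary");
            System.out.println("live objects inside region: " + region.liveObjects());
        }
        // After the block, the region holds no references to the objects it created.
    }
}
```

In a standard JVM the garbage collector still decides when memory is reclaimed; a region-based runtime of the kind the abstract describes would instead free the whole context in one step, which is what makes the cost of deallocation predictable.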
ABSTRACT: We present a multitasking scratchpad memory reuse scheme (MSRS) for the dynamic partitioning of scratchpad memory between tasks in a preemptive multitasking system. We specify a means to compute the worst-case response time (WCRT) and schedulability of task sets executed using MSRS. Our scratchpad-related preemption delay (SRPD) is an analog of cache-related preemption delay (CRPD), proposed in previous work as a way to compute the worst-case cost imposed upon a preempted task by preemption in a multitasking system. Unlike CRPD, however, SRPD is independent of the number of tasks and the local memory size. We compare SRPD with CRPD by experiment and determine that neither dominates the other, i.e. either may be better for certain task sets. However, MSRS leads to improved schedulability versus cache when contention for local memory space is high, either because the local memory size is small, or because the task set is large, provided that the cost of loading blocks from external memory to scratchpad is similar to the cost of loading blocks into cache.
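As a rough illustration of how a preemption delay enters a WCRT calculation, the following Java sketch implements the textbook fixed-priority response-time recurrence with a per-preemption delay term added to each higher-priority release. This is a generic analysis written under stated assumptions (periodic tasks, deadline equal to period, one delay value per preempting task); it is not the MSRS analysis from the paper, whose details the abstract does not give. All names (`ResponseTimeSketch`, `responseTime`, `delay`) are hypothetical.

```java
/** Generic response-time analysis with a preemption delay term; not the paper's method. */
class ResponseTimeSketch {

    /**
     * Tasks are indexed by priority, 0 being the highest.
     *
     * @param c     worst-case execution time of each task
     * @param t     period of each task (assumed equal to its deadline, must be positive)
     * @param delay assumed delay charged for each release of higher-priority task j
     * @param i     index of the task under analysis
     * @return the worst-case response time of task i, or -1 if it exceeds the period
     */
    static long responseTime(long[] c, long[] t, long[] delay, int i) {
        long r = c[i];
        while (true) {
            long next = c[i];
            for (int j = 0; j < i; j++) {
                long releases = (r + t[j] - 1) / t[j]; // ceil(r / t[j])
                next += releases * (c[j] + delay[j]);
            }
            if (next > t[i]) {
                return -1; // deadline missed: task set not schedulable at this level
            }
            if (next == r) {
                return r; // fixed point reached
            }
            r = next;
        }
    }

    public static void main(String[] args) {
        long[] c = {1, 2, 3};
        long[] t = {4, 8, 16};
        long[] noDelay = {0, 0, 0};
        long[] withDelay = {1, 1, 0};
        for (int i = 0; i < c.length; i++) {
            System.out.println("task " + i
                    + " without delay: " + responseTime(c, t, noDelay, i)
                    + ", with delay: " + responseTime(c, t, withDelay, i));
        }
    }
}
```

The example task set shows why the size of the delay term matters: the lowest-priority task meets its deadline when preemption is free but misses it once each preemption is charged one extra time unit.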
ABSTRACT: The previous chapter discussed how the Java Virtual Machine could benefit from some of its components being implemented in hardware. This chapter continues this hardware-support theme by considering how Java applications can interface to more general hardware coprocessors.
ABSTRACT: Scratchpad memory (SPM) provides a predictable and energy efficient way to store program instructions and data. It would be ideal for embedded real-time systems if not for the practical difficulty that most programs have to be modified in source or binary form in order to use it effectively. This modification process is called partitioning, and it splits a large program into sub-units called regions that are small enough to be stored in SPM. Earlier papers on this subject have only considered regions formed around program structures, such as loops, methods and even entire tasks. Region formation and SPM allocation are performed in two separate steps. This is an approximation that does not make best use of SPM. In this paper, we propose a k-partitioning algorithm as a new way to solve the problem. This allows us to carry out region formation and SPM allocation simultaneously. We can generate optimal partitions for programs expressed either as call trees or by a restricted form of control-flow graph (CFG). We show that this approach obtains superior results to the previous two-step approach. We apply our algorithm to various programs and SPM sizes and show that it reduces the execution time cost for executing those programs relative to execution with cache.
ABSTRACT: This paper discusses a strategy for translating the Java programming language to a form that is suitable for execution on resource limited embedded systems such as softcore processors in FPGAs, Network-on-Chip nodes and microcontrollers. The translation strategy prioritises the minimisation of runtime memory usage, generated code size, and suitability for a wide range of small architectures over other desirable goals such as execution speed and strict adherence to the Java standard. The translation procedure, or Concrete Hardware Implementation of a software application, first converts the application's compiled Java class files to a self-contained intermediate representation conducive to optimisation and refactoring. The intermediate format is then serialised into a programming language compilable to the target architecture. This paper presents techniques for analysing whole Java applications, translating Java methods and building a stand-alone translated application with the same functional behaviour as the original Java. An example C-code generator is described and evaluated against similar previous approaches. An existing benchmark application, JavaBenchEmbedded, is demonstrated to require less than 30KiB of program code and 16KiB of runtime heap memory when executing on a Xilinx MicroBlaze Processor.
ABSTRACT: This paper proposes Carousel, a mechanism to manage local memory space, i.e. cache or scratch pad memory (SPM), such that inter-task interference is completely eliminated. The cost of saving and restoring the local memory state across context switches is explicitly handled by the preempting task, rather than being imposed implicitly on preempted tasks. Unlike earlier attempts to eliminate inter-task interference, Carousel allows each task to use as much local memory space as it requires, permitting the approach to scale to large numbers of tasks. Carousel is experimentally evaluated using a simulator. We demonstrate that preemption has no effect on task execution times, and that the Carousel technique compares well to the conventional approach to handling interference, where worst-case interference costs are simply added to the worst-case execution times (WCETs) of lower-priority tasks.
ABSTRACT: This paper proposes Anvil J, a novel technology developed to assist the development of software for predictable, embedded applications. In particular, the work focuses on the complexities of programming for heterogeneous embedded systems in an industrial context, in which the need for predictability is an important requirement. Anvil J converts architecturally-neutral Java code into a set of target-specific programs, automatically distributing the input software over the heterogeneous target architecture whilst ensuring preservation of predictability. During translation it generates a low- to zero-overhead runtime that is tailored to the specific combination of input application and target system, thereby ensuring maximum efficiency. Anvil J uses a technique called Compile-Time Virtualisation that allows it to work with existing compilers and removes the need for language extensions which can hinder certification efforts.