Neil C. Audsley

The University of York, York, England, United Kingdom

Publications (117) · 13.05 Total impact

  • ABSTRACT: Next generation real-time applications demand big-data infrastructures to process huge and continuous data volumes under complex computational constraints. This type of application raises new issues on current big-data processing infrastructures. The first issue to be considered is that most current infrastructures for big-data processing were defined for general purpose applications. Thus, they set aside real-time performance, which is in some cases an implicit requirement. A second important limitation is the lack of clear computational models that could be supported by current big-data frameworks. In an effort to reduce this gap, this article contributes along several lines. First, it provides a set of improvements to a computational model called distributed stream processing in order to formalize it as a real-time infrastructure. Second, it proposes some extensions to Storm, one of the most popular stream processors. These extensions are designed to gain extra control over the resources used by the application in order to improve its predictability. Lastly, the article presents some empirical evidence on the performance that can be expected from this type of infrastructure. Index Terms: real-time, distributed stream processing, predictable infrastructure
    Future Generation Computer Systems 01/2016; 52. DOI:10.1016/j.future.2015.03.023 · 2.64 Impact Factor
  • ABSTRACT: Real-time systems need time-predictable platforms to allow static analysis of the worst-case execution time (WCET). Standard multi-core processors are optimized for the average case and are hardly analyzable. Within the T-CREST project we propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time. The resulting time-predictable resources (processors, interconnect, memory arbiter, and memory controller) and tools (compiler, WCET analysis) are designed to ease WCET analysis and to optimize WCET performance. Compared to other processors the WCET performance is outstanding.
    Journal of Systems Architecture 04/2015; DOI:10.1016/j.sysarc.2015.04.002 · 0.69 Impact Factor
  • Gary Plumbridge, Neil C. Audsley, Ian Gray
    IET Computers & Digital Techniques 01/2015; 9(1):82-92. DOI:10.1049/iet-cdt.2014.0070 · 0.36 Impact Factor
  • Ian Gray, Yu Chan, Neil C. Audsley, Andy Wellings
    ABSTRACT: Existing programming models for distributed and cloud-based systems tend to abstract away from the architectures of individual target nodes, concentrating instead on higher-level issues of algorithm representation (MapReduce etc.). However, as programmers begin to tackle the issue of Big Data, increasing data volumes are forcing developers to reconsider this approach and to optimise their software heavily. JUNIPER is an EU-funded project which assists Big Data developers to create architecture-aware software in a way that is suitable for the target domain, and provides higher performance, portability, and real-time guarantees.
  • Gary Plumbridge, Neil C. Audsley
    ABSTRACT: FPGAs enable NoC architecture experimentation, although to be effective they need to be supported by tools and frameworks for construction of the NoC and effective software programming of the NoC. In this paper, we focus upon effective programming of the NoC using Java, complementing previous work which proposes the Blueshell framework for NoC generation for FPGAs. The approach taken is called Network-Chi, providing a number of key extensions to the Chi Java compiler. This includes provision of a networking API within Java giving a mesh based abstraction for network communication, allowing the programmer to send Java objects to other nodes without consideration for the underlying hardware topology or protocols; and a region-based memory management API that enables the definition of transient allocation contexts that discard all objects allocated within them when they reach the end of execution. Results show the approach taken to be efficient and effective.
    2013 International Conference on ReConFigurable Computing and FPGAs (ReConFig); 12/2013
  • Gary Plumbridge, Neil Audsley
    ABSTRACT: This paper discusses a strategy for translating the Java programming language to a form that is suitable for execution on resource limited embedded systems such as softcore processors in FPGAs, Network-on-Chip nodes and microcontrollers. The translation strategy prioritises the minimisation of runtime memory usage, generated code size, and suitability for a wide range of small architectures over other desirable goals such as execution speed and strict adherence to the Java standard. The translation procedure, or Concrete Hardware Implementation of a software application, first converts the application's compiled Java class files to a self-contained intermediate representation conducive to optimisation and refactoring. The intermediate format is then serialised into a programming language compilable to the target architecture. This paper presents techniques for analysing whole Java applications, translating Java methods and building a stand-alone translated application with the same functional behaviour as the original Java. An example C-code generator is described and evaluated against similar previous approaches. An existing benchmark application, JavaBenchEmbedded, is demonstrated to require less than 30KiB of program code and 16KiB of runtime heap memory when executing on a Xilinx MicroBlaze Processor.
    2012 7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC); 07/2012
  • ABSTRACT: We present a multitasking scratchpad memory reuse scheme (MSRS) for the dynamic partitioning of scratchpad memory between tasks in a preemptive multitasking system. We specify a means to compute the worst-case response time (WCRT) and schedulability of task sets executed using MSRS. Our scratchpad-related preemption delay (SRPD) is an analog of cache-related preemption delay (CRPD), proposed in previous work as a way to compute the worst-case cost imposed upon a preempted task by preemption in a multitasking system. Unlike CRPD, however, SRPD is independent of the number of tasks and the local memory size. We compare SRPD with CRPD by experiment and determine that neither dominates the other, i.e. either may be better for certain task sets. However, MSRS leads to improved schedulability versus cache when contention for local memory space is high, either because the local memory size is small, or because the task set is large, provided that the cost of loading blocks from external memory to scratchpad is similar to the cost of loading blocks into cache.
    Real-Time Systems Symposium (RTSS), 2012 IEEE 33rd; 01/2012
  • Jack Whitham, Neil Audsley
    ABSTRACT: Scratchpad memory (SPM) provides a predictable and energy efficient way to store program instructions and data. It would be ideal for embedded real-time systems if not for the practical difficulty that most programs have to be modified in source or binary form in order to use it effectively. This modification process is called partitioning, and it splits a large program into sub-units called regions that are small enough to be stored in SPM. Earlier papers on this subject have only considered regions formed around program structures, such as loops, methods and even entire tasks. Region formation and SPM allocation are performed in two separate steps. This is an approximation that does not make best use of SPM. In this paper, we propose a k-partitioning algorithm as a new way to solve the problem. This allows us to carry out region formation and SPM allocation simultaneously. We can generate optimal partitions for programs expressed either as call trees or by a restricted form of control-flow graph (CFG). We show that this approach obtains superior results to the previous two-step approach. We apply our algorithm to various programs and SPM sizes and show that it reduces the execution time cost for executing those programs relative to execution with cache.
    Real-Time Systems (ECRTS), 2012 24th Euromicro Conference on; 01/2012
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: This paper proposes Carousel, a mechanism to manage local memory space, i.e. cache or scratch pad memory (SPM), such that inter-task interference is completely eliminated. The cost of saving and restoring the local memory state across context switches is explicitly handled by the preempting task, rather than being imposed implicitly on preempted tasks. Unlike earlier attempts to eliminate inter-task interference, Carousel allows each task to use as much local memory space as it requires, permitting the approach to scale to large numbers of tasks. Carousel is experimentally evaluated using a simulator. We demonstrate that preemption has no effect on task execution times, and that the Carousel technique compares well to the conventional approach to handling interference, where worst-case interference costs are simply added to the worst-case execution times (WCETs) of lower-priority tasks.
    01/2012; 13(4s):3-12. DOI:10.1109/RTAS.2012.19
  • Ian Gray, N.C. Audsley
    ABSTRACT: This paper proposes Anvil J, a novel technology developed to assist the development of software for predictable, embedded applications. In particular, the work focuses on the complexities of programming for heterogeneous embedded systems in an industrial context, in which the need for predictability is an important requirement. Anvil J converts architecturally-neutral Java code into a set of target-specific programs, automatically distributing the input software over the heterogeneous target architecture whilst ensuring preservation of predictability. During translation it generates a low- to zero-overhead runtime that is tailored to the specific combination of input application and target system, thereby ensuring maximum efficiency. Anvil J uses a technique called Compile-Time Virtualisation that allows it to work with existing compilers and removes the need for language extensions which can hinder certification efforts.
    Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012 IEEE 18th; 01/2012
  • ABSTRACT: MARTE has matured into a substantial industrially relevant profile that extends UML expressive power to support the specification and design of embedded systems. When supported by appropriate model transformation and code generation tools, MARTE forms an appropriate starting point for embedded system development. In this paper we propose a simpler yet less powerful subset of MARTE, targeted at multiprocessor systems and amenable to early analysis (including timing) of design alternatives before committing to a particular design for implementation. We use the proposed subset of MARTE constructs to generate abstract simulation and real-time schedulability analysis models, allowing both average and worst-case performance metrics to be considered when comparing multiple design alternatives.
    Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2012 7th International Workshop on; 01/2012
  • N.C. Audsley, I. Gray, A. Acquaviva, R. Haines
    ABSTRACT: Run-time platform variability presents a number of challenges to the system software in order that a run-time environment is presented to applications that sufficiently masks dynamic platform variability (including fabrication variability), whilst allowing applications to tune overall system performance to exploit key aspects of dynamic energy usage and platform variability. The approach taken within the Touchmore project is to model key aspects of the platform in order that performance and variability can be understood and exploited by the system software. In turn, the system software (comprising OS and run-time) utilises the model so that aspects of variability and energy usage are abstracted from the platform, then monitored and controlled in order to meet policy goals, e.g. energy minimisation. This paper documents aspects of the modeling and system software structure to show how the Touchmore project is managing energy and platform variability using customisation of the application, system software and toolchain.
    High Level Design Validation and Test Workshop (HLDVT), 2012 IEEE International; 01/2012
  • I. Gray, N.C. Audsley
    ABSTRACT: Multiprocessor Systems-on-Chip (MPSoC)-based platforms are becoming more common in the embedded domain. Such systems are a significant deviation from the homogeneous, uniprocessor architectures that have been traditionally employed by embedded designers, thereby making the software development process to effectively target the platform more challenging. Low-resource embedded systems rely on efficient implementations that are not well supported by traditional solutions based on architecture virtualisation or middleware. Within this paper we examine these challenges and discuss ways in which they can be mitigated. In particular, we focus on the contributions made by two recent approaches based on Model-Driven Engineering (MDE). We also discuss challenges for future research.
    Rapid System Prototyping (RSP), 2012 23rd IEEE International Symposium on; 01/2012
  • G. Plumbridge, N. Audsley
    ABSTRACT: This paper introduces Machine Java, a framework of classes for the Java programming language that enable the description of software for systems with heterogeneous processing elements (such as CPUs, microcontrollers and function accelerators). Intended for the behavioural description of embedded systems, Machine Java encapsulates both the data and control aspects of computation into 'machine' objects that are appropriate for mapping onto architecturally diverse multiprocessors. System descriptions in Machine Java avoid the need for a separate programming language for each processing element, and make explicit description of communications between processors unnecessary. Suitability for a wide variety of hardware platforms is enhanced by avoiding dependence on notions of shared memory or shared timing resources.
    Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2011 6th International Workshop on; 07/2011
  • Ian Gray, Neil C. Audsley
    ABSTRACT: Within the domain of embedded systems, hardware architectures are commonly characterised by application-specific heterogeneity. Systems may contain multiple dissimilar processing elements, non-standard memory architectures, and custom hardware elements. The programming of such systems is a considerable challenge, not only because of the need to exploit large degrees of parallelism but also because hardware architectures change from system to system. To solve this problem, this paper proposes the novel combination of a new industry standard for communication across multicore architectures (MCAPI), with a minimal-overhead technique for targeting complex architectures with standard programming languages (Compile-Time Virtualisation). The Multicore Association have proposed MCAPI as an industry standard for on-chip communications. MCAPI abstracts the on-chip physical communication to provide the application with logical point-to-point unidirectional channels between nodes (software thread, hardware core, etc.). Compile-Time Virtualisation is used to provide an extremely lightweight implementation of MCAPI, that supports a much wider range of architectures than its specification normally considers. Overall, this unique combination enhances programmability by abstracting on-chip communication whilst also exposing critical parts of the target architecture to the programming language.
    Proceedings of the ACM SIGPLAN/SIGBED 2011 conference on Languages, compilers, and tools for embedded systems, LCTES 2011, Chicago, IL, USA, April 11-14, 2011; 04/2011
  • ABSTRACT: This paper gives an overview of the model-based hardware generation and programming approach proposed within the MADES project. MADES aims to develop a model-driven development process for safety-critical, real-time embedded systems. MADES defines a systems modelling language based on subsets of MARTE and SysML that allows iterative refinement from high-level specification down to final implementation. The MADES project specifically focusses on three unique features which differentiate it from existing model-driven development frameworks. First, model transformations in the Epsilon modelling framework are used to move between system models and provide traceability. Second, the Zot verification tool is employed to allow early and frequent verification of the system being developed. Third, Compile-Time Virtualisation is used to automatically retarget architecturally-neutral software for execution on complex embedded architectures. This paper concentrates on MADES's approach to the specification of hardware and the way in which software is refactored by Compile-Time Virtualisation.
    2nd Workshop on Model Based Engineering for Embedded Systems Design (M-BED2011); 01/2011
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: Superscalar out-of-order CPU designs can achieve higher performance than simpler in-order designs through exploitation of instruction-level parallelism in software. However, these CPU designs are often considered to be unsuitable for hard real-time systems because of the difficulty of guaranteeing the worst-case execution time (WCET) of software. This paper proposes and evaluates modifications for a superscalar out-of-order CPU core to allow instruction-level parallelism to be exploited without sacrificing time predictability and support for WCET analysis. Experiments using the M5 O3 CPU simulator show that WCETs can be two-four times smaller than those obtained using an idealized in-order CPU design, as instruction-level parallelism is exploited without compromising timing safety.
    IEEE Transactions on Computers 09/2010; 59:1210-1223. DOI:10.1109/TC.2010.109 · 1.47 Impact Factor
  • Ian Gray, Neil C Audsley
    ABSTRACT: As their complexity grows, the architectures of embedded systems are becoming increasingly parallel. However, the frameworks used to assist development on highly-parallel general-purpose systems (such as CORBA or MPI) are too heavyweight for use on the non-standard architectures of embedded systems. They introduce significant overheads due to the lack of architectural and structural information contained within most programming languages. Specifically, thread migration across irregular architectures can lead to very poor memory access times, and unconstrained cache coherency cannot scale to cope with large systems. This paper introduces an approach to solving these problems in a scalable way with minimal run-time overhead by using the concept of 'Islands of Coherency'. Cooperating threads are grouped into clusters along with the data that they use. These clusters can then be efficiently mapped to the target architecture, utilising migration only in the areas where the programmer explicitly declares it. This is supported through the use of an existing technique called Compile-Time Virtualisation (CTV). CTV does not support run-time dynamism, so it is extended to allow the implementation of Islands of Coherency. The presented system is evaluated experimentally through implementation on an FPGA platform. Simulation-based results are also presented that show the potential that this approach has for increasing the performance of future embedded systems.
  • A J Wellings, A H Malik, N C Audsley, A Burns
    ABSTRACT: Real-time systems are finding it difficult to make the shift from single processor systems to multiprocessors because of the lack of support from programming platforms for multiprocessors. Although Ada provides some support for SMPs, its goal is to hide the complexity of the architectures so that programmers are not distracted by low-level architectural issues. This paper argues that the programmer should be given enough visibility to use the underlying architecture predictably and efficiently. We focus on the issue of memory management and memory accesses on a cc-NUMA architecture. A cc-NUMA architecture is chosen, as we believe it to be more scalable than SMP systems.
    ACM SIGAda Ada Letters 05/2010; 30(1). DOI:10.1145/1806546.1806560