Neil C. Audsley

CUNY Graduate Center, New York, New York, United States

Publications (115) · 8.74 Total impact

  • Gary Plumbridge, Neil C. Audsley, Ian Gray
    IET Computers & Digital Techniques 01/2015; 9(1):82-92. DOI:10.1049/iet-cdt.2014.0070 · 0.36 Impact Factor
  • Ian Gray, Yu Chan, Neil C. Audsley, Andy Wellings
    ABSTRACT: Existing programming models for distributed and cloud-based systems tend to abstract away from the architectures of individual target nodes, concentrating instead on higher-level issues of algorithm representation (MapReduce etc.). However, as programmers begin to tackle the issue of Big Data, increasing data volumes are forcing developers to reconsider this approach and to optimise their software heavily. JUNIPER is an EU-funded project which assists Big Data developers to create architecture-aware software in a way that is suitable for the target domain, and provides higher performance, portability, and real-time guarantees.
  • Gary Plumbridge, Neil C. Audsley
    ABSTRACT: FPGAs enable NoC architecture experimentation, although to be effective they need to be supported by tools and frameworks for construction of the NoC and effective software programming of the NoC. In this paper, we focus upon effective programming of the NoC using Java, complementing previous work which proposes the Blueshell framework for NoC generation for FPGAs. The approach taken is called Network-Chi, providing a number of key extensions to the Chi Java compiler. This includes provision of a networking API within Java giving a mesh based abstraction for network communication, allowing the programmer to send Java objects to other nodes without consideration for the underlying hardware topology or protocols; and a region-based memory management API that enables the definition of transient allocation contexts that discard all objects allocated within them when they reach the end of execution. Results show the approach taken to be efficient and effective.
    2013 International Conference on ReConFigurable Computing and FPGAs (ReConFig); 12/2013
  • Gary Plumbridge, Neil Audsley
    ABSTRACT: This paper discusses a strategy for translating the Java programming language to a form that is suitable for execution on resource limited embedded systems such as softcore processors in FPGAs, Network-on-Chip nodes and microcontrollers. The translation strategy prioritises the minimisation of runtime memory usage, generated code size, and suitability for a wide range of small architectures over other desirable goals such as execution speed and strict adherence to the Java standard. The translation procedure, or Concrete Hardware Implementation of a software application, first converts the application's compiled Java class files to a self-contained intermediate representation conducive to optimisation and refactoring. The intermediate format is then serialised into a programming language compilable to the target architecture. This paper presents techniques for analysing whole Java applications, translating Java methods and building a stand-alone translated application with the same functional behaviour as the original Java. An example C-code generator is described and evaluated against similar previous approaches. An existing benchmark application, JavaBenchEmbedded, is demonstrated to require less than 30KiB of program code and 16KiB of runtime heap memory when executing on a Xilinx MicroBlaze Processor.
    2012 7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC); 07/2012
  • ABSTRACT: We present a multitasking scratchpad memory reuse scheme (MSRS) for the dynamic partitioning of scratchpad memory between tasks in a preemptive multitasking system. We specify a means to compute the worst-case response time (WCRT) and schedulability of task sets executed using MSRS. Our scratchpad-related preemption delay (SRPD) is an analog of cache-related preemption delay (CRPD), proposed in previous work as a way to compute the worst-case cost imposed upon a preempted task by preemption in a multitasking system. Unlike CRPD, however, SRPD is independent of the number of tasks and the local memory size. We compare SRPD with CRPD by experiment and determine that neither dominates the other, i.e. either may be better for certain task sets. However, MSRS leads to improved schedulability versus cache when contention for local memory space is high, either because the local memory size is small, or because the task set is large, provided that the cost of loading blocks from external memory to scratchpad is similar to the cost of loading blocks into cache.
    Real-Time Systems Symposium (RTSS), 2012 IEEE 33rd; 01/2012
  • Jack Whitham, Neil Audsley
    ABSTRACT: Scratchpad memory (SPM) provides a predictable and energy efficient way to store program instructions and data. It would be ideal for embedded real-time systems if not for the practical difficulty that most programs have to be modified in source or binary form in order to use it effectively. This modification process is called partitioning, and it splits a large program into sub-units called regions that are small enough to be stored in SPM. Earlier papers on this subject have only considered regions formed around program structures, such as loops, methods and even entire tasks. Region formation and SPM allocation are performed in two separate steps. This is an approximation that does not make best use of SPM. In this paper, we propose a k-partitioning algorithm as a new way to solve the problem. This allows us to carry out region formation and SPM allocation simultaneously. We can generate optimal partitions for programs expressed either as call trees or by a restricted form of control-flow graph (CFG). We show that this approach obtains superior results to the previous two-step approach. We apply our algorithm to various programs and SPM sizes and show that it reduces the execution time cost for executing those programs relative to execution with cache.
    Real-Time Systems (ECRTS), 2012 24th Euromicro Conference on; 01/2012
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: This paper proposes Carousel, a mechanism to manage local memory space, i.e. cache or scratch pad memory (SPM), such that inter-task interference is completely eliminated. The cost of saving and restoring the local memory state across context switches is explicitly handled by the preempting task, rather than being imposed implicitly on preempted tasks. Unlike earlier attempts to eliminate inter-task interference, Carousel allows each task to use as much local memory space as it requires, permitting the approach to scale to large numbers of tasks. Carousel is experimentally evaluated using a simulator. We demonstrate that preemption has no effect on task execution times, and that the Carousel technique compares well to the conventional approach to handling interference, where worst-case interference costs are simply added to the worst-case execution times (WCETs) of lower-priority tasks.
    01/2012; 13(4s):3-12. DOI:10.1109/RTAS.2012.19
  • Ian Gray, N.C. Audsley
    ABSTRACT: This paper proposes Anvil J, a novel technology developed to assist the development of software for predictable, embedded applications. In particular, the work focuses on the complexities of programming for heterogeneous embedded systems in an industrial context, in which the need for predictability is an important requirement. Anvil J converts architecturally-neutral Java code into a set of target-specific programs, automatically distributing the input software over the heterogeneous target architecture whilst ensuring preservation of predictability. During translation it generates a low- to zero-overhead runtime that is tailored to the specific combination of input application and target system, thereby ensuring maximum efficiency. Anvil J uses a technique called Compile-Time Virtualisation that allows it to work with existing compilers and removes the need for language extensions which can hinder certification efforts.
    Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012 IEEE 18th; 01/2012
  • ABSTRACT: MARTE has matured into a substantial industrially relevant profile that extends UML expressive power to support the specification and design of embedded systems. When supported by appropriate model transformation and code generation tools, MARTE forms an appropriate starting point for embedded system development. In this paper we propose a simpler yet less powerful subset of MARTE, targeted at multiprocessor systems and amenable to early analysis (including timing) of design alternatives before committing to a particular design for implementation. We use the proposed subset of MARTE constructs to generate abstract simulation and real-time schedulability analysis models, allowing both average and worst-case performance metrics to be considered when comparing multiple design alternatives.
    Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2012 7th International Workshop on; 01/2012
  • N.C. Audsley, I. Gray, A. Acquaviva, R. Haines
    ABSTRACT: Run-time platform variability presents a number of challenges to the system software in order that a run-time environment is presented to applications that sufficiently masks dynamic platform variability (including fabrication variability), whilst allowing applications to tune overall system performance to exploit key aspects of dynamic energy usage and platform variability. The approach taken within the Touchmore project is to model key aspects of the platform in order that performance and variability can be understood and exploited by the system software. In turn, the system software (comprising OS and run-time) utilises the model so that aspects of variability and energy usage are abstracted from the platform, then monitored and controlled in order to meet policy goals, e.g. energy minimisation. This paper documents aspects of the modeling and system software structure to show how the Touchmore project is managing energy and platform variability using customisation of the application, system software and toolchain.
    High Level Design Validation and Test Workshop (HLDVT), 2012 IEEE International; 01/2012
  • I. Gray, N.C. Audsley
    ABSTRACT: Multiprocessor Systems-on-Chip (MPSoC)-based platforms are becoming more common in the embedded domain. Such systems are a significant deviation from the homogeneous, uniprocessor architectures that have been traditionally employed by embedded designers, thereby making the software development process to effectively target the platform more challenging. Low-resource embedded systems rely on efficient implementations that are not well supported by traditional solutions based on architecture virtualisation or middleware. Within this paper we examine these challenges and discuss ways in which they can be mitigated. In particular, we focus on the contributions made by two recent approaches based on Model-Driven Engineering (MDE). We also discuss challenges for future research.
    Rapid System Prototyping (RSP), 2012 23rd IEEE International Symposium on; 01/2012
  • G. Plumbridge, N. Audsley
    ABSTRACT: This paper introduces Machine Java, a framework of classes for the Java programming language that enable the description of software for systems with heterogeneous processing elements (such as CPUs, microcontrollers and function accelerators). Intended for the behavioural description of embedded systems, Machine Java encapsulates both the data and control aspects of computation into 'machine' objects that are appropriate for mapping onto architecturally diverse multiprocessors. System descriptions in Machine Java avoid the need for a separate programming language for each processing element, and make explicit description of communications between processors unnecessary. Suitability for a wide variety of hardware platforms is enhanced by avoiding dependence on notions of shared memory or shared timing resources.
    Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2011 6th International Workshop on; 07/2011
  • Ian Gray, Neil C. Audsley
    ABSTRACT: Within the domain of embedded systems, hardware architectures are commonly characterised by application-specific heterogeneity. Systems may contain multiple dissimilar processing elements, non-standard memory architectures, and custom hardware elements. The programming of such systems is a considerable challenge, not only because of the need to exploit large degrees of parallelism but also because hardware architectures change from system to system. To solve this problem, this paper proposes the novel combination of a new industry standard for communication across multicore architectures (MCAPI), with a minimal-overhead technique for targeting complex architectures with standard programming languages (Compile-Time Virtualisation). The Multicore Association have proposed MCAPI as an industry standard for on-chip communications. MCAPI abstracts the on-chip physical communication to provide the application with logical point-to-point unidirectional channels between nodes (software thread, hardware core, etc.). Compile-Time Virtualisation is used to provide an extremely lightweight implementation of MCAPI, that supports a much wider range of architectures than its specification normally considers. Overall, this unique combination enhances programmability by abstracting on-chip communication whilst also exposing critical parts of the target architecture to the programming language.
    Proceedings of the ACM SIGPLAN/SIGBED 2011 conference on Languages, compilers, and tools for embedded systems, LCTES 2011, Chicago, IL, USA, April 11-14, 2011; 04/2011
  • ABSTRACT: This paper gives an overview of the model-based hardware generation and programming approach proposed within the MADES project. MADES aims to develop a model-driven development process for safety-critical, real-time embedded systems. MADES defines a systems modelling language based on subsets of MARTE and SysML that allows iterative refinement from high-level specification down to final implementation. The MADES project specifically focusses on three unique features which differentiate it from existing model-driven development frameworks. First, model transformations in the Epsilon modelling framework are used to move between system models and provide traceability. Second, the Zot verification tool is employed to allow early and frequent verification of the system being developed. Third, Compile-Time Virtualisation is used to automatically retarget architecturally-neutral software for execution on complex embedded architectures. This paper concentrates on MADES's approach to the specification of hardware and the way in which software is refactored by Compile-Time Virtualisation.
    2nd Workshop on Model Based Engineering for Embedded Systems Design (M-BED2011); 01/2011
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: Superscalar out-of-order CPU designs can achieve higher performance than simpler in-order designs through exploitation of instruction-level parallelism in software. However, these CPU designs are often considered to be unsuitable for hard real-time systems because of the difficulty of guaranteeing the worst-case execution time (WCET) of software. This paper proposes and evaluates modifications for a superscalar out-of-order CPU core to allow instruction-level parallelism to be exploited without sacrificing time predictability and support for WCET analysis. Experiments using the M5 O3 CPU simulator show that WCETs can be two to four times smaller than those obtained using an idealized in-order CPU design, as instruction-level parallelism is exploited without compromising timing safety.
    IEEE Transactions on Computers 09/2010; 59:1210-1223. DOI:10.1109/TC.2010.109 · 1.47 Impact Factor
  • Ian Gray, Neil C Audsley
    ABSTRACT: As their complexity grows, the architectures of embedded systems are becoming increasingly parallel. However, the frameworks used to assist development on highly-parallel general-purpose systems (such as CORBA or MPI) are too heavyweight for use on the non-standard architectures of embedded systems. They introduce significant overheads due to the lack of architectural and structural information contained within most programming languages. Specifically, thread migration across irregular architectures can lead to very poor memory access times, and unconstrained cache coherency cannot scale to cope with large systems. This paper introduces an approach to solving these problems in a scalable way with minimal run-time overhead by using the concept of 'Islands of Coherency'. Cooperating threads are grouped into clusters along with the data that they use. These clusters can then be efficiently mapped to the target architecture, utilising migration only in the areas where the programmer explicitly declares it. This is supported through the use of an existing technique called Compile-Time Virtualisation (CTV). CTV does not support run-time dynamism, so it is extended to allow the implementation of Islands of Coherency. The presented system is evaluated experimentally through implementation on an FPGA platform. Simulation-based results are also presented that show the potential that this approach has for increasing the performance of future embedded systems.
  • A J Wellings, A H Malik, N C Audsley, A Burns
    ABSTRACT: Real-time systems are finding it difficult to make the shift from single processor systems to multiprocessors because of the lack of support from programming platforms for multiprocessors. Although Ada provides some support for SMPs, its goal is to hide the complexity of the architectures so that the programmers are not distracted by low-level architectural issues. This paper argues that programmers should be given enough visibility to use the underlying architecture predictably and efficiently. We focus on the issue of memory management and memory accesses on a cc-NUMA architecture. A cc-NUMA architecture is chosen, as we believe it to be more scalable than SMP systems.
    ACM SIGAda Ada Letters 05/2010; 30(1). DOI:10.1145/1806546.1806560
  • Michael Burke, Neil Audsley
    ABSTRACT: This paper examines the problem of introducing advanced forms of fault-tolerance via reconfiguration into safety-critical avionic systems. This is required to enable increased availability after fault occurrence in distributed integrated avionic systems (compared to static federated systems). The approach taken is to identify a migration path from current architectures to those that incorporate reconfiguration to a lesser or greater degree. Other challenges identified include change of the development process; incremental and flexible timing and safety analyses; and configurable kernels applicable for safety-critical systems.
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: A combination of a scratchpad and scratchpad memory management unit (SMMU) has been proposed as a way to implement fast and time-predictable memory access operations in programs that use dynamic data structures. A memory access operation is time-predictable if its execution time is known or bounded; this is important within a hard real-time task so that the worst-case execution time (WCET) can be determined. However, the requirement for time-predictability does not remove the conventional requirement for efficiency: operations must be serviced as quickly as possible under worst-case conditions. This paper studies the capabilities of the SMMU when applied to a number of benchmark programs. A new allocation algorithm is proposed to dynamically manage the scratchpad space. In many cases, the SMMU vastly reduces the number of accesses to dynamic data structures stored in external memory along the worst-case execution path (WCEP). Across all the benchmarks, an average of 47% of accesses are rerouted to scratchpad, with nearly 100% for some programs. In previous scratchpad-based work, time-predictability could only be assured for these operations using external memory. The paper also examines situations in which the SMMU does not perform so well, and discusses how these could be addressed.
    16th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2010, Stockholm, Sweden, April 12-15, 2010; 01/2010