Neil C. Audsley

The University of York, York, ENG, United Kingdom

Publications (103) · 4.7 Total impact

  • Jack Whitham, Neil C. Audsley
    ABSTRACT: This paper proposes Carousel, a mechanism to manage local memory space, i.e. cache or scratch pad memory (SPM), such that inter-task interference is completely eliminated. The cost of saving and restoring the local memory state across context switches is explicitly handled by the preempting task, rather than being imposed implicitly on preempted tasks. Unlike earlier attempts to eliminate inter-task interference, Carousel allows each task to use as much local memory space as it requires, permitting the approach to scale to large numbers of tasks. Carousel is experimentally evaluated using a simulator. We demonstrate that preemption has no effect on task execution times, and that the Carousel technique compares well to the conventional approach to handling interference, where worst-case interference costs are simply added to the worst-case execution times (WCETs) of lower-priority tasks.
    ACM Transactions on Embedded Computing Systems (TECS). 01/2012; 13(4s):3-12.
  • ABSTRACT: MARTE has matured into a substantial industrially relevant profile that extends UML expressive power to support the specification and design of embedded systems. When supported by appropriate model transformation and code generation tools, MARTE forms an appropriate starting point for embedded system development. In this paper we propose a simpler yet less powerful subset of MARTE, targeted at multiprocessor systems and amenable to early analysis (including timing) of design alternatives before committing to a particular design for implementation. We use the proposed subset of MARTE constructs to generate abstract simulation and real-time schedulability analysis models, allowing both average and worst-case performance metrics to be considered when comparing multiple design alternatives.
    Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2012 7th International Workshop on; 01/2012
  • ABSTRACT: We present a multitasking scratchpad memory reuse scheme (MSRS) for the dynamic partitioning of scratchpad memory between tasks in a preemptive multitasking system. We specify a means to compute the worst-case response time (WCRT) and schedulability of task sets executed using MSRS. Our scratchpad-related preemption delay (SRPD) is an analog of cache-related preemption delay (CRPD), proposed in previous work as a way to compute the worst-case cost imposed upon a preempted task by preemption in a multitasking system. Unlike CRPD, however, SRPD is independent of the number of tasks and the local memory size. We compare SRPD with CRPD by experiment and determine that neither dominates the other, i.e. either may be better for certain task sets. However, MSRS leads to improved schedulability versus cache when contention for local memory space is high, either because the local memory size is small, or because the task set is large, provided that the cost of loading blocks from external memory to scratchpad is similar to the cost of loading blocks into cache.
    Real-Time Systems Symposium (RTSS), 2012 IEEE 33rd; 01/2012
  • Jack Whitham, N. Audsley
    ABSTRACT: Scratchpad memory (SPM) provides a predictable and energy efficient way to store program instructions and data. It would be ideal for embedded real-time systems if not for the practical difficulty that most programs have to be modified in source or binary form in order to use it effectively. This modification process is called partitioning, and it splits a large program into sub-units called regions that are small enough to be stored in SPM. Earlier papers on this subject have only considered regions formed around program structures, such as loops, methods and even entire tasks. Region formation and SPM allocation are performed in two separate steps. This is an approximation that does not make best use of SPM. In this paper, we propose a k-partitioning algorithm as a new way to solve the problem. This allows us to carry out region formation and SPM allocation simultaneously. We can generate optimal partitions for programs expressed either as call trees or by a restricted form of control-flow graph (CFG). We show that this approach obtains superior results to the previous two-step approach. We apply our algorithm to various programs and SPM sizes and show that it reduces the execution time cost for executing those programs relative to execution with cache.
    Real-Time Systems (ECRTS), 2012 24th Euromicro Conference on; 01/2012
  • G. Plumbridge, N. Audsley
    ABSTRACT: This paper introduces Machine Java, a framework of classes for the Java programming language that enables the description of software for systems with heterogeneous processing elements (such as CPUs, microcontrollers and function accelerators). Intended for the behavioural description of embedded systems, Machine Java encapsulates both the data and control aspects of computation into 'machine' objects that are appropriate for mapping onto architecturally diverse multiprocessors. System descriptions in Machine Java avoid the need for a separate programming language for each processing element, and make explicit description of communications between processors unnecessary. Suitability for a wide variety of hardware platforms is enhanced by avoiding dependence on notions of shared memory or shared timing resources.
    Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2011 6th International Workshop on; 07/2011
  • ABSTRACT: This paper gives an overview of the model-based hardware generation and programming approach proposed within the MADES project. MADES aims to develop a model-driven development process for safety-critical, real-time embedded systems. MADES defines a systems modelling language based on subsets of MARTE and SysML that allows iterative refinement from high-level specification down to final implementation. The MADES project specifically focusses on three unique features which differentiate it from existing model-driven development frameworks. First, model transformations in the Epsilon modelling framework are used to move between system models and provide traceability. Second, the Zot verification tool is employed to allow early and frequent verification of the system being developed. Third, Compile-Time Virtualisation is used to automatically retarget architecturally-neutral software for execution on complex embedded architectures. This paper concentrates on MADES's approach to the specification of hardware and the way in which software is refactored by Compile-Time Virtualisation.
    2012 IEEE 15th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops. 03/2011;
  • 2nd Workshop on Model Based Engineering for Embedded Systems Design (M-BED2011); 01/2011
  • Ian Gray, Neil C. Audsley
    ABSTRACT: Within the domain of embedded systems, hardware architectures are commonly characterised by application-specific heterogeneity. Systems may contain multiple dissimilar processing elements, non-standard memory architectures, and custom hardware elements. The programming of such systems is a considerable challenge, not only because of the need to exploit large degrees of parallelism but also because hardware architectures change from system to system. To solve this problem, this paper proposes the novel combination of a new industry standard for communication across multicore architectures (MCAPI), with a minimal-overhead technique for targeting complex architectures with standard programming languages (Compile-Time Virtualisation). The Multicore Association have proposed MCAPI as an industry standard for on-chip communications. MCAPI abstracts the on-chip physical communication to provide the application with logical point-to-point unidirectional channels between nodes (software thread, hardware core, etc.). Compile-Time Virtualisation is used to provide an extremely lightweight implementation of MCAPI, that supports a much wider range of architectures than its specification normally considers. Overall, this unique combination enhances programmability by abstracting on-chip communication whilst also exposing critical parts of the target architecture to the programming language.
    Proceedings of the ACM SIGPLAN/SIGBED 2011 conference on Languages, compilers, and tools for embedded systems, LCTES 2011, Chicago, IL, USA, April 11-14, 2011; 01/2011
  • Ian Gray, Neil C Audsley
    ABSTRACT: As their complexity grows, the architectures of embedded systems are becoming increasingly parallel. However, the frameworks used to assist development on highly-parallel general-purpose systems (such as CORBA or MPI) are too heavyweight for use on the non-standard architectures of embedded systems. They introduce significant overheads due to the lack of architectural and structural information contained within most programming languages. Specifically, thread migration across irregular architectures can lead to very poor memory access times, and unconstrained cache coherency cannot scale to cope with large systems. This paper introduces an approach to solving these problems in a scalable way with minimal run-time overhead by using the concept of 'Islands of Coherency'. Cooperating threads are grouped into clusters along with the data that they use. These clusters can then be efficiently mapped to the target architecture, utilising migration only in the areas where the programmer explicitly declares it. This is supported through the use of an existing technique called Compile-Time Virtualisation (CTV). CTV does not support run-time dynamism, so it is extended to allow the implementation of Islands of Coherency. The presented system is evaluated experimentally through implementation on an FPGA platform. Simulation-based results are also presented that show the potential that this approach has for increasing the performance of future embedded systems.
    06/2010;
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: This paper shows that a program using a time-predictable memory system for data storage can achieve a similar worst-case execution time (WCET) to the average-case execution time (ACET) using a conventional heuristic-based memory system including a data cache. This result is useful within any embedded system where time-predictability and performance are both important, particularly hard real-time systems carrying out intensive data processing activities. It is a counter-example to the conventional wisdom that time-predictable means "slow" in comparison to ACET-focused heuristics. To carry out the investigation, 36 "memory access models" are derived from benchmark programs and assumed to be representative of typical code. The models generate LOAD/STORE instructions to exercise a data cache or scratchpad memory management unit (SMMU). The ACET is determined for the data cache and the WCET is determined for the SMMU. After improvements are applied, results show that the SMMU WCET is within 5% of the data cache ACET for 34 models. In 16 of 36 cases, the SMMU WCET is better than the data cache ACET.
    22nd Euromicro Conference on Real-Time Systems, ECRTS 2010, Brussels, Belgium, July 6-9, 2010; 01/2010
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: Superscalar out-of-order CPU designs can achieve higher performance than simpler in-order designs through exploitation of instruction-level parallelism in software. However, these CPU designs are often considered to be unsuitable for hard real-time systems because of the difficulty of guaranteeing the worst-case execution time (WCET) of software. This paper proposes and evaluates modifications for a superscalar out-of-order CPU core to allow instruction-level parallelism to be exploited without sacrificing time predictability and support for WCET analysis. Experiments using the M5 O3 CPU simulator show that WCETs can be two to four times smaller than those obtained using an idealized in-order CPU design, as instruction-level parallelism is exploited without compromising timing safety.
    IEEE Transactions on Computers 01/2010; 59:1210-1223. · 1.38 Impact Factor
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: A combination of a scratchpad and scratchpad memory management unit (SMMU) has been proposed as a way to implement fast and time-predictable memory access operations in programs that use dynamic data structures. A memory access operation is time-predictable if its execution time is known or bounded; this is important within a hard real-time task so that the worst-case execution time (WCET) can be determined. However, the requirement for time-predictability does not remove the conventional requirement for efficiency: operations must be serviced as quickly as possible under worst-case conditions. This paper studies the capabilities of the SMMU when applied to a number of benchmark programs. A new allocation algorithm is proposed to dynamically manage the scratchpad space. In many cases, the SMMU vastly reduces the number of accesses to dynamic data structures stored in external memory along the worst-case execution path (WCEP). Across all the benchmarks, an average of 47% of accesses are rerouted to scratchpad, with nearly 100% for some programs. In previous scratchpad-based work, time-predictability could only be assured for these operations using external memory. The paper also examines situations in which the SMMU does not perform so well, and discusses how these could be addressed.
    16th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2010, Stockholm, Sweden, April 12-15, 2010; 01/2010
  • A J Wellings, A H Malik, N C Audsley, A Burns
    ABSTRACT: Real-time systems are finding it difficult to make the shift from single processor systems to multiprocessors because of the lack of support from programming platforms for multiprocessors. Although Ada provides some support for SMPs, its goal is to hide the complexity of the architectures so that programmers are not distracted by low-level architectural issues. This paper argues that the programmer should be given enough visibility to use the underlying architecture predictably and efficiently. We focus on the issue of memory management and memory accesses on a cc-NUMA architecture. A cc-NUMA architecture is chosen, as we believe it to be more scalable than SMP systems.
    ACM SIGAda Ada Letters. 01/2010; 30(1).
  • Ke Yu, Neil C. Audsley
    ABSTRACT: Transaction Level Modelling (TLM) is an emerging design approach to accelerate Electronic System Level (ESL) design. A virtual TLM prototype of an embedded system is an integration of computation and communication. Currently, TLM communication and hardware modelling have been well discussed and standardised. However, there still exist problems in the domain of TLM for software computation modelling and simulation. In this paper, we propose real-time software models from the perspective of TLM software modelling. They are compatible with current TLM modelling concepts and can be combined with existing TLM communication models. In addition, we implement a software Processing Element (PE) model which effectively integrates mixed timing RTOS-centric software models, abstract processor hardware functions, and OSCI TLM-2.0 communication interfaces.
    10th IEEE International Conference on Computer and Information Technology, CIT 2010, Bradford, West Yorkshire, UK, June 29-July 1, 2010; 01/2010
  • N. Gasson, N. Audsley
    ABSTRACT: Most existing approaches to targeting high-level software to FPGAs are based on extensions to C and do not map easily to the features and characteristics of modern FPGAs. These include massive parallelism and a variety of complex IP-blocks (e.g. RAMs, DSPs). In this paper we discuss a hardware implementation of SR, a software language with first class concurrency and high-level IPC. We show that the language model can be implemented efficiently on an FPGA, and that it provides a natural means to encapsulate FPGA resources. We compare against a commercial C-based synthesis tool and achieve similar resource usage using a more expressive language.
    Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on; 10/2009
  • Ian Gray, Neil C. Audsley
    ABSTRACT: The architectures of embedded systems are often application-specific, containing multiple heterogeneous cores, non-uniform memory, on-chip networks and custom hardware elements (e.g. DSP cores). Standard programming languages do not use many of these features natively because they assume a traditional single processor and a single logical address space abstraction that hides these architectural details. This paper describes Compile-Time Virtualisation, a technique which uses a virtualisation layer to map software onto the target architecture whilst allowing the programmer to control the virtualisation mappings in order to effectively exploit custom architectures.
    Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES 2009, Grenoble, France, October 11-16, 2009; 01/2009
  • Jack Whitham, Neil C. Audsley
    ABSTRACT: Scratchpads have been widely proposed as an alternative to caches for embedded systems. Advantages of scratchpads include reduced energy consumption in comparison to a cache and access latencies that are independent of the preceding memory access pattern. The latter property makes memory accesses time-predictable, which is useful for hard real-time tasks as the worst-case execution time (WCET) must be safely estimated in order to check that the system will meet timing requirements. However, data must be explicitly moved between scratchpad and external memory as a task executes in order to make best use of the limited scratchpad space. When dynamic data is moved, issues such as pointer aliasing and pointer invalidation become problematic. Previous work has proposed solutions that are not suitable for hard real-time tasks because memory accesses are not time-predictable. This paper proposes the scratchpad memory management unit (SMMU) as an enhancement to scratchpad technology. The SMMU implements an alternative solution to the pointer aliasing and pointer invalidation problems which (1) does not require whole-program pointer analysis and (2) makes every memory access operation time-predictable. This allows WCET analysis to be applied to hard real-time tasks which use a scratchpad and dynamic data, but results are also applicable in the wider context of minimizing energy consumption or average execution time. Experiments using C software show that the combination of an SMMU and scratchpad compares favorably with the best and worst case performance of a conventional data cache.
    Proceedings of the 9th ACM & IEEE International conference on Embedded software, EMSOFT 2009, Grenoble, France, October 12-16, 2009; 01/2009
  • Jack Whitham, Neil C. Audsley, Martin Schoeberl
    ABSTRACT: This paper describes hardware methods, a lightweight and platform-independent scheme for linking real-time Java code to co-processors implemented using a hardware description language (HDL). Intended for use in embedded systems, hardware methods have similar semantics to the native methods used to interface Java code to legacy C/C++ software, but are also time-predictable, facilitating accurate worst-case execution time (WCET) analysis. By reference to several examples, the paper demonstrates the applicability of hardware methods and shows that they can (1) reduce the WCET of embedded real-time Java, and (2) improve the quality of WCET estimates in the presence of infeasible paths.
    Proceedings of the 7th International Workshop on Java Technologies for Real-Time and Embedded Systems, JTRES 2009, Madrid, Spain, September 23-25, 2009; 01/2009
  • Ke Yu, Neil C. Audsley
    ABSTRACT: System-level software modeling and simulation have become important techniques for early design space exploration of real-time embedded systems. However, timing accuracy issues have not been solved well in current methods, which produce unrealistic results or large simulation overheads. In this paper, we propose a mixed timing modeling and simulation approach to decouple conventionally interdependent software timing modeling and simulation into two separate phases. This approach enables (1) mixed software timing information granularities and annotation methods at the modeling stage for performance and accuracy trade-off, (2) good software preemption and hardware interrupt handling timing accuracy at the simulation stage without sacrificing simulation performance, and (3) varying system run-time status observability and simulation speed for efficiency trade-off. Experiments demonstrate that our approach has flexible simulation performance trade-offs and good simulation timing accuracy. The measured results indicate that hardware interruption and software preemption problems are also solved by our approach.
    International Conference on Embedded Software and Systems, ICESS '09, Hangzhou, Zhejiang, P. R. China, May 25-27, 2009.; 01/2009
  • Jack Whitham, Neil Audsley
    ABSTRACT: It is notoriously difficult to model superscalar out-of-order CPUs for the purposes of worst-case execution time (WCET) analysis, which can force the use of simpler CPUs in hard real-time systems. To address this problem, it has been suggested that traces could be used to capture the timing properties of a complex CPU operation scheduler as it runs a sequence of basic blocks. In previous work, traces have been implemented using application-specific microcode. This paper proposes restrictions to a dynamic superscalar out-of-order CPU to implement virtual traces. These have the same timing properties as the traces in previous work, but microcode is not used. Instead, CPU modifications implement the same functionality. This allows traces to be used throughout a program because space requirements are minimal. To take advantage of this, a new allocation algorithm is proposed and evaluated for virtual traces.
    Embedded and Real-Time Computing Systems and Applications, 2008. RTCSA '08. 14th IEEE International Conference on; 09/2008