Jade Alglave

University College London, London, England, United Kingdom

Publications (28) · 6.59 Total impact

  • Richard Bornat · Jade Alglave · Matthew Parkinson
    ABSTRACT: We describe a program logic for weak memory (also known as relaxed memory). The logic is based on Hoare logic within a thread, and rely/guarantee between threads. It is presented via examples, giving proofs of many weak-memory litmus tests. It extends to coherence but not yet to synchronised assignment (compare-and-swap, load-linked/store-conditional). It deals with conditionals and loops but not yet arrays or heap. The logic uses a version of Hoare logic within threads, and a version of rely/guarantee between threads, with five stability rules to handle various kinds of parallelism (external, internal, propagation-free and two kinds of in-flight parallelism). There are $\mathbb{B}$ and $\mathbb{U}$ modalities to regulate propagation, and temporal modalities $\mathsf{since}$, $\mathbb{S}\mathsf{ofar}$ and $\mathbb{O}\mathsf{uat}$ to deal with global coherence (SC per location). The logic is presented by example. Proofs and unproofs of about thirty weak-memory examples, including many litmus tests in various guises, are dealt with in detail. There is a proof of a version of the token ring.
    Full-text · Article · Dec 2015
  • ABSTRACT: Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming (often supported by official tutorials) as false. As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy.
    No preview · Article · May 2015 · ACM SIGPLAN Notices

  • No preview · Article · Mar 2015 · ACM SIGARCH Computer Architecture News
  • Alex Horn · Jade Alglave
    ABSTRACT: Concurrent Kleene Algebra (CKA) by Tony Hoare et al. is an algebraic structure that unifies the laws of concurrent programming. The unifying power of CKA rests largely on the so-called exchange law that describes how concurrent and sequential composition operators can be interchanged. This paper constructs a partial order model of CKA including its exchange law. The existence of such a model is particularly relevant when we want to disprove properties about concurrent programs, thereby possibly facilitating the analysis of real-world bugs.
    Preview · Article · Jul 2014
  • ABSTRACT: Modern architectures rely on memory fences to prevent undesired weakenings of memory consistency between threads. As both the semantics of the program under these architectures and the semantics of these fences may be subtle, the automation of their placement is highly desirable. However, precise methods to restore strong consistency do not scale to the size of deployed systems code. We choose to trade some precision for genuine scalability: we present a novel technique suitable for interprocedural analysis of large code bases. We implement this method in our new musketeer tool, and detail experiments on more than 350 executables of packages found in a Debian Linux distribution, e.g. memcached (about 10000 LoC).
    Preview · Article · Dec 2013
  • Jade Alglave · Luc Maranget · Michael Tautschnig
    ABSTRACT: We propose an axiomatic generic framework for modelling weak memory. We show how to instantiate this framework for SC, TSO, C++ restricted to release-acquire atomics, and Power. For Power, we compare our model to a preceding operational model in which we found a flaw. To do so, we define an operational model that we show equivalent to our axiomatic model. We also propose a model for ARM. Our testing on this architecture revealed a behaviour later acknowledged as a bug by ARM, and more recently 33 additional anomalies. We offer a new simulation tool, called herd, which allows the user to specify the model of his choice in a concise way. Given a specification of a model, the tool becomes a simulator for that model. The tool relies on an axiomatic description; this choice allows us to outperform all previous simulation tools. Additionally, we confirm that verification time is vastly improved, in the case of bounded model-checking. Finally, we put our models in perspective, in the light of empirical data obtained by analysing the C and C++ code of a Debian Linux distribution. We present our new analysis tool, called mole, which explores a piece of code to find the weak memory idioms that it uses.
    Preview · Article · Aug 2013 · ACM SIGPLAN Notices
  • Jade Alglave · Daniel Kroening · Michael Tautschnig
    ABSTRACT: The number of interleavings of a concurrent program makes automatic analysis of such software very hard. Modern multiprocessors' execution models make this problem even harder. Modelling program executions with partial orders rather than interleavings addresses both issues: we obtain an efficient encoding into integer difference logic for bounded model checking that enables first-time formal verification of deployed concurrent systems code. We implemented the encoding in the CBMC tool and present experiments over a wide range of memory models, including SC, Intel x86 and IBM Power. Our experiments include core parts of PostgreSQL, the Linux kernel and the Apache HTTP server.
    No preview · Conference Paper · Jul 2013
  • Jade Alglave · Daniel Kroening · Michael Tautschnig
    ABSTRACT: The vast number of interleavings that a concurrent program can have is typically identified as the root cause of the difficulty of automatic analysis of concurrent software. Weak memory is generally believed to make this problem even harder. We address both issues by modelling programs' executions with partial orders rather than the interleaving semantics (SC). We implemented a software analysis tool based on these ideas. It scales to programs of sufficient size to achieve first-time formal verification of non-trivial concurrent systems code over a wide range of models, including SC, Intel x86 and IBM Power.
    Preview · Article · Jan 2013
  • Conference Paper: Herding Cats
    Jade Alglave · Luc Maranget · Michael Tautschnig

    No preview · Conference Paper · Jan 2013
  • Jade Alglave
    ABSTRACT: We present in this paper a formal generic framework, implemented in the Coq proof assistant, for defining and reasoning about weak memory models. We first present the three axioms of our framework, with several examples as illustration and justification. Then we show how to implement several existing weak memory models in our framework, and prove formally that our implementation is equivalent to the native definition for each of these models.
    No preview · Article · Oct 2012 · Formal Methods in System Design
  • ABSTRACT: Despite multiprocessors implementing weak memory models, verification methods often assume Sequential Consistency (SC), thus may miss bugs due to weak memory. We propose a sound transformation of the program to verify, enabling SC tools to perform verification w.r.t. weak memory. We present experiments for a broad variety of models (from x86/TSO to Power/ARM) and a vast range of verification tools, quantify the additional cost of the transformation and highlight the cases when we can drastically reduce it. Our benchmarks include work-queue management code from PostgreSQL.
    Preview · Article · Jul 2012
  • ABSTRACT: The growing complexity of hardware optimizations employed by multiprocessors leads to subtle distinctions among allowed and disallowed behaviors, posing challenges in specifying their memory models formally and accurately, and in understanding and analyzing the behavior of concurrent software. This complexity is particularly evident in the IBM® Power Architecture®, for which a faithful specification was published only in 2011 using an operational style. In this paper we present an equivalent axiomatic specification, which is more abstract and concise. Although not officially sanctioned by the vendor, our results indicate that this axiomatic specification provides a reasonable basis for reasoning about current IBM® POWER® multiprocessors. We establish the equivalence of the axiomatic and operational specifications using both manual proof and extensive testing. To demonstrate that the constraint-based style of axiomatic specification is more amenable to computer-aided verification, we develop a SAT-based tool for evaluating possible outcomes of multi-threaded test programs, and we show that this tool is significantly more efficient than a tool based on an operational specification.
    Preview · Conference Paper · Jul 2012
  • ABSTRACT: Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM® POWER®, ARM, or C/C++, it remains surprisingly unclear exactly what the programmer can depend on. This paper studies relaxed-memory synchronisation. On the hardware side, we give a clear semantic characterisation of the load-reserve/store-conditional primitives as provided by POWER multiprocessors, for the first time since they were introduced 20 years ago; we cover their interaction with relaxed loads, stores, barriers, and dependencies. Our model, while not officially sanctioned by the vendor, is validated by extensive testing, comparing actual implementation behaviour against an oracle generated from the model, and by detailed discussion with IBM staff. We believe the ARM semantics to be similar. On the software side, we prove sound a proposed compilation scheme of the C/C++ synchronisation constructs to POWER, including C/C++ spinlock mutexes, fences, and read-modify-write operations, together with the simpler atomic operations for which soundness is already known from our previous work; this is a first step in verifying concurrent algorithms that use load-reserve/store-conditional with respect to a realistic semantics. We also build confidence in the C/C++ model in its own terms, fixing some omissions and contributing to the C standards committee adoption of the C++11 concurrency model.
    Preview · Article · Jun 2012
  • ABSTRACT: Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM® POWER®, ARM, or C/C++, it remains surprisingly unclear exactly what the programmer can depend on. This paper studies relaxed-memory synchronisation. On the hardware side, we give a clear semantic characterisation of the load-reserve/store-conditional primitives as provided by POWER multiprocessors, for the first time since they were introduced 20 years ago; we cover their interaction with relaxed loads, stores, barriers, and dependencies. Our model, while not officially sanctioned by the vendor, is validated by extensive testing, comparing actual implementation behaviour against an oracle generated from the model, and by detailed discussion with IBM staff. We believe the ARM semantics to be similar. On the software side, we prove sound a proposed compilation scheme of the C/C++ synchronisation constructs to POWER, including C/C++ spinlock mutexes, fences, and read-modify-write operations, together with the simpler atomic operations for which soundness is already known from our previous work; this is a first step in verifying concurrent algorithms that use load-reserve/store-conditional with respect to a realistic semantics. We also build confidence in the C/C++ model in its own terms, fixing some omissions and contributing to the C standards committee adoption of the C++11 concurrency model.
    No preview · Article · Jun 2012 · ACM SIGPLAN Notices
  • Jade Alglave · Luc Maranget · Susmit Sarkar · Peter Sewell
    ABSTRACT: We present a class of relaxed memory models, defined in Coq, parameterised by the chosen permitted local reorderings of reads and writes, and by the visibility of inter- and intra-processor communications through memory (e.g. store atomicity relaxation). We prove results on the required behaviour and placement of memory fences to restore a given model (such as Sequential Consistency) from a weaker one. Based on this class of models we develop a tool, diy, that systematically and automatically generates and runs litmus tests. These tests can be used to explore the behaviour of processor implementations and the behaviour of models, and hence to compare the two against each other. We detail the results of experiments on Power and a model we base on them.
    Preview · Article · Apr 2012 · Formal Methods in System Design
  • ABSTRACT: Modern multi-core microprocessors implement weak memory consistency models; programming for these architectures is a challenge. This paper solves a problem open for ten years, and originally posed by Rinard: we identify sufficient conditions for a data flow analysis to be sound w.r.t. weak memory models. We first identify a class of analyses that are sound, and provide a formal proof of soundness at the level of trace semantics. Then we discuss how analyses unsound with respect to weak memory models can be repaired via a fixed point iteration, and provide experimental data on the runtime overhead of this method.
    Full-text · Conference Paper · Dec 2011
  • Jade Alglave · Luc Maranget
    ABSTRACT: Concurrent programs running on weak memory models exhibit relaxed behaviours, making them hard to understand and to debug. To use standard verification techniques on such programs, we can force them to behave as if running on a Sequentially Consistent (SC) model. Thus, we examine how to constrain the behaviour of such programs via synchronisation to ensure what we call their stability, i.e. that they behave as if they were running on a stronger model than the actual one, e.g. SC. First, we define sufficient conditions ensuring stability to a program, and show that Power’s locks and read-modify-write primitives meet them. Second, we minimise the amount of required synchronisation by characterising which parts of a given execution should be synchronised. Third, we characterise the programs stable from a weak architecture to SC. Finally, we present our offence tool which places either lock-based or lock-free synchronisation in an x86 or Power program to ensure its stability.
    Preview · Conference Paper · Jul 2011
  • Jade Alglave · Assia Mahboubi
    ABSTRACT: This paper describes Coq libraries devoted to the semantics of relaxed memory models. These libraries formalise a framework which covers a large class of industrial models. Implementing this framework inside a proof assistant has significantly helped improve its design and craft the most concise and relevant specifications. Similarly, the use of a proof assistant has been instrumental in the study of the semantics of synchronisation primitives, which we illustrate by the formal proof of a barrier placement theorem. We explain the choices we made to re-design our Coq libraries, and in particular what we gained from adopting a small-scale reflection methodology.
    Preview · Article · Jun 2011
  • ABSTRACT: Exploiting today's multiprocessors requires high-performance and correct concurrent systems code (optimising compilers, language runtimes, OS kernels, etc.), which in turn requires a good understanding of the observable processor behaviour that can be relied on. Unfortunately this critical hardware/software interface is not at all clear for several current multiprocessors. In this paper we characterise the behaviour of IBM POWER multiprocessors, which have a subtle and highly relaxed memory model (ARM multiprocessors have a very similar architecture in this respect). We have conducted extensive experiments on several generations of processors: POWER G5, 5, 6, and 7. Based on these, on published details of the microarchitectures, and on discussions with IBM staff, we give an abstract-machine semantics that abstracts from most of the implementation detail but explains the behaviour of a range of subtle examples. Our semantics is explained in prose but defined in rigorous machine-processed mathematics; we also confirm that it captures the observable processor behaviour, or the architectural intent, for our examples with an executable checker. While not officially sanctioned by the vendor, we believe that this model gives a reasonable basis for reasoning about current POWER multiprocessors. Our work should bring new clarity to concurrent systems programming for these architectures, and is a necessary precondition for any analysis or verification. It should also inform the design of languages such as C and C++, where the language memory model is constrained by what can be efficiently compiled to such multiprocessors.
    Preview · Conference Paper · Jun 2011
  • Jade Alglave · Luc Maranget · Susmit Sarkar · Peter Sewell
    ABSTRACT: Shared memory multiprocessors typically expose subtle, poorly understood and poorly specified relaxed-memory semantics to programmers. To understand them, and to develop formal models to use in program verification, we find it essential to take an empirical approach, testing what results parallel programs can actually produce when executed on the hardware. We describe a key ingredient of our approach, our litmus tool, which takes small ‘litmus test’ programs and runs them for many iterations to find interesting behaviour. It embodies various techniques for making such interesting behaviour appear more frequently.
    Preview · Conference Paper · Mar 2011

Publication Stats

306 Citations
6.59 Total Impact Points

Institutions

  • 2013
    • University College London
      London, England, United Kingdom
  • 2012
    • Queen Mary, University of London
      London, England, United Kingdom
  • 2011-2012
    • University of Oxford
      • Department of Computer Science
      Oxford, England, United Kingdom
  • 2010
    • National Institute for Research in Computer Science and Control
      Le Chesnay, Île-de-France, France