Peter Sewell's research while affiliated with University of Cambridge and other places

Publications (151)

Article
Despite significant progress in the verification of hypervisors, operating systems, and compilers, and in verification tooling, there exists a wide gap between the approaches used in verification projects and conventional development of systems software. We see two main challenges in bringing these closer together: verification handling the complex...
Preprint
Full-text available
Virtual memory is an essential mechanism for enforcing security boundaries, but its relaxed-memory concurrency semantics has not previously been investigated in detail. The concurrent systems code managing virtual memory has been left on an entirely informal basis, and OS and hypervisor verification has had to make major simplifying assumptions. We...
Article
Systems code often requires fine-grained control over memory layout and pointers, expressed using low-level ( e.g. , bitwise) operations on pointer values. Since these operations go beyond what basic pointer arithmetic in C allows, they are performed with the help of integer-pointer casts . Prior work has explored increasingly realistic memory obje...
Chapter
Full-text available
Memory safety bugs continue to be a major source of security vulnerabilities in our critical infrastructure. The CHERI project has proposed extending conventional architectures with hardware-supported capabilities to enable fine-grained memory protection and scalable compartmentalisation, allowing historically memory-unsafe C and C++ to be adapted...
Chapter
Full-text available
Virtual memory is an essential mechanism for enforcing security boundaries, but its relaxed-memory concurrency semantics has not previously been investigated in detail. The concurrent systems code managing virtual memory has been left on an entirely informal basis, and OS and hypervisor verification has had to make major simplifying assumptions. We...
Chapter
Architecture specifications such as Armv8-A and RISC-V are the ultimate foundation for software verification and the correctness criteria for hardware verification. They should define the allowed sequential and relaxed-memory concurrency behaviour of programs, but hitherto there has been no integration of full-scale instruction-set architecture (IS...
Book
Draft Technical Specification, proposed to the national bodies for standardization
Chapter
Full-text available
Computing relies on architecture specifications to decouple hardware and software development. Historically these have been prose documents, with all the problems that entails, but research over the last ten years has developed rigorous and executable-as-test-oracle specifications of mainstream architecture instruction sets and “user-mode” concurre...
Chapter
Full-text available
C remains central to our infrastructure, making verification of C code an essential and much-researched topic, but the semantics of C is remarkably complex, and important aspects of it are still unsettled, leaving programmers and verification tool builders on shaky ground. This paper describes a tool, Cerberus-BMC, that for the first time provides...
Article
Total-system hardware and microarchitectural issues are becoming increasingly critical.
Conference Paper
The CHERI architecture allows pointers to be implemented as capabilities (rather than integer virtual addresses) in a manner that is compatible with, and strengthens, the semantics of the C language. In addition to the spatial protections offered by conventional fat pointers, CHERI capabilities offer strong integrity, enforced provenance validity,...
Article
Full-text available
The semantics of pointers and memory objects in C has been a vexed question for many years. C values cannot be treated as either purely abstract or purely concrete entities: the language exposes their representations, but compiler optimisations rely on analyses that reason about provenance and initialisation status, not just runtime representations...
Article
Full-text available
Architecture specifications notionally define the fundamental interface between hardware and software: the envelope of allowed behaviour for processor implementations, and the basic assumptions for software development and verification. But in practice, they are typically prose and pseudocode documents, not rigorous or executable artifacts, leaving...
Article
Conventional computer engineering relies on test-and-debug development processes, with the behavior of common interfaces described (at best) with prose specification documents. But prose specifications cannot be used in test-and-debug development in any automated way, and prose is a poor medium for expressing complex (and loose) specifications. The...
Article
ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally non-multicopy-atomic: writes cou...
Conference Paper
Previous work on the semantics of relaxed shared-memory concurrency has only considered the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for softwar...
Article
Previous work on the semantics of relaxed shared-memory concurrency has only considered the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for softwar...
Article
Full-text available
Beneath the surface, software usually depends on complex linker behaviour to work as intended. Even linking hello_world.c is surprisingly involved, and systems software such as libc and operating system kernels rely on a host of linker features. But linking is poorly understood by working programmers and has largely been neglected by language resea...
Conference Paper
The C/C++11 concurrency model balances two goals: it is relaxed enough to be efficiently implementable and (leaving aside the ``thin-air'' problem) it is strong enough to give useful guarantees to programmers. It is mathematically precise and has been used in verification research and compiler testing. However, the model is expressed in an axiomati...
Article
The C/C++11 concurrency model balances two goals: it is relaxed enough to be efficiently implementable and (leaving aside the ``thin-air'' problem) it is strong enough to give useful guarantees to programmers. It is mathematically precise and has been used in verification research and compiler testing. However, the model is expressed in an axiomati...
Conference Paper
Full-text available
Beneath the surface, software usually depends on complex linker behaviour to work as intended. Even linking hello_world.c is surprisingly involved, and systems software such as libc and operating system kernels rely on a host of linker features. But linking is poorly understood by working programmers and has largely been neglected by language resea...
Article
C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-pro...
Conference Paper
C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-pro...
Conference Paper
Despite much research on concurrent programming languages, especially for Java and C/C++, we still do not have a satisfactory definition of their semantics, one that admits all common optimisations without also admitting undesired behaviour. Especially problematic are the "thin-Air" examples involving high-performance concurrent accesses, such as C...
Conference Paper
In this paper we develop semantics for key aspects of the ARMv8 multiprocessor architecture: the concurrency model and much of the 64-bit application-level instruction set (ISA). Our goal is to clarify what the range of architecturally allowable behaviour is, and thereby to support future work on formal verification, analysis, and testing of concur...
Conference Paper
In this paper we develop semantics for key aspects of the ARMv8 multiprocessor architecture: the concurrency model and much of the 64-bit application-level instruction set (ISA). Our goal is to clarify what the range of architecturally allowable behaviour is, and thereby to support future work on formal verification, analysis, and testing of concur...
Article
Despite much research on concurrent programming languages, especially for Java and C/C++, we still do not have a satisfactory definition of their semantics, one that admits all common optimisations without also admitting undesired behaviour. Especially problematic are the ``thin-air'' examples involving high-performance concurrent accesses, such as...
Article
Full-text available
Beneath the surface, software usually depends on com- plex linker behaviour to work as intended. Even linking hello _ world.c is surprisingly involved, and systems software such as libc and operating system kernels rely on a host of linker features. But linking is poorly understood by working programmers and has largely been neglected by language r...
Conference Paper
Full-text available
Weakly consistent multiprocessors such as ARM and IBM POWER have been with us for decades, but their subtle programmer-visible concurrency behaviour remains challenging, both to implement and to use; the traditional architecture documentation, with its mix of prose and pseudocode, leaves much unclear. In this paper we show how a precise architectur...
Conference Paper
Systems depend critically on the behaviour of file systems, but that behaviour differs in many details, both between implementations and between each implementation and the POSIX (and other) prose specifications. Building robust and portable software requires understanding these details and differences, but there is currently no good way to systema...
Conference Paper
Despite decades of research, we do not have a satisfactory concurrency semantics for any general-purpose programming language that aims to support concurrent systems code. The Java Memory Model has been shown to be unsound with respect to standard compiler optimisations, while the C/C++11 model is too weak, admitting undesirable thin-air executions...
Code
This tarball contains the source code of nqsb-TLS, as well as collected session traces, generated certificates, a virtual machine image of the Bitcoin Piñata etc.
Article
Full-text available
Recent years have seen remarkable successes in rigorous engineering: using mathematically rigorous semantic models (not just idealised calculi) of real-world processors, programming languages, protocols, and security mechanisms, for testing, proof, analysis, and design. Building these models is challenging, requiring experimentation, dialogue with...
Article
Full-text available
Recent years have seen remarkable successes in rigorous engineering: using mathematically rigorous semantic models (not just idealised calculi) of real-world processors, programming languages, protocols, and security mechanisms, for testing, proof, analysis, and design. Building these models is challenging, requiring experimentation, dialogue with...
Article
Software and other digital artifacts are amongst the most valuable contributions of computer science. Yet our conferences treat these mostly as second-class artifacts---especially conferences in the software sciences, which ought to know better. This ...
Article
Full-text available
In this article, we consider the semantic design and verified compilation of a C-like programming language for concurrent shared-memory computation on x86 multiprocessors. The design of such a language is made surprisingly subtle by several factors: the relaxed-memory behavior of the hardware, the effects of compiler optimization on concurrent code...
Conference Paper
Even with the assistance of computer tools, the formalized de-scription and verification of research-level mathematics remains a daunting task, not least because of the talent with which mathema-ticians combine diverse theories to achieve their ends. ...
Article
We rely on a computational infrastructure that is a densely interwined mass of software and hardware: programming languages, network protocols, operating systems, and processors. It has accumulated great complexity, from a combination of engineering design decisions, contingent historical choices, and sheer scale, yet it is defined at best by prose...
Conference Paper
Concurrency theory and real-world multiprocessors have developed in parallel for the last 50 years, from their beginnings in the mid 1960s. Both have been very productive: concurrency theory has given us a host of models, calculi, and proof techniques, while engineered multiprocessors are now ubiquitous, from 2-8 core smartphones and laptops throug...
Conference Paper
The growing complexity of hardware optimizations employed by multiprocessors leads to subtle distinctions among allowed and disallowed behaviors, posing challenges in specifying their memory models formally and accurately, and in understanding and analyzing the behavior of concurrent software. This complexity is particularly evident in the IBM® Pow...
Article
Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM®, POWER®, ARM, or C/C++, it...
Article
Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM®, POWER®, ARM, or C/C++, it...
Article
We present a class of relaxed memory models, defined in Coq, parameterised by the chosen permitted local reorderings of reads and writes, and by the visibility of inter- and intra-processor communications through memory (e.g. store atomicity relaxation). We prove results on the required behaviour and placement of memory fences to restore a given mo...
Conference Paper
The upcoming C and C++ revised standards add concurrency to the languages, for the first time, in the form of a subtle *relaxed memory model* (the *C++11 model*). This aims to permit compiler optimisation and to accommodate the differing relaxed-memory behaviours of mainstream multiprocessors, combining simple semantics for most code with high-perf...
Conference Paper
The upcoming C and C++ revised standards add concurrency to the languages, for the first time, in the form of a subtle *relaxed memory model* (the *C++11 model*). This aims to permit compiler optimisation and to accommodate the differing relaxed-memory behaviours of mainstream multiprocessors, combining simple semantics for most code with high-perf...
Conference Paper
Many ITP developments exist in the context of a single prover, and are dominated by proof effort. In contrast, when applying rigorous semantic techniques to realistic computer systems, engineering the definitions becomes a major activity in its own right. Proof is then only one task among many: testing, simulation, communication, community review,...
Conference Paper
Exploiting today's multiprocessors requires high-performance and correct concurrent systems code (optimising compilers, language runtimes, OS kernels, etc.), which in turn requires a good understanding of the observable processor behaviour that can be relied on. Unfortunately this critical hardware/software interface is not at all clear for several...
Conference Paper
Shared memory multiprocessors typically expose subtle, poorly understood and poorly specified relaxed-memory semantics to programmers. To understand them, and to develop formal models to use in program verification, we find it essential to take an empirical approach, testing what results parallel programs can actually produce when executed on the h...
Conference Paper
Full-text available
Shared-memory concurrency in C and C++ is pervasive in systems programming, but has long been poorly defined. This motivated an ongoing shared effort by the standards committees to specify concurrent behaviour in the next versions of both languages. They aim to provide strong guarantees for race-free programs, together with new (but subtle) relaxed...
Conference Paper
The formal verification of programs has progressed tremendously in the last decade. In this talk, I review some of the obstacles that [6, 8, 15, 18] remain to be lifted before source-level verification tools can be taken really seriously in the critical ...
Article
Full-text available
Shared-memory concurrency in C and C++ is pervasive in systems programming, but has long been poorly defined. This motivated an ongoing shared effort by the standards committees to specify concurrent behaviour in the next versions of both languages. They aim to provide strong guarantees for race-free programs, together with new (but subtle) relaxed...
Conference Paper
Full-text available
In this paper, we consider the semantic design and verified compilation of a C-like programming language for concurrent shared-memory computation above x86 multiprocessors. The design of such a language is made surprisingly subtle by several factors: the relaxed-memory behaviour of the hardware, the effects of compiler optimisation on concurrent co...
Article
Full-text available
In this paper, we consider the semantic design and verified compilation of a C-like programming language for concurrent shared-memory computation above x86 multiprocessors. The design of such a language is made surprisingly subtle by several factors: the relaxed-memory behaviour of the hardware, the effects of compiler optimisation on concurrent co...
Conference Paper
We present a class of relaxed memory models, defined in Coq, parameterised by the chosen permitted local reorderings of reads and writes, and the visibility of inter- and intra-processor communications through memory (e.g. store atomicity relaxation) We prove results on the required behaviour and placement of memory fences to restore a given model...
Conference Paper
Multiprocessors are now ubiquitous. They provide an abstraction of shared memory, accessible by concurrently executing threads, which supports a wide range of software. However, exactly what this key abstraction is —what the hardware designers implement, and what programmers can depend on— is surprisingly elusive. In 1979, when articulating the not...
Article
Multiprocessors are now ubiquitous. They provide an abstraction of shared memory, accessible by concurrently executing threads, which supports a wide range of software. However, exactly what this key abstraction is -- what the hardware designers implement, and what programmers can depend on -- is surprisingly elusive. In 1979, when articulating the...
Article
Exploiting the multiprocessors that have recently become ubiquitous requires high-performance and reliable concurrent systems code, for concurrent data structures, operating system kernels, synchronization libraries, compilers, and so on. However, concurrent programming, which is always challenging, is made much more so by two problems. First, real...
Article
Full-text available
Mobile computation, in which executing computations can move from one physical computing device to another, is a recurring theme: from OS process migration, to language-level mobility, to virtual machine migration. This article reports on the design, implementation, and verification of overlay networks to support reliable communication between migr...
Article
Full-text available
Semantic definitions of full-scale programming languages are rarely given, despite the many potential benefits. Partly this is because the available metalanguages for expressing semantics – usually either for informal mathematics or the formal mathematics of a proof assistant – make it much harder than necessary to work with large definitions. We p...
Article
We develop a rigorous semantics for Power and ARM multiprocessor programs, including their relaxed memory model and the behaviour of reasonable fragments of their instruction sets. The semantics is mechanised in the HOL proof assistant. This should provide a good basis for informal reasoning and formal verification of low-level code for these weakl...
Conference Paper
Real multiprocessors do not provide the sequentially consis- tent memory that is assumed by most work on semantics and verifica- tion. Instead, they have relaxed memory models, typically described in ambiguous prose, which lead to widespread confusion. These are prime targets for mechanized formalization. In previous work we produced a rig- orous x...
Conference Paper
Multiprocessors are now dominant, but real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, usually described only in ambiguous prose, leading to widespread confusion. We develop a rigorous and accurate semantics f...
Conference Paper
We develop a rigorous semantics for Power and ARM multiprocessor programs, including their relaxed memory model and the behaviour of reasonable fragments of their instruction sets. The semantics is mechanised in the HOL proof assistant. This should provide a good basis for informal reasoning and formal verification of low-level code for these weakl...
Article
Representing binding structures is a central challenge in mechanizing the theory of programming languages. Two facets of the problem should be distinguished: on the one hand, one needs a specification language for express the complex binding structures found in realistic programming languages, while on the other hand one needs an implementation tec...
Article
Real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have relaxed memory models, typically described in ambiguous prose, which lead to widespread confusion. These are prime targets for mechanized formalization. In previous work we produced a rigorous x86-CC...
Article
Full-text available
Abstract Most programming languages adopt static binding, but for distributed programming an exclusive reliance on static binding,is too restrictive: dynamic,binding is required in various guises, for example, when a marshalled value is received from the network, containing identifiers that must be rebound to local resources. Typically, it is provi...
Conference Paper
Despite more then 30 years of research on protocol specification, the major protocols deployed in the Internet, such as TCP, are described only in informal prose RFCs and executable code. In part this is because the scale and complexity of these protocols makes them challenging targets for formalization. In this paper we show how these difficulties...
Conference Paper
Java has no module system. Its packages only subdivide the class name space, allowing only a very limited form of component-level information hiding and reuse. Two Java Community Processes have started addressing this problem: one describes the runtime system and has reached an early draft stage, while the other considers the developer's view and o...
Article
It is rare to give a semantic definition of a full-scale programming language, despite the many potential benefits. Partly this is because the available metalanguages for expressing semantics - usually either L a T E X for informal mathematics, or the formal mathematics of a proof assistant - make it much harder than necessary to work with large de...
Conference Paper
Java has no module system. Its packages only subdivide the class name space, allowing only a very limited form of component-level information hiding and reuse. Two Java Community Processes have started addressing this problem: one describes the runtime system and has reached an early draft stage, while the other considers the developer's view and o...
Article
Existing languages provide good support for typeful programming of standalone programs. In a distributed system, however, there may be interaction between multiple instances of many distinct programs, sharing some (but not necessarily all) of their module structure, and with some instances rebuilt with new versions of certain modules as time goes o...
Conference Paper
Abstractó This paper reports on an experiment in network protocol design: we use novel rigorous techniques in the design process of a new protocol, in a close collaboration between systems and theory researchers. The protocol is a Media Access Control (MAC) protocol for the SWIFT optical network, which uses optical switching and wavelength striping...
Conference Paper
Existing ML-like languages guarantee type-safety, ensuring mem- ory safety and protecting the invariants of abstract types, but only within single executions of single programs. Distributed program- ming is becoming ever more important, and should benet even more from such guarantees. In previous work on theoretical cal- culi and the Acute prototyp...
Article
Full-text available
This note reflects, from a Process Calculus point of view, on lessons learned during research modelling the real-world TCP and UDP network protocols.
Chapter
The dynamics of many calculi can be most clearly defined by a reduction semantics. To work with a calculus, however, an understanding of operational congruences is fundamental; these can often be given tractable definitions or characterisations using a labelled transition semantics. This paper considers calculi with arbitrary reduction semantics of...
Chapter
The concurrency theory literature contains many proposals for models of process algebras. We consider an example application of the -calculus, the programming language Pict of Pierce and Turner, primarily in order to see how far it is possible to argue, from facts about the application, that some model is the most appropriate. We discuss informally...
Article
The TCP/IP protocols and Sockets API underlie much of modern computation, but their semantics have historically been very complex and ill-defined. The real standard is the de facto one of the common implementations, including, for example, the 15,000--20,000 lines of C in the BSD implementation. Dealing rigorously with the behaviour of such bodies...
Conference Paper
The TCP/IP protocols and Sockets API underlie much of mod- ern computation, but their semantics have historically been very complex and ill-defined. The real standard is the de facto one of the common implementations, including, for example, the 15 000- 20 000 lines of C in the BSD implementation. Dealing rigorously with the behaviour of such bodie...
Conference Paper
How close are we to a world in which mechanically verified software is commonplace? A world in which theorem proving technology is used routinely by both software developers and programming language researchers alike? One crucial step towards achieving these goals is mech- anized reasoning about language metatheory. The time has come to bring toget...
Conference Paper
Full-text available
Network protocols are hard to implement correctly. Despite the existence of RFCs and other standards, implementations often have subtle differences and bugs. One reason for this is that the specications are typically informal, and hence inevitably contain ambiguities. Conformance testing against such specications is challenging. In this paper we pr...
Conference Paper
Existing languages provide good support for typeful programming of standalone programs. In a distributed system, however, there may be interaction between multiple instances of many distinct programs, sharing some (but not necessarily all) of their module structure, and with some instances rebuilt with new versions of certain modules as time goes o...
Conference Paper
How close are we to a world where every paper on programming languages is accompanied by an electronic appendix with machine-checked proofs? We propose an initial set of benchmarks for measuring progress in this area. Based on the metatheory of System F< :, a typed lambda-calculus with second-order polymorphism, subtyping, and records, these benchm...
Article
In this paper we consider low-latency connection-based anonymity systems which can be used for applications like web browsing or SSH. Although several such systems have been designed and built, their anonymity has so far not been adequately evaluated. We analyse the anonymity of connection-based systems against global passive adversaries. We give a...
Conference Paper
Dynamic software updates can be used to fix bugs or add features to a running program without downtime. Essential for some applications and convenient for others, low-level dynamic updating has been used for many years. Perhaps surprisingly, there is little high-level understanding or language support to help programmers write dynamic updates effec...
Conference Paper
Existing languages provide good support for typeful programming of standalone programs. In a distributed system, however, there may be interaction between multiple instances of many distinct programs, sharing some (but not necessarily all) of their module structure, and with some instances rebuilt with new versions of certain modules as time goes o...
Conference Paper
Full-text available
Dynamic software updates can be used to fix bugs or add features to a running program without downtime. Essential for some applications and convenient for others, low-level dynamic updating has been used for many years. Perhaps surprisingly, there is little high-level understanding or language support to help programmers write dynamic updates effec...
Conference Paper
We study the specification of access control policy in large-scale distributed systems. We present Cassandra, a language and system for expressing policy, and the results of a substantial case study, a security policy for a national electronic health record system, based on the requirements for the ongoing UK National Health Service procurement exe...
Conference Paper
We study the specification of access control policy in large-scale distributed systems. Our work on real-world policies has shown that standard policy idioms such as role hierarchy or role delegation occur in practice in many subtle variants. A policy specification language should therefore be able to express this variety of features smoothly, rath...
Article
We study the specification of access control policy in large-scale distributed systems. We present Cassandra, a language and system for expressing policy, and the results of a substantial case study, a security policy for a national Electronic Health Record system, based on the requirements for the ongoing UK National Health Service procurement exe...
Conference Paper
In this paper we consider low latency connection-based anonymity systems which can be used for applications like web browsing or SSH. Although several such systems have been designed and built, their anonymity has so far not been adequately evaluated. We analyse the anonymity of connection-based systems against passive adversaries. We give a preci...
Article
Full-text available
Most programming languages adopt static binding, but for distributed programming an exclusive reliance on static binding is too restrictive: dynamic binding is required in various guises, for example when a marshalled value is received from the network, containing identifiers that must be rebound to local resources. Typically it is provided only by...
Article
In this paper we consider low latency connection-based anonymity system which can be used for applications like web browsing or SSH. Although several such systems have been designed and built, their anonymity has so far not been adequately evaluated.
Article
We discuss the design of programming languages for distributed computation, focussing on support for type-safe marshalling of arbitrary language values. In particular: (1) unmarshalling can involve rebinding to local resources; (2) values of abstract types can be communicated, and a globally-coherent notion of type equality ensures that unmarshalli...