Peter Sewell's research while affiliated with University of Cambridge and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (151)
Despite significant progress in the verification of hypervisors, operating systems, and compilers, and in verification tooling, there exists a wide gap between the approaches used in verification projects and conventional development of systems software. We see two main challenges in bringing these closer together: verification handling the complex...
Virtual memory is an essential mechanism for enforcing security boundaries, but its relaxed-memory concurrency semantics has not previously been investigated in detail. The concurrent systems code managing virtual memory has been left on an entirely informal basis, and OS and hypervisor verification has had to make major simplifying assumptions. We...
Systems code often requires fine-grained control over memory layout and pointers, expressed using low-level ( e.g. , bitwise) operations on pointer values. Since these operations go beyond what basic pointer arithmetic in C allows, they are performed with the help of integer-pointer casts . Prior work has explored increasingly realistic memory obje...
Memory safety bugs continue to be a major source of security vulnerabilities in our critical infrastructure. The CHERI project has proposed extending conventional architectures with hardware-supported capabilities to enable fine-grained memory protection and scalable compartmentalisation, allowing historically memory-unsafe C and C++ to be adapted...
Virtual memory is an essential mechanism for enforcing security boundaries, but its relaxed-memory concurrency semantics has not previously been investigated in detail. The concurrent systems code managing virtual memory has been left on an entirely informal basis, and OS and hypervisor verification has had to make major simplifying assumptions.
We...
Architecture specifications such as Armv8-A and RISC-V are the ultimate foundation for software verification and the correctness criteria for hardware verification. They should define the allowed sequential and relaxed-memory concurrency behaviour of programs, but hitherto there has been no integration of full-scale instruction-set architecture (IS...
Draft Technical Specification, proposed to the national bodies for standardization
Computing relies on architecture specifications to decouple hardware and software development. Historically these have been prose documents, with all the problems that entails, but research over the last ten years has developed rigorous and executable-as-test-oracle specifications of mainstream architecture instruction sets and “user-mode” concurre...
C remains central to our infrastructure, making verification of C code an essential and much-researched topic, but the semantics of C is remarkably complex, and important aspects of it are still unsettled, leaving programmers and verification tool builders on shaky ground. This paper describes a tool, Cerberus-BMC, that for the first time provides...
Total-system hardware and microarchitectural issues are becoming increasingly critical.
The CHERI architecture allows pointers to be implemented as capabilities (rather than integer virtual addresses) in a manner that is compatible with, and strengthens, the semantics of the C language. In addition to the spatial protections offered by conventional fat pointers, CHERI capabilities offer strong integrity, enforced provenance validity,...
The semantics of pointers and memory objects in C has been a vexed question for many years. C values cannot be treated as either purely abstract or purely concrete entities: the language exposes their representations, but compiler optimisations rely on analyses that reason about provenance and initialisation status, not just runtime representations...
Architecture specifications notionally define the fundamental interface between hardware and software: the envelope of allowed behaviour for processor implementations, and the basic assumptions for software development and verification. But in practice, they are typically prose and pseudocode documents, not rigorous or executable artifacts, leaving...
Conventional computer engineering relies on test-and-debug development processes, with the behavior of common interfaces described (at best) with prose specification documents. But prose specifications cannot be used in test-and-debug development in any automated way, and prose is a poor medium for expressing complex (and loose) specifications.
The...
ARM has a relaxed memory model, previously specified in informal prose for ARMv7 and ARMv8. Over time, and partly due to work building formal semantics for ARM concurrency, it has become clear that some of the complexity of the model is not justified by the potential benefits. In particular, the model was originally non-multicopy-atomic: writes cou...
Previous work on the semantics of relaxed shared-memory concurrency has only considered the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for softwar...
Previous work on the semantics of relaxed shared-memory concurrency has only considered the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for softwar...
Beneath the surface, software usually depends on complex linker behaviour to work as intended. Even linking hello_world.c is surprisingly involved, and systems software such as libc and operating system kernels rely on a host of linker features. But linking is poorly understood by working programmers and has largely been neglected by language resea...
The C/C++11 concurrency model balances two goals: it is relaxed enough to be efficiently implementable and (leaving aside the ``thin-air'' problem) it is strong enough to give useful guarantees to programmers. It is mathematically precise and has been used in verification research and compiler testing. However, the model is expressed in an axiomati...
The C/C++11 concurrency model balances two goals: it is relaxed enough to be efficiently implementable and (leaving aside the ``thin-air'' problem) it is strong enough to give useful guarantees to programmers. It is mathematically precise and has been used in verification research and compiler testing. However, the model is expressed in an axiomati...
Beneath the surface, software usually depends on complex linker behaviour to work as intended. Even linking hello_world.c is surprisingly involved, and systems software such as libc and operating system kernels rely on a host of linker features. But linking is poorly understood by working programmers and has largely been neglected by language resea...
C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-pro...
C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-pro...
Despite much research on concurrent programming languages, especially for Java and C/C++, we still do not have a satisfactory definition of their semantics, one that admits all common optimisations without also admitting undesired behaviour. Especially problematic are the "thin-Air" examples involving high-performance concurrent accesses, such as C...
In this paper we develop semantics for key aspects of the ARMv8 multiprocessor architecture: the concurrency model and much of the 64-bit application-level instruction set (ISA). Our goal is to clarify what the range of architecturally allowable behaviour is, and thereby to support future work on formal verification, analysis, and testing of concur...
In this paper we develop semantics for key aspects of the ARMv8 multiprocessor architecture: the concurrency model and much of the 64-bit application-level instruction set (ISA). Our goal is to clarify what the range of architecturally allowable behaviour is, and thereby to support future work on formal verification, analysis, and testing of concur...
Despite much research on concurrent programming languages, especially for Java and C/C++, we still do not have a satisfactory definition of their semantics, one that admits all common optimisations without also admitting undesired behaviour. Especially problematic are the ``thin-air'' examples involving high-performance concurrent accesses, such as...
Beneath the surface, software usually depends on com- plex linker behaviour to work as intended. Even linking hello _ world.c is surprisingly involved, and systems software such as libc and operating system kernels rely on a host of linker features. But linking is poorly understood by working programmers and has largely been neglected by language r...
Weakly consistent multiprocessors such as ARM and IBM POWER have been with us for decades, but their subtle programmer-visible concurrency behaviour remains challenging, both to implement and to use; the traditional architecture documentation, with its mix of prose and pseudocode, leaves much unclear.
In this paper we show how a precise architectur...
Systems depend critically on the behaviour of file systems, but that behaviour differs in many details, both between implementations and between each implementation and the POSIX (and other) prose specifications. Building robust and portable software requires understanding these details and differences, but there is currently no good way to systema...
Despite decades of research, we do not have a satisfactory concurrency semantics for any general-purpose programming language that aims to support concurrent systems code. The Java Memory Model has been shown to be unsound with respect to standard compiler optimisations, while the C/C++11 model is too weak, admitting undesirable thin-air executions...
This tarball contains the source code of nqsb-TLS, as well as collected session traces, generated certificates, a virtual machine image of the Bitcoin Piñata etc.
Recent years have seen remarkable successes in rigorous engineering: using mathematically rigorous semantic models (not just idealised calculi) of real-world processors, programming languages, protocols, and security mechanisms, for testing, proof, analysis, and design. Building these models is challenging, requiring experimentation, dialogue with...
Recent years have seen remarkable successes in rigorous engineering: using mathematically rigorous semantic models (not just idealised calculi) of real-world processors, programming languages, protocols, and security mechanisms, for testing, proof, analysis, and design. Building these models is challenging, requiring experimentation, dialogue with...
Software and other digital artifacts are amongst the most valuable contributions of computer science. Yet our conferences treat these mostly as second-class artifacts---especially conferences in the software sciences, which ought to know better. This ...
In this article, we consider the semantic design and verified compilation of a C-like programming language for concurrent shared-memory computation on x86 multiprocessors. The design of such a language is made surprisingly subtle by several factors: the relaxed-memory behavior of the hardware, the effects of compiler optimization on concurrent code...
Even with the assistance of computer tools, the formalized de-scription and verification of research-level mathematics remains a daunting task, not least because of the talent with which mathema-ticians combine diverse theories to achieve their ends. ...
We rely on a computational infrastructure that is a densely interwined mass of software and hardware: programming languages, network protocols, operating systems, and processors. It has accumulated great complexity, from a combination of engineering design decisions, contingent historical choices, and sheer scale, yet it is defined at best by prose...
Concurrency theory and real-world multiprocessors have developed in parallel for the last 50 years, from their beginnings in the mid 1960s. Both have been very productive: concurrency theory has given us a host of models, calculi, and proof techniques, while engineered multiprocessors are now ubiquitous, from 2-8 core smartphones and laptops throug...
The growing complexity of hardware optimizations employed by multiprocessors leads to subtle distinctions among allowed and disallowed behaviors, posing challenges in specifying their memory models formally and accurately, and in understanding and analyzing the behavior of concurrent software. This complexity is particularly evident in the IBM® Pow...
Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM®, POWER®, ARM, or C/C++, it...
Shared memory concurrency relies on synchronisation primitives: compare-and-swap, load-reserve/store-conditional (aka LL/SC), language-level mutexes, and so on. In a sequentially consistent setting, or even in the TSO setting of x86 and Sparc, these have well-understood semantics. But in the very relaxed settings of IBM®, POWER®, ARM, or C/C++, it...
We present a class of relaxed memory models, defined in Coq, parameterised by the chosen permitted local reorderings of reads and writes, and by the visibility of inter- and intra-processor communications through memory (e.g. store atomicity relaxation). We prove results on the required behaviour and placement of memory fences to restore a given mo...
The upcoming C and C++ revised standards add concurrency to the languages, for the first time, in the form of a subtle *relaxed memory model* (the *C++11 model*). This aims to permit compiler optimisation and to accommodate the differing relaxed-memory behaviours of mainstream multiprocessors, combining simple semantics for most code with high-perf...
The upcoming C and C++ revised standards add concurrency to the languages, for the first time, in the form of a subtle *relaxed memory model* (the *C++11 model*). This aims to permit compiler optimisation and to accommodate the differing relaxed-memory behaviours of mainstream multiprocessors, combining simple semantics for most code with high-perf...
Many ITP developments exist in the context of a single prover, and are dominated by proof effort. In contrast, when applying rigorous semantic techniques to realistic computer systems, engineering the definitions becomes a major activity in its own right. Proof is then only one task among many: testing, simulation, communication, community review,...
Exploiting today's multiprocessors requires high-performance and correct concurrent systems code (optimising compilers, language runtimes, OS kernels, etc.), which in turn requires a good understanding of the observable processor behaviour that can be relied on. Unfortunately this critical hardware/software interface is not at all clear for several...
Shared memory multiprocessors typically expose subtle, poorly understood and poorly specified relaxed-memory semantics to
programmers. To understand them, and to develop formal models to use in program verification, we find it essential to take
an empirical approach, testing what results parallel programs can actually produce when executed on the h...
Shared-memory concurrency in C and C++ is pervasive in systems programming, but has long been poorly defined. This motivated an ongoing shared effort by the standards committees to specify concurrent behaviour in the next versions of both languages. They aim to provide strong guarantees for race-free programs, together with new (but subtle) relaxed...
The formal verification of programs has progressed tremendously in the last decade. In this talk, I review some of the obstacles that [6, 8, 15, 18] remain to be lifted before source-level verification tools can be taken really seriously in the critical ...
Shared-memory concurrency in C and C++ is pervasive in systems programming, but has long been poorly defined. This motivated an ongoing shared effort by the standards committees to specify concurrent behaviour in the next versions of both languages. They aim to provide strong guarantees for race-free programs, together with new (but subtle) relaxed...
In this paper, we consider the semantic design and verified compilation of a C-like programming language for concurrent shared-memory computation above x86 multiprocessors. The design of such a language is made surprisingly subtle by several factors: the relaxed-memory behaviour of the hardware, the effects of compiler optimisation on concurrent co...
In this paper, we consider the semantic design and verified compilation of a C-like programming language for concurrent shared-memory computation above x86 multiprocessors. The design of such a language is made surprisingly subtle by several factors: the relaxed-memory behaviour of the hardware, the effects of compiler optimisation on concurrent co...
We present a class of relaxed memory models, defined in Coq, parameterised by the chosen permitted local reorderings of reads and writes, and the visibility of inter- and intra-processor communications through memory (e.g. store atomicity relaxation) We prove results on the required behaviour and placement of memory fences to restore a given model...
Multiprocessors are now ubiquitous. They provide an abstraction of shared memory, accessible by concurrently executing threads, which supports a wide range of software. However, exactly what this key abstraction is —what the hardware designers implement, and what programmers can depend on— is surprisingly elusive. In 1979, when articulating the not...
Multiprocessors are now ubiquitous. They provide an abstraction of shared memory, accessible by concurrently executing threads, which supports a wide range of software. However, exactly what this key abstraction is -- what the hardware designers implement, and what programmers can depend on -- is surprisingly elusive. In 1979, when articulating the...
Exploiting the multiprocessors that have recently become ubiquitous requires high-performance and reliable concurrent systems code, for concurrent data structures, operating system kernels, synchronization libraries, compilers, and so on. However, concurrent programming, which is always challenging, is made much more so by two problems. First, real...
Mobile computation, in which executing computations can move from one physical computing device to another, is a recurring theme: from OS process migration, to language-level mobility, to virtual machine migration. This article reports on the design, implementation, and verification of overlay networks to support reliable communication between migr...
Semantic definitions of full-scale programming languages are rarely given, despite the many potential benefits. Partly this is because the available metalanguages for expressing semantics – usually either for informal mathematics or the formal mathematics of a proof assistant – make it much harder than necessary to work with large definitions. We p...
We develop a rigorous semantics for Power and ARM multiprocessor programs, including their relaxed memory model and the behaviour of reasonable fragments of their instruction sets. The semantics is mechanised in the HOL proof assistant. This should provide a good basis for informal reasoning and formal verification of low-level code for these weakl...
Real multiprocessors do not provide the sequentially consis- tent memory that is assumed by most work on semantics and verifica- tion. Instead, they have relaxed memory models, typically described in ambiguous prose, which lead to widespread confusion. These are prime targets for mechanized formalization. In previous work we produced a rig- orous x...
Multiprocessors are now dominant, but real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, usually described only in ambiguous prose, leading to widespread confusion.
We develop a rigorous and accurate semantics f...
We develop a rigorous semantics for Power and ARM multiprocessor programs, including their relaxed memory model and the behaviour of reasonable fragments of their instruction sets. The semantics is mechanised in the HOL proof assistant. This should provide a good basis for informal reasoning and formal verification of low-level code for these weakl...
Representing binding structures is a central challenge in mechanizing the theory of programming languages. Two facets of the problem should be distinguished: on the one hand, one needs a specification language for express the complex binding structures found in realistic programming languages, while on the other hand one needs an implementation tec...
Real multiprocessors do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have relaxed memory models, typically described in ambiguous prose, which lead to widespread confusion. These are prime targets for mechanized formalization. In previous work we produced a rigorous x86-CC...
Abstract Most programming languages adopt static binding, but for distributed programming an exclusive reliance on static binding,is too restrictive: dynamic,binding is required in various guises, for example, when a marshalled value is received from the network, containing identifiers that must be rebound to local resources. Typically, it is provi...
Despite more then 30 years of research on protocol specification, the major protocols deployed in the Internet, such as TCP, are described only in informal prose RFCs and executable code. In part this is because the scale and complexity of these protocols makes them challenging targets for formalization.
In this paper we show how these difficulties...
Java has no module system. Its packages only subdivide the class name space, allowing only a very limited form of component-level information hiding and reuse. Two Java Community Processes have started addressing this problem: one describes the runtime system and has reached an early draft stage, while the other considers the developer's view and o...
It is rare to give a semantic definition of a full-scale programming language, despite the many potential benefits. Partly this is because the available metalanguages for expressing semantics - usually either L a T E X for informal mathematics, or the formal mathematics of a proof assistant - make it much harder than necessary to work with large de...
Java has no module system. Its packages only subdivide the class name space, allowing only a very limited form of component-level information hiding and reuse. Two Java Community Processes have started addressing this problem: one describes the runtime system and has reached an early draft stage, while the other considers the developer's view and o...
Existing languages provide good support for typeful programming of standalone programs. In a distributed system, however, there may be interaction between multiple instances of many distinct programs, sharing some (but not necessarily all) of their module structure, and with some instances rebuilt with new versions of certain modules as time goes o...
Abstractó This paper reports on an experiment in network protocol design: we use novel rigorous techniques in the design process of a new protocol, in a close collaboration between systems and theory researchers. The protocol is a Media Access Control (MAC) protocol for the SWIFT optical network, which uses optical switching and wavelength striping...
Existing ML-like languages guarantee type-safety, ensuring mem- ory safety and protecting the invariants of abstract types, but only within single executions of single programs. Distributed program- ming is becoming ever more important, and should benet even more from such guarantees. In previous work on theoretical cal- culi and the Acute prototyp...
This note reflects, from a Process Calculus point of view, on lessons learned during research modelling the real-world TCP and UDP network protocols.
The dynamics of many calculi can be most clearly defined by a reduction semantics. To work with a calculus, however, an understanding
of operational congruences is fundamental; these can often be given tractable definitions or characterisations using a labelled
transition semantics. This paper considers calculi with arbitrary reduction semantics of...
The concurrency theory literature contains many proposals for models of process algebras. We consider an example application of the -calculus, the programming language Pict of Pierce and Turner, primarily in order to see how far it is possible to argue, from facts about the application, that some model is the most appropriate. We discuss informally...
The TCP/IP protocols and Sockets API underlie much of modern computation, but their semantics have historically been very complex and ill-defined. The real standard is the de facto one of the common implementations, including, for example, the 15,000--20,000 lines of C in the BSD implementation. Dealing rigorously with the behaviour of such bodies...
The TCP/IP protocols and Sockets API underlie much of mod- ern computation, but their semantics have historically been very complex and ill-defined. The real standard is the de facto one of the common implementations, including, for example, the 15 000- 20 000 lines of C in the BSD implementation. Dealing rigorously with the behaviour of such bodie...
How close are we to a world in which mechanically verified software is commonplace? A world in which theorem proving technology is used routinely by both software developers and programming language researchers alike? One crucial step towards achieving these goals is mech- anized reasoning about language metatheory. The time has come to bring toget...
Network protocols are hard to implement correctly. Despite the existence of RFCs and other standards, implementations often have subtle differences and bugs. One reason for this is that the specications are typically informal, and hence inevitably contain ambiguities. Conformance testing against such specications is challenging. In this paper we pr...
Existing languages provide good support for typeful programming of standalone programs. In a distributed system, however, there may be interaction between multiple instances of many distinct programs, sharing some (but not necessarily all) of their module structure, and with some instances rebuilt with new versions of certain modules as time goes o...
How close are we to a world where every paper on programming languages is accompanied by an electronic appendix with machine-checked proofs?
We propose an initial set of benchmarks for measuring progress in this area. Based on the metatheory of System F< :, a typed lambda-calculus with second-order polymorphism, subtyping, and records, these benchm...
In this paper we consider low-latency connection-based anonymity systems which can be used for applications like web browsing or SSH. Although several such systems have been designed and built, their anonymity has so far not been adequately evaluated.
We analyse the anonymity of connection-based systems against global passive adversaries. We give a...
Dynamic software updates can be used to fix bugs or add features to a running program without downtime. Essential for some applications and convenient for others, low-level dynamic updating has been used for many years. Perhaps surprisingly, there is little high-level understanding or language support to help programmers write dynamic updates effec...
Existing languages provide good support for typeful programming of standalone programs. In a distributed system, however, there may be interaction between multiple instances of many distinct programs, sharing some (but not necessarily all) of their module structure, and with some instances rebuilt with new versions of certain modules as time goes o...
Dynamic software updates can be used to fix bugs or add features to a running program without downtime. Essential for some applications and convenient for others, low-level dynamic updating has been used for many years. Perhaps surprisingly, there is little high-level understanding or language support to help programmers write dynamic updates effec...
We study the specification of access control policy in large-scale distributed systems. We present Cassandra, a language and system for expressing policy, and the results of a substantial case study, a security policy for a national electronic health record system, based on the requirements for the ongoing UK National Health Service procurement exe...
We study the specification of access control policy in large-scale distributed systems. Our work on real-world policies has shown that standard policy idioms such as role hierarchy or role delegation occur in practice in many subtle variants. A policy specification language should therefore be able to express this variety of features smoothly, rath...
We study the specification of access control policy in large-scale distributed systems. We present Cassandra, a language and system for expressing policy, and the results of a substantial case study, a security policy for a national Electronic Health Record system, based on the requirements for the ongoing UK National Health Service procurement exe...
In this paper we consider low latency connection-based anonymity systems which can be used for applications like web browsing
or SSH. Although several such systems have been designed and built, their anonymity has so far not been adequately evaluated.
We analyse the anonymity of connection-based systems against passive adversaries. We give a preci...
Most programming languages adopt static binding, but for distributed programming an exclusive reliance on static binding is too restrictive: dynamic binding is required in various guises, for example when a marshalled value is received from the network, containing identifiers that must be rebound to local resources. Typically it is provided only by...
In this paper we consider low latency connection-based anonymity system which can be used for applications like web browsing or SSH. Although several such systems have been designed and built, their anonymity has so far not been adequately evaluated.
We discuss the design of programming languages for distributed computation, focussing on support for type-safe marshalling of arbitrary language values. In particular: (1) unmarshalling can involve rebinding to local resources; (2) values of abstract types can be communicated, and a globally-coherent notion of type equality ensures that unmarshalli...