# Weakening WebAssembly

CONRAD WATT, University of Cambridge, UK
ANDREAS ROSSBERG, Dfinity Stiftung, Germany
JEAN PICHON-PHARABOD, University of Cambridge, UK
WebAssembly (Wasm) is a safe, portable virtual instruction set that can be hosted in a wide range of
environments, such as a Web browser. It is a low-level language whose instructions are intended to compile
directly to bare hardware. While the initial version of Wasm focussed on single-threaded computation, a recent
proposal extends it with low-level support for multiple threads and atomic instructions for synchronised
access to shared memory. To support the correct compilation of concurrent programs, it is necessary to give a
suitable specification of its memory model.
Wasm’s language definition is based on a fully formalised specification that carefully avoids undefined
behaviour. We present a substantial extension to this semantics, incorporating a relaxed memory model, along
with a few proposed operational extensions. Wasm’s memory model is unique in that its linear address space
can be dynamically grown during execution, while all accesses are bounds-checked. This leads to the novel
problem of specifying how observations about the size of the memory can propagate between threads. We
argue that, considering desirable compilation schemes, we cannot give a sequentially consistent semantics to
memory growth.
We show that our model guarantees Sequential Consistency of Data-Race-Free programs (SC-DRF). However,
because Wasm is to run on the Web, we must also consider interoperability of its model with that of JavaScript.
We show, by counter-example, that JavaScript’s memory model is not SC-DRF, in contrast to what is claimed
in its specification. We propose two axiomatic conditions that should be added to the JavaScript model to
correct this difference.
We also describe a prototype SMT-based litmus tool which acts as an oracle for our axiomatic model,
visualising its behaviours, including memory resizing.
CCS Concepts: • Software and its engineering → Virtual machines; Assembly languages; Runtime environments; Just-in-time compilers.
Additional Key Words and Phrases: Virtual machines, programming languages, assembly languages, just-in-time compilers, type systems
ACM Reference Format:
Conrad Watt, Andreas Rossberg, and Jean Pichon-Pharabod. 2019. Weakening WebAssembly. Proc. ACM
Program. Lang. 3, OOPSLA, Article 133 (October 2019), 28 pages. https://doi.org/10.1145/3360559
Stiftung, Germany, rossberg@mpi-sws.org; Jean Pichon-Pharabod, University of Cambridge, UK, jean.pichon@cl.cam.ac.uk.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses,
contact the owner/author(s).
2475-1421/2019/10-ART133
https://doi.org/10.1145/3360559
Proc. ACM Program. Lang., Vol. 3, No. OOPSLA, Article 133. Publication date: October 2019.
1 INTRODUCTION
WebAssembly [Haas et al. 2017] (abbreviated Wasm) is a safe virtual instruction set architecture
that can be embedded into a range of host environments, such as Web browsers, content delivery
networks, or cloud computing platforms. It is represented as a byte code designed to be
just-in-time-compiled to native code on the target platform. Wasm is positioned to be an efficient compilation
target for low-level languages like C++. Wasm is unusual, especially for a technology in the context
of the Web, in that its normative specification is given as a fully formal semantics, informed by
the state of the art in programming language semantics, and any new feature must be given a full
Wasm in its original form was purely single-threaded; however, the ability to compile
multi-threaded code to Wasm is considered to be of great importance by the Wasm Working Group. In
order to fully support compilation of multi-threaded code to Wasm, it is necessary to extend it with
threads as well as a memory consistency model [Boehm 2005]. A memory model describes the way
in which architectural behaviour and compiler optimisations may combine to produce a relaxed,
or weak, observed semantics of concurrent memory operations, which is not consistent with a
naive sequential execution of the individual operations. Such relaxed memory models have become
the subject of intense study in recent years, at the level of both architectural [Alglave et al. 2009;
Flur et al. 2016; Higham et al. . 2012; Owens et al. 2009] and source-level
language [Batty et al. 2015; Dolan et al. 2018; Kang et al. 2017; Lochbihler 2018; Manson et al. 2005;
Nienhuis et al. 2016] semantics.
It often proves difficult to balance the various concerns of developers and implementers. An
intuitive, predictable source-level semantics often translates to an inefficient and error-prone
implementation. The underlying hardware may exhibit weak behaviours which must be carefully
mitigated by inserting memory barriers or other synchronisation primitives at compile time, and
the compiler must be careful not to perform an optimisation which is valid in single-threaded code,
but not when the effects of the code on memory may be observed by other threads [Boehm 2011;
Ševčík and Aspinall 2008]. Conversely, a high-level language may, in order to achieve maximum
performance, attempt to support all compiler optimisations and expose the union of all possible
weak behaviours in the underlying hardware. As can be seen with C++11 relaxed atomics, this
leads to a semantics which is almost impossible to reason about, or even circularly justified [Batty
et al. 2013; Boehm and Demsky 2014; McKenney et al. 2005].
The memory model for Wasm faces a unique design pressure compared to existing work: all
accesses are bounds-checked, and the bounds of Wasm’s memory address space may be resized at
runtime. The memory model must not only specify the values observed by concurrent accesses, but
also the out-of-bounds behaviour of accesses in the presence of concurrent memory size alterations.
Moreover, the need for a safe and portable semantics forbids any notion of undefined behaviour
sneaking in.
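To make the design pressure concrete, the following is a minimal single-threaded sketch of a growable, bounds-checked memory. The class and method names (LinearMemory, load8, Trap) are hypothetical and chosen for illustration only; the page size of 64 KiB and the convention that growing returns the previous size or -1 follow Wasm's memory.size and memory.grow instructions. The sketch deliberately says nothing about concurrency: the open question described above is what a load may observe when another thread grows the memory at the same time.

```python
PAGE_SIZE = 65536  # one Wasm page is 64 KiB


class Trap(Exception):
    """Stands in for a Wasm trap on an out-of-bounds access."""


class LinearMemory:
    """Single-threaded toy model of a growable, bounds-checked memory."""

    def __init__(self, pages, max_pages):
        self.data = bytearray(pages * PAGE_SIZE)
        self.max_pages = max_pages

    def size(self):
        # Current size in pages, in the spirit of memory.size.
        return len(self.data) // PAGE_SIZE

    def grow(self, delta):
        # In the spirit of memory.grow: return the old size, or -1 on failure.
        old = self.size()
        if old + delta > self.max_pages:
            return -1
        self.data.extend(bytes(delta * PAGE_SIZE))
        return old

    def load8(self, addr):
        # Every access is checked against the *current* bounds.
        if not 0 <= addr < len(self.data):
            raise Trap(f"out-of-bounds load at address {addr}")
        return self.data[addr]


mem = LinearMemory(pages=1, max_pages=2)
try:
    mem.load8(PAGE_SIZE)          # one byte past the end: traps
    trapped = False
except Trap:
    trapped = True
old_size = mem.grow(1)            # memory now spans two pages
value = mem.load8(PAGE_SIZE)      # the same address is now in bounds
```

The same address first traps and later succeeds, which is why a concurrent model has to specify how knowledge of the new size reaches other threads.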
Our contributions are as follows:
• We give the formal semantics of a concurrent extension of Wasm with threads, atomics,
  first-class references, and mutable tables.
• We present an axiomatic memory model for this extension which addresses the above
  challenges.
• We prove that our memory model is sequentially consistent for data-race-free programs
  (SC-DRF, see Section 6).
• We show by counter-example, verified by tool support, that JavaScript’s memory model is
  not SC-DRF.
• We present an SMT-based litmus tool which visualises our semantics, including growth, and
  use it to experimentally validate a correspondence between our model and JavaScript’s.
• We discuss compilation to and from Wasm, and the extent to which we can formally motivate
  correctness of compilation, given the current state of the art in relaxed memory research.
In developing the memory model, we extensively consulted with Wasm implementers and
the Wasm Working Group. The model we present here is part of the wider Wasm threading
specification [Smith 2019], which is still under active development and only allows concurrent
accesses to the linear memory. In this paper, we present a generalised design that already anticipates
concurrent use of functions, global variables, and tables, allowing both atomic and non-atomic
access to each, to illuminate the full extent of the design space.
2 BACKGROUND
Wasm is closely based on the instruction sets of real CPUs, but at the same time must be portable
across hardware. Hence its memory model follows the lineage of models for low-level programming
languages, rather than hardware models. To provide some necessary background, we survey the
most directly relevant ancestors, the C++ and the JavaScript memory models, in the following.
2.1 The C++ Memory Model
The C++ axiomatic memory model [Batty et al. 2011; Boehm and Adve 2008] is in many ways a
seminal work in the area of weak memory semantics. Objects in C++ may be declared as atomic,
and atomic operations on these objects may be parameterised with one of several consistency modes,
which form a hierarchy from relaxed to seqcst (sequentially consistent). Stronger consistency
modes provide more semantic guarantees, but require additional synchronisation in the compilation
scheme [Sewell and Sevcik 2016]. This allows expert programmers to unlock the full performance of
the underlying hardware by carefully designing their program to use weaker consistency modes,
with relaxed atomics being designed to compile to bare loads and stores [Sewell and Sevcik 2016].
However, relaxed atomics are so weak that various deficiencies have been found in their
semantics [Batty et al. 2013; Boehm and Demsky 2014; McKenney et al. 2005]. In particular, we have the
issue of out-of-thin-air reads, whose value may be circularly justified, as in the notorious example
where the value 42 may appear in a program that contains no constant numbers or arithmetic [Batty
and Sewell 2014]. It is not expected that real hardware or compiler optimisations could ever give
rise to such an astonishing execution, so clearly there is space for relaxed atomics to be given a
stronger semantics while still compiling to bare loads and stores. However, properly specifying such
a strengthening while still admitting all current compiler optimisations is an open problem [Batty
et al. 2015].
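As a point of comparison, the sketch below enumerates every sequentially consistent interleaving of a two-thread program with no constants or arithmetic, in which each thread copies one variable into the other. We assume this load-buffering shape is representative of the notorious example; the program encoding and the helper names are hypothetical. The enumeration covers only the sequentially consistent baseline and does not model relaxed atomics.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Each step is (kind, destination, source).
T1 = [("load", "r1", "x"), ("store", "y", "r1")]   # r1 = x; y = r1
T2 = [("load", "r2", "y"), ("store", "x", "r2")]   # r2 = y; x = r2


def run(schedule):
    """Execute one interleaving against a zero-initialised memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, dst, src in schedule:
        if kind == "load":
            regs[dst] = mem[src]
        else:
            mem[dst] = regs[src]
    return regs["r1"], regs["r2"]


schedules = list(interleavings(T1, T2))
outcomes = {run(s) for s in schedules}
```

Under sequential consistency every interleaving leaves both registers at zero, so an outcome such as (42, 42) can only be justified circularly, with each read taking its value from the other thread's write.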
A location which is not declared as atomic may still be accessed concurrently by multiple
threads. However, any data race involving a non-atomic operation triggers C++ undefined behaviour.
Undefined behaviour in C++ is specified rather brutally. If it is potentially triggered as part of
an execution, every execution of the program is allowed to have arbitrary behaviour, even for
operations that took place in the past, before the behaviour is triggered, or in executions where it is
not triggered at all. This is an ultimate safety valve in the specification where it would be otherwise
impossible to give a sensible semantics. The initializing write to the atomic location is modelled as
a non-atomic write. Aside from this, atomic locations cannot experience non-atomic accesses.
Because the C++ model contains many consistency modes and concurrency features, its full
model is rather large. In order to explain its relationship with the Wasm model, it suffices for us
to consider only the “C++ Model Supporting Low-Level Atomics” fragment initially described in
[Boehm and Adve 2008], which supports only seqcst and non-atomic consistency modes.
To summarise the core of the model briefly, memory accesses over the course of program
execution are collected as a set of abstract records, recording which location was accessed, which
value was read/written, and the consistency mode. Accesses that execute sequentially in the same
thread are related by sequenced-before. The specification guarantees that all observable executions
are valid; that is, it must be possible to give definitions for the relations over accesses
reads-from, synchronizes-with, happens-before, and sc such that the axiomatic conditions of the model
hold. We reproduce these conditions below, grouping some sub-conditions as “value-consistent”,
“hb-consistent”, and “sc-last-visible” to facilitate comparisons to other models presented in this
paper.
• happens-before is a strict partial order, and sc is a strict total order on sequentially consistent
  accesses.
• happens-before is the transitive closure of sequenced-before ∪ synchronizes-with, and sc is
  compatible with happens-before.
• For all read accesses R, there must exist a write access W such that R reads-from W.
• For all accesses R and W, such that R reads-from W, the following must hold:
  value-consistent:
  (1) W must access the same location as R, and the observed values are consistent.
  hb-consistent:
  (1) It is not the case that R happens-before W.
  (2) W synchronizes-with R iff both R and W are seqcst.
  (3) There exists no W′ such that W happens-before W′, W′ happens-before R, and W′ writes
      to the same location that R and W access.
  sc-last-visible:
  (1) If both R and W are seqcst, then W must be the last write to the location of R that is sc
      before R.
  (2) If R is seqcst and W is non-atomic, then there exists no W′ such that W happens-before
      W′, W′ is sc before R, and W′ writes to the same location that R and W access. (†)
The condition highlighted and marked (†) was added to the model [Batty 2014; Batty et al. 2011]
after the original draft was found not to guarantee Sequential Consistency of Data-Race-Free
programs (SC-DRF), a crucial correctness condition (see Section 6).
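The per-pair conditions above can be exercised on a tiny candidate execution. The following is a sketch only: the Event record, the valid function and the example execution are hypothetical, happens-before is supplied directly as a set of pairs assumed to be already transitively closed, and condition hb-consistent (2) on synchronizes-with is not checked. The example has a non-atomic initialising write, a seqcst write that follows it, and a seqcst read that still reads from the initialising write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    kind: str   # "R" or "W"
    loc: str
    val: int
    mode: str   # "seqcst" or "na" (non-atomic)


def valid(rf, hb, sc, events, with_dagger=True):
    """Check every reads-from pair (R, W) against the per-pair conditions."""
    def sc_before(a, b):
        return a in sc and b in sc and sc.index(a) < sc.index(b)

    for r, w in rf.items():
        others = [e for e in events
                  if e.kind == "W" and e.loc == r.loc and e != w]
        # value-consistent (1)
        if w.loc != r.loc or w.val != r.val:
            return False
        # hb-consistent (1)
        if (r, w) in hb:
            return False
        # hb-consistent (3)
        if any((w, x) in hb and (x, r) in hb for x in others):
            return False
        # sc-last-visible (1)
        if r.mode == "seqcst" and w.mode == "seqcst":
            if not sc_before(w, r):
                return False
            if any(sc_before(w, x) and sc_before(x, r) for x in others):
                return False
        # sc-last-visible (2), the condition marked (†)
        if with_dagger and r.mode == "seqcst" and w.mode == "na":
            if any((w, x) in hb and sc_before(x, r) for x in others):
                return False
    return True


init = Event("init", "W", "x", 0, "na")      # non-atomic initialising write
w1 = Event("w1", "W", "x", 1, "seqcst")
r = Event("r", "R", "x", 0, "seqcst")        # reads the stale initial value
events = [init, w1, r]
hb = {(init, w1), (init, r)}                 # initialisation precedes both
sc = [w1, r]                                 # w1 is sc before r
rf = {r: init}

allowed_without_dagger = valid(rf, hb, sc, events, with_dagger=False)
allowed_with_dagger = valid(rf, hb, sc, events, with_dagger=True)
```

Without (†) the checker accepts the execution even though the seqcst read ignores a seqcst write that precedes it in sc; with (†) the execution is rejected.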
The memory model of C++ is specified in a mostly formal manner. However, the wider
specification is not a formal semantics. This means that there are inevitable imprecisions in how behaviour
in other areas of the specification can be related to the sets of accesses manipulated by the memory
model, for example in programs exhibiting undefined behaviour or non-terminating executions.
The C++ memory model implicitly relies on several language-level invariants of the C++
semantics.
• As previously mentioned, it can be assumed that there are no racing non-atomics, since
  otherwise the program has undefined behaviour.
• Accesses are guaranteed to be to discrete locations which never overlap each other, as a
  consequence of the C++ “effective type” rules.
• By the same rule, no location can experience a mixture of atomic and non-atomic accesses,
  except for initializing writes to atomic locations. This exception was the cause of the deficiency
  corrected by (†).
None of these assumptions hold for Wasm’s more low-level instruction set.
2.2 The JavaScript Memory Model
JavaScript’s shared memory operations are defined over shared array buffers, linear buffers of
raw bytes that can be accessed by multiple threads in an array-like fashion through (potentially
different) data views. Unlike C++, a single access is therefore defined as affecting the values of a
range of bytes, rather than the value of a single abstract location. Moreover, since a JavaScript
program may have multiple shared array buffers, events must also track which buffer they are
accessing.
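The consequence of byte-range accesses can be sketched with an ordinary byte buffer. In the snippet below a Python bytearray stands in for a shared array buffer and struct stands in for data views of different widths; the write names w32 and w16 and the source bookkeeping list are hypothetical. Little-endian encoding is fixed explicitly so the result does not depend on the host.

```python
import struct

# Two writes of different widths whose byte ranges overlap on bytes 2..3.
writes = [
    ("w32", 0, struct.pack("<I", 0x11223344)),   # covers bytes 0..3
    ("w16", 2, struct.pack("<H", 0xBEEF)),       # covers bytes 2..3
]

buf = bytearray(8)                 # stands in for a shared array buffer
source = ["init"] * len(buf)       # which write last touched each byte

for name, offset, payload in writes:
    end = offset + len(payload)
    buf[offset:end] = payload
    source[offset:end] = [name] * len(payload)

# A 32-bit read of bytes 0..3 observes a mixture of both writes.
(value,) = struct.unpack_from("<I", buf, 0)
read_sources = source[0:4]
```

The read value is determined byte by byte, and no single write accounts for it, which is why events in such a model cannot be described by one abstract location and one value.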
The JavaScript memory model [ECMA International 2018b] defines two consistency modes
that can be used programmatically: unordered, and seqcst. While C++ models initial values as
non-atomic writes, JavaScript models them using a third consistency mode, init. The init mode
functions mostly as unordered, except that it is guaranteed to occur before other events. This is
strictly stronger than the C/C++ notion of initialisation, which can be delayed, while JavaScript
buffers are guaranteed to be zero-initialised at the moment of their creation.
The model’s core can be briefly summarised in a similar manner to that of the C++ model. Again,
sequenced-before has the same meaning, and every execution must respect the following constraints
on reads-from, synchronizes-with, and happens-before. However, the JavaScript model uses a slightly
different formulation of sc. Instead of a total order over only seqcst events, the model requires the
existence of a total order across all events, which we will refer to as tot throughout the paper. This
distinction is trivial, as the JavaScript model never uses tot to restrict the behaviour of non-atomic
events, meaning that the model could equally well be formulated using sc (which would be tot
restricted to seqcst events). Moreover, read and write events are now characterised by a list of byte
values rather than a single value as in C++, and reads-from now relates a read event R to a list of
write events, with each list element describing the source of one byte in R’s range.¹ We adopt the
convention that R reads-from(i) W describes W as the i-th event in the list.
• happens-before is a strict partial order, and tot is a strict total order on accesses.
• happens-before is the transitive closure of sequenced-before ∪ synchronizes-with, and
  happens-before is a subset of tot.
• For all read accesses R, for all i < size R, there must exist a write access W such that R
  reads-from(i) W, and R and W access the same shared array buffer.
• init events happen before all other accesses with overlapping ranges to the same buffer.
• For all accesses R and W, for all i < size R such that R reads-from(i) W, the following must
  hold:
  value-consistent:
  (1) W must access (among others) the i-th byte index of R, and the value read by R must
      be consistent with the value written by W at that index.
  hb-consistent:
  (1) It is not the case that R happens-before W.
  (2) W synchronizes-with R iff both R and W are seqcst, and affect equal byte ranges.
  (3) There exists no W′ such that W happens-before W′, W′ happens-before R, W′ writes to
      the i-th byte index of R, and W′ accesses the same buffer as R and W.
  sc-last-visible:
  (1) If both R and W are seqcst and have equal byte ranges, then W is the last seqcst write
      with equal byte range to R that is tot before R.
  (2) If R is seqcst, W is unordered, then there exists no seqcst write W′ such that W
      happens-before W′, W happens-before R, W′ is tot before R, W′ has equal byte range to R, and
      W′ accesses the same buffer as R and W. (†)
  (3) If R is unordered and W is seqcst, then there exists no seqcst write W′ such that W′
      happens-before R, W happens-before R, W is tot before W′, W′ has equal byte range to
      W, and W′ accesses the same buffer as R and W. (‡)
  no-tear:
  (1) If both R and W have equal ranges, and R and W are tear-free, then no other access
      W′ and index i′ can exist such that R reads-from(i′) W′, R and W and W′ have equal
      ranges, and W′ is tear-free.
¹ This sketch glosses over the exact formal details of how indices and ranges are compared and related. The JavaScript
language specification gives this aspect of the model an exceptionally complicated and prosaic definition, which is not
necessary to discuss the model intuitively.
As part of this work, we identified that the JavaScript model replicated the SC-DRF violation of
the uncorrected C++ model. As in C++, this violation is corrected by extending sc-last-visible with
the (†) rule, slightly modified to explicitly not apply in the case of a data race between W and R,
which is implicit in the C++ model. In addition, we discovered a dual violation caused by an
unord read of a seqcst write. This has no direct analogy in C++, as such accesses are not permitted by the
language. This violation is corrected by the (‡) rule. We have proposed the addition of both these
rules to the JavaScript model. We discuss this in more detail in Section 6.
The final condition, “no-tear”, describes the circumstances in which a write to multiple bytes
may be decomposed into independently observable writes to individual bytes. This is possible when
dealing with non-aligned, racing writes, or writes larger than the word size of the architecture.
The tear-free predicate describes when a write is guaranteed to be visible indivisibly to reads of
identical alignment and range, even when racing. All seqcst accesses are guaranteed to be tear-free.
Our memory model for Wasm adopts the basic approach of the JavaScript model.
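The effect of tearing can be illustrated by enumerating the values a read might observe when it races with two writes of the same range. This is a simplified sketch: the function observable is hypothetical, every write is assumed to be an eligible source for every byte, and the happens-before and tot constraints of the model are ignored.

```python
from itertools import product


def observable(writes, tear_free):
    """Values a read of the same range may observe from the given writes.

    writes: equal-length bytes objects, one per racing write.
    tear_free: if True, all bytes of the read come from a single write.
    """
    size = len(writes[0])
    if tear_free:
        choices = [(w,) * size for w in writes]      # one source for all bytes
    else:
        choices = product(writes, repeat=size)       # a source per byte
    # Byte i of the result is byte i of the write chosen for index i.
    return {bytes(src[i][i] for i in range(size)) for src in choices}


w1 = b"\x11\x11"
w2 = b"\x22\x22"
whole = observable([w1, w2], tear_free=True)
torn = observable([w1, w2], tear_free=False)
```

The torn case adds the mixed values 0x11 0x22 and 0x22 0x11, which no single write ever stored; the no-tear condition rules these out when the read and both writes are tear-free with equal ranges.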
Thread 1                Thread 2
x[0] = 0x1;             /* = 0x2 */ print(x[0]);
x[0] = 0x2;             /* = 0x1 */ print(x[0]);
                        /* = 0x2 */ print(x[0]);
                        /* = 0x1 */ print(x[0]);
                        ...
Fig. 1. A surprising JavaScript execution, permitted by its memory model in lieu of a totally undefined
2.3 Contrasting C++ and JavaScript
JavaScript has found itself co-opted as an ad-hoc compilation target for C++ [Herman et al. 2014].
It is therefore not surprising that their memory models have many similarities. In a JavaScript
program which respects the following conditions, it can be seen that unordered JavaScript accesses
are equivalent to non-atomic C++ accesses, and JavaScript and C++ seqcst accesses are equivalent
to each other, in the sense of the memory consistency behaviours that are allowed:
• There are no data races involving unordered access.
• All accesses are naturally aligned.
• No two accesses have overlapping but non-equal ranges.
• No access ranges beyond the bounds of the buffer.
We can observe that these restrictions effectively re-establish the language-level invariants of C++,
and ensure that the byte ranges of JavaScript accesses can each be treated as a discrete location.
Compilation from C++ to JavaScript can then be accomplished by allocating each shared object
on a disjoint, aligned area of a shared array buffer, and promoting all C++ atomic accesses (of any
consistency) to seqcst. C++ pointers to memory then become indices into a data view over the
shared array buffer in the translated code.
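The allocation scheme and three of the four conditions can be sketched as follows. The functions layout and respects_conditions are hypothetical, object sizes are assumed to be powers of two so that natural alignment means alignment to the object's own size, and the data-race condition is not checked because it depends on happens-before rather than on addresses.

```python
def layout(objects):
    """Give each (name, size) a naturally aligned, disjoint byte offset."""
    offsets, cursor = {}, 0
    for name, size in objects:
        cursor = (cursor + size - 1) // size * size   # round up to a multiple of size
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


def respects_conditions(accesses, buffer_len):
    """accesses: list of (offset, size) pairs into one buffer."""
    for off, size in accesses:
        aligned = off % size == 0
        in_bounds = 0 <= off and off + size <= buffer_len
        if not (aligned and in_bounds):
            return False
    for a_off, a_size in accesses:
        for b_off, b_size in accesses:
            overlap = a_off < b_off + b_size and b_off < a_off + a_size
            if overlap and (a_off, a_size) != (b_off, b_size):
                return False   # overlapping but non-equal ranges
    return True


offsets, used = layout([("flag", 1), ("counter", 4), ("total", 8)])
accesses = [(offsets["flag"], 1), (offsets["counter"], 4),
            (offsets["total"], 8), (offsets["counter"], 4)]
ok = respects_conditions(accesses, used)
```

Because every object occupies its own aligned slot, any two accesses either touch the same slot with the same width or do not overlap at all, so each slot behaves like a discrete C++ location.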
The JavaScript and C++ models differ in the consistency behaviour that is allowed when the
aforementioned conditions are not met. In C++, dereferencing a null pointer or a pointer to
unallocated memory results in undefined behaviour. In JavaScript, accessing a shared array buffer
at an out-of-bounds or “nonsense” index results in a regular JavaScript value that is, confusingly
enough, named undefined. Consequently, executions with out-of-bounds accesses have defined
behaviour. Moreover, in C++, data races and overlapping mixed-size accesses all instantly trigger
undefined behaviour. The JavaScript specification instead chooses to maintain a defined behaviour,
but one that is far weaker than the behaviour of real hardware. In particular, unordered accesses
which race with other accesses may be freely read from, without creating any coherence guarantees.
This means that executions such as the one shown in Fig. 1 are possible, and well-defined behaviour
according to the JavaScript specification.
We have presented the core of both the C++ and JavaScript models, focussing on the semantics
of data accesses. Beyond these, the full languages contain additional features which interact with
the memory model such as locks, thread creation and suspension, and so on. They are not included
here because there are few similarities between the feature sets of the two languages. Generally
these features imply additional happens-before edges; for example, if one thread spawns another, all
previous actions in the spawning thread happen before all actions in the spawned thread.
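The effect of such an additional edge can be sketched by computing happens-before as a transitive closure. The event names and the transitive_closure helper below are hypothetical; the only modelling assumption is that thread creation contributes one edge from the spawning action to the first action of the spawned thread.

```python
def transitive_closure(edges):
    """Return the smallest transitive relation containing edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Parent thread: a1; spawn; a2.   Child thread: b1; b2.
sequenced_before = {("a1", "spawn"), ("spawn", "a2"), ("b1", "b2")}
spawn_edge = {("spawn", "b1")}    # edge implied by thread creation

hb = transitive_closure(sequenced_before | spawn_edge)
```

Everything the parent did up to the spawn (a1) is ordered before everything in the child, while the parent's later action a2 stays unordered with respect to b1 and b2.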
(value types) t ::= nt | rt
(numeric types) nt ::= i32 | i64 | f32 | f64
(sign extension) sx ::= s | u
(storage size) sz ::= 8 | 16 | 32 | 64
(packed type) pt ::= sz_sx?
(order) ord ::= unord | seqcst
(sharing) sh ::= local | shared
(reference types) rt ::= sh anyref | sh funcref
(function types) ::= sh tt
(global types) gt ::= sh mut t
(table types) ::= sh rt[n]
(memory types) mt ::= sh [n]
(instructions) e ::= ... | call i | call_indirect.ord | global.get.ord i | global.set.ord i |
    table.get.ord | table.set.ord | table.size | table.grow | ref.null | ref.func i |
    nt.load.ord pt a o | nt.store.ord sz a o | nt.rmw.binop pt a o |
    nt.wait | notify | memory.size | memory.grow |
    fork i | instantiate mod
(functions) func ::= ex func  im | ex func local te
(globals) glob ::= ex global gt im | ex global gt e
(tables) tab ::= ex table  im | ex table (e)
(memories) mem ::= ex memory mt im | ex memory mt
(export) ex ::= export “name”
(import) im ::= import “name” “name”
(modules) mod ::= module func glob tab? mem?
Fig. 2. Abstract syntax of concurrent Wasm (excerpt)
3 CONCURRENT WASM
The concurrent version of Wasm that we describe in this paper is an extension of the base language
defined by the official specification [WebAssembly Working Group 2019], which itself is heavily
based on [Haas et al. 2017]. Fig. 2 shows an extract of the abstract syntax of concurrent Wasm.
For space reasons, we omit instructions that are not relevant to the memory model and carry over
unchanged from basic Wasm.
Wasm code is organised into individual functions that are in turn bundled into a module, forming
a Wasm binary. Wasm code is executed in a host environment that it can only interact with through
a module’s imports and exports. In particular, the host may invoke exported Wasm functions, and
Wasm code may call imported host functions.
A module can also define, import, or export stateful definitions. Three forms of global state exist
in Wasm: the linear memory providing a bounds-checked address space of raw bytes, tables storing
and indexing opaque references to functions, and plain global variables. Through import and export,
access to stateful definitions can be shared with other modules or the host, which can potentially
mutate them. Separate modules can define linear memories or tables separately, such that Wasm
effectively supports multiple disjoint address spaces as well as the dynamic creation of new ones.
A recent proposal for Wasm [Rossberg 2018], which will soon be adopted by the standard, turns
references to functions or stateful objects into first-class values and generalises the notion of table
to a general store for opaque reference values. Because this extension is deeply affected by threading
as well, we include and describe it here; respective constructs are highlighted in blue in Fig. 2.
References. The set of values that a program can store or compute over is codified by Wasm’s
notion of value type. As a low-level language, Wasm so far only allowed numeric value types
(integers and floats), which also encode pointers into linear memory. A recent extension proposes
to complement them with reference types [Rossberg 2018], which abstract physical pointers into
the host system’s memory. This extension enables Wasm code to safely round-trip pointers to
host objects (such as DOM objects on the Web), which previously required bijective mappings
to numbers at the language boundary and brittle manual lifetime management. The extension
also enables a first-class representation of function references, and hence (in another extension not
considered here) type-safe indirect calls. We consider only minimal support for references here,
where the only types available are anyref and funcref: the former is the top type of all references,
the latter includes all function references.
Wasm code can either form references from a local function index (ref.func) or as the null
reference (ref.null); both anyref and funcref are inhabited by null (future refinements to the type
system will exclude it from certain types). In addition, we assume that the host environment can
create unspecified forms of references and pass them to Wasm.
Unlike numeric types, whose representation is transparent and whose values can hence be stored
into memory, reference types must be opaque for safety and security reasons; that is, their bit
pattern must not be observable and they cannot be allowed into raw memory.
Tables. To make up for this, Wasm’s existing notion of table is generalised. Originally, it only
allowed holding function references, which was useful for emulating raw function pointers and
indirect calls, especially when compiling C-like languages [Haas et al. 2017]. The reference proposal
repurposes tables as a general storage for references, by allowing any reference type as element.
Accordingly, the instruction set is extended with instructions for manipulating table slots
(table.get and table.set) and table size (table.size and table.grow), the latter being analogous to
the existing instructions for memories.²
another language proposal [Smith 2019]. The current proposal only supports shared linear memory,
but here we generalise it to globals, tables, and references, since such further extensions are on
Wasm’s long-term roadmap, and we aim to have a formalism that can handle the full enchilada
seamlessly. In Fig. 2, all respective extensions are highlighted in red.
Sharing. Unlike most other languages, concurrent Wasm is explicit about which objects can be
shared between threads, and which ones are only accessed in a thread-local manner. Accordingly,
all definitions and references are complemented with a sharing annotation sh. These annotations
make it easy for engines to pick the most efficient compilation scheme for each access to mutable
state, for example, by avoiding unnecessary barriers. Validation (see supplemental material) ensures
that annotations are consistent and transitive, e.g., a shared reference can only refer to shared
definitions.
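The flavour of such a transitivity check can be sketched as follows. The rule encoded here is an assumption made for illustration, namely that a shared global or table may hold numeric values or shared references but not local references; the actual validation rules are those of the supplemental material, and the function value_type_ok and its encoding of types are hypothetical.

```python
NUMERIC = {"i32", "i64", "f32", "f64"}


def value_type_ok(container_sharing, value_type):
    """Assumed rule: shared state must not hold thread-local references.

    container_sharing: "local" or "shared" (the sh annotation of a global or table).
    value_type: a numeric type name, or a pair (sh, kind) such as ("shared", "funcref").
    """
    if value_type in NUMERIC:
        return True                      # numbers carry no sharing annotation
    ref_sharing, _kind = value_type
    return container_sharing == "local" or ref_sharing == "shared"


checks = {
    "shared global of i32": value_type_ok("shared", "i32"),
    "shared table of shared anyref": value_type_ok("shared", ("shared", "anyref")),
    "shared table of local anyref": value_type_ok("shared", ("local", "anyref")),
    "local table of shared funcref": value_type_ok("local", ("shared", "funcref")),
}
```

Only the third combination is rejected under the assumed rule, since it would let a thread-local object become reachable from state that other threads can see.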
² The ability to mutate tables and their size always existed in Wasm, but was previously only accessible through the host-side
API.
Atomics. In order to enable synchronisation between multiple threads, concurrent Wasm
incorporates the ability to specify an ordering constraint (or consistency mode) for instructions that
access a program’s state. In the current proposal, which follows JavaScript in that regard, only
two modes are supported: non-atomic unordered access (unord) and sequentially consistent atomic
access (seqcst). Additional atomic consistency modes like acquire/release may be added as future
extensions.
An ordering annotation is included in instructions for accessing the Wasm memory (t.load,
t.store), as well as instructions accessing global variables (global.get, global.set) and table slots,
including indirect calls (table.get, table.set, call_indirect). In addition, the language offers an
atomic read-modify-write instruction (t.rmw) for memory access, where the modification can be
drawn from a large set of binary numeric operators binop, whose definition we omit here. It also
provides a pair of low-level wait and notify instructions that block a thread and resume blocked
threads, respectively, indexed on a memory location.
Host Instructions. Although not part of the current threading proposal, which assumes that
threads are only created by the host environment, we also include an instruction for spawning
a thread from a function (fork). In addition, we provide a pseudo instruction (instantiate) for
instantiating and linking a module [Haas et al. 2017].
Including these two instructions allows us to express as Wasm code all interesting effects that can
be performed by the host environment, in particular the dynamic creation of threads and the
dynamic allocation of new pieces of shared state (including new memories, i.e., address spaces). That
in turn enables us to model all relevant host computation as Wasm computation, and all interesting
interactions with host threads can be expressed as interaction with other Wasm threads.
(global configuration)     conf  ::= s; p*
(local configuration)      lconf ::= s; f; e*
(time stamp)               h
(frames)                   f     ::= {module m, local v*}
(module instances)         m     ::= {func a*, global a*, table a?, mem a?}
(store)                    s     ::= {((a ↦ obj)_h)*}
(instance objects)         obj   ::= func m func | global gt | table tt | mem mt
                           (where func is restricted to the form func ft local t* e*)
(administrative instr’s)   e     ::= ... | ref a | trap | call’ a | fork’ e* | wait’ l n | notify’ l n m
(values)                   v     ::= nt.const c | ref.null | ref a
(store values)             w     ::= b | v
(events)                   ev    ::= (act*)_h | ϵ
(actions)                  act   ::= rd_ord l w* | wr_ord l w* | rmw l w^m w^m
(field)                    fld   ::= val | data | elem | length
(region)                   r     ::= a.fld
(location)                 l     ::= r | r[i]
Fig. 3. Runtime structure
133:10 Conrad Watt, Andreas Rossberg, and Jean Pichon-Pharabod
4 OPERATIONAL SEMANTICS
The execution semantics of concurrent Wasm is defined in two layers: an operational semantics,
specifying execution of individual instructions (described in this Section), and an axiomatic
semantics, specifying the interaction with the memory (described in Section 5). Both interact through
events that are generated by the operational semantics and “wired up” by the axiomatic semantics.
Configurations. The operational semantics is defined via two small-step reduction relations: (1)
reduction of local configurations, i.e., individual threads, and (2) reduction of global configurations,
i.e., the entire program state. Their definitions are given in Fig. 3.
Global configurations consist of a set of threads p*, each represented by its remaining instruction
sequence, annotated by a time stamp h whose explanation we defer to Section 5, and the global
program store s, which records all abstract instance objects that have been created (i.e., allocated)
by the program. These objects are the runtime instances of all entities that can be defined and
ex/imported by modules, i.e., functions, globals, tables, and memories. Every object is identified
by an address a in the store. Every entry is also indexed with a time stamp h that
indicates when it was created. We write s(a) for the object associated with a in the store s, and
s_time(a) for its respective creation time h.
Local configurations consist of the store, the frame f of the current function, and the instruction
sequence e* left to execute. The frame records the module instance a function resides in and the
state of its local variables.
Actions and Events. We make a key generalisation of existing work by defining actions, which
record an individual access to shared state, and events, which are the units ordered by the consistency
predicate of the axiomatic semantics, as distinct formal objects. Existing formal memory models
implicitly unify these, but in our model a single event may contain multiple actions, which are
all considered to be performed atomically. Both are also defined in Fig. 3. Like threads, events are
annotated with a time stamp, a matter we will explain in Section 5. Per convention, we implicitly
identify all events ()_h with no action with the empty event ϵ.
Intuitively, actions express those store operations that can be reordered, subject to certain
conditions that the axiomatic semantics defines. An action can be one of three flavours of access to
a mutable location l: read (rd), write (wr), or read-modify-write (rmw). In each case the
action records the sequence of store values w* involved in the read or write. Reads and writes
may be annotated with different consistency modes; as a convention, we
abbreviate rd_unord and wr_unord to just rd and wr, respectively.
A location describes the component of an instance object that is being accessed. It is given as an
address a paired with a virtual field name fld. Globals only have a val field that is their
current value. Tables and memories both have a len field (listed as length in Fig. 3) storing their
current size, and an elem field (for tables) or data field (for memories) that denotes the respective
content. The latter are vectors of store values (references for elem, bytes for data). Hence a
location in these fields additionally involves an offset i.
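As an informal illustration of the runtime structure described above, the following Python sketch
encodes locations, actions, and events as plain records. The class and field names are our own
illustrative choices, not part of the formal model, and an integer stands in for the abstract time
stamp h.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Location:
    addr: int                     # store address a of the instance object
    fld: str                      # virtual field: "val", "data", "elem" or "len"
    offset: Optional[int] = None  # None for a region r, an index i for r[i]

    def __str__(self) -> str:
        base = f"{self.addr}.{self.fld}"
        return base if self.offset is None else f"{base}[{self.offset}]"


@dataclass(frozen=True)
class Action:
    kind: str                         # "rd", "wr" or "rmw"
    loc: Location
    values: Tuple[int, ...]           # store values read (rd, rmw) or written (wr)
    new_values: Tuple[int, ...] = ()  # values written back by an rmw
    order: str = "unord"              # consistency mode: "unord" or "seqcst"


@dataclass(frozen=True)
class Event:
    actions: Tuple[Action, ...]       # all considered to be performed atomically
    stamp: int                        # integer stand-in for the abstract time stamp h


# An in-bounds 4-byte store of 54 at offset 16 of a memory at (hypothetical) address 7:
# a single event holding the implicit length read and the data write.
store_event = Event(
    actions=(
        Action("rd", Location(7, "len"), (65536,)),
        Action("wr", Location(7, "data", 16), (54, 0, 0, 0)),
    ),
    stamp=1,
)
```

The example event shows the point made in the text: one event may carry several actions, here an
unord read of the length and the write of four bytes.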
Local Reduction. The local reduction relation is labelled by sets of actions. Figs. 4 to 10
show a selection of the local reduction rules that touch the store. Rules for constructs missing from
these figures do not access the store and hence carry over unchanged from [Haas et al. 2017].
s; f; v (global.set.ord i) ↪^{(wr_ord a.val v)} ϵ    (if a = f_global(i))
s; f; (global.get.ord i) ↪^{(rd_ord a.val v)} v
    (if s ⊢ v : t ∧ type(s(a)) = sh mut t ∧ a = f_global(i))
Fig. 4. Local reduction (globals)
s; f; (i32.const i) v table.set.ord ↪^{(rd a.len n)} trap    (if a = f_table ∧ i ≥ n)
s; f; (i32.const i) v table.set.ord ↪^{(rd a.len n) (wr_ord a.elem[i] v)} ϵ    (if a = f_table ∧ i < n)
s; f; (i32.const i) table.get.ord ↪^{(rd a.len n)} trap    (if a = f_table ∧ i ≥ n)
s; f; (i32.const i) table.get.ord ↪^{(rd a.len n) (rd_ord a.elem[i] v)} v
    (if s ⊢ v : t ∧ type(s(a)) = sh t[n] ∧ a = f_table ∧ i < n)
Fig. 5. Local reduction (table access)
f; table.size ↪^{(rd_seqcst a.len n)} i32.const n    (if a = f_table)
f; (i32.const k) v table.grow ↪^{(rd_seqcst a.len n)} i32.const (−1)    (if a = f_table)
f; (i32.const k) v table.grow ↪^{(rmw a.len n (n+k)) (wr a.elem[n] v^k)} i32.const n
    (if a = f_table)
Fig. 6. Local reduction (table growth)
s; f; (i32.const i) (call_indirect.ord ft) ↪^{(rd a.len n)} trap    (if a = f_table ∧ i ≥ n)
s; f; (i32.const i) (call_indirect.ord ft) ↪^{(rd a.len n) (rd_ord a.elem[i] (ref a′))} trap
    (if a = f_table ∧ i < n ∧ type(s(a′)) ≠ ft)
s; f; (i32.const i) (call_indirect.ord ft) ↪^{(rd a.len n) (rd_ord a.elem[i] (ref a′))} call’ a′
    (if a = f_table ∧ i < n ∧ type(s(a′)) = ft)
f; (ref.func i) ↪ ref f_func(i)
f; (call i) ↪ call’ f_func(i)
s; v^n (call’ a) ↪ frame_m {module m, local v^n (t.const 0)^k} e* end
    (if s(a) = func m (func (sh t1^n → t2^m) local t^k e*))
Fig. 7. Local reduction (functions)
f; (i32.const k) (t.load.ord pt a o) ↪^{(rd a.len n)} trap    (if a = f_mem ∧ k+o+|pt| > n)
f; (i32.const k) (t.load.ord pt a o) ↪^{(rd a.len n) (rd_ord a.data[k+o] bits^pt_t(c))} t.const c
    (if a = f_mem ∧ k+o+|pt| ≤ n)
f; (i32.const k) (t.const c) (t.store.ord sz a o) ↪^{(rd a.len n)} trap    (if a = f_mem ∧ k+o+sz > n)
f; (i32.const k) (t.const c) (t.store.ord sz a o) ↪^{(rd a.len n) (wr_ord a.data[k+o] bits^sz_t(c))} ϵ
    (if a = f_mem ∧ k+o+sz ≤ n)
f; (i32.const k) (t.const c2) (t.rmw.op pt a o) ↪^{(rd a.len n)} trap    (if a = f_mem ∧ k+o+|pt| > n)
f; (i32.const k) (t.const c2) (t.rmw.op pt a o)
    ↪^{(rd a.len n) (rmw a.data[k+o] bits^pt_t(c1) bits^pt_t(op_t(c1, c2)))} c1
    (if a = f_mem ∧ k+o+|pt| ≤ n)
Fig. 8. Local reduction (memory access)
f; memory.size ↪^{(rd_seqcst a.len n)} i32.const n/64Ki    (if a = f_mem)
f; (i32.const k/64Ki) memory.grow ↪^{(rd_seqcst a.len n)} i32.const (−1)    (if a = f_mem)
f; (i32.const k/64Ki) memory.grow ↪^{(rmw a.len n (n+k)) (wr a.data[n] (0)^k)} i32.const n/64Ki
    (if a = f_mem)
Fig. 9. Local reduction (memory growth)
f; (i64.const q) (t.const c) (i32.const k) t.wait ↪ trap    (if k mod |t| ≠ 0)
f; (i64.const q) (t.const c) (i32.const k) t.wait ↪^{(rd a.len n)} trap    (if a = f_mem ∧ k+|t| > n)
f; (i64.const q) (t.const c) (i32.const k) t.wait ↪^{(rd a.len n) (rd_seqcst a.data[k] b*)} wait’ a.data[k] q
    (if a = f_mem ∧ k+|t| ≤ n ∧ b* = bits_t(c))
f; (i64.const q) (t.const c) (i32.const k) t.wait ↪^{(rd a.len n) (rd_seqcst a.data[k] b*)} i32.const 1
    (if a = f_mem ∧ k+|t| ≤ n ∧ b* ≠ bits_t(c))
f; (i32.const m) (i32.const k) notify ↪ notify’ a.data[k] 0 m    (if a = f_mem nm)
s; f; v^n (fork i) ↪ fork’ (v^n (call’ a))
    (if a = f_func(i) ∧ type(s(a)) = shared t^n → ϵ)
s; (ref a_i)* (instantiate mod) ↪^{(wr l w*)*} s ((a ↦ obj)_h)*; (ref a_e)*
    (side conditions omitted)
Fig. 10. Local reduction (thread synchronization and creation)
Let us start with the simplest rules, those for accessing globals (global.get, global.set, Fig. 4).
They look up the address a of the global under its static index i in the current frame’s module
instance and then perform a single action to read or write the val field of that global, with the
appropriate memory ordering. The respective value v that is observed by these actions is picked
non-deterministically in these rules, but the axiomatic semantics will constrain this choice such
that each read matches up with some write to the same location (Section 5).
Accessing tables (table.get, table.set, Fig. 5) is more interesting. It involves an index that could
be out of bounds, but the bounds may be dynamically altered by the execution of a table.grow
instruction. It is here where multiple actions per event come into play, because such an access
needs to read the table’s len field as well, to check its size.³ The size
is a value of the form (i32.const n), which we abbreviate to n here. If n ≤ i then the access is out
of bounds and execution traps. Otherwise, the actual read or write action for the table slot at the
indexed location is also performed. Both these actions are to be performed as one atomic event.
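The two-action shape of a table access can be sketched in Python as follows. This is an
illustration only: the function name and the tuple encoding of actions are hypothetical, and where
the formal rules pick the observed values non-deterministically (to be constrained later by the
axiomatic semantics), this sketch simply takes them as arguments.

```python
def table_get(length_seen, elems, i, order="unord"):
    """Return (actions, result) for a table.get at index i.

    length_seen is the value the implicit bounds check observes for the len field;
    elems stands in for the values observable in the elem field.
    """
    # The bounds check is always an unord read of the len field.
    actions = [("rd", "unord", "len", length_seen)]
    if i >= length_seen:
        return actions, "trap"
    value = elems[i]
    # In bounds: the same event also carries the read of the table slot.
    actions.append(("rd", order, f"elem[{i}]", value))
    return actions, value


# Out of bounds: only the length read happens, and execution traps.
print(table_get(2, ["f0", "f1", "f2"], 2))
# In bounds: both actions belong to one atomic event.
print(table_get(3, ["f0", "f1", "f2"], 2, "seqcst"))
```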
The size may also be explicitly queried (table.size, Fig. 6), and updated (table.grow, Fig. 6). The
chosen consistency behaviours of different implicit and explicit methods that observe the table
size must support desired compilation schemes. Implicit bounds checks are permitted to observe
changes to the table size with a semantics that is not sequentially consistent, for reasons which
will be justified in Section 4.1.
³ An implementation might use more efficient ways to implement this check, e.g., via hardware
protection, but the semantics must be the same.
Some other instructions deal with manipulating function references (Fig. 7), which may be stored
in tables. The (type-annotated) call_indirect instruction dynamically retrieves a function from
a table of function references and calls it. The ref.func instruction returns a first-class function
reference for the function at the static index i. When a function is called, a new local frame is
created for the function’s body, with the function arguments becoming new local variables.
Loads and stores to memory (t.load, t.store, Fig. 8) work analogously to table accesses, except
that they (1) take an additional static offset o, and (2) operate on a sequence of multiple bytes at once
and interconvert to a numeric value, by interpretation through the meta function bits [Haas et al.
2017]. The latter has the additional implication that the axiomatic semantics allows non-atomic
accesses to tear, i.e., reads can observe individual bytes from multiple different writes. Instructions
manipulating memory size (memory.size, memory.grow, Fig. 9) are similar to the table analogues,
the only complication being that size values are given in units of page size, which is 64 KiB.
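To make byte-wise interpretation and tearing concrete, here is a small Python sketch. It assumes
a 32-bit integer type and Wasm's little-endian byte order; the helper names (bits_i32, torn_read,
and so on) are illustrative and are not the paper's meta functions.

```python
PAGE_SIZE = 64 * 1024  # 64 KiB, the unit used by memory.size and memory.grow


def bits_i32(c: int) -> bytes:
    """Byte sequence of a 32-bit value (little-endian)."""
    return (c & 0xFFFFFFFF).to_bytes(4, "little")


def from_bits_i32(bs: bytes) -> int:
    """Inverse of bits_i32."""
    return int.from_bytes(bs, "little")


def memory_size_pages(length_bytes: int) -> int:
    """Result of memory.size for a memory whose len field holds length_bytes."""
    return length_bytes // PAGE_SIZE


def torn_read(write_a: bytes, write_b: bytes, picks) -> bytes:
    """Assemble a read whose byte i comes from write_a if picks[i] == 0, else from write_b."""
    return bytes((write_a if p == 0 else write_b)[i] for i, p in enumerate(picks))


# A non-atomic 4-byte read racing with two writes may mix bytes from both.
mixed = torn_read(bits_i32(0x11111111), bits_i32(0x22222222), [0, 0, 1, 1])
print(hex(from_bits_i32(mixed)))  # a value that neither write ever stored
```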
The next group of instructions (wait, notify, fork, Fig. 10) are related to threads. Their semantics
is mostly defined by the global reduction relation, while the local relation only handles their
operands and respective side conditions. To that end, these instructions are reduced to auxiliary
administrative variants that carry the final operands, to be picked up by global reduction. The t.wait
instruction performs a bounds-checked read and suspends if the read value equals the operand, or
immediately returns 1 otherwise. It traps if the access is out of bounds or unaligned (a behaviour
chosen to align with common hardware). Suspending is represented by the administrative variant
wait’ that records the location and the time-out value q (in nanoseconds). The symmetric operation
is notify, which, given a memory index k and a number m, will notify at most m threads waiting
for the same location to wake them up. This is analogously modelled by an administrative variant
notify’ recording the location, the number n of woken threads (0 initially) and the maximum m.
The fork instruction performs a function call in a new thread by looking up the function in the
local frame and then forking the actual call via the auxiliary fork’.
Lastly, the instantiate instruction creates a new module instance. It takes a reference for each
import, allocates and initialises the instance objects attached to the module, and returns a reference
for each export. It is this instruction that extends the store with new objects. Details can be found
in the supplemental material.
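s; f_ϵ; e* ↪^{act*} s′; f_ϵ; e′*      h ≺hb h′      (side conditions on store addresses omitted)
--------------------------------------------------------------------
s; (e*)_h p* ↪^{(act*)_{h′}} s s′; (e′*)_{h′} p*

h ≺hb h′
--------------------------------------------------------------------
s; (E[fork’ e*])_h p* ↪ s; (E[ϵ])_{h′} (e*)_{h′} p*

n < m      h′ ≺hb h
--------------------------------------------------------------------
s; (E[wait’ l q])_{h′} (E′[notify’ l n m])_h p* ↪ s; (E[i32.const 0])_h (E′[notify’ l (n+1) m])_h p*

n = m ∨ ¬∃E′, h′, q, h′ ≺hb h ∧ (E′[wait’ l q])_{h′} ∈ p*
--------------------------------------------------------------------
s; (E[notify’ l n m])_h p* ↪ s; (E[i32.const n])_h p*

signed(q) ≥ 0      h ≺hb h′
--------------------------------------------------------------------
s; (E[wait’ l q])_h p* ↪ s; (E[i32.const 2])_{h′} p*
Fig. 11. Global reduction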
Global reduction. Fig. 11 defines global reduction. In all rules we assume that the sequence p* is
in fact a set that can be implicitly reordered.
The main rule non-deterministically selects a thread and advances it one step by invoking the
local reduction relation with an empty frame f_ϵ. The previous time stamp h of the thread must
happen before the new time stamp h′, which also is taken as the time stamp of the atomic event
formed by the performed actions and of any newly allocated objects in the store’s extension s′.
As we will see soon, it is this
condition that imposes program order on all events within the same thread. Another side condition
checks that the actions only refer to object addresses that have previously been allocated in the
store, at an earlier time, ensuring respective causality and avoiding use-before-definition. This
obviates the need for the ad-hoc init consistency mode of the JavaScript model.
The effect of the fork’ instruction is to simply add a new thread to the configuration. Both the
new and the originating thread will be assigned the same new time stamp h′.
The interaction between wait’ and notify’ on a common location l is modelled by a kind of
reaction rule. It reduces wait’ to the result 0, which indicates successful notification to the program,
thereby waking up the thread for future local reductions. This time, the side condition h′ ≺hb h
enforces an ordering relation between the thread performing the notification (at time h) and the
thread woken up (which was suspended earlier at time h′). The notify’ instruction keeps iterating
with an increased wake count n. Once n reaches m, or no more matching waits can be found, the
next rule finalises the operation and returns n.
A wait’ instruction may also time out, if (the signed interpretation of) its time-out value is not
negative (a negative value indicating infinite timeout). Since there is no consistent notion of
execution speed across platforms, time-out is simply modelled as a non-deterministic reduction in
the semantics; a result of 2 indicates this outcome to the program.
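The iteration performed by notify’ can be pictured with the following Python sketch. It is a
simplification under stated assumptions: waiters are a plain list of (thread id, location) pairs,
the first matching waiter is always chosen (the formal rule picks one non-deterministically), and
the happens-before side condition and time-outs are not modelled.

```python
def notify(waiters, loc, m):
    """Wake at most m threads waiting on loc.

    waiters: list of (thread_id, location) pairs for suspended threads.
    Returns (woken_ids, n, remaining_waiters), where n is the wake count
    that notify eventually returns to the program.
    """
    n = 0
    woken = []
    remaining = list(waiters)
    while n < m:
        idx = next((i for i, (_, l) in enumerate(remaining) if l == loc), None)
        if idx is None:          # no more matching waits: finalise early
            break
        tid, _ = remaining.pop(idx)
        woken.append(tid)        # this waiter resumes with result 0
        n += 1                   # notify' iterates with an increased wake count
    return woken, n, remaining


print(notify([("t1", "x"), ("t2", "y"), ("t3", "x"), ("t4", "x")], "x", 2))
```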
4.1 Bounds Checks
As previously discussed, all accesses to memories or tables are bounds-checked, and out-of-bounds
access will immediately trap. There have been ongoing discussions with implementers regarding
how they expect to be able to compile Wasm accesses to memories and tables, considering their
bounds-checking behaviour [Hansen 2017]. A key motivating example is given in Fig. 12. Given
desirable implementation schemes, one must ask “if the first access of thread 1 (c) succeeds, requiring
the memory growth (b) to be ‘visible’, is it guaranteed that the store of (a) is visible to (d)?” This is
a standard message passing (MP) shape, but with some data accesses replaced by length accesses.
More efficient implementation techniques on modern hardware notwithstanding, it is inevitable
that some platforms will have to compile explicit bounds checks, as noted in the threads
proposal [Smith 2019]. In that case, a large section of memory is pre-allocated, and a real memory
location is used to store the current maximum bound. This way, a memory.grow or table.grow
instruction can be implemented as a simple atomic increment of this location, without the need for
further allocation. All compiled Wasm accesses are then guarded by a conditional jump to a trap
procedure based on the current value of this location.
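The explicit bounds-check scheme just described can be sketched in Python as below. This is a toy
model under several assumptions: the class and method names are hypothetical, sizes are in bytes
rather than pages, a lock stands in for the hardware atomic increment, and Python itself will not
exhibit the relaxed behaviours that a bare architectural load of the bound may show.

```python
import threading


class LinearMemory:
    def __init__(self, reserved_bytes, initial_bytes):
        self._data = bytearray(reserved_bytes)  # large section pre-allocated up front
        self._length = initial_bytes            # the location holding the current bound
        self._lock = threading.Lock()           # stand-in for an atomic read-modify-write

    def grow(self, delta_bytes):
        """Atomically bump the bound; return the old bound, or -1 on failure."""
        with self._lock:
            old = self._length
            if old + delta_bytes > len(self._data):
                return -1
            self._length = old + delta_bytes
            return old

    def load_byte(self, index):
        # Guard: a plain read of the bound followed by a conditional "jump" to the trap path.
        if index >= self._length:
            raise RuntimeError("trap: out-of-bounds access")
        return self._data[index]

    def store_byte(self, index, value):
        if index >= self._length:
            raise RuntimeError("trap: out-of-bounds access")
        self._data[index] = value


mem = LinearMemory(reserved_bytes=1024, initial_bytes=16)
mem.store_byte(0, 54)
print(mem.grow(16), mem.load_byte(16))  # old bound, then a freshly exposed zero byte
```

Note that growing needs no further allocation here: it only changes the value that later guards
compare against.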
Thread 0:                                      Thread 1:
a: (i32.store (i32.const k) (i32.const 54))    c: (i32.load (i32.const j))
b: (memory.grow . . .)                         d: (i32.load (i32.const k))
Fig. 12. A message-passing (MP) shape involving memory growth (assuming index k is in-bounds, and
index j is out-of-bounds before the growth (b), and in-bounds after). Even if (c) successfully
executes, it is not guaranteed that (d) will read 54.
Thread 0:            Thread 1:
MOV X0,#54           LDR X0,[X3]
STR X0,[X1]          CBZ X0, LCTRAP
DMB SY               LDR X2,[X1]
MOV X2,#1
STR X2,[X3]

a: W x=54   --dmb-->   b: W len=1
c: R len=1  --ctrl-->  d: R x=0
b --rf--> c      initial state --rf--> d      d --fr--> a
Fig. 13. MP+dmb+ctrl, a permitted execution on ARM
This leads to a natural view of a WebAssembly memory and table instruction as abstractly
carrying out up to two accesses. As well as accessing data in memory, the instruction will also
access a distinguished “length” location, to perform the bounds check. Both accesses are potentially
subject to relaxed behaviour, since implementers wish to compile this bounds check as a bare
architectural load with few ordering guarantees, for efficiency reasons.
We use this scheme as the basis for our formal specification of the relaxed behaviour of bounds
checking, considering it the weakest discipline we are prepared to support. As detailed in Fig. 19,
Wasm unord accesses correspond to bare architectural load/stores. We model the implicit bounds
checks as unord reads of the len abstract location. The memory.grow and table.grow instructions are
modeled with an atomic rmw increment, and explicit memory.size or table.size checks are modeled as
atomic reads of the len location.
This means that explicit length checks (for example to ensure that a trap does not occur later in
the code) guarantee that subsequent instructions will observe the same (or greater) length, and
all side-effects from instructions before the last growth. However, implicit observations about the
length, through the success or failure of bounds checks, guarantee no synchronization whatsoever.
This means that the example of Fig. 12 is allowed to exhibit a non-SC behaviour.
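To see why the outcome in question is non-SC, one can enumerate every sequentially consistent
interleaving of the four instructions of Fig. 12. The Python sketch below does so under simplifying
assumptions of ours: the memory at index k initially holds 0, growth is a single step, and each
instruction is atomic. The function name and the outcome encoding are illustrative.

```python
from itertools import permutations


def sc_outcomes():
    """Outcomes (c_succeeded, value_read_by_d) over all SC interleavings of Fig. 12."""
    outcomes = set()
    for order in permutations("abcd"):
        # Keep only interleavings that respect program order within each thread.
        if order.index("a") > order.index("b") or order.index("c") > order.index("d"):
            continue
        x = 0              # assumed initial contents at index k
        grown = False      # whether index j is in bounds yet
        c_ok, d_val, trapped = None, None, False
        for op in order:
            if op == "a":
                x = 54
            elif op == "b":
                grown = True
            elif op == "c":
                c_ok = grown
                trapped = not grown   # an out-of-bounds load traps thread 1
            elif op == "d" and not trapped:
                d_val = x
        outcomes.add((c_ok, d_val))
    return outcomes


print(sc_outcomes())
```

Under these assumptions no SC interleaving yields the pair (True, 0): whenever (c) succeeds, (d)
reads 54. The relaxed model, by contrast, permits (c) to succeed while (d) still reads the old
value.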
This can be justified at two levels. First, any program fragment observing relaxed behaviour in its
bounds checks must have a race between a memory.grow/table.grow in one thread and a regular
access in another thread, such that this access is out-of-bounds “before” the grow but in-bounds
“after”. Even forgetting about relaxed behaviour, such a program is clearly wrong, and will exhibit
executions which trap, depending on the interleaving of the two threads. We are not interested in
providing strong guarantees for such programs when it may restrict our range of implementation
choices and optimisations, or complicate the model. For an access which is out-of-bounds before a
racing grow “commits”, but is in-bounds after, our semantics makes it entirely non-deterministic
whether the access will succeed or trap, independent of the success or failure of other accesses.
Second, this scheme, implemented on real architectures, genuinely exhibits some
counter-intuitive relaxed behaviours. We give an example of this on the ARM architecture, in Fig. 13. The
execution, a previously studied ARM litmus test, was verified using the rmem tool [Gray et al.
2015], which can explore and visualise the possible relaxed behaviours of program fragments in
various architectures. It can be viewed as an abstraction (for brevity) of the compiled code of Fig. 12,
sufficient to depict its memory consistency properties. The memory.grow operation is represented
as a store of the new length, guarded by a barrier. Compilation of a real Wasm program would
generate an atomic read-modify-write here; however, only the initial barrier and the write are
relevant to the consistency behaviour of this example. Thread 1 represents a bounds check-guarded
unord load which is racing with the memory.grow. After the load of the memory size, there is a
conditional branch to a trap label which will carry out error-handling in the case that the access is
out of bounds. The precise condition of the branch is not relevant to the example’s consistency
behaviour, so we choose CBZ for brevity. The rmem tool shows that the ARM memory model allows
the execution depicted, where despite a barrier and control dependency, it is possible for access
(d) to read the stale value 0.
Another interesting shape is depicted in Fig. 14. This example arises out of the lack of read-read
coherence (CoRR) in the definition of our unord accesses (discussed in more detail in Section 5).
Real architectures mostly guarantee the absence of CoRR violations, with a few exceptions [Alglave
et al. 2015]. This behaviour could also arise out of other implementation schemes, which may be
unable to guarantee that the entire growth is atomic. For example, (a) could be an especially large
growth of multiple pages which must be allocated piecemeal, with indexes j and k falling far apart
from each other.
Production engines implement efficient compilation schemes for bounds checks involving virtual
memory manipulation and trap handlers. These implementations allow Wasm accesses to be
compiled without explicit bounds checks, and instead rely on catching and handling OS/CPU faults
to detect out-of-bounds accesses. This approach is expected by implementers to be at least as strong
as the naïve strategy. However, such implementations are difficult to reason about formally, as
discussed in Section 7.1.
Thread 0:                 Thread 1:
a: (memory.grow . . .)    b: (i32.store (i32.const j) . . .)
                          c: (i32.store (i32.const k) . . .)
Fig. 14. A CoRR shape involving memory growth (assuming indices j, k are out-of-bounds before the
growth (a), and in-bounds after). Even if (b) successfully executes, it is not guaranteed that
(c) will be in-bounds.
5 AXIOMATIC MEMORY MODEL
As discussed earlier, the top-level intuition for an axiomatic memory model is that the operational
semantics generates a set of events, which is then subject to a consistency predicate that classifies
whole executions as either valid or invalid. Fig. 15 and Fig. 16 define everything that is needed for
our axiomatic memory model. Unlike the C++ semantics, but like the JavaScript one, our semantics
does not introduce undefined behaviour.
Time Stamps. The C++ and JavaScript memory models capture the ordering of events by defining
various post-hoc relations over the event set as part of a candidate execution (see Section 2). This
results in a graph structure, where the vertices are memory events, with the operational semantics
fixing some edges such as sequenced-before, while some others are non-deterministically picked
in the candidate execution and are later constrained by the axiomatic memory model, such as
reads-from.
We instead chose to adopt a more compact representation based on time stamps, in the style of
the promising semantics [Kang et al. 2017] or the OCaml memory model [Dolan et al. 2018]. This is
equivalent to defining an explicit graph, but has the advantage that the operational semantics does
not need to manipulate graph edges. Our time stamps are drawn from an infinite set of abstract
objects that is equipped with an a priori partial order (written ≺hb, pronounced “happens before”)
corresponding to the happens-before relation, as well as a total order (≺tot, pronounced “tot”),
corresponding to a total memory order, such that ≺hb ⊆ ≺tot. Our operational semantics (Section 4)
assigns time stamps to events eagerly but non-deterministically. Individual threads keep track of
the time stamp of their last emitted event, and force future same-thread events to be ordered later
by ≺hb, mimicking the effect of explicit sequenced-before edges. Otherwise, the time stamp choices
are constrained by various ordering conditions imposed in the axiomatic memory model. Therefore,
a Wasm candidate execution consists of a set of time-stamped events, and is valid if the time stamps
are chosen such that the axiomatic model is satisfied. In valid executions, all events have distinct
time stamps.
Our definition of ≺hb also allows us to avoid the inter-thread synchronisation specification
mechanisms of C++ and JavaScript, which rely on an additional synchronizes-with relation to
specify the effects of mutexes and other thread-blocking primitives on the axiomatic model [Batty
et al. 2011]. For example, when thread A executes notify’, waking thread B from wait’, the
reduction rule (Fig. 11, 3rd rule) specifies that thread B must advance its “last observed” time stamp
to that of A, meaning that all subsequent events from B will observe previous events from A. In
this capacity, time stamps function as a Lamport clock [Lamport 1978].
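The Lamport-clock reading of time stamps can be illustrated with a few lines of Python. This is a
loose analogy rather than the model itself: integers are totally ordered, whereas the paper's time
stamps carry a partial order ≺hb, and the class and method names are our own.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.clock = 0            # time stamp of the thread's last emitted event

    def step(self):
        """Emit a local event ordered after all earlier events of this thread."""
        self.clock += 1
        return self.clock

    def wake_from(self, notifier_stamp):
        """On wake-up, advance the 'last observed' time stamp to the notifier's."""
        self.clock = max(self.clock, notifier_stamp) + 1
        return self.clock


a, b = Thread("A"), Thread("B")
b.step()                          # B suspends in wait' at stamp 1
for _ in range(3):
    a.step()                      # A performs some earlier events
notify_stamp = a.step()           # A executes notify'
wake_stamp = b.wake_from(notify_stamp)
print(notify_stamp, wake_stamp)   # every later event of B is ordered after A's notify
```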
Traces. As seen in Section 4, our (global) small-step reduction relation is labelled with (possibly
empty) events. An execution trace can be defined as the set of events generated by the coinductive
closure of the reduction relation, as per the traces relation defined in Fig. 16. This relation is satisfied
by all finite, terminated traces, but also by all infinite traces of non-terminating programs. The
consistency predicate for a valid execution can then be defined over these traces.
Because we do not model garbage collection of store objects or terminated threads, the store
s and the thread set p* in configurations considered in the fixpoint of a trace could become
infinitely large. By slight abuse of notation, we take their “grammar” given in Fig. 3 to actually
define these components as proper mathematical sets.
Auxiliary definitions:
ord(rd_o l w*) := o
ord(wr_o l w*) := o
ord(rmw l w1* w2*) := seqcst
loc(rd_o l w*) := l
loc(wr_o l w*) := l
loc(rmw l w1* w2*) := l
size(rd_o l w^n) := n
size(wr_o l w^n) := n
size(rmw l w1^n w2^n) := n
read(rd_o l w*) := w*
read(wr_o l w*) := ϵ
read(rmw l w1* w2*) := w1*
write(rd_o l w*) := ϵ
write(wr_o l w*) := w*
write(rmw l w1* w2*) := w2*
region(act) := region(loc(act))
region(r) := r
region(r[i]) := r
offset(act) := offset(loc(act))
offset(r) := 0
offset(r[i]) := i
range(act) := [offset(act), offset(act) + size(act)[
writing(act) :⇔ write(act) ≠ ϵ
aligned(act) :⇔ ∃n. offset(act) = n · size(act)
tearfree(act) :⇔ ord(act) = seqcst ∨ (aligned(act) ∧ size(act) ≤ 4)
same(act1, act2) :⇔ region(act1) = region(act2) ∧ range(act1) = range(act2)
overlap(act1, act2) :⇔ region(act1) = region(act2) ∧ range(act1) ∩ range(act2) ≠ ∅
ev1 ≺ ev2 :⇔ time(ev1) ≺ time(ev2)
time((act*)_h) := h
access_r((act*)_h) := act   iff {act} = {act′ ∈ act* | region(act′) = r}
f_r(ev) := f(access_r(ev))
f_r(Ev) := {ev ∈ Ev | f_r(ev)}
Fig. 15. Axiomatic memory model, auxiliary definitions
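A few of the auxiliary predicates of Fig. 15 translate directly into executable form. The Python
sketch below is one possible reading, assuming an action is summarised by its ordering, region,
offset, and size (with size at least 1); the record type Act and the name byte_range are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Act:
    order: str    # "unord" or "seqcst"
    region: str   # e.g. "m.data" or "m.len"
    offset: int
    size: int     # number of store values accessed, assumed >= 1


def byte_range(act):
    """The half-open interval [offset, offset + size[ as a Python range."""
    return range(act.offset, act.offset + act.size)


def aligned(act):
    return act.offset % act.size == 0


def tearfree(act):
    return act.order == "seqcst" or (aligned(act) and act.size <= 4)


def same(a1, a2):
    return a1.region == a2.region and byte_range(a1) == byte_range(a2)


def overlap(a1, a2):
    return a1.region == a2.region and bool(set(byte_range(a1)) & set(byte_range(a2)))


w8 = Act("unord", "m.data", 0, 8)   # non-atomic 8-byte write
r4 = Act("unord", "m.data", 4, 4)   # non-atomic aligned 4-byte read of its upper half
print(overlap(w8, r4), same(w8, r4), tearfree(w8), tearfree(r4))
```

The usage lines show that two accesses may overlap without being same-range, which is the
situation in which the stronger guarantees of the model do not apply.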
Terminal configuration:   p_term ::= v* | trap | E[(wait’ l q)]
                          conf_term ::= s; p_term*
Can synchronise with:
  sync_r(ev1, ev2) :⇔ ord_r(ev1) = ord_r(ev2) = seqcst ∧ same_r(ev1, ev2)
Trace:
  ------------------------
  ⊢ ∅ traces conf_term

  conf ↪^{ev} conf′      ⊢ tr traces conf′
  ------------------------------------------
  ⊢ tr ∪ {ev} traces conf
Valid trace:
  (Note: by construction, ≺hb ⊆ ≺tot)
  ∀r, ⊢_r tr valid
  ------------------------
  ⊢ tr valid
W
rtr valid
ev
W
i<|ev
W|,tr i
W[i]
revRno-tear ev
W
W
evR,evW
evWwritingr(tr)
tr i,k
revRvalue-consistent evW
tr k
revRhb-consistent evWtr revRsc-last-visible ev
W
tr i
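k = offset_r(ev_R) + i = offset_r(ev_W) + j
------------------------------------------
tr ⊢^{i,k}_r ev_R value-consistent ev_W

¬(ev_R ≺hb ev_W)
sync_r(ev_W, ev_R) ⇒ ev_W ≺hb ev_R
∀ev′_W ∈ writing_r(tr), ev_W ≺hb ev′_W ≺hb ev_R ⇒ k ∉ range_r(ev′_W)
------------------------------------------
tr ⊢^k_r ev_R hb-consistent ev_W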
ev
Wwritingr(tr),evWhb evR
evWtot ev
Wtot evRsyncr(evW,evR) ¬ syncr(ev
W,evR)
evWhb ev
Wtot evR ¬ syncr(ev
W,evR) (†)
evWtot ev
Whb evR ¬ syncr(evW,ev
W) (‡)
tr revRsc-last-visible evW
tearfree_r(ev_R) ⇒ |{ev_W ∈ ev_W* | same_r(ev_R, ev_W) ∧ tearfree_r(ev_W)}| ≤ 1
------------------------------------------
⊢_r ev_R no-tear ev_W*
Fig. 16. Axiomatic memory model
Validity. The valid predicate encodes the conditions under which a trace is considered a valid
execution. Our top-level quantification over regions, r, captures not only that related
events must access the same memory, but also that they must access the same field. For regular
memories, the two possible fields are data, representing the values of memory locations, and len,
representing the memory’s current length. Analogously, tables have an elem field representing
the array of reference values and len for their current lengths. Globals only have a val field,
representing the current value they hold. All these fields are handled uniformly by the semantics.
Our per-region validity predicate specifies that read
events must be associated with an appropriate list of write events, with each write event in the
list being the source of one store value (i.e., an individual byte in the case of memories). It also
enforces the well-formedness condition that a read event of n store
values must be associated with a list of precisely n write events.
The sub-conditions imposed on each read-write pair are named so as to be analogous to the prose
JavaScript model we present in Section 2.2. The value-consistent relation enforces some basic
well-formedness over the indexes of read-write pairs (i.e., a store value can only be read from the
same index it was written). The hb-consistent predicate captures a core part of the model: the
treatment of ≺hb as a strong ordering, which means that all read-write pairs must be consistent
with it. Observations made by individual unord accesses do not contribute to ≺hb, and therefore
confer no ordering guarantees to the rest of the program. By contrast, a seqcst read provides ordering
guarantees (synchronisation, implying ≺hb) if and only if it reads from a seqcst write of equal range.
If this condition is not fulfilled, such a read-write pair is treated identically to the unord case. In
JavaScript, synchronization is recorded with an explicit synchronizes-with relation. We inline this
as the condition sync_r(ev_W, ev_R) ⇒ ev_W ≺hb ev_R. The sync predicate expresses the circumstances
under which a pair of accesses may restrict which writes are observable in the rest of the program.
Note that this predicate requires both accesses to be seqcst and access the same location range
(discussed below).
The sync predicate is also used in sc-last-visible. This ensures that
same-range seqcst events will act in a sequentially consistent way (see Section 6), and requires
these events to respect the weaker ≺tot ordering. Finally, the no-tear predicate replicates the
no-tear condition of JavaScript.
Note that the memory model (like JavaScript’s) has no notion of coherence, an explicit
same-location ordering which appears in some other memory models. Wasm unord accesses do not enjoy
any coherence guarantees, and coherence for seqcst accesses is subsumed by sc-last-visible.
Mixed-size Behaviour. As discussed, accesses in Wasm are to ranges of bytes rather than to
discrete locations. Several ordering guarantees of the model require that accesses have identical
ranges.
The no-tear rule only applies to same-range accesses. For example, an 8-byte tear-free write may
still appear to tear when observed by a 4-byte tear-free read.
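The same-range restriction of no-tear can be checked mechanically. The following Python sketch
encodes the no-tear condition of Fig. 16 under simplifying assumptions: all accesses are to one
region, an access is a named tuple of our own design, and the list passed as writes names, byte by
byte, the write each byte of the read was taken from.

```python
from collections import namedtuple

Acc = namedtuple("Acc", "name order offset size")


def tearfree(acc):
    aligned = acc.offset % acc.size == 0
    return acc.order == "seqcst" or (aligned and acc.size <= 4)


def same_range(a1, a2):
    return (a1.offset, a1.size) == (a2.offset, a2.size)


def no_tear(read, writes):
    """A tear-free read may take bytes from at most one same-range tear-free write."""
    if not tearfree(read):
        return True
    sources = {w for w in writes if same_range(read, w) and tearfree(w)}
    return len(sources) <= 1


R = Acc("R", "unord", 0, 4)                        # aligned 4-byte read
W1, W2 = Acc("W1", "seqcst", 0, 8), Acc("W2", "seqcst", 0, 8)
X1, X2 = Acc("X1", "unord", 0, 4), Acc("X2", "unord", 0, 4)

print(no_tear(R, [W1, W1, W2, W2]))  # mixed-size: mixing the two 8-byte writes is permitted
print(no_tear(R, [X1, X1, X2, X2]))  # same-range: mixing the two 4-byte writes is forbidden
```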
Atomic seqcst accesses also only provide stronger ordering guarantees when interacting with
other same-range accesses. Otherwise, they are essentially identical to unord accesses. For example,
as discussed, same-range seqcst read-write pairs synchronize, establishing an
≺hb ordering. However, mixed-size pairs will not synchronize. Moreover, sc-last-visible allows
mixed-size seqcst accesses to interact in a way that does not respect ≺tot.
One rule that constrains even mixed-size accesses is hb-consistent. The model treats hb as a
strong ordering, and even mixed-size read-write pairs must respect it. This means that mixed-size
accesses can still be productively used, given appropriate synchronization.
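To illustrate how hb constrains mixed-size read-write pairs, the sketch below checks two conditions byte by byte: a read never reads from a write that it happens-before, and never reads a byte from a write that has been overwritten by another write lying hb-between the two. This is an assumption-laden approximation in the style of JavaScript's happens-before consistency, not a transcription of the hb-consistent rule of Fig. 16; the encodings of `hb` and `rf` are hypothetical.

```javascript
// hb is a Set of "a->b" strings and is assumed to be transitively closed already.
function hbBefore(hb, a, b) {
  return hb.has(`${a.id}->${b.id}`);
}

function covers(ev, byte) {
  return ev.addr <= byte && byte < ev.addr + ev.size;
}

// rf holds one { read, byte, write } entry per byte of each read.
function hbConsistent(events, hb, rf) {
  const writes = events.filter((e) => e.kind === "write");
  return rf.every(({ read, byte, write }) => {
    // A read may not observe a write from its own hb-future.
    if (hbBefore(hb, read, write)) return false;
    // The observed write must not be hidden by a write that is hb-between.
    return !writes.some(
      (w) => w !== write && covers(w, byte) && hbBefore(hb, write, w) && hbBefore(hb, w, read)
    );
  });
}

// Mixed sizes: an 8-byte write, then a 4-byte write, then a 2-byte read.
const w8 = { id: "w8", kind: "write", addr: 0, size: 8 };
const w4 = { id: "w4", kind: "write", addr: 0, size: 4 };
const r2 = { id: "r2", kind: "read", addr: 0, size: 2 };
const events = [w8, w4, r2];
const hb = new Set(["w8->w4", "w4->r2", "w8->r2"]);

const staleRf = [0, 1].map((byte) => ({ read: r2, byte, write: w8 }));
const freshRf = [0, 1].map((byte) => ({ read: r2, byte, write: w4 }));

console.log(hbConsistent(events, hb, staleRf), hbConsistent(events, hb, freshRf));
```

Even though the three accesses all have different sizes, the stale observation is rejected once the accesses are ordered by hb.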
Proc. ACM Program. Lang., Vol. 3, No. OOPSLA, Article 133. Publication date: October 2019.
As recent work has shown [Flur et al. 2017], mixed-size guarantees are under-explored and
surprisingly weak on hardware, so Wasm, like JavaScript, picks a maximally weak (but defined)
semantics. However, as discussed by Flur et al. [2017], some hyper-optimised low-level data
structures rely on mixed-size consistency guarantees which our model does not currently provide. As
our formal understanding of mixed-size accesses grows, it should become possible for us to give
more guarantees.
Thin-Air Behaviour. Unlike C++, every well-typed Wasm program has a well-defined semantics.
Racing non-atomics will not trigger undefined behaviour in the C++ sense. However, the defined
semantics in this case is very weak, to the point that it is still recommended for Wasm programs to
be race-free. For example, Wasm non-atomics are specified weakly enough to exhibit out-of-thin-air
behaviours when racing [Batty et al. 2015]. These behaviours are known to impair modular
reasoning about program properties [Batty et al. 2013; Ševčík 2011]. However, unlike with relaxed
atomics, the C++ primitive that results in out-of-thin-air executions, it is reasonable to expect
that a Wasm program should contain no data races on non-atomics, since the source program it
was compiled from will disallow this. At least for Wasm code generated from C++, a data race
will trigger undefined behaviour at the source level, meaning that all Wasm programs generated
from well-defined C++ should already be data-race-free. Moreover, we guarantee that all observed
references have actually been allocated previously (Fig. 11), while C++ is ambiguous as to the
thin-air behaviour of pointers.
Even in the case that a data race does occur, such races are "bounded in space" by the Wasm
semantics. This property, coined by Dolan et al. [2018], states (informally) that a data race cannot
affect the results of computations involving only unrelated locations. This property is not true of
the C/C++11 model, because a data race results in undefined behaviour, but is true of our model.
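$$\dfrac{ev \in tr \qquad ev' \in tr \qquad \vdash_r ev\ \text{data-race-with}\ ev'}{\vdash tr\ \text{data-race}}$$

$$\dfrac{\lnot\,\mathrm{sync}_r(ev, ev') \qquad \vdash_r ev\ \text{race-with}\ ev'}{\vdash_r ev\ \text{data-race-with}\ ev'}$$

$$\dfrac{\lnot(ev \prec_{hb} ev' \lor ev' \prec_{hb} ev) \qquad \mathrm{writing}_r(ev) \lor \mathrm{writing}_r(ev') \qquad \mathrm{overlap}_r(ev, ev')}{\vdash_r ev\ \text{race-with}\ ev'}$$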
[Rule concluding $\vdash tr\ \text{is-seqcst}$: the premises are not recoverable from the extracted text. Section 6.1 describes the condition: every read must observe the most recent write in tot.]
Fig. 17. Formulation of the SC-DRF property
6 SEQUENTIAL CONSISTENCY OF DATA-RACE-FREE PROGRAMS
Sequential Consistency of Data-Race-Free programs (SC-DRF) is considered by many to be the
desirable correctness property for a relaxed memory model [Adve and Hill 1990; Boehm and Adve
2008; Gharachorloo et al. 1992; Lahav et al. 2017]. A data-race-free program is one which does not
have two non-atomic accesses, at least one of which is a write, in a race condition on the same
memory location. SC-DRF guarantees that a program lacking such races will exhibit sequentially
consistent behaviour, in the sense that the program will appear to execute as a naïve sequential
interleaving of the operations of each thread, regardless of how weakly specified its non-atomics
are.
6.1 Wasm is SC-DRF
The axiomatic model presented in Fig. 16 is SC-DRF:
Proposition 6.1 (Wasm is SC-DRF).
$(\vdash tr\ \text{valid}) \land \lnot(\vdash tr\ \text{data-race}) \Longrightarrow\ \vdash tr\ \text{is-seqcst}$
A proof of this property can be found in the supplemental materials. The auxiliary relations
data-race and is-seqcst are defined in Fig. 17. An execution is defined to have a data race if two
events not related by hb, at least one of which is a write, touch the same memory location with a
non-seqcst consistency (denoted by the condition $\lnot\,\mathrm{sync}_r(ev, ev')$). Note that this
definition relies on the memory model to define what a data race is, in common with the SC-DRF
proof of Batty et al. [2012]. More recent work [Batty et al. 2015] advocates for a definition of
data-race-freedom which is model-agnostic, and therefore simpler for programmers to reason about.
We use the "model-internal" approach here, because it is the approach taken by the JavaScript
specification in stating its SC-DRF guarantees [ECMA International 2018a] and, as discussed below,
it is useful for us to be able to directly contrast the guarantees of the two models.
This definition considers seqcst accesses that overlap but do not have equal ranges as racy. This
is consistent with the axiomatic model, which effectively degrades such accesses to unord. The
is-seqcst condition requires that every read must observe the most recent write in the total order
tot.
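The data-race definition discussed in this subsection can be paraphrased as executable predicates. The sketch below is an approximation under stated assumptions: events are plain objects with hypothetical fields (`id`, `kind`, `mode`, `addr`, `size`), hb is supplied as a list of ordered pairs assumed to be transitively closed, and sync is simplified to "both seqcst with equal ranges".

```javascript
function overlap(a, b) {
  return a.addr < b.addr + b.size && b.addr < a.addr + a.size;
}

function sameRange(a, b) {
  return a.addr === b.addr && a.size === b.size;
}

// Simplified sync: both accesses are seqcst and have equal ranges.
function sync(a, b) {
  return a.mode === "seqcst" && b.mode === "seqcst" && sameRange(a, b);
}

// Build an hb lookup from ordered id pairs (assumed transitively closed).
function makeHb(pairs) {
  const set = new Set(pairs.map(([a, b]) => `${a}->${b}`));
  return (a, b) => set.has(`${a.id}->${b.id}`);
}

// race-with: unordered by hb, at least one write, overlapping ranges.
function raceWith(hb, a, b) {
  const unordered = !hb(a, b) && !hb(b, a);
  const someWrite = a.kind === "write" || b.kind === "write";
  return unordered && someWrite && overlap(a, b);
}

// data-race-with: a race that is not between synchronizing accesses.
function dataRaceWith(hb, a, b) {
  return raceWith(hb, a, b) && !sync(a, b);
}

function hasDataRace(events, hb) {
  return events.some((a, i) => events.slice(i + 1).some((b) => dataRaceWith(hb, a, b)));
}

const wAtomic = { id: "w", kind: "write", mode: "seqcst", addr: 0, size: 4 };
const rAtomic = { id: "r", kind: "read", mode: "seqcst", addr: 0, size: 4 };
const rNarrow = { id: "n", kind: "read", mode: "seqcst", addr: 0, size: 2 };
const wPlain = { id: "wp", kind: "write", mode: "unord", addr: 8, size: 4 };
const rPlain = { id: "rp", kind: "read", mode: "unord", addr: 8, size: 4 };

const sameRangeAtomics = hasDataRace([wAtomic, rAtomic], makeHb([]));
const mixedSizeAtomics = hasDataRace([wAtomic, rNarrow], makeHb([]));
const orderedPlain = hasDataRace([wPlain, rPlain], makeHb([["wp", "rp"]]));
const unorderedPlain = hasDataRace([wPlain, rPlain], makeHb([]));

console.log({ sameRangeAtomics, mixedSizeAtomics, orderedPlain, unorderedPlain });
```

The second case shows the point made above: overlapping seqcst accesses of different sizes count as racy, because they are effectively degraded to unord.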
6.2 JavaScript is not SC-DRF
The official specification for JavaScript claims that its relaxed memory model guarantees
SC-DRF [ECMA International 2018a]. However, we have identified two reasons why this is not the
case.
As discussed in Section 2, the JavaScript model suffers from the SC-DRF violations
previously identified in a draft version of the C++ model by Batty [Batty 2014; Batty et al. 2011].
Moreover, adapting the C++ model's strengthening alone is not sufficient to enforce SC-DRF.
The JavaScript model is additionally vulnerable to a novel counter-example which cannot
be expressed in the formal C++ model, since it relies on an unord read observing a
seqcst write. Fig. 18 shows such a counter-example: a JavaScript program that is data-race-free,
but not sequentially consistent. While both atomic writes are guaranteed to occur before
the reads of x[0], no sequential interleaving can explain the fact that both reads are
allowed to take different values. We have confirmed the validity of this execution using the EMME
tool [Mattarei et al. 2018], a model checker for the (uncorrected) JavaScript memory model. To
the best of our knowledge, this execution is not observable on any real hardware because of the
coherence guarantees between two same-location atomic writes, which force the second thread
to observe them as totally ordered. We have verified this in rmem for x86 and ARM, based on the
compilation schemes laid out in Section 7.
Thread 1                  Thread 2
store(x, 0, 0x1);         store(x, 0, 0x2);
store(y, 0, 0x1);         if (load(y, 0) == 0x1) {
                            print(x[0]); // = 0x2
                            print(x[0]); // = 0x1
                          }
Fig. 18. A data-race-free JavaScript program that is not sequentially consistent; {load, store} abbreviates
Atomics.{load, store}; the thick line represents synchronizes-with, which ensures that no data race occurs
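The claim that no sequential interleaving explains the outcome of Fig. 18 can be checked by brute force. The sketch below assumes a two-thread reading of the figure (the first thread performs the stores to x and y, the second performs the store of 0x2 followed by the conditional), assumes zero-initialised memory, and models memory as a plain object; the step encoding and the helper names `interleavings` and `scOutcomes` are hypothetical.

```javascript
// Each step receives the shared memory, thread-local state, and the print log.
const thread1 = [
  (mem) => { mem.x = 1; },                                    // store(x, 0, 0x1)
  (mem) => { mem.y = 1; },                                    // store(y, 0, 0x1)
];
const thread2 = [
  (mem) => { mem.x = 2; },                                    // store(x, 0, 0x2)
  (mem, local) => { local.taken = mem.y === 1; },             // load(y, 0) == 0x1
  (mem, local, out) => { if (local.taken) out.push(mem.x); }, // print(x[0])
  (mem, local, out) => { if (local.taken) out.push(mem.x); }, // print(x[0])
];

// Enumerate every interleaving that preserves each thread's program order.
function* interleavings(a, b) {
  if (a.length === 0) { yield [...b]; return; }
  if (b.length === 0) { yield [...a]; return; }
  for (const rest of interleavings(a.slice(1), b)) yield [a[0], ...rest];
  for (const rest of interleavings(a, b.slice(1))) yield [b[0], ...rest];
}

// Run each interleaving on a sequentially consistent memory and collect the prints.
function scOutcomes() {
  const outcomes = new Set();
  for (const schedule of interleavings(thread1, thread2)) {
    const mem = { x: 0, y: 0 };
    const local = {};
    const out = [];
    for (const step of schedule) step(mem, local, out);
    outcomes.add(out.join(","));
  }
  return outcomes;
}

const outcomes = scOutcomes();
console.log([...outcomes].sort());
```

Under this reading, every interleaving that takes the branch prints the same value twice, so the pair 0x2 then 0x1 never appears among the sequentially consistent outcomes.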
6.3 Contrasting Wasm and JavaScript
Because Wasm and JavaScript are required to interoperate extensively on the Web, we must address
how their memory models can be aligned. If the two conditions highlighted in Fig. 16, marked
(†) and (‡), are removed from the Wasm model, this model becomes a superset of the uncorrected
JavaScript one in the following sense:
Proposition 6.2. Taking a model without (†) and (‡), the data accesses of a Wasm program with
no out-of-bounds trap errors will exhibit the same consistency behaviours as a JavaScript program
carrying out equivalent accesses on a shared array buffer.
However, such a model is clearly not SC-DRF. We have engaged with ECMA TC39, the standards
body for JavaScript, about the possibility of amending the JavaScript model to include (†) and (‡)
as a strict strengthening of the existing model. The correctness of the (†) condition has
been extensively investigated for C++ [Batty 2014; Batty et al. 2012, 2011], and we believe that
the (‡) condition, as its dual, is also supported by current compilation schemes, given that real
hardware disallows our counter-example. The standards body has provisionally agreed to accept
our proposed changes in a future edition of the standard, and we continue to investigate more
formal guarantees of their correctness.
Experimental Validation. We have implemented an SMT-based litmus checking tool for the Wasm
memory model, with and without (†) and (‡). The tool accepts small fragments of Wasm code
written in an abstracted syntax, and computes and visualizes all valid executions. We expect that it
will be useful in communicating the model to implementers and users, and can be used as an oracle
for future testing of implementations.
We also implement a front-end that allows our tool to accept litmus tests written in (a subset of)⁴
the syntax used by the EMME tool. This allows us to experimentally validate both Proposition 6.2
and our tool, by running each test in both tools and checking that they generate the same set
of visible behaviours for each litmus test.
The EMME implementers provide a number of hand-written litmus tests. We observe that
Proposition 6.2 holds for all 21 tests that our parser currently supports.
7 COMPILATION
In this section we discuss Wasm compilation both in its capacity as a "source" language compiled
to hardware machine code by a Wasm engine (consumers), and as a target language for existing
low-level languages like C/C++ and their compilers (producers). We motivate the correctness of
Wasm to platform assembly schemes to the best of our ability, and describe several outstanding
problems in the wider field of low-level relaxed memory research which will need to be solved in
order to fully formalise a correctness proof. We show that compilation of C/C++ accesses to Wasm
is correct as a direct consequence of our SC-DRF result.
⁴ The EMME litmus syntax allows written bytes to be represented by integer or float literals. For now, we only support the
integer syntax. Additionally, we do not support JavaScript-level constructs such as for loops.
Wasm instruction   JS operation               x86    ARMv7                   AArch64
t.store            v[k] = n                   MOV    str                     STR
t.store.atomic     Atomics.store              XCHG   dmb ish; str; dmb ish   STLR
t.rmw.cmpxchg      Atomics.compareExchange    [Sewell and Sevcik 2016]
Fig. 19. Compilation of Wasm and JS memory accesses to selected platforms; atomics compilation follows
C/C++11 SC atomics [Sewell and Sevcik 2016]; read-modify-write operations are generally not supported
directly by platform instructions, and must be compiled to loops, which are omitted here for space.
7.1 Compiling Wasm to Hardware
The expected mapping of Wasm (and JavaScript) memory accesses is given in Fig. 19. These
compilation schemes are identical to C/C++11 [Sewell and Sevcik 2016]. Accesses to globals and
tables can be compiled in a similar manner.
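For readers who prefer the mapping as data, the store rows of Fig. 19 can be written as a lookup table. This is only an illustrative encoding of the rows that appear in the figure; the names `FIG19_STORES` and `lower` are made up for this sketch, and read-modify-write operations are left out because the figure defers them to Sewell and Sevcik [2016].

```javascript
// Store rows of Fig. 19, keyed by Wasm instruction and then by platform.
const FIG19_STORES = {
  "t.store":        { x86: "MOV",  ARMv7: "str",                   AArch64: "STR" },
  "t.store.atomic": { x86: "XCHG", ARMv7: "dmb ish; str; dmb ish", AArch64: "STLR" },
};

// Look up the expected instruction sequence, failing loudly on unknown inputs.
function lower(instruction, platform) {
  const row = FIG19_STORES[instruction];
  const sequence = row && row[platform];
  if (typeof sequence !== "string") {
    throw new Error(`no mapping recorded for ${instruction} on ${platform}`);
  }
  return sequence;
}

console.log(lower("t.store", "x86"));
console.log(lower("t.store.atomic", "ARMv7"));
```

The table makes the cost of atomicity visible: on ARMv7 the atomic store is bracketed by two barriers, whereas the plain store is a bare instruction.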
7.1.1 Mixed-size Access. Unfortunately, given the current state of the art in mixed-size concurrency
research, it is difficult to fully investigate the correctness of these compilation schemes. Almost all
research into platform assembly relaxed memory models concerns only a non-mixed-size fragment.
No mixed-size axiomatic model exists for any architecture; this is a substantial open problem.
To the best of our knowledge, the only existing formal work on the correctness of a mixed-size
compilation scheme is that of Flur et al. [2017]. This work presents mixed-size operational models
for ARMv8 and Power, and sketches a mixed-size generalisation of a previous proof from
non-mixed-size C/C++11 to Power [Batty et al. 2012; Sarkar et al. 2012]. Mixed-size C11 is less
general than our model, as it only allows non-atomics to be mixed-size. Our model allows atomics
to be mixed-size, although such accesses do not provide the same guarantees as non-mixed-size
atomics (for example, mixed-size atomics are not related by sync in our model). Flur et al. [2017]
state that creating an abstract axiomatic model, suitable for more involved proofs, is important
future work.
We can still give some limited intuition regarding the correctness of the scheme, as a guide for
future proof. Considering a non-mixed-size fragment of our model (i.e. no overlapping accesses), our
unord accesses share a compilation scheme not only with C/C++ non-atomics, but also with C/C++
relaxed atomics, which are expected to be stronger, and our seqcst accesses share a compilation
scheme with C/C++ sequentially consistent atomics, both of which must respect a total order. We
expect that for such a fragment of Wasm, the correctness of our compilation scheme should be
provable following a similar strategy to existing proofs of the correctness of a (non-mixed-size)
C/C++ scheme.
Going beyond this fragment, our model's mixed-size accesses have a very weak behaviour, with
mis-aligned accesses effectively treated by our no-tear rule as being decomposed into independent
byte accesses. This fits the architectural models proposed by Flur et al. [2017], where mis-aligned
architectural loads and stores are treated as being decomposed in the same way. The main remaining
concern is aligned, but mixed-size accesses; for example, 32-bit and 64-bit accesses to the same
location. The architectural models of Flur et al. [2017] guarantee that such accesses experience a
form of coherence, but there are some edge-cases that warrant further investigation. Our model
deliberately chooses a behaviour here that we expect to be far weaker than the behaviour of
real architectures; mixed-size seqcst atomic accesses are effectively treated like unord non-atomic
accesses for the purpose of sc-last-visible. This means two overlapping mixed-size accesses are
not subject to coherence guarantees under any circumstances. There is room to strengthen this
guarantee, but it would need to be motivated by additional investigation into the precise guarantees
of mixed-size architectural models.
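The decomposition of mis-aligned accesses into independent byte accesses can be pictured as follows. This is an illustrative sketch only: the store record shape (`addr`, `size`, `value`) and the functions `isAligned` and `decompose` are assumptions of this example, and the byte order follows Wasm's little-endian memory layout.

```javascript
// An access is naturally aligned when its address is a multiple of its size.
function isAligned(access) {
  return access.addr % access.size === 0;
}

// Split a mis-aligned store into independent single-byte stores (little-endian).
// Aligned stores are returned unchanged as a single access.
function decompose(store) {
  if (isAligned(store)) return [store];
  const bytes = [];
  for (let i = 0; i < store.size; i++) {
    const byteValue = Number((BigInt(store.value) >> BigInt(8 * i)) & 0xffn);
    bytes.push({ addr: store.addr + i, size: 1, value: byteValue });
  }
  return bytes;
}

const aligned = decompose({ addr: 4, size: 4, value: 0x11223344 });
const misaligned = decompose({ addr: 1, size: 4, value: 0x11223344 });

console.log(aligned.length, misaligned.map((b) => [b.addr, b.value.toString(16)]));
```

Each resulting byte store can be observed independently, which is why a concurrent reader may see a mixture of old and new bytes from a mis-aligned access.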
7.1.2 Bounds Checks. The discussion above has focussed purely on correctly compiling in-bounds
accesses. We must also deal with the relaxed behaviour of access bounds checks. As previously
discussed, our model supports an implementation where bounds checks are compiled as explicit
non-atomic reads. In this case, compilation of bounds checks may be treated identically to compilation
of data accesses, as the bounds check will be compiled as a bare architectural load (Fig. 19) followed
by a conditional branch to code which handles the trap result. The correctness of this compilation
scheme would therefore follow from the correctness of the compilation scheme for data accesses.
To the best of our knowledge, there is no existing concurrency-aware research that is capable of
facilitating the verification of the more efficient "trap handler" implementations, since they rely on
the concurrent (relaxed) semantics of memory protection behaviour in both the OS and the
underlying architecture. However, implementers are committed to ensuring that these implementations
are at least as strong as the naïve strategy.
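The naïve strategy, an explicit read of the memory size followed by a conditional branch to trap handling, can be sketched at the JavaScript level. This is an analogy rather than engine code: `Trap` and `checkedLoad32` are hypothetical names, reading `byteLength` stands in for the plain load of the current memory size, and `addr` is assumed to be a non-negative integer.

```javascript
// Hypothetical trap type standing in for a Wasm out-of-bounds trap.
class Trap extends Error {}

// Load a little-endian 32-bit value, with an explicit bounds check first.
function checkedLoad32(memory, addr) {
  const length = memory.byteLength;   // plain (non-atomic) read of the current size
  if (addr + 4 > length) {
    throw new Trap("out of bounds memory access"); // branch to trap handling
  }
  return new DataView(memory).getUint32(addr, true);
}

const memory = new SharedArrayBuffer(8);
new DataView(memory).setUint32(4, 0xdeadbeef, true);

console.log(checkedLoad32(memory, 4).toString(16));
try {
  checkedLoad32(memory, 5);
} catch (e) {
  console.log(e instanceof Trap ? "trap" : "unexpected error");
}
```

Because the size is read with an ordinary load, a concurrent grow performed by another thread may or may not be visible to the check, which is the relaxed behaviour the model has to account for.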
7.1.3 Wait/Wake. The wait/wake operations are not directly compiled as platform assembly, but
are implemented using OS system calls. This is in common with other languages with these features
such as Java, which guarantees similar synchronization. To the best of our knowledge, the formal
correctness of these mappings has not been investigated in any language, but at least on Linux,
suspending and waking a thread are documented as implying several strong barriers [Howells et al.
2019] which we expect to be sufficient to support our Wasm-level synchronization. Again, existing
literature does not explore the relaxed behaviour of OS calls, which would be necessary for formal
proof.
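As a point of reference, JavaScript exposes comparable operations as Atomics.wait and Atomics.notify. The short sketch below exercises only their non-blocking paths, so it runs in a single agent; it assumes an environment such as Node.js where the main thread is allowed to call Atomics.wait (browser main threads typically are not), and it says nothing about how an engine maps these calls to the OS.

```javascript
// A one-element shared Int32Array, zero-initialised.
const cell = new Int32Array(new SharedArrayBuffer(4));

// The cell holds 0 but we claim to expect 1, so wait returns without suspending.
const mismatch = Atomics.wait(cell, 0, 1, 0);

// The expected value matches, but the timeout of 0 ms expires immediately.
const expired = Atomics.wait(cell, 0, 0, 0);

// No agent is waiting on the cell, so nobody is woken.
const woken = Atomics.notify(cell, 0, 1);

console.log(mismatch, expired, woken);
```

A real use would have one agent block in Atomics.wait while another stores a new value and calls Atomics.notify; the synchronization between those two calls is what the system-call barriers mentioned above are expected to support.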
7.2 Compiling C/C++ to Wasm
The expected mapping of C/C++11 accesses to Wasm memory accesses is given in Fig. 20.
(Thread-local storage will be compiled to Wasm globals, but we omit that case here.) The correctness
of this mapping can be justified straightforwardly as follows. First, note that this mapping effectively
treats weaker C/C++ atomic accesses (e.g. release/acquire, relaxed) as sequentially consistent
(memory_order_seq_cst). Batty [2014] shows that C/C++ programs made up of such accesses
are SC-DRF, and moreover that they admit all sequentially consistent executions. Since all valid
C/C++ programs must be data-race-free, the compiled Wasm program will also be data-race-free,
assuming that atomic locations are allocated to disjoint, aligned portions of the Wasm heap.
Therefore, by our SC-DRF result (Section 6), the compiled Wasm program must have only SC
executions, which must therefore be valid executions of the original C/C++ program.
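The strengthening described above, in which every C/C++ atomic order is treated as sequentially consistent, amounts to a very small mapping. The sketch below is illustrative: the function names are invented, the order strings abbreviate the C++ `memory_order_*` constants that are valid for stores, and the use of `t.store.atomic` for atomic stores is an assumption taken from the instruction naming of Fig. 19.

```javascript
// C++11 memory orders that are valid for an atomic store.
const ATOMIC_STORE_ORDERS = new Set(["relaxed", "release", "seq_cst"]);

// Map the consistency of a C/C++ store to a Wasm consistency mode.
function wasmConsistency(order) {
  if (order === "non-atomic") return "unord";
  if (ATOMIC_STORE_ORDERS.has(order)) return "seqcst"; // all atomics are strengthened
  throw new Error(`unsupported store order: ${order}`);
}

// Pick the Wasm store instruction for a value type such as "i32" or "i64".
function compileStore(type, order) {
  return wasmConsistency(order) === "seqcst" ? `${type}.store.atomic` : `${type}.store`;
}

console.log(compileStore("i32", "non-atomic"));
console.log(compileStore("i32", "relaxed"));
console.log(compileStore("i64", "seq_cst"));
```

Relaxed and release stores end up with the same instruction as seq_cst ones, which is what makes the correctness argument short and also where the lost efficiency comes from.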
Of course, this sketch only justifies a correctness result between the axiomatic parts of the C/C++
and Wasm memory access semantics. Full correctness of compilation relies on many operational
aspects, such as giving a scheme for C/C++ memory allocation (malloc/new) to be correctly
implemented in Wasm. This is orthogonal to the relaxed memory model, and therefore not approached
by this work. It should be noted that, given a correct implementation of C/C++ memory allocation
in Wasm, the accesses of valid programs will always be in-bounds.
C/C++11 operation            Wasm instruction
Store (non-atomic)           t.store
Cmpxchg (any consistency)    t.rmw.cmpxchg
Fig. 20. Compilation of C/C++11 accesses to Wasm memory accesses (for appropriate Wasm type t)
Treating weaker C/C++ atomics as strongly as sequentially consistent atomics when compiling
to Wasm loses some efficiency, as it implies additional barriers when the resulting Wasm is further
compiled to platform assembly, compared to directly compiling the original C/C++. This
strengthening has been accepted practice when compiling C/C++ to the Web platform since at least 2015,
through Emscripten and asm.js [Herman et al. 2014; Jylänki 2015]. Compiler optimisations prior to
the final Wasm generation may still take advantage of weaker consistency modes, but it is true
that this approach, while semantically simpler, leaves some performance on the table. The Wasm
Working Group is open to the possibility of adding weaker consistency modes to the language in
the future, which would improve on this. Such an extended model would require a more involved
proof of correctness for C/C++ compilation.
Similarly, higher-level languages such as Java [Cenciarelli et al. 2007] and Multicore OCaml [Dolan
et al. 2018] define stronger relaxed memory models, so that their basic accesses are not supported by
Wasm's unord consistency mode. In general, features for explicitly supporting efficient compilation
of higher-level languages to Wasm are at an early stage of standardisation (e.g. the Garbage
Collection proposal [Rossberg 2019]), and we expect that the specification of efficient consistency
modes for this use-case will occur on a similar timescale.
8 RELATED WORK
Our memory model follows the existing definitional presentations of the axiomatic relaxed memory
models of Java [Cenciarelli et al. 2007; Manson et al. 2005] and C++ [Batty et al. 2011; Boehm and
Adve 2008]. These existing works, and those that build atop them, are limited by the fact that
the wider normative specifications that they are embedded within are not formal, meaning that
a significant part of subsequent work involves defining an appropriate formal specification for
the concurrent operational semantics and motivating its correctness. This was the case with the
JinjaThreads project [Lochbihler 2018], which was the result of Java formalisation work spanning
over fifteen years. Because the Wasm operational semantics is fully formal, all definitional work is
already incorporated into the normative specification, paving the way for mature formal analyses
of the memory model in future work.
Recent work criticising the state of the art in axiomatic memory models has focused on the
semantics of race conditions and out-of-thin-air. It is a well-known result that current axiomatic
models must choose between admitting out-of-thin-air executions, and requiring a less efficient
compilation scheme [Batty et al. 2015; Ševčík and Aspinall 2008]. The models of high-level languages
such as OCaml [Dolan et al. 2018] and Java [Manson et al. 2005] have the freedom to choose a less
relaxed semantics, as they are not chasing bare-metal performance. Lower-level languages such
as C/C++ [Batty et al. 2015] admit out-of-thin-air executions in order to compile their weakest
primitives to bare loads and stores [Batty et al. 2012; Sewell and Sevcik 2016; Vafeiadis et al. 2015].
Our model must be pragmatic in this regard, allowing out-of-thin-air executions for racing
non-atomics. Due to the low-level nature of Wasm, implementers expect to compile its non-atomics to
bare loads and stores. At the very least, we do not make racy non-atomics an undefined behaviour
in the style of C/C++, and a program without racing non-atomics will not admit thin-air executions
as a consequence of our SC-DRF result.
We have begun to see a new generation of models with the explicit aim of disallowing
out-of-thin-air while preserving efficient compilation schemes [Kang et al. 2017; Pichon-Pharabod and
Sewell 2016; Podkopaev et al. 2018]. As these models become more mature, it may be possible to
use aspects of them to disallow our out-of-thin-air executions.
9 FUTURE WORK AND CONCLUSION
Our formal semantics for Wasm extended with shared memory concurrency anticipates future
extensions such as shared globals, tables, and references. To achieve maximum generality and avoid
preempting future design choices, this semantics supports all consistency modes for all stateful
objects. We expect that concrete proposals to incorporate these features into Wasm in the future
will make more specific choices, e.g., supporting only sequentially consistent access to tables.
We leave space within the operational semantics for additional consistency modes to be
introduced. We expect that this will be necessary, for example, to efficiently support the compilation of
programs that make use of so-called "low-level atomics" [Boehm and Adve 2008]. Moreover, future
features such as memory protection may have to be integrated into the model.
The research trajectory of each language's relaxed memory model follows a predictable pattern.
It is often a significant effort to even represent the memory model formally [Boehm and Adve 2008;
Manson et al. 2005], let alone integrate it with the language's existing semantics. Even then, it will
often be many more years of collaborative research effort before mature tooling and mechanised
proofs over the model can be developed [Batty et al. 2011; Lochbihler 2018].
We believe that our presentation of the WebAssembly memory model lays a firm foundation for
this further work. We present a fully mathematised specification of not only the axiomatic model,
but the operational semantics, a significant improvement on the foundational presentations of
the Java and C++ models [Boehm and Adve 2008; Manson et al. 2005]. Previous work has already
mechanised the core of WebAssembly's sequential semantics [Watt 2018], and we expect that our
work here will be the basis of future mechanisation of WebAssembly concurrency.
ACKNOWLEDGMENTS
We thank Shu-yu Guo, Lars T Hansen, and Peter Sewell for their valuable feedback. We thank
the members of the WebAssembly Community Group, the WebAssembly Working Group, and
ECMA TC39 for useful discussions. This work was partly supported by the EPSRC Programme
Grant REMS: Rigorous Engineering for Mainstream Systems (EP/K008528/1). The first author was
supported by an EPSRC DTP award (EP/N509620/1), and a Google PhD Fellowship in Programming
Technology and Software Engineering.
REFERENCES
Sarita V. Adve and Mark D. Hill. 1990. Weak Ordering - a New Definition. In Proceedings of the 17th Annual International
Symposium on Computer Architecture (ISCA ’90). ACM, New York, NY, USA, 2–14. https://doi.org/10.1145/325164.325100
Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen,
and John Wickerson. 2015. GPU Concurrency: Weak Behaviours and Programming Assumptions. In Proceedings of the
Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS
’15). ACM, New York, NY, USA, 577–591. https://doi.org/10.1145/2694344.2694391
Jade Alglave, Anthony Fox, Samin Ishtiaq, Magnus O. Myreen, Susmit Sarkar, Peter Sewell, and Francesco Zappa Nardelli.
2009. The Semantics of Power and ARM Multiprocessor Machine Code. In Proceedings of the 4th Workshop on Declarative
Aspects of Multicore Programming. ACM, New York, NY, USA. https://doi.org/10.1145/1481839.1481842
Mark Batty. 2014. The C11 and C++11 Concurrency Model. Ph.D. Dissertation. University of Cambridge.
Mark Batty, Mike Dodds, and Alexey Gotsman. 2013. Library Abstraction for C/C++ Concurrency. SIGPLAN Not. 48, 1 (Jan.
2013), 235–248. https://doi.org/10.1145/2480359.2429099
Mark Batty, Kayvan Memarian, Kyndylan Nienhuis, Jean Pichon-Pharabod, and Peter Sewell. 2015. The Problem of
Programming Language Concurrency Semantics. In Programming Languages and Systems, Jan Vitek (Ed.). Springer Berlin
Heidelberg, Berlin, Heidelberg, 283–307.
Mark Batty, Kayvan Memarian, Scott Owens, Susmit Sarkar, and Peter Sewell. 2012. Clarifying and Compiling C/C++
Concurrency: from C++11 to POWER. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of
Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. 2011. Mathematizing C++ Concurrency. In
Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’11).
ACM, New York, NY, USA, 55–66. https://doi.org/10.1145/1926385.1926394
Mark Batty and Peter Sewell. 2014. The Thin-air Problem. https://www.cl.cam.ac.uk/~pes20/cpp/notes42.html.
Hans-J. Boehm. 2005. Threads Cannot Be Implemented As a Library. In Proceedings of the 2005 ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI ’05). ACM, New York, NY, USA, 261–268.
https://doi.org/10.1145/1065010.1065042
Hans-J. Boehm. 2011. How to Miscompile Programs with "Benign" Data Races. In Proceedings of the 3rd USENIX Conference
on Hot Topic in Parallelism (HotPar’11). USENIX Association, Berkeley, CA, USA, 3–3.
http://dl.acm.org/citation.cfm?id=2001252.2001255
Hans-J. Boehm and Sarita V. Adve. 2008. Foundations of the C++ Concurrency Memory Model. In Proceedings of the 29th
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’08). ACM, New York, NY, USA,
68–78. https://doi.org/10.1145/1375581.1375591
Hans-J. Boehm and Brian Demsky. 2014. Outlawing Ghosts: Avoiding Out-of-thin-air Results. In Proceedings of the
Workshop on Memory Systems Performance and Correctness (MSPC ’14). ACM, New York, NY, USA, Article 7, 6 pages.
https://doi.org/10.1145/2618128.2618134
Pietro Cenciarelli, Alexander Knapp, and Eleonora Sibilio. 2007. The Java Memory Model: Operationally, Denotationally,
Axiomatically. In Proceedings of the 16th European Symposium on Programming (ESOP’07). Springer-Verlag, Berlin,
Heidelberg, 331–346. http://dl.acm.org/citation.cfm?id=1762174.1762206
Stephen Dolan, KC Sivaramakrishnan, and Anil Madhavapeddy. 2018. Bounding Data Races in Space and Time. In Proceedings
of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). ACM, New
York, NY, USA, 242–255. https://doi.org/10.1145/3192366.3192421
ECMA International. 2018a. ECMAScript 2018 Language Specification - Data Race Freedom.
https://www.ecma-international.org/ecma-262/9.0/index.html#sec-data-race-freedom.
ECMA International. 2018b. ECMAScript 2018 Language Specification - Memory Model.
https://www.ecma-international.org/ecma-262/9.0/index.html#sec-memory-model.
Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell.
2016. Modelling the ARMv8 architecture, operationally: concurrency and ISA. In Proceedings of the 43rd Annual ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages, St. Petersburg, FL, USA. 608–621.
Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis, Luc Maranget, Kathryn E. Gray, Ali Sezgin, Mark Batty,
and Peter Sewell. 2017. Mixed-size Concurrency: ARM, POWER, C/C++11, and SC. SIGPLAN Not. 52, 1 (Jan. 2017),
429–442. https://doi.org/10.1145/3093333.3009839
Kourosh Gharachorloo, Sarita V. Adve, Anoop Gupta, John L. Hennessy, and Mark D. Hill. 1992. Programming for Different
Memory Consistency Models. J. Parallel and Distrib. Comput. 15 (1992), 399–407.
Kathryn E. Gray, Gabriel Kerneis, Dominic Mulligan, Christopher Pulte, Susmit Sarkar, and Peter Sewell. 2015. An Integrated
Concurrency and core-ISA Architectural Envelope Definition, and Test Oracle, for IBM POWER Multiprocessors. In
Proceedings of the 48th International Symposium on Microarchitecture (MICRO-48). ACM, New York, NY, USA, 635–646.
https://doi.org/10.1145/2830772.2830775
Andreas Haas, Andreas Rossberg, Derek Schuff, Ben Titzer, Dan Gohman, Luke Wagner, Alon Zakai, JF Bastien, and Michael
Holman. 2017. Bringing the Web up to Speed with WebAssembly. In Principles of Programming Languages (POPL).
Lars T Hansen. 2017. Resizing details / underspecification. https://github.com/WebAssembly/threads/issues/26.
David Herman, Luke Wagner, and Alon Zakai. 2014. asm.js. http://asmjs.org/spec/latest.
Lisa Higham, LillAnne Jackson, and Jalal Kawash. 2006. Programmer-centric Conditions for Itanium Memory Consistency.
In Proceedings of the 8th International Conference on Distributed Computing and Networking (ICDCN’06). Springer-Verlag,
58–69. https://doi.org/10.1007/11947950_7
David Howells, Paul E. McKenney, Will Deacon, and Peter Zijlstra. 2019. Linux Kernel Memory Barriers.
https://www.kernel.org/doc/Documentation/memory-barriers.txt.
discuss/gQQRjajQ6iY.
Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. 2017. A Promising Semantics for
Relaxed-memory Concurrency. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages
(POPL 2017). ACM, New York, NY, USA, 175–189. https://doi.org/10.1145/3009837.3009850
Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. 2017. Repairing Sequential Consistency in
C/C++11. SIGPLAN Not. 52, 6 (June 2017), 618–632. https://doi.org/10.1145/3140587.3062352
Leslie Lamport. 1978. Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM 21, 7 (July 1978),
558–565. https://doi.org/10.1145/359545.359563
Andreas Lochbihler. 2018. Mechanising a Type-Safe Model of Multithreaded Java with a Verified Compiler. Journal of
Automated Reasoning 61, 1 (01 Jun 2018), 243–332. https://doi.org/10.1007/s10817-018-9452-x
Sela Mador-Haim, Luc Maranget, Susmit Sarkar, Kayvan Memarian, Jade Alglave, Scott Owens, Rajeev Alur, Milo M. K. Martin,
Peter Sewell, and Derek Williams. 2012. An Axiomatic Memory Model for POWER Multiprocessors. In Proceedings of the
24th International Conference on Computer Aided Verification. 495–512. https://doi.org/10.1007/978-3-642-31424-7_36
Jeremy Manson, William Pugh, and Sarita V. Adve. 2005. The Java Memory Model. In Proceedings of the 32nd ACM
SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’05). ACM, New York, NY, USA, 378–391.
https://doi.org/10.1145/1040305.1040336
Cristian Mattarei, Clark Barrett, Shu-yu Guo, Bradley Nelson, and Ben Smith. 2018. EMME: A Formal Tool for ECMAScript
Memory Model Evaluation. In Tools and Algorithms for the Construction and Analysis of Systems, Dirk Beyer and Marieke
Huisman (Eds.). Springer International Publishing, Cham, 55–71.
Paul E. McKenney, Alan Jeffrey, and Ali Sezgin. 2005. N4375: Out-of-Thin-Air Execution is Vacuous. C++ Standards
Committee Papers (2005).
Kyndylan Nienhuis, Kayvan Memarian, and Peter Sewell. 2016. An operational semantics for C/C++11 concurrency. In
Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and
Applications. ACM, New York, NY, USA, 18. https://doi.org/10.1145/2983990.2983997
Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A better x86 memory model: x86-TSO. In Proceedings of Theorem Proving
in Higher Order Logics, LNCS 5674. 391–407. https://doi.org/10.1007/978-3-642-03359-9_27
Jean Pichon-Pharabod and Peter Sewell. 2016. A concurrency semantics for relaxed atomics that permits optimisation
and avoids thin-air executions. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages, St. Petersburg, FL, USA, January 20 - 22, 2016. 622–633. https://doi.org/10.1145/2837614.2837616
Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2018. Bridging the Gap between Programming Languages and Hardware
Weak Memory Models. Technical Report. https://doi.org/10.1145/3290382 arXiv:1807.07892
Andreas Rossberg. 2018. Reference Types Proposal for WebAssembly. https://github.com/WebAssembly/reference-types.
Andreas Rossberg. 2019. GC Extension. https://github.com/WebAssembly/gc/blob/master/proposals/gc/Overview.md.
Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty, Peter Sewell, Luc Maranget, Jade Alglave, and Derek Williams.
2012. Synchronising C/C++ and POWER. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language
Design and Implementation (PLDI ’12). ACM, New York, NY, USA, 311–322. https://doi.org/10.1145/2254064.2254102
Peter Sewell and Jaroslav Sevcik. 2016. C/C++11 mappings to processors.
https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.
Viktor Vafeiadis, Thibaut Balabonski, Soham Chakraborty, Robin Morisset, and Francesco Zappa Nardelli. 2015. Common
Compiler Optimisations Are Invalid in the C11 Memory Model and What We Can Do About It. In Proceedings of the
42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’15). ACM, New York,
NY, USA, 209–220. https://doi.org/10.1145/2676726.2676995
Jaroslav Ševčík. 2011. Safe Optimisations for Shared-memory Concurrent Programs. In Proceedings of the 32nd ACM
SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’11). ACM, New York, NY, USA,
306–316. https://doi.org/10.1145/1993498.1993534
Jaroslav Ševčík and David Aspinall. 2008. On Validity of Program Transformations in the Java Memory Model. In Proceedings
of the 22nd European Conference on Object-Oriented Programming (ECOOP ’08). Springer-Verlag, Berlin, Heidelberg, 27–51.
https://doi.org/10.1007/978-3-540-70592-5_3
Conrad Watt. 2018. Mechanising and Verifying the WebAssembly Specification. In Proceedings of the 7th ACM SIGPLAN
International Conference on Certified Programs and Proofs (CPP 2018). ACM, New York, NY, USA, 53–65.
https://doi.org/10.1145/3167082
WebAssembly Working Group. 2019. WebAssembly Specifications. https://webassembly.github.io/spec/.
Proc. ACM Program. Lang., Vol. 3, No. OOPSLA, Article 133. Publication date: October 2019.
... The chapter draws from two previously published papers: Weakening WebAssembly (OOPSLA 2019) [13,14], authored by myself, Jean Pichon-Pharabod, and Andreas Rossberg, and Repairing and Mechanising the JavaScript Relaxed Memory Model (PLDI 2020) [15], authored by myself, Christopher Pulte, Anton Podkopaev, Guillaume Barbier, Stephen Dolan, Shaked Flur, Jean Pichon-Pharabod, and Shu-yu Guo. ...
... This work was carried out in collaboration with a number of other academics and industry figures, as detailed below. The chapter draws from two previously published papers [13,15]. ...
... As mentioned, JavaScript's relaxed memory model is based on a fragment of C++11 containing non-atomics and sequentially consistent atomics [131,13]. Differences between the models will be highlighted as appropriate. ...
Thesis
WebAssembly is the first new programming language to be supported natively by all major Web browsers since JavaScript. It is designed to be a natural low-level compilation target for languages such as C, C++, and Rust, enabling programs written in these languages to be compiled and executed efficiently on the Web. WebAssembly’s specification is managed by the W3C WebAssembly Working Group (made up of representatives from a number of major tech companies). Uniquely, the language is specified by way of a full pen-and-paper formal semantics. This thesis describes a number of ways in which I have both helped to shape the specification of WebAssembly, and built upon it. By mechanising the WebAssembly formal semantics in Isabelle/HOL while it was being drafted, I discovered a number of errors in the specification, drove the adoption of official corrections, and provided the first type soundness proof for the corrected language. This thesis also details a verified type checker and interpreter, and a security type system extension for cryptography primitives, all of which have been mechanised as extensions of my initial WebAssembly mechanisation. A major component of the thesis is my work on the specification of shared memory concurrency in Web languages: correcting and verifying properties of JavaScript’s existing relaxed memory model, and defining the WebAssembly-specific extensions to the corrected model which have been adopted as the basis of WebAssembly’s official threads specification. A number of deficiencies in the original JavaScript model are detailed. Some errors have been corrected, with the verified fixes officially adopted into subsequent editions of the language specification. However one discovered deficiency is fundamental to the model, an instance of the well-known "thin-air problem". 
My work demonstrates the value of formalisation and mechanisation in industrial programming language design, not only in discovering and correcting specification errors, but also in building confidence both in the correctness of the language’s design and in the design of proposed extensions.
... Wasm supports correct compilation of various applications, and provides precise specification of the memory model. [3] Based on related research references, it has been explained that the development of a web-based academic information system is important to implement WebAssembly technology that can maintain system security when interaction with other devices is required flexibly, using a more careful design, as well as a simple and easy-to-develop interface. ...
... Wasm is a low-level language whose instructions are used to compile hardware directly, focusing on single-threaded computing, with the capability of atomic instructions for synchronized access on the principle of Shared Memory. [3] shows the binary code compiled or converted into a WASM file and downloaded by javascript to display web pages through the browser engine. So far, Web Browsers use JavaScript to run code and activate functions on websites, but they often have problems. ...
Article
... Several works have aimed at improving the security of WebAssembly [4], [9], [13], [14], [15]. CT-wasm [9] proposes a type system to check the constant-time policy. ...
Preprint
This paper explores the use of relational symbolic execution to counter timing side channels in WebAssembly programs. We design and implement Vivienne, an open-source tool to automatically analyze WebAssembly cryptographic libraries for constant-time violations. Our approach features various optimizations that leverage the structure of WebAssembly and automated theorem provers, including support for loops via relational invariants. We evaluate Vivienne on 57 real-world cryptographic implementations, including a previously unverified implementation of the HACL* library in WebAssembly. The results indicate that Vivienne is a practical solution for constant-time analysis of cryptographic libraries in WebAssembly.
Article
Program logics and semantics tell a pleasant story about sequential composition: when executing (S1;S2), we first execute S1 then S2. To improve performance, however, processors execute instructions out of order, and compilers reorder programs even more dramatically. By design, single-threaded systems cannot observe these reorderings; however, multiple-threaded systems can, making the story considerably less pleasant. A formal attempt to understand the resulting mess is known as a “relaxed memory model.” Prior models either fail to address sequential composition directly, or overly restrict processors and compilers, or permit nonsense thin-air behaviors which are unobservable in practice. To support sequential composition while targeting modern hardware, we enrich the standard event-based approach with preconditions and families of predicate transformers. When calculating the meaning of (S1; S2), the predicate transformer applied to the precondition of an event e from S2 is chosen based on the set of events in S1 upon which e depends. We apply this approach to two existing memory models.
Article
Software sandboxing or software-based fault isolation (SFI) is a lightweight approach to building secure systems out of untrusted components. Mozilla, for example, uses SFI to harden the Firefox browser by sandboxing third-party libraries, and companies like Fastly and Cloudflare use SFI to safely co-locate untrusted tenants on their edge clouds. While there have been significant efforts to optimize and verify SFI enforcement, context switching in SFI systems remains largely unexplored: almost all SFI systems use heavyweight transitions that are not only error-prone but incur significant performance overhead from saving, clearing, and restoring registers when context switching. We identify a set of zero-cost conditions that characterize when sandboxed code has sufficient structure to guarantee security via lightweight zero-cost transitions (simple function calls). We modify the Lucet Wasm compiler and its runtime to use zero-cost transitions, eliminating the undue performance tax on systems that rely on Lucet for sandboxing (e.g., we speed up image and font rendering in Firefox by up to 29.7% and 10% respectively). To remove the Lucet compiler and its correct implementation of the Wasm specification from the trusted computing base, we (1) develop a static binary verifier, VeriZero, which (in seconds) checks that binaries produced by Lucet satisfy our zero-cost conditions, and (2) prove the soundness of VeriZero by developing a logical relation that captures when a compiled Wasm function is semantically well-behaved with respect to our zero-cost conditions. Finally, we show that our model is useful beyond Wasm by describing a new, purpose-built SFI system, SegmentZero32, that uses x86 segmentation and LLVM with mostly off-the-shelf passes to enforce our zero-cost conditions; our prototype performs on par with the state-of-the-art Native Client SFI system.
Preprint
As the volume of data that needs to be processed continues to increase, we also see renewed interests in near-data processing in the form of computational storage, with eBPF (extended Berkeley Packet Filter) being proposed as a vehicle for computation offloading. However, discussions in this regard have so far ignored viable alternatives, and no convincing analysis has been provided. As such, we qualitatively and quantitatively evaluated eBPF against WebAssembly, a seemingly similar technology, in the context of computation offloading. This report presents our findings.
Article
Liveness properties, such as termination, of even the simplest shared-memory concurrent programs under sequential consistency typically require some fairness assumptions about the scheduler. Under weak memory models, we observe that the standard notions of thread fairness are insufficient, and an additional fairness property, which we call memory fairness, is needed. In this paper, we propose a uniform definition for memory fairness that can be integrated into any declarative memory model enforcing acyclicity of the union of the program order and the reads-from relation. For the well-known models, SC, x86-TSO, RA, and StrongCOH, that have equivalent operational and declarative presentations, we show that our declarative memory fairness condition is equivalent to an intuitive model-specific operational notion of memory fairness, which requires the memory system to fairly execute its internal propagation steps. Our fairness condition preserves the correctness of local transformations and the compilation scheme from RC11 to x86-TSO, and also enables the first formal proofs of termination of mutual exclusion lock implementations under declarative weak memory models.
Article
We develop a new intermediate weak memory model, IMM, as a way of modularizing the proofs of correctness of compilation from concurrent programming languages with weak memory consistency semantics to mainstream multi-core architectures, such as POWER and ARM. We use IMM to prove the correctness of compilation from the promising semantics of Kang et al. to POWER (thereby correcting and improving their result) and ARMv7, as well as to the recently revised ARMv8 model. Our results are mechanized in Coq, and to the best of our knowledge, these are the first machine-verified compilation correctness results for models that are weaker than x86-TSO.
Article
This article presents JinjaThreads, a unified, type-safe model of multithreaded Java source code and bytecode formalised in the proof assistant Isabelle/HOL. The semantics strictly separates sequential aspects from multithreading features like locks, forks and joins, interrupts, and the wait-notify mechanism. This separation yields an interleaving framework and a notion of deadlocks that are independent of the language, and makes the type safety proofs modular. JinjaThreads’s non-optimising compiler translates source code into bytecode. Its correctness proof guarantees that the generated bytecode exhibits exactly the same observable behaviours as the source code, even for infinite executions and under the Java memory model. The semantics and the compiler are executable. JinjaThreads builds on and reuses the Java formalisations Jinja, Bali, μJava, and Javaℓight by Nipkow’s group. Being the result of more than fifteen years of studying Java in Isabelle/HOL, it constitutes a large and long-lasting case study. It shows that fairly standard formalisation techniques scale well and highlights the challenges, benefits, and drawbacks of formalisation reuse.
Article
The maturation of the Web platform has given rise to sophisticated and demanding Web applications such as interactive 3D visualization, audio and video software, and games. With that, efficiency and security of code on the Web has become more important than ever. Yet JavaScript as the only built-in language of the Web is not well-equipped to meet these requirements, especially as a compilation target. Engineers from the four major browser vendors have risen to the challenge and collaboratively designed a portable low-level bytecode called WebAssembly. It offers compact representation, efficient validation and compilation, and safe low to no-overhead execution. Rather than committing to a specific programming model, WebAssembly is an abstraction over modern hardware, making it language-, hardware-, and platform-independent, with use cases beyond just the Web. WebAssembly has been designed with a formal semantics from the start. We describe the motivation, design and formal semantics of WebAssembly and provide some preliminary experience with implementations.
Article
When constructing complex concurrent systems, abstraction is vital: programmers should be able to reason about concurrent libraries in terms of abstract specifications that hide the implementation details. Relaxed memory models present substantial challenges in this respect, as libraries need not provide sequentially consistent abstractions: to avoid unnecessary synchronisation, they may allow clients to observe relaxed memory effects, and library specifications must capture these. In