A comprehensive study of Convergent and Commutative Replicated Data Types
Abstract: Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state-based or operation-based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, monotonic DAGs, and sequences. It discusses some properties needed to implement non-trivial CRDTs.

ISSN 02496399
ISRN INRIA/RR7506FR+ENG
Thème COM
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
Marc Shapiro, INRIA & LIP6, Paris, France
Nuno Preguiça, CITI, Universidade Nova de Lisboa, Portugal
Carlos Baquero, Universidade do Minho, Portugal
Marek Zawirski, INRIA & UPMC, Paris, France
No. 7506
January 2011
inria00555588, version 1  13 Jan 2011
Unité de recherche INRIA Rocquencourt
Domaine de Voluceau, Rocquencourt, BP 105, 78153 Le Chesnay Cedex (France)
Telephone: +33 1 39 63 55 11, Fax: +33 1 39 63 53 30
Theme COM: Communicating systems
Project Regal
Research report no. 7506, January 2011, 47 pages
Keywords:
Data replication, optimistic replication, commutative operations
∗ This research was supported in part by ANR project ConcoRDanT (ANR-10-BLAN 0208), and a
Google Research Award 2009. Marek Zawirski is a recipient of the Google Europe Fellowship in Distributed
Computing, and this research is supported in part by this Google Fellowship. Carlos Baquero is partially
supported by FCT project Castor (PTDC/EIA-EIA/104022/2008).
1 Introduction
Replication is a fundamental concept of distributed systems, well studied by the distributed
algorithms community. Much work focuses on maintaining a global total order of operations
[24] even in the presence of faults [8]. However, the associated serialisation bottleneck
negatively impacts performance and scalability, while the CAP theorem [13] imposes a
trade-off between consistency and partition-tolerance.
An alternative approach, eventual consistency or optimistic replication, is attractive to
practitioners [37, 41]. A replica may execute an operation without synchronising a priori with
other replicas. The operation is sent asynchronously to other replicas; every replica
eventually applies all updates, possibly in different orders. A background consensus algorithm
reconciles any conflicting updates [4, 40]. This approach ensures that data remains available
despite network partitions. It performs well (as the consensus bottleneck has been moved
off the critical path), and the weaker consistency is considered acceptable for some classes
of applications. However, reconciliation is generally complex. There is little theoretical
guidance on how to design a correct optimistic system, and ad-hoc approaches have proven
brittle and error-prone.¹
In this paper, we study a simple, theoretically sound approach to eventual consistency.
We propose the concept of a convergent or commutative replicated data type (CRDT), for
which some simple mathematical properties ensure eventual consistency. A trivial example
of a CRDT is a replicated counter, which converges because the increment and decrement
operations commute (assuming no overflow). Provably, replicas of any CRDT converge
to a common state that is equivalent to some correct sequential execution. As a CRDT
requires no synchronisation, an update executes immediately, unaffected by network latency,
faults, or disconnection. It is extremely scalable and is fault-tolerant, and does not require
much mechanism. Application areas may include computation in delay-tolerant networks,
latency tolerance in wide-area networks, disconnected operation, churn-tolerant peer-to-peer
computing, data aggregation, and partition-tolerant cloud computing.
Since, by design, a CRDT does not use consensus, the approach has strong limitations;
nonetheless, some interesting and non-trivial CRDTs are known to exist. For instance, we
previously published Treedoc, a sequence CRDT designed for cooperative text editing [32].
Previously, only a handful of CRDTs were known. The objective of this paper is to push
the envelope, studying the principles of CRDTs, and presenting a comprehensive portfolio of
useful CRDT designs, including variations on registers, counters, sets, graphs, and sequences.
We expect them to be of interest to practitioners and theoreticians alike.
Some of our designs suffer from unbounded growth; collecting the garbage requires a
weak form of synchronisation [25]. However, its liveness is not essential, as it is an
optimisation, off the critical path, and not in the public interface. In the future, we plan
to extend the approach to data types where common-case, time-critical operations are
commutative, and rare operations require synchronisation but can be delayed to periods
when the network is well connected. This concurs with Brewer’s suggestion for side-stepping
the CAP impossibility [6]. It is also similar to the shopping cart design of Alvaro et al. [1],
where updates commute, but checkout requires coordination. However, this extension is out
of the scope of the present study.
¹ The anomalies of the Amazon Shopping Cart are a well-known example [10].
In the literature, the preferred consistency criterion is linearisability [18]. However,
linearisability requires consensus in general. Therefore, we settle for the much weaker
quiescent consistency [17, Section 3.3]. One challenge is to minimise “anomalies,” i.e., states
that would not be observed in a sequential execution. Note also that CRDTs are weaker
than non-blocking constructs, which are generally based on a hardware consensus primitive
[17].
Some of the ideas presented in this paper are already known in the folklore. The
contributions of this paper include:
• In Section 2: (i) A specification language suited to asynchronous replication. (ii) A
formalisation of state-based and operation-based replication. (iii) Two sufficient
conditions for eventual consistency.
• In Section 3, a comprehensive collection of useful data type designs, starting with
counters and registers. We focus on container types (sets and maps) supporting both
add and remove operations with clean semantics, and more complex derived types,
such as graphs, monotonic DAGs, and sequences.
• In Section 4, a study of the problem of garbage-collecting metadata.
• In Section 5, exercising some of our CRDTs in a practical example, the shopping cart.
• A comparison with previous work, in Section 6.
Section 7 concludes with a summary of lessons learned, and perspectives for future work.
2 Background and system model
We consider a distributed system consisting of processes interconnected by an asynchronous
network. The network can partition and recover, and nodes can operate in disconnected
mode for some time. A process may crash and recover; its memory survives crashes. We
assume non-byzantine behaviour.
2.1 Atoms and objects
A process may store atoms and objects. An atom is a base immutable data type, identified
by its literal content. Atoms can be copied between processes; atoms are equal if they have
the same content. Atom types considered in this paper include integers, strings, sets, tuples,
etc., with their usual non-mutating operations. Atom types are written in lower case, e.g.,
“set.”
[Figure 1: Object]
[Figure 2: Grow-only Set (G-Set)]
[Figure 3: 2P-Set]
An object is a mutable, replicated data type. Object types are capitalised, e.g., “Set.”
An object has an identity, a content (called its payload), which may be any number of atoms
or objects, an initial state, and an interface consisting of operations. Two objects having
the same identity but located in different processes are called replicas of one another. As
an example, Figure 1 depicts a logical object x, its replicas at processes 1, 2 and 3, and the
current state of the payload of replica 3.
We assume that objects are independent and do not consider transactions. Therefore,
without loss of generality, we focus on a single object at a time, and use the words process
and replica interchangeably.
2.2 Operations
The environment consists of unspecified clients that query and modify object state by calling
operations in its interface, against a replica of their choice called the source replica. A query
executes locally, i.e., entirely at one replica. An update has two phases: first, the client calls
the operation at the source, which may perform some initial processing. Then, the update
is transmitted asynchronously to all replicas; this is the downstream part. The literature
[37] distinguishes the state-based and operation-based (op-based for short) styles, explained
next.
Specification 1 Outline of a state-based object specification. Preconditions, arguments, return
values and statements are optional.
payload Payload type; instantiated at all replicas
    initial Initial value
query Query (arguments) : returns
    pre Precondition
    let Evaluate synchronously, no side effects
update Source-local operation (arguments) : returns
    pre Precondition
    let Evaluate at source, synchronously
    Side-effects at source to execute synchronously
compare (value1, value2) : boolean b
    Is value1 ≤ value2 in semilattice?
merge (value1, value2) : payload mergedValue
    LUB merge of value1 and value2, at any replica
[Figure 4: State-based replication]
[Figure 5: Example CvRDT: integer + max]
2.2.1 State-based replication
In state-based (or passive) replication, an update occurs entirely at the source, then
propagates by transmitting the modified payload between replicas, as illustrated in Figure 4.
We specify state-based object types as shown in Specification 1. Keyword payload indicates
the payload type, and initial specifies its initial value at every replica. Keyword update
indicates an update operation, and query a query. Both may have (optional) arguments
and return values. Non-mutating statements are marked let, and payload is mutated by
assignment :=. An operation executes atomically.
To capture safety, an operation is enabled only if a given source precondition (marked
pre in a specification) holds in the source’s current state. The source precondition is omitted
if always enabled, e.g., incrementing or decrementing a Counter. Conversely, non-null
preconditions may be necessary, for instance an element can be removed from a Set only if it
is in the Set at the source.
The system transmits state between arbitrary pairs of replicas, in order to propagate
changes. This updates the payload of the receiver with the output of operation merge,
invoked with two arguments, the local payload state and the received state. Operation
compare compares replica states, as will be explained shortly.
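As a rough illustration of the state-based outline in Specification 1, the following Python sketch models a grow-only set (the G-Set of Figure 2). The class name `GSet` and its method names are assumptions made for this example, not notation from this report; compare is taken to be set inclusion and merge to be set union.

```python
class GSet:
    """Illustrative state-based grow-only set (all names are assumptions)."""

    def __init__(self):
        # payload: a set of atoms; the initial value is the empty set
        self.payload = frozenset()

    def lookup(self, e):
        # query: executes locally, no side effects
        return e in self.payload

    def add(self, e):
        # update: executes entirely at the source replica
        self.payload = self.payload | {e}

    def compare(self, other):
        # is self <= other in the semilattice? here: set inclusion
        return self.payload <= other.payload

    def merge(self, other):
        # replace the local payload by the LUB (union) of both states
        self.payload = self.payload | other.payload


x1, x2 = GSet(), GSet()
x1.add("a")
x2.add("b")
x1.merge(x2)   # x1 receives the state of x2
x2.merge(x1)   # x2 receives the state of x1
print(sorted(x1.payload), sorted(x2.payload))
```

After the two merges both replicas hold the same payload, whichever replica merged first.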
We define the causal history [38] C of replicas of some object x as follows:²
Definition 2.1 (Causal History, state-based). For any replica $x_i$ of $x$:
• Initially, $C(x_i) = \emptyset$.
• After executing update operation $f$, $C(f(x_i)) = C(x_i) \cup \{f\}$.
• After executing merge against states $x_i, x_j$, $C(\mathit{merge}(x_i, x_j)) = C(x_i) \cup C(x_j)$.
The classical happens-before [24] relation between operations can be defined as
$f \to g \Leftrightarrow C(f) \subset C(g)$.
Liveness requires that any update eventually reaches the causal history of every replica.
To this effect, we assume an underlying system that transmits states between pairs of replicas
at unspecified times, infinitely often, and that replica communication forms a connected
graph.
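The three rules of Definition 2.1 can be sketched in Python by carrying the causal history alongside the payload. This is only a bookkeeping illustration, since C is a logical function and not part of the object. The names `Tracked`, `update` and `merge` are hypothetical, and the payload is an integer merged with max purely for concreteness.

```python
class Tracked:
    """A replica whose causal history C is tracked next to its payload."""

    def __init__(self):
        self.payload = 0
        self.history = frozenset()                   # initially, C(x_i) is empty

    def update(self, op_id, value):
        self.payload = max(self.payload, value)
        self.history = self.history | {op_id}        # C(f(x_i)) = C(x_i) ∪ {f}

    def merge(self, other):
        self.payload = max(self.payload, other.payload)
        self.history = self.history | other.history  # C(x_i) ∪ C(x_j)


x1, x2 = Tracked(), Tracked()
x1.update("f", 1)
x2.update("g", 4)
x1.merge(x2)
print(sorted(x1.history))   # both updates are now in the history of x1
```

Liveness, in these terms, asks that repeated merges eventually bring every update identifier into every replica's `history`.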
2.2.2 Operation-based (op-based) objects
In operation-based (or active) replication, the system transmits operations, as illustrated in
Figure 6. This style is specified as outlined in Spec. 2. The payload and initial clauses are
identical to the state-based specifications. An operation that does not mutate the state is
marked query and executes entirely at a single replica.
An update is specified by keyword update. Its first phase, marked atSource, is local to
the source replica. It is enabled only if its (optional) source precondition, marked pre, is
true in the source state; it executes atomically. It takes its arguments from the operation
invocation; it is not allowed to make side effects; it may compute results, returned to the
caller, and/or prepare arguments for the second phase.
² C is a logical function; it is not part of the object.
Specification 2 Outline of operation-based object specification. Preconditions, return values
and statements are optional.
payload Payload type; instantiated at all replicas
    initial Initial value
query Source-local operation (arguments) : returns
    pre Precondition
    let Execute at source, synchronously, no side effects
update Global update (arguments) : returns
    atSource (arguments) : returns
        pre Precondition at source
        let 1st phase: synchronous, at source, no side effects
    downstream (arguments passed downstream)
        pre Precondition against downstream state
        2nd phase, asynchronous, side-effects to downstream state
[Figure 6: Operation-based replication]
The second phase, marked downstream, executes after the source-local phase: immediately
at the source, and asynchronously at all other replicas; it cannot return results. It
executes only if its downstream precondition is true. It updates the downstream state; its
arguments are those prepared by the source-local phase. It executes atomically.
As above, we define the causal history of a replica $C(x_i)$.
Definition 2.2 (Causal History, op-based). The causal history of a replica $x_i$ is defined
as follows.
• Initially, $C(x_i) = \emptyset$.
• After executing the downstream phase of operation $f$ at replica $x_i$,
$C(f(x_i)) = C(x_i) \cup \{f\}$.
Liveness requires that every update eventually reaches the causal history of every replica.
To this effect, we assume an underlying reliable broadcast system that delivers every update
to every replica in an order $<_d$ (called delivery order) where the downstream precondition
is true.
As in the state-based case, the happens-before relation between operations is defined
by $f \to g \Leftrightarrow C(f) \subset C(g)$. We define causal delivery $<_\to$ as follows: if $f \to g$ then $f$ is
delivered before $g$ is delivered. We note that all downstream preconditions in this paper
are satisfied by causal delivery, i.e., delivery order is the same as or weaker than causal order:
$f <_d g \Rightarrow f <_\to g$.
2.3 Convergence
We now formalise convergence.
Definition 2.3 (Eventual Convergence). Two replicas $x_i$ and $x_j$ of an object $x$ converge
eventually if the following conditions are met:
• Safety: $\forall i,j : C(x_i) = C(x_j)$ implies that the abstract states of $i$ and $j$ are equivalent.
• Liveness: $\forall i,j : f \in C(x_i)$ implies that, eventually, $f \in C(x_j)$.
Furthermore, we define state equivalence as follows: $x_i$ and $x_j$ have equivalent abstract
state if all query operations return the same values.
Pairwise eventual convergence implies that any non-empty subset of replicas of the object
converges, as long as all replicas receive all updates.
2.3.1 State-based CRDT: Convergent Replicated Data Type (CvRDT)
A join semilattice [9] (or just semilattice hereafter) is a partial order $\le_v$ equipped with a
least upper bound (LUB) $\sqcup_v$, defined as follows:
Definition 2.4 (Least Upper Bound (LUB)). $m = x \sqcup_v y$ is a Least Upper Bound of $\{x, y\}$
under $\le_v$ iff $x \le_v m$ and $y \le_v m$ and there is no $m' \le_v m$, $m' \neq m$, such that
$x \le_v m'$ and $y \le_v m'$.
It follows from the definition that $\sqcup_v$ is: commutative: $x \sqcup_v y =_v y \sqcup_v x$; idempotent:
$x \sqcup_v x =_v x$; and associative: $(x \sqcup_v y) \sqcup_v z =_v x \sqcup_v (y \sqcup_v z)$.
Definition 2.5 (Join Semilattice). An ordered set $(S, \le_v)$ is a Join Semilattice iff
$\forall x, y \in S$, $x \sqcup_v y$ exists.
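One way to build confidence in the commutativity, idempotence and associativity of a candidate LUB operation is to check them exhaustively on a small domain. The Python sketch below does so for two familiar joins, integer max and set union. The helper name `is_lub_like` is an assumption made for this example, and passing the check on a finite sample is not a proof.

```python
from itertools import product


def is_lub_like(join, values):
    """Check commutativity, idempotence and associativity of `join` on `values`."""
    for x, y, z in product(values, repeat=3):
        if join(x, y) != join(y, x):                    # commutative
            return False
        if join(x, x) != x:                             # idempotent
            return False
        if join(join(x, y), z) != join(x, join(y, z)):  # associative
            return False
    return True


ints = range(-2, 3)
sets = [frozenset(s) for s in ([], ["a"], ["b"], ["a", "b"])]

print(is_lub_like(max, ints))                     # integer max
print(is_lub_like(lambda a, b: a | b, sets))      # set union
print(is_lub_like(lambda a, b: a + b, ints))      # addition is not idempotent
```

The last call is a negative control: addition is commutative and associative but fails idempotence, so it cannot serve as a merge.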
A state-based object whose payload takes its values in a semilattice, and where
$\mathit{merge}(x, y) \stackrel{\mathrm{def}}{=} x \sqcup_v y$, converges towards the LUB of the initial and updated values.
If, furthermore, updates monotonically advance upwards according to $\le_v$ (i.e., the payload
value after an update is greater than or equal to the one before), then it converges towards
the LUB of the most recent values. Let us call this combination “monotonic semilattice.”
A type with these properties will be called a Convergent Replicated Data Type or
CvRDT. We require that, in a CvRDT, compare(x, y) return $x \le_v y$, that abstract
states be equivalent if $x \le_v y \wedge y \le_v x$, and that merge be always enabled. As an example,
Figure 5 illustrates a CvRDT with integer payload, where $\le_v$ is integer order, and where
$\mathit{merge}() = \max()$.
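The integer-with-max example of Figure 5 can be written out as a small Python class. This is a sketch under the assumption that the only update sets the payload to a value no smaller than the current one, so that updates are monotonic; the names `MaxInt` and `assign` are hypothetical.

```python
class MaxInt:
    """Illustrative CvRDT: integer payload, integer order, merge = max."""

    def __init__(self):
        self.payload = 0

    def value(self):
        return self.payload

    def assign(self, n):
        # assumed source precondition: the update must not move down in the order
        if n < self.payload:
            raise ValueError("update would not be monotonic")
        self.payload = n

    def compare(self, other):
        return self.payload <= other.payload

    def merge(self, other):
        self.payload = max(self.payload, other.payload)


x1, x2, x3 = MaxInt(), MaxInt(), MaxInt()
x1.assign(1)          # x1 := 1, as in Figure 5
x2.assign(4)          # x2 := 4
x2.merge(x1)          # max(4, 1)
x1.merge(x2)
x3.merge(x1)
print(x1.value(), x2.value(), x3.value())
```

All three replicas end at the largest value assigned anywhere, which is the LUB of the most recent values.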
Eventual convergence requires that all replicas receive all updates. The communication
channels of a CvRDT may have very weak properties. Since merge is idempotent and
commutative (by the properties of $\sqcup_v$), messages may be lost, received out of order, or
received multiple times, as long as new state eventually reaches all replicas, either directly
or indirectly via successive merges. Updates are propagated reliably even if the network
partitions, as long as connectivity is eventually restored.
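To make the weak channel requirements concrete, the following Python sketch pushes states between three replicas over a simulated channel that duplicates, reorders and drops messages. The loss rate, the seed and the variable names are arbitrary assumptions; the payload is a grow-only set merged by union. A final exchange in which every state does get through stands in for the requirement that new state eventually reaches all replicas.

```python
import random

rng = random.Random(0)                          # fixed seed, arbitrary choice
replicas = [frozenset({i}) for i in range(3)]   # each replica starts with one element

# One state message per ordered pair: (state of the sender, index of the receiver).
messages = [(replicas[s], r) for s in range(3) for r in range(3) if s != r]
messages = messages * 2                         # duplicate every message
rng.shuffle(messages)                           # deliver out of order
messages = [m for m in messages if rng.random() > 0.5]   # lose about half

for state, r in messages:
    replicas[r] = replicas[r] | state           # merge = union (LUB)

# Connectivity is eventually restored: every replica merges every other state.
for s in range(3):
    for r in range(3):
        replicas[r] = replicas[r] | replicas[s]

print(all(p == frozenset({0, 1, 2}) for p in replicas))
```

Duplicates are harmless because union is idempotent, and reordering is harmless because it is commutative and associative.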
Proposition 2.1. Any two object replicas of a CvRDT eventually converge, assuming the
system transmits payload infinitely often between pairs of replicas over eventually-reliable
point-to-point channels.
Proof. Any two replicas $x_i, x_j$ will converge, as long as they can exchange states by some
(direct or indirect) channel that eventually delivers, by merging their states. Since CvRDT
values form a monotonic semilattice, merge is always enabled, and one can make
$x'_i := \mathit{merge}(x_i, x_j)$ and $x'_j := \mathit{merge}(x_j, x_i)$. By Definition 2.1, we have the same causal
history in $x'_i$ and $x'_j$, since $C(x_i) \cup C(x_j) = C(x_j) \cup C(x_i)$. Finally we have equivalent
abstract states $x'_i =_v x'_j$ since, by commutativity of LUB, $x_i \sqcup_v x_j =_v x_j \sqcup_v x_i$.
2.3.2 Operation-based CRDT: Commutative Replicated Data Type (CmRDT)
In an op-based object, a reliable broadcast channel guarantees that all updates are delivered
at every replica, in the delivery order $<_d$ specified by the data type. Operations not ordered
by $<_d$ are said to be concurrent; formally $f \parallel_d g \Leftrightarrow f \not<_d g \wedge g \not<_d f$. If all concurrent
operations commute, then all execution orders consistent with delivery order are equivalent,
and all replicas converge to the same state. Such an object is called a Commutative
Replicated Data Type (CmRDT).
As noted earlier, for all data types studied here, causal delivery $<_\to$ (which is readily
implementable in static distributed systems and does not require consensus) satisfies delivery
order $<_d$. For some data types, a weaker ordering suffices, but then more pairs of operations
need to be proved commutative.
Definition 2.6 (Commutativity). Operations $f$ and $g$ commute iff, for any reachable replica
state $S$ where their source precondition is enabled, the source precondition of $f$ (resp. $g$)
remains enabled in state $S \cdot g$ (resp. $S \cdot f$), and $S \cdot f \cdot g$ and $S \cdot g \cdot f$ are equivalent abstract
states.
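Definition 2.6 can be probed on toy operations. In the Python sketch below, operations are modelled as pure functions from state to state with no source precondition (so the enabledness clause holds trivially), and `commute` simply compares both application orders on a sample of states. All names are assumptions for illustration, and sampling a few states does not prove commutativity in general.

```python
def commute(f, g, states):
    """True if applying f then g equals applying g then f on every sampled state."""
    return all(g(f(s)) == f(g(s)) for s in states)


def increment(s):
    return s + 1


def decrement(s):
    return s - 1


def double(s):
    return 2 * s


sample = range(-3, 4)
print(commute(increment, decrement, sample))   # counter operations commute
print(commute(increment, double, sample))      # these two do not
```

The second pair fails because doubling after incrementing gives 2s + 2, while incrementing after doubling gives 2s + 1.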
Proposition 2.2. Any two replicas of a CmRDT eventually converge under reliable broadcast
channels that deliver operations in delivery order $<_d$.
Proof. Consider object replicas $x_i, x_j$. Under the channel assumptions, eventually the two
replicas will deliver the same operations (if no new operations are generated), and we have
$C(x_i) = C(x_j)$. For any two operations $f, g$ in $C(x_i)$: (1) if they are not causally related then
they are concurrent under $<_d$ (which is never stronger than causality) and must commute;
(2) if they are causally related, $f \to g$, but are not ordered in delivery order $<_d$, they must
also commute; (3) if they are causally related, $f \to g$, and delivered in that order, $f <_d g$, then
they are applied in that same order everywhere. In all cases an equivalent abstract state is
reached in all replicas.
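Proposition 2.2 can be illustrated with the increment and decrement operations of a counter, for which delivery order is empty and every pair of operations is concurrent. The Python sketch below delivers the same operations to a fresh replica in every possible order and collects the final values. The class name `OpCounter` and the encoding of operations as +1 and -1 are assumptions.

```python
from itertools import permutations


class OpCounter:
    """Illustrative op-based counter: the downstream phase adds a delta."""

    def __init__(self):
        self.i = 0

    def downstream(self, delta):
        self.i += delta


ops = [+1, +1, -1, +1]                 # updates issued somewhere in the system
finals = set()
for order in permutations(ops):        # every delivery order, each op exactly once
    replica = OpCounter()
    for delta in order:
        replica.downstream(delta)
    finals.add(replica.i)

print(finals)                          # a single value: all orders agree
```

Unlike the state-based case, each operation must be delivered exactly once here; delivering an increment twice would change the result.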
Recall that reliable causal delivery does not require agreement. It is immune to
partitioning, in the sense that replicas in a connected subset can deliver each other’s updates,
and that updates are eventually delivered to all replicas. As delivery order is never stricter
than causal delivery, a fortiori this is true of all CmRDTs.
2.4 Relation between the two approaches
We have shown two approaches to eventual convergence, CvRDTs and CmRDTs, which
together we call CRDTs. There are similarities and differences between the two.
State-based mechanisms (CvRDTs) are simple to reason about, since all necessary
information is captured by the state. They require weak channel assumptions, allowing for
unknown numbers of replicas. However, sending state may be inefficient for large objects;
this can be tackled by shipping deltas, but this requires mechanisms similar to the op-based
approach. Historically, the state-based approach is used in file systems such as NFS, AFS
[19], Coda [22], and in key-value stores such as Dynamo [10] and Riak.
Specification 3 Operation-based emulation of state-based object
payload State-based S    ▷ S: Emulated state-based object
    initial Initial payload
update State-based-update (operation f, args a) : state s
    atSource (f, a) : s
        pre S.f.precondition(a)
        let s = S.f(a)    ▷ Compute state applying f to S
    downstream (s)
        S := merge(S, s)
Specifying operation-based objects (CmRDTs) can be more complex since it requires
reasoning about history, but conversely they have greater expressive power. The payload
can be simpler since some state is effectively offloaded to the channel. Op-based replication
is more demanding of the channel, since it requires reliable broadcast, which in general
requires tracking group membership. Historically, op-based approaches have been used in
cooperative systems such as Bayou [31], Rover [21], IceCube [33], and Telex [4].
2.4.1 Operation-based emulation of a state-based object
Interestingly, it is always possible to emulate a state-based object using the operation-based
approach, and vice versa.³
In Spec. 3 we show operation-based emulation of a state-based object (taking some
liberties with notation). Ignoring queries (which pose no problems), the emulating
operation-based object has a single update that computes some state-based update (after
checking for its precondition) and performs merge downstream. The downstream precondition
is empty because merge must be enabled in any reachable state. The emulation does not
make use of compare.
Note that if the base object is a CvRDT, then merge operations commute, and the
emulated object is a CmRDT.
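Specification 3 translates fairly directly into Python. In this sketch the emulated state-based object is a grow-only set represented by a frozenset, and the op-based wrapper `OpEmulation` (a hypothetical name) computes the new state at the source and ships it downstream, where it is merged. Broadcast is simulated by calling `downstream` on each replica by hand.

```python
class OpEmulation:
    """Op-based object emulating a state-based grow-only set (sketch)."""

    def __init__(self):
        self.S = frozenset()           # payload: the emulated state-based object

    def at_source(self, element):
        # compute the state obtained by applying the update (add) to S,
        # without side effects; the result is the downstream argument
        return self.S | {element}

    def downstream(self, s):
        # merge the shipped state into the local one (merge = union)
        self.S = self.S | s


a, b = OpEmulation(), OpEmulation()
s1 = a.at_source("x")                  # update issued at replica a
s2 = b.at_source("y")                  # concurrent update issued at replica b

# deliver both downstream messages at both replicas, in different orders
a.downstream(s1)
a.downstream(s2)
b.downstream(s2)
b.downstream(s1)
print(a.S == b.S)
```

Because the downstream phase is a merge, the two delivery orders give the same state, which is the sense in which the emulated object is a CmRDT.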
2.4.2 State-based emulation of an operation-based object
State-based emulation of an operation-based object essentially formalises the mechanics of
an epidemic reliable broadcast, as shown in Spec. 4 (taking some liberties with notation).
Again, we ignore queries, which pose no problems. Calling an operation-based update adds
it to a set M of messages to be delivered; merge takes the union of the two message sets.
³ Contrary to what Helland says [16], who considers only read-write state, not a merge operation.
Specification 4 State-based emulation of operation-based object
payload Operation-based P, set M, set D    ▷ Payload of emulated object, messages, delivered
    initial Initial state of payload, ∅, ∅
update op-based-update (update f, args a) : returns
    pre P.f.atSource.pre(a)    ▷ Check at-source precondition
    let returns = P.f.atSource(a)    ▷ Perform at-source computation
    let u = unique()
    M := M ∪ {(f, a, u)}    ▷ Send unique operation
    deliver()    ▷ Deliver to local op-based object
update deliver ()
    for (f, a, u) ∈ (M \ D) : f.downstream.pre(a) do
        P := P.f.downstream(a)    ▷ Apply downstream update to replica
        D := D ∪ {(f, a, u)}    ▷ Remember delivery
compare (R, R′) : boolean b
    let b = R.M ≤ R′.M ∨ R.D ≤ R′.D
merge (R, R′) : payload R″
    let R″.M = R.M ∪ R′.M
    R″.deliver()    ▷ Deliver pending enabled updates
Specification 5 op-based Counter
payload integer i
    initial 0
query value () : integer j
    let j = i
update increment ()
    downstream ()    ▷ No precond: delivery order is empty
        i := i + 1
update decrement ()
    downstream ()    ▷ No precond: delivery order is empty
        i := i − 1
When an update’s downstream precondition is true, the corresponding message is delivered
by executing the downstream part of the update. In order to avoid duplicate deliveries,
delivered messages are stored in a set D.
Note that the states of the emulating object form a monotonic semilattice. Calling or
delivering an operation adds it to the relevant message set, and therefore advances the state
in the partial order. merge is defined to take the union of the M sets, and is thus a LUB
operation. Note also that M is identical to the causal history of the replica; non-concurrent
updates appear in M in causal order. If the emulated op-based object is a CmRDT, then
delivery order is satisfied. Concurrent operations appear in M in any order; if the emulated
object is a CmRDT, they commute. Therefore, after two replicas merge mutually, their D
sets are identical and their P payloads have equivalent state.
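The emulation of Specification 4 can be sketched in Python for the op-based Counter of Specification 5. The class name `StateEmulation` is hypothetical, and a process-wide `itertools.count` stands in for unique(); that shortcut is only adequate in this single-process simulation, since a real system would need identifiers that are unique across replicas. The counter has no preconditions, so every pending message is enabled.

```python
import itertools

_ids = itertools.count()               # stand-in for unique(), simulation only


class StateEmulation:
    """State-based object emulating an op-based counter (sketch of Spec. 4)."""

    def __init__(self):
        self.P = 0                     # payload of the emulated counter
        self.M = frozenset()           # messages known to this replica
        self.D = frozenset()           # messages already delivered locally

    def update(self, delta):
        # record the operation with a unique tag, then deliver it locally
        self.M = self.M | {(delta, next(_ids))}
        self.deliver()

    def deliver(self):
        for msg in self.M - self.D:    # pending messages, all enabled here
            self.P += msg[0]           # apply the downstream update
            self.D = self.D | {msg}    # remember delivery

    def merge(self, other):
        self.M = self.M | other.M      # union of the message sets
        self.deliver()                 # deliver pending enabled updates


r1, r2 = StateEmulation(), StateEmulation()
r1.update(+1)
r1.update(+1)
r2.update(-1)
r1.merge(r2)
r2.merge(r1)
print(r1.P, r2.P, r1.D == r2.D)
```

After the mutual merge both replicas have delivered the same three messages, so their D sets coincide and their counters agree. The set D is what keeps a message from being applied twice when states are merged repeatedly.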