A comprehensive study of Convergent and Commutative Replicated Data Types
ABSTRACT Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, monotonic DAGs, and sequences. It discusses some properties needed to implement non-trivial CRDTs.
ISSN 0249-6399
ISRN INRIA/RR--7506--FR+ENG
Thème COM
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
A comprehensive study of
Convergent and Commutative Replicated Data Types
Marc Shapiro, INRIA & LIP6, Paris, France
Nuno Preguiça, CITI, Universidade Nova de Lisboa, Portugal
Carlos Baquero, Universidade do Minho, Portugal
Marek Zawirski, INRIA & UPMC, Paris, France
No. 7506
January 2011
inria-00555588, version 1 - 13 Jan 2011
INRIA Rocquencourt Research Unit
Domaine de Voluceau, Rocquencourt, BP 105, 78153 Le Chesnay Cedex (France)
Telephone: +33 1 39 63 55 11, Fax: +33 1 39 63 53 30
A comprehensive study of
Convergent and Commutative Replicated Data Types∗
Theme COM: Communicating systems
Project Regal
Research report no. 7506, January 2011, 47 pages
Key-words:
Data replication, optimistic replication, commutative operations
∗This research was supported in part by ANR project ConcoRDanT (ANR-10-BLAN 0208), and a
Google Research Award 2009. Marek Zawirski is a recipient of the Google Europe Fellowship in Distributed
Computing, and this research is supported in part by this Google Fellowship. Carlos Baquero is partially
supported by FCT project Castor (PTDC/EIA-EIA/104022/2008).
An in-depth study of convergent and commutative
replicated data types
Summary:
Eventual consistency aims to ensure that the replicas of a mutable shared object
converge without prior synchronisation. Earlier approaches to the problem
are ad-hoc and error-prone. We propose an approach based on formal
principles: basing the design of data types on simple mathematical properties
that suffice to guarantee eventual consistency. We call these data types
CRDTs (Convergent/Commutative Replicated Data Types). This paper formalises
asynchronous replication, whether state-based or operation-based, and provides a
sufficient condition suited to each case. It describes several useful CRDTs, including
containers supporting the add and remove operations with clean semantics, and
more complex data types such as graphs, monotonic acyclic graphs, and
sequences. It discusses properties needed to implement
non-trivial CRDTs.
Keywords:
Data replication, optimistic replication, commutative operations
1 Introduction
Replication is a fundamental concept of distributed systems, well studied by the distributed
algorithms community. Much work focuses on maintaining a global total order of operations
[24] even in the presence of faults [8]. However, the associated serialisation bottleneck
negatively impacts performance and scalability, while the CAP theorem [13] imposes a
trade-off between consistency and partition-tolerance.
An alternative approach, eventual consistency or optimistic replication, is attractive to
practitioners [37, 41]. A replica may execute an operation without synchronising a priori with
other replicas. The operation is sent asynchronously to other replicas; every replica
eventually applies all updates, possibly in different orders. A background consensus algorithm
reconciles any conflicting updates [4, 40]. This approach ensures that data remains available
despite network partitions. It performs well (as the consensus bottleneck has been moved
off the critical path), and the weaker consistency is considered acceptable for some classes
of applications. However, reconciliation is generally complex. There is little theoretical
guidance on how to design a correct optimistic system, and ad-hoc approaches have proven
brittle and error-prone.¹
In this paper, we study a simple, theoretically sound approach to eventual consistency.
We propose the concept of a convergent or commutative replicated data type (CRDT), for
which some simple mathematical properties ensure eventual consistency. A trivial example
of a CRDT is a replicated counter, which converges because the increment and decrement
operations commute (assuming no overflow). Provably, replicas of any CRDT converge
to a common state that is equivalent to some correct sequential execution. As a CRDT
requires no synchronisation, an update executes immediately, unaffected by network latency,
faults, or disconnection. It is extremely scalable and fault-tolerant, and requires little
mechanism. Application areas may include computation in delay-tolerant networks,
latency tolerance in wide-area networks, disconnected operation, churn-tolerant peer-to-peer
computing, data aggregation, and partition-tolerant cloud computing.
Since, by design, a CRDT does not use consensus, the approach has strong limitations;
nonetheless, some interesting and non-trivial CRDTs are known to exist. For instance, we
previously published Treedoc, a sequence CRDT designed for co-operative text editing [32].
Previously, only a handful of CRDTs were known. The objective of this paper is to push
the envelope, studying the principles of CRDTs, and presenting a comprehensive portfolio of
useful CRDT designs, including variations on registers, counters, sets, graphs, and sequences.
We expect them to be of interest to practitioners and theoreticians alike.
Some of our designs suffer from unbounded growth; collecting the garbage requires a
weak form of synchronisation [25]. However, its liveness is not essential, as it is an
optimisation, off the critical path, and not in the public interface. In the future, we plan to extend
the approach to data types where common-case, time-critical operations are commutative,
and rare operations require synchronisation but can be delayed to periods when the network
is well connected. This concurs with Brewer’s suggestion for side-stepping the CAP
impossibility [6]. It is also similar to the shopping cart design of Alvaro et al. [1], where updates
commute, but check-out requires coordination. However, this extension is out of the scope
of the present study.

¹ The anomalies of the Amazon Shopping Cart are a well-known example [10].
In the literature, the preferred consistency criterion is linearisability [18]. However,
linearisability requires consensus in general. Therefore, we settle for the much weaker qui-
escent consistency [17, Section 3.3]. One challenge is to minimise “anomalies,” i.e., states
that would not be observed in a sequential execution. Note also that CRDTs are weaker
than non-blocking constructs, which are generally based on a hardware consensus primitive
[17].
Some of the ideas presented in this paper are already known in the folklore. The
contributions of this paper include:
• In Section 2: (i) A specification language suited to asynchronous replication. (ii) A
formalisation of state-based and operation-based replication. (iii) Two sufficient
conditions for eventual consistency.
• In Section 3, a comprehensive collection of useful data type designs, starting with
counters and registers. We focus on container types (sets and maps) supporting both
add and remove operations with clean semantics, and more complex derived types,
such as graphs, monotonic DAGs, and sequences.
• In Section 4, a study of the problem of garbage-collecting meta-data.
• In Section 5, exercising some of our CRDTs in a practical example, the shopping cart.
• A comparison with previous work, in Section 6.
Section 7 concludes with a summary of lessons learned, and perspectives for future work.
2 Background and system model
We consider a distributed system consisting of processes interconnected by an asynchronous
network. The network can partition and recover, and nodes can operate in disconnected
mode for some time. A process may crash and recover; its memory survives crashes. We
assume non-byzantine behaviour.
2.1 Atoms and objects
A process may store atoms and objects. An atom is a base immutable data type, identified
by its literal content. Atoms can be copied between processes; atoms are equal if they have
the same content. Atom types considered in this paper include integers, strings, sets, tuples,
etc., with their usual non-mutating operations. Atom types are written in lower case, e.g.,
“set.”
Figure 1: Object
Figure 2: Grow-only Set (G-Set)
Figure 3: 2P-Set
An object is a mutable, replicated data type. Object types are capitalised, e.g., “Set.”
An object has an identity, a content (called its payload), which may be any number of atoms
or objects, an initial state, and an interface consisting of operations. Two objects having
the same identity but located in different processes are called replicas of one another. As
an example, Figure 1 depicts a logical object x, its replicas at processes 1,2 and 3, and the
current state of the payload of replica 3.
We assume that objects are independent and do not consider transactions. Therefore,
without loss of generality, we focus on a single object at a time, and use the words process
and replica interchangeably.
2.2 Operations
The environment consists of unspecified clients that query and modify object state by calling
operations in its interface, against a replica of their choice called the source replica. A query
executes locally, i.e., entirely at one replica. An update has two phases: first, the client calls
the operation at the source, which may perform some initial processing. Then, the update
is transmitted asynchronously to all replicas; this is the downstream part. The literature
[37] distinguishes the state-based and operation-based (op-based for short) styles, explained
next.
Specification 1 Outline of a state-based object specification. Preconditions, arguments, return
values and statements are optional.
payload Payload type; instantiated at all replicas
    initial Initial value
query Query (arguments) : returns
    pre Precondition
    let Evaluate synchronously, no side effects
update Source-local operation (arguments) : returns
    pre Precondition
    let Evaluate at source, synchronously
    Side-effects at source to execute synchronously
compare (value1, value2) : boolean b
    Is value1 ≤ value2 in semilattice?
merge (value1, value2) : payload mergedValue
    LUB merge of value1 and value2, at any replica
Figure 4: State-based replication
Figure 5: Example CvRDT: integer + max (updates x1 := 1 and x2 := 4, merged with max)
2.2.1 State-based replication
In state-based (or passive) replication, an update occurs entirely at the source, then propa-
gates by transmitting the modified payload between replicas, as illustrated in Figure 4.
We specify state-based object types as shown in Specification 1. Keyword payload indi-
cates the payload type, and initial specifies its initial value at every replica. Keyword update
indicates an update operation, and query a query. Both may have (optional) arguments
and return values. Non-mutating statements are marked let, and payload is mutated by
assignment :=. An operation executes atomically.
To capture safety, an operation is enabled only if a given source pre-condition (marked
pre in a specification) holds in the source’s current state. The source pre-condition is omitted
if always enabled, e.g., incrementing or decrementing a Counter. Conversely, non-null pre-
conditions may be necessary, for instance an element can be removed from a Set only if it
is in the Set at the source.
The system transmits state between arbitrary pairs of replicas, in order to propagate
changes. This updates the payload of the receiver with the output of operation merge,
invoked with two arguments, the local payload state and the received state. Operation
compare compares replica states, as will be explained shortly.
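To make the outline of Specification 1 concrete, here is a minimal Python sketch of a state-based object. It assumes a grow-only set as the payload, with subset order for compare and set union for merge; the class name `GSet` and its method names are illustrative choices, not the paper's notation.

```python
from dataclasses import dataclass, field


@dataclass
class GSet:
    """Sketch of a state-based grow-only set (illustrative names)."""
    payload: set = field(default_factory=set)   # initial: empty set

    def lookup(self, e) -> bool:
        # query: executes locally, no side effects
        return e in self.payload

    def add(self, e) -> None:
        # update: executes entirely at the source replica; no precondition
        self.payload = self.payload | {e}

    def compare(self, other: "GSet") -> bool:
        # is self <= other in the semilattice? (assumed order: subset)
        return self.payload <= other.payload

    def merge(self, other: "GSet") -> "GSet":
        # LUB of the local and the received payload (assumed: set union)
        return GSet(self.payload | other.payload)


x1, x2 = GSet(), GSet()
x1.add("a")          # update at source replica x1
x2.add("b")          # concurrent update at source replica x2
x1 = x1.merge(x2)    # x1 receives the state of x2
x2 = x2.merge(x1)    # x2 receives the state of x1
```

After the two merges both replicas hold the same payload, and compare holds in both directions.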
We define the causal history [38] $C$ of replicas of some object $x$ as follows:²
Definition 2.1 (Causal History: state-based). For any replica $x_i$ of $x$:
• Initially, $C(x_i) = \emptyset$.
• After executing update operation $f$, $C(f(x_i)) = C(x_i) \cup \{f\}$.
• After executing merge against states $x_i, x_j$, $C(\mathit{merge}(x_i, x_j)) = C(x_i) \cup C(x_j)$.
The classical happens-before [24] relation between operations can be defined as
$f \to g \Leftrightarrow C(f) \subset C(g)$.
Liveness requires that any update eventually reaches the causal history of every replica.
To this effect, we assume an underlying system that transmits states between pairs of replicas
at unspecified times, infinitely often, and that replica communication forms a connected
graph.
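The bookkeeping of Definition 2.1 can be sketched in a few lines of Python. This sketch assumes that updates are identified by strings and that C(f) denotes the causal history of the replica state right after f executes; the helper names `update`, `merge` and `happens_before` are hypothetical.

```python
# Causal history C(f) recorded for each update f (assumed interpretation:
# the history of the replica state immediately after f executes).
history_of = {}


def update(replica_history, f):
    """After executing update f: C(f(x_i)) = C(x_i) ∪ {f}."""
    new_history = replica_history | {f}
    history_of[f] = new_history
    return new_history


def merge(h_i, h_j):
    """After merging two states: C(merge(x_i, x_j)) = C(x_i) ∪ C(x_j)."""
    return h_i | h_j


def happens_before(f, g):
    """f → g  ⇔  C(f) ⊂ C(g) (strict subset)."""
    return history_of[f] < history_of[g]


c1 = frozenset()           # initially C(x_1) = ∅
c2 = frozenset()           # initially C(x_2) = ∅
c1 = update(c1, "f")       # f executes at x_1
c2 = update(c2, "g")       # g executes concurrently at x_2
c2 = merge(c2, c1)         # x_2 merges the state of x_1
c2 = update(c2, "h")       # h executes at x_2 after the merge
```

Here f and g are unordered, while both happen before h, because h executed on a state that already contained them.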
2.2.2 Operation-based (op-based) objects
In operation-based (or active) replication, the system transmits operations, as illustrated in
Figure 6. This style is specified as outlined in Spec. 2. The payload and initial clauses are
identical to the state-based specifications. An operation that does not mutate the state is
marked query and executes entirely at a single replica.
Specification 2 Outline of operation-based object specification. Preconditions, return values
and statements are optional.
payload Payload type; instantiated at all replicas
    initial Initial value
query Source-local operation (arguments) : returns
    pre Precondition
    let Execute at source, synchronously, no side effects
update Global update (arguments) : returns
    atSource (arguments) : returns
        pre Precondition at source
        let 1st phase: synchronous, at source, no side effects
    downstream (arguments passed downstream)
        pre Precondition against downstream state
        2nd phase, asynchronous, side-effects to downstream state
Figure 6: Operation-Based Replication
² C is a logical function; it is not part of the object.
An update is specified by keyword update. Its first phase, marked atSource, is local to
the source replica. It is enabled only if its (optional) source pre-condition, marked pre, is
true in the source state; it executes atomically. It takes its arguments from the operation
invocation; it is not allowed to make side effects; it may compute results, returned to the
caller, and/or prepare arguments for the second phase.
The second phase, marked downstream, executes after the source-local phase: immediately
at the source, and asynchronously at all other replicas. It cannot return results. It
executes only if its downstream precondition is true. It updates the downstream state; its
arguments are those prepared by the source-local phase. It executes atomically.
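The two phases can be illustrated with a small Python sketch. It assumes a set with add and remove, where remove carries the source precondition mentioned earlier (the element must be in the set at the source). The names `OpSetReplica` and `broadcast` are hypothetical, and `broadcast` delivers synchronously and in order, standing in for the asynchronous channel. The sketch only shows the phase structure; it does not claim that this naive set is a CRDT.

```python
class OpSetReplica:
    """Sketch of one replica of an op-based set (illustrative names)."""

    def __init__(self):
        self.payload = set()

    def lookup(self, e):
        # query: local, no side effects
        return e in self.payload

    def add_at_source(self, e):
        # first phase of add: no precondition, no side effects;
        # it only prepares the arguments passed downstream
        return ("add", e)

    def remove_at_source(self, e):
        # first phase of remove: enabled only if e is in the set at the source
        if not self.lookup(e):
            raise ValueError("source precondition failed: element not in set")
        return ("remove", e)

    def downstream(self, op):
        # second phase: side effects on the local state, returns nothing
        kind, e = op
        if kind == "add":
            self.payload.add(e)
        else:
            self.payload.discard(e)


def broadcast(source, replicas, op):
    """Hypothetical stand-in for the channel: source first, then the others."""
    source.downstream(op)
    for r in replicas:
        if r is not source:
            r.downstream(op)


x1, x2 = OpSetReplica(), OpSetReplica()
replicas = [x1, x2]
broadcast(x1, replicas, x1.add_at_source("a"))
broadcast(x2, replicas, x2.remove_at_source("a"))   # enabled: x2 has seen add(a)
```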
As above, we define the causal history of a replica C(xi).
Definition 2.2 (Causal History: op-based). The causal history of a replica $x_i$ is defined
as follows.
• Initially, $C(x_i) = \emptyset$.
• After executing the downstream phase of operation $f$ at replica $x_i$, $C(f(x_i)) = C(x_i) \cup \{f\}$.
Liveness requires that every update eventually reaches the causal history of every replica.
To this effect, we assume an underlying reliable broadcast system that delivers every update
to every replica in an order $<_d$ (called delivery order) in which the downstream precondition
is true.
As in the state-based case, the happens-before relation between operations is defined
by $f \to g \Leftrightarrow C(f) \subset C(g)$. We define causal delivery $<_{\to}$ as follows: if $f \to g$ then $f$ is
delivered before $g$ is delivered. We note that all downstream preconditions in this paper
are satisfied by causal delivery, i.e., delivery order is the same as or weaker than causal order:
$f <_d g \Rightarrow f <_{\to} g$.
2.3 Convergence
We now formalise convergence.
Definition 2.3 (Eventual Convergence). Two replicas $x_i$ and $x_j$ of an object $x$ converge
eventually if the following conditions are met:
• Safety: $\forall i,j : C(x_i) = C(x_j)$ implies that the abstract states of $i$ and $j$ are equivalent.
• Liveness: $\forall i,j : f \in C(x_i)$ implies that, eventually, $f \in C(x_j)$.
Furthermore, we define state equivalence as follows: $x_i$ and $x_j$ have equivalent abstract
state if all query operations return the same values.
Pairwise eventual convergence implies that any non-empty subset of replicas of the object
converge, as long as all replicas receive all updates.
2.3.1 State-based CRDT: Convergent Replicated Data Type (CvRDT)
A join semilattice [9] (or just semilattice hereafter) is a partial order $\le_v$ equipped with a
least upper bound (LUB) $\sqcup_v$, defined as follows:
Definition 2.4 (Least Upper Bound (LUB)). $m = x \sqcup_v y$ is a Least Upper Bound of $\{x, y\}$
under $\le_v$ iff $x \le_v m$ and $y \le_v m$ and there is no $m' \le_v m$, $m' \ne m$, such that $x \le_v m'$ and $y \le_v m'$.
It follows from the definition that $\sqcup_v$ is: commutative: $x \sqcup_v y =_v y \sqcup_v x$; idempotent:
$x \sqcup_v x =_v x$; and associative: $(x \sqcup_v y) \sqcup_v z =_v x \sqcup_v (y \sqcup_v z)$.
Definition 2.5 (Join Semilattice). An ordered set $(S, \le_v)$ is a Join Semilattice iff
$\forall x, y \in S$, $x \sqcup_v y$ exists.
A state-based object whose payload takes its values in a semilattice, and where
$\mathit{merge}(x, y) \stackrel{\mathrm{def}}{=} x \sqcup_v y$, converges towards the LUB of the initial and updated values. If, furthermore, updates
monotonically advance upwards according to $\le_v$ (i.e., the payload value after an update is
greater than or equal to the one before), then it converges towards the LUB of the most
recent values. Let us call this combination “monotonic semilattice.”
A type with these properties will be called a Convergent Replicated Data Type or
CvRDT. We require that, in a CvRDT, compare(x, y) return $x \le_v y$, that abstract
states be equivalent if $x \le_v y \wedge y \le_v x$, and that merge be always enabled. As an example,
Figure 5 illustrates a CvRDT with integer payload, where $\le_v$ is integer order, and where
merge() = max().
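The integer-with-max example can be sketched in Python as follows. The class name `MaxInt`, the `assign` update and its monotonicity precondition are assumptions made for this sketch; the order in which the replicas exchange state is also an arbitrary choice, since the exact message pattern of Figure 5 is not reproduced here.

```python
class MaxInt:
    """Sketch of a CvRDT with integer payload, integer order, and merge = max."""

    def __init__(self, value=0):
        self.value = value                      # initial payload: 0

    def assign(self, v):
        # update at the source; assumed precondition keeps updates monotonic
        if v < self.value:
            raise ValueError("update would move downwards in the semilattice")
        self.value = v

    def compare(self, other):
        return self.value <= other.value        # integer order

    def merge(self, other):
        return MaxInt(max(self.value, other.value))   # LUB under integer order


x1, x2, x3 = MaxInt(), MaxInt(), MaxInt()
x1.assign(1)           # x1 := 1
x2.assign(4)           # x2 := 4
x3 = x3.merge(x1)      # x3 learns 1
x1 = x1.merge(x2)      # x1 learns 4
x3 = x3.merge(x1)      # x3 learns 4 indirectly, via x1
x2 = x2.merge(x3)      # x2 already holds the LUB; the merge changes nothing
```

All three replicas end with the value 4, the LUB of the updated values.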
Eventual convergence requires that all replicas receive all updates. The communication
channels of a CvRDT may have very weak properties. Since merge is idempotent and
commutative (by the properties of $\sqcup_v$), messages may be lost, received out of order, or multiple
times, as long as new state eventually reaches all replicas, either directly or indirectly via
successive merges. Updates are propagated reliably even if the network partitions, as long
as eventually connectivity is restored.
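A quick way to see why duplication and reordering are harmless is to merge the same states in many shuffled orders, with duplicates added. The sketch below assumes set-valued payloads merged by union; the function `deliver` and the seeded random schedule are illustrative only, and message loss is not modelled beyond the requirement that every state arrives at least once.

```python
import random
from functools import reduce


def merge(a, b):
    # assumed LUB: set union
    return a | b


# States sent by other replicas over time.
states = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"c"})]


def deliver(messages, seed):
    """Merge all messages, plus three random duplicates, in a shuffled order."""
    rng = random.Random(seed)
    received = messages + rng.choices(messages, k=3)
    rng.shuffle(received)
    return reduce(merge, received, frozenset())


results = {deliver(states, seed) for seed in range(20)}
```

Every schedule yields the same final state, because union is commutative, associative and idempotent.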
Proposition 2.1. Any two object replicas of a CvRDT eventually converge, assuming the
system transmits payload infinitely often between pairs of replicas over eventually-reliable
point-to-point channels.
Proof. Any two replicas $x_i, x_j$ will converge, as long as they can exchange states by some
(direct or indirect) channel that eventually delivers, by merging their states. Since CvRDT
values form a monotonic semilattice, merge is always enabled, and one can make
$x'_i := \mathit{merge}(x_i, x_j)$ and $x'_j := \mathit{merge}(x_j, x_i)$. By Definition 2.1, we have the same causal history
in $x'_i$ and $x'_j$, since $C(x_i) \cup C(x_j) = C(x_j) \cup C(x_i)$. Finally we have equivalent abstract states
$x'_i =_v x'_j$ since, by commutativity of LUB, $x_i \sqcup_v x_j =_v x_j \sqcup_v x_i$.
2.3.2 Operation-based CRDT: Commutative Replicated Data Type (CmRDT)
In an op-based object, a reliable broadcast channel guarantees that all updates are delivered
at every replica, in the delivery order $<_d$ specified by the data type. Operations not ordered
by $<_d$ are said to be concurrent; formally $f \parallel_d g \Leftrightarrow f \not<_d g \wedge g \not<_d f$. If all concurrent operations
commute, then all execution orders consistent with delivery order are equivalent, and all
replicas converge to the same state. Such an object is called a Commutative Replicated
Data Type (CmRDT).
As noted earlier, for all data types studied here, causal delivery $<_{\to}$ (which is readily
implementable in static distributed systems and does not require consensus) satisfies delivery
order $<_d$. For some data types, a weaker ordering suffices, but then more pairs of operations
need to be proved commutative.
Definition 2.6 (Commutativity). Operations $f$ and $g$ commute, iff for any reachable replica
state $S$ where their source pre-condition is enabled, the source precondition of $f$ (resp. $g$)
remains enabled in state $S \cdot g$ (resp. $S \cdot f$), and $S \cdot f \cdot g$ and $S \cdot g \cdot f$ are equivalent abstract
states.
Proposition 2.2. Any two replicas of a CmRDT eventually converge under reliable
broadcast channels that deliver operations in delivery order $<_d$.
Proof. Consider object replicas $x_i, x_j$. Under the channel assumptions, eventually the two
replicas will deliver the same operations (if no new operations are generated), and we have
$C(x_i) = C(x_j)$. For any two operations $f, g$ in $C(x_i)$: (1) if they are not causally related then
they are concurrent under $<_d$ (which is never stronger than causality) and must commute;
(2) if they are causally related, $f \to g$, but are not ordered in delivery order $<_d$, they must
also commute; (3) if they are causally related, $f \to g$, and delivered in that order, $f <_d g$, then
they are applied in that same order everywhere. In all cases an equivalent abstract state is
reached in all replicas.
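For a counter whose only updates are increment and decrement, the commutativity argument can be checked exhaustively on a small example. The Python sketch below applies the same three operations in every possible delivery order; the class name `Counter` and the string encoding of operations are choices made for this sketch.

```python
import itertools


class Counter:
    """Sketch of an op-based counter replica: only the downstream phase matters."""

    def __init__(self):
        self.i = 0                  # initial payload

    def value(self):
        return self.i               # query

    def downstream(self, op):
        # no downstream precondition: any delivery order is acceptable
        if op == "increment":
            self.i += 1
        elif op == "decrement":
            self.i -= 1
        else:
            raise ValueError(op)


ops = ["increment", "increment", "decrement"]
final_values = set()
for order in itertools.permutations(ops):
    replica = Counter()
    for op in order:
        replica.downstream(op)
    final_values.add(replica.value())
```

Since every permutation reaches the same value, replicas that deliver these updates in different orders still converge.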
Recall that reliable causal delivery does not require agreement. It is immune to parti-
tioning, in the sense that replicas in a connected subset can deliver each other’s updates,
and that updates are eventually delivered to all replicas. As delivery order is never stricter
than causal delivery, a fortiori this is true of all CmRDTs.
2.4 Relation between the two approaches
We have shown two approaches to eventual convergence, CvRDTs and CmRDTs, which
together we call CRDTs. There are similarities and differences between the two.
State-based mechanisms (CvRDTs) are simple to reason about, since all necessary
information is captured by the state. They require weak channel assumptions, allowing for
unknown numbers of replicas. However, sending state may be inefficient for large objects;
this can be tackled by shipping deltas, but this requires mechanisms similar to the op-based
approach. Historically, the state-based approach is used in file systems such as NFS, AFS
[19], Coda [22], and in key-value stores such as Dynamo [10] and Riak.
Specification 3 Operation-based emulation of state-based object
payload State-based S                                   ▷ S: Emulated state-based object
    initial Initial payload
update State-based-update (operation f, args a) : state s
    atSource (f, a) : s
        pre S.f.precondition(a)
        let s = S.f(a)                                  ▷ Compute state applying f to S
    downstream (s)
        S := merge(S, s)
Specifying operation-based objects (CmRDTs) can be more complex since it requires
reasoning about history, but conversely they have greater expressive power. The payload
can be simpler since some state is effectively offloaded to the channel. Op-based replication
is more demanding of the channel, since it requires reliable broadcast, which in general
requires tracking group membership. Historically, op-based approaches have been used in
cooperative systems such as Bayou [31], Rover [21], IceCube [33], and Telex [4].
2.4.1 Operation-based emulation of a state-based object
Interestingly, it is always possible to emulate a state-based object using the operation-based
approach, and vice versa.³
In Spec. 3 we show operation-based emulation of a state-based object (taking some
liberties with notation). Ignoring queries (which pose no problems), the emulating operation-
based object has a single update that computes some state-based update (after checking for
its precondition) and performs merge downstream. The downstream precondition is empty
because merge must be enabled in any reachable state. The emulation does not make use of
compare.
Note that if the base object is a CvRDT, then merge operations commute, and the
emulated object is a CmRDT.
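The emulation of Specification 3 can be sketched in Python as follows. The sketch assumes an emulated state-based object whose states are immutable sets merged by union; `EmulatingReplica`, `add` and `union` are hypothetical names, and the precondition check of the specification is omitted because the example update is always enabled.

```python
class EmulatingReplica:
    """Sketch of an op-based replica emulating a state-based object S."""

    def __init__(self, initial, merge):
        self.S = initial        # emulated state-based payload
        self.merge = merge      # merge function of the emulated object

    def at_source(self, f, a):
        # compute the state obtained by applying f to S, without mutating S
        return f(self.S, a)

    def downstream(self, s):
        # S := merge(S, s); always enabled
        self.S = self.merge(self.S, s)


def add(S, a):
    return S | {a}              # example state-based update (assumption)


def union(s1, s2):
    return s1 | s2              # example merge (assumption)


r1 = EmulatingReplica(frozenset(), union)
r2 = EmulatingReplica(frozenset(), union)

s_a = r1.at_source(add, "a")
r1.downstream(s_a)              # immediately at the source
s_b = r2.at_source(add, "b")
r2.downstream(s_b)

r1.downstream(s_b)              # the two replicas receive the remote
r2.downstream(s_a)              # updates in opposite orders
r2.downstream(s_a)              # a duplicate delivery is harmless
```

Because the downstream phase is a merge, the two replicas agree even though they applied the updates in different orders.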
2.4.2 State-based emulation of an operation-based object
State-based emulation of an operation-based object essentially formalises the mechanics of
an epidemic reliable broadcast, as shown in Spec. 4 (taking some liberties with notation).
Again, we ignore queries, which pose no problems. Calling an operation-based update adds
it to a set of M messages to be delivered; merge takes the union of the two message sets.
When an update’s downstream precondition is true, the corresponding message is delivered
by executing the downstream part of the update. In order to avoid duplicate deliveries,
delivered messages are stored in a set D.
³ Contrary to what Helland says [16], because he only considers read-write state, not a merge operation.
Specification 4 State-based emulation of operation-based object
payload Operation-based P, set M, set D                 ▷ Payload of emulated object, messages, delivered
    initial Initial state of payload, ∅, ∅
update op-based-update (update f, args a) : returns
    pre P.f.atSource.pre(a)                             ▷ Check at-source precondition
    let returns = P.f.atSource(a)                       ▷ Perform at-source computation
    let u = unique()
    M := M ∪ {(f, a, u)}                                ▷ Send unique operation
    deliver()                                           ▷ Deliver to local op-based object
update deliver ()
    for (f, a, u) ∈ (M \ D) : f.downstream.pre(a) do
        P := P.f.downstream(a)                          ▷ Apply downstream update to replica
        D := D ∪ {(f, a, u)}                            ▷ Remember delivery
compare (R, R′) : boolean b
    let b = R.M ≤ R′.M ∨ R.D ≤ R′.D
merge (R, R′) : payload R″
    let R″.M = R.M ∪ R′.M
    R″.deliver()                                        ▷ Deliver pending enabled updates
Specification 5 op-based Counter
payload integer i
    initial 0
query value () : integer j
    let j = i
update increment ()
    downstream ()                                       ▷ No precond: delivery order is empty
        i := i + 1
update decrement ()
    downstream ()                                       ▷ No precond: delivery order is empty
        i := i − 1
Note that the states of the emulating object form a monotonic semilattice. Calling or
delivering an operation adds it to the relevant message set, and therefore advances the state
in the partial order. merge is defined to take the union of the M sets, and is thus a LUB
operation. Remark that M is identical to the causal history of the replica; non-concurrent
updates appear in M in causal order. If the emulated op-based object is a CmRDT, then
delivery order is satisfied. Concurrent operations appear in M in any order; if the emulated
object is a CmRDT, they commute. Therefore, after two replicas merge mutually, their D
sets are identical and their P payloads have equivalent state.
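As a rough illustration of Specification 4, the following Python sketch emulates an op-based counter with a state-based object. Several simplifications are assumptions of the sketch rather than part of the specification: the class name `EmulatedCounterReplica` is hypothetical, `unique()` is realised with `uuid.uuid4()`, merge updates the local replica in place instead of returning a new payload, and the counter has neither arguments nor preconditions.

```python
import uuid


class EmulatedCounterReplica:
    """Sketch of a state-based replica emulating an op-based counter."""

    def __init__(self):
        self.P = 0              # payload of the emulated op-based counter
        self.M = frozenset()    # messages: every update known to this replica
        self.D = frozenset()    # messages already delivered to P

    def update(self, f):
        # The counter has no at-source precondition and no arguments: a = ().
        u = uuid.uuid4().hex    # unique(): keeps identical updates distinct
        self.M = self.M | {(f, (), u)}
        self.deliver()

    def deliver(self):
        # Every pending message is enabled (no downstream precondition).
        for msg in self.M - self.D:
            f, _args, _u = msg
            self.P += 1 if f == "increment" else -1
            self.D = self.D | {msg}

    def compare(self, other):
        # Written as in Specification 4.
        return self.M <= other.M or self.D <= other.D

    def merge(self, other):
        # In-place variant: union of the message sets, then deliver what is new.
        self.M = self.M | other.M
        self.deliver()


r1, r2 = EmulatedCounterReplica(), EmulatedCounterReplica()
r1.update("increment")
r1.update("increment")
r2.update("decrement")
r1.merge(r2)
r2.merge(r1)
r1.merge(r2)    # merging again changes nothing: D filters duplicates
```

After the mutual merges both replicas have delivered the same three messages and hold the same counter value.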