
Eventually Consistent Transactions

January 6, 2012

Technical Report

MSR-TR-2011-117

Microsoft Research

Microsoft Corporation

One Microsoft Way

Redmond, WA 98052


Eventually Consistent Transactions

Sebastian Burckhardt, Manuel Fähndrich, Daan Leijen
Microsoft Research
{sburckha,daan,maf}@microsoft.com

Mooly Sagiv
Tel-Aviv University
msagiv@post.tau.ac.il

Abstract

When distributed clients query or update shared data, eventual consistency can provide better availability than strong consistency models. However, programming and implementing such systems can be difficult unless we establish a reasonable consistency model, i.e., some minimal guarantees that programmers can understand and systems can provide effectively.

To this end, we propose a novel consistency model based on eventually consistent transactions. Unlike serializable transactions, eventually consistent transactions are ordered by two order relations (visibility and arbitration) rather than a single order relation. To demonstrate that eventually consistent transactions can be effectively implemented, we establish a handful of simple operational rules for managing replicas, versions, and updates, based on graphs called revision diagrams. We prove that these rules are sufficient to guarantee correct implementation of eventually consistent transactions. Finally, we present two operational models (single server and server pool) of systems that provide eventually consistent transactions.

1. Introduction

Eventual consistency [16] is a well-known workaround to the fundamental problem of providing CAP [8] (consistency, availability, and partition tolerance) to clients that perform queries and updates against shared data in a distributed system. It weakens traditional consistency guarantees (such as linearizability) in order to allow clients to perform updates against any replica, at any time. Eventually consistent systems guarantee that all updates are eventually delivered to all replicas, and that they are applied in a consistent order.

Eventual consistency is popular with system builders. One reason is that it allows temporarily disconnected replicas to remain fully available to clients. This is particularly useful for implementing clients on mobile devices [19]. Another reason is that it does not require updates to be immediately performed on all server replicas, thus improving scalability. In more theoretical terms, the benefit of eventual consistency can be understood as its ability to delay consensus [15].

However, eventual consistency is a weak consistency model that breaks with traditional approaches (e.g., serializable operations) and thus requires developers to be more careful. The essential problem is that updates are not immediately applied globally; thus the conditions under which they are applied are subject to change, which can easily break data invariants. Many eventually consistent systems address this issue by providing higher-level data types to programmers. Still, the semantic details often remain sketchy. Experience has shown that ad-hoc approaches to the semantics and implementation of such systems can lead to surprising behaviors (e.g., a shopping cart where deleted items reappear [6]). To take eventual consistency to its full potential, we need answers to the following questions:

• How can we provide consistency guarantees that are as strong as possible without forsaking lazy consensus?

• How can we effectively understand and implement systems that provide those guarantees?

In this paper, we propose a two-pronged solution that addresses both questions, based on (1) a notion of transactions for eventual consistency, and (2) a general implementation technique based on revision diagrams.

Eventually consistent transactions differ significantly from traditional transactions, as they are not serializable. Nevertheless, they uphold traditional atomicity and isolation guarantees. Even better, they exhibit some strong properties that simplify the life of programmers and are not typically offered by traditional transactions: (1) transactions cannot fail and never roll back, and (2) all code, even long-running tasks, can run inside transactions without compromising performance.

We first present an abstract, concise specification of eventually consistent transactions. This formalization uses mathematical techniques (sets of events, partial orders, and equivalence relations) that are commonly used in research on relaxed memory models and transactional memory. Our definition provides immediate insight into how eventual consistency is related to strong consistency: the only difference is that eventual consistency uses two separate order relations (visibility order and arbitration order) rather than a single order over transactions.

We then proceed to describe a more concrete and operational implementation technique based on revision diagrams [5]. Revision diagrams provide implementors with a simple set of rules for managing updates and replicas. Revision diagrams make the fork and join of versions explicit, which determines the visibility and arbitration of transactions. We prove a theorem that guarantees that any system following the revision diagram rules provides eventually consistent transactions according to the abstract definition. We also illustrate the use of revision diagrams by presenting two simple system models (one using a single server, and one using a server pool).

Overall, we make the following contributions:

• We introduce a notion of eventually consistent transactions and give a concise and abstract definition.

• We present a systematic approach for building systems that support such transactions, based on revision diagrams. We present a precise, operational definition of revision diagrams.

• We prove a theorem stating that the revision diagram rules are sufficient to guarantee eventual consistency. The proof is nontrivial, as it depends on deep structural properties of revision diagrams.

• We illustrate the use of revision diagrams by presenting two operational system models, using a single server and a server pool, respectively.

2. Formulation

To get started, we need to establish some precise terminology. Perhaps the very first question is: what is a database? At a high abstraction level, databases are no different from abstract data types, which are semantically defined by the operations they support to update them and retrieve data. Taking cues from common definitions of abstract data types, we define:

DEFINITION 1. A query-update interface is a tuple (Q, V, U) where Q is an abstract set of query operations, V is an abstract set of values returned by queries, and U is an abstract set of update operations.

Note that the sets of queries, query results, and updates are not required to be finite (and usually are not). Query-update interfaces can apply in various scenarios, where they may describe abstract data types, relational databases, or simple random-access memory, for example. For databases, queries are typically defined recursively by a query language.

EXAMPLE 1. Consider a random-access memory that supports loads and stores of bytes in a 64-bit address space A = {a ∈ N | 0 < a ≤ 2^64}. For that example, we define Q = {load(a) | a ∈ A}, V = {v ∈ N | 0 ≤ v < 2^8}, and U = {store(a, v) | a ∈ A and v ∈ V}.

This example is excellent for illustration purposes (we will revisit it throughout), and it provides an explicit connection between our results and previous work on relaxed memory models and transactional memory. Of course, most databases also fit this abstract interface, where the queries are SQL queries and the update operations are SQL updates such as insertion and deletion.

So far, our interfaces have no inherent meaning. The most direct way to define the semantics of queries and updates is to relate them to some notion of state:

DEFINITION 2. A query-update automaton (QUA) for the interface (Q, V, U) is a tuple (S, s_0) where S is a set of states, with (1) an initial state s_0 ∈ S, (2) an interpretation q^# of each query q ∈ Q as a function S → V, and (3) an interpretation u^# of each update operation u ∈ U as a function S → S.

EXAMPLE 2. The random-access memory interface described in Example 1 above can be represented by a QUA (S, s_0) where S is the set of total functions A → V, where s_0 is the constant function that maps all locations to zero, and where load(a)^#(s) = s(a) and store(a, v)^#(s) = s[a ↦ v].
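The QUA of Example 2 can be sketched in a few lines of Python. This is our own illustration, not from the paper; the class and method names are assumptions, and the total function A → V is represented sparsely as a dict whose missing entries read as zero (the initial state s_0).

```python
# A minimal sketch of the query-update automaton (QUA) of Example 2.
class MemoryQUA:
    def __init__(self):
        self.state = {}                    # s_0: all locations map to zero

    def load(self, a):                     # query:  load(a)#(s) = s(a)
        return self.state.get(a, 0)

    def store(self, a, v):                 # update: store(a,v)#(s) = s[a -> v]
        self.state = {**self.state, a: v}  # updates are total: they never fail

qua = MemoryQUA()
qua.store(7, 42)
assert qua.load(7) == 42 and qua.load(8) == 0
```

Note the strict query/update separation required by our formalization: `load` never modifies the state, and `store` never returns a value.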

QUAs can naturally support abstract data types (e.g., collections, or even entire documents) that offer higher-level operations (queries and updates) beyond just loads and stores. Such data types are often important when programming against a weak consistency model [17], since they can ensure that the data representation remains intact when handling concurrent and potentially conflicting updates.

The following two characteristics of QUAs are important to understand how they relate to other definitions of abstract data types:

• There is a strict separation between query and update operations: it is not possible for an operation to both update the data and return information to the caller.

• All updates are total functions. It is thus not possible for an update to 'fail'; however, it is of course possible to define updates to have no effect in case some precondition is not satisfied.

For instance, in our formalization, we would not allow a classic stack abstract data type with a pop operation, for two reasons: (1) pop both removes the top element of the stack and returns it, so it is neither an update nor a query, and (2) pop is not total, i.e., it cannot be applied to the empty stack.

This restriction is crucial to enable eventual consistency, where the sequencing and application of updates may be delayed, and updates may thus be applied to a different state than the one in which they were originally issued by the program.

2.1 Clients and Transactions

Things become more interesting and challenging once we consider a distributed system. We call the participants of our system clients. Clients typically reside on physically distinct devices, but are not required to do so. When clients in a distributed system issue queries and updates against some shared QUA, we need to define what consistency programmers can expect. This consistency model should also address the semantics of transactions, which provide clients with the ability to perform several updates as an atomic "bundle".

We formally represent this scenario by defining a set C of clients. Each client, at its own speed, issues a sequence of transactions. Presumably, each client runs some form of program (the details of which we leave unspecified for simplicity and generality). This program determines when to begin and end a transaction, and what operations to perform in each transaction, which may depend on various factors, such as the results returned by queries, or external factors such as user inputs.

For uniformity, we require that all operations are part of a transaction. This assumption comes at no loss of generality: a device that does not care about transactions can simply issue each operation in its own transaction.

Since all operations are inside transactions, we need not distinguish between the end of a transaction and the beginning of a transaction. Formally, we can thus represent the activities on a device as a stream of operations (queries or updates) interrupted by special yield operations that mark the transaction boundaries.¹

We can thus fully describe the interaction between programs executing on the clients and the database by the following three types of operations:

1. updates u ∈ U issued by the program,
2. pairs (q, v) representing a query q ∈ Q issued by the program, together with a response v ∈ V by the database system, and
3. the yield operations issued by the program.

DEFINITION 3. A history H for a set C of clients and a query-update interface (Q, V, U) is a map H which maps each client c ∈ C to a finite or infinite sequence H(c) of operations from the alphabet Σ = U ∪ (Q × V) ∪ {yield}.

Note that our history does not a priori include a global ordering of events, since such an order is not always meaningful when working with relaxed consistency models. Rather, the existence of certain orderings, subject to certain conditions, is what determines whether a history satisfies a consistency model or not.

¹ We call this operation yield() since it is semantically similar to a yield we may encounter on a uniprocessor performing cooperative multitasking: such a yield marks locations where other threads may read and modify the current state of the data, while at all other locations, only the current thread may read or modify the state.

2.1.1 Notation and Terminology

To reason about a history H, it is helpful to introduce the following auxiliary terminology. We let E_H be the set of all events in H, by which we mean all occurrences of operations in Σ \ {yield} in the sequences H(c) (we consider yield to be just a marker within the operation sequence, but not an event).

For a client c, we call a maximal nonempty contiguous subsequence of events in H(c) that does not contain yield a transaction of c. We call a transaction committed if it is succeeded by a yield operation, and uncommitted otherwise. We let T_H be the set of all transactions of all clients, and committed(T_H) ⊆ T_H the subset of all committed transactions. For an event e, we let trans(e) ∈ T_H be the transaction that contains e. Moreover, we let committed(E_H) ⊆ E_H be the subset of events that are contained in committed transactions. We conclude by giving definitions related to ordering events and transactions:

• Program order. For a given history H, we define a partial order <_p over events in H such that e <_p e' iff e appears before e' in some sequence H(c).

• Apply in order. For a history H, a state s ∈ S, a subset of events E' ⊂ E_H, and a total order < over the events in E', we let apply(E', <, s) be the state obtained by applying all updates appearing in E' to the state s, in the order specified by <.

• Factoring. We define an equivalence relation ∼_t (meaning same-transaction) over events such that e ∼_t e' iff trans(e) = trans(e'). For any partial order ≺ over events, we say that ≺ factors over ∼_t iff for any events x and y from different transactions, x ≺ y implies x' ≺ y' for any x', y' such that x ∼_t x' and y ∼_t y'. This is an important property for any ordering ≺ to have, since if ≺ factors over ∼_t, it induces a corresponding partial order on the transactions.
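To make this terminology concrete, here is a small sketch (our own, reusing the memory QUA of Example 2) of extracting transactions from a client sequence H(c) and of computing apply(E', <, s). The tuple encodings of operations and the function names are illustrative assumptions, not part of the paper's formalism.

```python
def transactions(seq):
    """Split a client's sequence H(c) at 'yield' markers. Returns
    (events, committed) pairs: a transaction is a maximal nonempty run
    of non-yield operations, and it is committed iff a yield follows it."""
    txns, cur = [], []
    for op in seq:
        if op == "yield":
            if cur:
                txns.append((cur, True))
                cur = []
        else:
            cur.append(op)
    if cur:
        txns.append((cur, False))  # no trailing yield: uncommitted
    return txns

def apply_updates(events, key, s):
    """apply(E', <, s): apply all updates in E' to state s, in the
    total order < given by the sort key. Updates are ('store', a, v)
    tuples; states are dicts whose missing addresses read as 0."""
    for (_, a, v) in sorted(events, key=key):
        s = {**s, a: v}
    return s

h = [("store", 1, 7), "yield", ("load", 1), ("store", 1, 9), "yield"]
assert [c for (_, c) in transactions(h)] == [True, True]

evs = [("store", 1, 9), ("store", 1, 7)]
order = {("store", 1, 9): 1, ("store", 1, 7): 0}  # a total order < on evs
assert apply_updates(evs, order.get, {}) == {1: 9}
```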

2.2 Sequential Consistency

Sequential consistency posits that the observed behavior must be consistent with an interleaving of the transactions by the various devices. We formalize this interleaving as a partial order over events (rather than a total order, as is more common) since some events are not instantly ordered by the system; for example, the relative order of operations in uncommitted transactions may not be fully determined yet.

DEFINITION 4. A history H is sequentially consistent if there exists a partial order < over the events in E_H that satisfies the following conditions for all events e_1, e_2, e ∈ E_H:

• (compatible with program order) if e_1 <_p e_2 then e_1 < e_2.

• (total order on past events) if e_1 < e and e_2 < e, then either e_1 < e_2 or e_2 < e_1.

• (consistent query results) for all (q, v) ∈ E_H,

  v = q^#(apply({e ∈ E_H | e < q}, <, s_0)).

  This simply says that a query returns the state as it results from applying all past updates to the initial state.

• (atomicity) < factors over ∼_t.

• (isolation) if e_1 ∉ committed(E_H) and e_1 < e_2, then e_1 <_p e_2. That is, events in uncommitted transactions precede only events on the same client.

• (eventual delivery) for all committed transactions t, there exist only finitely many transactions t' ∈ T_H such that t ≮ t'.
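As a sanity check of the "consistent query results" clause, the following sketch (ours; the tuple encoding is an assumption) verifies it for a toy, totally ordered history over the memory QUA of Example 2: replaying the events in <-order, every query must return exactly the state produced by the updates ordered before it.

```python
def consistent_query_results(events):
    """events: ('store', a, v) updates and ('load', a, v) query events
    (q, v), listed in the total order <. Each recorded query result v
    must equal q#(apply({e | e < q}, <, s_0)), with s_0 all-zero."""
    s = {}
    for (tag, a, v) in events:
        if tag == "store":
            s = {**s, a: v}          # replay the update
        elif s.get(a, 0) != v:       # a load whose recorded result is wrong
            return False
    return True

assert consistent_query_results(
    [("store", 1, 5), ("load", 1, 5), ("load", 2, 0)])
assert not consistent_query_results(
    [("store", 1, 5), ("load", 1, 6)])
```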

Sequential consistency fundamentally limits availability in the presence of network partitions. The reason is that any query issued by some transaction t must see the effect of all updates that occur in transactions that are globally ordered before t, even if on a remote device. Thus we cannot conclusively commit transactions in the presence of network partitions.

2.3 Eventual Consistency

Eventual consistency relaxes sequential consistency by allowing queries in a transaction t to see only a subset of all transactions that are globally ordered before t. It does so by distinguishing between a visibility order (a partial order that defines which updates are visible to a query) and an arbitration order (a partial order that determines the relative order of updates).

DEFINITION 5. A history H is eventually consistent if there exist two partial orders <_v (the visibility order) and <_a (the arbitration order) over events in H, such that the following conditions are satisfied for all events e_1, e_2, e ∈ E_H:

• (arbitration extends visibility) if e_1 <_v e_2 then e_1 <_a e_2.

• (total order on past events) if e_1 <_v e and e_2 <_v e, then either e_1 <_a e_2 or e_2 <_a e_1.

• (compatible with program order) if e_1 <_p e_2 then e_1 <_v e_2.

• (consistent query results) for all (q, v) ∈ E_H,

  v = q^#(apply({e ∈ E_H | e <_v q}, <_a, s_0)).

  This says that a query returns the state as it results from applying all preceding visible updates (as determined by the visibility order) to the initial state, in the order given by the arbitration order.

• (atomicity) both <_v and <_a factor over ∼_t.

• (isolation) if e_1 ∉ committed(E_H) and e_1 <_v e_2, then e_1 <_p e_2. That is, events in uncommitted transactions are visible only to later events by the same client.

• (eventual delivery) for all committed transactions t, there exist only finitely many transactions t' ∈ T_H such that t ≮_v t'.

The reason why eventual consistency can tolerate temporary network partitions is that the arbitration order can be constructed incrementally, i.e., it may remain only partially determined for some time after a transaction commits. This allows conflicting updates to be committed even in the presence of network partitions.

Note that eventual consistency is a weaker consistency model than sequential consistency. We can prove this statement as follows.

LEMMA 1. A sequentially consistent history is eventually consistent.

PROOF. Given a history H that is sequentially consistent, we know there exists a partial order < satisfying all conditions. Now define <_v = <_a = <; then all conditions for eventual consistency follow easily.

2.4 Eventual Consistency in Related Work

Eventually consistent systems in the literature use a variety of techniques to propagate updates (e.g., general causally-ordered broadcast [17, 18], or pairwise anti-entropy [14]). All of these techniques are particular implementations that specialize our general definition of visibility as a partial order. As for the arbitration order, we found that two main approaches prevail. The most common one is to use (logical or actual) timestamps: timestamps provide a simple way to arbitrate events. Another approach (sometimes combined with timestamps) is to make updates commutative, which makes arbitration unnecessary (i.e., we can pick an arbitrary serialization of the visibility order to satisfy the conditions in Def. 5).

We show in the next section (Section 3) how to arbitrate updates without using timestamps or requiring commutativity, a feature that sets our work apart. We prefer not to use timestamps because they exhibit the write stabilization problem [19], i.e., the inability to finalize the effect of updates while older updates may still linger in disconnected network partitions. Consider, for example, a mobile user called Robinson performing an important update, but getting stranded on a disconnected island before transmitting it. When Robinson reconnects after years of exile, Robinson's update is older than (and may thus alter the effect of) all the updates committed by other users in the meantime. So either (1) none of these updates can stabilize until Robinson returns, or (2) after some timeout we give up on Robinson and discard his update. Clearly, neither of these solutions is satisfactory. A better solution is to abandon timestamps and instead use an arbitration order that simply orders Robinson's update after all the other updates. In fact, this is the outcome we achieve when using revision diagrams, as explained in Section 3.

3. Revision Consistency

Our definition of eventual consistency (Def. 5) is concise and general. By itself, however, it is not very constructive, insofar as it does not give practical guidelines as to how a system can efficiently and correctly construct the necessary orderings (visibility and arbitration). We now proceed to describe a more specific implementation technique for eventually consistent systems, based on the notion of revision diagrams introduced in [5].

Revision diagrams show an extended history not only of the queries, updates, and transactions by each client, but also of the forking and joining of revisions, which are logical replicas of the state (Fig. 1(a)). A client works with one revision at a time, and can perform operations (queries and updates) on it. Since different clients work with different revisions, clients can perform both queries and updates concurrently and in isolation (i.e., without creating race conditions). Reconciliation happens during join operations. When a revision joins another revision, it replays all the updates performed in the joined revision at the join point.² After a revision is joined, no more operations can be performed on it (i.e., clients may need to fork new revisions to keep enough revisions available).

3.1 Revision Diagrams

Revision diagrams are directed graphs constructed from three types of edges (successor, fork, and join edges, or s-, f-, and j-edges for short) and five types of vertices (start, fork, join, update, and query vertices). A start vertex represents the beginning of a revision, s-edges represent successors within a revision, and fork/join edges represent the forking and joining of revisions.

We pictorially represent revision diagrams using the following conventions:

• use · for start, query, and update vertices,
• use • and ◦ for fork and join vertices, respectively,
• use vertical down-arrows for s-edges,
• use horizontal-to-vertical curved arrows for f-edges, and
• use vertical-to-horizontal curved arrows for j-edges.

A vertex x has an s-path (i.e., a path containing only s-edges) to a vertex y if and only if they are part of the same revision. Since all s-edges are vertical in our pictures, vertices belonging to the same revision are always aligned vertically. For any vertex x, we let S(x) be the start vertex of the revision that x belongs to. For any vertex x whose start vertex S(x) is not the root, we define F(x) to be the fork vertex such that F(x) →f S(x) (i.e., the fork vertex that started the revision x belongs to). We call a vertex with no outgoing s- or j-edges a terminal; a terminal is the last operation in a revision that can still perform operations (it has not been joined yet), and thus represents a potential extension point of the graph.

² This replay operation is conceptual. Rather than replaying a potentially unbounded log, actual implementations can often use much more space- and time-efficient merge functions, as explained in Section 4.

Figure 2. Visualization of the construction rules for revision diagrams in Def. 6.

We now give a formal, constructive definition of revision diagrams.

DEFINITION 6. A revision diagram is a directed graph constructed by applying a (possibly empty or infinite) sequence of the following construction steps (see Fig. 2) to a single initial start vertex (called the root):

Query: choose some terminal t, create a new query vertex x, and add an edge t →s x.

Update: choose some terminal t, create a new update vertex x, and add an edge t →s x.

Fork: choose some terminal t, create a new fork vertex x and a new start vertex y, and add edges t →s x and x →f y.

Join: choose two terminals t, t' satisfying the join condition F(t') →* t, then create a new join vertex x and add edges t →s x and t' →j x.

The join condition expresses that the terminal t (the "joiner") must be reachable from the fork vertex that started the revision that contains t' (the "joinee"). This condition makes revision diagrams more restricted than general task graphs. See Fig. 1(b) for some examples of invalid diagrams where the join condition does not hold at construction of the join nodes.

The join condition has some important, not immediately obvious consequences. For example, it implies that revision diagrams are always semilattices (for a proof of this nontrivial fact, see [5]). Also, it ensures some diagram properties (Lemmas 2 and 3) that we need to prove our main result (Thm. 1). Furthermore, it still allows more general graphs than strict series-parallel graphs [20], which allow only the recursive serial and parallel composition of tasks (and are also called fork-join concurrency in some contexts, which is potentially misleading). For instance, the right-most revision diagram in Fig. 1(a) is not a series-parallel graph, but it is a valid revision diagram. While series-parallel graphs are easier to work with than revision diagrams, they are not flexible enough for our purpose, since they would enforce too much synchronization between participants.
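The construction steps of Def. 6 translate directly into code. The following sketch is our own (the class, method names, and vertex representation are assumptions): it maintains the set of terminals and enforces the join condition F(t') →* t by graph reachability.

```python
class RevisionDiagram:
    """Builds diagrams by the construction rules of Def. 6, from a root."""
    def __init__(self):
        self.kind = {0: "start"}   # vertex id -> vertex kind
        self.edges = []            # (src, dst, label), label in {s, f, j}
        self.start = {0: 0}        # S(x): start vertex of x's revision
        self.forker = {}           # F: start vertex -> the fork vertex
        self.terminals = {0}       # vertices with no outgoing s-/j-edges
        self.fresh = 1

    def _extend(self, t, kind):    # append vertex x to t's revision (s-edge)
        assert t in self.terminals, "can only extend a terminal"
        x, self.fresh = self.fresh, self.fresh + 1
        self.kind[x] = kind
        self.edges.append((t, x, "s"))
        self.start[x] = self.start[t]
        self.terminals.discard(t)
        self.terminals.add(x)
        return x

    def query(self, t):  return self._extend(t, "query")
    def update(self, t): return self._extend(t, "update")

    def fork(self, t):             # returns (fork vertex, new start vertex)
        x = self._extend(t, "fork")
        y, self.fresh = self.fresh, self.fresh + 1
        self.kind[y] = "start"
        self.edges.append((x, y, "f"))
        self.start[y], self.forker[y] = y, x
        self.terminals.add(y)
        return x, y

    def reachable(self, a, b):     # a ->* b over all edge types
        seen, todo = {a}, [a]
        while todo:
            v = todo.pop()
            for (src, dst, _) in self.edges:
                if src == v and dst not in seen:
                    seen.add(dst)
                    todo.append(dst)
        return b in seen

    def join(self, t, t2):         # t is the joiner, t2 the joinee
        f = self.forker.get(self.start[t2])
        assert f is not None and self.reachable(f, t), \
            "join condition F(t') ->* t violated"
        x = self._extend(t, "join")
        self.edges.append((t2, x, "j"))
        self.terminals.discard(t2)
        return x

d = RevisionDiagram()
f, y = d.fork(0)                   # fork a new revision off the root
u = d.update(y)                    # the forkee performs an update
j = d.join(f, u)                   # the forker joins it: F(u) ->* f holds
assert d.terminals == {j}          # the joined revision is closed off
```

A "parent join" as in Fig. 1(b), where the joinee lies on the root revision, is rejected here: F is undefined for the root revision, so the assertion in `join` fires.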

Also, note that fork and join are fundamentally asymmetric: the revision that initiates the fork (the "forker") continues to exist after the fork, but also starts a new revision (the "forkee"); similarly, the revision that initiates the join (the "joiner") can continue to perform operations after the join, but ends the joined revision (the "joinee").

Figure 1. (a) Four examples of revision diagrams (fork, nested, non-planar, non-series-parallel). (b) Two diagrams (shortcut, parent join) that are not revision diagrams, since they violate the join property at the creation of the join node x. In the rightmost diagram, F(t') is undefined on the main revision, and therefore F(t') →* t does not hold.

3.2 Graph Properties

We now examine some properties of revision diagrams, both for better visualization and because we need some technical properties in our later proofs. Most statements are easily proved by induction over the construction rules in Def. 6; if not, we mention how to prove them.

Revision diagrams are connected, and all vertices are reachable from the root vertex. There can be multiple paths from the root to a given vertex, but exactly one of those is free of j-edges.

DEFINITION 7. For any vertex v in a revision diagram, let the root-path of v be the unique path from the root to v that does not contain j-edges.

The join condition does not make revision diagrams necessarily planar, i.e., when drawing revision diagrams, it is not always possible to avoid crossing lines (see the third diagram in Fig. 1(a) for an example). However, it is always possible to choose horizontal coordinates for the vertices such that (1) vertices in the same revision are vertically aligned, (2) revisions are horizontally arranged such that forkers are left of forkees, and (3) joiners are left of joinees. The existence of such an order is not immediately obvious; for example, such a layout is not possible for the incorrect revision diagram at the right in Fig. 1(b). The following lemma formalizes the claims (1,2,3) above (where the preorder ≤_l corresponds to a relation on vertices that compares their horizontal coordinates):

LEMMA 2 (Layout Preorder). In any revision diagram, there exists a preorder ≤_l on vertices³ such that

  S(x) = S(y) ⇔ (x ≤_l y) ∧ (y ≤_l x)   (1)
  x →f y ⇒ x ≤_l y                      (2)
  x →j y ⇒ y ≤_l x                      (3)

³ A preorder is a reflexive transitive binary relation. Unlike partial orders, preorders are not necessarily antisymmetric, i.e., they may contain cycles.

Figure 3. A labeled revision diagram (the main revision performs store(a, 1), then joins a forked revision that performed store(a, 2) followed by store(b, 2), and finally queries load(a)). The path-result of the bottom vertex is the query applied to its root-path: load(a)^#(store(b, 2)^#(store(a, 2)^#(store(a, 1)^#(s_0)))) = 2.

We include the proof in the appendix. For proving our main result later on, we need to establish another basic fact about revision diagrams. We call a path direct if all of its f-edges (if any) appear after all of its j-edges (if any). The following lemma (which appears as a theorem in [5], and for which we also include a proof in the appendix) shows that we can always choose direct paths:

LEMMA 3 (Direct Paths). Let x, y be vertices in a revision diagram. If x →* y, then there exists a direct path from x to y.

3.3 Query and Update Semantics

We now proceed to explain how to determine the results of a query in a revision diagram. The basic idea is that (1) a query returns a result that is consistent with applying all the updates along the root path, and (2) if there are join vertices along that path, they summarize the effect of all updates by the joined revision.

For example, consider the diagram in Fig. 3. This is an example of a revision diagram labeled with the operations of the random-access memory example described in Example 2. The join vertex is labeled with the composition of all update operations of the joinee. The path-result of the final query vertex load(a) can now be evaluated by applying the query to the composition of all update operations along the root-path:

  load(a)^#(store(b, 2)^#(store(a, 2)^#(store(a, 1)^#(s_0)))) = 2.

We can define this more formally. To reduce the verbosity of our definitions, we assume a fixed query-update interface (Q, V, U) and QUA (S, s_0) for the rest of this section.

DEFINITION 8. For any vertex x, we let the effect of x be a function x° : S → S, defined inductively as follows:

• If x is a start, fork, or query vertex, the effect is a no-op, i.e., x°(s) = s.

• If x is an update vertex for the update operation u, then the effect is that update, i.e., x°(s) = u^#(s).

• If x is a join vertex, then the effect is the composition of all effects in the joined revision, i.e., if y_1, ..., y_n is the sequence of vertices in the joined revision (that is, y_1 is a start vertex, y_i →s y_{i+1} for all 1 ≤ i < n, and y_n →j x), then x°(s) = y_n°(y_{n-1}°(... y_1°(s))).

We can then define the expected query result as follows.

DEFINITION 9. Let x be a query vertex with query q, and let (y1, . . . , yn, x) be the root path of x. Then define the path-result of x as q#(y◦n(y◦n−1(. . . y◦1(s0)))).

3.4 Revision Diagrams and Histories

We can naturally relate histories to revision diagrams by associating each query event (q, v) ∈ EH with a query vertex, and each update event u ∈ EH with an update vertex. The intention is to validate the query results in the history using the path results, and to keep transactions atomic and isolated by ensuring that their events form contiguous sequences within a revision.

DEFINITION 10. We call a revision diagram a witness for the history H if it satisfies the following conditions:

1. For all query events (q, v) in EH, the value v matches the path-result of the query vertex.

2. If x, y are two successive non-yield operations in H(c) for some c, then they must be connected by an s-edge.

3. If x is the last event of H(c) for some c and not a yield, then it must be a terminal.

4. If x, y are two operations preceding and succeeding some yield in H(c) for some c, then there must exist a path from x to y. In other words, the beginning of a transaction must be reachable from the end of the previous transaction.

We call a history H revision-consistent if there exists a witness revision diagram.

To ensure eventual delivery of updates, we need to somehow make sure there are enough forks and joins. To formulate a liveness condition on infinite histories, we define "neglected vertices" as follows:

DEFINITION 11. We call a vertex x in a revision diagram neglected if there exists an infinite number of vertices y such that there is no path from x to y.

We are now ready to state and prove our main result.

THEOREM 1. Let H be a history. If there exists a witness diagram for H such that no committed events are neglected, then H is eventually consistent.

Note that this theorem gives us a solid basis for implementing eventually consistent transactions: an implementation can be based on dynamically constructing a witness revision diagram, and as a consequence guarantee eventually consistent transactions. Moreover, as we will see in Section 4, implementations do not need to actually construct such witness diagrams at runtime but can rely on efficient state-based implementations.

The proof of our theorem (in Section 3.5 below) constructs partial orders <v, <a from the revision diagram by (1) specifying x <v y iff there is a path from x to y in the revision diagram, and (2) specifying <a to order all events in a joined revision to occur in between the joiner terminal and the join vertex. Note that the converse of Thm. 1 is not true, not even if restricted to finite histories (we include a finite counterexample in the appendix). Also note that the most difficult part of the proof is the safety, not the liveness, since the proof that <a is a partial order extending <v depends on the join condition in a nontrivial way.

3.5 Proof of Thm 1

We devote the rest of this section to this proof, which requires some deeper insight into structural properties of revision diagrams. First, however, we need some definitions, notations, and lemmas.

A revision diagram is a connected graph. However, if we remove all f-edges from the picture, it may decompose into several components. We define a join-component to be a maximal component connected by s- and j-edges only. We say x ∼j y if they are in the same join-component, and let J(x) = {y | x ∼j y}. It is easy to see that each join-component contains exactly one terminal. For a vertex x, we let T(x) be the terminal of J(x) (note that T(x) is the unique terminal reachable from x by a path containing j- and s-edges only).

DEFINITION 12. Define the binary relation →a on vertices by adding the following edges during the construction of a revision diagram as in Def. 6:

• (Query, Update, Fork) for all y ∈ J(t), add y →a x.

• (Join) for all y ∈ J(t) and y′ ∈ J(t′), add edges y →a x, y′ →a x, and y →a y′.

LEMMA 4. For any revision diagram, →a as defined above is a partial order over all vertices in the diagram satisfying (1) when restricted to any one join-component, →a is a total order, and (2) →a does not cross join-components.

LEMMA 5. For vertices x, y in a revision diagram and a preorder ≤l as guaranteed by Lemma 2, x →* y implies T(x) ≤l T(y).

We include proofs for both lemmas in the appendix. The first one is a simple induction; the second one is a bit more intricate and uses the path properties guaranteed by Lemma 3 and the layout preorder guaranteed by Lemma 2.

We are now ready to prove Theorem 1. Given a history H and a witness revision diagram, define two binary relations

<v = →*   and   <a = (<v ∪ →a)*.

By Lemma 6 below, <a and <v are partial orders. We can then prove the remaining claims as follows:

• (arbitration extends visibility) By Lemma 6 below.

• (total order on past events) If e1 <v e and e2 <v e, then by Lemma 3 there exist direct paths for e1 →* e and for e2 →* e. If either path is a prefix of the other, e1 and e2 are ordered by <v and thus by <a. If not, they must combine in a join vertex, implying that e1 ∼j e2, which implies (by Lemma 4) that they are ordered by <a.

• (compatible with program order) By conditions 2 and 4 of Def. 10.

• (consistent query results) We can show inductively (over Def. 6) that for any vertex x, the combined effect of the vertices on the root path to x (as in Def. 8) is equal to the combined effect of all updates {x′ | x′ <v x} ordered by <a. This is trivial for all but the join case. In the join case, Def. 12 orders all updates in the joinee after updates in the joiner, which is consistent with interpreting them as an effect of the join vertex.

• (atomicity) By condition 2 we know there can be no intervening forks or joins. This implies that both → and <a factor over ∼t.

• (isolation) By condition 3.

• (eventual delivery) Assume the condition is violated. Then there exists a committed transaction t ∈ committed(TH) and an infinite number of transactions t1, t2, . . . such that for all i, t ≮v ti. Since transactions cannot be empty, we can pick vertices x ∈ t and xi ∈ ti, with x ≮v xi for all i. But that implies that x is neglected, contradicting the condition in the theorem.

The only thing left to prove is the lemma below, which arguably contains the most interesting part of the proof. In particular, it shows how consequences of the join condition (specifically, Lemmas 2 and 5) are used in the construction of an arbitration order <a that satisfies <v ⊆ <a as required for eventual consistency.

LEMMA 6. Given some revision diagram, define binary relations <v = →* and <a = (<v ∪ →a)*. Then both <v and <a are partial orders, and <v ⊆ <a.

PROOF. Clearly, <v is a partial order (since revision diagrams are acyclic) and <v ⊆ <a. The interesting part is to show that <a is antisymmetric (i.e. x <a y and y <a x implies x = y). We prove this by showing that (→a ∪ →) is acyclic. Consider some minimal cycle. Since →a is transitive, and both →a and → are acyclic on their own, it must be of the following form (where n ≥ 1):

x1 →* y1 →a x2 →* y2 →a . . . →a xn →* yn →a x1

By Lemma 4 this implies

x1 →* y1 ∼j x2 →* y2 ∼j . . . ∼j xn →* yn ∼j x1

Using the preorder guaranteed by Lemma 2 and Lemma 5, we get

T(x1) ≤l T(y1) = T(x2) ≤l T(y2) . . . T(xn) ≤l T(yn) = T(x1)

But by Lemma 2 such a ≤l-cycle implies that all vertices are in the same revision, which is a contradiction.

4. System Implementation

Revision diagrams can help to develop efficient implementations since they provide a solid abstraction that decouples the consistency model from actual implementation choices. In this section, we describe some implementation techniques that are likely to be useful for that purpose. We present two sketches of client-server systems that implement eventual consistency.

It is usually not necessary for implementations to store the actual revision diagram. Rather, we found it highly convenient to work with state representations that can directly provide fork and join operations.

DEFINITION 13. A fork-join QUA (FJ-QUA) for a query-update interface (Q, V, U) is a tuple (Σ, σ0, f, j) where (1) (Σ, σ0) is a QUA over (Q, V, U), (2) f : Σ → Σ × Σ, and (3) j : Σ × Σ → Σ.

If we have a fork-join QUA, we can simply associate a Σ-state with each revision, and then perform all queries and updates locally on that state, without communicating with other revisions. The join function of the FJ-QUA, if implemented correctly, guarantees that all updates are applied at join time. We can state this more formally as follows.

DEFINITION 14. For an FJ-QUA (Σ, σ0, f, j) and a revision diagram over the same interface (Q, V, U), define the state σ(x) of each vertex x inductively by setting σ(r) = σ0 for the initial vertex r, and (for the construction rules as they appear in Def. 6):

• (Query) Let σ(x) = σ(t).

• (Update) Let σ(x) = u#(σ(t)).

• (Fork) Let (σ(x), σ(y)) = f(σ(t)).

• (Join) Let σ(x) = j(σ(t), σ(t′)).

DEFINITION 15. An FJ-QUA (Σ, σ0, f, j) implements the QUA (S, s0) over the same interface if and only if for all revision diagrams and all vertices x, the locally computed state σ(x) (as in Def. 14) matches the path-result (as in Def. 9).

EXAMPLE 3. Consider the QUA representing random access memory as defined in Example 2. We can implement this QUA using an FJ-QUA that maintains a "write-set" as follows:

Σ = S × P(A)
σ0 = (s0, ∅)
load(a)#(s, W) = s(a)
store(a, v)#(s, W) = (s[a ↦ v], W ∪ {a})
f(s, W) = ((s, W), (s, ∅))
j((s1, W1), (s2, W2)) = (s′, W1 ∪ W2)

where s′(a) = s1(a) if a ∉ W2, and s′(a) = s2(a) if a ∈ W2.

The write set (together with the current state) provides sufficient information to conceptually replay all updates during join (since only the last written value matters). Note that the write set gets cleared on forks.
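The write-set FJ-QUA above can be sketched in Python as follows; the dict-based state encoding and the function names are ours, not part of the formal development:

```python
# Write-set FJ-QUA for random access memory (Example 3).
# A Sigma-state is a pair (s, W): the memory state (a dict) and the set
# of addresses written since the last fork.

def fj_store(a, v, sigma):
    s, W = sigma
    return ({**s, a: v}, W | {a})       # record the write in W

def fj_load(a, sigma):
    s, _W = sigma
    return s.get(a, 0)

def fork(sigma):
    s, W = sigma
    return ((s, W), (s, set()))         # the forkee starts with an empty write set

def join(sigma1, sigma2):
    s1, W1 = sigma1
    s2, W2 = sigma2
    # Addresses the joinee wrote win; everything else comes from the joiner.
    merged = {**s1, **{a: s2.get(a, 0) for a in W2}}
    return (merged, W1 | W2)
```

For example, if a forker and forkee both store to address a, the joinee's value wins at the join, matching the definition of s′ above.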

Since we can store a log of updates inside Σ, it is always possible to provide an FJ-QUA for any QUA (we show this construction in detail in Section B.4 in the appendix). However, more space-efficient implementations are often possible for QUAs, since logs are typically compressible. We include several finite-state examples of FJ-QUAs in the appendix (Section C) as well.
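The generic log-based construction mentioned here (detailed in Section B.4) can be sketched as follows; the factory-function packaging is our own:

```python
# Generic log-based FJ-QUA for an arbitrary QUA (sketch of the
# construction in Section B.4). A Sigma-state is (s, log) where log
# records the updates performed since the last fork.

def make_fj_qua(s0, apply_update):
    sigma0 = (s0, [])

    def update(u, sigma):
        s, log = sigma
        return (apply_update(u, s), log + [u])

    def fork(sigma):
        s, log = sigma
        return ((s, log), (s, []))      # forkee starts with an empty log

    def join(sigma1, sigma2):
        s, log1 = sigma1
        _s2, log2 = sigma2
        for u in log2:                  # replay the joinee's updates
            s = apply_update(u, s)
        return (s, log1 + log2)

    return sigma0, update, fork, join
```

Instantiated with an increment-by-n counter, a join replays the joinee's increments on top of the joiner's state.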

4.1 System Models

If we have an FJ-QUA, we can implement eventually consistent systems quite easily. We now present two models that demonstrate this principle.

4.2 Single Synchronous Server Model

We first present a model using a single server. We define the set of devices I = C ∪ {s} where C is the set of clients and s is the single server. We store on each device i a state from the FJ-QUA; that is, we define R : I ⇀ Σ. To keep the transition rules simple, we use the notation R[i ↦ σ] to denote the map R modified by mapping i to σ, and we let R(c ↦ σ) be a pattern that matches R, c, and σ such that R(c) = σ. Each client can perform updates and queries while reading and writing only the local state:

UPDATE(c, u):
    σ′ = u#(σ)
    R(c ↦ σ) → R[c ↦ σ′]

QUERY(c, q, v):
    q#(σ) = v
    R(c ↦ σ) → R

As for synchronization, all we need is two rules: one to create a new client (forking the server state), and one to perform the yield on the client (joining the client state into the server, then forking a fresh client state from the server):

SPAWN(c):
    c ∉ dom R    f(σ) = (σ1, σ2)
    R(s ↦ σ) → R[s ↦ σ1][c ↦ σ2]

YIELD(c):
    j(σ1, σ2) = σ3    f(σ3) = (σ4, σ5)
    R(s ↦ σ1)(c ↦ σ2) → R[s ↦ σ4][c ↦ σ5]
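These four transition rules can be animated by a small interpreter. A hedged Python sketch (the class and helper names are ours), instantiated with a simple counter FJ-QUA whose state (v, d) tracks the counter value and the delta accumulated since the last fork:

```python
# Sketch of the single synchronous server model, parameterized by an
# FJ-QUA given as (sigma0, fork, join).

class SingleServer:
    def __init__(self, sigma0, fork, join):
        self.fork, self.join = fork, join
        self.R = {'s': sigma0}                 # device -> Sigma-state

    def spawn(self, c):
        # SPAWN(c): fork the server state, give one half to the new client.
        assert c not in self.R
        self.R['s'], self.R[c] = self.fork(self.R['s'])

    def update(self, c, u):
        # UPDATE(c, u): apply the update to the local client state only.
        self.R[c] = u(self.R[c])

    def query(self, c, q):
        # QUERY(c, q, v): evaluate the query on the local client state.
        return q(self.R[c])

    def yield_(self, c):
        # YIELD(c): join the client state into the server, then re-fork.
        s3 = self.join(self.R['s'], self.R[c])
        self.R['s'], self.R[c] = self.fork(s3)

# Example FJ-QUA: a counter whose state (v, d) holds the value and the
# delta accumulated since the last fork (our own encoding).
ctr0 = (0, 0)
def ctr_fork(s): return (s, (s[0], 0))              # forkee delta starts at 0
def ctr_join(s1, s2): return (s1[0] + s2[1], s1[1] + s2[1])
def inc(n): return lambda s: (s[0] + n, s[1] + n)
def value(s): return s[0]
```

After a client yields, its increments become visible to any freshly spawned client, which is exactly the delivery path the correctness argument below relies on.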

Thanks to Theorem 1, we can precisely argue why this system is eventually consistent. By induction over the transitions, we can show that each state σ appearing in R corresponds to a terminal in the revision diagram, and that each transition rule manipulates those terminals (applying fork, join, update, or query) in accordance with the revision diagram construction rules. In particular, the join condition is always satisfied since all forks and joins are performed by the same server revision. Transactions are not interrupted by forks or joins, and no vertices are neglected: each yield creates a path from the freshly committed vertices into the server revision, from where they must be visible to any new clients, and to any client that performs an infinite number of yields.

An interesting observation is that, if the fork does not modify the left component (i.e. for all σ ∈ Σ, f(σ) = (σ, σ′) for some σ′), the server is effectively stateless, in the sense that it does not store any information about the client. This is a highly desirable characteristic for scalability, and in our experience it is well worth going to some extra lengths to define FJ-QUAs that have this property.

4.3 Server Pool Model

The single server model still suffers from some drawbacks. For one, clients performing a yield access both server and client state. This means clients block if they have no connection. Also, a single server may not scale to large numbers of clients.

We can fix both of these issues by using a server pool rather than a single server, i.e. we let the set of devices be I = C ∪ S where S is a set of server identifiers. Using multiple servers not only improves scalability, but helps with disconnected operation as well: if we keep one server next to each client (e.g. on the same mobile device), we can guarantee that the client does not block on yield. Servers themselves can perform a sync operation (at any convenient time) to exchange state with other servers.

However, we need to keep additional information on each device to ensure that the join condition is maintained. We do so by (1) storing on each client c a pair (σ, n), where σ is the revision state as before and n is a counter indicating the current transaction, and (2) storing on each server s a triple (σ, J, L), where σ is the revision state as before, J is the set of servers that s may join, and L is a vector clock (a partial function I → N) indicating for each client c the latest transaction of c that s may join.

Again, we can use Theorem 1 to reason that ﬁnite executions

of this system are eventually consistent (for inﬁnite executions we

need additional fairness guarantees as discussed below). Again, all

states σ stored in R correspond to terminals in a revision diagram

and are manipulated according to the rules. This time, the join

condition is satisﬁed because of the following invariants: (1) if the

set J of server s

1

contains s

2

, then s

1

’s terminal is reachable from

the fork vertex that forked s

2

’s revision, and (2) if L(c) = n for

server s, and client c’s transaction counter is n, then s’ terminal is

reachable from the fork vertex that forked c’s revision.

Since the transition rules do not contain any guarantees that

force servers to synchronize with each other, it is possible to con-

struct inﬁnite executions that violate eventual consistency. Actual

implementations would thus likely add a mechanism to guarantee

that updates eventually reach the main revision, and that clients that

perform an inﬁnite sequence of transactions receive versions from

the main revision inﬁnitely often.
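The server-side bookkeeping of the pool model, in particular the vector-clock merge used by the JOIN rule, can be sketched as follows (the pool representation and function names are ours):

```python
# Sketch of server state handling in the server pool model.
# A server entry is a triple (sigma, J, L): revision state, joinable
# server ids, and a vector clock mapping clients to transaction counters.

def merge(L1, L2):
    """Pointwise maximum of two vector clocks (partial maps client -> int)."""
    return {c: max(L1.get(c, 0), L2.get(c, 0)) for c in set(L1) | set(L2)}

def server_join(pool, s1, s2, join_sigma):
    """The JOIN(s1, s2) rule: s1 absorbs s2's state and bookkeeping,
    provided s2 is in s1's joinable set J1; s2 is then removed."""
    sigma1, J1, L1 = pool[s1]
    sigma2, J2, L2 = pool[s2]
    assert s2 in J1, "join condition: s2 must be joinable by s1"
    pool[s1] = (join_sigma(sigma1, sigma2), J1 | J2, merge(L1, L2))
    del pool[s2]
```

The merged vector clock records, for every client, the latest transaction either server was allowed to join, preserving invariant (2) above.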

5. Related Work

For a high-level comparison of our work with various notions of eventual consistency appearing in the literature, see Section 2.4. Briefly stated, our work is set apart by its unique use of revision diagrams to determine both arbitration and visibility, rather than separately using a causally consistent partial order for visibility and timestamps for arbitration.

There is of course a large body of work on transactions. Most academic work considers strong consistency (serializable transactions) only, and is thus not directly applicable to eventual consistency. Nevertheless there are some similarities; to pick a few:

• [9] provides insight on the limitations of serializable transactions, and proposes similar workarounds as used by eventual consistency (timestamps and commutative updates). However, transactions remain tentative during disconnection.

• Snapshot isolation [7] relaxes the consistency model, but transactions can still fail, and cannot commit in the presence of network partitions.

• Coarse-grained transactions [10, 13] share with our work the use of abstract data types to facilitate concurrent transactions.

• Automatic Mutual Exclusion [1], like our work, uses yield statements to separate transactions.

Previous work on revisions [2, 3, 4, 5] introduces revision diagrams and conflict resolution. In this paper we feature a simpler, more direct definition using graph construction rules. Also, we pursue a different goal (eventually consistent transactions in a distributed system, rather than deterministic parallel programming). In particular, eventually consistent transactions exhibit pervasive nondeterminism caused by factors that are by definition outside the control of the system, such as network partitions. Also, this paper is the first to give a single, simple formalization of merge functions (FJ-QUAs are optimized implementations of QUAs).

Research on persistent data types [12] is related to our definition of FJ-QUAs insofar as it concerns itself with efficient implementations of data types that permit retrieval and mutation of past versions. However, it does not concern itself with aspects related to transactions or distribution.

Prior work on operational transformations [18] can be understood as a specialized form of eventual consistency where updates are applied to different replicas in different orders, but are themselves modified in such a way as to guarantee convergence. This specialized formulation can provide highly efficient broadcast-based real-time collaboration, but poses significant implementation challenges [11].

If we consider transactions with single elements only, it is sensible to compare our work with related work on conflict-free replicated data types (CRDTs) [17] and Bayou's weakly consistent replication [19].

UPDATE(c, u):
    σ′ = u#(σ)
    R(c ↦ (σ, n)) → R[c ↦ (σ′, n)]

QUERY(c, q, v):
    q#(σ) = v
    R(c ↦ (σ, n)) → R

SPAWN(c):
    c ∉ dom R    f(σ) = (σ1, σ2)    L′ = L[c ↦ 0]
    R(s ↦ (σ, J, L)) → R[s ↦ (σ1, J, L′)][c ↦ (σ2, 0)]

YIELD(s, c):
    L(c) = n    L′ = L[c ↦ n + 1]    j(σ1, σ2) = σ3    f(σ3) = (σ4, σ5)
    R(s ↦ (σ1, J, L))(c ↦ (σ2, n)) → R[s ↦ (σ4, J, L′)][c ↦ (σ5, n + 1)]

FORK(s1, s2):
    s2 ∉ dom R    f(σ) = (σ1, σ2)    J′ = J ∪ {s2}
    R(s1 ↦ (σ, J, L)) → R[s1 ↦ (σ1, J′, L)][s2 ↦ (σ2, J, L)]

JOIN(s1, s2):
    s2 ∈ J1    σ′ = j(σ1, σ2)    J′ = J1 ∪ J2    L′ = merge(L1, L2)
    R(s1 ↦ (σ1, J1, L1))(s2 ↦ (σ2, J2, L2)) → R[s1 ↦ (σ′, J′, L′)][s2 ↦ ⊥]

Figure 4. The server pool model.

• Our definition is strictly more general than CRDTs [17] in the following sense: from any state-based CRDT we can obtain an FJ-QUA by using the same state and initial state, the same query and update functions, a fork function that creates a new replica and then merges the forker state, and a join function that uses the merge. Note that the definition of strong eventual consistency in [17], just like ours, requires that updates can be applied to any state.

• In Bayou [19], and in the Concurrent Revisions work [5], users can specify how to resolve conflicting updates by writing custom merge functions. At first sight, this may appear more general than QUAs. However, by performing a simple automatic transformation of the QUA and the client program, we can support merge functions for conflict resolution purposes. The reason is that QUAs already allow updates to perform any desired total function. We describe this transformation in Section B.2 in the appendix.

6. Conclusion and Future Work

We have proposed eventually consistent transactions as a consistency model that (1) generalizes earlier definitions of eventual consistency and (2) shows how to make some strong guarantees (transactions never fail, all code runs in transactions) to compensate for weak consistency. We have shown that revision diagrams provide a convenient way to build correct implementations of eventual consistency, by relying on just a handful of simple rules that are easily visualized using diagrams.

In future work, we would like to extend the study of the programming model, investigate a selection of basic FJ-QUAs, and explore ways to combine them. Furthermore, we would like to understand whether stronger consistency guarantees are possible for subclasses of eventually consistent transactions, and whether such classes can be automatically recognized or synthesized.

Acknowledgments. We thank Marc Shapiro for introducing us to a principled world of eventual consistency, and for general guidance. We also thank Tom Ball, Sean McDirmid, and Benjamin Wood for inspired discussions, helpful examples, and constructive comments.

References

[1] M. Abadi, A. Birrell, T. Harris, and M. Isard. Semantics of transactional memory and automatic mutual exclusion. In Principles of Programming Languages (POPL), 2008.

[2] S. Burckhardt, A. Baldassin, and D. Leijen. Concurrent programming with revisions and isolation types. In Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2010.

[3] S. Burckhardt, D. Leijen, and M. Fähndrich. Roll forward, not back: A case for deterministic conflict resolution. In Workshop on Determinism and Correctness in Parallel Progr., 2011.

[4] S. Burckhardt, D. Leijen, J. Yi, C. Sadowski, and T. Ball. Two for the price of one: A model for parallel and incremental computation (distinguished paper award). In Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2011.

[5] S. Burckhardt and D. Leijen. Semantics of concurrent revisions. In European Symposium on Programming (ESOP), LNCS, volume 6602, pages 116–135, 2011. Full version as Microsoft Technical Report MSR-TR-2010-94.

[6] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In Symposium on Operating Systems Principles, pages 205–220, 2007.

[7] A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, and D. Shasha. Making snapshot isolation serializable. ACM Trans. Database Syst., 30(2):492–528, 2005.

[8] S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33:51–59, June 2002.

[9] J. Gray, P. Helland, P. O'Neil, and D. Shasha. The dangers of replication and a solution. SIGMOD Record, 25:173–182, 1996.

[10] M. Herlihy and E. Koskinen. Transactional boosting: a methodology for highly-concurrent transactional objects. In Principles and Practice of Parallel Programming (PPoPP), 2008.

[11] A. Imine, M. Rusinowitch, G. Oster, and P. Molli. Formal design and verification of operational transformation algorithms for copies convergence. Theoretical Computer Science, 351:167–183, 2006.

[12] H. Kaplan. Persistent data structures. In Handbook on Data Structures and Applications, pages 241–246. CRC Press, 1995.

[13] E. Koskinen, M. Parkinson, and M. Herlihy. Coarse-grained transactions. In Principles of Programming Languages (POPL), 2010.

[14] K. Petersen, M. Spreitzer, D. Terry, M. Theimer, and A. Demers. Flexible update propagation for weakly consistent replication. Operating Systems Review, 31:288–301, 1997.

[15] Y. Saito and M. Shapiro. Optimistic replication. ACM Computing Surveys, 37:42–81, 2005.

[16] M. Shapiro and B. Kemme. Eventual consistency. In Encyclopedia of Database Systems, pages 1071–1072. 2009.

[17] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. Conflict-free replicated data types. In Stabilization, Safety, and Security of Distributed Systems (SSS), 2011.

[18] C. Sun and C. Ellis. Operational transformation in real-time group editors: issues, algorithms, and achievements. In Conference on Computer Supported Cooperative Work, pages 59–68, 1998.

[19] D. Terry, M. Theimer, K. Petersen, A. Demers, M. Spreitzer, and C. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. SIGOPS Oper. Syst. Rev., 29:172–182, December 1995.

[20] J. Valdes, R. Tarjan, and E. Lawler. The recognition of series parallel digraphs. In ACM Symposium on Theory of Computing, pages 1–12, 1979.

A. Proofs

A.1 Proof of Lemma 2

PROOF. For a vertex x, we define the pedigree ped(x) to be the word in {s, f}* obtained by (1) taking the root path of x and concatenating the edge labels along the path into a word, and (2) removing any trailing s letters from the word. Clearly, two vertices are in the same revision iff they have the same pedigree.

Now we define x ≤l y iff ped(x) < ped(y) in the lexicographic order on the words {s, f}*, where we order the letters as s < f.⁴

Then it is straightforward to show that ≤l is transitive. Claim (1) follows from the observation above (revision determined by pedigree). Claim (2) follows since adding an f at the end of a word increases it in the lexicographic order. For claim (3), consider the construction rule for the join: the join condition tells us that the root path of the joiner terminal and the root path of the joinee terminal must have a common prefix from which they diverge in such a way that the joiner terminal's path continues with s and the joinee terminal's path continues with f, which implies the desired order after we add the join edge.

A.2 Proof of Lemma 3

PROOF. For a path p, let w(p) ∈ {f, s, j}* be the word representing the sequence of edge labels, and let < be the lexicographic order on words induced by the total order s < j < f on labels. Now assume the claim is false and choose a path p = (v0, v1, . . . , vn) from x = v0 to y = vn such that w(p) is minimal with respect to <. Since the claim is false, there must exist i < j such that

x = v0 →* vi −f→ vi+1 −s→* vj −j→ vj+1 →* vn = y.

Now, consider the construction step that added vj+1 to the graph (which is a join vertex). The join condition implies that there exists an alternate path from vi to vj+1 that starts with an s-edge rather than an f-edge, which contradicts the assumption that p is minimal with respect to <.
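The lexicographic order s < j < f used in this proof (with a prefix preceding its extensions, as spelled out in footnote 4) can be written out as a small comparator; a sketch, with names of our own choosing:

```python
# Lexicographic comparison of edge-label words over {s, j, f} with
# s < j < f, where a proper prefix precedes its extensions.

RANK = {'s': 0, 'j': 1, 'f': 2}

def word_lt(w1, w2):
    """Return True iff w1 < w2 in the lexicographic order on {s, j, f}*."""
    for a, b in zip(w1, w2):
        if RANK[a] != RANK[b]:
            return RANK[a] < RANK[b]
    return len(w1) < len(w2)        # proper prefix precedes its extensions
```

For instance, replacing a leading f-edge by an s-edge (as the join condition allows) always yields a strictly smaller word, which is the source of the contradiction above.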

A.3 Proof of Lemma 5

PROOF. Since y has a path to T(y), and x has a path to y, x has a path to T(y); thus by Lemma 3 there must exist a direct path (i.e. all j-edges appear before all f-edges) from x to T(y). If this path does not contain fork edges, then x ∼j y and the claim follows because T(x) = T(y). Otherwise, let z be the vertex originating the first f-edge on the path. Then (a) x ∼j z, and (b) z has a path to T(y) containing only s- and f-edges. From (a) we know z has a path to T(x) containing only s- and j-edges, which implies T(x) ≤l z (using conditions (1) and (3) of Lemma 2). From (b) we know z ≤l T(y) (using conditions (1) and (2) of Lemma 2). Thus T(x) ≤l T(y). q.e.d.

B. Additional Material

B.1 Notions of Equivalence

Naturally, we may ask questions about how different QUAs with the same interface can be compared, and whether two implementations are equivalent.

DEFINITION 16. Two QUAs (Sa, sa) and (Sb, sb) for the same interface (Q, V, U) are isomorphic if there exists a bijection ρ : Sa → Sb such that ρ(sa) = sb, and such that for all u ∈ U, q ∈ Q, and s ∈ Sa we have q#a(s) = q#b(ρ(s)) and ρ(u#a(s)) = u#b(ρ(s)).

⁴ To be precise: we define the lexicographic order over words by stating w1 < w2 if either w1 is a prefix of w2, or there exists an index k such that w1[i] = w2[i] for i < k and w1[k] < w2[k].

Isomorphism is a very strong notion of equivalence. In fact, it is often not necessary to require the existence of a bijection between Sa and Sb. The reason is that we consider the state of a database (its "internal organization") to be only indirectly observable, by queries. Thus, some aspects of the state may be irrelevant and can be abstracted away. The following definition clarifies this intuition.

DEFINITION 17. Two QUAs (Sa, sa) and (Sb, sb) for the same interface (Q, V, U) are observationally equivalent if for all n ≥ 0, for all u1, . . . , un ∈ U, and for all q ∈ Q it is always the case that q#a(un#a(. . . (u1#a(sa)))) = q#b(un#b(. . . (u1#b(sb)))).

This definition has a very practical implication: it means that many different implementations of a QUA can satisfy the same requirements. Even more interestingly, if the set of queries is limited, it may be possible to dramatically reduce the space requirements for storing the state.
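As an illustration of Definition 17, the following two counter implementations answer every update/query sequence identically even though their state representations differ; the encoding is our own:

```python
# Two QUAs for a counter interface (Q = {get}, U = {inc}).
# They are observationally equivalent (Def. 17): every sequence of
# updates followed by a query yields the same value.

# QUA a: the state is the counter value itself
sa0 = 0
def inc_a(s): return s + 1
def get_a(s): return s

# QUA b: the state is the full log of updates performed so far
sb0 = ()
def inc_b(s): return s + ('inc',)
def get_b(s): return len(s)

def observe(q, step, s0, n):
    """Apply the update n times, then query (the observations of Def. 17)."""
    s = s0
    for _ in range(n):
        s = step(s)
    return q(s)
```

With the single query `get`, the log stored by QUA b carries no extra observable information, which is exactly the space-saving opportunity noted above.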

B.2 Supporting custom conflict resolution in QUAs

Suppose we start with some QUA and would like to add a customized conflict resolution. Given some user-defined conflict resolution function f : U × S × S → S that computes the state f(u, s, s′) that should result from update u being issued in state s but applied in a possibly different state s′, we can construct a QUA that uses f by (1) adding a query q_snapshot to the set Q of queries, which returns the current state, q#_snapshot(s) = s, and (2) adding for each state s and update u an update u_s to the set U of updates, defined by u#_s(s′) = f(u, s, s′). Now, if the user issues updates of the form u_s that include a snapshot s of the current state (obtained by calling q_snapshot), the conflict resolution function is used as desired.
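The transformation can be sketched as follows; the resolution policy f_max and the register QUA are hypothetical examples of our own, used only to exercise the construction:

```python
# Sketch of the B.2 transformation (function names are ours). Given a
# user-defined resolution function f(u, s, s') -- update u issued in
# snapshot state s, applied in a possibly different state s' -- we wrap
# each update together with a snapshot of the state it was issued in.

def snapshot_query(s):
    return s                         # q_snapshot returns the current state

def resolved_update(u, snapshot, f):
    """Build the update u_s of Section B.2, with u_s#(s') = f(u, s, s')."""
    def u_s(s_current):
        return f(u, snapshot, s_current)
    return u_s

# A hypothetical policy for an integer register whose updates overwrite
# the value: if the state changed since the snapshot, keep the larger
# of the intended value and the current state.
def f_max(u, snapshot, s_current):
    return u if snapshot == s_current else max(u, s_current)
```

When the state is unchanged since the snapshot, the update applies as issued; otherwise the policy arbitrates.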

B.3 Counterexample to the converse of Thm 1

Consider a history containing 7 operations x, y, z, u, v, w, s (all by different clients, and all succeeded by a yield), ordered by the visibility order <v generated by the following edges (drawn as a diagram in which x, y, z sit on top, u, v, w in the middle, and s at the bottom, with arrows representing <v):

x <v u, y <v u;   x <v v, z <v v;   y <v w, z <v w;   u <v s, v <v s, w <v s.

Define now the arbitration order as x <a y <a z <a u <a v <a w <a s. Then all the conditions for eventual consistency are satisfied (we need not bother determining exactly what updates and queries to use). However, these partial orders cannot result from any revision diagram, since we cannot place the events on a revision diagram without creating additional <v edges.

To prove this, we need a few more structural graph properties. First, for any set of nodes x1, . . . , xn in the same join-component (i.e. nodes for which T(x1) = · · · = T(xn)) we define the "join point" J(x1, . . . , xn) to be the vertex that was added during the construction step which first caused the equality T(x1) = T(x2) = · · · = T(xn) to hold (this will be either an update, query, or join step).

LEMMA 7. If x_1, . . . , x_n are nodes in a revision diagram, and x_i <_v z, then J(x_1, . . . , x_n) <_v z.

PROOF. Suppose the lemma is not true, i.e. we have x_i with x_i <_v z but not J(x_1, . . . , x_n) <_v z (i.e. either J(x_1, . . . , x_n) does not exist, or does not have a path to z). There must exist direct paths p_i from x_i to z by Lemma 3. Without loss of generality, we can assume we picked the x_i and p_i in such a way as to minimize the sum of the lengths of the p_i. Consider the vertex on which all of these paths merge; let us call it a. One of the incoming edges of a must be a j-edge. Say this edge is on the path p_t coming from x_t. Since the segment of p_t from x_t to a ends with a j-edge, it can contain only s- and j-edges (because it is direct), thus T(x_t) = T(a). But this means we could have picked a instead of x_t (clearly a <_v z, but not J(x_1, . . . , x_{t-1}, a, x_{t+1}, . . . , x_n) <_v z: if J(x_1, . . . , x_{t-1}, a, x_{t+1}, . . . , x_n) exists, so does J(x_1, . . . , x_n), and J(x_1, . . . , x_n) <_v J(x_1, . . . , x_{t-1}, a, x_{t+1}, . . . , x_n), contradicting the assumption), contradicting minimality.

This lemma implies that J(u, v, w) <_v s; in particular, it must exist. Since join components are (upside-down) trees whose nodes have arity one or two, the following must hold without loss of generality (for some permutation of x, y, z):

  J(x, y) <_v J(x, y, z)  and  J(y, z) = J(x, z) = J(x, y, z)

But by the above lemma, J(y, z) <_v w, thus J(x, y, z) <_v w, thus x <_v w, which contradicts the definition of <_v for this example.

B.4 The standard FJ-QUA construction

THEOREM 2. For any QUA (S, s_0) over some interface (Q, V, U), there exists a FJ-QUA (Σ, σ_0, f, j) that implements it.

PROOF. By construction. Let the state Σ = S × U* be a pair of the QUA state and a log of operations. Define

  σ_0 = (s_0, ε)                                  (4)
  u#r (s, l) = (s, l.u)                           (5)
  q#r (s, l) = q# (apply l s)                     (6)
  f_r (s, l) = ((s, l), (apply l s, ε))           (7)
  j_r (s_1, l_1) (s_2, l_2) = (s_1, l_1.l_2)      (8)

where

  apply ε s = s                                   (10)
  apply (l.u) s = u# (apply l s)                  (11)

Let ρ : Σ → S be given by ρ(s, l) = apply l s. The labeling of the original QUA corresponds exactly to the labeling of the FJ-QUA mapped by ρ on the same revision diagram. Furthermore, q# (ρ(σ)) = q#r (σ) by definition.
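The log-based construction in the proof can be sketched directly in Python. This is our own illustrative sketch (the function names are ours, not the paper's), instantiated for a toy counter with a single inc update.

```python
# Sketch of the log-based reference implementation from the proof of Theorem 2:
# an FJ-QUA state is (underlying QUA state, log of pending updates).

def apply_log(log, s):
    """apply ε s = s;  apply (l.u) s = u#(apply l s)"""
    for u in log:
        s = u(s)
    return s

def update(sigma, u):                 # u#r (s, l) = (s, l.u)
    s, log = sigma
    return (s, log + [u])

def query(sigma, q):                  # q#r (s, l) = q#(apply l s)
    s, log = sigma
    return q(apply_log(log, s))

def fork(sigma):                      # f_r (s, l) = ((s, l), (apply l s, ε))
    s, log = sigma
    return (s, log), (apply_log(log, s), [])

def join(sigma1, sigma2):             # j_r (s1, l1)(s2, l2) = (s1, l1.l2)
    (s1, l1), (s2, l2) = sigma1, sigma2
    return (s1, l1 + l2)

inc = lambda n: n + 1
get = lambda n: n

sigma0 = (0, [])                      # σ_0 = (s_0, ε)
main, branch = fork(update(sigma0, inc))
branch = update(branch, inc)
merged = join(update(main, inc), branch)
print(query(merged, get))             # all three increments survive the join: 3
```

Note how join simply appends the branch's log; the optimized FJ-QUAs in Appendix C exist precisely to avoid replaying such logs update by update.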

We call the FJ-QUA constructed in the above proof the reference implementation of the given QUA.

DEFINITION 18. A FJ-QUA B = (Σ_b, σ_b, f_b, j_b) implements another FJ-QUA A = (Σ_a, σ_a, f_a, j_a) over the same interface (Q, V, U) if and only if 1) there exists a function comp : Σ_a → Σ_b such that the labeling of a revision diagram computed by A can be transformed into the labeling computed by B on the same diagram by applying comp to each vertex, and 2) for each query q and each state σ ∈ Σ_a, the query is answered the same by both FJ-QUAs: q#a (σ) = q#b (comp σ).

Definition 18 together with Theorem 2 provides us with an alternative formulation of Definition 15: a revision algebra B implements a QUA if B implements the reference implementation of the QUA.

C. Bounded FJ-QUA Implementations

This section contains a few examples of QUAs along with optimized FJ-QUAs. For each case, we show that the revision algebra implements the QUA using bounded space.

C.1 Counter with Reset

A counter with reset consists of a single integer along with two update operations: inc, incrementing the counter, and reset, setting the counter to 0. The query is get, which simply retrieves the counter value. The QUA implementation is given by

  Σ = int
  σ_0 = 0
  inc# n = n + 1
  reset# n = 0
  get# n = n

C.1.1 Counter FJ-QUA

An effective implementation of a revision-consistent system requires finding a more optimized representation of the state and corresponding updates/queries, in order to avoid having to compute the sequence of operations of a forked branch at a join point and apply them one by one to the target revision. Instead, the goal of the optimized representation is to summarize the effect of a branch compactly, so that it can be applied at a join point in time bounded by the amount of observable information, rather than the number of updates on the branch.

For the counter example, our FJ-QUA (Σ_b, σ_b, f_b, j_b) consists of

  Σ_b = bool × int × int
  σ_b = (false, 0, 0)
  inc#b (r, b, d) = (r, b, d + 1)
  reset#b (r, b, d) = (true, 0, 0)
  get#b (r, b, d) = b + d
  f_b (r, b, d) = ((r, b, d), (false, b + d, 0))
  j_b (r_1, b_1, d_1) (r_2, b_2, d_2) =
      (true, b_2, d_2)        if r_2 = true
      (r_1, b_1, d_1 + d_2)   otherwise
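The definitions above transcribe directly into executable form. The following is a minimal sketch in Python (our own transcription, not code from the paper), exercising a fork, concurrent increments, and a join:

```python
# Counter FJ-QUA: state = (reset_flag r, base b, delta d); get = b + d.

def inc(state):
    r, b, d = state
    return (r, b, d + 1)            # inc#b (r, b, d) = (r, b, d + 1)

def reset(state):
    return (True, 0, 0)             # reset#b (r, b, d) = (true, 0, 0)

def get(state):
    r, b, d = state
    return b + d                    # get#b (r, b, d) = b + d

def fork(state):                    # forkee restarts with base b + d, delta 0
    r, b, d = state
    return (r, b, d), (False, b + d, 0)

def join(s1, s2):                   # a reset on the branch wins outright
    r1, b1, d1 = s1
    r2, b2, d2 = s2
    if r2:
        return (True, b2, d2)
    return (r1, b1, d1 + d2)

sigma_b = (False, 0, 0)
main, branch = fork(inc(sigma_b))   # main holds 1; branch starts at base 1
branch = inc(inc(branch))           # two increments on the branch
print(get(join(inc(main), branch))) # 1 + 1 (main) + 2 (branch delta) = 4
```

The join applies the branch's whole effect in constant time by adding its delta, rather than replaying its log of increments.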

THEOREM 3. The counter FJ-QUA implements the counter QUA in constant space.

PROOF. The space bound is clear given Σ_b. We show that the FJ-QUA implements the reference implementation (Σ_r, σ_r, f_r, j_r) of the counter QUA using the function comp : Σ_r → Σ_b given by:

  comp(n, ε) = (false, n, 0)                      (12)
  comp(n, l.u) = u#b (comp(n, l))                 (13)

We now need to show 1) the consistent labeling of any revision diagram and 2) that queries are answered the same by both implementations, as summarized by the following proof obligations:

  comp(σ_0, ε) = σ_b                                               (14)
  ∀u. comp(u#r (n, l)) = u#b (comp(n, l))                          (15)
  (σ_1, σ_2) = f_r σ ⇒ (comp σ_1, comp σ_2) = f_b (comp σ)         (16)
  comp(j_r (σ_1, σ_2)) = j_b (comp σ_1, comp σ_2)                  (17)
  ∀q. q#r (n, l) = q#b (comp(n, l))                                (18)

To show (14), we apply (12) to σ_0 = 0, which gives us (false, 0, 0) = σ_b. To show (15), observe that comp(u#r (n, l)) = comp(n, l.u) by (5); applying (13), this is equal to u#b (comp(n, l)). To show (16), first note that by (7), σ_1 = σ, and thus comp(σ_1) = comp(σ), which matches the definition of f_b. Assuming σ = (n, l), we have σ_2 = (apply l n, ε), and we need to show comp σ_2 = (false, b + d, 0), assuming comp σ = (x, b, d). Now comp(apply l n, ε) = (false, apply l n, 0) by (12). Thus it remains to show that apply l n = b + d, i.e., the sum of the second and third components of comp(n, l). We proceed by induction over the length of the log l. If l = ε, then apply l n = n and comp σ = (false, n, 0) (where b = n and d = 0). If l = l_1.u, then we can assume that comp(n, l_1) = (.., b_0, d_0) and that b_0 + d_0 = apply l_1 n. We split on the form of u. If u = reset, then apply l n = 0 and comp(n, l) = reset#b (comp(n, l_1)) = (true, 0, 0). If u = inc, then apply l n = 1 + apply l_1 n, and comp(n, l) = inc#b (comp(n, l_1)) = (.., b_0, d_0 + 1), thus establishing that b_0 + d_0 + 1 = apply l n.

To show (17), note that comp(j_r (σ_1, σ_2)) = comp(j_r (n_1, l_1)(n_2, l_2)) = comp(n_1, l_1.l_2) = comp(comp(n_1, l_1), l_2) = comp(comp σ_1, l_2). We consider two cases based on the implementation of j_b. If comp(n_2, l_2) = (false, b_2, d_2), then by f_b, inc#b, and reset#b, we know that l_2 contains no reset operation, and that the number of inc operations is d_2. Let comp σ_1 = (s_1, b_1, d_1); then j_b (comp σ_1, comp σ_2) = (s_1, b_1, d_1 + d_2). Note that comp(comp σ_1, l_2) = inc#b (. . . inc#b (s_1, b_1, d_1)), where the number of inc#b applications is d_2. By inc#b, the result is (s_1, b_1, d_1 + d_2). For the other case, we have comp(n_2, l_2) = (true, b_2, d_2) and j_b (comp σ_1, comp σ_2) = comp σ_2. By f_b, inc#b, and reset#b, we know there exists a reset operation in l_2 followed by d_2 inc operations. Thus comp(comp(n_1, l_1), l_2) = comp(x, l_2) for any x, in particular for n_2; thus comp(j_r (σ_1, σ_2)) = comp(σ_2).

Finally, we show that all queries q return the same value in both implementations (18), by induction over l. If l = ε, then get#r (n, l) = get# (apply ε n) = n and get#b (comp(n, l)) = get#b (false, n, 0) = n + 0 = n. For l = l_1.u, we perform a case analysis on u. By the induction hypothesis, get#r (n, l_1) = get#b (comp(n, l_1)). If u = inc, then get#r (n, l) = get# (apply l n) = get# (inc# (apply l_1 n)) = 1 + get# (apply l_1 n) = 1 + get#r (n, l_1) = 1 + get#b (comp(n, l_1)) = get#b (inc#b (comp(n, l_1))) = get#b (comp(n, l)). If u = reset, then get#r (n, l) = get# (apply l n) = get# (reset# (apply l_1 n)) = get# 0 = 0 = get#b (reset#b (comp(n, l_1))) = get#b (comp(n, l)).

C.2 Integer Register

An integer register is a generalization of a counter with reset. It also consists of a single integer value, along with two update operations: add d, adding a delta d to the register, and set n, setting the register to n. The query is get, which simply retrieves the register value. The initial state is 0. The corresponding QUA is

  Σ = int
  σ_0 = 0
  add# d n_0 = n_0 + d
  set# n n_0 = n
  get# n_0 = n_0

A FJ-QUA for the integer register with constant space looks similar to the optimized counter:

  Σ_b = bool × int × int
  σ_b = (false, 0, 0)
  add#b n (r, b, d) = (r, b, d + n)
  set#b n (r, b, d) = (true, n, 0)
  get#b (r, b, d) = b + d
  f_b (r, b, d) = ((r, b, d), (false, b + d, 0))
  j_b (r_1, b_1, d_1) (r_2, b_2, d_2) =
      (true, b_2, d_2)        if r_2 = true
      (r_1, b_1, d_1 + d_2)   otherwise

THEOREM 4. The integer register FJ-QUA implements the integer register QUA in constant space.

PROOF. We use the identical comp function as for the counter, and the proof is analogous.

C.3 High Score

The high-score problem is to maintain the top k score-name pairs for some game. The state consists of a list of at most k such pairs, ordered by decreasing score (and increasing arbitration order in case of a tie). The single update operation is post s p, posting a new score of s from player p. The query operation is get i, retrieving the i-th score (where i is less than k). The initial value of the high-score table is all zeros and empty names. Written as a QUA we have

  Σ = list⟨int, string⟩
  σ_0 = [(0, ""), (0, ""), . . . , (0, "")]
  post# s n l = take k (insertion-sort (s, n) l)
  get# i l = l[i]

An optimized high-score implementation maintains an additional high-score list containing only the newly posted scores on a branch, so that at a merge, only the new scores are merged into the main revision. If we were to merge the whole high-score table from the branch, we might end up with duplicates of scores that were posted prior to the fork.

  Σ_b = list⟨int, string⟩ × list⟨int, string⟩
  σ_b = ([(0, ""), . . . , (0, "")], [(0, ""), . . . , (0, "")])
  post#b s n (l, r) = (take k (insertion-sort (s, n) l),
                       take k (insertion-sort (s, n) r))
  get#b i (l, r) = l[i]
  f_b (l, r) = ((l, r), (l, [(0, ""), . . . , (0, "")]))
  j_b (l_1, r_1) (l_2, r_2) = (merge-sort l_1 r_2, merge-sort r_1 r_2)
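A small sketch makes the duplicate-avoidance point concrete. This is our own transcription (with k = 3 and heapq.nlargest standing in for take-k of insertion-sort), not code from the paper:

```python
# High-score FJ-QUA for k = 3: state = (table, new), where `new` records only
# the scores posted since the last fork; a join merges only the branch's news.
import heapq

K = 3
EMPTY = [(0, "")] * K

def top_k(entries):
    # highest scores first; nlargest is stable, so earlier entries win ties
    return heapq.nlargest(K, entries, key=lambda e: e[0])

def post(score, name, state):            # post#b updates both lists
    table, new = state
    return (top_k(table + [(score, name)]), top_k(new + [(score, name)]))

def get(i, state):                       # get#b reads the full table
    return state[0][i]

def fork(state):                         # branch starts with no new posts
    table, new = state
    return (table, new), (table, list(EMPTY))

def join(s1, s2):                        # merge only the branch's new posts
    (t1, n1), (t2, n2) = s1, s2
    return (top_k(t1 + n2), top_k(n1 + n2))

state = (list(EMPTY), list(EMPTY))
main, branch = fork(post(10, "ann", state))
branch = post(7, "bob", branch)
merged = join(main, branch)
print(get(0, merged), get(1, merged))    # (10, 'ann') (7, 'bob')
```

Merging the branch's full table instead of its `new` list would re-insert ann's pre-fork score a second time; merging only the new posts avoids exactly that duplication.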