Article

Efficient State-based CRDTs by Delta-Mutation

Authors:

Abstract

CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs achieve this by sharing local state changes through shipping the entire state, which is then merged into other replicas with an idempotent, associative, and commutative join operation, ensuring convergence. This imposes a large communication overhead as the state size grows. We introduce Delta State Conflict-Free Replicated Datatypes (δ-CRDT), which make use of δ-mutators, defined so as to return a delta-state that is typically much smaller than the full state. Delta-states are joined to the local state as well as to the remote states (after being shipped). This can achieve the best of both worlds: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. We introduce the δ-CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present two anti-entropy algorithms: a basic one that provides eventual convergence, and another one that ensures both convergence and causal consistency. We also introduce two δ-CRDT specifications of well-known replicated datatypes.
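The idea of a δ-mutator can be illustrated with a minimal sketch. The grow-only counter below is an assumed example chosen for brevity (the abstract does not say which datatypes the paper specifies), and the names `join`, `inc_delta`, and `value` are hypothetical. The mutator returns only the entry that changed, and the same join is used both locally and at the remote replica.

```python
from typing import Dict

# Assumed representation: replica id -> number of increments seen from it.
GCounter = Dict[str, int]

def join(a: GCounter, b: GCounter) -> GCounter:
    """Pointwise maximum: idempotent, associative, and commutative."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def inc_delta(state: GCounter, replica: str) -> GCounter:
    """Delta-mutator: returns only the changed entry, not the full state."""
    return {replica: state.get(replica, 0) + 1}

def value(state: GCounter) -> int:
    return sum(state.values())

a: GCounter = {}
b: GCounter = {}

# Replica "a" increments twice and replica "b" once; each delta is first
# joined into the local state.
d1 = inc_delta(a, "a"); a = join(a, d1)
d2 = inc_delta(a, "a"); a = join(a, d2)
d3 = inc_delta(b, "b"); b = join(b, d3)

# Deltas are shipped; here they arrive reordered and duplicated.
b = join(join(join(b, d2), d1), d2)
a = join(a, d3)
```

Because the join is a pointwise maximum, the duplicated and reordered deltas leave both replicas in the same state, which is the property that lets deltas travel over unreliable channels.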


... On the other hand, UN is based on non-conflicting operations. So, no concurrency control mechanism is needed and no operation abortion or rollback may arise [59,60,2]. This deserves a good (++) mark. ...
... Some criteria examples have been proposed elsewhere. For instance, delta-state CRDTs [2] reduce the amount of elements to be modified by each pending operation, improving in this way the behaviour of UN. Also, they transfer only the latest value of the updated elements in case of long lasting partitions. ...
... No single combination is the best for all possible goals. In some proposals [2], this has led to the implementation of several mechanisms in a particular axis, using the best of them in each particular scenario. ...
Article
Eventual consistency is demanded nowadays in geo-replicated services that need to be highly scalable and available. According to the CAP constraints, when network partitions may arise, a distributed service should choose between being strongly consistent or being highly available. Since scalable services should be available, a relaxed consistency (while the network is partitioned) is the preferred choice. Eventual consistency is not a common data-centric consistency model, but only a state convergence condition to be added to a relaxed consistency model. There are still several aspects of eventual consistency that have not been analysed in depth in previous works: 1. which are the oldest replication proposals providing eventual consistency, 2. which replica consistency models provide the best basis for building eventually consistent services, 3. which mechanisms should be considered for implementing an eventually consistent service, and 4. which are the best combinations of those mechanisms for achieving different concrete goals. This paper provides some notes on these important topics. This paper is available at: http://dx.doi.org/10.4149/cai_2018_5_1037
... Delta-groups can always be re-transmitted and re-joined, possibly out of order, or can simply be subsumed by a less frequent sending of the full state, e.g. for performance reasons or when doing state transfers to new members. Due to space limits, we only address causal consistency in this paper, while information about state convergence can be found in the associated technical report [13]. ...
... Proof. Please see the associated technical report [13]. Corollary 1. (δ-CRDT causal consistency) Any δ-CRDT in which states are propagated and joined using a delta-interval-based anti-entropy algorithm satisfying the causal delta-merging condition ensures causal consistency. ...
... Proof. Please see the associated technical report [13]. ...
Article
Eventual consistency is a relaxed consistency model used in large-scale distributed systems that seek better availability when consistency can be delayed. CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs achieve this through shipping the entire replica state that is, eventually, merged to other replicas ensuring convergence. This imposes a large communication overhead when the replica size or the number of replicas gets larger. In this work, we introduce a decomposable version of state-based CRDTs, called Delta State-based CRDTs (δ-CRDT). A δ-CRDT is viewed as a join of multiple fine-grained CRDTs of the same type, called deltas (δ). The deltas are produced by applying δ-mutators, on a replica state, which are modified versions of the original CRDT mutators. This makes it possible to ship small deltas (or batches) instead of shipping the entire state. The challenges are to make the join of deltas equivalent to the join of the entire object in classical state-based CRDTs, and to find a way to derive the δ-mutators. We address this challenge in this work, and we explore the minimal requirements that a communication algorithm must offer according to the guarantees provided by the underlying messaging middleware.
... Delta-state CRDTs (or delta CRDTs for short) have been proposed to alleviate this overhead by passing only partial information about the sender's state. This information typically consists in a representation of the effect of the last update operations performed on the local state [5,6]. ...
... The drawback is that they require shipping the entire state of a replica, which can yield significant communication overhead for container CRDTs (e.g., sets, maps, graphs), which can store large amounts of data [3]. Delta-state based CRDTs (δ-CRDTs) mitigate this issue by only shipping in a synchronization message the change that has been made recently to a replica, rather than its full state [5,18,19]. This change is expressed as the join of multiple fine-grained states called deltas. ...
Article
Conflict-Free Replicated Data Types (CRDTs) are data types that can be used in distributed systems when optimistic replication is tolerable. Replicas can be updated locally, without coordination, and consistency is obtained eventually by asynchronously propagating updates among replicas. Because CRDTs can tolerate asynchronous transmissions, they can serve as software elements in opportunistic networks (OppNets), where the dissemination of information is dependent on unplanned transient radio contacts between mobile nodes. In this paper we investigate the problem of implementing operation-based, state-based, and delta-state-based CRDTs in OppNets. A contact-driven synchronization algorithm is proposed for each kind of CRDT, and experiments based on realistic tracesets are conducted in order to compare how these algorithms can perform in an OppNet. Experimental results show that delta-state-based CRDTs globally outperform operation-based and pure state-based CRDTs, especially when considering the number of messages required to ensure the synchronization of replicas.
... This becomes costly when CRDTs grow larger. A solution to this problem is discussed by Almeida et al. [2] by only transmitting state-deltas instead of the complete data structure. In addition, certain CRDT designs suffer from state inflation, e.g., due to accumulation of tombstone values. ...
Preprint
General solutions of state machine replication have to ensure that all replicas apply the same commands in the same order, even in the presence of failures. Such strict ordering incurs high synchronization costs caused by distributed consensus or by the use of a leader. This paper presents a protocol for linearizable state machine replication of conflict-free replicated data types (CRDTs) that neither requires consensus nor a leader. By leveraging the properties of state-based CRDTs - in particular the monotonic growth of a join semilattice - synchronization overhead is greatly reduced. In addition, updates just need a single round trip and modify the state 'in-place' without the need for a log. Furthermore, the message size overhead for coordination consists of a single counter per message. While reads in the presence of concurrent updates are not wait-free without a coordinator, we show that more than 97% of reads can be handled in one or two round trips under highly concurrent accesses. Our protocol achieves high throughput without auxiliary processes like command log management or leader election. It is well suited for all practical scenarios that need linearizable access on CRDT data on a fine-granular scale.
... We use it both as a distributed key/value store for IoT sensor data and a propagation tool for our generic task model. Lasp provides access to a wide range of Conflict-free Replicated Data Types (CRDTs), that ensure that conflicting operations on a same data entry are automatically handled using the underlying conflict resolution algorithm [10,18,31]. Consequently, Achlys clusters are able to preserve strong eventual consistency of data across nodes. ...
Preprint
Edge computing is one of the key success factors for future Internet solutions that intend to support the ongoing IoT evolution. By offloading central areas using resources that are closer to clients, providers can offer reliable services with higher quality. But even industry standards are still lacking a valid solution for edge systems with actual sense-making capabilities when no preexisting infrastructure whatsoever is available. The current edge model involves a tight coupling with gateway devices and Internet access, even when autonomous ad hoc IoT networks could perform partial or even complete tasks correctly. In our previous research efforts, we have introduced Achlys, an Erlang programming framework that takes advantage of the GRiSP embedded system capabilities in order to bring edge computing one step further. GRiSP is an embedded board that can easily be programmed directly in Erlang without requiring deep low level knowledge, which offers the extensive toolset of the Erlang ecosystem directly on bare metal hardware. We have been able to demonstrate that our framework allows building reliable applications on unreliable networks of unreliable GRiSP nodes with a very simple programming API. In this paper, we present how Erlang can successfully be used to address edge computing challenges directly on IoT sensor nodes, taking advantage of our existing framework. We display results of deployed distributed programs at the edge and examples of the unique advantage that is offered by Erlang higher-order and concurrent programming in order to achieve reliable general-purpose computing through Achlys.
... CRDTs in Lasp are implemented using additional metadata that allows each operation at each node to be taken into consideration. In fact, the Lasp library uses an efficient implementation of CRDTs called delta-based dissemination mode, which propagates only delta-mutators [27], [22], i.e., update operations, instead of the full state, to achieve consistency. This uses significantly less traffic between nodes than a naive implementation that propagates the full state. ...
Conference Paper
Internet of Things (IoT) continues to grow exponentially, in number of devices and the amount of data they generate. Processing this data requires an exponential increase in computing power. For example, aggregation can be done directly at the edge. However, aggregation is very limited; ideally we would like to do more general computations at the edge. In this paper we propose a framework for doing general-purpose edge computing directly on sensor networks themselves, without requiring external connections to gateways or cloud. This is challenging because sensor networks have unreliable communication, unreliable nodes, and limited (if any) computing power and storage. How can we implement production-quality components directly on these networks? We need to bridge the gap between the unreliable, limited infrastructure and the stringent requirements of the components. To solve this problem we present Achlys, an edge computing framework that provides reliable storage, computation, and communication capabilities directly on wireless networks of IoT sensor nodes. Using Achlys, the sensor network is able to configure and manage itself directly, without external connectivity. Achlys combines the Lasp key/value store and the Partisan communication library. Lasp provides efficient decentralized storage based on the properties of CRDTs (Conflict-Free Replicated Data Types). Partisan provides efficient connectivity and broadcast based on hybrid gossip. Both Lasp and Partisan are specifically designed to be extremely resilient. They are able to continue working despite high node churn, frequent network partitions, and unreliable communication. Our first implementation of Achlys is on a network of GRiSP embedded system boards. We choose GRiSP as our first implementation platform because it implements high-level functionality, namely Erlang, directly on the bare hardware and because it directly supports Pmod sensors and wireless connectivity.
We give some first results on using Achlys for building edge systems and we explain how we plan to evolve Achlys in the future. Achlys is a work in progress that is being done in the context of the LightKone European H2020 research project, and we are in the process of implementing and evaluating a proof-of-concept application in the area of precision agriculture.
... Delta-state CRDTs [3,4] address this issue in a principled way by propagating delta-mutators, that encode the changes that have been made to a replica since the last communication. The first time a replica communicates with some other replica, the full state needs to be propagated. ...
Preprint
Internet-scale distributed systems often replicate data at multiple geographic locations to provide low latency and high availability, despite node and network failures. Geo-replicated systems that adopt a weak consistency model allow replicas to temporarily diverge, requiring a mechanism for merging concurrent updates into a common state. Conflict-free Replicated Data Types (CRDT) provide a principled approach to address this problem. This document presents an overview of Conflict-free Replicated Data Types research and practice, organizing the presentation in the aspects relevant for the application developer, the system developer and the CRDT developer.
... There is room to optimize SwiftCloud for more efficient processing of batched updates. Also, semantic compression of batched updates, as suggested in [1], would also improve the system. ...
Conference Paper
Conflict-free Replicated Data Types (CRDTs) are high-level data types that can be replicated with minimal coordination among replicas due to their confluent semantics. This property makes CRDTs especially appealing for geo-replicated settings. Different approaches, such as state transfer and operation forwarding, have been proposed to propagate updates among replicas, with different tradeoffs among the amount of network traffic generated and the staleness of local information. This paper proposes and evaluates techniques to automatically adapt a CRDT implementation, such that the best approach is used, based on the application needs (captured by a SLA) and the observed system configuration. Our techniques have been integrated in SwiftCloud, a state of the art geo-replicated store based on CRDTs.
... This ensures convergence over unreliable communication (in contrast to op-based CRDTs, which demand exactly-once delivery and are prone to message duplication). To achieve this, we develop in detail the concept of Delta State-based CRDTs (δ-CRDT) that we initially introduced in [13]. In this new (delta) framework, the state is still a join-semilattice that now results from the join of multiple fine-grained states, i.e., deltas, generated by what we call δ-mutators. ...
Article
CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce Delta State Conflict-Free Replicated Data Types (δ-CRDTs) that can achieve the best of both worlds: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. This is achieved by defining delta mutators to return a delta-state, typically with a much smaller size than the full state, that is to be joined with both local and remote states. We introduce the δ-CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present an anti-entropy algorithm for eventual convergence, and another one that ensures causal consistency. Finally, we introduce several δ-CRDT specifications of both well-known replicated datatypes and novel datatypes, including a generic map composition.
... A recent alternative, named δ-CRDTs [1], has been proposed as a middle ground between the two approaches. δ-CRDTs assume that communication is mostly pairwise, with each replica maintaining a communication buffer for each of its peers where it stores the operations that have not been propagated (and acknowledged) to the remote peer. ...
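The per-peer buffering described in this excerpt can be sketched as follows. This is an illustrative toy rather than the algorithm from [1]: the `Replica` class, its method names, the sequence-number acknowledgement scheme, and the grow-only set used as the replicated state are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Assumed state: a grow-only set, whose join is set union.
State = FrozenSet[str]

class Replica:
    def __init__(self, name: str, peers: List[str]) -> None:
        self.name = name
        self.state: State = frozenset()
        self.seq = 0
        # One buffer of (sequence number, delta) pairs per peer.
        self.buffer: Dict[str, List[Tuple[int, State]]] = {p: [] for p in peers}

    def add(self, element: str) -> None:
        delta: State = frozenset({element})
        self.state = self.state | delta      # join the delta locally
        self.seq += 1
        for pending in self.buffer.values():  # remember it for every peer
            pending.append((self.seq, delta))

    def message_for(self, peer: str) -> Optional[Tuple[int, State]]:
        """Join all unacknowledged deltas for a peer into one delta-group."""
        pending = self.buffer[peer]
        if not pending:
            return None
        group: State = frozenset().union(*(d for _, d in pending))
        return pending[-1][0], group

    def receive(self, delta: State) -> None:
        self.state = self.state | delta

    def ack(self, peer: str, seq: int) -> None:
        """Drop deltas the peer has confirmed, keeping any newer ones."""
        self.buffer[peer] = [(s, d) for s, d in self.buffer[peer] if s > seq]

a = Replica("a", ["b"])
b = Replica("b", ["a"])
a.add("x")
a.add("y")

first = a.message_for("b")      # assume this message is lost in transit
seq, group = a.message_for("b")  # the same delta-group is simply re-sent
b.receive(group)
b.receive(group)                 # a duplicate delivery is harmless
a.ack("b", seq)
```

Until an acknowledgement arrives, the sender keeps offering the same delta-group, so a lost message costs a retransmission but never correctness.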
Conference Paper
Replication is a key technique for providing both fault tolerance and availability in distributed systems. However, managing replicated state, and ensuring that these replicas remain consistent, is a non-trivial task, in particular in scenarios where replicas can reside on the client-side, as clients might have unreliable communication channels and hence, exhibit highly dynamic communication patterns. One way to simplify this task is to resort to CRDTs, which are data types that enable replication and operation over replicas with no coordination, ensuring eventual state convergence when these replicas are synchronized. However, when the communication patterns, and therefore synchronization patterns, are highly dynamic, existing designs of CRDTs might incur excessive communication overhead. To address those scenarios, in this paper we propose a new design for CRDTs which we call Δ-CRDT, and experimentally show that under dynamic communication patterns, this novel design achieves better network utilization than existing alternatives.
... As described above in section 2, Riak uses "full state replication" to propagate updates to Sets. Delta-CRDTs [2] are a CRDT variant that promises a more efficient network utilisation for replication of updates. In order to improve performance we implemented delta-datatypes [10]. ...
Conference Paper
CRDT [24] Sets as implemented in Riak [6] perform poorly for writes, both as cardinality grows, and for sets larger than 500KB [25]. Riak users wish to create high cardinality CRDT sets, and expect better than O(n) performance for individual insert and remove operations. By decomposing a CRDT set on disk, and employing delta-replication [2], we can achieve far better performance than just delta replication alone: relative to the size of causal metadata, not the cardinality of the set, and we can support sets that are hundreds of times the size of Riak sets, while still providing the same level of consistency. There is a trade-off in read performance but we expect it is mitigated by enabling queries on sets.
Conference Paper
Conflict-free replicated data types (CRDTs) [7] help programmers develop highly available and scalable distributed systems. However, CRDTs require operations to commute which is not practical. This means that programmers cannot replicate regular objects without worrying about concurrency. In this paper, we introduce strong eventually consistent replicated objects (SECROs), a generic data type that is highly available and guarantees strong eventual consistency (SEC) without imposing restrictions on its operations.
Conference Paper
Many web applications are built around direct interactions among users, from collaborative applications and social networks to multi-user games. Despite being user-centric, these applications are usually supported by services running on servers that mediate all interactions among clients. When users are in close vicinity of each other, relying on a centralized infrastructure for mediating user interactions leads to unnecessarily high latency while hampering fault-tolerance and scalability. In this paper, we propose to extend user-centric Internet services with peer-to-peer interactions. We have designed a framework named Legion that enables client web applications to securely replicate data from servers, and synchronize these replicas directly among them. Legion allows for client-side modules, that we dub adapters, to leverage existing web platforms for storing data and to assist in Legion operation. Using these adapters, legacy applications accessing directly the web platforms can co-exist with new applications that use our framework, while accessing the same shared objects. Our experimental evaluation shows that, besides supporting direct client interactions, even when disconnected from the servers, Legion provides lower latency for update propagation with decreased network traffic for servers.
Conference Paper
Pure operation-based (op-based) Conflict-free Replicated Data Types (CRDTs) are generic and very efficient as they allow for compact solutions in both sent messages and state size. Although the pure op-based model looks promising, it is still not fully understood in terms of practical implementation. In this paper, we explain the challenges faced in implementing pure op-based CRDTs in a real system: the well-known in-memory cache key-value store Redis. Our purpose of choosing Redis is to implement a multi-master replication feature, which the current system lacks. The experience demonstrates that pure op-based CRDTs can be implemented in existing systems with minor changes in the original API.
Conference Paper
State-based CRDTs allow updates on local replicas without remote synchronization. Once these updates are propagated, possible conflicts are resolved deterministically across all replicas. δ-CRDTs bring significant advantages in terms of the size of messages exchanged between replicas during normal operation. However, when a replica joins the system after a network partition, it needs to receive the updates it missed and propagate the ones performed locally. Current systems solve this by exchanging the full state bidirectionally or by storing additional metadata along the CRDT. We introduce the concept of join-decomposition for state-based CRDTs, a technique orthogonal and complementary to delta-mutation, and propose two synchronization methods that reduce the amount of information exchanged, with no need to modify current CRDT definitions.
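A rough sketch of what a join-decomposition might look like for a simple datatype is given below. The grow-only counter, the function names, and the assumption that the sender already knows the receiver's state are all illustrative choices, and the code is not one of the two synchronization methods the paper proposes. The state is split into parts whose join gives back the original state, and only parts not already covered by the remote state are sent.

```python
from typing import Dict, List

# Assumed representation: replica id -> number of increments seen from it.
GCounter = Dict[str, int]

def join(a: GCounter, b: GCounter) -> GCounter:
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def leq(a: GCounter, b: GCounter) -> bool:
    """Lattice order: a is below b if every entry of a is covered by b."""
    return all(v <= b.get(k, 0) for k, v in a.items())

def decompose(state: GCounter) -> List[GCounter]:
    """Split a state into single-entry parts whose join is the state."""
    return [{k: v} for k, v in state.items()]

def delta_for(local: GCounter, remote: GCounter) -> GCounter:
    """Join of the parts of `local` that `remote` does not yet include."""
    out: GCounter = {}
    for part in decompose(local):
        if not leq(part, remote):
            out = join(out, part)
    return out

local = {"a": 3, "b": 1, "c": 2}
remote = {"a": 3, "b": 4}
delta = delta_for(local, remote)
```

Joining `delta` into `remote` gives the same result as joining the full `local` state, while shipping a single entry instead of three.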
Conference Paper
We propose Lasp, a new programming model designed to simplify large-scale distributed programming. Lasp combines ideas from deterministic dataflow programming together with conflict-free replicated data types (CRDTs). This provides support for computations where not all participants are online together at a given moment. The initial design presented here provides powerful primitives for composing CRDTs, which lets us write long-lived fault-tolerant distributed applications with nonmonotonic behavior in a monotonic framework. Given reasonable models of node-to-node communications and node failures, we prove formally that a Lasp program can be considered as a functional program that supports functional reasoning and programming techniques. We have implemented Lasp as an Erlang library built on top of the Riak Core distributed systems framework. We have developed one nontrivial large-scale application, the advertisement counter scenario from the SyncFree research project. We plan to extend our current prototype into a general-purpose language in which synchronization is used as little as possible.
Conference Paper
CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce Delta State Conflict-Free Replicated Datatypes (δ-CRDT) that can achieve the best of both worlds: small messages with an incremental nature, disseminated over unreliable communication channels. This is achieved by defining δ-mutators to return a delta-state, typically with a much smaller size than the full state, that is joined to both local and remote states. We introduce the δ-CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present an anti-entropy algorithm that ensures causal consistency, and two δ-CRDT specifications of well-known replicated datatypes.
Article
A CRDT is a data type specially designed to allow multiple instances to be replicated and modified without coordination, while providing an automatic mechanism to merge concurrent updates that guarantee eventual consistency. In this paper we present a brief study of computational CRDTs, a class of CRDTs whose state is the result of a computation over the executed updates. We propose three generic designs that reduce the amount of information that each replica maintains and propagates for synchronizations. For each of the designs, we discuss the properties that the function being computed needs to satisfy.
Conference Paper
Mobile devices commonly access shared data stored on a server. To ensure responsiveness, many applications maintain local replicas of the shared data that remain instantly accessible even if the server is slow or temporarily unavailable. Despite its apparent simplicity and commonality, this scenario can be surprisingly challenging. In particular, a correct and reliable implementation of the communication protocol and the conflict resolution to achieve eventual consistency is daunting even for experts. To make eventual consistency more programmable, we propose the use of specialized cloud data types. These cloud types provide eventually consistent storage at the programming language level, and thus abstract the numerous implementation details (servers, networks, caches, protocols). We demonstrate (1) how cloud types enable simple programs to use eventually consistent storage without introducing undue complexity, and (2) how to provide cloud types using a system and protocol comprised of multiple servers and clients.
Conference Paper
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
Article
A CRDT is a data type whose operations commute when they are concurrent. Replicas of a CRDT eventually converge without any complex concurrency control. As an existence proof, we exhibit a non-trivial CRDT: a shared edit buffer called Treedoc. We outline the design, implementation and performance of Treedoc. We discuss how the CRDT concept can be generalised, and its limitations.
Article
We propose efficient algorithms to maintain a replicated dictionary using a log in an unreliable network. A non-serializable approach is used to achieve high concurrency. The solutions are resilient to both node and communication failures. Optimizations are developed for networks which are not completely connected.
Conference Paper
Eventual consistency is a relaxation of strong consistency that guarantees that if no new updates are made to a replicated data object, then all replicas will converge. The conflict free replicated datatypes (CRDTs) of Shapiro et al. are data structures whose inherent mathematical structure guarantees eventual consistency. We investigate a fundamental CRDT called Observed-Remove Set (OR-Set) that robustly implements sets with distributed add and delete operations. Existing CRDT implementations of OR-Sets either require maintaining a permanent set of “tombstones” for deleted elements, or imposing strong constraints such as causal order on message delivery. We formalize a concurrent specification for OR-Sets without ordering constraints and propose a generalized implementation of OR-sets without tombstones that provably satisfies strong eventual consistency. We introduce Interval Version Vectors to succinctly keep track of distributed time-stamps in systems that allow out-of-order delivery of messages. The space complexity of our generalized implementation is competitive with respect to earlier solutions with causal ordering. We also formulate k-causal delivery, a generalization of causal delivery, that provides better complexity bounds.
Article
Conflict-Free Replicated Data-Types (CRDTs) [6] provide greater safety properties to eventually-consistent distributed systems without requiring synchronization. CRDTs ensure that concurrent, uncoordinated updates have deterministic outcomes via the properties of bounded join-semilattices. We discuss the design of a new convergent (state-based) replicated data-type, the Map, as implemented by the Riak DT library [4] and the Riak data store [3]. Like traditional dictionary data structures, the Map associates keys with values, and provides operations to add, remove, and mutate entries. Unlike traditional dictionaries, all values in the Map data structure are also state-based CRDTs and updates to embedded values preserve their convergence semantics via lattice inflations [1] that propagate upward to the top-level. Updates to the Map and its embedded values can also be applied atomically in batches. Metadata required for ensuring convergence is minimized in a manner similar to the optimized OR-set [5]. This design allows greater flexibility to application developers working with semi-structured data, while removing the need for the developer to design custom conflict-resolution routines for each class of application data. We also discuss the experimental validation of the data-type using stateful property-based tests with QuickCheck [2].
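The way an update to an embedded value inflates the enclosing map can be sketched in a deliberately simplified form. The sketch assumes that every value is a grow-only set and leaves out key removal and the causal metadata that the Riak DT Map relies on, so it is not the Riak implementation; `join_map`, `leq_map`, and `update` are hypothetical names.

```python
from typing import Dict, Set

# Assumed representation: key -> grow-only set (itself a state-based CRDT).
Map = Dict[str, Set[str]]

def join_map(a: Map, b: Map) -> Map:
    """Join maps key by key, joining embedded values with set union."""
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}

def leq_map(a: Map, b: Map) -> bool:
    """a is below b if every embedded value of a is included in b's."""
    return all(v <= b.get(k, set()) for k, v in a.items())

def update(m: Map, key: str, element: str) -> Map:
    """Add an element to the embedded set at `key`, returning a new map."""
    new = {k: set(v) for k, v in m.items()}
    new.setdefault(key, set()).add(element)
    return new

m0: Map = {"tags": {"x"}}
m1 = update(m0, "tags", "y")       # one replica updates an embedded value
m2 = update(m0, "owners", "bob")   # another replica adds a new key
merged = join_map(m1, m2)
```

Each update produces a map that sits above the previous one in the lattice order, so concurrent updates to different keys, or to the same embedded value, merge without custom conflict-resolution code.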
Article
Conflict-free Replicated Datatypes can simplify the design of predictable eventual consistency. They can be classified into state-based or operation-based. Operation-based approaches have the potential for allowing compact designs in both the sent message and the object state size, but current approaches are still far from this objective. Here we explore the design space for operation-based solutions, and we leverage the interaction with the middleware by offering a technique that delivers very compact solutions, while only broadcasting operation names and arguments.
Conference Paper
In recent years there has been interest in achieving application-level consistency criteria without the latency and availability costs of strongly consistent storage infrastructure. A standard technique is to adopt a vocabulary of commutative operations; this avoids the risk of inconsistency due to message reordering. Another approach was recently captured by the CALM theorem, which proves that logically monotonic programs are guaranteed to be eventually consistent. In logic languages such as Bloom, CALM analysis can automatically verify that programs achieve consistency without coordination. In this paper we present BloomL, an extension to Bloom that takes inspiration from both of these traditions. BloomL generalizes Bloom to support lattices and extends the power of CALM analysis to whole programs containing arbitrary lattices. We show how the Bloom interpreter can be generalized to support efficient evaluation of lattice-based code using well-known strategies from logic programming. Finally, we use BloomL to develop several practical distributed programs, including a key-value store similar to Amazon Dynamo, and show how BloomL encourages the safe composition of small, easy-to-analyze lattices into larger programs.
Conference Paper
Geographically distributed systems often rely on replicated eventually consistent data stores to achieve availability and performance. To resolve conflicting updates at different replicas, researchers and practitioners have proposed specialized consistency protocols, called replicated data types, that implement objects such as registers, counters, sets or lists. Reasoning about replicated data types has however not been on par with comparable work on abstract data types and concurrent data types, lacking specifications, correctness proofs, and optimality results. To fill in this gap, we propose a framework for specifying replicated data types using relations over events and verifying their implementations using replication-aware simulations. We apply it to 7 existing implementations of 4 data types with nontrivial conflict-resolution strategies and optimizations (last-writer-wins register, counter, multi-value register and observed-remove set). We also present a novel technique for obtaining lower bounds on the worst-case space overhead of data type implementations and use it to prove optimality of 4 implementations. Finally, we show how to specify consistency of replicated stores with multiple objects axiomatically, in analogy to prior work on weak memory models. Overall, our work provides foundational reasoning tools to support research on replicated eventually consistent stores.
Conference Paper
Programs written using a deterministic-by-construction model of parallel computation are guaranteed to always produce the same observable results, offering programmers freedom from subtle, hard-to-reproduce nondeterministic bugs that are the scourge of parallel software. We present LVars, a new model for deterministic-by-construction parallel programming that generalizes existing single-assignment models to allow multiple assignments that are monotonically increasing with respect to a user-specified lattice. LVars ensure determinism by allowing only monotonic writes and "threshold" reads that block until a lower bound is reached. We give a proof of determinism and a prototype implementation for a language with LVars and describe how to extend the LVars model to support a limited form of nondeterminism that admits failures but never wrong answers.
Conference Paper
Information has become a key commodity for most service providers. Analyzing streams of data efficiently, in real time, has become increasingly more important for supporting new products and applications. This paper outlines a novel abstraction for performing incremental stream processing based on Computational Conflict-free Replicated Data Types. C-CRDTs are replicated objects that can be updated concurrently without coordination to perform a computation and still converge to a consistent state that reflects all contributions. Results obtained with a preliminary prototype show that C-CRDTs have the potential to match and improve computational throughput when compared with a state of the art stream processing system.
Conference Paper
Replication of state is the fundamental approach to achieve scalability and availability. In order to maintain or restore replica consistency under updates, some form of synchronization is needed. Conflict-free Replicated Data Types (CRDTs) ensure eventual consistency, such that replicas converge to a common state, equivalent to a correct sequential execution without foreground synchronization. A particular CRDT is the set data type, which is a pervasive abstraction for storing collections of unique elements and constitutes an important building block for other, more complex data structures. Since the original specification is not scalable, we improve it by introducing an efficient algorithm for sending deltas of updates between replicas and by partitioning a set replica into disjunctive subsets. We further add support for limited-lifetime elements, which, in turn, enable simple garbage collection strategies to address the problem of unbounded database growth. Lastly, implementation details and evaluation results of a client library for this data structure are presented.
Conference Paper
Replicating data under Eventual Consistency (EC) allows any replica to accept updates without remote synchronisation. This ensures performance and scalability in large-scale distributed systems (e.g., clouds). However, published EC approaches are ad-hoc and error-prone. Under a formal Strong Eventual Consistency (SEC) model, we study sufficient conditions for convergence. A data type that satisfies these conditions is called a Conflict-free Replicated Data Type (CRDT). Replicas of any CRDT are guaranteed to converge in a self-stabilising manner, despite any number of failures. This paper formalises two popular approaches (state- and operation-based) and their relevant sufficient conditions. We study a number of useful CRDTs, such as sets with clean semantics, supporting both add and remove operations, and consider in depth the more complex Graph data type. CRDT types can be composed to develop large-scale distributed applications, and have interesting theoretical properties.
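As a rough illustration of the state-based approach described in this abstract, the sketch below uses a grow-only counter, a standard textbook example of a state-based CRDT. The function names (increment, merge, value) and the dict representation are assumptions made for this example and are not taken from the paper. The point it shows is that a join which is commutative, associative, and idempotent lets replicas converge regardless of the order or number of merges.

```python
# Minimal sketch of a state-based grow-only counter (G-Counter).
# State: dict mapping replica id -> number of increments made at that replica.
# All names here are hypothetical, chosen for illustration only.

def increment(state, replica):
    """Return a new state with one more increment recorded for `replica`."""
    new_state = dict(state)
    new_state[replica] = new_state.get(replica, 0) + 1
    return new_state

def merge(x, y):
    """Join two states by taking the pointwise maximum per replica."""
    return {r: max(x.get(r, 0), y.get(r, 0)) for r in set(x) | set(y)}

def value(state):
    """The counter value is the sum of all per-replica counts."""
    return sum(state.values())

# Two replicas update independently, then exchange full states.
a = increment(increment({}, "A"), "A")   # replica A incremented twice
b = increment({}, "B")                   # replica B incremented once
merged_ab = merge(a, b)
merged_ba = merge(b, a)
```

Merging in either order, or merging the same state twice, yields the same result, which is what allows delivery over channels that reorder or duplicate messages.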
Article
We propose efficient algorithms to maintain a replicated dictionary using a log in an unreliable network. A non-serializable approach is used to achieve high concurrency. The solutions are resilient to both node and communication failures. Optimizations are developed for networks which are not completely connected.
Article
Conflicts naturally arise in optimistically replicated systems. The common way to detect update conflicts is via version vectors, whose storage and communication overhead are number of replicas × number of objects. These costs may be prohibitive for large systems. This paper presents predecessor vectors with exceptions (PVEs), a novel optimistic replication technique developed for Microsoft’s WinFS system. The paper contains a systematic study of PVE’s performance gains over traditional schemes. The results demonstrate a dramatic reduction of storage and communication overhead in normal scenarios, during which communication disruptions are infrequent. Moreover, they identify a cross-over threshold in communication failure-rate, beyond which PVEs lose efficiency compared with traditional schemes.
Article
Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, monotonic DAGs, and sequences. It discusses some properties needed to implement non-trivial CRDTs.
Article
Many distributed systems are now being developed to provide users with convenient access to data via some kind of communications network. In many cases it is desirable to keep the system functioning even when it is partitioned by network failures. A serious problem in this context is how one can support redundant copies of resources such as files (for the sake of reliability) while simultaneously monitoring their mutual consistency (the equality of multiple copies). This is difficult since network failures can lead to inconsistency, and disrupt attempts at maintaining consistency. In fact, even the detection of inconsistent copies is a nontrivial problem. Naive methods either 1) compare the multiple copies entirely or 2) perform simple tests which will diagnose some consistent copies as inconsistent. Here a new approach, involving version vectors and origin points, is presented and shown to detect single file, multiple copy mutual inconsistency effectively. The approach has been used in the design of LOCUS, a local network operating system at UCLA.
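The version-vector comparison mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the LOCUS implementation: the function name compare, the dict representation (site id mapped to update count), and the returned labels are all hypothetical, and origin points are not modelled. Two copies conflict when each vector is ahead of the other at some site.

```python
# Minimal sketch of version-vector comparison for detecting conflicting copies.
# A version vector is modelled as a dict: site id -> number of updates
# originated at that site. Missing sites count as zero.
# Names and return labels are hypothetical, chosen for illustration only.

def compare(a, b):
    """Classify the relationship between two version vectors."""
    sites = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in sites)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        return "conflict"      # concurrent updates: neither copy subsumes the other
    if a_ahead:
        return "a dominates"   # copy a has seen every update that b has, and more
    if b_ahead:
        return "b dominates"
    return "equal"

# Copies updated on both sides of a partition are flagged as conflicting.
partitioned = compare({"A": 2, "B": 1}, {"A": 1, "B": 2})
newer = compare({"A": 2, "B": 1}, {"A": 1, "B": 1})
```

Unlike comparing whole file contents, this check only inspects one counter per site, and it does not misreport a copy that is merely out of date as inconsistent.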
Article
this article the formal presentation of the model, which is available in another document [1], but we present a description of the environment, enumerate some of the data types, and exemplify the formal description with a sample component.
Article
Bayou is a replicated, weakly consistent storage system designed for a mobile computing environment that includes portable machines with less than ideal network connectivity. To maximize availability, users can read and write any accessible replica. Bayou's design has focused on supporting application-specific mechanisms to detect and resolve the update conflicts that naturally arise in such a system, ensuring that replicas move towards eventual consistency, and defining a protocol by which the resolution of update conflicts stabilizes. It includes novel methods for conflict detection, called dependency checks, and per-write conflict resolution based on client-provided merge procedures. To guarantee eventual consistency, Bayou servers must be able to roll back the effects of previously executed writes and redo them according to a global serialization order. Furthermore, Bayou permits clients to observe the results of all writes received by a server, including tentative writes whose conflicts have not been ultimately resolved. This paper presents the motivation for and design of these mechanisms and describes the experiences gained with an initial implementation of the system.
Article
When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, we prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.
Baquero, C.: Delta-crdt-cpp. https://github.com/CBaquero/delta-enabled-crdts
Cribbs, S., Brown, R.: Data structures in Riak. In: Riak Conference (RICON), San Francisco, CA, USA (Oct 2012)