A comprehensive study of Convergent and Commutative Replicated Data Types

Source: OAI


Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, monotonic DAGs, and sequences. It discusses some properties needed to implement non-trivial CRDTs.
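As an illustration of the state-based case, the sketch below shows a grow-only counter, one of the simplest CRDTs. It is not taken from the paper's text: the class name `GCounter` and its methods are hypothetical, chosen for this example. It assumes the usual state-based condition, namely that updates only move the state upward in a partial order and that merging takes a least upper bound (here, an entry-wise maximum), so replicas converge regardless of the order or number of merges.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Hypothetical state-based grow-only counter: one entry per replica."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Updates are monotonic: a replica only ever raises its own entry.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all per-replica entries.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Entry-wise maximum: commutative, associative, and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update independently, then exchange state in any order.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

After the exchange both replicas hold the same state, and repeating a merge has no further effect. An operation-based design would instead broadcast the increments themselves and rely on concurrent operations commuting.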



Available from: Carlos Baquero
  • Source
    • "Although c-set has been designed to ensure consistency, it violates the operations intentions especially when it comes to mutually execute remote delete operations on the same triples that locally have already been removed several times then reinserted. In [34] authors present different set CRDTs, Grow Only Set (G-Set), Last Writer Wins Set (LWW-element-Set) and Observed Remove Set (OR-Set). In a G-Set, there is only an insertion operation where each element can be inserted and not deleted from the set. "

    Full-text · Article · Jan 2015
  • Source
    • "QUORUM means that each request will be completed by a subset of replicas and the intersections of any pair of subsets are not empty. Both RSMs and QUORUM systems have replica inconsistency problems and many scholars research the problem [14]–[17]. When the inconsistency exists in the system, there are two views to observe it: client-centric view which refers to the inconsistency clients can observe directly [18] and data-centric view which stands on the opposite. "
    ABSTRACT: Inherent replica inconsistency refers to the difference among the replicas of the same logical data item in the write propagation process of a normally running distributed storage system. In this paper, we formalize the write propagation process model of Cassandra, a widely used NoSQL storage system. In the write propagation process we explore two queueing systems, sending task queues and mutation queues, which are located at each replica node and are determinants of the replica inconsistency. The departure time difference from the mutation queue is used as the measure of inconsistency between two replicas. Furthermore, Request Per Second (RPS) and Mutation Threads Number (MTN), which affect the inherent inconsistency, are discussed and the MTN adaptation algorithm is proposed. Finally, a Cassandra inconsistency measurement framework is implemented using the source instrumentation approach. The empirical results conform well with our proposed inconsistency measurement model.
    Full-text · Conference Paper · Jul 2014
  • Source
    • "Some trending editors such as Google Docs [5] use the Operational Transform (OT) approach [14] [15]. However, an alternative based on Conflict-free Replicated Data Types (CRDTs) [12] [13] exists. Compared to OT, CRDTs are more decentralized and scale better. "
    ABSTRACT: Distributed collaborative editors such as Google Docs or Etherpad allow to distribute the work across time, space and organizations. In this paper, we focus on distributed collaborative editors based on the Conflict-free Replicated Data Type approach (CRDT). CRDTs encompass a set of well-known data types such as sets, graphs, sequences, etc. CRDTs for sequences model a document as a set of elements (character, line, paragraph, etc.) with unique identifiers, providing two commutative update operations: insert and delete. The identifiers of elements can be either of fixed-size or variable-size. Recently, a strategy for assigning variable-size identifiers called LSEQ has been proposed for CRDTs for sequences. LSEQ lowers the space complexity of variable-size identifiers CRDTs from linear to sub-linear. While experiments show that it works locally, it fails to provide this bound with multiple users and latency. In this paper, we propose h-LSEQ, an improvement of LSEQ that preserves its space complexity among multiple collaborators, regardless of the latency. Ultimately, this improvement allows to safely build distributed collaborative editors based on CRDTs. We validate our approach with simulations involving latency and multiple users.
    Full-text · Article · Sep 2013