Conference Paper · PDF available

Eventually Consistent Transactions

Abstract and Figures

When distributed clients query or update shared data, eventual consistency can provide better availability than strong consistency models. However, programming and implementing such systems can be difficult unless we establish a reasonable consistency model, i.e. some minimal guarantees that programmers can understand and systems can provide effectively. To this end, we propose a novel consistency model based on eventually consistent transactions. Unlike serializable transactions, eventually consistent transactions are ordered by two order relations (visibility and arbitration) rather than a single order relation. To demonstrate that eventually consistent transactions can be effectively implemented, we establish a handful of simple operational rules for managing replicas, versions and updates, based on graphs called revision diagrams. We prove that these rules are sufficient to guarantee correct implementation of eventually consistent transactions. Finally, we present two operational models (single server and server pool) of systems that provide eventually consistent transactions.
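To make the two order relations concrete, here is a minimal runnable sketch. It is not the paper's formalism, and all names (Txn, apply_visible) are illustrative: a reader applies the updates of whatever transactions are visible to it, always sequenced by the arbitration order, so replicas with equal visibility sets compute equal states.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    tid: int                                      # position in the arbitration order
    updates: list = field(default_factory=list)   # e.g. [("set", "x", "a")]

def apply_visible(txns, visible_tids):
    """State observed by a reader: apply the updates of all visible
    transactions, sequenced by the arbitration order (here, tid)."""
    state = {}
    for t in sorted(txns, key=lambda t: t.tid):   # arbitration order
        if t.tid in visible_tids:
            for op, key, val in t.updates:
                if op == "set":
                    state[key] = val
    return state

t1 = Txn(1, [("set", "x", "a")])
t2 = Txn(2, [("set", "x", "b")])
# Once two replicas have the same visibility set, arbitration fixes a
# single application order, so they converge regardless of arrival order:
print(apply_visible([t1, t2], {1, 2}))  # {'x': 'b'}
print(apply_visible([t2, t1], {1, 2}))  # {'x': 'b'} as well
```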
... Common applications that employ replaying a set of messages in the same order as they were produced include event-carried state transfer and system state simulation. The former consists of two systems reading the same set of messages in the same order to replicate an entity's state within their data stores [2], [3], whereas the latter simulates a system's state at a given point in time by reproducing the messages up until that point [4], [5]. ...
Preprint
Full-text available
Message brokers enable asynchronous communication between data producers and consumers in distributed environments by assigning messages to ordered queues. Message broker systems often provide mechanisms to parallelize tasks between consumers and thereby increase the rate at which data is consumed. The consumption rate must exceed the production rate, or queues would grow indefinitely. Still, consumers are costly and their number should be minimized. We model the problem of determining the required number of consumers, and the partition-consumer assignments, as a variable-item-size bin packing variant. Data cannot be read while a queue is being migrated to another consumer; hence, we propose the R-score metric to account for these rebalancing costs. We then introduce an assortment of R-score-based algorithms and compare their performance against established heuristics for the Bin Packing Problem in this application. We instantiate our method within an existing system, demonstrating its effectiveness. Our approach guarantees adequate consumption rates, something the previous system was unable to do, at lower operational costs.
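The underlying assignment problem can be illustrated with a classic first-fit-decreasing heuristic; this is a hedged sketch (the preprint's R-score algorithms are more elaborate), and the names assign_partitions and capacity are invented for the example.

```python
def assign_partitions(partition_rates, capacity):
    """Assign each partition (item, sized by its production rate) to a
    consumer (bin) so that no consumer exceeds its consumption capacity."""
    consumers = []  # each entry: [remaining_capacity, [partition ids]]
    # First-fit decreasing: place the largest partitions first.
    for pid, rate in sorted(partition_rates.items(), key=lambda kv: -kv[1]):
        if rate > capacity:
            raise ValueError(f"partition {pid} exceeds a single consumer's capacity")
        for c in consumers:
            if c[0] >= rate:          # first consumer with enough headroom
                c[0] -= rate
                c[1].append(pid)
                break
        else:
            consumers.append([capacity - rate, [pid]])  # open a new consumer
    return [c[1] for c in consumers]

rates = {"p0": 70, "p1": 50, "p2": 40, "p3": 20, "p4": 10}
print(assign_partitions(rates, capacity=100))
# [['p0', 'p3', 'p4'], ['p1', 'p2']] -- two consumers suffice here
```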
... • If the concurrent moves are down-moves, resolve the conflict deterministically, e.g., the operation with the highest priority wins. The priority of a move operation can be set by the application, with one condition: priorities must be unique [51]. ...
Thesis
Designing distributed applications involves a fundamental trade-off between safety and performance, as described by the CAP theorem. We focus on the cases where safety is the top requirement. For the subclass of state-based distributed systems, we propose a proof methodology for establishing that a given application maintains a given invariant. Our approach allows reasoning about individual operations separately. We demonstrate that our rules are sound, and, with a mechanized proof engine, we illustrate their use on some representative examples. For conflicting operations, the developer can choose between conflict resolution and coordination. We present a novel replicated tree data structure that supports coordination-free concurrent atomic moves and arguably maintains the tree invariant. Our analysis identifies the cases where concurrent moves are inherently safe; for the remaining cases we devise a conflict resolution algorithm. The trade-off is that in some cases a move operation "loses". The coordination that some applications require for safety can be implemented in many different ways. Even when restricted to locks, configurations can vary in lock granularity, type, and placement, and the performance of each configuration depends on the workload. We study the "coordination lattice", i.e., the design space of lock configurations, and define a set of metrics to navigate it systematically.
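The priority rule quoted above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's actual algorithm, and the names (ReplicatedTree, apply_move, winners) are invented for the example.

```python
class ReplicatedTree:
    def __init__(self):
        self.parent = {"root": None}   # node -> parent

    def is_ancestor(self, a, b):
        """True if a is an ancestor of b (or a == b)."""
        while b is not None:
            if b == a:
                return True
            b = self.parent[b]
        return False

    def apply_move(self, node, new_parent, priority, winners):
        """Apply a concurrent down-move: skip it if a higher-priority move of
        the same node already won, or if it would create a cycle."""
        if winners.get(node, -1) >= priority:
            return False                       # lost to a concurrent move
        if self.is_ancestor(node, new_parent):
            return False                       # would break the tree invariant
        winners[node] = priority
        self.parent[node] = new_parent
        return True

t = ReplicatedTree()
t.parent.update({"a": "root", "b": "root", "c": "a"})
winners = {}
print(t.apply_move("c", "b", priority=2, winners=winners))  # True
print(t.apply_move("c", "a", priority=1, winners=winners))  # False: priority 2 won
```

Because priorities are unique, every replica resolves the same conflict the same way, so no coordination is needed.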
... In recent years, several consistency models for replicated databases have been proposed [5,23,10,38,3,4]; they strike different balances between consistency and performance and serve as points of comparison for GSP. ...
Thesis
Full-text available
Many of the applications we use today assure their users that they will always be available, even if the network is slow or intermittently down. To achieve this, developers write applications whose state propagates asynchronously across different devices. A possible implementation consists of clients (devices) that keep a copy of the data and a leader or server that decides an order on the operations performed by the users. The literature offers different models, which differ in whether propagation between clients and servers, and between servers and clients, is synchronous or asynchronous. In this thesis we study and implement GSP (Global Sequence Protocol), an operational model that propagates operations asynchronously in both directions, that is, from clients to servers and from servers to clients. The implementation is built on a broadcast layer called RTOB (Reliable Total Order Broadcast), which guarantees that all writes are delivered to every client, in the same order and without loss. Specifically, we developed an open-source GSP library, focusing on which consistency guarantees, such as read-my-writes, causality, or prefix consistency, are achieved by using RTOB. Our case studies show that, in practice, GSP depends on the broadcast protocol to ensure certain consistency guarantees.
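A minimal sketch of a GSP-style client on top of a total-order broadcast may help; the class and method names below are illustrative, not the thesis's library API.

```python
class GspClient:
    def __init__(self):
        self.known = []    # confirmed global prefix, same at every client
        self.pending = []  # own updates sent but not yet delivered back

    def update(self, op):
        self.pending.append(op)       # apply optimistically; broadcast op

    def on_deliver(self, op):
        """Invoked by the total-order broadcast for every update, in order."""
        self.known.append(op)
        if self.pending and self.pending[0] == op:
            self.pending.pop(0)       # our own update was confirmed

    def read(self):
        # Reads see the confirmed prefix plus our pending updates, which is
        # what yields read-my-writes before global confirmation.
        return self.known + self.pending

c = GspClient()
c.update(("set", "x", 1))
print(c.read())                  # [('set', 'x', 1)] -- visible immediately
c.on_deliver(("set", "x", 1))
print(c.read())                  # unchanged, but now globally confirmed
```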
Article
Users who care about their data store it in databases, which (at least in principle) guarantee some form of transactional isolation. However, experience shows that many databases do not provide the isolation guarantees they claim. With the recent proliferation of new distributed databases, demand has grown for checkers that can, by generating client workloads and injecting faults, produce anomalies that witness a violation of a stated guarantee. An ideal checker would be sound (no false positives), efficient (polynomial in history length and concurrency), effective (finding violations in real databases), general (analyzing many patterns of transactions), and informative (justifying the presence of an anomaly with understandable counterexamples). Sadly, we are aware of no checkers that satisfy these goals. We present Elle: a novel checker which infers an Adya-style dependency graph between client-observed transactions. It does so by carefully selecting database objects and operations when generating histories, so as to ensure that the results of database reads reveal information about their version history. Elle can detect every anomaly in Adya et al.'s formalism (except for predicates), discriminate between them, and provide concise explanations of each. This paper makes the following contributions: we present Elle, demonstrate its soundness over specific datatypes, measure its efficiency against the current state of the art, and give evidence of its effectiveness via a case study of four real databases.
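The core inference trick can be illustrated in a few lines: if every write appends a unique value to a list, then any read of that list exposes the version order. The sketch below is a simplified illustration, not Elle's implementation, and all names are made up.

```python
def infer_ww_edges(observed_list, writer_of):
    """Given one read of a list object (e.g. [1, 2, 3]) and a map from
    appended value -> transaction id, recover the ww-dependency chain."""
    edges = []
    for earlier, later in zip(observed_list, observed_list[1:]):
        edges.append((writer_of[earlier], writer_of[later]))
    return edges

# Transactions T1, T2, T3 each appended one unique value to key "x".
read = [10, 30, 20]
writer = {10: "T1", 20: "T2", 30: "T3"}
print(infer_ww_edges(read, writer))
# [('T1', 'T3'), ('T3', 'T2')] -- such edges feed an Adya-style dependency
# graph; a cycle in that graph witnesses an isolation anomaly.
```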
Article
In this article we study the properties of distributed systems that mix eventual and strong consistency. We formalize such systems through acute cloud types (ACTs), abstractions similar to conflict-free replicated data types (CRDTs), which by default work in a highly available, eventually consistent fashion, but which also feature strongly consistent operations for tasks which require global agreement. Unlike other mixed-consistency solutions, ACTs can rely on efficient quorum-based protocols, such as Paxos. Hence, ACTs gracefully tolerate machine and network failures also for the strongly consistent operations. We formally study ACTs and demonstrate phenomena which are neither present in purely eventually consistent nor strongly consistent systems. In particular, we identify temporary operation reordering, which implies interim disagreement between replicas on the relative order in which the client requests were executed. When not handled carefully, this phenomenon may lead to undesired anomalies, including circular causality. We prove an impossibility result which states that temporary operation reordering is unavoidable in mixed-consistency systems with sufficiently complex semantics. Our result is startling, because it shows that apparent strengthening of the semantics of a system (by introducing strongly consistent operations to an eventually consistent system) results in the weakening of the guarantees on the eventually consistent operations.
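As a rough illustration of the mixed-consistency interface the article formalizes, consider an ACT-like counter; the consensus hook and all names below are assumptions made for the sketch, not the paper's API.

```python
class LocalConsensus:
    """Stand-in for a quorum protocol such as Paxos (assumed interface)."""
    def __init__(self):
        self.decided = []
    def propose(self, op):
        self.decided.append(op)   # a real ACT would reach quorum agreement
        return op

class MixedCounter:
    def __init__(self, consensus):
        self.local = 0                 # eventually consistent replica state
        self.consensus = consensus

    def inc_eventual(self, n=1):
        # Highly available: succeeds locally, converges via replication.
        self.local += n

    def reset_strong(self):
        # Requires global agreement; eventual increments issued concurrently
        # may appear on different replicas on different sides of the reset
        # for a while -- the article's 'temporary operation reordering'.
        self.consensus.propose(("reset",))
        self.local = 0

counter = MixedCounter(LocalConsensus())
counter.inc_eventual()
counter.reset_strong()
print(counter.local)  # 0
```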
Article
We present the first specification and verification of an implementation of a causally-consistent distributed database that supports modular verification of full functional correctness properties of clients and servers. We specify and reason about the causally-consistent distributed database in Aneris, a higher-order distributed separation logic for an ML-like programming language with network primitives for programming distributed systems. We demonstrate that our specifications are useful, by proving the correctness of small, but tricky, synthetic examples involving causal dependency and by verifying a session manager library implemented on top of the distributed database. We use Aneris's facilities for modular specification and verification to obtain a highly modular development, where each component is verified in isolation, relying only on the specifications (not the implementations) of other components. We have used the Coq formalization of the Aneris logic to formalize all the results presented in the paper in the Coq proof assistant.
Chapter
Programming loosely connected distributed applications is a challenging endeavour. Loosely connected distributed applications, such as geo-distributed stores and intermittently reachable IoT devices, cannot afford to coordinate among all of the replicas to ensure data consistency, due to prohibitive latency costs and the impossibility of coordination if availability is to be ensured. Thus, the state of the replicas evolves independently, making it difficult to develop correct applications. Existing solutions to this problem limit the data types that can be used in these applications, and neither offer the ability to compose those types into more complex data types nor offer transactions.
Article
When constructing complex concurrent systems, abstraction is vital: programmers should be able to reason about concurrent libraries in terms of abstract specifications that hide the implementation details. Relaxed memory models present substantial challenges in this respect, as libraries need not provide sequentially consistent abstractions: to avoid unnecessary synchronisation, they may allow clients to observe relaxed memory effects, and library specifications must capture these. In this paper, we propose a criterion for sound library abstraction in the new C11 and C++11 memory model, generalising the standard sequentially consistent notion of linearizability. We prove that our criterion soundly captures all client-library interactions, both through call and return values, and through the subtle synchronisation effects arising from the memory model. To illustrate our approach, we verify implementations against specifications for the lock-free Treiber stack and a producer-consumer queue. Ours is the first approach to compositional reasoning for concurrent C11/C++11 programs.
Article
Geographically distributed systems often rely on replicated eventually consistent data stores to achieve availability and performance. To resolve conflicting updates at different replicas, researchers and practitioners have proposed specialized consistency protocols, called replicated data types, that implement objects such as registers, counters, sets or lists. Reasoning about replicated data types has, however, not been on par with comparable work on abstract data types and concurrent data types, lacking specifications, correctness proofs, and optimality results. To fill in this gap, we propose a framework for specifying replicated data types using relations over events and verifying their implementations using replication-aware simulations. We apply it to 7 existing implementations of 4 data types with nontrivial conflict-resolution strategies and optimizations (last-writer-wins register, counter, multi-value register and observed-remove set). We also present a novel technique for obtaining lower bounds on the worst-case space overhead of data type implementations and use it to prove optimality of 4 implementations. Finally, we show how to specify consistency of replicated stores with multiple objects axiomatically, in analogy to prior work on weak memory models. Overall, our work provides foundational reasoning tools to support research on replicated eventually consistent stores.
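One of the surveyed data types, the last-writer-wins register, gives a flavor of the implementations in question; the sketch below is a textbook version with illustrative names, not the paper's formal model.

```python
class LwwRegister:
    """Last-writer-wins register: each write is tagged (clock, replica_id);
    merge keeps the lexicographically largest tag."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.value, self.tag = None, (0, replica_id)

    def write(self, value, clock):
        self.value, self.tag = value, (clock, self.replica_id)

    def merge(self, other):
        # Commutative, associative, idempotent: replicas converge no matter
        # in which order they exchange states.
        if other.tag > self.tag:
            self.value, self.tag = other.value, other.tag

r1, r2 = LwwRegister("r1"), LwwRegister("r2")
r1.write("a", clock=1)
r2.write("b", clock=2)
r1.merge(r2); r2.merge(r1)
print(r1.value, r2.value)  # b b -- converged
```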
Article
Full-text available
Enabling applications to execute various tasks in parallel is difficult if those tasks exhibit read and write conflicts. In recent work, we developed a programming model based on concurrent revisions that addresses this challenge: each forked task gets a conceptual copy of all locations that are declared to be shared. Each such location has a specific isolation type; on joins, state changes to each location are merged deterministically based on its isolation type. In this paper, we study how to specify isolation types abstractly using operation-based compensation functions rather than state-based merge functions. Using several examples including a list with insert, delete and modify operations, we propose compensation tables as a concise, general and intuitively accessible mechanism for determining how to merge arbitrary operation sequences. Finally, we provide sufficient conditions to verify that a state-based merge function correctly implements a compensation table.
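The relationship between the two specification styles can be illustrated on a simple cumulative counter; the function names below are made up for the sketch, and the example shows only the easy case where both styles agree.

```python
def merge_counter(original, main, joinee):
    """State-based merge for a cumulative counter isolation type: keep the
    main branch's changes and re-apply the joinee's delta since the fork."""
    return main + (joinee - original)

def replay(main_state, joinee_ops):
    """Operation-based equivalent: replay the joinee's recorded operations
    (here, increments) on top of the main branch's state."""
    for delta in joinee_ops:
        main_state += delta
    return main_state

# Fork at 10; the main branch adds 5 (-> 15), the joinee adds 3 (-> 13).
assert merge_counter(10, 15, 13) == 18
assert replay(15, [3]) == 18   # both styles agree for this isolation type
```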
Article
Full-text available
Update-anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traffic gives a thousand-fold increase in deadlocks or reconciliations. Master-copy (primary-copy) replication schemes reduce this problem. A simple analytic model demonstrates these results. A new two-tier replication algorithm is proposed that allows mobile (disconnected) applications to propose tentative update transactions that are later applied to a master copy. Commutative update transactions avoid the instability of other replication schemes.
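The two-tier idea can be sketched as follows, assuming a simple account balance whose acceptance test is re-checked at the master; all names (Master, apply) are illustrative.

```python
class Master:
    def __init__(self, balance=0):
        self.balance = balance

    def apply(self, delta, min_balance=0):
        """Base transaction: re-check the acceptance criterion at the master;
        a tentative update that no longer passes is rejected, not merged."""
        if self.balance + delta < min_balance:
            return False
        self.balance += delta
        return True

mobile_log = [+50, -30, -40]      # tentative updates recorded while offline
master = Master(balance=20)
results = [master.apply(d) for d in mobile_log]
print(results, master.balance)
# [True, True, True] 0 -- a rejected tentative update would simply report False
```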
Article
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
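The object-versioning mechanism Dynamo is known for can be sketched with vector clocks; this is a hedged illustration of the general technique, not Dynamo's code, and the names are invented.

```python
def descends(vc_a, vc_b):
    """True if version a causally descends from (or equals) version b."""
    return all(vc_a.get(n, 0) >= c for n, c in vc_b.items())

def siblings(versions):
    """Drop versions that are causally dominated; what remains are the
    concurrent siblings handed back to the application to reconcile."""
    return [(v, vc) for v, vc in versions
            if not any(descends(vc2, vc) and vc2 != vc for _, vc2 in versions)]

v1 = ("cart-A", {"sx": 1})
v2 = ("cart-B", {"sx": 1, "sy": 1})   # descends from v1
v3 = ("cart-C", {"sx": 1, "sz": 1})   # concurrent with v2
print(siblings([v1, v2, v3]))
# [v2, v3]: v1 is dominated; the application merges the two siblings
```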
Conference Paper
Parallel or incremental versions of an algorithm can significantly outperform their counterparts, but are often difficult to develop. Programming models that provide appropriate abstractions to decompose data and tasks can simplify parallelization. We show in this work that the same abstractions can enable both parallel and incremental execution. We present a novel algorithm for parallel self-adjusting computation. This algorithm extends a deterministic parallel programming model (concurrent revisions) with support for recording and repeating computations. On record, we construct a dynamic dependence graph of the parallel computation. On repeat, we reexecute only parts whose dependencies have changed. We implement and evaluate our idea by studying five example programs, including a realistic multi-pass CSS layout algorithm. We describe programming techniques that proved particularly useful to improve the performance of self-adjustment in practice. Our final results show significant speedups on all examples (up to 37x on an 8-core machine). These speedups are well beyond what can be achieved by parallelization alone, while requiring a comparable effort by the programmer.
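The record/repeat idea can be sketched with a small dependence graph over named computation nodes; this illustrates the general technique (the paper's concurrent-revisions runtime is far richer), and all names are made up.

```python
def record(nodes, inputs):
    """nodes: ordered dict name -> (fn, [dependency names]); run everything
    once and cache each node's result."""
    cache, env = {}, dict(inputs)
    for name, (fn, deps) in nodes.items():
        env[name] = cache[name] = fn(*[env[d] for d in deps])
    return cache

def repeat(nodes, inputs, old_inputs, cache):
    """Re-execute only nodes transitively affected by changed inputs."""
    dirty = {k for k in inputs if inputs[k] != old_inputs.get(k)}
    env, recomputed = dict(inputs), []
    for name, (fn, deps) in nodes.items():
        if any(d in dirty for d in deps):
            env[name] = cache[name] = fn(*[env[d] for d in deps])
            dirty.add(name)           # downstream nodes become dirty too
            recomputed.append(name)
        else:
            env[name] = cache[name]   # reuse the recorded result
    return cache, recomputed

nodes = {"a": (lambda x: x + 1, ["x"]),
         "b": (lambda y: y * 2, ["y"]),
         "c": (lambda a, b: a + b, ["a", "b"])}
cache = record(nodes, {"x": 1, "y": 2})
cache, rerun = repeat(nodes, {"x": 1, "y": 3}, {"x": 1, "y": 2}, cache)
print(rerun)  # ['b', 'c'] -- 'a' is reused, only the affected nodes re-run
```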