Conference Paper

Eventually Consistent Transactions

Abstract

When distributed clients query or update shared data, eventual consistency can provide better availability than strong consistency models. However, programming and implementing such systems can be difficult unless we establish a reasonable consistency model, i.e. some minimal guarantees that programmers can understand and systems can provide effectively. To this end, we propose a novel consistency model based on eventually consistent transactions. Unlike serializable transactions, eventually consistent transactions are ordered by two order relations (visibility and arbitration) rather than a single order relation. To demonstrate that eventually consistent transactions can be effectively implemented, we establish a handful of simple operational rules for managing replicas, versions and updates, based on graphs called revision diagrams. We prove that these rules are sufficient to guarantee correct implementation of eventually consistent transactions. Finally, we present two operational models (single server and server pool) of systems that provide eventually consistent transactions.
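As a rough illustration of the two relations, consider the following minimal sketch (hypothetical Python, not the paper's formalism): the value a transaction observes is computed only from transactions in its visibility set, and the visible updates are applied in arbitration order, so replicas that have seen different subsets may disagree temporarily but agree once their visibility sets converge.

    # Minimal sketch (assumed Python, illustrative only): the value a transaction
    # observes for a single register is computed from the writes of transactions
    # visible to it, applied in arbitration order, so the arbitration-latest
    # visible write wins.
    def observed_value(txn, visibility, arbitration, writes, initial=None):
        """visibility: dict txn -> set of visible txns
           arbitration: dict txn -> sortable key (a total order)
           writes: dict txn -> written value (absent for read-only txns)"""
        visible_writers = [t for t in visibility[txn] if t in writes]
        value = initial
        for t in sorted(visible_writers, key=lambda t: arbitration[t]):
            value = writes[t]          # later arbitration order overwrites earlier
        return value

    # Two transactions with different visibility sets can observe different values,
    # but once both see {t1, t2} they agree, because arbitration is a total order.
    writes = {"t1": "a", "t2": "b"}
    arbitration = {"t1": 1, "t2": 2}
    print(observed_value("t3", {"t3": {"t1"}}, arbitration, writes))        # a
    print(observed_value("t4", {"t4": {"t1", "t2"}}, arbitration, writes))  # b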
... Common applications that replay a set of messages in the same order as they were produced include event-carried state transfer and system state simulation. The former consists of having two systems read the same set of messages in the same order to replicate an entity's state within their data stores [2], [3], whereas the latter simulates a system's state at a given point in time by reproducing the messages up until that point [4], [5]. ...
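A hedged sketch of the replay idea (hypothetical Python; the names are illustrative): two consumers that fold the same ordered message log into their own stores end up with the same entity state, and stopping the replay early simulates the state at an earlier point in time.

    # Illustrative sketch (assumed Python): replaying the same ordered message log
    # on two independent consumers reproduces the same entity state in both stores.
    def replay(messages, upto=None):
        state = {}
        for i, (entity_id, field, value) in enumerate(messages):
            if upto is not None and i >= upto:
                break                      # state simulation: stop at a point in time
            state.setdefault(entity_id, {})[field] = value
        return state

    log = [("user-1", "name", "Ada"), ("user-1", "email", "ada@example.org")]
    assert replay(log) == replay(log)      # event-carried state transfer: identical replicas
    print(replay(log, upto=1))             # {'user-1': {'name': 'Ada'}}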
Preprint
Full-text available
Message brokers enable asynchronous communication between data producers and consumers in distributed environments by assigning messages to ordered queues. Message broker systems often provide mechanisms to parallelize tasks between consumers to increase the rate at which data is consumed. The consumption rate must exceed the production rate, or queues would grow indefinitely. Still, consumers are costly and their number should be minimized. We model the problem of determining the required number of consumers, and the partition-consumer assignments, as a variable-item-size bin packing variant. Data cannot be read while a queue is being migrated to another consumer. Hence, we propose the R-score metric to account for these rebalancing costs. We then introduce an assortment of R-score-based algorithms and compare their performance to established Bin Packing Problem heuristics for this application. We instantiate our method within an existing system, demonstrating its effectiveness. Our approach guarantees adequate consumption rates, something the previous system was unable to do, at lower operational costs.
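A minimal first-fit-decreasing sketch of the underlying bin-packing view (hypothetical Python; the R-score and rebalancing-cost logic from the preprint are not modeled): partitions with known consumption-rate demands are packed onto the fewest consumers whose capacity they fit.

    # Illustrative first-fit-decreasing sketch (assumed Python): assign partitions,
    # each with an estimated consumption-rate demand, to as few consumers as possible
    # without exceeding any consumer's capacity. The R-score / rebalancing-cost logic
    # described in the preprint is deliberately not modeled here.
    def assign_partitions(demands, capacity):
        consumers = []                         # each consumer: [remaining_capacity, [partitions]]
        for pid, demand in sorted(demands.items(), key=lambda kv: -kv[1]):
            for c in consumers:
                if c[0] >= demand:
                    c[0] -= demand
                    c[1].append(pid)
                    break
            else:
                consumers.append([capacity - demand, [pid]])
        return [parts for _, parts in consumers]

    print(assign_partitions({"p0": 40, "p1": 35, "p2": 30, "p3": 20}, capacity=60))
    # [['p0', 'p3'], ['p1'], ['p2']] -> three consumers suffice for these demands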
... • If the concurrent moves are down-moves, resolve the conflict deterministically, e.g., the operation with the highest priority wins. The priority of a move operation can be set by the application, with the condition that priorities are unique [51]. ...
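A minimal sketch of that deterministic rule (hypothetical Python; the field names are illustrative): among concurrent down-moves touching the same node, the unique highest-priority move wins and the others are dropped.

    # Illustrative sketch (assumed Python): resolve concurrent down-moves on the same
    # node deterministically by letting the move with the highest (unique,
    # application-chosen) priority win; the losing moves are simply dropped.
    def resolve_down_moves(moves):
        winners = {}
        for move in moves:                      # move: {"node": ..., "new_parent": ..., "priority": ...}
            node = move["node"]
            best = winners.get(node)
            if best is None or move["priority"] > best["priority"]:
                winners[node] = move
        return list(winners.values())

    concurrent = [
        {"node": "n1", "new_parent": "a", "priority": 3},
        {"node": "n1", "new_parent": "b", "priority": 7},
    ]
    print(resolve_down_moves(concurrent))       # the priority-7 move to "b" wins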
Thesis
Designing distributed applications involves a fundamental trade-off between safety and performance, as described by the CAP theorem. We focus on the cases where safety is the top requirement. For the subclass of state-based distributed systems, we propose a proof methodology for establishing that a given application maintains a given invariant. Our approach allows reasoning about individual operations separately. We demonstrate that our rules are sound, and, with a mechanized proof engine, we illustrate their use on some representative examples. For conflicting operations, the developer can choose between conflict resolution and coordination. We present a novel replicated tree data structure that supports coordination-free concurrent atomic moves and arguably maintains the tree invariant. Our analysis identifies cases where concurrent moves are inherently safe. For the remaining cases we devise a conflict resolution algorithm; the trade-off is that in some cases a move operation "loses". The coordination an application requires for safety can be implemented in many different ways. Even when restricted to locks, many configurations are possible, differing in lock granularity, type, and placement. The performance of each configuration depends on the workload. We study the "coordination lattice", i.e., the design space of lock configurations, and define a set of metrics to navigate it systematically.
... In recent years, various consistency models for replicated databases have been proposed [5,23,10,38,3,4] that strike different balances between consistency and performance, and that serve as points of comparison with gsp. ...
Thesis
Full-text available
Many of the applications we use today assure their users that they will always be available, even if the network is slow or occasionally unavailable. To achieve this, developers write applications in which state propagates asynchronously across different devices. A possible implementation consists of clients (devices) that keep a copy of the data and a leader or server that decides the order of the operations performed by users. The literature offers different models, which differ in whether propagation from clients to the server and from the server to clients is synchronous or asynchronous. In this thesis we study and implement GSP (Global Sequence Protocol), an operational model that propagates operations asynchronously in both directions, that is, from clients to the server and from the server to clients. The implementation is built on a broadcast layer called RTOB (Reliable Total Order Broadcast) that guarantees that all writes are delivered to every client, in the same order and without being lost. Specifically, we developed an open-source GSP library, focusing on studying which consistency guarantees, such as read-my-writes, causality, or prefixes, are achieved by using RTOB. Our case studies show that, in practice, GSP depends on the broadcast protocol to ensure certain consistency guarantees.
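A rough sketch of the client-side structure described here (hypothetical Python; the thesis' actual library API is not reproduced): a client keeps a confirmed prefix of operations delivered by the total-order broadcast plus a buffer of pending local writes, and reads fold both, which is what gives read-my-writes on top of the broadcast's ordering guarantees.

    # Illustrative sketch (assumed Python, not the thesis' actual library API):
    # a GSP-style client keeps the confirmed prefix of operations delivered by the
    # total-order broadcast plus its own pending (not yet delivered) writes.
    # Reads fold the prefix and then the pending writes, which yields read-my-writes.
    class GspClient:
        def __init__(self):
            self.prefix = []        # operations delivered by RTOB, same order at every client
            self.pending = []       # local writes sent but not yet delivered back

        def write(self, op):
            self.pending.append(op)            # would also be handed to the broadcast layer

        def on_deliver(self, op):
            self.prefix.append(op)             # RTOB delivers in the same total order everywhere
            if self.pending and self.pending[0] == op:
                self.pending.pop(0)            # our own write came back; stop double-counting it

        def read(self, fold, initial):
            state = initial
            for op in self.prefix + self.pending:
                state = fold(state, op)
            return state

    c = GspClient()
    c.write(("add", 5))
    print(c.read(lambda s, op: s + op[1], 0))  # 5: the client sees its own pending write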
... As attractive as dependency graphs may be as a foundation for database testing, they model orderings among object versions and operations that are not necessarily visible to external clients. Instead of defining isolation levels in terms of internal operations, some declarative definitions of isolation levels [12,10] are based upon a pair of compatible dependency relations: a visibility relation capturing the order in which writes are visible to transactions and an arbitration relation capturing the order in which writes are committed. ...
Preprint
Full-text available
Users who care about their data store it in databases, which (at least in principle) guarantee some form of transactional isolation. However, experience shows [Kleppmann 2019, Kingsbury and Patella 2019a] that many databases do not provide the isolation guarantees they claim. With the recent proliferation of new distributed databases, demand has grown for checkers that can, by generating client workloads and injecting faults, produce anomalies that witness a violation of a stated guarantee. An ideal checker would be sound (no false positives), efficient (polynomial in history length and concurrency), effective (finding violations in real databases), general (analyzing many patterns of transactions), and informative (justifying the presence of an anomaly with understandable counterexamples). Sadly, we are aware of no checkers that satisfy these goals. We present Elle: a novel checker which infers an Adya-style dependency graph between client-observed transactions. It does so by carefully selecting database objects and operations when generating histories, so as to ensure that the results of database reads reveal information about their version history. Elle can detect every anomaly in Adya et al.'s formalism [Adya et al. 2000] (except for predicates), discriminate between them, and provide concise explanations of each. This paper makes the following contributions: we present Elle, demonstrate its soundness, measure its efficiency against the current state of the art, and give evidence of its effectiveness via a case study of four real databases.
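A toy sketch of the final checking step (hypothetical Python; Elle's inference of the graph from client-observed reads is the hard part and is not shown): once an Adya-style dependency graph between transactions is known, a cycle in it witnesses an isolation anomaly.

    # Toy sketch (assumed Python): given an already-inferred Adya-style dependency
    # graph between committed transactions (edges = write-write, write-read, or
    # read-write dependencies), a cycle witnesses a serializability anomaly.
    # Elle's real contribution -- inferring these edges from client-observed reads --
    # is not modeled here.
    def find_cycle(graph):
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {n: WHITE for n in graph}

        def visit(n, path):
            color[n] = GRAY
            path.append(n)
            for m in graph.get(n, ()):
                if color[m] == GRAY:
                    return path[path.index(m):] + [m]      # found a cycle
                if color[m] == WHITE:
                    cycle = visit(m, path)
                    if cycle:
                        return cycle
            color[n] = BLACK
            path.pop()
            return None

        for n in graph:
            if color[n] == WHITE:
                cycle = visit(n, [])
                if cycle:
                    return cycle
        return None

    # T1 -> T2 (T2 read T1's write) and T2 -> T1 (T1 overwrote a version T2 read): anomaly.
    print(find_cycle({"T1": ["T2"], "T2": ["T1"]}))   # ['T1', 'T2', 'T1']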
... We now fill in the distributed semantics of Carol programs by relating guarded events to abstract executions, a standard model [Burckhardt 2014; Burckhardt et al. 2012] for describing and reasoning about executions of distributed systems. ...
Article
Full-text available
We introduce Carol, a refinement-typed programming language for replicated data stores. The salient feature of Carol is that it allows programming and verifying replicated store operations modularly, without consideration of other operations that might interleave, and sequentially, without requiring reference to or knowledge of the concurrent execution model. This is in stark contrast with existing systems, which require understanding the concurrent interactions of all pairs of operations when developing or verifying them. The key enabling idea is the consistency guard, a two-state predicate relating the locally-viewed store and the hypothetical remote store that an operation’s updates may eventually be applied to, which is used by the Carol programmer to declare their precise consistency requirements. Guards appear to the programmer and refinement typechecker as simple data pre-conditions, enabling sequential reasoning, while appearing to the distributed runtime as consistency control instructions. We implement and evaluate the Carol system in two parts: (1) the algorithm used to statically translate guards into the runtime coordination actions required to enforce them, and (2) the networked-replica runtime which executes arbitrary operations, written in a Haskell DSL, according to the Carol language semantics.
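A rough reading of the guard idea as data (hypothetical Python; this is not Carol's refinement-type syntax or runtime): a guard is a two-state predicate over the locally viewed value and a hypothetical remote value, and an update can only be applied where its guard holds.

    # Illustrative sketch (assumed Python, not Carol's actual syntax or runtime):
    # a consistency guard is a two-state predicate relating the locally viewed value
    # to any hypothetical remote value the update may later be applied to. Here, a
    # withdrawal insists the remote balance is at least the locally seen balance,
    # so subtracting a locally-checked amount can never drive the balance negative.
    def guarded_withdraw(local_balance, amount, guard):
        if amount > local_balance:
            raise ValueError("insufficient funds locally")
        # A runtime would have to enforce `guard` (e.g. by coordination) before
        # applying the update at a remote replica; we just check it point-wise here.
        def apply_at(remote_balance):
            assert guard(local_balance, remote_balance), "guard violated"
            return remote_balance - amount
        return apply_at

    # Guard: the remote store holds at least as much money as we saw locally.
    le_guard = lambda local, remote: remote >= local
    op = guarded_withdraw(local_balance=80, amount=50, guard=le_guard)
    print(op(100))   # 50: safe to apply at a replica currently holding 100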
Article
We present the first specification and verification of an implementation of a causally-consistent distributed database that supports modular verification of full functional correctness properties of clients and servers. We specify and reason about the causally-consistent distributed database in Aneris, a higher-order distributed separation logic for an ML-like programming language with network primitives for programming distributed systems. We demonstrate that our specifications are useful, by proving the correctness of small, but tricky, synthetic examples involving causal dependency and by verifying a session manager library implemented on top of the distributed database. We use Aneris's facilities for modular specification and verification to obtain a highly modular development, where each component is verified in isolation, relying only on the specifications (not the implementations) of other components. We have used the Coq formalization of the Aneris logic to formalize all the results presented in the paper in the Coq proof assistant.
Chapter
Programming loosely connected distributed applications is a challenging endeavour. Loosely connected distributed applications such as geo-distributed stores and intermittently reachable IoT devices cannot afford to coordinate among all of the replicas to ensure data consistency, due to prohibitive latency costs and the impossibility of coordination if availability is to be ensured. Thus, the state of the replicas evolves independently, making it difficult to develop correct applications. Existing solutions to this problem limit the data types that can be used in these applications and offer neither the ability to compose them into more complex data types nor transactions.
Article
When constructing complex concurrent systems, abstraction is vital: programmers should be able to reason about concurrent libraries in terms of abstract specifications that hide the implementation details. Relaxed memory models present substantial challenges in this respect, as libraries need not provide sequentially consistent abstractions: to avoid unnecessary synchronisation, they may allow clients to observe relaxed memory effects, and library specifications must capture these. In this paper, we propose a criterion for sound library abstraction in the new C11 and C++11 memory model, generalising the standard sequentially consistent notion of linearizability. We prove that our criterion soundly captures all client-library interactions, both through call and return values, and through the subtle synchronisation effects arising from the memory model. To illustrate our approach, we verify implementations against specifications for the lock-free Treiber stack and a producer-consumer queue. Ours is the first approach to compositional reasoning for concurrent C11/C++11 programs.
Article
Geographically distributed systems often rely on replicated eventually consistent data stores to achieve availability and performance. To resolve conflicting updates at different replicas, researchers and practitioners have proposed specialized consistency protocols, called replicated data types, that implement objects such as registers, counters, sets or lists. Reasoning about replicated data types has however not been on par with comparable work on abstract data types and concurrent data types, lacking specifications, correctness proofs, and optimality results. To fill in this gap, we propose a framework for specifying replicated data types using relations over events and verifying their implementations using replication-aware simulations. We apply it to 7 existing implementations of 4 data types with nontrivial conflict-resolution strategies and optimizations (last-writer-wins register, counter, multi-value register and observed-remove set). We also present a novel technique for obtaining lower bounds on the worst-case space overhead of data type implementations and use it to prove optimality of 4 implementations. Finally, we show how to specify consistency of replicated stores with multiple objects axiomatically, in analogy to prior work on weak memory models. Overall, our work provides foundational reasoning tools to support research on replicated eventually consistent stores.
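As a concrete instance of the kind of implementation such a framework specifies, here is a hedged sketch of a last-writer-wins register (hypothetical Python, not the paper's formal model): each write is tagged with a (timestamp, replica-id) pair, and merging keeps the value with the larger tag.

    # Illustrative sketch (assumed Python): a last-writer-wins register, one of the
    # data types treated in the paper. Each write carries a (timestamp, replica_id)
    # tag; merging two replica states keeps the value with the larger tag, so all
    # replicas converge regardless of the order in which states are exchanged.
    class LwwRegister:
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.tag = (0, replica_id)        # (logical timestamp, replica id)
            self.value = None

        def write(self, value, timestamp):
            self.tag = (timestamp, self.replica_id)
            self.value = value

        def merge(self, other):
            if other.tag > self.tag:          # lexicographic: timestamp, then replica id
                self.tag, self.value = other.tag, other.value

    a, b = LwwRegister("A"), LwwRegister("B")
    a.write("x", timestamp=1)
    b.write("y", timestamp=2)
    a.merge(b); b.merge(a)
    print(a.value, b.value)                   # y y -- both replicas converge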
Article
Full-text available
Enabling applications to execute various tasks in parallel is difficult if those tasks exhibit read and write conflicts. In recent work, we developed a programming model based on concurrent revisions that addresses this challenge: each forked task gets a conceptual copy of all locations that are declared to be shared. Each such location has a specific isolation type; on joins, state changes to each location are merged deterministically based on its isolation type. In this paper, we study how to specify isolation types abstractly using operation-based compensation functions rather than state-based merge functions. Using several examples, including a list with insert, delete and modify operations, we propose compensation tables as a concise, general and intuitively accessible mechanism for determining how to merge arbitrary operation sequences. Finally, we provide sufficient conditions to verify that a state-based merge function correctly implements a compensation table.
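A small sketch of the state-based side (hypothetical Python; the paper's compensation tables are not reproduced): a cumulative-integer isolation type merges a joined revision by replaying its net effect on top of the main revision, merge(original, main, joined) = main + (joined - original).

    # Illustrative sketch (assumed Python): a state-based merge function for a
    # cumulative integer isolation type in the concurrent-revisions style. The
    # forked revision's net effect (joined - original) is replayed on top of the
    # main revision, which matches the compensation "re-apply the additions" for
    # sequences of add operations. The paper's general compensation tables are
    # not modeled here.
    def merge_cumulative(original, main, joined):
        return main + (joined - original)

    original = 10          # value at the fork point
    main = 17              # the main revision added 7 concurrently
    joined = 13            # the forked revision added 3
    print(merge_cumulative(original, main, joined))   # 20: both additions survive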
Article
Full-text available
Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traffic gives a thousand-fold increase in deadlocks or reconciliations. Master-copy (primary-copy) replication schemes reduce this problem. A simple analytic model demonstrates these results. A new two-tier replication algorithm is proposed that allows mobile (disconnected) applications to propose tentative update transactions that are later applied to a master copy. Commutative update transactions avoid the instability of other replication schemes.
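A minimal sketch of the two-tier idea (hypothetical Python; the names are illustrative): a disconnected client queues tentative updates that are later re-executed against the master copy, where each one either commits or is reconciled away.

    # Illustrative sketch (assumed Python): in a two-tier scheme, a disconnected
    # client queues tentative update transactions; on reconnection each one is
    # re-executed against the master copy, where it either commits or is rejected.
    # Updates guarded by a simple base-value constraint (here: non-negative balance)
    # can be replayed at the master in whatever order it processes them.
    def reconcile(master_balance, tentative_debits):
        committed, rejected = [], []
        for amount in tentative_debits:
            if master_balance - amount >= 0:      # constraint re-checked at the master
                master_balance -= amount
                committed.append(amount)
            else:
                rejected.append(amount)
        return master_balance, committed, rejected

    print(reconcile(100, [30, 50, 40]))   # (20, [30, 50], [40]): the 40 is reconciled away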
Article
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
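A hedged sketch of the versioning idea (hypothetical Python; Dynamo's actual interfaces are not reproduced): vector clocks capture causality between object versions, so a replica can tell whether one version supersedes another or whether both must be kept as siblings for the application to reconcile.

    # Illustrative sketch (assumed Python, not Dynamo's actual API): vector clocks
    # attached to object versions let a replica decide whether one version
    # supersedes another (causally descends from it) or whether the two are
    # concurrent siblings that must be handed to the application for reconciliation.
    def descends(clock_a, clock_b):
        """True if version A has seen everything version B has."""
        keys = set(clock_a) | set(clock_b)
        return all(clock_a.get(k, 0) >= clock_b.get(k, 0) for k in keys)

    def compare(clock_a, clock_b):
        if descends(clock_a, clock_b):
            return "A supersedes B" if clock_a != clock_b else "equal"
        if descends(clock_b, clock_a):
            return "B supersedes A"
        return "concurrent siblings: application-assisted reconciliation needed"

    print(compare({"sx": 2, "sy": 1}, {"sx": 1}))            # A supersedes B
    print(compare({"sx": 1, "sy": 1}, {"sx": 1, "sz": 1}))   # concurrent siblings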
Conference Paper
Parallel or incremental versions of an algorithm can significantly outperform their counterparts, but are often difficult to develop. Programming models that provide appropriate abstractions to decompose data and tasks can simplify parallelization. We show in this work that the same abstractions can enable both parallel and incremental execution. We present a novel algorithm for parallel self-adjusting computation. This algorithm extends a deterministic parallel programming model (concurrent revisions) with support for recording and repeating computations. On record, we construct a dynamic dependence graph of the parallel computation. On repeat, we reexecute only parts whose dependencies have changed. We implement and evaluate our idea by studying five example programs, including a realistic multi-pass CSS layout algorithm. We describe programming techniques that proved particularly useful to improve the performance of self-adjustment in practice. Our final results show significant speedups on all examples (up to 37x on an 8-core machine). These speedups are well beyond what can be achieved by parallelization alone, while requiring a comparable effort by the programmer.
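A toy sketch of the repeat step (hypothetical Python; the recorded dynamic dependence graph and the parallel scheduling are not modeled): a computation is re-executed only if one of the inputs it recorded as a dependency has changed, otherwise its cached result is reused.

    # Toy sketch (assumed Python): the "repeat" half of self-adjusting computation.
    # During "record", each task notes which inputs it read and caches its result;
    # on repeat, a task is re-executed only if one of those recorded dependencies
    # changed. The dynamic dependence graph and the parallelism from the paper are
    # deliberately not modeled.
    class Recorder:
        def __init__(self):
            self.cache = {}          # task name -> (dependency snapshot, result)

        def run(self, name, deps, compute, inputs):
            snapshot = {k: inputs[k] for k in deps}
            cached = self.cache.get(name)
            if cached and cached[0] == snapshot:
                return cached[1]                     # dependencies unchanged: reuse
            result = compute(inputs)                 # dependencies changed: re-execute
            self.cache[name] = (snapshot, result)
            return result

    r = Recorder()
    inputs = {"a": 1, "b": 2, "c": 3}
    total = lambda env: env["a"] + env["b"]
    print(r.run("sum_ab", ["a", "b"], total, inputs))   # 3, computed
    inputs["c"] = 99                                     # change an unrelated input
    print(r.run("sum_ab", ["a", "b"], total, inputs))   # 3, reused from the cache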