Quality-of-Data for Consistency Levels in Geo-replicated Cloud Data Stores
ABSTRACT Cloud computing has recently emerged as a key technology to provide individuals and companies with access to remote computing and storage infrastructures. To achieve highly available yet high-performing services, cloud data stores rely on data replication. However, replication brings with it the issue of consistency. Given that data are replicated across multiple geographically distributed data centers, and to meet the increasing requirements of distributed applications, many cloud data stores adopt eventual consistency and thereby allow data-intensive operations to run with low latency. This comes at the cost of data staleness. In this paper, we prioritise data replication based on a set of flexible data semantics that can suit all types of Big Data applications, avoiding overloading both network and systems during long periods of disconnection or network partitions. We therefore integrated these data semantics into the core architecture of a well-known NoSQL data store (e.g., HBase), which leverages a three-dimensional vector-field model (i.e., regarding timeliness, number of pending updates and divergence bounds) to provision data selectively, in an on-demand fashion, to applications. This enhances the former consistency model by providing a number of required levels of consistency to different applications, such as social networks or commerce sites, where the priority of updates also differs. In addition, our implementation of the model in HBase allows updates to be tagged and grouped atomically in logical batches, akin to transactions, ensuring atomic changes and correctness of updates as they are propagated.
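The three dimensions named in the abstract (timeliness, number of pending updates, divergence bounds) can be illustrated with a minimal sketch. It is not the HBase integration described above: the names `Bounds`, `ReplicaState` and `must_replicate` are hypothetical, and the rule that exceeding any single bound triggers propagation is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    """Hypothetical per-data-class limits; all names are illustrative."""
    max_age_s: float       # timeliness: seconds an update may stay unpropagated
    max_pending: int       # sequence: number of updates allowed to be pending
    max_divergence: float  # value: relative difference allowed (0.2 = 20%)


@dataclass
class ReplicaState:
    oldest_pending_age_s: float  # age of the oldest unpropagated update
    pending_updates: int         # updates applied locally but not yet propagated
    local_value: float           # current value at this replica
    replicated_value: float      # value last propagated to the other replicas


def divergence(local: float, replicated: float) -> float:
    """Relative difference between the local and the last replicated value."""
    if replicated == 0:
        return 0.0 if local == 0 else float("inf")
    return abs(local - replicated) / abs(replicated)


def must_replicate(state: ReplicaState, bounds: Bounds) -> bool:
    """Assumed rule: propagate as soon as any one of the three bounds is exceeded."""
    return (
        state.oldest_pending_age_s > bounds.max_age_s
        or state.pending_updates > bounds.max_pending
        or divergence(state.local_value, state.replicated_value) > bounds.max_divergence
    )
```

Under these assumptions, a critical class of data would be given tight bounds (for example `Bounds(0, 0, 0.0)`, which forces propagation on every pending update), while less critical data would tolerate larger values in each dimension.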
Source available from: Sergio Esteves
ABSTRACT: Today we are increasingly dependent on critical data stored in cloud data centers across the world. To deliver high availability and improved performance, different replication schemes are used to maintain consistency among replicas. With classical consistency models, performance is necessarily degraded, and thus most highly scalable cloud data centers sacrifice consistency to some extent in exchange for lower latencies to end users. Moreover, those cloud systems blindly allow stale data to exist for some constant period of time and disregard the semantics and importance data might have, which can be used to gear consistency more wisely, combining stronger and weaker levels of consistency. To tackle this inherent and well-studied trade-off between availability and consistency, we propose the use of VFC3, a novel consistency model for replicated data across data centers, with framework and library support to enforce increasing degrees of consistency for different types of data (based on their semantics). It targets cloud tabular data stores, offering rationalization of resources (especially bandwidth) and improvement of QoS (performance, latency and availability), by providing strong consistency where it matters most and relaxing it for less critical classes or items of data. Euro-Par 2012; 01/2012
ABSTRACT: Data replication is a very relevant technique for improving performance, availability and scalability. These are requirements of many applications such as multiplayer distributed games, cooperative software tools, etc. However, consistency of the replicated shared state is hard to ensure. Current consistency models and middleware systems lack the required adaptability and efficiency. Thus, developing such robust applications is still a daunting task. We propose a new consistency model, named Vector-Field Consistency (VFC), that unifies i) several forms of consistency enforcement and multi-dimensional criteria (time, sequence and value) to limit replica divergence, with ii) techniques based on locality-awareness (w.r.t. players' positions). Based on the VFC model, we propose a generic meta-architecture that can be easily instantiated both to centralized and (dynamically) partitioned architectures: i) a single central server in which the VFC algorithm runs, or ii) a set of servers in which each one is responsible for a slice of the data being shared. The first approach is clearly better suited to ad-hoc networks of resource-constrained devices, while the second, being more scalable, is well suited to large-scale networks. We developed and evaluated two prototypes of VFC (for ad-hoc and large-scale networks) with very good performance results. 11/2010; 1:95-115. DOI:10.1007/s13174-010-0011-x
ABSTRACT: When distributed clients query or update shared data, eventual consistency can provide better availability than strong consistency models. However, programming and implementing such systems can be difficult unless we establish a reasonable consistency model, i.e. some minimal guarantees that programmers can understand and systems can provide effectively. To this end, we propose a novel consistency model based on eventually consistent transactions. Unlike serializable transactions, eventually consistent transactions are ordered by two order relations (visibility and arbitration) rather than a single order relation. To demonstrate that eventually consistent transactions can be effectively implemented, we establish a handful of simple operational rules for managing replicas, versions and updates, based on graphs called revision diagrams. We prove that these rules are sufficient to guarantee correct implementation of eventually consistent transactions. Finally, we present two operational models (single server and server pool) of systems that provide eventually consistent transactions.
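The role of the two order relations can be illustrated with a deliberately simplified sketch. It does not implement the revision diagrams or the operational rules mentioned in the abstract: the `Update` type, its integer `arbitration` field and the `read` function are hypothetical, and modelling arbitration as a single integer position is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    arbitration: int  # assumed: position in one global arbitration order
    key: str
    value: str


def read(visible: set, key: str) -> Optional[str]:
    """Return the value for key as seen by a replica.

    Visibility decides which updates are considered at all (the set passed in);
    arbitration decides which of the visible updates wins.
    """
    candidates = [u for u in visible if u.key == key]
    if not candidates:
        return None
    return max(candidates, key=lambda u: u.arbitration).value


# Two replicas that each see only one of two conflicting updates.
u1 = Update(arbitration=1, key="x", value="a")
u2 = Update(arbitration=2, key="x", value="b")
replica_a = {u1}
replica_b = {u2}
before = (read(replica_a, "x"), read(replica_b, "x"))

# Once both updates are visible everywhere, arbitration yields one answer.
merged = replica_a | replica_b
after = read(merged, "x")
```

In this sketch the replicas may disagree while their visible sets differ, and agree once every update is visible to both, because the arbitration order is the same for everyone.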