Chapter

Introduction to Azure Cosmos DB


Abstract

The database space has long been dominated by relational database management systems (RDBMSs) such as Microsoft® SQL Server or Oracle. This dominance was made possible in part by the wide range of solutions that can be built on top of those systems, but also by the powerful products that are available. There is, however, a different approach to data management, commonly known as NoSQL. The term NoSQL stands for “non SQL” or “not only SQL,” since SQL (Structured Query Language) is almost exclusively tied to relational systems. NoSQL databases have existed since the 1960s, but it was not until the early 2000s that they gained widespread popularity, with companies such as Facebook and Amazon adopting them and products such as MongoDB, Cassandra, and Redis becoming the choice of many developers.
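Since the chapter introduces Azure Cosmos DB as a NoSQL alternative, the following minimal sketch (not part of the chapter’s text) shows what querying JSON items through the Cosmos DB SQL (Core) API looks like with the azure-cosmos Python SDK. The account endpoint, key, and database/container names are hypothetical placeholders, and client options may differ between SDK versions.

# A minimal sketch (not from the chapter): querying JSON items stored in
# Azure Cosmos DB through its SQL (Core) API with the azure-cosmos Python SDK.
# The endpoint, key, and database/container names are hypothetical.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<your-account>.documents.azure.com:443/",
                      credential="<your-primary-key>")
container = client.get_database_client("store").get_container_client("products")

# Unlike rows in a fixed relational schema, each item is a JSON document;
# the SQL-like query language operates over document properties.
for item in container.query_items(
        query="SELECT c.id, c.name, c.price FROM c WHERE c.category = @cat",
        parameters=[{"name": "@cat", "value": "books"}],
        enable_cross_partition_query=True):
    print(item["id"], item["name"], item["price"])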




... Tunable consistency. To meet the needs of different applications, many popular distributed data stores begin to provide tunable consistency [10]-[13], allowing clients to specify the consistency level per individual operation. Amazon DynamoDB provides eventual consistency as default and strong consistency with ConsistentRead [10]. ...
... For example, a read with readConcern = majority guarantees that the returned value has been written to a majority of replicas. Azure Cosmos DB offers five well-defined consistency levels for read operations, namely, from strongest to weakest, Strong, Bounded staleness, Session, Consistent prefix, and Eventual [13]. For example, Strong consistency offers linearizability [5], while Eventual consistency provides no ordering guarantees for reads. ...
... Systems Providing Tunable Consistency. Several popular data stores provide tunable consistency [10]-[13]. Amazon DynamoDB [10] provides eventual consistency as the default and stronger consistency with ConsistentRead. ...
Preprint
To achieve high availability and low latency, distributed data stores often geographically replicate data at multiple sites called replicas. However, this introduces the data consistency problem. Due to the fundamental tradeoffs among consistency, availability, and latency in the presence of network partitions, no one-size-fits-all consistency model exists. To meet the needs of different applications, many popular data stores provide tunable consistency, allowing clients to specify the consistency level per individual operation. In this paper, we propose tunable causal consistency (TCC). It allows clients to choose the desired session guarantee for each operation, from the well-known four session guarantees, i.e., read your writes, monotonic reads, monotonic writes, and writes follow reads. Specifically, we first propose a formal specification of TCC in an extended (vis, ar) framework originally proposed by Burckhardt et al. Then we design a TCC protocol and develop a prototype distributed key-value store called TCCSTORE. We evaluate TCCSTORE on Aliyun. The latency is less than 38 ms for all workloads and the throughput is up to about 2800 operations per second. We also show that TCC achieves better performance than causal consistency and requires a negligible overhead when compared with eventual consistency.
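The citation contexts above mention per-operation consistency knobs in Amazon DynamoDB (ConsistentRead), MongoDB (readConcern), and Azure Cosmos DB (its five consistency levels). The Python sketch below only illustrates what those knobs look like in the respective client libraries (boto3, pymongo, azure-cosmos); table, collection, and account names are placeholders, and exact keyword options may vary by SDK version.

# Illustrative only: per-operation consistency settings in three client
# libraries (boto3, pymongo, azure-cosmos). All names, endpoints, and keys
# are hypothetical placeholders; exact options may differ by SDK version.
import boto3
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from azure.cosmos import CosmosClient

# Amazon DynamoDB: reads are eventually consistent by default; a strongly
# consistent read is requested with ConsistentRead=True.
dynamodb = boto3.resource("dynamodb")
tasks_table = dynamodb.Table("Tasks")
response = tasks_table.get_item(Key={"task_id": "42"}, ConsistentRead=True)
item = response.get("Item")

# MongoDB: readConcern "majority" only returns data acknowledged by a
# majority of replica-set members.
mongo = MongoClient("mongodb://localhost:27017")
tasks = mongo["app"]["tasks"].with_options(read_concern=ReadConcern("majority"))
doc = tasks.find_one({"task_id": "42"})

# Azure Cosmos DB: the account default consistency can be relaxed at the
# client level; here Session consistency is requested for this client.
cosmos = CosmosClient("https://<account>.documents.azure.com:443/",
                      credential="<primary-key>",
                      consistency_level="Session")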
... Azure CosmosDB [82] is Microsoft's globally distributed, fully-decentralized, multi-model database. It supports key-value, graph, and document data models. ...
Article
Efficient data storage and query processing systems play a vital role in many different research areas. Blockchain technology and distributed ledgers attract massive attention and trigger multiple projects in various industries. Nevertheless, blockchain still lacks features of database management systems (DBMSs, or simply databases), such as high throughput, low latency, and high capacity. For that purpose, there have been many proposed approaches for handling data storage and query processing solutions in the blockchain. This paper presents a complete overview of many different DBMS types and how these systems can be used to implement, enhance, and further improve blockchain technology. More concretely, we give an overview of 10 transactional, an extensive overview of 14 analytical, 9 hybrid (i.e., translytical), and 13 blockchain DBMSs. We explain how database technology has influenced the development of blockchain technology by unlocking different features, such as Atomicity, Consistency, Isolation, and Durability (ACID) transaction consistency, rich queries, real-time analysis, and low latency. Using a relaxation approach analogous to the one used to prove the Consistency, Availability, Partition tolerance (CAP) theorem, we postulate a “Decentralization, Consistency, and Scalability (DCS)-satisfiability conjecture” and give concrete strategies for achieving the relaxed DCS conditions. We also provide an overview of the different DBMSs, emphasizing their architecture, storage manager, query processing, and implementation.
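The citing context above describes Azure Cosmos DB as a multi-model database supporting key-value, graph, and document models. As a library-free illustration (not drawn from any of the cited works), the Python sketch below expresses one logical entity in each of the three models.

# A minimal, library-free sketch of the three data models a multi-model store
# such as Azure Cosmos DB can expose for the same logical entity.

# Key-value: an opaque value addressed by a key.
kv = {"user:42": '{"name": "Ada", "city": "Paris"}'}

# Document: the value itself is a structured, queryable JSON document.
doc = {"id": "42", "name": "Ada", "city": "Paris",
       "orders": [{"id": "o1", "total": 30.0}]}

# Graph: the same data as vertices and edges (property-graph style).
vertices = [{"id": "42", "label": "person", "name": "Ada"},
            {"id": "paris", "label": "city"}]
edges = [{"from": "42", "to": "paris", "label": "livesIn"}]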
... Consistency in a DBMS may be defined as a single valid state for all database instances, in the sense that every redundant database server has the same data [17,30]. Thus, system performance might be affected, as the DBMS needs to confirm that a write operation was executed on each redundant server. ...
Article
Full-text available
Cloud storage systems are increasingly adopting NoSQL database management systems (DBMSs), since they generally provide better availability and performance than traditional DBMSs. At the cost of weaker consistency guarantees, several NoSQL DBMSs allow eventual consistency, in which an operation is confirmed without checking all nodes. Different consistency levels for an operation (e.g. a read) can be adopted, and such levels may distinctly affect system behaviour. Thus, the assessment of a system design taking into account distinct consistency levels is important for developing cloud storage systems. This work proposes an approach based on reliability block diagrams and generalized stochastic Petri nets to evaluate the availability and performance of cloud storage systems, considering redundant nodes and eventual consistency based on NoSQL DBMSs. Experimental results demonstrate that system configuration may influence unavailability from 1 s to 21 h in a year, and performance can be impacted by up to 17.9%.
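The article evaluates availability with reliability block diagrams and stochastic Petri nets. As a much simpler illustration (not the paper’s model), the Python sketch below computes the availability of N redundant replicas under the standard parallel-RBD assumption that the system is up while at least one replica is up, and converts the result into expected yearly downtime. The per-node availability of 99.9% is an arbitrary example value.

# Illustrative only: steady-state availability of N independent redundant
# replicas under a parallel reliability-block-diagram assumption (the system
# is available if at least one replica is available). Numbers are examples.

def parallel_availability(node_availability: float, replicas: int) -> float:
    """Availability when any one of `replicas` identical nodes suffices."""
    return 1.0 - (1.0 - node_availability) ** replicas

HOURS_PER_YEAR = 8760

for n in (1, 2, 3):
    a = parallel_availability(0.999, n)       # each node up 99.9% of the time
    downtime_h = (1.0 - a) * HOURS_PER_YEAR   # expected yearly downtime
    print(f"{n} replica(s): availability={a:.6f}, downtime={downtime_h:.2f} h/year")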
... CosmosDB: Azure Cosmos DB [41] is Microsoft's globally distributed, fully-decentralized, multi-model database. The database models can be key-value, graph, or document. ...
Preprint
Full-text available
This work is about the mutual influence between two technologies: databases and blockchain. It addresses two questions: 1. How has database technology influenced the development of blockchain technology? 2. How has blockchain technology influenced the introduction of new functionalities in some modern databases? For the first question, we explain how database technology contributes to blockchain technology by unlocking different features such as ACID (Atomicity, Consistency, Isolation, and Durability) transactional consistency, rich queries, real-time analytics, and low latency. We explain how the CAP (Consistency, Availability, Partition tolerance) theorem known for databases influenced the DCS (Decentralization, Consistency, Scalability) theorem for blockchain systems. Using a relaxation approach analogous to the one used for the proof of the CAP theorem, we postulate a “DCS-satisfiability conjecture.” For the second question, we review different databases that are designed specifically for blockchain and provide most of the blockchain functionality, such as immutability, privacy, and censorship resistance, along with database features.
Article
In many scenarios, information must be disseminated over intermittently-connected environments when the network infrastructure becomes unavailable, e.g., during disasters where first responders need to send updates about critical tasks. If such updates pertain to a shared data set, dissemination consistency is important. This can be achieved through causal ordering and consensus. Popular consensus algorithms, e.g., Paxos, are most suited for connected environments. While some work has been done on designing consensus algorithms for intermittently-connected environments, such as the One-Third Rule (OTR) algorithm, there is still a need to improve their efficiency and timely completion. We propose CoNICE, a framework to ensure consistent dissemination of updates among users in intermittently-connected, infrastructure-less environments. It achieves efficiency by exploiting hierarchical namespaces for faster convergence and lower communication overhead. CoNICE provides three levels of consistency to users, namely replication, causality and agreement. It uses epidemic propagation to provide adequate replication ratios, and optimizes and extends Vector Clocks to provide causality. To ensure agreement, CoNICE extends OTR to also support long-term network fragmentation and decision invalidation scenarios; we define local and global consensus as pertaining to within and across fragments, respectively. We integrate CoNICE’s consistency preservation with a naming schema that follows a topic hierarchy-based dissemination framework, to improve functionality and performance. Using the Heard-Of model formalism, we prove CoNICE’s consensus to be correct. Our technique extends previously established proof methods for consensus in asynchronous environments. Performing city-scale simulation, we demonstrate CoNICE’s scalability in achieving consistency, in terms of convergence time, network resource utilization, and reduced energy consumption.
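CoNICE extends Vector Clocks to provide causality. The sketch below is a textbook vector-clock implementation in Python (not CoNICE’s optimized or extended variant), showing the standard increment and merge rules and the happens-before test they induce.

# A textbook vector-clock sketch (not CoNICE's optimized/extended variant).
# Each replica increments its own entry on a local update and takes the
# element-wise maximum when it receives a remote clock.

def local_event(clock: dict, node: str) -> dict:
    """Return a new clock after a local update at `node`."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def merge(local: dict, remote: dict, node: str) -> dict:
    """Merge a received clock, then count the receive as a local event."""
    merged = {k: max(local.get(k, 0), remote.get(k, 0))
              for k in set(local) | set(remote)}
    return local_event(merged, node)

def happened_before(a: dict, b: dict) -> bool:
    """True if clock `a` is causally before clock `b`."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

# Example: an update at node A is delivered to node B.
a_clock = local_event({}, "A")            # {'A': 1}
b_clock = merge({}, a_clock, "B")         # {'A': 1, 'B': 1}
print(happened_before(a_clock, b_clock))  # True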
Conference Paper
In many scenarios, information must be disseminated over intermittently-connected environments when network infrastructure becomes unavailable. Example scenarios include disasters in which first responders need to send updates about their tasks and provide critical information for search and rescue. If such updates pertain to a shared data set (e.g., pins on a map), their consistent dissemination is important. We can achieve this through causal ordering and consensus. Popular consensus algorithms, such as Paxos and Raft, are most suited for connected environments with reliable links. While some work has been done on designing consensus algorithms for intermittently-connected environments, such as the One-Third Rule (OTR) algorithm, there is a need to improve their efficiency and timely completion. We propose CoNICE, a framework to ensure consistent dissemination of updates among users in intermittently-connected, infrastructure-less environments. It achieves efficiency by exploiting hierarchical namespaces for faster convergence and lower communication overhead. CoNICE provides three levels of consistency to users’ views, namely replication, causality and agreement. It uses epidemic propagation to provide adequate replication ratios, and optimizes and extends Vector Clocks to provide causality. To ensure agreement, CoNICE extends basic OTR to support long-term fragmentation and critical decision invalidation scenarios. We integrate the multi-level consistency schema of CoNICE with a naming schema that follows a topic hierarchy-based dissemination framework, to improve functionality and performance. Performing city-scale simulation experiments, we demonstrate that CoNICE is effective in achieving its consistency goals, and is efficient and scalable in the time for convergence and utilized network resources.
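CoNICE builds its agreement layer on the One-Third Rule (OTR). As a rough, simplified illustration based on the usual description of OTR in the Heard-Of model literature (and not CoNICE’s extended version), the Python sketch below applies a single OTR round at one process.

# A simplified single round of the One-Third Rule (OTR) consensus algorithm,
# following its usual Heard-Of model description; CoNICE extends OTR further,
# and this sketch is not CoNICE's version.
from collections import Counter

def otr_round(received: list, n: int):
    """Apply one OTR round at a process.

    `received` holds the values heard this round, `n` is the total number
    of processes. Returns (new_estimate, decision); both are None when too
    few messages were heard (the process keeps its previous estimate).
    """
    if len(received) <= 2 * n / 3:
        return None, None
    counts = Counter(received)
    top = max(counts.values())
    new_estimate = min(v for v, c in counts.items() if c == top)
    decision = next((v for v, c in counts.items() if c > 2 * n / 3), None)
    return new_estimate, decision

# Example with n = 6: value 1 is heard 5 times (> 2n/3 = 4), so it is decided.
print(otr_round([1, 1, 1, 1, 1, 2], n=6))   # (1, 1)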
Article
Cloud-native databases are becoming increasingly important in the era of cloud computing, due to application needs for elasticity and on-demand usage. These challenges from cloud applications, which cannot be fully addressed by traditional on-premises enterprise database systems, present new opportunities for cloud-native databases. A cloud-native database leverages software-hardware co-design to exploit accelerations offered by new hardware such as RDMA and NVM, and by kernel-bypassing protocols such as DPDK. Meanwhile, new design architectures, such as shared storage, enable a cloud-native database to decouple computation from storage and provide excellent elasticity. For highly concurrent workloads that require horizontal scalability, a cloud-native database can leverage a shared-nothing layer to provide distributed query and transaction processing. Applications also require cloud-native databases to offer high availability through distributed consensus protocols. At Alibaba, we have explored a suite of technologies to design cloud-native database systems. Our storage engine, X-Engine, together with PolarFS, improves both write and read throughput by using an LSM-tree design and self-adapted separation of hot and cold data records. Based on these efforts, we have designed and implemented POLARDB and its distributed version POLARDB-X, which has successfully supported the extreme transaction workloads during the 2018 Global Shopping Festival on November 11, 2018, and achieved commercial success on Alibaba Cloud. We have also designed an OLAP system called AnalyticDB (ADB in short) for enabling real-time interactive data analytics for big data. We have explored a self-driving database platform to achieve autoscaling and intelligent database management. We report key technologies and lessons learned to highlight the technical challenges and opportunities for cloud-native database systems at Alibaba.
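X-Engine improves throughput with an LSM-tree design. The toy Python sketch below illustrates only the general LSM idea (an in-memory memtable flushed into sorted immutable runs, with reads checking the newest data first); it is not Alibaba’s implementation, and the flush threshold is arbitrary.

# A toy LSM-tree-style store illustrating the general idea behind engines such
# as X-Engine: writes go to an in-memory memtable, which is periodically
# flushed into sorted, immutable runs; reads check the newest data first.
# Illustration only, not Alibaba's implementation.

class ToyLSM:
    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}          # newest, mutable, in-memory
        self.runs = []              # older, immutable, sorted runs (newest first)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable into a sorted immutable run.
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:            # check the hottest data first
            return self.memtable[key]
        for run in self.runs:               # then runs, newest to oldest
            for k, v in run:
                if k == key:
                    return v
        return None

store = ToyLSM()
for i in range(6):
    store.put(f"k{i}", i)
print(store.get("k1"), store.get("k5"))     # 1 5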