Figure 1- uploaded by João Ricardo Lourenço
CAP theorem with databases that “choose” CA, CP and AP

Source publication
Article
For over forty years, relational databases have been the leading model for data storage, retrieval and management. However, due to increasing needs for scalability and performance, alternative systems have emerged, namely NoSQL technology. The rising interest in NoSQL technology, as well as the growth in the number of use case scenarios, over the l...

Context in source publication

Context 1
... data at the same time [42]. Indeed, of Brewer's CAP theorem, most databases choose to be "AP", meaning they provide Availability and Partition-Tolerance. Since Partition-Tolerance is a property that often cannot be traded off, Availability and Consistency are juggled, with most databases sacrificing more consistency than availability [43]. In Fig. 1, an illustration of CAP is ...
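The trade-off described above can be illustrated with a minimal sketch, assuming a toy model of two replicas that either replicate synchronously or are cut off from each other. The names `Replica` and `Unavailable` and the mode labels `"AP"` and `"CP"` are hypothetical and do not come from the source publication or from any real database.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP replica refuses a request during a partition."""


@dataclass
class Replica:
    mode: str                                  # "AP" or "CP" (hypothetical labels)
    data: dict = field(default_factory=dict)   # local copy of the key-value data
    partitioned: bool = False                  # True when cut off from the peer

    def write(self, key, value, peer):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: reject the write rather than diverge.
                raise Unavailable("cannot reach peer")
            # Availability first: accept locally; the peer becomes stale.
            self.data[key] = value
            return
        # No partition: apply the write on both replicas.
        self.data[key] = value
        peer.data[key] = value

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # A CP replica cannot prove its copy is current, so it refuses.
            raise Unavailable("cannot reach peer")
        return self.data.get(key)
```

In this sketch, two `"AP"` replicas keep answering during a partition but can return different values for the same key, while two `"CP"` replicas raise `Unavailable` and keep their copies identical. Real systems usually expose this choice as tunable settings rather than a single switch.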

Citations

... Thus, we surveyed DBMSs to compare additional quality attributes. The work of Lourenço et al. [24] gave us a starting point, as shown in Table II. Although it is not a database but rather a pattern, we found it worthwhile to also analyze CQRS [25] as a candidate solution. ...
Conference Paper
[Context] Data-intensive systems, a.k.a. big data systems (BDS), are software systems that handle a large volume of data in the presence of performance quality attributes, such as scalability and availability. Before the advent of big data management systems (e.g. Cassandra) and frameworks (e.g. Spark), organizations had to cope with large data volumes with custom-tailored solutions. In particular, a decade ago, Tecgraf/PUC-Rio developed a system to monitor a truck fleet in real time and proactively detect events from the positioning data received. Over the years, the system evolved into a large, complex, and obsolescent code base involving a costly maintenance process. [Goal] We report our experience on replacing a legacy BDS with a microservice-based event-driven system. [Method] We applied action research, investigating the reasons that motivate the adoption of a microservice-based event-driven architecture, intervening to define the new architecture, and documenting the challenges and lessons learned. [Results] We perceived that the resulting architecture enabled easier maintenance and fault isolation. However, the myriad of technologies and the complex data flow were perceived as drawbacks. Based on the challenges faced, we highlight opportunities to improve the design of big data reactive systems. [Conclusions] We believe that our experience provides helpful takeaways for practitioners modernizing systems with data-intensive requirements.
... Recently, we have witnessed the emergence of many NoSQL systems, such as MongoDB [40], HBase [41], VoltDB [42], Cassandra [5], Voldemort [42], Redis [22], Memcached [19], Pilaf [35], HERD [34], HydraDB [33], FaRM [37], DrTM [36], and Nessie [43]. Therefore, there have been several efforts to evaluate NoSQL systems [44], [45], [46], [47], [48]. Some of these studies performed experimental analyses, and some adopted case studies in their evaluation. ...
Article
In-memory key-value stores have quickly become a key enabling technology to build high-performance applications that must cope with massively distributed workloads. In-memory key-value stores (also referred to as NoSQL) primarily aim to offer low-latency and high-throughput data access, which motivates the rapid adoption of modern network cards such as Remote Direct Memory Access (RDMA). In this paper, we present the fundamental design principles for exploiting RDMA in modern NoSQL systems. Moreover, we describe a breakdown analysis of state-of-the-art RDMA-based in-memory NoSQL systems regarding indexing, data consistency, and the communication protocol. In addition, we compare traditional in-memory NoSQL systems with their RDMA-enabled counterparts. Finally, we present a comprehensive analysis and evaluation of the existing systems according to the impact of the number of clients, real-world request distributions, and workload read-write ratios.
... Tables contain all information about relationships between different data. In the Product table example, each row is a specific product and columns list product attributes, such as colour, size, etc. [17]. ...
Article
As Big Data applications grow, many existing systems are expected to expand their services to cope with the dramatic increase in data. New software systems no longer work on a single database but on multiple databases. These distributed data sources are known as NoSQL (Not only Structured Query Language) databases. Several companies try to take advantage of these technologies without abandoning their traditional systems. In particular, Data Warehouses (DW) are designed based on user feedback. To allow and support this integration, a mechanism that takes data from NoSQL databases and stores it in relational databases is needed to add value without impacting an organization's existing systems. This paper proposes an integration algorithm to support a hybrid database architecture, including MongoDB and MySQL, by allowing users to query data from NoSQL systems into relational SQL (Structured Query Language) systems.
... MOST COMMON DATABASES AGAINST QUALITY ATTRIBUTES [23] ...
... et al. [40] have conducted another study to benchmark and compare three NoSQL databases (MongoDB, Cassandra, HBase). In this study, the systems were deployed on the Amazon EC2 platform with various cluster configurations and virtual nodes in order to evaluate the impact of different configurations on the various databases. Lourenço et al. [41] conducted a qualitative study assessing and comparing the quality attributes of various NoSQL systems (e.g., Cassandra, Aerospike, Couchbase, CouchDB, MongoDB, HBase, Voldemort). The study concluded that each system differs from the others and that the required functionalities and mechanisms strongly affect the choice of database. ...
Article
With the enormous growth in the availability and usage of Big Data storage and processing systems, it has become essential to assess the various performance aspects of these systems so that we can carefully understand their strengths and weaknesses. In practice, when an individual or enterprise aims to develop a Big Data storage and processing solution for harnessing the knowledge inside their data, they are challenged by the availability of several frameworks from which they need to select. This is a challenging task that needs to be guided by good knowledge of the various perspectives of such systems. Additionally, the choice normally varies from one scenario to another according to the essential needs of the application. In practice, there is no single benchmark study that can cover the different types of big data processing requirements, systems, application scenarios and metrics. Several benchmarks and benchmarking studies have been developed, where each study focuses on some representative type of framework and considers only some aspects. In this article, we provide a comprehensive survey and analysis of the state-of-the-art of benchmarking the different types of big data systems (e.g., NoSQL databases, Big SQL engines, Big Streaming engines, Big Graph Processing engines, Big Machine/Deep Learning engines). Additionally, we highlight some of the significant open challenges and missing requirements of current benchmarks of big data systems with suggestions of directions for future extensions and improvements.
... Hence, utilizing the state-of-the-art performance benefits provided by database systems for Big Data applications, without compromising security, is the new challenge for modern-day database systems. There has been a lot of research comparing different datastores in the past [2] [3] [4] based on performance and quality attributes; yet, there has been no security- and privacy-focused classification of different database models that places more emphasis on the security/privacy aspects of database systems. ...
Article
For many decades, the relational database model has been considered the leading model for data storage and management. However, as the Big Data explosion has generated a large volume of data, alternative models like NoSQL and NewSQL have emerged. With the advancement of communication technology, these database systems have the potential to change the existing architecture from a centralized mechanism to a distributed one and to be deployed as cloud-based solutions. Though all these evolving technologies mostly focus on performance guarantees, it remains a major concern how these systems can ensure the security and privacy of the information they handle. Different datastores support different types of integrated security mechanisms; however, most non-relational database systems have overlooked the security requirements of modern Big Data applications. This paper reviews security implementations in today's leading database models, with an emphasis on security and privacy attributes. A set of standard security mechanisms has been identified and evaluated based on different security classifications. This provides a thorough review and a comprehensive analysis of the maturity of security and privacy implementations in these database models, along with future directions/enhancements, so that data owners can decide on the most appropriate datastore for their data-driven Big Data applications.
... Under each category, there are many specific instances of NoSQL databases. Different NoSQL stores have different characteristics and applicability [8][9][10][11][12]. In [8], the researchers elucidated the design decisions of NoSQL stores with regard to four design principles of distributed database systems: the data model, the consistency model, data partitioning, and the CAP theorem. ...
Article
The dependability and elasticity of various NoSQL stores in critical applications are still worth studying. Currently, cluster and backup technologies are commonly used for improving NoSQL availability, but these approaches do not consider the availability reduction when NoSQL stores encounter performance bottlenecks. In order to enhance the availability of Riak TS effectively, a resource-aware mechanism is proposed. Firstly, the data table is sampled according to time, the correspondence between time and data is acquired, and the real-time resource consumption is recorded by Prometheus. Based on the sampling results, a polynomial curve fitting algorithm is used to construct a prediction curve. Then the resources required for the upcoming operation are predicted from the time interval in the SQL statement, and the operation is evaluated by comparing them with the remaining resources. Using a real hydrological sensor dataset as experimental data, the effectiveness of the mechanism is evaluated in terms of sensitivity and specificity. The results show that with the availability enhancement mechanism, the average specificity is 80.55% and the sensitivity is 76.31% when using the initial sampling dataset. As the training datasets increase, the specificity increases from 80.55% to 92.42%, and the sensitivity increases from 76.31% to 87.90%. Besides, the availability increases from 40.33% to 89.15% in hydrological application scenarios. Experimental results show that this resource-aware mechanism can effectively prevent potential availability problems and enhance the availability of Riak TS. Moreover, as the number of users and the size of the data collected grow, our method will become more accurate.
... Related technologies include data models, queries, concurrency controls, partitions, and replication. (2) Lourenço et al. [7] compared several quality attributes across several NoSQL databases. The evaluated NoSQL databases include Aerospike, Cassandra, Couchbase, CouchDB, HBase, MongoDB, and Voldemort, while the quality attributes include availability, consistency, durability, maintainability, read and write performance, recovery time, reliability, robustness, scalability, and stabilization time. ...
Article
The popularization of big data means that enterprises need to store more and more data. The data in an enterprise’s database must be accessed as fast as possible, but the Relational Database (RDB) has a speed limitation due to the join operation. Many enterprises have switched to a NoSQL database, which can meet the requirement of fast data access. However, there are hundreds of NoSQL databases. It is important to select a suitable NoSQL database for a given enterprise because this decision will affect the performance of the enterprise's operations. In this paper, fifteen categories of NoSQL databases are introduced to identify the characteristics of each category. Some principles and examples are proposed for choosing an appropriate NoSQL database for different industries.
... This survey enables the user to choose the best key-value store according to the needs of his or her application. There is another survey that presents an up-to-date and concise comparison of NoSQL databases on the basis of quality attributes [21]. Since software engineers and architects make design-oriented decisions and develop software using quality attributes, this survey evaluates the NoSQL databases from the perspective of quality attributes. ...
Article
Key-Value stores (KVSs) are the most flexible and simplest model of NoSQL databases, which have become highly popular over the last few years due to their salient features such as availability, portability, reliability, and low operational cost. From the perspective of software engineering, the chief challenge for KVSs is achieving software quality attributes (consistency, throughput, latency, security, performance, load balancing, and query processing) to ensure quality. The presented research is a Systematic Literature Review (SLR) to find the state-of-the-art research in the KVS domain and thereby determine the major challenges and solutions. This work reviews 45 papers published between 2010 and 2018 that were found to be closely relevant to our study area. The results show that performance is addressed in 31% of the studies, consistency is addressed in 20% of the studies, latency and throughput are addressed in 16% of the studies, query processing is addressed in 13% of the studies, security is addressed in 11% of the studies, and load balancing is addressed in 9% of the studies. Different models are used for execution. The indexing technique was used in 20% of the studies, the hashing technique was used in 13% of the studies, the caching and security techniques were used together in 9% of the studies, the batching technique was used in 5% of the studies, the encoding techniques and Paxos technique were used together in 4% of the studies, and 36% of the studies used other techniques. This systematic review will enable researchers to design key-value stores as efficient storage. Regarding future collaborations, trust and privacy are the quality attributes that can be addressed; KVS is an emerging facet due to its widespread popularity, opening the way to deploy it with proper protection.
... NoSQL database management systems (DBMS) can be classified into four categories: Key-Value databases, Document Oriented databases, Column Oriented databases and Graph databases. This classification is due to the fact that each type of database arises in a specific context and is based on a different architecture [4]. Comparing different models provides a clear vision for choosing the most appropriate model for a given context. ...
Article
Relational database management systems (RDBMS) have been established for more than three decades as the de facto standard for data storage, management, and analysis. They have a good reputation for supporting ACID properties (Atomicity, Consistency, Isolation, and Durability) and for adopting SQL, which has become a standardized language. However, despite their power, RDBMS have failed to meet the requirements of modern applications. That’s why the need arises for new database management systems that support the manipulation of large amounts of data. NoSQL database systems allow a flexible schema, whereas RDBMSs require a strictly defined schema. They support horizontal scalability, prioritize data availability over consistency (BASE properties), and maintain good performance as they scale. In this paper, we present an experimental comparison between a relational database (MySQL) and a NoSQL database (HBase) in terms of runtime and latency in different scenarios using the YCSB Framework.