Conference Paper

MDA-Based Approach for NoSQL Databases Modelling


Abstract

It is widely accepted today that relational systems are not appropriate for handling Big Data. This has led to a new category of databases, commonly known as NoSQL databases, that were created in response to the need for better scalability, higher flexibility and faster data access. These systems have proven their efficiency in storing and querying Big Data. Unfortunately, only a few works have presented approaches to implement conceptual models describing Big Data in NoSQL systems. This paper proposes an automatic MDA-based approach that provides a set of transformations, formalized with the QVT language, to translate UML conceptual models into NoSQL models. In our approach, we build an intermediate logical model compatible with column-, document- and graph-oriented systems. The advantage of using a unified logical model is that this model remains stable even as the NoSQL systems evolve over time, which simplifies the transformation process and saves developers effort and time.
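
The two-step idea in the abstract (conceptual model, then a unified logical model, then a system-specific model) can be illustrated with a toy sketch. This is not the paper's QVT transformation: the class names `UmlClass` and `LogicalTable`, the functions `uml_to_logical` and `logical_to_physical`, and the one-to-one mapping rule are all assumptions made for illustration. Only the overall pipeline shape comes from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UmlClass:
    """Conceptual level: a UML class with typed attributes (hypothetical shape)."""
    name: str
    attributes: Dict[str, str]  # attribute name -> UML type name


@dataclass
class LogicalTable:
    """Unified logical level: one generic structure shared by all targets."""
    name: str
    columns: Dict[str, str]


def uml_to_logical(classes: List[UmlClass]) -> List[LogicalTable]:
    # Assumed rule: each class becomes one logical table with the same attributes.
    return [LogicalTable(c.name, dict(c.attributes)) for c in classes]


# Vocabulary used by each family of target systems for its main container.
TARGET_TERMS = {"column": "table", "document": "collection", "graph": "node label"}


def logical_to_physical(tables: List[LogicalTable], target: str) -> List[str]:
    # Only this last step depends on the target system; the logical model is reused.
    term = TARGET_TERMS[target]
    return [f"{term} {t.name}({', '.join(t.columns)})" for t in tables]


logical = uml_to_logical([UmlClass("Patient", {"id": "Integer", "name": "String"})])
for target in TARGET_TERMS:
    print(logical_to_physical(logical, target))
```

In this sketch, supporting a new target system means adding an entry to the last step only, which is the stability argument the abstract makes for the intermediate model.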


... A systematic literature review of the current state of research regarding database design methods in the new database era is performed in [21]. Some NoSQL DB design approaches are based on a Model-Driven Architecture (MDA), i.e., on transformation rules starting from UML diagrams and generating a NoSQL physical data model [1]. A generic logical model that describes data according to the common features of the three types of NoSQL systems (column-oriented, document-oriented, and graph-oriented) is used here as an intermediate link. ...
... There are some approaches using, e.g., UML [22]. The MDA-based approach presented in [1] implements the UML conceptual model describing Big Data in column-oriented NoSQL systems. These approaches are rather conservative and not sufficiently general for NoSQL data modelling. ...
... Then NAME:T is a member of TO. Gaspar' or, in a more friendly way, b.AUTHORS [1].NAME = 'Gaspar' ...
Chapter
NoSQL databases (DB) support the ability to handle large volumes of data in the absence of an explicit data schema. On the other hand, schema information is sometimes essential for applications during data retrieval. Consequently, there are approaches to schema construction, e.g., in the JSON DB and graph DB communities. The difference between a conceptual and a database schema is often vague in this case. We use functional constructs (typed attributes) for a conceptual view of the DB that provide a sufficiently structured approach for expressing the semantics of document and graph data. Attribute names are natural language expressions. Such typed functional data objects can be manipulated by terms of a typed λ-calculus, providing powerful nonprocedural query features for the considered data structures. The calculus is extensible: logical and arithmetic operations and aggregation functions can be included. Conceptual and database modelling effectively merge in this case. The paper focuses on conceptual/database schemas for JSON and graph NoSQL data models.
Keywords: Typed lambda calculus; Functional data objects; NoSQL databases; Relational databases; Database integration
... Therefore, Abdelhedi et al. [6] explain how to store Big Data in NoSQL databases, and they propose an MDA-based approach that transforms a UML conceptual model describing Big Data into a column-oriented NoSQL model. The result of this transformation is a PSM model. ...
... The result of this transformation is a PSM model. This paper aims to rethink the work presented in [6]. However, we develop the transformation rules using the MOF 2.0 QVT standard to generate a file that contains the code for creating a column-oriented NoSQL model. ...
... The purpose of the work [10] presented by Abdelhedi et al. is to implement a conceptual model describing Big Data in a NoSQL database, and they choose to focus on the column-oriented NoSQL model. This paper aims to rethink and complete the work presented by Abdelhedi et al. [6,10] by applying the MOF 2.0 QVT standard and Acceleo to develop transformation rules that automatically generate the creation code of a column-oriented NoSQL database. It is currently the only work pursuing this goal. ...
Article
The growth of application architectures in all areas (e.g. Astrology, Meteorology, E-commerce, social networks, etc.) has resulted in an exponential increase in data volumes, now measured in petabytes. Managing these volumes of data has become a problem that relational databases are no longer able to handle because of their ACID properties. In response to this scaling up, new concepts such as NoSQL have emerged. In this paper, we show how to design and apply transformation rules to migrate from an SQL relational database to a Big Data solution within NoSQL. For this, we use the Model Driven Architecture (MDA) and transformation languages such as MOF 2.0 QVT (Meta-Object Facility 2.0 Query-View-Transformation) and Acceleo, which define the meta-models for developing the transformation model. The transformation rules defined in this work can generate, from the class diagram, CQL code for creating a column-oriented NoSQL database.
... In parallel, there is increasing adoption of the Model-Driven Engineering (MDE) practices [11,14] as MDE has been proven useful to tame the development complexity. Unfortunately, while we have several solutions to facilitate the integration of SQL [10,16] and No-SQL [1,7] backends in MDE-based processes, there is a lack of solutions to support applications that rely on headless CMSs as a content source. Therefore, the integration of headless CMSs with other apps has been done by manual solutions, being these solutions time-consuming and error-prone. ...
... Egea [10], and Nguyen [16]), NoSQL databases (e.g. [8], [1], [5]), Data Warehouses (e.g. [27]) and even spreadsheets [2] have been integrated in MDE processes. ...
Chapter
Content Management Systems (CMSs) are the most popular tool when it comes to create and publish content across the web. Recently, CMSs have evolved, becoming headless. Content served by a headless CMS aims to be consumed by other applications and services through REST APIs rather than by human users through a web browser. This evolution has enabled CMSs to become a notorious source of content to be used in a variety of contexts beyond pure web navigation. As such, CMS have become an important component of many information systems. Unfortunately, we still lack the tools to properly discover and manage the information stored in a CMS, often highly customized to the needs of a specific domain. Currently, this is mostly a time-consuming and error-prone manual process. In this paper, we propose a model-based framework to facilitate the integration of headless CMSs in software development processes. Our framework can discover and explicitly represent the information schema behind the CMS. This facilitates designing the interaction between the CMS model and other components consuming that information. These interactions are then generated as part of a middleware library that offers platform-agnostic access to the CMS to all the client applications. The complete framework is open-source and available online.
... Abdelhedi et al. [4] document-oriented, graph column -UML MongoDB, Neo4j, and Cassandra models Abdelhedi et al. [1] document-oriented, graph column -XMI MongoDB, Neo4j, and Cassandra models Imam et al. [41] document-oriented --MongoDB models Shoval [66] graph -UML -Roy-Hubara et al. [60] graph -UML -Reniers et al. [58] document-oriented -new notation de la Vega et al. [25] document-oriented, column -UML JSON Varga et al. [73] graph ERD -XML Bugiotti et al. [16] key-value UML -Oracle NoSQL model and column NoSQL databases (see Table 10). For this, a list of inputs is required like system requirement (availability, consistency or scalability), CRUD operations, entities, and a number of records. ...
... Benchmark [53] Evaluation [60] Guidelines [6], [9], [12], [14], [17], [20], [21], [29], [37], [38], [40], [43], [56], [61], [65], [68], [71], [72], [73], [74], [76], [79], [80] Migration [34] Ontology [12] Process Transform [1], [2], [3], [4], [5], [13], [23], [26], [28], [47], [48], [49], [50], [54], [58], [59], [62], [63], [66], [69], [75], [77], [78] Query Oriented [46] Schema Generation [39], [41], [52] A distribution of NoSQL databases types along with the contexts where the models were used is shown in Figure 8. ...
Article
Modeling is one of the most important steps in developing a database. In traditional databases, the Entity Relationship (ER) and Unified Modeling Language (UML) models are widely used. But how are NoSQL databases being modeled? We performed a systematic mapping review to answer three research questions to identify and analyze the levels of representation, models used, and contexts where the modeling process occurred in the main categories of NoSQL databases. We found 54 primary studies where we identified that conceptual and logical levels received more attention than the physical level of representation. The UML, ER, and new notation based on ER and UML were adapted to model NoSQL databases, in the same way, formats such as JSON, XML, and XMI were used to generate schemas through the three levels of representation. New contexts such as benchmark, evaluations, migration, and schema generation were identified, as well as new features to be considered for modeling NoSQL databases, such as the number of records by entities, CRUD operations, and system requirements (availability, consistency, or scalability). Additionally, a coupling and co-citation analysis was carried out to identify relevant works and researchers.
... To properly define this process, it is necessary to know the framework in which it will fit. This framework is our Object2NoSQL approach, developed in previous work [12]. This section outlines this model transformation approach. ...
... In [12], we have only considered some constraints such as data type and the uniqueness constraint for identifiers. In this article, we aim to complete our approach taking into account other additional constraints, defined using the Object Constraint Language (OCL), that require writing code (cf. ...
Article
Big data have received a great deal of attention in recent years. Not only is the amount of data on a completely different level than before, but also the authors have different type of data including factors such as format, structure, and sources. This has definitely changed the tools one needs to handle big data, giving rise to NoSQL systems. While NoSQL systems have proven their efficiency to handle big data, it's still an unsolved problem how the automatic storage of big data in NoSQL systems could be done. This paper proposes an automatic approach for implementing UML conceptual models in NoSQL systems, including the mapping of the associated OCL constraints to the code required for checking them. In order to demonstrate the practical applicability of the work, this paper has realized it in a tool supporting four fundamental OCL expressions: iterate-based expressions, OCL predefined operations, If expression, and Let expression.
... In contrast, we create several structuring alternatives in order to evaluate them with our structural metrics before making a final decision. The works presented in (Abdelhedi et al., 2017) and (Atzeni et al., 2016) bring together the concepts of different NoSQL data families in order to create, from a UML model, structuring alternatives according to the target system. Like us, Abdelhedi et al. (Abdelhedi et al., 2017) create several structuring alternatives for document-oriented systems, but also for other systems such as column-oriented and graph-oriented ones. ...
... The works presented in (Abdelhedi et al., 2017) and (Atzeni et al., 2016) bring together the concepts of different NoSQL data families in order to create, from a UML model, structuring alternatives according to the target system. Like us, Abdelhedi et al. (Abdelhedi et al., 2017) create several structuring alternatives for document-oriented systems, but also for other systems such as column-oriented and graph-oriented ones. The validation of these works relies mainly on query response time, whereas we propose a set of metrics that makes it possible to evaluate structures statically. ...
... Other studies adopt a model-driven architecture (MDA) to transform a class diagram into NoSQL DB. [1] transforms a class diagram into a NoSQL DB. The authors present a common logical model which describes the four families of NoSQL DB. ...
... UML classes from the PIM1 are transformed to the PIM2 using traditional mappings between concepts, keys and their values. Rules are expressed in QVT (for more details, see [1]). ...
Chapter
With data evolution in terms of volume, variety and velocity, Information Systems (IS) administrators need to find the best solution to store and manipulate data with respect to their requirements. So far, existing approaches provide rules to transform a source model to a target model, but none of them propose a method to lead the choice of the most suitable solution. ModelDrivenGuide suggests a model transformation approach that focuses on proposing the different relevant solutions to the case of study. It is based on a common meta-model for the 5 families (Relational & NoSQL) and a generation heuristic. Our approach is validated using the TPC-C benchmark.
... Different works present formal definitions for NoSQL document data models [1,2,19]. In [2], they present NoAM, the NoSQL Abstract Model that use as the main modelling unit the concept of aggregates (set of entities) and is driven by application use cases (functional requirements). ...
... In [2], they present NoAM, the NoSQL Abstract Model that use as the main modelling unit the concept of aggregates (set of entities) and is driven by application use cases (functional requirements). [1] and [19] present approaches that transform a conceptual model (UML) into NoSQL physical model. These approaches consist in methodologies for defining NoSQL schemas according to user-supplied parameters. ...
Chapter
Document stores are frequently used as representation format in many applications. It is often necessary to transform a set of data stored in a relational database (RDB) into a document store. There are several approaches that execute such translation. However, it is difficult to evaluate which target document structure is the most appropriate. In this article, we present a set of query-based metrics for evaluating and comparing documents schemas against a set of existing queries, that represent the application access pattern. We represent the target document schema and the queries as DAGs (Directed Acyclic Graphs), which are used to calculate the metrics. The metrics allow to evaluate if a given target document schema is adequate to answer the queries. We performed a set of experiments to calculate the metrics over a set of documents produced by existing transformation solutions. The metric results are related with smaller coding effort, showing that the metrics are effective to guide the choice of a target NoSQL document structure.
... • In addition to schemas and validators, other artifacts (indexes and reference management) are generated in our approach. A few MDE-based approaches for NoSQL databases have been proposed [36]- [38], which have in common that the generation process starts from a UML class diagram. An MDE approach to generate code aimed to manipulate graph databases is presented in [36]. ...
... Unlike our work, this approach is based on a forward engineering strategy, where a designer creates a conceptual schema, and mapper code is not generated. In [37] and [38] a proposal to generate NoSQL schema models from UML class diagrams is presented, but no code is generated, only mappings are addressed. ...
Article
Many actual NoSQL systems are schemaless, that is, the structure of the data is not defined beforehand in any schema, but it is implicit on the data itself. This characteristic is very convenient when the data structure suffers frequent changes. However, the agility and flexibility achieved is at the cost of losing some important benefits such as (i) assuring that the data stored and retrieved fits the database schema, (ii) some database utilities require to know the schema, and (iii) schema visualization helps developers to write better code. In a previous work, we proposed a model-based reverse engineering approach to infer schema models from NoSQL data. Model-driven engineering (MDE) techniques can be used to take advantage of extracted models with different purposes, such as schema visualization or automatic code generation. Here, we present an MDE solution to automate the usage of Object-NoSQL mappers when the database already exists. We will focus on mappers that are available for document systems (Object-Document mappers, ODMs), but the proposed approach is mapper-independent. These mappers are emerging to provide a similar functionality to Object-Relational mappers: they are in charge of the mapping of objects into NoSQL data (documents in the case of ODMs) for object-oriented applications. We show how schemas and other artifacts (e.g. validators and indexes) for ODMs can be automatically generated from inferred schemas. The solution consists of a two-step model transformation chain, where an intermediate model is generated to ease the code generation. We have applied our approach for two popular ODMs: Mongoose and Morphia, and validated it with the StackOverflow dataset.
... Concerning NoSQL approaches, some works investigate about data modelling alternatives [5,6,26]. In [26], Abdelhedi et al. propose to translate an UML model into several alternatives of "schema" for Cassandra, Neo4J and MongoDB. For Cassandra in [5,6] the main concerns are the storage requirements and query performance. ...
Chapter
Document-oriented bases allow high flexibility in data representation which facilitates a rapid development of applications and enables many possibilities for data structuring. Nevertheless, the structural choices remain crucial because of their impact on several aspects of the document base and application quality, e.g., memory print, data redundancy, readability and maintainability. Our research is motivated by quality issues of document-oriented bases. We aim at facilitating the study of the possibilities of data structuring and providing objective metrics to better reveal the advantages and disadvantages of each solution with respect to user needs. In this paper, we propose a set of structural metrics for a JSON compatible schema abstraction. These metrics reflect the complexity of the structure and are intended to be used in decision criteria for schema analysis and design process. This work capitalizes on experiences with MongoDB, XML and software complexity metrics. The paper presents the definition of the metrics together with a validation scenario where we discuss how to use the results in a schema recommendation perspective.
... As illustrated in Table 1, the authors in [19,20,21] have proposed a transformation process with a set of mapping rules from multidimensional models to column-oriented and document-oriented models. While the authors in [22,23,24] took the UML class diagram as input to offer a columnar and document oriented models that respect, at most, two Big Data characteristics. ...
... These systems are schema-free and built upon distributed systems, which makes them easy to scale and shard. However, in a rush to solve the challenges of big data and large numbers of concurrent users, NoSQL abandoned some of the core features of relational databases, which make them highly scalable and easy to use [1][2][3]. Although the use of NoSQL systems is widely accepted today, Business Intelligence & Analytics (BI&A) wields relational data sources [4]. ...
Article
NoSQL stores have become ubiquitous since they offer a new cost-effective and schema-free system. Although NoSQL systems are widely accepted today, Business Intelligence & Analytics (BI&A) wields relational data sources. Exploiting schema-free data for analytical purposes is a challenge since it requires reviewing all the BI&A phases, particularly the Extract-Transform-Load (ETL) process, to fit big data sources such as document stores. In the ETL process, joining several collections without explicitly known join fields is a significant challenge. Detecting these fields manually is time-consuming, effort-intensive and infeasible for large-scale datasets. In this paper, we study the problem of discovering join fields automatically. We introduce an algorithm that aims to automatically detect both identifiers and references across several document stores. Our approach comprises three core stages: (i) global schema extraction; (ii) discovery of candidate identifiers; and (iii) identification of candidate pairs of identifier and reference fields. We use scoring features and pruning rules to efficiently discover true candidate identifiers among many initial ones. To find candidate pairs between several document stores, we apply node2vec as a graph embedding technique, which yields significant advantages, while using syntactic and semantic similarity measures to prune pointless candidates. Finally, we report our experimental findings, which show encouraging results.
... NoAM provides guidelines to consider application requirements such as scalability, performance and consistency during the modeling process. The work from [22,23] present approaches to transform a conceptual model (UML class diagram) into a NoSQL physical model. Additionally, [23] use a set of queries provided by the user to guide the generation of the schema, storing together related entities. ...
Article
NoSQL databases have emerged as an alternative to relational databases, which do not meet all the currently imposed scenarios. Large applications that handle a variety of data formats often use several types of databases and the need to migrate data between them is common. There are several approaches that perform this type of conversion. However, the process of choosing the ideal data structuring for the application requirements is not a trivial task. In this paper, we present a RDB to NoSQL conversion approach composed by steps for defining, evaluating and comparing candidate NoSQL schemas (data structuring) in relation to the applications access pattern, before migrating RDB data to NoSQL document store. We present a set of query-based metrics and scores to assist the user in the schema selection process and a framework to migrate RDB data to NoSQL format. Finally, we present experiments to evaluate the benefits of our approach and the correlation between metrics results and query implementation effort after migrating RDB to NoSQL.
... I first defined the source UML class diagram (umlCD) and the target generic CouchDB logical model. A class diagram umlCD is defined with [15]:
- the class diagram name (umlCD.N);
- the set of classes (umlCD.Sc).
Each class c in Sc has:
- a name (c.N);
- a set of attributes (c.SA);
- a class primary key (c.Pk).
For each attribute a in c.SA, a.N is the attribute name and a.T is the attribute data type. The primary key (c.Pk) is a special attribute of a class and it has a name and data type. ...
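
The class diagram definition in the excerpt above can be sketched as a small data structure. This is a minimal illustration, not the cited author's implementation: the Python names `Attribute`, `UmlClass` and `ClassDiagram`, and the sample `Patient` class, are assumptions. Only the roles of the fields (N, Sc, SA, Pk, a.N, a.T) come from the excerpt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Attribute:
    name: str       # a.N in the excerpt
    data_type: str  # a.T in the excerpt


@dataclass
class UmlClass:
    name: str               # c.N
    primary_key: Attribute  # c.Pk, a special attribute with a name and a data type
    attributes: List[Attribute] = field(default_factory=list)  # c.SA


@dataclass
class ClassDiagram:
    name: str                                              # umlCD.N
    classes: List[UmlClass] = field(default_factory=list)  # umlCD.Sc


# Hypothetical instance, not taken from the cited paper.
patient = UmlClass(
    name="Patient",
    primary_key=Attribute("patient_id", "Integer"),
    attributes=[Attribute("full_name", "String"), Attribute("birth_date", "Date")],
)
diagram = ClassDiagram(name="Clinic", classes=[patient])
```

Keeping the primary key as a separate field, rather than a flag on an ordinary attribute, mirrors the excerpt's description of c.Pk as a special attribute of the class.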
Article
In the modern database environment, new non-traditional database types have appeared, known as NoSQL databases. These NoSQL databases do not rely on the principles of the relational database. CouchDB is one of the document-oriented NoSQL databases; in CouchDB the basic element is a document. All types of databases share the same conceptual data model and differ in the logical and physical models. This means that a UML class diagram can be used in NoSQL design at the conceptual level, that is, it can be used to design a CouchDB database. In this research, we suggest a method to model and implement the conceptual level of a CouchDB database from a UML class diagram in a simple way that depends on the association types. Depending on the types of relationships between classes, we can have more than one database model to choose from and find the most suitable one for the system to be designed. A medical clinic database was proposed to implement the transfer steps according to the proposed method. Three database models were designed and implemented to study the suitability of the proposed transfer method.
... Thus, our goal is to take advantage of this diversity provided by Web applications and the traditional data sources to design a unified, integrated and valid data warehouse [5] that enables decision makers and managers to decide gradually with ease based on multiple and varied sources. For this purpose, we describe in this paper, using a meta-model based on Model Driven Engineering (MDE) [21], the structure of the unstructured data managed by NoSQL databases (key-value, column-oriented [19], document-oriented and graph-oriented [20]), as well as the structure of the semi-structured data represented by XML files, and the structure of the structured data stored and manipulated by relational and multidimensional databases. Our work is organized as follows: In the first section, we present a meta-model describing the four types of NoSQL database (column-oriented, key-value, graph-oriented, document-oriented). ...
... This case study concerns international scientific programs for monitoring patients suffering from serious diseases. The main goal of this program is (1) to collect data about diseases development over time, (2) to study interactions between different diseases and (3) to evaluate the short and medium-term effects of their treatments. The medical program can last up to 3 years. ...
Preprint
In recent years, the need to use NoSQL systems to store and exploit big data has been steadily increasing. Most of these systems are characterized by the property "schema less" which means absence of the data model when creating a database. This property brings an undeniable flexibility by allowing the evolution of the model during the exploitation of the base. However, the expression of queries requires a precise knowledge of this model. In this paper, we propose an incremental process to extract the model while operating the document-oriented NoSQL database. To do this, we use the Model Driven Architecture (MDA) that provides a formal framework for automatic model transformation. From the insert, delete and update queries executed on the database, we propose formal transformation rules with QVT to generate the physical model of the NoSQL database. An experimentation of the extraction process was performed on a medical application.
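
The idea of keeping a model up to date from insert, delete and update operations can be illustrated with a small sketch. This is not the QVT-based process of the preprint: the class `IncrementalModel`, its method names, and the counting strategy (a field leaves the model once no stored document uses it) are assumptions chosen to keep the example short.

```python
from collections import defaultdict
from typing import Any, Dict, List


class IncrementalModel:
    """Tracks, per collection, which (field, type) pairs are currently in use."""

    def __init__(self) -> None:
        # collection -> (field name, type name) -> number of documents using it
        self.counts: Dict[str, Dict[tuple, int]] = defaultdict(lambda: defaultdict(int))

    def on_insert(self, collection: str, doc: Dict[str, Any]) -> None:
        for key, value in doc.items():
            self.counts[collection][(key, type(value).__name__)] += 1

    def on_delete(self, collection: str, doc: Dict[str, Any]) -> None:
        fields = self.counts[collection]
        for key, value in doc.items():
            pair = (key, type(value).__name__)
            fields[pair] -= 1
            if fields[pair] <= 0:
                del fields[pair]  # no remaining document uses this field/type

    def on_update(self, collection: str, old: Dict[str, Any], new: Dict[str, Any]) -> None:
        # Assumed simplification: an update is a delete followed by an insert.
        self.on_delete(collection, old)
        self.on_insert(collection, new)

    def schema(self, collection: str) -> Dict[str, List[str]]:
        result: Dict[str, set] = {}
        for key, type_name in self.counts[collection]:
            result.setdefault(key, set()).add(type_name)
        return {key: sorted(types) for key, types in result.items()}


model = IncrementalModel()
model.on_insert("patients", {"name": "a", "age": 3})
model.on_insert("patients", {"name": "b"})
print(model.schema("patients"))
```

The sketch only handles top-level fields; nested documents and arrays would need a recursive walk, which is left out here.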
... Currently, there are few systematic studies on data modeling for NoSQL databases, e.g., [1], [3]. Some works propose particular solutions that can be used for conceptual modeling of NoSQL databases [4], [5]; however, since the complexity of these approaches is high, they can be difficult to infiltrate into real-world applications. On the other hand, several studies refer explicitly to modeling documents in MongoDB using UML notation [6]; other approaches refer to the JSON format to represent the documents for different NoSQL databases [7]. ...
Chapter
The growing availability of data and the increased popularity of NoSQL databases, that support the idea of managing unstructured or semi-structured data, motivate implementers to skip the phase of a conceptual view of data. However, document data stores belonging to the NoSQL group show a clear tendency of looking for some common feature among documents creating collections. This aspect motivates us to propose a model for the conceptual representation of a document data store based on UML class diagrams and mapping rules for its implementation. We also include a case study using Twitter data and show implementation using three data stores: MongoDB, CouchDB, and ArangoDB.
... The works [52] and [8] deal with manipulating an intermediate model that brings together the concepts of different non-relational data families, so that structuring alternatives can be created from the same information according to the target system. ...
Thesis
Nowadays, millions of different data sources produce a huge amount of unstructured and semi-structured data that changes constantly. Information systems must manage this data while ensuring scalability and performance. As a result, they have had to adapt to support heterogeneous databases, including NoSQL databases. These databases offer a schema-free data structure with great flexibility, but without a clear separation of the logical and physical layers. Data can be duplicated, fragmented and/or incomplete, and it can also change as business needs evolve. The flexibility and absence of schema in document-oriented NoSQL systems such as MongoDB make it possible to explore new structuring alternatives without facing constraints. The choice of structure remains important and critical because there are several impacts to consider and one must choose among many structuring options. We therefore propose to return to a design phase in which quality aspects and the impacts of the structure are taken into account in order to make a better-informed decision. In this context, we propose SCORUS, a system for the analysis and evaluation of document-oriented structures that aims to facilitate the study of document-oriented semi-structuring possibilities, such as those of MongoDB, and to provide objective metrics that better highlight the advantages and disadvantages of each solution with respect to user needs. To this end, a sequence of three phases can make up a design process. Each phase can also be carried out independently for analysis and tuning purposes. The general strategy of SCORUS consists of:
1. Generation of a set of structuring alternatives: in this phase we propose to start from a UML model of the data and to automatically produce a large set of possible structuring variants for this data.
2. Evaluation of the alternatives using a set of structural metrics: this evaluation takes a set of structuring variants and computes the metrics with respect to the modelled data.
3. Analysis of the evaluated alternatives: use of the metrics to analyse the merits of the alternatives considered and to choose the most appropriate one(s).
... The work in this chapter has been presented in the following publications: [Abdelhedi et al., 2016a], [Abdelhedi et al., 2016b], [Abdelhedi et al., 2016c], [Abdelhedi et al., 2017a], [Abdelhedi et al., 2017b], [Abdelhedi et al., 2017c], [Abdelhedi et al., 2018a] and [Abdelhedi et al., 2018b]. ...
Thesis
It is widely accepted today that relational systems are not appropriate for handling Big Data. This has led to a new category of databases, commonly known as NoSQL databases, that were created in response to the need for better scalability, higher flexibility and faster data access. These systems have proven their efficiency in storing and querying Big Data. Unfortunately, only a few works have presented approaches to implement conceptual models describing Big Data in NoSQL systems. This paper proposes an automatic MDA-based approach that provides a set of transformations, formalized with the QVT language, to translate UML conceptual models into NoSQL models. In our approach, we build an intermediate logical model compatible with column, document, graph and key-value systems. The advantage of using a unified logical model is that this model remains stable even as the NoSQL systems evolve over time, which simplifies the transformation process and saves developers effort and time.
... The works [52] and [8] address the manipulation of an intermediate model that brings together the concepts of different families of non-relational data stores, so that alternative structures can be created from the same information according to the target system. ...
Thesis
Nowadays, applications and information systems must manage large volumes of heterogeneous data while meeting varied functional requirements as well as performance and scalability needs. NoSQL data management systems provide various solutions and, for the most part, offer a great deal of flexibility in structuring data. They allow data to be structured with great flexibility and without prior creation of a schema (unlike relational DBMSs). In these solutions there is no clear separation of the logical and physical layers. The flexibility and absence of schema in document-oriented NoSQL systems, such as MongoDB, allow various structuring alternatives. The choice of structure remains important and critical because of its impact on the quality of the database and its applications. In this thesis we propose to return to a design phase in which quality aspects and the impacts of the structure are taken into account in order to make a more informed decision. We propose SCORUS, a system for the analysis and evaluation of document-oriented structures that aims to facilitate the study of data semi-structuring possibilities and to provide objective metrics that better bring out the advantages and disadvantages of each solution with respect to user needs. To this end, a sequence of three phases can make up a design process. Each phase can also be carried out independently for the analysis and tuning of existing databases.
The general strategy of SCORUS consists of:
- Generation of a set of structuring alternatives: in this phase we propose to start from a UML model of the data and to automatically produce a set of document-oriented structuring variants.
- Evaluation of alternatives using a set of structural metrics: this evaluation takes a set of structuring variants and computes the metrics with respect to the modelled data.
- Analysis of the evaluated alternatives: use of the metrics to analyse the alternatives considered and to choose the most appropriate one(s).
This thesis presents the theoretical and software tools for SCORUS as well as experiments with MongoDB.
Article
Due to the scalability and availability problems with traditional relational database systems, a variety of NoSQL stores have emerged over the last decade to deal with big data. How data are structured in a NoSQL store has a large impact on the query and update performance and the storage usage. Thus, different from the traditional database design, not only the data structure but also the data access patterns need to be considered in the design of NoSQL database schemas. In this paper, we present a general workload-driven method for designing key–value, wide-column, and document NoSQL database schemas. We first present a generic logical model Query Path Graph (QPG) that can represent the data structures of the UML class diagram. We also define mappings from the SQL-based query patterns to QPG and from QPG to aggregate-oriented NoSQL schemas. We use a cost model to measure the query and update performance and optimize the QPG schemas. We evaluate the proposed method with several typical case studies by simulating workloads on databases with different schema designs. The results demonstrate that our method preserves the generality and the quality of the design.
Chapter
Full-text available
Batch processing reduces processing time in a business process at the expense of increasing waiting time. If this trade-off between processing and waiting time is not analyzed, batch processing can, over time, evolve into a source of waste in a business process. Therefore, it is valuable to analyze batch processing activities to identify waiting time wastes. Identifying and analyzing such wastes present the analyst with improvement opportunities that, if addressed, can improve the cycle time efficiency (CTE) of a business process. In this paper, we propose an approach that, given a process execution event log, (1) identifies batch processing activities, (2) analyzes their inefficiencies caused by different types of waiting times to provide analysts with information on how to improve batch processing activities. More specifically, we conceptualize different waiting times caused by batch processing patterns and identify improvement opportunities based on the impact of each waiting time type on the CTE. Finally, we demonstrate the applicability of our approach to a real-life event log.
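The trade-off described above can be made concrete with a small calculation. The sketch below assumes the common definition of cycle time efficiency as total processing time divided by total cycle time (processing plus waiting); the `ActivityInstance` record, its field names, and the sample timestamps are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ActivityInstance:
    """One execution of an activity for a single case (times in minutes)."""
    enabled_at: float    # when the case became ready for the activity
    started_at: float    # when work on the case actually began
    completed_at: float  # when work on the case finished


def cycle_time_efficiency(instances):
    """Return total processing time divided by total cycle time."""
    processing = sum(i.completed_at - i.started_at for i in instances)
    cycle = sum(i.completed_at - i.enabled_at for i in instances)
    return processing / cycle if cycle > 0 else 1.0


# Hypothetical batch: three cases become ready at different times but are
# all processed together once the batch starts at minute 60.
batch = [
    ActivityInstance(enabled_at=0, started_at=60, completed_at=70),
    ActivityInstance(enabled_at=30, started_at=60, completed_at=70),
    ActivityInstance(enabled_at=55, started_at=60, completed_at=70),
]
cte = cycle_time_efficiency(batch)
print(f"CTE = {cte:.2f}")
```

In this made-up batch, the case that became ready first waits the longest, so it contributes most to the low efficiency; this is the kind of waiting-time waste the abstract refers to.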
Chapter
With data’s evolution in terms of volume, variety, and velocity, Information Systems (IS) administrators have to steadily adapt their data model and choose the best solution(s) to store and manage data in accordance with users’ requirements. In this context, many existing solutions transform a source data model into a target one, but none of them leads the administrator to choose the most suitable model by offering a limited solution space automatically calculated and adapted to his needs. We propose ModelDrivenGuide, an automatic global approach for leading the model transformation process. It starts by transforming the conceptual model into a logical model, and it defines refinement rules that help to generate all possible data models. Our approach then relies on a heuristic to reduce the search space by avoiding cycles and redundancies. We also propose a formalisation of the denormalization process and we discuss the completeness and the complexity of our approach. Keywords: NoSQL, MDA, Denormalization, Model refinement, Heuristic
Article
The importance of data security is currently increasing owing to the number of data transactions that are continuously taking place. Large amounts of data are generated, stored, modified and transferred every second, signifying that databases require an appropriate capacity, control and protection that will enable them to maintain a secure environment for so much data. Big Data is becoming a prominent trend in our society, and increasing amounts of data, including sensitive and personal information, are being loaded into NoSQL and other Big Data technologies for analysis and processing. However, current security approaches do not take into account the special characteristics of these technologies, leaving sensitive and personal data unprotected and consequently risking considerable financial losses and brand damage. In this paper, we focus on NoSQL document databases and present a proposal for the design and implementation of security policies in this type of databases. We first follow the concept of security by design in order to propose a metamodel that allows the specification of both the structure and the security policies required for document databases. We also define an implementation model by analysing the implementation features provided by a specific NoSQL document database management system (MongoDB). Having obtained the design and implementation models, we follow the model-driven development philosophy and propose a set of transformation rules that allow the automatic generation of the final implementation of security policies. We additionally provide a technological solution in which the Eclipse Modelling Framework environment is employed in order to implement both the design metamodel (Emfatic) and the transformations (Epsilon, EGL). Finally, we apply the proposed framework to a case study carried out in the airport domain. 
This proposal, in addition to saving development time and costs, generates more robust solutions by considering security by design. It thereby abstracts the designer from both the specific aspects of the target tool and the need to choose the best strategies for implementing security policies.
Article
Document-oriented bases allow high flexibility in data representation which facilitates a rapid development of applications and enables many possibilities for data structuring. Unfortunately, in many cases, due to this flexibility and the absence of data modelling, the choice of a data representation is neglected by developers leading to many issues on several aspects of the document base and application quality; e.g., memory print, data redundancy, readability and maintainability. We aim at facilitating the study of data structuring alternatives and providing objective metrics to better reveal the advantages and disadvantages of a structure with respect to user needs. The main contributions of our approach are twofold. First of all, the semi-automatic generation of many suitable alternatives for data structuring given an initial UML model. Second, the automatic computation of structural metrics, allowing a comparison of the alternatives for JSON-compatible schema abstraction. These metrics reflect the complexity of the structure and are intended to be used in decision criteria for schema analysis and design process. This work capitalises on experiences with MongoDB, XML and software complexity metrics. The paper presents the schema generation and the metrics together with a validation scenario where we discuss how to use the results in a schema recommendation perspective.
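To illustrate how structural metrics can separate two structuring alternatives, here is a minimal sketch. The abstract does not list the actual metrics, so the two used here (maximum nesting depth and number of leaf attributes) are assumptions chosen for illustration, as are the schema encoding and the customer/order example.

```python
def max_depth(schema):
    """Nesting depth of a JSON-like schema; a flat document has depth 1."""
    if isinstance(schema, dict):
        return 1 + max((max_depth(v) for v in schema.values()), default=0)
    if isinstance(schema, list):
        # An array adds no level itself; only the documents it holds do.
        return max((max_depth(v) for v in schema), default=0)
    return 0


def leaf_count(schema):
    """Number of atomic attributes reachable in the schema."""
    if isinstance(schema, dict):
        return sum(leaf_count(v) for v in schema.values())
    if isinstance(schema, list):
        return sum(leaf_count(v) for v in schema)
    return 1


# Alternative A (hypothetical): orders and order lines embedded in the customer.
embedded = {
    "name": "string",
    "orders": [{"date": "date", "lines": [{"sku": "string"}]}],
}

# Alternative B (hypothetical): the customer only references its orders.
referenced = {"name": "string", "order_ids": ["objectId"]}

for label, schema in (("embedded", embedded), ("referenced", referenced)):
    print(label, max_depth(schema), leaf_count(schema))
```

A deeper structure is not necessarily worse; such numbers are only meant as input to a comparison against the queries and constraints that matter to the user.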
Article
Today, with the growth of the internet and the use of social networks, mobile telephony, and connected and communicating objects, data has become so large that exploiting it has become essential. In practice, a very large number of companies in the health, banking and financial, insurance, and manufacturing sectors, among others, rely on traditional databases that are often well organized (customer data, machine data, etc.), but in most cases the very large volumes of data in these databases, and the speed with which they must be analyzed to meet the company's business needs, are real challenges. This article addresses the problem of generating NoSQL MongoDB databases by applying an approach based on model-driven engineering (the Model Driven Architecture approach). We provide Model-to-Model transformations (using the QVT model transformation language) and Model-to-Code transformations (using the Acceleo code generator). We also propose vertical and horizontal transformations to demonstrate the validity of our approach on NoSQL MongoDB databases. This article studies the transformations from the PSM towards the implementation; PIM-to-PSM transformations are the subject of another work.
Chapter
Popular document-oriented systems (e.g. MongoDB) store JSON-like data. Such data formats combine the flexibility of semi-structured models and traditional data structures like records and arrays. This allows numerous structuring possibilities even for simple data. The choice of data structure is important as it impacts many aspects such as memory footprint, data access performance and programming complexity. Our work aims at helping users select a data structure from a set of automatically generated alternatives. These alternatives can be analyzed considering complexity metrics, query requirements and best practices for using such “schemaless” databases. Our approach for “schema” generation has been inspired by Software Product Lines strategies based on feature models. From a UML class diagram that represents the user’s data, we automatically generate a feature model that implicitly contains the structure alternatives with their variations and common points. This feature model satisfies document-oriented constraints as well as user constraints reflecting good practices or particular needs. It leads to a set of data structuring alternatives to be considered by the user for his operational choices.
Chapter
Big Data have received a great deal of attention in recent years. Not only is the amount of data on a completely different level than before, but we also have different types of data, varying in format, structure, and sources. This has definitely changed the tools we need to handle Big Data, giving rise to NoSQL systems. While NoSQL systems have proven their efficiency to handle Big Data, how Big Data can be stored automatically in NoSQL systems is still an unsolved problem. This paper proposes an automatic approach for implementing UML conceptual models in NoSQL systems, including the mapping of the associated OCL constraints to the code required for checking them. In order to demonstrate the practical applicability of our work, we have realized it in a tool supporting four fundamental OCL expressions: Iterate-based expressions, OCL predefined operations, If expression and Let expression.
Conference Paper
Full-text available
Big Data has recently gained popularity and has strongly questioned relational databases as universal storage systems, especially in the presence of analytical workloads. As a result, co-relational alternatives, commonly known as NOSQL (Not Only SQL) databases, are extensively used for Big Data. As the primary focus of NOSQL is on performance, NOSQL databases are directly designed at the physical level, and consequently the resulting schema is tailored to the dataset and access patterns of the problem at hand. However, we believe that NOSQL design can also benefit from traditional design approaches. In this paper we present a method to design databases for analytical workloads. Starting from the conceptual model and adopting the classical 3-phase design used for relational databases, we propose a novel design method considering the new features brought by NOSQL and encompassing relational and co-relational design altogether.
Conference Paper
Full-text available
The need to store and manipulate large volumes of (unstructured) data has led to the development of several NoSQL databases for better scalability. Graph databases are a particular kind of NoSQL database that have proven their efficiency to store and query highly interconnected data, and have become a promising solution for multiple applications. While the mapping of conceptual schemas to relational databases is a well-studied field of research, there are only a few solutions that target conceptual modeling for NoSQL databases and none of them focuses on graph databases. This is especially true when dealing with the mapping of business rules and constraints in the conceptual schema. In this article we describe a possible mapping from UML/OCL conceptual schemas to Blueprints, an abstraction layer on top of a variety of graph databases, and Gremlin, a graph traversal language, via an intermediate Graph metamodel representing data structure. Tool support is fully available.
Conference Paper
Full-text available
Not only SQL (NoSQL) databases are becoming increasingly popular and have some interesting strengths such as scalability and flexibility. In this paper, we investigate the use of NoSQL systems for implementing OLAP (On-Line Analytical Processing) systems. More precisely, we are interested in instantiating OLAP systems (from the conceptual level to the logical level) and instantiating an aggregation lattice (optimization). We define a set of rules to map star schemas into two NoSQL models: column-oriented and document-oriented. The experimental part is carried out using the reference benchmark TPC. Our experiments show that our rules can effectively instantiate such systems (star schema and lattice). We also analyze differences between the two NoSQL systems considered. In our experiments, HBase (column-oriented) happens to be faster than MongoDB (document-oriented) in terms of loading time.
Article
Full-text available
In this paper, we examine a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers. Originally motivated by Web 2.0 applications, these systems are designed to scale to thousands or millions of users doing updates as well as reads, in contrast to traditional DBMSs and data warehouses. We contrast the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions. These systems typically sacrifice some of these dimensions, e.g. database-wide transaction consistency, in order to achieve others, e.g. higher availability and scalability.
Conference Paper
Full-text available
We are currently witnessing an important paradigm shift in information system construction, namely the move from object and component technology to model technology. The object technology revolution has allowed the replacement of the over twenty-year-old step-wise procedural decomposition paradigm with the more fashionable object composition paradigm. Surprisingly, this evolution seems to have triggered another even more radical change, the current trend toward model transformation. A concrete example is the Object Management Group's rapid move from its previous Object Management Architecture vision to the latest Model-Driven Architecture. This paper proposes an interpretation of this evolution through abstract investigation. In order to stay as language-independent as possible, we have employed the neutral formalism of Sowa's conceptual graphs to describe the various situations characterizing this organization. This will allow us to identify potential problems in the proposed modeling framework and suggest some possible solutions.
Conference Paper
It is widely accepted today that relational databases are not appropriate in highly distributed shared-nothing architectures of commodity hardware that need to handle poorly structured heterogeneous data. This has brought about the blooming of NoSQL systems with the purpose of mitigating this problem, especially in the presence of analytical workloads. Thus, the change in the data model and the new analytical needs beyond OLAP lead us to rethink the methods and models used to design and manage these newborn repositories. In this paper, we analyze the state of the art and future research directions.
Article
In order to reduce the influence of requirement changes on software development and to improve the efficiency and portability of software development, this paper, based on the ideas of Model Driven Architecture (MDA), proposes a method that transforms UML class diagrams into HBase based on meta-models. The method achieves the transformation from Platform Independent Model (PIM) to Platform Specific Model (PSM) on the meta-model level and is comprised of three phases. In the first phase, the meta-models of the UML class diagram and the HBase database are built. In the second phase, the mapping rules between the two meta-models are proposed. In the last phase, the UML class diagram is built and the HBase database model is generated by transformation. Finally, the paper uses the Atlas language to implement a breakfast serving system to demonstrate the feasibility of MDA in software development.
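The three phases can be pictured with a toy PIM-to-PSM transformation. The abstract does not state the mapping rules, so the single rule used here (one class becomes one table whose attributes are column qualifiers in a single column family) is an assumption, and `UmlClass`, `HBaseTable`, `class_to_table` and the family name `data` are hypothetical names introduced only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    """Simplified stand-in for a UML class-diagram meta-model element (PIM)."""
    name: str
    attributes: dict  # attribute name -> UML type name


@dataclass
class HBaseTable:
    """Simplified stand-in for an HBase meta-model element (PSM)."""
    name: str
    column_families: dict = field(default_factory=dict)  # family -> qualifiers


def class_to_table(cls, family="data"):
    """Assumed mapping rule: class -> table, attributes -> column qualifiers."""
    table = HBaseTable(name=cls.name.lower())
    table.column_families[family] = sorted(cls.attributes)
    return table


# Hypothetical class from a breakfast-ordering domain.
order = UmlClass("Order", {"total": "Real", "date": "Date"})
table = class_to_table(order)
print(table)
```

A real transformation would also have to handle associations and inheritance, which this sketch leaves out.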
Conference Paper
With the proliferation of cloud service providers, the use of non-relational (NoSQL) data stores is increasing. In contrast to standard relational database schema design, which has its strong mathematical background in relational algebra and set theory, development with NoSQL data stores is largely based on empirical best practices. Furthermore, the huge variety of NoSQL variants may require different design considerations. In this paper, an algorithm is introduced to automatically derive cost and performance optimal schema in column-oriented data stores based on predefined queries and an initial relational database schema. Algorithms are given to perform database denormalization, as well as to transform the original queries to meet the newly created schemas.
Article
With the development of distributed systems and cloud computing, more and more applications might be migrated to the cloud to exploit its computing power and scalability, where the first task is data migration. In this paper, we propose a novel approach that transforms a relational database into HBase, which is an open-source distributed database similar to BigTable. Our method is comprised of two phases. In the first phase, the relational schema is transformed into an HBase schema based on the data model of HBase. We present three guidelines in this phase, which could be further utilized to develop an HBase application. In the second phase, relationships between the two schemas are expressed as a set of nested schema mappings, which would be employed to create a set of queries or programs that transform the source relational data into the target representation automatically.
Conference Paper
There has been a significant amount of excitement and recent work on column-oriented database systems ("column-stores"). These database systems have been shown to perform more than an order of magnitude better than traditional row-oriented database systems ("row-stores") on analytical workloads such as those found in data warehouses, decision support, and business intelligence applications. The elevator pitch behind this performance difference is straightforward: column-stores are more I/O efficient for read-only queries since they only have to read from disk (or from memory) those attributes accessed by a query. This simplistic view leads to the assumption that one can obtain the performance benefits of a column-store using a row-store: either by vertically partitioning the schema, or by indexing every column so that columns can be accessed independently. In this paper, we demonstrate that this assumption is false. We compare the performance of a commercial row-store under a variety of different configurations with a column-store and show that the row-store performance is significantly slower on a recently proposed data warehouse benchmark. We then analyze the performance difference and show that there are some important differences between the two systems at the query executor level (in addition to the obvious differences at the storage layer level). Using the column-store, we then tease apart these differences, demonstrating the impact on performance of a variety of column-oriented query execution techniques, including vectorized query processing, compression, and a new join algorithm we introduce in this paper. We conclude that while it is not impossible for a row-store to achieve some of the performance advantages of a column-store, changes must be made to both the storage layer and the query executor to fully obtain the benefits of a column-oriented approach.
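The "elevator pitch" in this abstract, that a column layout only touches the attributes a query needs, can be sketched by counting the values each layout has to read. This is a deliberately simplified model with hypothetical data and function names; it covers only the I/O argument, whereas the abstract stresses that query-executor techniques matter as well.

```python
# Hypothetical fact table with four attributes per row.
rows = [
    {"id": 1, "region": "EU", "price": 10.0, "qty": 2},
    {"id": 2, "region": "US", "price": 12.5, "qty": 1},
    {"id": 3, "region": "EU", "price": 8.0, "qty": 5},
]


def to_columns(rows):
    """Pivot a list of row dictionaries into one list per attribute."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(rows):
    """A row layout fetches whole rows, so every attribute value is read."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, needed):
    """A column layout reads only the columns the query accesses."""
    return sum(len(columns[name]) for name in needed)


columns = to_columns(rows)
needed = ["price"]  # e.g. a query computing the sum of prices
row_cost = values_read_row_store(rows)
col_cost = values_read_column_store(columns, needed)
total_price = sum(columns["price"])
print(row_cost, col_cost, total_price)
```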
Conference Paper
In this paper, we attempt to address the relative absence of empirical studies of model driven engineering through describing the practices of three commercial organizations as they adopted a model driven engineering approach to their software development. Using in-depth semi-structured interviewing we invited practitioners to reflect on their experiences and selected three to use as exemplars or case studies. In documenting some details of attempts to deploy model driven practices, we identify some ‘lessons learned’, in particular the importance of complex organizational, managerial and social factors – as opposed to simple technical factors – in the relative success, or failure, of the endeavour. As an example of organizational change management the successful deployment of model driven engineering appears to require: a progressive and iterative approach; transparent organizational commitment and motivation; integration with existing organizational processes and a clear business focus.
Using the column oriented model for implementing big data warehouses
  • K Dehdouh
  • F Bentayeb
  • O Boussaid
  • N Kabachi