
Formalizing the Mapping of UML Conceptual Schemas to Column-Oriented Databases

Abstract

Nowadays, most organizations need to improve their decision-making process using Big Data. To achieve this, they have to store Big Data, perform analyses, and transform the results into useful and valuable information. This requires dealing with new challenges in designing and creating data warehouses. Traditionally, creating a data warehouse followed a well-governed process based on relational databases. The rise of Big Data has challenged this traditional approach, primarily due to the changing nature of data. As a result, using NoSQL databases has become a necessity to handle Big Data challenges. In this article, the authors show how to create a data warehouse on NoSQL systems. They propose the Object2NoSQL process, which generates column-oriented physical models starting from a UML conceptual model. To ensure an efficient automatic transformation, they propose a logical model that exhibits a sufficient degree of independence to enable its mapping to one or more column-oriented platforms. The authors evaluate their approach through experiments on a case study in the health care field.
DOI: 10.4018/IJDWM.2018070103

Volume 14 • Issue 3 • July-September 2018
Copyright © 2018, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.



Fatma Abdelhedi, CBI2 – TRIMANE, Paris, France
Amal Ait Brahim, Toulouse Institute of Computer Science Research (IRIT), Toulouse Capitole University, Toulouse, France
Gilles Zuruh, Toulouse Institute of Computer Science Research (IRIT), Toulouse Capitole University, Toulouse, France


Keywords: Big Data, Column-Oriented Model, Data Warehouse, Decision Support System, NoSQL Data Modelling, UML

Typically, a decision support system is based on two components: a Data Warehouse (DW) and one or more Data Marts (DMs) (Figure 1). The DW is a database used for decision-making, in which data are either gathered from existing sources or entered directly to meet the needs of a decision-support application (for the latter case, see the medical application presented in the Motivation section). Starting from the DW, we extract subsets, called Data Marts, on which we apply OLAP operations. DMs are designed according to a multidimensional model (star schema, snowflake schema or fact constellation schema) (Teste, 2010) in order to meet the particular demands of a specific group of decision makers. In contrast, the DW is not directly accessible to decision makers; there is therefore no need to describe it with a multidimensional model, and the relational model has traditionally been the most effective choice for this purpose.
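The multidimensional organization of a data mart described above can be sketched in a few lines of code. The fact table, dimension tables, attribute names and values below are illustrative assumptions for a hypothetical medical data mart, not the schema from the article's case study.

```python
# Star schema sketch: one fact table (consultations) whose rows hold
# foreign keys into two dimension tables (patients, dates) plus a
# numeric measure (cost). All names/values are hypothetical.

patient_dim = {
    1: {"name": "P. Martin", "city": "Toulouse"},
    2: {"name": "A. Dupont", "city": "Paris"},
}

date_dim = {
    10: {"year": 2018, "month": 7},
    11: {"year": 2018, "month": 8},
}

fact_consultations = [
    {"patient_id": 1, "date_id": 10, "cost": 50.0},
    {"patient_id": 2, "date_id": 10, "cost": 75.0},
    {"patient_id": 1, "date_id": 11, "cost": 60.0},
]

def rollup_cost_by_city(facts, patients):
    """A simple OLAP-style roll-up: total cost aggregated per patient city."""
    totals = {}
    for row in facts:
        city = patients[row["patient_id"]]["city"]
        totals[city] = totals.get(city, 0.0) + row["cost"]
    return totals

print(rollup_cost_by_city(fact_consultations, patient_dim))
# → {'Toulouse': 110.0, 'Paris': 75.0}
```

The roll-up illustrates why DMs use a multidimensional model: OLAP queries aggregate measures from the fact table along attributes of the dimension tables.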

The influence of Big Data has challenged this traditional approach, which uses relational databases for data warehousing. This is primarily because data have become highly distributed and loosely structured, and are growing at exponential rates. The concept of Big Data is usually characterized by Volume, Variety and Velocity, known as the 3Vs (Douglas, 2001). Volume is the size of the data set that needs to be processed, Variety describes the different data types involved, including factors such as format, structure and sources, and Velocity refers to the speed at which data must be analyzed and processed. Most organizations need to improve their decision-making process using Big Data. To achieve this, they have to store Big Data, perform analyses, and transform the results into useful and valuable information. To carry out these storage and analytical processes, it is necessary to deal with new challenges in designing and creating the DW.
Indeed, the database used for data warehousing must satisfy some new requirements. It should have the ability to: (1) integrate all possible data structures, (2) combine multiple data sources, (3) scale at relatively low cost, and (4) analyze large volumes of data. Relational warehouses are a mature data management technology. However, with the rise of Big Data, these systems have become unfit for large, distributed data management. The major problems of relational technologies are: (1) horizontal scaling: relational databases were mainly designed for single-server configurations. To scale a relational database, it has to be distributed across multiple powerful servers, which are expensive; furthermore, handling tables spread across different servers is difficult. (2) A strict data model that must be designed prior to data processing: in a Big Data context, it should be easy to add and analyze new data regardless of its type (structured, semi-structured or unstructured), but relational models are hard to change incrementally without impacting performance or taking the database offline. As a result, a new kind of DBMS, known as "NoSQL" (Cattell, 2011), has appeared. NoSQL databases are well suited for managing large volumes of data, and they maintain good performance when scaling up (Angadi, 2013). Using NoSQL for data warehousing has become a necessity for a number of reasons, mainly relating to the high performance provided by these systems (Herrero, 2016).
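The schema-flexibility argument above can be made concrete with a minimal sketch of the column-family data model used by systems such as HBase and Cassandra: each row key maps to a sparse set of (column, value) pairs, so a new column can appear on a single row without any global schema change. Row keys, column names and values below are illustrative assumptions.

```python
# Column-family model sketch: a table is a map from row key to a
# sparse map of column -> value. Hypothetical medical data.

table = {}  # row_key -> {column_name: value}

def put(row_key, column, value):
    """Insert or update a single cell; the row and column are created on the fly."""
    table.setdefault(row_key, {})[column] = value

put("patient:1", "info:name", "P. Martin")
put("patient:1", "info:city", "Toulouse")
put("patient:2", "info:name", "A. Dupont")
# A previously unseen column is added to one row only -- no ALTER TABLE
# and no downtime, unlike an incremental relational schema change.
put("patient:2", "visit:radiology_report", "report-2018-07.pdf")

print(sorted(table["patient:2"]))
# → ['info:name', 'visit:radiology_report']
```

Each row carries only the columns it actually uses, which is what makes these systems suitable for the loosely structured, evolving data described above.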
This work deals with creating a DW in a Big Data context, and is motivated by the needs of a medical application. This application generates a continuous stream of complex data (patient histories, visit summaries, paper prescriptions, radiology reports, etc.) that will be entered directly into a DW (§2). To describe this DW, a conceptual data model closer to human thinking is required; the choice for such a model has been UML (Abello, 2015). Our purpose is to assist developers in creating the DW on a NoSQL database. To this end, we propose an automatic process that transforms the UML conceptual model describing the DW into a NoSQL model.
Figure 1. Decision support systems architecture
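The general idea of such a transformation can be sketched as follows: each UML class becomes a table (column family) in a platform-independent logical model, each attribute becomes a column, and each association is materialized as a reference column holding the target row key. This is a simplified, hypothetical illustration of the kind of mapping involved, not the authors' formal Object2NoSQL rules; all class and attribute names are invented.

```python
# Sketch of a UML-to-column-oriented mapping (hypothetical, simplified).

def uml_to_column_model(classes, associations):
    """classes: {class_name: [attribute names]}
    associations: [(source_class, target_class)] pairs
    returns: {table_name: [column names]} -- a platform-independent
    logical model that could later be mapped to, e.g., Cassandra or HBase.
    """
    # Each class becomes a table with a synthetic row key plus its attributes.
    tables = {name: ["id"] + attrs for name, attrs in classes.items()}
    for source, target in associations:
        # Materialize the association as a reference column on the source.
        tables[source].append(f"{target.lower()}_id")
    return tables

classes = {"Patient": ["name", "birthdate"], "Visit": ["date", "summary"]}
associations = [("Visit", "Patient")]  # each Visit refers to one Patient

print(uml_to_column_model(classes, associations))
# → {'Patient': ['id', 'name', 'birthdate'],
#    'Visit': ['id', 'date', 'summary', 'patient_id']}
```

Keeping this intermediate model free of platform-specific details is what lets a single logical schema be mapped to more than one column-oriented system, as the abstract describes.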