Article

Conceptual modeling in the era of Big Data and Artificial Intelligence: Research topics and introduction to the special issue

Abstract

Since the first version of the Entity–Relationship (ER) model proposed by Peter Chen over forty years ago, both the ER model and conceptual modeling activities have been key success factors for modeling computer-based systems. During the last decade, conceptual modeling has been recognized as an important research topic in academia, as well as a necessity for practitioners. However, there are many research challenges for conceptual modeling in contemporary applications such as Big Data, data-intensive applications, decision support systems, e-health applications, and ontologies. In addition, there remain challenges related to the traditional efforts associated with methodologies, tools, and theory development. Recently, novel research has been uniting contributions from the conceptual modeling area and the Artificial Intelligence discipline in two directions. The first concerns how conceptual modeling can aid in the design of Artificial Intelligence (AI) and Machine Learning (ML) algorithms. The second concerns how AI and ML can be applied in model-based solutions, such as model-based engineering, to infer and improve the generated models. For the first time in the history of Conceptual Modeling (ER) conferences, we encouraged the submission of papers based on AI and ML solutions in an attempt to highlight research from both communities. In this paper, we present some of the important topics in current research on conceptual modeling. We introduce the selected best papers from the 37th International Conference on Conceptual Modeling (ER’18), held in Xi’an, China, and summarize some of the valuable contributions made based on the discussions of these papers. We conclude with suggestions for continued research.

... Our approach lies in creating a graph representation of data coming from different sources to enable the execution of predictive Artificial Intelligence algorithms [2]. Achieving this objective requires appropriate data engineering considerations, including the definition of a conceptual model to help design, develop and run these artificial intelligence solutions [10][11][12]. New research fields are opening strong opportunities for the definition of conceptual models [13,14]. ...
Article
Manually integrating data of diverse formats and languages is vital to many artificial intelligence applications. However, the task itself remains challenging and time-consuming. This paper highlights the potential of Large Language Models (LLMs) to streamline data extraction and resolution processes. Our approach aims to address the ongoing challenge of integrating heterogeneous data sources, encouraging advancements in the field of data engineering. Applied to the specific use case of learning disorders in higher education, our research demonstrates LLMs’ capability to effectively extract data from unstructured sources. It is then further highlighted that LLMs can enhance data integration by providing the ability to resolve entities originating from multiple data sources. Crucially, the paper underscores the necessity of preliminary data modeling decisions to ensure the success of such technological applications. By merging human expertise with LLM-driven automation, this study advocates for the further exploration of semi-autonomous data engineering pipelines.
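To make the idea concrete, the following is a minimal, illustrative Python sketch of an LLM-assisted extraction and entity-resolution step of the kind the abstract describes. The call_llm() stand-in, the prompt, the field names, and the fuzzy-matching rule are assumptions for illustration only, not the pipeline or models used by the authors.

# Illustrative sketch only: how an LLM-assisted extraction/resolution step might look.
# The prompt, the call_llm() stand-in, and the matching rule are assumptions,
# not the pipeline described in the paper.
import json
from difflib import SequenceMatcher

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; returns a canned response here."""
    return json.dumps([{"student": "J. Dupont", "disorder": "dyslexia", "source": "report.txt"}])

def extract_records(document_text: str) -> list[dict]:
    """Ask the LLM to turn unstructured text into structured JSON records."""
    prompt = (
        "Extract every (student, disorder) pair mentioned in the text below "
        "and return them as a JSON list of objects.\n\n" + document_text
    )
    return json.loads(call_llm(prompt))

def same_entity(a: str, b: str, threshold: float = 0.75) -> bool:
    """Naive entity resolution: fuzzy string similarity on normalized names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

if __name__ == "__main__":
    records_a = extract_records("Unstructured report A ...")
    records_b = [{"student": "Jean Dupont", "disorder": "dyslexia", "source": "registry.csv"}]
    matches = [(ra, rb) for ra in records_a for rb in records_b
               if same_entity(ra["student"], rb["student"])]
    print(f"{len(matches)} cross-source match(es) found")

In a real setting, call_llm() would be wired to an actual LLM provider and the naive string-similarity matcher would be replaced by a more robust resolution strategy; the sketch only shows how extraction and resolution can be composed.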
... Artificial intelligence is a complex category that includes multiple disciplines [11][12][13]. Big data, also described as massive data, refers to collections of data that cannot be extracted, summarized, and processed in a short period of time because of their huge and cumbersome content [14][15][16]. Other methods can later be used to integrate these disorganized data and put them to use [17][18][19]. ...
Article
The ecological environment has always been an important prerequisite for the relationship between people and nature. Its construction reflects a country's overall degree of development and civilization, and so it is related to the future of mankind. The deep integration of land use is a major breakthrough in solving the complex problems encountered in the development and transformation of ecological civilization. By establishing a "fusion" innovation and entrepreneurship ecological civilization system, this paper applies artificial intelligence and big data to the construction path of an innovation and entrepreneurship ecological system from the perspective of land use and ecological suitability. Simulation studies of the parasitic, biased-symbiosis, asymmetric-symbiosis, and symmetric-symbiosis modes were conducted in MATLAB. According to the results of the study, the subject size of the relevant subjects in the parasitic mode is only 70.43% of the subject size of the entrepreneurial enterprise. In the biased-symbiosis mode, the subject size of the relevant subjects is 87.82% of the subject size of the entrepreneurial enterprise.
... The basic solution in these works relies upon UML templates to present ETL process scenarios in a generic way. Indeed, the construction of a template for complex operations was, and still is, a crucial need, especially with the emergence of the new data era (Mallek et al. 2023; Trujillo et al. 2021). Moreover, Di Tria et al. (2017) reported that a pattern or template is strongly needed, as demonstrated by this citation: "One challenge is that an aim of Big Data systems is to reveal patterns, trends and associations, especially relating to human behavior and interactions." ...
Article
Full-text available
Currently, the blooming growth of social networks such as Facebook, Twitter and Instagram has generated, and is still generating, a large amount of data, which can be regarded as a gold mine for business analysts and researchers, from which insights that are useful and essential for effective decision making can be extracted. However, multiple problems and challenges affect decision support systems, especially at the level of the Extraction–Transformation–Loading (ETL) processes, which are responsible for the selection, filtering and normalization of data sources in order to obtain relevant decisions. In this research paper, we focus on adapting the transformation phase to the MapReduce paradigm so as to process data in a distributed and parallel environment. Subsequently, we set forward a conceptual model of this second phase, composed of several operations that handle the NoSQL structures suitable for Big Data storage. Finally, we implement our new components in Talend for Big Data; they help the designer apply selection, projection and join operations on the data extracted from social media.
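As a hedged illustration of the three transformation operations named in the abstract (selection, projection, and join) in a distributed setting, the following PySpark sketch applies them to toy social-media data. The field names, thresholds, and use of PySpark are assumptions; they do not reproduce the Talend for Big Data components developed by the authors.

# Minimal PySpark sketch of the three transformation operations named in the
# abstract (selection, projection, join); the field names and thresholds are
# hypothetical and this is not the Talend implementation described by the authors.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("etl-transform-sketch").getOrCreate()

tweets = spark.createDataFrame(
    [("u1", "big data rocks", 120), ("u2", "hello", 3)],
    ["user_id", "text", "retweets"],
)
profiles = spark.createDataFrame(
    [("u1", "FR"), ("u2", "TN")],
    ["user_id", "country"],
)

transformed = (
    tweets
    .filter(col("retweets") >= 10)        # selection
    .select("user_id", "text")            # projection
    .join(profiles, on="user_id")         # join with a second source
)
transformed.show()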
... One of the most essential activities in the process of ontology engineering is conceptualization. It is concerned with recognizing concepts in the real world in order to construct a model of the relevant domain [18]. Enhancing the activity of conceptualization has a significant impact on the final ontology's quality. ...
Article
Full-text available
Ontologies provide a powerful method for representing, reusing, and sharing domain knowledge. They are extensively used in a wide range of disciplines, including artificial intelligence, knowledge engineering, biomedical informatics, and many more. For several reasons, developing domain ontologies is a challenging task. One of these reasons is that it is a complicated and time-consuming process. Multiple ontology development methodologies have already been proposed. However, there is room for improvement in terms of covering more activities during development (such as enrichment) and enhancing others (such as conceptualization). In this research, an enhanced ontology development methodology (ON-ODM) is proposed. Ontology-driven conceptual modeling (ODCM) and natural language processing (NLP) serve as the foundation of the proposed methodology. ODCM is defined as the utilization of ontological ideas from various areas to build engineering artifacts that improve conceptual modeling. NLP refers to the scientific discipline that employs computer techniques to analyze human language. The proposed ON-ODM is applied to build a tourism ontology that will be beneficial for a variety of applications, including e-tourism. The produced ontology is evaluated based on competency questions (CQs) and quality metrics. It is verified that the ontology answers SPARQL queries covering all CQ groups specified by domain experts. Quality metrics are used to compare the produced ontology with four existing tourism ontologies. For instance, according to the metrics related to conciseness, the produced ontology received a first place ranking when compared to the others, whereas it received a second place ranking regarding understandability. These results show that utilizing ODCM and NLP could facilitate and improve the development process, respectively.
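The competency-question style of evaluation mentioned in the abstract can be sketched with rdflib: a SPARQL query encoding one CQ is run against the ontology and the check verifies that it returns answers. The file name, namespace, classes, and query below are illustrative assumptions, not the actual tourism ontology or CQs from the paper.

# Hedged sketch of CQ-based evaluation: run a SPARQL query that encodes one
# competency question against the ontology and check that it returns answers.
# The file name, namespace and query are illustrative assumptions.
from rdflib import Graph

g = Graph()
g.parse("tourism.ttl", format="turtle")  # hypothetical ontology file

CQ1 = """
PREFIX : <http://example.org/tourism#>
SELECT ?hotel ?city WHERE {
    ?hotel a :Hotel ;
           :locatedIn ?city .
}
"""

rows = list(g.query(CQ1))
print(f"CQ1 answered with {len(rows)} result(s)")
assert rows, "CQ1 is not answerable: the ontology may be missing concepts or relations"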
... Conceptualization is one of the crucial activities in ontology engineering. Conceptualization focuses on recognizing the concepts in the real world to build the model of the relevant domain [40]. This activity has a significant impact on the quality of the final ontology, as the quality of any artifact based on a model is constrained by the model's quality [21]. ...
Article
Full-text available
During the last decade, ontology engineering has undoubtedly contributed to many beneficial applications in different domains. Nevertheless, ontology development still faces several significant challenges that need to be addressed. This study proposes an enhanced architecture for the ontology development lifecycle. With the help of this architecture, users can complete ontology development tasks, since it provides guidance for all key activities, from requirement specification to ontology evaluation. Ontology-driven conceptual modeling (ODCM) and ontology matching serve as the foundation of this architecture. ODCM is defined as the application of ontological ideas from various fields to build engineering objects that improve conceptual modeling. Ontology matching is a promising approach to overcoming the semantic heterogeneity between different ontologies. The proposed architecture is applied to the e-governance domain, one of the online services that has gained great attention worldwide, especially during the coronavirus pandemic. However, residents of Arab countries face numerous obstacles and do not receive the full benefits of e-governance. For these reasons, the Egyptian e-government is selected as the case study. The results are encouraging when the produced ontology is compared with 20 existing ontologies from the same domain. On the basis of OntoMetrics, the average values of the metrics correlated to accuracy, understandability, cohesion and conciseness lie in the 95th, 95th, 95th and 57th percentiles, respectively. The results can be further enhanced by defining more non-inheritance relations and distributing the instances across all classes.
... In summary, through the work carried out in this project, we confirm the great importance that should be given to the modeling phase of the ETL process for the acquisition, processing, storage, and analysis of data, as pointed out in [20]. In fact, the success of a data warehousing project depends tightly on good modeling, which is usually very costly in terms of time and effort. ...
Chapter
Full-text available
With the explosion of new data processing and storage technologies, businesses are looking to harness the hidden value of data, each in their own way. Many contributions have proposed pipelines dedicated to Big Data processing and storage, but they usually target particular types of data and specific technologies to meet precise needs, without considering the evolution of requirements or changes in data characteristics. Thus, no approach has defined a generic architecture for the Big Data warehousing process. In this paper, we propose a multi-layer model that integrates all the necessary elements and concepts in the different phases of a data warehousing process. It also contributes to generating an architecture that considers the specificity of the data and applications and the suitable technologies. To illustrate our contribution, we have implemented the proposed model through a business model and a Big Data architecture for the analysis of multi-source and social network data.
Article
Anomaly detection approaches have become critically important to enhance decision-making systems, especially regarding the process of risk reduction in the economic performance of an organisation and in consumer costs. Previous studies on anomaly detection have examined mainly abnormalities that translate into fraud, such as fraudulent credit card transactions or fraud in insurance systems. However, anomalies represent irregularities in system data patterns, which may arise from deviations, adulterations or inconsistencies. Further, their study encompasses not only fraud, but also any behavioural abnormalities that signal risks. This paper proposes a literature review of methods and techniques to detect anomalies in diverse financial systems using a five-step technique. In our proposed method, we created a classification framework using codes to systematize the main techniques and knowledge on the subject, in addition to identifying research opportunities. Furthermore, the statistical results show several research gaps, among which three main ones should be explored for developing this area: a common database, tests with different dimensional sizes of data, and indicators of the detection models' effectiveness. Therefore, the proposed framework is pertinent to comprehending the existing scientific knowledge base and signals important gaps for a research agenda on the topic of anomalies in financial systems.
Article
Full-text available
Both conceptual modeling and machine learning have long been recognized as important areas of research. With the increasing emphasis on digitizing and processing large amounts of data for business and other applications, it would be helpful to consider how these areas of research can complement each other. To understand how they can be paired, we provide an overview of machine learning foundations and the machine learning development cycle. We then examine how conceptual modeling can be applied to machine learning and propose a framework for incorporating conceptual modeling into data science projects. The framework is illustrated by applying it to a healthcare application. For the inverse pairing, machine learning can impact conceptual modeling through text and rule mining, as well as knowledge graphs. Pairing conceptual modeling and machine learning in this way should help lay the foundations for future research.
Article
Full-text available
General ontology is a prominent theoretical foundation for information technology analysis, design, and development. Ontology is a branch of philosophy which studies what exists in reality. A widely used ontology in information systems, especially for conceptual modeling, is the BWW (Bunge–Wand–Weber), which is based on ideas of the philosopher and physicist Mario Bunge, as synthesized by Wand and Weber. The ontology was founded on an early subset of Bunge’s philosophy; however, many of Bunge’s ideas have evolved since then. An important question, therefore, is: do the more recent ideas expressed by Bunge call for a new ontology? In this paper, we conduct an analysis of Bunge’s earlier and more recent works to address this question. We present a new ontology based on Bunge’s later and broader works, which we refer to as Bunge’s Systemist Ontology (BSO). We then compare BSO to the constructs of BWW. The comparison reveals both considerable overlap between BSO and BWW, as well as substantial differences. From this comparison and the initial exposition of BSO, we provide suggestions for further ontology studies and identify research questions that could provide a fruitful agenda for future scholarship in conceptual modeling and other areas of information technology.
Article
Full-text available
Conceptual models capture knowledge about domains of reality. Therefore, conceptual models and their modelling constructs should be based on theories about the world—that is, they should be grounded in ontology. Identity is fundamental to ontology and conceptual modelling because it addresses the very existence of objects and conceptual systems in general. Classification involves grouping objects that share similarities and delineating them from objects that fall under other concepts (qualitative identity). However, among objects that fall under the same concept, we must also distinguish between individual objects (individual identity). In this paper, we analyze the ontological question of identity, focusing specifically on institutional identity, which is the identity of socially constructed institutional objects. An institutional entity is a language construct that is ‘spoken into existence’. We elaborate on how institutional identity changes how we understand conceptual modelling and the models produced. We show that different models result if we base modelling on a property‐based conception of identity compared to an institutional one. We use the Bunge‐Wand‐Weber principles, which embrace a property‐based view of identity, as an anchor to the existing literature to point out how this type of ontology sidesteps identity in general and institutional identity in particular. We contribute theoretically by providing the first in‐depth ontological analysis of what the notion of institutional identity can bring to conceptual modelling. We also contribute a solid ontological grounding of identity management and the identity of things in digital infrastructures.
Article
Full-text available
General ontology is a prominent theoretical foundation for information technology analysis, design, and development. Ontology is a branch of philosophy which studies what exists in reality. A widely used ontology in information systems, especially for conceptual modeling, is the BWW (Bunge–Wand–Weber), which is based on ideas of the philosopher and physicist Mario Bunge, as synthesized by Wand and Weber. The ontology was founded on an early subset of Bunge's philosophy; however, many of Bunge's ideas have evolved since then. An important question, therefore, is: do the more recent ideas expressed by Bunge call for a new ontology? In this paper, we conduct an analysis of Bunge's earlier and more recent works to address this question. We present a new ontology based on Bunge's later and broader works, which we refer to as Bunge's Systemist Ontology (BSO). We then compare BSO to the constructs of BWW. The comparison reveals both considerable overlap between BSO and BWW, as well as substantial differences. From this comparison and the initial exposition of BSO, we provide suggestions for further ontology studies and identify research questions that could provide a fruitful agenda for future scholarship in conceptual modeling and other areas of information technology.
Conference Paper
Full-text available
Encoding methods affect the performance of process mining tasks but little work in the literature focused on quantifying their impact. In this paper, we compare 10 different encoding methods from three different families (trace replay and alignment, graph embeddings, and word embeddings) using measures to evaluate the overlaps in the feature space, the accuracy obtained, and the computational resources (time) consumed with a classification task. Across hundreds of event logs representing four variations of five scenarios and five anomalies, it was possible to identify the edge2vec method as the most accurate and effective in reducing class overlapping in the feature space.
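The following Python sketch shows one of the encoding families compared in the paper (word embeddings over activity sequences) feeding a downstream classifier; the traces, labels, and hyperparameters are toy assumptions, and edge2vec, the best performer reported, is not reproduced here.

# Sketch of one encoding family from the comparison (word embeddings over
# activity sequences) feeding a classifier; toy data, arbitrary hyperparameters.
import numpy as np
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier

traces = [["register", "check", "approve"],
          ["register", "check", "reject"],
          ["register", "approve"],          # anomalous: skipped "check"
          ["register", "check", "approve"]]
labels = [0, 0, 1, 0]                        # 1 = anomalous trace

w2v = Word2Vec(sentences=traces, vector_size=16, window=3, min_count=1, seed=42)

def encode(trace):
    """Trace vector = mean of its activity embeddings."""
    return np.mean([w2v.wv[a] for a in trace], axis=0)

X = np.vstack([encode(t) for t in traces])
clf = RandomForestClassifier(random_state=42).fit(X, labels)
print(clf.predict(X[:1]))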
Article
Full-text available
Thanks to the advances achieved in the last decade, the lack of adequate technologies to deal with Big Data characteristics such as data volume is no longer an issue. Instead, recent studies highlight that one of the main Big Data issues is the lack of expertise to select adequate technologies and build the correct Big Data architecture for the problem at hand. In order to tackle this problem, we present our methodology for the generation of Big Data pipelines based on several requirements derived from Big Data features that are critical for the selection of the most appropriate tools and techniques. Our approach thus reduces the know-how required to select and build Big Data architectures by providing a step-by-step methodology that guides Big Data architects in creating their Big Data pipelines for the case at hand. Our methodology has been tested in two use cases.
Article
Full-text available
The role of information systems (IS) as representations of real-world systems is changing in an increasingly digitalized world, suggesting that conceptual modeling is losing its relevance to the IS field. We argue the opposite: Conceptual modeling research is more relevant to the IS field than ever, but it requires an update with current theory. We develop a new theoretical framework of conceptual modeling that delivers a fundamental shift in the assumptions that govern research in this area. This move can make traditional knowledge about conceptual modeling consistent with the emerging requirements of a digital world. Our framework draws attention to the role of conceptual modeling scripts as mediators between physical and digital realities. We identify new research questions about grammars, methods, scripts, agents, and contexts that are situated in intertwined physical and digital realities. We discuss several implications for conceptual modeling scholarship that relate to the necessity of developing new methods and grammars for conceptual modeling, broadening the methodological array of conceptual modeling scholarship, and considering new dependent variables.
Article
Full-text available
With advances in genomic sequencing technology, a large amount of data is publicly available for the research community to extract meaningful and reliable associations among risk genes and the mechanisms of disease. However, this exponentially growing volume of data is spread over a thousand heterogeneous repositories, represented in multiple formats and with different levels of quality, which hinders the differentiation of clinically valid relationships from those that are less well supported and could lead to wrong diagnoses. This paper presents how conceptual models can play a key role in efficiently managing genomic data. These data must be accessible, informative and reliable enough to extract valuable knowledge in the context of identifying evidence that supports the relationship between DNA variants and disease. The approach presented in this paper provides a solution that helps researchers organize, store and process information, focusing only on the data that are relevant and minimizing the impact that information overload has in clinical and research contexts. A case study (epilepsy) is also presented to demonstrate its application in a real context.
Article
Full-text available
The arrival of Big Data has contributed positively to the evolution of data warehouse (DW) technology. This has given birth to augmented DWs that aim at maximizing the effectiveness of existing ones. Various augmentation scenarios have been proposed and adopted by firms and industry, covering several aspects such as new data sources (e.g., Linked Open Data (LOD), social, stream and IoT data), data ingestion, advanced deployment infrastructures, programming paradigms, and data visualization. These scenarios allow companies to reach valuable decisions. By examining traditional DWs, we realized that they do not fulfill all decision-maker requirements, since the data sources feeding a target DW are not rich enough to capture Big Data. The arrival of the LOD era is an excellent opportunity to enrich traditional DWs with a new V dimension: Value. In this paper, we first conceptualize the variety of internal and external sources and study its effect on the ETL phase to ease value capturing. Secondly, a value-driven approach for DW design is discussed. Thirdly, three realistic scenarios for integrating LOD into the DW landscape are given. Finally, experiments are conducted showing the value added by augmenting the existing DW environment with LOD.
Article
Full-text available
According to the FAIR guiding principles, one of the central attributes for maximizing the added value of information artifacts is interoperability. In this paper, I discuss the importance of, and propose a characterization of, the notion of Semantic Interoperability. Moreover, I show that a direct consequence of this view is that Semantic Interoperability cannot be achieved without the support of, on one hand, (i) ontologies, as meaning contracts capturing the conceptualizations represented in information artifacts and, on the other hand, (ii) Ontology, as a discipline proposing formal methods and theories for clarifying these conceptualizations and articulating their representations. In particular, I discuss the fundamental role of formal ontological theories (in the latter sense) in properly grounding the construction of representation languages, as well as methodological and computational tools for supporting the engineering of ontologies (in the former sense) in the context of FAIR.
Chapter
Full-text available
Data access at genomic repositories is problematic, as data is described by heterogeneous and hardly comparable metadata. We previously introduced a unified conceptual schema, collected metadata in a single repository and provided classical search methods upon them. We here propose a new paradigm to support semantic search of integrated genomic metadata, based on the Genomic Knowledge Graph, a semantic graph of genomic terms and concepts, which combines the original information provided by each source with curated terminological content from specialized ontologies. Commercial knowledge-assisted search is designed for transparently supporting keyword-based search without explaining inferences; in biology, inference understanding is instead critical. For this reason, we propose a graph-based visual search for data exploration; some expert users can navigate the semantic graph along the conceptual schema, enriched with simple forms of homonyms and term hierarchies, thus understanding the semantic reasoning behind query results.
Conference Paper
Full-text available
This paper contributes to the philosophical foundations of conceptual modeling by addressing a number of foundational questions such as: What is a conceptual model? Among models used in computer science, which are conceptual, and which are not? How are conceptual models different from other models used in the Sciences and Engineering? The paper takes a stance in answering these questions and, in order to do that, it draws from a broad literature in philosophy, cognitive science, logics, as well as several areas of Computer Science (including Databases, Software Engineering, Artificial Intelligence, and Information Systems Engineering, among others). After a brief history of conceptual modeling, the paper addresses the aforementioned questions by proposing a characterization of conceptual models with respect to conceptual semantics and ontological commitments. Finally, we position our work with respect to a "Reference Framework for Conceptual Modeling" recently proposed in the literature.
Article
Full-text available
This paper conducts an empirical study that explores the differences between adopting a traditional conceptual modeling (TCM) technique and an ontology-driven conceptual modeling (ODCM) technique, with the objective of understanding and identifying in which modeling situations an ODCM technique can prove beneficial compared to a TCM technique. More specifically, we asked ourselves whether there exist any meaningful differences in the resulting conceptual model and the effort spent to create such a model between novice modelers trained in an ontology-driven conceptual modeling technique and novice modelers trained in a traditional conceptual modeling technique. To answer this question, we discuss previous empirical research efforts and distill these efforts into two hypotheses. Next, these hypotheses are tested in a rigorously developed experiment, in which a total of 100 students from two different universities participated. The findings of our empirical study confirm that there do exist meaningful differences between adopting the two techniques. We observed that novice modelers applying the ODCM technique arrived at higher-quality models compared to novice modelers applying the TCM technique. More specifically, the results of the empirical study demonstrated that it is advantageous to apply an ODCM technique over a TCM technique when having to model the more challenging and advanced facets of a certain domain or scenario. Moreover, we did not find any significant difference in effort between applying the two techniques. Finally, we specified our results in three findings that aim to clarify the obtained results.
Article
Full-text available
As crowdsourced user-generated content becomes an important source of data for organizations, a pressing question is how to ensure that data contributed by ordinary people outside of traditional organizational boundaries is of suitable quality to be useful for both known and unanticipated purposes. This research examines the impact of different information quality management strategies, and corresponding data collection design choices, on key dimensions of information quality in crowdsourced user-generated content. We conceptualize a contributor-centric information quality management approach focusing on instance-based data collection. We contrast it with the traditional consumer-centric fitness-for-use conceptualization of information quality that emphasizes class-based data collection. We present laboratory and field experiments conducted in a citizen science domain that demonstrate trade-offs in the quality dimensions of accuracy, completeness (including discoveries), and precision between the two information management approaches and their corresponding data collection designs. Specifically, we show that instance-based data collection results in higher accuracy, dataset completeness and number of discoveries, but this comes at the expense of lower precision. We further validate the practical value of the instance-based approach by conducting an applicability check with potential data consumers (scientists, in our context of citizen science). In a follow-up study, we show, using human experts and supervised machine learning techniques, that substantial precision gains on instance-based data can be achieved with post-processing. We conclude by discussing the benefits and limitations of different information quality management and data collection design choices for information quality in crowdsourced user-generated content.
Conference Paper
Full-text available
With decades of contributions and applications, conceptual modeling is very well-recognized in information systems engineering. However, the importance and relevance of conceptual modeling is less well understood in other disciplines. This paper, through an analysis of existing research and expert opinions, proposes a reference framework for conceptual modeling to help researchers and practitioners position their work in the field, facilitate discussion among researchers in the field, and help researchers and practitioners in other fields understand what the field of conceptual modeling has to offer as well as contribute to its continued, extended influence in multiple domains.
Conference Paper
Full-text available
In recent years, there has been a growth in the use of reference conceptual models, in general, and domain ontologies, in particular, to capture information about complex and critical domains. These models play a fundamental role in different types of critical semantic interoperability tasks. Therefore, it is essential that domain experts are able to understand and reason using the models' content. In other words, it is important that conceptual models are cognitively tractable. However, it is unavoidable that when the information of the represented domain grows, so do the size and complexity of the artifacts and models that represent it. For this reason, more sophisticated techniques for complexity management in ontology-driven conceptual models need to be developed. Some approaches are based on the notion of model modularization. In this paper, we follow the work on model modularization to present an approach for view extraction for the ontology-driven conceptual modeling language OntoUML. We provide a formal definition of ontological views over OntoUML conceptual models that fully leverages the ontologically well-grounded real-world semantics of that language. Moreover, we present a plug-in tool, developed for an OntoUML model-based editor, that implements this formal view structure in terms of queries defined over the OntoUML metamodel embedded in that tool.
Article
Full-text available
Web 2.0 and Big Data tools can be used to develop knowledge management systems based on facilitating the participation and collaboration of people in order to enhance knowledge. The paper presents a methodology that can help organizations with the use of Web 2.0 and Big Data tools to discover, gather, manage and apply their knowledge by making the process of implementing a knowledge management system faster and simpler. First, an initial version of the methodology was developed and it was then applied to an oil and gas company in order to analyze and refine it. The results obtained show the effectiveness of the methodology, since it helped this company to carry out the implementation quickly and effectively, thereby allowing the company to gain the maximum benefits from existing knowledge.
Conference Paper
Full-text available
Many repositories of open data for genomics, collected by world-wide consortia, are important enablers of biological research; moreover, all experimental datasets leading to publications in genomics must be deposited to public repositories and made available to the research community. These datasets are typically used by biologists for validating or enriching their experiments; their content is documented by metadata. However, emphasis on data sharing is not matched by accuracy in data documentation; metadata are not standardized across the sources and often unstructured and incomplete.
Conference Paper
Full-text available
In competitive markets, companies need well-designed business strategies if they seek to grow and obtain sustainable competitive advantage. At the core of a successful business strategy there is a carefully crafted value proposition, which ultimately defines what a company delivers to its customers. Despite their widely recognized importance, there is however little agreement on what exactly value propositions are. This lack of conceptual clarity harms the communication among stakeholders and the harmonization of current business strategy theories and strategy support frameworks. Furthermore, it hinders the development of systematic methodologies for crafting value propositions, as well as adequate support for representing and analyzing them. In this paper, we present an ontological analysis of value propositions based on a review of the most relevant business and marketing theories and on previous work on value ascription, grounded in the Unified Foundational Ontology (UFO). Our investigation clarifies how value propositions are different from value presentations, and shows the difference between value propositions at the business level and those related to specific offerings.
Article
Full-text available
A smart grid is an intelligent electricity grid that optimizes the generation, distribution and consumption of electricity through the introduction of Information and Communication Technologies on the electricity grid. In essence, smart grids bring profound changes in the information systems that drive them: new information flows coming from the electricity grid, new players such as decentralized producers of renewable energies, new uses such as electric vehicles and connected houses, and new communicating equipment such as smart meters, sensors and remote control points. All this will cause a deluge of data that the energy companies will have to face. Big Data technologies offer suitable solutions for utilities, but the decision about which Big Data technology to use is critical. In this paper, we provide an overview of data management for smart grids, summarise the added value of Big Data technologies for this kind of data, and discuss the technical requirements, the tools and the main steps to implement Big Data solutions in the smart grid context.
Article
Full-text available
Building proper reference ontologies is a hard task. There are a number of methods and tools that traditionally have been used to support this task. These include the use of foundational theories, the reuse of domain and core ontologies, the adoption of development methods, as well as the support of proper software tools. In this context, an approach that has gained increasing attention in recent years is the systematic application of ontology patterns. However, a pattern-based approach to ontology engineering requires: the existence of a set of suitable patterns that can be reused in the construction of new ontologies; a proper methodological support for eliciting these patterns, as well as for applying them in the construction of these new models. The goal of this paper is twofold: (i) firstly, we present an approach for deriving conceptual ontology patterns from ontologies. These patterns are derived from ontologies of different generality levels, ranging from foundational to domain ontologies; (ii) secondly, we present guidelines that describe how these patterns can be applied in combination for building reference domain ontologies in a reuse-oriented process. In summary, this paper is about the construction of ontology patterns from ontologies, as well as the construction of ontologies from ontology patterns.
Article
Full-text available
This paper describes a long-term research program on developing ontological foundations for conceptual modeling. This program, organized around the theoretical background of the foundational ontology UFO (Unified Foundational Ontology), aims at developing theories, methodologies and engineering tools with the goal of advancing conceptual modeling as a theoretically sound discipline but also one that has concrete and measurable practical implications. The paper describes the historical context in which UFO was conceived, briefly discusses its stratified organization, and reports on a number of applications of this foundational ontology over more than a decade. In particular, it discusses the most successful application of UFO, namely, the development of the conceptual modeling language OntoUML. The paper also discusses a number of methodological and computational tools, which have been developed over the years to support the OntoUML community. Examples of these methodological tools include ontological patterns and anti-patterns; examples of these computational tools include automated support for pattern-based model construction, formal model verification, formal model validation via visual simulation, model verbalization, code generation and anti-pattern detection and rectification. In addition, the paper reports on a variety of applications in which the language as well as its associated tools have been employed to engineer models in several institutional contexts and domains. Finally, it reflects on some of these lessons learned by observing how OntoUML has been actually used in practice by its community and on how these have influenced both the evolution of the language as well as the advancement of some of the core ontological notions in UFO.
Article
Full-text available
Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.
Article
Full-text available
The construction of large-scale reference conceptual models is a complex engineering activity. To develop high-quality models, a modeler must have the support of expressive engineering tools such as theoretically well-founded modeling languages and methodologies, patterns and anti-patterns and automated supporting environments. This paper proposes a set of Ontological Anti-Patterns for Ontology-Driven Conceptual Modeling. These anti-patterns capture error-prone modeling decisions that can result in the creation of models that fail to exclude unintended model instances (representing unintended state of affairs) or forbid intended ones (representing intended states of affairs). The anti-patterns presented here have been empirically elicited through an approach of conceptual models validation via visual simulation. The paper also presents a series of refactoring plans for rectifying the models in which these anti-patterns occur. In addition, we present here a computational tool that is able to: automatically identify these anti-patterns in user’s models, guide users in assessing their consequences, and generate corrections to these models by the automatic inclusion of OCL constraints implementing the proposed refactoring plans. Finally, the paper also presents an empirical study for assessing the harmfulness of each of the uncovered anti-patterns (i.e., the likelihood that its occurrence in a model entails unintended consequences) as well as the effectiveness of the proposed refactoring plans.
Conference Paper
Full-text available
Accessing the relevant data in Big Data scenarios is increasingly difficult both for end-users and IT experts, due to the volume, variety, and velocity dimensions of Big Data. This brings a high cost overhead in data access for large enterprises. For instance, in the oil and gas industry, IT experts spend 30-70% of their time gathering and assessing the quality of data [1]. The Optique project ( http://www.optique-project.eu/ ) advocates a next generation of the well-known Ontology-Based Data Access (OBDA) approach to address the Big Data dimensions and, in particular, the data access problem. The project aims at solutions that reduce the cost of data access dramatically.
Conference Paper
Assuring anomaly-free business process executions is a key challenge for many organizations. Traditional techniques address this challenge using prior knowledge about anomalous cases that is seldom available in real-life. In this work, we propose the usage of word2vec encoding and One-Class Classification algorithms to detect anomalies by relying on normal behavior only. We investigated 6 different types of anomalies over 38 real and synthetics event logs, comparing the predictive performance of Support Vector Machine, One-Class Support Vector Machine, and Local Outlier Factor. Results show that our technique is viable for real-life scenarios, overcoming traditional machine learning for a wide variety of settings where only the normal behavior can be labeled.
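A minimal sketch of the normal-behavior-only setting described in the abstract is given below: traces are encoded with word2vec and a One-Class SVM is fitted on normal executions alone, so that unseen traces can be scored as inliers or outliers. The toy event log and parameters are assumptions, not the 38 logs or the tuning used in the paper.

# Illustrative sketch: word2vec trace encoding + One-Class SVM trained on
# normal behaviour only. Toy data and parameters are assumptions.
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import OneClassSVM

normal_traces = [["create", "verify", "pay", "close"],
                 ["create", "verify", "reject", "close"]] * 10
test_traces = [["create", "verify", "pay", "close"],   # looks normal
               ["create", "pay", "pay", "close"]]      # repeated activity

w2v = Word2Vec(sentences=normal_traces + test_traces,
               vector_size=8, window=2, min_count=1, seed=7)

def encode(trace):
    """Trace vector = mean of its activity embeddings."""
    return np.mean([w2v.wv[a] for a in trace], axis=0)

X_train = np.vstack([encode(t) for t in normal_traces])
X_test = np.vstack([encode(t) for t in test_traces])

detector = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)
print(detector.predict(X_test))   # sklearn convention: +1 = inlier, -1 = outlier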
Article
The International Conference on Conceptual Modeling celebrated 40 years of existence at its 38th edition held in Salvador, Brazil, on 4–7 November 2019. As one of the most traditional and well-known conferences in the database area, it has its origins on the Entity-Relationship Model proposed by Peter P. Chen in 1975. To celebrate such an accomplishment, this article goes over the ER history from distinct perspectives. Overall, we investigate the complete ER collaboration network built on bibliographic data collected from DBLP, comprising its 38 editions held from 1979 to 2019. We analyze several aspects regarding the evolution of its network metrics, such as degree, clustering coefficient and average shortest path, over the four decades. In particular, we analyze the role of the most engaged ER authors, the number of distinct authors, institutions and published papers, and the evolution of some of the most frequent terms presented in the titles of its papers, as well as the influence and impact of the prominent ER authors.
Chapter
Choosing the right visualization techniques is critical in Big Data Analytics. However, decision makers are not visualization experts and face enormous difficulties in doing so. There are currently many different (i) Big Data sources and (ii) visual analytics to choose from, and not every visualization technique is valid for every Big Data source or adequate for every context. In order to tackle this problem, we propose an approach based on the Model Driven Architecture (MDA) to facilitate the selection of the right visual analytics for non-expert users. The approach is based on three different models: (i) a requirements model based on goal-oriented modeling for representing information requirements, (ii) a data representation model for representing the data to be connected to visualizations, and (iii) a visualization model for representing visualization details regardless of their implementation technology. Together with these models, a set of transformations allows us to semi-automatically obtain the corresponding implementation, avoiding the intervention of non-expert users. In this way, the great advantage of our proposal is that users no longer need to focus on the characteristics of the visualization; rather, they focus on their information requirements and obtain the visualization that is best suited to their needs. We show the applicability of our proposal through a case study focused on a tax collection organization, from a real project developed by the spin-off company Lucentia Lab.
Article
The number of applications being developed that require access to knowledge about the real world has increased rapidly over the past two decades. Domain ontologies, which formalize the terms being used in a discipline, have become essential for research in areas such as Machine Learning, the Internet of Things, Robotics, and Natural Language Processing, because they enable separate systems to exchange information. The quality of these domain ontologies, however, must be ensured for meaningful communication. Assessing the quality of domain ontologies for their suitability to potential applications remains difficult, even though a variety of frameworks and metrics have been developed for doing so. This article reviews domain ontology assessment efforts to highlight the work that has been carried out and to clarify the important issues that remain. These assessment efforts are classified into five distinct evaluation approaches and the state of the art of each described. Challenges associated with domain ontology assessment are outlined and recommendations are made for future research and applications.
Article
In big data analytics, advanced analytic techniques operate on big datasets aimed at complementing the role of traditional OLAP for decision making. To enable companies to take benefit of these techniques despite the lack of in-house technical skills, the H2020 TOREADOR Project adopts a model-driven architecture for streamlining analysis processes, from data preparation to their visualization. In this article, we propose a new approach named SkyViz focused on the visualization area, in particular on (1) how to specify the user’s objectives and describe the dataset to be visualized, (2) how to translate this specification into a platform-independent visualization type, and (3) how to concretely implement this visualization type on the target execution platform. To support step (1), we define a visualization context based on seven prioritizable coordinates for assessing the user’s objectives and conceptually describing the data to be visualized. To automate step (2), we propose a skyline-based technique that translates a visualization context into a set of most suitable visualization types. Finally, to automate step (3), we propose a skyline-based technique that, with reference to a specific platform, finds the best bindings between the columns of the dataset and the graphical coordinates used by the visualization type chosen by the user. SkyViz can be transparently extended to include more visualization types on one hand, more visualization coordinates on the other. The article is completed by an evaluation of SkyViz based on a case study excerpted from the pilot applications of the TOREADOR Project.
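The skyline step at the heart of SkyViz can be sketched generically: a visualization type is kept only if no other candidate is at least as good on every coordinate and strictly better on at least one. The coordinate names and scores below are invented for illustration and do not correspond to SkyViz's actual coordinate model.

# Generic sketch of a skyline step: keep visualization types not dominated on
# every coordinate by some other candidate. Coordinates and scores are made up.
candidates = {
    "bar_chart":  {"interaction": 0.9, "dimensionality": 0.4, "cardinality": 0.6},
    "heat_map":   {"interaction": 0.5, "dimensionality": 0.8, "cardinality": 0.9},
    "line_chart": {"interaction": 0.7, "dimensionality": 0.3, "cardinality": 0.5},
}

def dominates(a, b):
    """a dominates b if it is >= on every coordinate and > on at least one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

skyline = [name for name, score in candidates.items()
           if not any(dominates(other, score)
                      for o, other in candidates.items() if o != name)]
print(skyline)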
Article
The era of big data provides many opportunities for conducting impactful research from both data-driven and theory-driven perspectives. However, data-driven and theory-driven research have progressed somewhat independently. In this paper, we develop a framework that articulates important differences between these two perspectives and proposes a role for information systems research at their intersection. The framework presents a set of pathways that combine the data-driven and theory-driven perspectives. From these pathways, we derive a set of challenges, and show how they can be addressed by research in information systems. By doing so, we identify an important role that information systems research can play in advancing both data-driven and theory-driven research in the era of big data.
Article
The ability of a user to select an appropriate, high-quality domain ontology from a set of available options would be most useful in knowledge engineering and other intelligent applications. Doing so, however, requires good quality assessment metrics as well as automated support when there is a large number of ontologies from which to make a selection. This research analyzes existing metrics for domain ontology evaluation and extends them to derive a Layered Ontology Metrics Suite based on semiotic theory. The metrics are implemented in a Domain Ontology Ranking System (DoORS) prototype, the purpose of which is to search an ontology library for specific terms to retrieve candidate domain ontologies and then assess their quality and suitability based upon the suite of metrics. The prototype system is compared to existing approaches to automated ontology quality ranking to illustrate the usefulness of the research.
Article
Data visualization is a common and effective technique for data exploration. However, for complex data, it is infeasible for an analyst to manually generate and browse all possible visualizations for insights. That motivated the need for automated solutions that can effectively recommend such visualizations. The main idea underlying those solutions is to evaluate the utility of all possible visualizations and then recommend the top-k visualizations. This process incurs a high data-processing cost, which is further aggravated by the presence of numerical dimensional attributes. To address that challenge, we propose novel view recommendation schemes that are based on numerical dimensions. These schemes incorporate a hybrid multi-objective utility function, which captures the impact of numerical dimension attributes. The underlying premise of our first scheme, Multi-Objective View Recommendation for Data Exploration (MuVE), is to use an incremental evaluation of the multi-objective utility function. This technique allows the pruning of a large number of low-utility views and unnecessary objective evaluations. Our second scheme, upper MuVE (uMuVE), further improves the pruning power by setting upper bounds on the utility of views and allowing interleaved processing of views, at the expense of high memory usage. Our third scheme, Memory-aware uMuVE (MuMuVE), provides pruning power close to that of uMuVE, while keeping memory usage within a limited space.
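The incremental-evaluation idea behind MuVE can be sketched as follows: objectives are evaluated one at a time and a candidate view is discarded as soon as even a perfect score on the remaining objectives could not beat the current k-th best utility. The weights and objective functions below are placeholders, not MuVE's hybrid utility.

# Hedged sketch of incremental multi-objective pruning; weights and objectives
# are placeholders, not MuVE's actual utility function.
import heapq
import random

WEIGHTS = [0.5, 0.3, 0.2]                      # hypothetical objective weights

def make_objectives(view):
    """Each objective returns a score in [0, 1]; stand-ins for the components
    of a hybrid utility (e.g., deviation, accessibility, usability)."""
    rng = random.Random(view)
    return [lambda r=rng.random(): r for _ in WEIGHTS]

def pruned_topk(views, k=2):
    top = []                                   # min-heap of (utility, view)
    for view in views:
        utility, remaining = 0.0, sum(WEIGHTS)
        for w, obj in zip(WEIGHTS, make_objectives(view)):
            utility += w * obj()
            remaining -= w
            # Prune: even a perfect score on the remaining objectives cannot win.
            if len(top) == k and utility + remaining <= top[0][0]:
                break
        else:
            heapq.heappush(top, (utility, view))
            if len(top) > k:
                heapq.heappop(top)
    return sorted(top, reverse=True)

print(pruned_topk([f"view_{i}" for i in range(10)]))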
Conference Paper
The full potential of Big Data Analytics (BDA) can be unleashed only by overcoming hurdles like the high architectural complexity and lack of transparency of Big Data toolkits, as well as the high cost and lack of legal clearance of data collection, access and processing procedures. We first discuss the notion of Big Data Analytics-as-a-Service (BDAaaS) to help potential users of BDA in overcoming such hurdles. We then present TOREADOR, a first approach to BDAaaS.
Conference Paper
Data warehouse design methodologies require a novel approach in the Big Data context, because they have to provide solutions to the issues related to the 5 Vs (Volume, Velocity, Variety, Veracity, and Value). It is therefore mandatory to support the designer with automatic techniques able to quickly produce a multidimensional schema by using and integrating several data sources, which can also be unstructured and therefore require ontology-based reasoning. Accordingly, the methodologies have to adopt agile techniques, in order to change the multidimensional schema as the business requirements change, without a complete redesign. Furthermore, hybrid approaches must be used instead of the traditional data-driven or requirement-driven approaches, in order not to lose adherence to user requirements and to produce a valuable multidimensional schema compliant with the data sources. In this paper, we perform a metric comparison among different methodologies, in order to demonstrate that methodologies classified as hybrid, ontology-based, automatic, and agile are tailored for the Big Data context.
Article
Domain ontologies and conceptual models similarly capture and represent concepts from the real world for inclusion in an information system. This paper examines challenges of conceptual modeling and domain ontology development when mapping to high-level ontologies. The intent is to reconcile apparent differences and position some of the inherent challenges in these closely-coupled areas of research, while providing insights into recognizing and resolving modeling difficulties.
Chapter
Foundational ontologies provide the basic concepts upon which any domain-specific ontology is built. This chapter presents a new foundational ontology, UFO, and shows how it can be used as a guideline in business modeling and for evaluating business modeling methods. UFO is derived from a synthesis of two other foundational ontologies, GFO/GOL and OntoClean/DOLCE. While their main areas of application are natural sciences and linguistics/cognitive engineering, respectively, the main purpose of UFO is to provide a foundation for conceptual modeling, including business modeling.
Article
Purpose – The purpose of this paper is to present a new approach toward automatically visualizing Linked Open Data (LOD) through metadata analysis. Design/methodology/approach – By focussing on the data within a LOD dataset, the authors can infer its structure in a much better way than current approaches, generating more intuitive models to progress toward visual representations. Findings – With no technical knowledge required, focussing on metadata properties from a semantically annotated dataset could lead to automatically generated charts that allow users to understand the dataset in an exploratory manner. Through interactive visualizations, users can navigate LOD sources using a natural approach, saving time and resources when dealing with an unknown resource for the first time. Research limitations/implications – This approach is suitable for available SPARQL endpoints and could be extended to Resource Description Framework (RDF) dumps loaded locally. Originality/value – Most works dealing with LOD visualization are customized for a specific domain or dataset. This paper proposes a generic approach based on the traditional data visualization and exploratory data analysis literature.
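The general idea of inferring a dataset's structure from its metadata can be sketched with a small script against a SPARQL endpoint. This is not the authors' tool; it is a minimal sketch using the SPARQLWrapper library, and the DBpedia endpoint is only an example (large public endpoints may time out or truncate such aggregate queries).

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Any public SPARQL endpoint can be used; DBpedia is only an example here.
ENDPOINT = "https://dbpedia.org/sparql"

def frequent_properties(endpoint: str, limit: int = 10):
    """List the most frequently used predicates in the dataset, a first hint of
    its structure that a visualization recommender could map to chart types."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        SELECT ?p (COUNT(*) AS ?uses)
        WHERE {{ ?s ?p ?o }}
        GROUP BY ?p
        ORDER BY DESC(?uses)
        LIMIT {limit}
    """)
    rows = sparql.query().convert()["results"]["bindings"]
    return [(row["p"]["value"], int(row["uses"]["value"])) for row in rows]

if __name__ == "__main__":
    for prop, uses in frequent_properties(ENDPOINT):
        print(f"{uses:>12}  {prop}")
```

Ranking predicates by usage is only the first step; a recommender could then inspect the datatypes of the most frequent properties to choose, for instance, bar charts for categorical values and time series for dated numeric values.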
Conference Paper
Big data is characterized by volume, variety, velocity, and veracity. We should expect conceptual modeling to provide some answers since its historical perspective has always been about structuring information—making its volume searchable, harnessing its variety uniformly, mitigating its velocity with automation, and checking its veracity with application constraints. We provide perspectives about how conceptual modeling can “come to the rescue” for many big-data applications by handling volume and velocity with automation, by inter-conceptual-model transformations for mitigating variety, and by conceptualized constraint checking for increasing veracity.
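As a concrete, if simplified, example of constraint checking for veracity, the following Python sketch validates incoming records against constraints one might derive from an ER-style schema; all names, attributes, and thresholds are illustrative and not drawn from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class Constraint:
    """A constraint derived from a conceptual model: an attribute name plus a
    predicate every record must satisfy (all examples here are illustrative)."""
    attribute: str
    check: Callable[[Any], bool]
    description: str

def validate(records: List[Dict[str, Any]],
             constraints: List[Constraint]) -> List[str]:
    """Report every constraint violation, one way a conceptual model can be
    used to screen incoming data and raise its veracity."""
    violations = []
    for i, record in enumerate(records):
        for c in constraints:
            value = record.get(c.attribute)
            if value is None or not c.check(value):
                violations.append(f"record {i}: {c.attribute}={value!r} violates '{c.description}'")
    return violations

# Constraints one might read off an ER-style schema of weather-sensor readings.
constraints = [
    Constraint("temperature", lambda v: -60.0 <= v <= 60.0, "temperature in [-60, 60] degrees C"),
    Constraint("station_id", lambda v: isinstance(v, str) and v.startswith("ST"), "station_id starts with 'ST'"),
]
readings = [{"station_id": "ST042", "temperature": 21.5},
            {"station_id": "X99", "temperature": 180.0}]
print(*validate(readings, constraints), sep="\n")
```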
Article
Ontologies have been less successful than they could be in large-scale business applications due to a wide variety of interpretations. This leads to confusion, and consequently, people from various research communities use the term with different – sometimes incompatible – meanings. This research work analyzes and clarifies the term ontology and points out its difference from taxonomy. By way of two business case studies, both their potential in ontological engineering and the perceived requirements for ontologies are highlighted, and their misuse in research and business is discussed. In order to examine the case for applying ontologies in a specific domain or use case, the main benefits of using ontologies are defined and categorized as technical-centered or user-centered. Key factors that influence the use of ontologies in business applications are derived and discussed. Finally, the paper offers a recommendation for efficiently applying ontologies, including adequate representation languages and an ontological engineering process supported by reference ontologies. To answer the questions of when ontologies should be used, how they can be used efficiently, and when they should not be used, we propose guidelines for selecting an appropriate model, methodology, and tool set to meet customer requirements while making most efficient use of resources.
Conference Paper
In this research, we propose a generic requirement model for Big Data applications. The model proposes the use of the i* Framework and KAOS (Knowledge Acquisition in autOmated Specification), which are part of Goal-Oriented Requirement Engineering (GORE). Big Data applications handle floods of data arising from sources as varied as climate data, genomes, software logs, or Facebook statuses. Building such applications demands gathering requirements specific to Big Data. A generic requirement model is proposed using the i* and KAOS models. The model is constructed by analyzing requirements based on the characteristics and challenges of Big Data. This generic model is then applied to a case study in an Indonesian government agency for the development planning of West Java (UPTB Pusdalisbang Jawa Barat). The results of this application demonstrate that the model can be used to generate a valid software requirement specification.
Conference Paper
Understanding the Human Genome is currently a significant challenge. Having a Conceptual Schema of the Human Genome (CSHG) is, in this context, a first step toward linking a sound Information Systems Design approach with Bioinformatics. But this is not enough. The use of an adequate ontological commitment is essential to fix the real-world semantics of the analyzed domain. Starting from a concrete proposal for the CSHG, the main goal of this paper is to apply the principles of a foundational ontology, UFO, to make explicit the ontological commitments underlying the concepts represented in the Conceptual Schema. As demonstrated in the paper, this ontological analysis is also able to highlight some conceptual drawbacks present in the initial version of the CSHG.
Article
Conceptual modeling continues to evolve as researchers and practitioners reflect on the challenges of modeling and implementing data-intensive problems that appear in business and in science. These challenges of data modeling and representation are well-recognized in contemporary applications of big data, ontologies, and semantics, along with traditional efforts associated with methodologies, tools, and theory development. This introduction contains a review of some current research in conceptual modeling and identifies emerging themes. It also introduces the articles that comprise this special issue of papers from the 32nd International Conference on Conceptual Modeling (ER 2013).
Article
In this Introduction we shall sketch a profile of our field of inquiry. This is necessary because semantics is too often mistaken for lexicography and therefore dismissed as trivial, while at other times it is disparaged for being concerned with reputedly shady characters such as meaning and allegedly defunct ones like truth. Moreover our special concern, the semantics of science, is a newcomer - at least as a systematic body - and therefore in need of an introduction. 1. GOAL Semantics is the field of inquiry centrally concerned with meaning and truth. It can be empirical or nonempirical. When brought to bear on concrete objects, such as a community of speakers, semantics seeks to answer problems concerning certain linguistic facts - such as disclosing the interpretation code inherent in the language or explaining the speakers' ability or inability to utter and understand new sentences of the language. This kind of semantics will then be both theoretical and experimental: it will be a branch of what used to be called 'behavioral science'.
Article
Conceptual modelling in information systems development is the creation of an enterprise model for the purpose of designing the information system. It is an important aspect of systems analysis. The value of a conceptual modelling language (CML) lies in its ability to capture the relevant knowledge about a domain. To determine which constructs should be included in a CML it would be beneficial to use some theoretical guidelines. However, this is usually not done. The purpose of this paper is to promote the idea that theories related to human knowledge can be used as foundations for conceptual modelling in systems development. We suggest the use of ontology, concept theory, and speech act theory. These approaches were chosen because: (1) they deal with important and different aspects relevant to conceptual modelling and (2) they have already been used in the context of systems analysis. For each approach we discuss: the rationale for its use, its principles, its application to conceptual modelling, and its limitations. We also demonstrate the concepts of the three approaches by analysing an example. The analysis also serves to show how each approach deals with different aspects of modelling.