Conference Paper

ARDI: Automatic Generation of RDFS Models from Heterogeneous Data Sources


Abstract

The current wealth of information, typically known as Big Data, generates a large amount of available data for organisations. Data Integration provides foundations to query disparate data sources as if they were integrated into a single source. However, current data integration tools are far from being useful for most organisations due to the heterogeneous nature of data sources, which represents a challenge for current frameworks. To enable data integration of highly heterogeneous and disparate data sources, this paper proposes a method to extract the schema from semi-structured (such as JSON and XML) and structured (such as relational) data sources, and generate an equivalent RDFS representation. The output of our method complements current frameworks and reduces the manual workload required to represent the input data sources in terms of the integration canonical data model. Our approach consists of production rules at the meta-model level that guarantee the correctness of the model translations. Finally, a tool implementing our approach has been developed.
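To make the translation concrete, the sketch below shows the flavour of such a schema-to-RDFS mapping for a JSON source, using Python and rdflib. It illustrates the general idea only, not the authors' tool: the namespace, function name and mapping rules (object becomes rdfs:Class, key becomes rdf:Property) are assumptions.

```python
# Illustrative sketch (not the ARDI implementation): derive an RDFS class
# with properties from the implicit structure of a JSON document.
import json
from rdflib import Graph, Namespace, RDF, RDFS, URIRef

EX = Namespace("http://example.org/schema#")  # hypothetical namespace

def json_to_rdfs(doc_name: str, doc: dict, g: Graph) -> URIRef:
    """Map a JSON object to an RDFS class; each key becomes an rdf:Property."""
    cls = EX[doc_name.capitalize()]
    g.add((cls, RDF.type, RDFS.Class))
    for key, value in doc.items():
        prop = EX[key]
        g.add((prop, RDF.type, RDF.Property))
        g.add((prop, RDFS.domain, cls))
        if isinstance(value, dict):  # nested object -> nested class
            g.add((prop, RDFS.range, json_to_rdfs(key, value, g)))
        else:                        # leaf -> literal-valued property
            g.add((prop, RDFS.range, RDFS.Literal))
    return cls

g = Graph()
g.bind("ex", EX)
json_to_rdfs("person", json.loads('{"name": "Ada", "address": {"city": "London"}}'), g)
print(g.serialize(format="turtle"))
```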


... New data sources are expected to be accompanied by the minimum metadata necessary to automatically access, query, and parse their data. Such metadata can be either provided by the data source itself or automatically extracted using available bootstrapping techniques (e.g., [32]). The framework supports both incremental and historical (complete) data ingestion, thus accommodating a wide variety of source access schemes. ...
Article
Full-text available
The ability to cross data from multiple sources represents a competitive advantage for organizations. Yet, the governance of the data lifecycle, from the data sources into valuable insights, is largely performed in an ad-hoc or manual manner. This is specifically concerning in scenarios where tens or hundreds of continuously evolving data sources produce semi-structured data. To overcome this challenge, we develop a framework for operationalizing and automating data governance. For the former, we propose a zoned data lake architecture and a set of data governance processes that allow the systematic ingestion, transformation and integration of data from heterogeneous sources, in order to make them readily available for business users. For the latter, we propose a set of metadata artifacts that allow the automatic execution of data governance processes, addressing a wide range of data management challenges. We showcase the usefulness of the proposed approach using a real-world use case, stemming from a collaborative project with the World Health Organization for the management and analysis of data about Neglected Tropical Diseases. Overall, this work contributes to facilitating organizations' adoption of data-driven strategies through a cohesive framework that operationalizes and automates data governance.
... In recent years, the growth of data on the Web, especially data published in the RDF (Resource Description Framework) standard, has enabled data analyses based on the interlinking of different datasets, allowing organisations to ground their decisions in data [Tadesse et al. 2019]. On the one hand, some works present approaches for interlinking datasets through, for example, the degree of similarity between their resources based on syntax [Avelino et al. 2020]. ...
Conference Paper
Full-text available
The growth of datasets available on the Web that use the RDF standard enables data analyses involving multiple dimensions. According to the W3C, one of the resources for analysing multidimensional data is the RDF Data Cube vocabulary. However, there is still a lack of supporting instruments for applying this vocabulary to datasets. In this vein, this article proposes INTEGRACuBe, an environment that uses a meta-schema and semi-automated mechanisms to support the mapping of data resources to the RDF Data Cube metamodel. As a result, analytical data can be explored in RDF. Additionally, a case study is presented in the scenario of Software Development Management.
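For readers unfamiliar with the vocabulary, a minimal sketch of a single RDF Data Cube observation follows (Python/rdflib). The dataset, dimension and measure names are invented and do not come from INTEGRACuBe.

```python
# Illustrative sketch: publishing one multidimensional observation with the
# W3C RDF Data Cube vocabulary (qb:). Dataset/dimension names are made up.
from rdflib import Graph, Namespace, Literal, RDF

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/stats#")

g = Graph()
g.bind("qb", QB)
obs = EX["obs1"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX["defectsPerRelease"]))  # hypothetical dataset
g.add((obs, EX.release, Literal("2.1")))           # a dimension
g.add((obs, EX.defectCount, Literal(42)))          # a measure
print(g.serialize(format="turtle"))
```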
... Given a source data model (e.g., relational, key/value, documents) and its specific format (e.g., JSON, XML, CSV), the Wrapper Designer proposes wrappers that deal with variety in the sources and convert their data into a common unified model (see Section 5.1). Such wrappers allow the other components in the data integration platform to access the data in a unified manner (Hewasinghage et al. 2018). ... It primarily works over available data instances to incrementally extract schema information using model-driven transformations (Tadesse et al. 2019). Such source schema information is expressed in the source graph. ...
Article
Full-text available
Obtaining valuable insights and actionable knowledge from data requires cross-analysis of domain data typically coming from various sources. Doing so inevitably imposes burdensome processes of unifying different data formats and discovering integration paths, all given the specific analytical needs of a data analyst. Along with large volumes of data, the variety of formats, data models, and semantics drastically contributes to the complexity of such processes. Although there have been many attempts to automate various processes along the Big Data pipeline, no unified platforms accessible by users without technical skills (like statisticians or business analysts) have been proposed. In this paper, we present a Big Data integration platform (Quarry) that uses hypergraph-based metadata to facilitate (and largely automate) the integration of domain data coming from a variety of sources, and provides an intuitive interface to assist end users both in: (1) data exploration with the goal of discovering potentially relevant analysis facets, and (2) consolidation and deployment of data flows which integrate the data, and prepare them for further analysis (descriptive or predictive), visualization, and/or publishing. We validate Quarry's functionalities with the use case of World Health Organization (WHO) epidemiologists and data analysts in their fight against Neglected Tropical Diseases (NTDs).
Article
Full-text available
Constructing ontologies from relational databases is an active research topic in the Semantic Web domain. While conceptual mapping rules/principles of relational databases and ontology structures are being proposed, several software modules or plug-ins are being developed to enable the automatic conversion of relational databases into ontologies. However, the correlation between the resulting ontologies built automatically with plug-ins from relational databases and the database-to-ontology mapping principles has been given little attention. This study reviews and applies two Protégé plug-ins, namely DataMaster and OntoBase, to automatically construct ontologies from a relational database. The resulting ontologies are further analysed to match their structures against the database-to-ontology mapping principles. A comparative analysis of the matching results reveals that OntoBase outperforms DataMaster in applying the database-to-ontology mapping principles for automatically converting relational databases into ontologies.
Article
Full-text available
The aggregation of heterogeneous data from different institutions in cultural heritage and e-science has the potential to create rich data resources useful for a range of different purposes, from research to education and public interest. In this paper, we present the X3ML framework, a framework for information integration that effectively and efficiently handles the steps involved in schema mapping, uniform resource identifier (URI) definition and generation, data transformation, provision and aggregation. The framework is based on the X3ML mapping definition language for describing both schema mappings and URI generation policies, and has several advantages over other relevant frameworks. We describe the architecture of the framework as well as details on the various available components. Usability aspects are discussed and performance metrics are demonstrated. The high impact of our work is verified via the increasing number of international projects that adopt and use this framework.
Conference Paper
Full-text available
Ontologies have recently become a popular mechanism for exposing relational databases (RDBs) due to their ability to describe the domain of data in terms of classes and properties that are clear to domain experts. Ontological terms are related to the schema of the underlying databases with the help of mappings, i.e., declarative specifications associating SQL queries to ontological terms. Developing appropriate ontologies and mappings for given RDBs is a challenging and time-consuming task. In this work we present BootOX, a system that aims at facilitating ontology and mapping development by their automatic extraction (i.e., bootstrapping) from RDBs, and our experience with the use of BootOX in industrial and research contexts. BootOX has a number of advantages: it allows controlling the OWL 2 profile of the output ontologies and bootstrapping complex and provenance mappings, which are beyond the W3C direct mapping specification. Moreover, BootOX allows importing pre-existing ontologies via alignment.
Article
Full-text available
A huge repository of terabytes of data is generated each day from modern information systems and digital technologies such as the Internet of Things and cloud computing. Analysis of these massive data requires significant effort at multiple levels to extract knowledge for decision making. Therefore, big data analysis is a current area of research and development. The basic objective of this paper is to explore the potential impact of big data challenges, open research issues, and the various tools associated with them. As a result, this article provides a platform to explore big data at numerous stages. Additionally, it opens a new horizon for researchers to develop solutions based on the challenges and open research issues.
Article
Full-text available
The lack of semantic richness is one of the biggest drawbacks of data stored in classic relational databases (RDBs). This paper provides a method that gives meaning to these data to serve the Semantic Web. The method allows a direct and automatic conversion of an RDB to an ontology and operates on two levels: the first is based on the principle of reverse engineering and aims to extract the RDB schema and convert it directly to an ontology model (TBox); the second aims to populate the ontology with individuals (ABox) using the records of the RDB, based on the ontology model. Our approach takes into account the relationships established via foreign keys between tables and the semantics of integrity constraints during the conversion, which preserves the consistency and integrity of the data.
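A minimal sketch of this two-level idea follows, assuming an in-memory SQLite database and invented names (it is not the paper's implementation): the TBox is derived from the table schema, then the ABox is populated from the rows.

```python
# Sketch of the two-level RDB-to-ontology idea: TBox from the schema,
# ABox from the data. Names and mapping rules are assumptions.
import sqlite3
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/onto#")
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO employee VALUES (1, 'Ada'), (2, 'Alan')")

g = Graph()
# Level 1 (TBox): one class per table, one property per column.
g.add((EX.Employee, RDF.type, RDFS.Class))
columns = [row[1] for row in db.execute("PRAGMA table_info(employee)")]
for col in columns:
    g.add((EX[col], RDF.type, RDF.Property))
    g.add((EX[col], RDFS.domain, EX.Employee))

# Level 2 (ABox): one individual per row, keyed by the primary key.
for row in db.execute("SELECT id, name FROM employee"):
    ind = EX[f"employee/{row[0]}"]
    g.add((ind, RDF.type, EX.Employee))
    g.add((ind, EX.name, Literal(row[1])))
print(g.serialize(format="turtle"))
```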
Article
Full-text available
RDF (Resource Description Framework) and RDF Schema (collectively called RDF(S)) are the normative languages for describing Web resource information. How to construct RDF(S) from existing data sources is becoming an important research issue. In particular, UML (Unified Modeling Language) is widely applied to data modeling in many application domains, and how to construct RDF(S) from existing UML models is an important issue to be solved in the context of the Semantic Web. By comparing and analyzing the characteristics of UML and RDF(S), this paper proposes an approach for constructing RDF(S) from UML and implements a prototype construction tool. First, we give formal definitions of UML and RDF(S). After that, a construction approach from UML to RDF(S) is proposed, a construction example is provided, and the approach is analyzed and discussed. Further, based on the proposed approach, a prototype construction tool is implemented, and experiments show that the approach and the tool are feasible.
Conference Paper
Full-text available
Semantic Data Mining refers to data mining tasks that systematically incorporate domain knowledge, especially formal semantics, into the process. In the past, many research efforts have attested to the benefits of incorporating domain knowledge in data mining. At the same time, the proliferation of knowledge engineering has enriched the family of domain knowledge, especially formal semantics and Semantic Web ontologies. An ontology is an explicit specification of a conceptualization and a formal way to define the semantics of knowledge and data. The formal structure of ontologies makes them a natural way to encode domain knowledge for data mining use. In this survey paper, we introduce general concepts of semantic data mining. We investigate why ontologies have the potential to help semantic data mining and how formal semantics in ontologies can be incorporated into the data mining process. We provide detailed discussions of the advances and state of the art of ontology-based approaches and an introduction to approaches based on other forms of knowledge representation.
Article
Full-text available
In this paper we describe how UML schemas can be converted into OWL ontologies, thus enabling Semantic Web applications to reason over them. The proposed solution is based on a three-phase approach: the first step presents the class diagram as a mathematical formulation, the second converts the UML classes into an encoded text file, and finally the structure of the classification scheme is converted into an OWL ontology. We demonstrate the practical applicability of our approach by showing how the results of reasoning over these OWL ontologies can help improve Web systems.
Conference Paper
Full-text available
JSON has become a very popular lightweight format for data exchange. JSON is human readable and easy for computers to parse and use. However, JSON is schemaless. Though this brings some benefits (e.g., flexibility in the representation of the data), it can become a problem when consuming and integrating data from different JSON services, since developers need to be aware of the structure of the schemaless data. We believe that a mechanism to discover (and visualize) the implicit schema of JSON data would largely facilitate the creation and usage of JSON services. For instance, this would help developers understand the links between a set of services belonging to the same domain or API. In this sense, we propose a model-based approach to generate the underlying schema of a set of JSON documents.
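The idea of discovering an implicit schema can be illustrated in a few lines of Python. This naive key/type union is only a stand-in for the model-based approach the paper proposes:

```python
# Sketch of implicit-schema discovery over a set of JSON documents: union
# the keys and record the observed type(s) per field.
import json
from collections import defaultdict

def discover_schema(documents):
    schema = defaultdict(set)
    for doc in documents:
        for key, value in json.loads(doc).items():
            schema[key].add(type(value).__name__)
    return dict(schema)

docs = ['{"id": 1, "tags": ["a"]}', '{"id": "x42", "name": "svc"}']
print(discover_schema(docs))
# e.g. {'id': {'int', 'str'}, 'tags': {'list'}, 'name': {'str'}}
```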
Article
Full-text available
With the development of web services, the retrieval of relevant services has become a challenge. Keyword-based discovery mechanisms are insufficient, as they retrieve large amounts of irrelevant information. Here we propose a semantic search engine to overcome this problem. Semantics, by capturing the meaning and use of data, brings information closer to human thinking and decision making. In this work, we describe a voice-enabled, ontology-based search engine for the blind that uses concept-based search. The engine receives the user query through voice and converts it into text with the help of software; this text is then processed against the ontology and the exact result is produced using the Semantic Web. The displayed text is converted back into voice.
Article
Full-text available
Full implementation of the Semantic Web requires widespread availability of OWL ontologies. Manual ontology development using current OWL editors remains a tedious and cumbersome task that requires significant understanding of the new ontology language and can easily result in a knowledge acquisition bottleneck. On the other hand, abundant domain knowledge has been specified by existing database schemata such as UML class diagrams. Thus, developing an automatic tool for extracting OWL ontologies from existing UML class diagrams is helpful to Web ontology development. In this paper we propose an automatic, semantics-preserving approach for extracting OWL ontologies from existing UML class diagrams. This approach establishes a precise conceptual correspondence between UML and OWL through a semantics-preserving schema translation algorithm. The experiments with our implemented prototype tool, UML2OWL, show that the proposed approach is effective and that fully automatic ontology extraction is achievable. The proposed approach and tool will facilitate the development of Web ontologies and the realization of semantic interoperation between existing Web database applications and the Semantic Web.
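A toy sketch of the kind of translation rules involved follows, assuming a simplified encoding of the class diagram (this is not UML2OWL itself): classes map to owl:Class, attributes to owl:DatatypeProperty, and associations to owl:ObjectProperty.

```python
# Sketch of UML-to-OWL translation rules over an assumed, simplified
# class-diagram encoding (illustrative only).
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

EX = Namespace("http://example.org/uml#")
uml = {  # toy class-diagram encoding
    "classes": ["Order", "Customer"],
    "attributes": [("Order", "total")],
    "associations": [("Order", "placedBy", "Customer")],
}

g = Graph()
for c in uml["classes"]:
    g.add((EX[c], RDF.type, OWL.Class))
for cls, attr in uml["attributes"]:
    g.add((EX[attr], RDF.type, OWL.DatatypeProperty))
    g.add((EX[attr], RDFS.domain, EX[cls]))
for src, assoc, dst in uml["associations"]:
    g.add((EX[assoc], RDF.type, OWL.ObjectProperty))
    g.add((EX[assoc], RDFS.domain, EX[src]))
    g.add((EX[assoc], RDFS.range, EX[dst]))
print(g.serialize(format="turtle"))
```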
Article
Full-text available
Relational databases are considered one of the most popular storage solutions for various kinds of data and they have been recognized as a key factor in generating huge amounts of data for Semantic Web applications. Ontologies, on the other hand, are one of the key concepts and main vehicle of knowledge in the Semantic Web research area. The problem of bridging the gap between relational databases and ontologies has attracted the interest of the Semantic Web community, even from the early years of its existence and is commonly referred to as the database-to-ontology mapping problem. However, this term has been used interchangeably for referring to two distinct problems: namely, the creation of an ontology from an existing database instance and the discovery of mappings between an existing database instance and an existing ontology. In this paper, we clearly define these two problems and present the motivation, benefits, challenges and solutions for each one of them. We attempt to gather the most notable approaches proposed so far in the literature, present them concisely in tabular format and group them under a classification scheme. We finally explore the perspectives and future research steps for a seamless and meaningful integration of databases into the Semantic Web.
Article
Full-text available
As Semantic Web technologies are getting mature, there is a growing need for RDF applications to access the content of huge, live, non-RDF, legacy databases without having to replicate the whole database into RDF. In this poster, we present D2RQ, a declarative language to describe mappings between application-specific relational database schemata and RDFS/OWL ontologies. D2RQ allows RDF applications to treat non-RDF relational databases as virtual RDF graphs, which can be queried using RDQL.
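The virtual-graph principle can be reduced to its essence: answer a lookup by rewriting it to SQL on demand instead of materialising triples. The mapping table and names below are invented, and plain Python stands in for D2RQ's declarative language:

```python
# Sketch of the virtual-graph idea behind D2RQ (not its mapping language):
# a property lookup is answered by on-the-fly SQL, with no stored triples.
import sqlite3

# Hypothetical mapping: property 'name' <-> column employee.name, keyed by id.
MAPPING = {"name": ("employee", "id", "name")}

def lookup(subject_id: int, prop: str, db) -> str:
    table, key_col, val_col = MAPPING[prop]  # identifiers come from the mapping
    cur = db.execute(f"SELECT {val_col} FROM {table} WHERE {key_col} = ?",
                     (subject_id,))
    return cur.fetchone()[0]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER, name TEXT)")
db.execute("INSERT INTO employee VALUES (1, 'Ada')")
print(lookup(1, "name", db))  # 'Ada', fetched on demand
```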
Article
Full-text available
Enterprise Information Systems (EIS) have been applied for decades in Computer-Aided Engineering (CAE) and Computer-Aided Design (CAD), where a huge and increasing amount of data is stored in heterogeneous and distributed systems. As systems evolve, system redesign and reengineering are required. A pressing challenge is how to interoperate among different systems by overcoming the gap of conceptual heterogeneity. In this article, an enlarged data representation called the semantic information layer (SIL) is described for making heterogeneous systems interoperable. The SIL acts as a mediation medium and knowledge representation among heterogeneous systems. The SIL building process is based on ontology engineering, including ontology extraction from a relational database (RDB), ontology enrichment and ontology alignment. Mapping paths maintain the links between the SIL and the data sources, while query implementation and a user interface serve to retrieve data and interact with end users. We fully describe a practical ontology-driven framework for building the SIL and extensively introduce relevant standards and techniques for implementing it. In the core part of ontology development, a dynamic multi-strategy ontology alignment with automatic matcher selection and dynamic similarity aggregation is proposed. A demonstration case study in the scenario of the mobile phone industry is used to illustrate the proposed framework.
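The similarity-aggregation step can be pictured with a toy matcher combination; the matchers and weights below are illustrative, not the paper's algorithm:

```python
# Sketch of multi-strategy alignment: combine two similarity matchers with
# a weighted aggregation (weights and matchers are assumptions).
from difflib import SequenceMatcher

def syntactic(a: str, b: str) -> float:
    # Character-level similarity between two element names.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_overlap(a: str, b: str) -> float:
    # Jaccard overlap of underscore-separated tokens.
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)

def aggregate(a: str, b: str, weights=(0.6, 0.4)) -> float:
    scores = (syntactic(a, b), token_overlap(a, b))
    return sum(w * s for w, s in zip(weights, scores))

print(aggregate("phone_model", "model_of_phone"))  # candidate match score
```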
Conference Paper
Full-text available
In this paper, we present LogMap--a highly scalable ontology matching system with 'built-in' reasoning and diagnosis capabilities. To the best of our knowledge, LogMap is the only matching system that can deal with semantically rich ontologies containing tens (and even hundreds) of thousands of classes. In contrast to most existing tools, LogMap also implements algorithms for 'on the fly' unsatisfiability detection and repair. Our experiments with the ontologies NCI, FMA and SNOMED CT confirm that our system can efficiently match even the largest existing bio-medical ontologies. Furthermore, LogMap is able to produce a 'clean' set of output mappings in many cases, in the sense that the ontology obtained by integrating LogMap's output mappings with the input ontologies is consistent and does not contain unsatisfiable classes.
Conference Paper
Full-text available
As an emerging solution to the handling of complex and evolving software systems, Model Driven Engineering (MDE) is still very much in evolution. The industrial demand is quite high while the research answer for a sound set of foundation principles is still far from being stabilized. Therefore it is important to provide a current state of the art in MDE, describing what its origins are, what its present state is, and where it seems to be presently leading. One important question is how MDE relates to other contemporary technologies. This tutorial proposes the "technical space" concept to this purpose. The two main objectives are to present first the basic MDE principles and second how these principles may be mapped onto modern platform support. Other issues that will be discussed are the applicability of these ideas, concepts, and tools to solve current practical problems. Various organizations and companies (OMG, IBM, Microsoft, etc.) are currently proposing several environments claiming to support MDE. Among these, the OMG MDA™ (Model Driven Architecture) has a special place since it was historically one of the original proposals in this area. This work focuses on the identification of basic MDE principles, practical characteristics of MDE (direct representation, automation, and open standards), original MDE scenarios, and discussions of suitable tools and methods.
Book
Full-text available
When designing an information system, conceptual modeling is the activity that elicits and describes the general knowledge the system needs to know. This description, called the conceptual schema, is necessary in order to develop an information system. Recently, many researchers and professionals share a vision in which the conceptual schema becomes the only important description to be created, as the system implementation will be automatically constructed from its schema - this is e.g. the basic idea behind OMG's Model Driven Architecture. Olivé's textbook explains in detail the principles of conceptual modeling independently from particular methods and languages and shows how to apply them in real-world projects. He covers all aspects of the engineering process from structural modeling over behavioral modeling to meta-modeling, and completes the presentation with an extensive case study based on the osCommerce system, an online store-management software program freely available under the GNU General Public License. His presentation is based on well-known industry standards like UML and OCL as a particular conceptual modeling language, yet also delivers the basics of the formal logical language background. Written for computer science students in classes on information systems modeling as well as for professionals feeling the need to formalize their experiences or to update their knowledge, Olivé delivers here a comprehensive treatment of all aspects of the modeling process. His book is complemented by lots of exercises and additional online teaching material.
Article
Full-text available
Web Ontology Language (OWL) and Model-Driven Architectures (MDA) are two technologies being developed in parallel, but by different communities. They have common points and issues and can be brought closer together. Many authors have so far stressed this problem and have proposed several solutions. The result of these efforts is the recent OMG's initiative for defining an ontology development platform. However, the problem of transformation between ontology and MDA-based languages has been solved using rather partial and ad hoc solutions, most often by XSLT. In this paper we analyze OWL and MDA-compliant languages as separate technological spaces. In order to achieve a synergy between these technological spaces we define ontology languages in terms of MDA standards, recognize relations between OWL and MDA-based ontology languages, and propose mapping techniques. In order to illustrate the approach, we use an MDA-defined ontology architecture that includes ontology metamodel and ontology UML Profile. Based on this approach, we have implemented a transformation of the ontology UML Profile into OWL representation.
Chapter
The database (DB) landscape has been significantly diversified during the last decade, resulting in the emergence of a variety of non-relational (also called NoSQL) DBs, e.g., XML- and JSON-document DBs, key-value stores, and graph DBs. To enable access to such data, we generalize the well-known ontology-based data access (OBDA) framework so as to allow for querying arbitrary data sources using SPARQL. We propose an architecture for a generalized OBDA system implementing the virtual approach. Then, to investigate the feasibility of OBDA over non-relational DBs, we compare an implementation of an OBDA system over MongoDB, a popular JSON-document DB, with a triple store.
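The gist of generalizing OBDA to a document store is translating ontology-level query atoms into native filters. A hedged sketch follows (the property-to-field mapping is invented; with pymongo one would pass the resulting filter to collection.find):

```python
# Sketch of OBDA over a document store: translate one SPARQL-style triple
# pattern into a MongoDB-style filter document (mapping is an assumption).

# Hypothetical mapping from ontology properties to JSON fields.
PROPERTY_TO_FIELD = {"ex:hasAge": "age", "ex:livesIn": "address.city"}

def triple_to_filter(prop: str, value) -> dict:
    """?s ex:hasAge 42  ->  {'age': 42}"""
    return {PROPERTY_TO_FIELD[prop]: value}

print(triple_to_filter("ex:livesIn", "Bolzano"))  # {'address.city': 'Bolzano'}
```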
Article
Big Data architectures allow heterogeneous data from multiple sources to be flexibly stored and processed in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving. Thus data analysts need to adapt their analytical processes after each API release. This gets more challenging when performing an integrated or historical analysis. To cope with such complexity, in this paper, we present the Big Data Integration ontology, the core construct to govern the data integration process under schema evolution by systematically annotating it with information regarding the schema of the sources. We present a query rewriting algorithm that, using the annotated ontology, converts queries posed over the ontology to queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the ontology upon new releases. This guarantees that ontology-mediated queries correctly retrieve data from the most recent schema version, as well as correctness in historical queries. A functional and performance evaluation on real-world APIs is performed to validate our approach.
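The role of the annotated ontology can be pictured as a version-aware lookup: the same ontology-mediated attribute resolves to a different source field per release. A toy sketch with invented names, not the paper's rewriting algorithm:

```python
# Sketch of version-aware rewriting: an ontology-level attribute resolves
# to a different source field per API release (names are invented).
RELEASES = {  # source schema versions and their field names
    "v1": {"ontology:likes": "fav_count"},
    "v2": {"ontology:likes": "favorites"},  # field renamed in the new release
}

def rewrite(attribute: str, release: str) -> str:
    """Translate an ontology attribute into the field of a given release."""
    return RELEASES[release][attribute]

# The same ontology-mediated query works on current and historical data:
print(rewrite("ontology:likes", "v1"))  # fav_count
print(rewrite("ontology:likes", "v2"))  # favorites
```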
Conference Paper
Recent state-of-the-art approaches and technologies for generating RDF graphs from non-RDF data, use languages designed for specifying transformations or mappings to data of various kinds of format. This paper presents a new approach for the generation of ontology-annotated RDF graphs, linking data from multiple heterogeneous streaming and archival data sources, with high throughput and low latency. To support this, and in contrast to existing approaches, we propose embedding in the RDF generation process a close-to-sources data processing and linkage stage, supporting the fast template-driven generation of triples in a subsequent stage. This approach, called RDF-Gen, has been implemented as a SPARQL-based RDF generation approach. RDF-Gen is evaluated against the latest related work of RML and SPARQL-Generate, using real world datasets.
Conference Paper
RDF aims at being the universal abstract data model for structured data on the Web. While there is effort to convert data to RDF, the vast majority of data available on the Web does not conform to RDF. Indeed, exposing data in RDF, either natively or through wrappers, can be very costly. Furthermore, in the emerging Web of Things, the resource constraints of devices prevent them from processing RDF graphs. Hence one cannot expect that all the data on the Web will be available as RDF anytime soon. Several tools can generate RDF from non-RDF data, and transformation or mapping languages have been designed to offer more flexible solutions (GRDDL, XSPARQL, R2RML, RML, CSVW, etc.). In this paper, we introduce a new language, SPARQL-Generate, that generates RDF from: (i) an RDF dataset, and (ii) a set of documents in arbitrary formats. As SPARQL-Generate is designed as an extension of SPARQL 1.1, it can provably: (i) be implemented on top of any existing SPARQL engine, and (ii) leverage the SPARQL extension mechanism to deal with an open set of formats. Furthermore, we show evidence that (iii) it can be easily learned by knowledge engineers who know SPARQL 1.1, and (iv) our first naive open-source implementation performs better than the reference implementation of RML for big transformations.
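The template-driven generation idea that SPARQL-Generate (and RML) build on can be sketched in a few lines; plain Python stands in for the actual language here, and the names are invented:

```python
# Sketch of template-driven RDF generation from non-RDF data: bind variables
# from a CSV record into a triple template (not the SPARQL-Generate language).
import csv, io
from rdflib import Graph, Namespace, Literal, URIRef, RDF

EX = Namespace("http://example.org/data#")
# Plain strings in the template are variables bound from the record.
TEMPLATE = [("id", RDF.type, EX.Sensor), ("id", EX.reading, "value")]

g = Graph()
rows = csv.DictReader(io.StringIO("id,value\ns1,21.5\ns2,19.0"))
for row in rows:
    for s_var, pred, obj in TEMPLATE:
        subj = EX[row[s_var]]
        obj_term = obj if isinstance(obj, URIRef) else Literal(row[obj])
        g.add((subj, pred, obj_term))
print(g.serialize(format="turtle"))
```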
Conference Paper
The field of data integration has expanded significantly over the years, from providing a uniform query and update interface to structured databases within an enterprise to the ability to search, exchange, and even update structured or unstructured data within or external to the enterprise. This paper describes the evolution in the landscape of data integration since the work on rewriting queries using views in the mid-1990s. In addition, we describe two important challenges for the field going forward. The first challenge is to develop good open-source tools for different components of data integration pipelines. The second challenge is to provide practitioners with viable solutions for the long-standing problem of systematically combining structured and unstructured data.
Article
A successfully repeated use case for Semantic Web technologies is Ontology-Based Data Access for data integration. In this approach, an ontology serves as a uniform conceptual federating model, which is accessible to both IT developers and business users. Here, two challenges for developing an OBDA system are considered: ontology and mapping engineering, along with a pay-as-you-go methodology that addresses these challenges and enables agility.
Article
We present Ontop, an open-source Ontology-Based Data Access (OBDA) system that allows for querying relational data sources through a conceptual representation of the domain of interest, provided in terms of an ontology, to which the data sources are mapped. Key features of Ontop are its solid theoretical foundations, a virtual approach to OBDA, which avoids materializing triples and is implemented through the query rewriting technique, extensive optimizations exploiting all elements of the OBDA architecture, its compliance to all relevant W3C recommendations (including SPARQL queries, R2RML mappings, and OWL2QL and RDFS ontologies), and its support for all major relational databases.
Conference Paper
Despite the significant number of existing tools, incorporating data from multiple sources and different formats into the Linked Open Data cloud remains complicated. No mapping formalisation exists to define how to map such heterogeneous sources into RDF in an integrated and interoperable fashion. This paper introduces the RML mapping language, a generic language based on an extension over R2RML, the W3C standard for mapping relational databases into RDF. Broadening RML’s scope, the language becomes source-agnostic and extensible, while facilitating the definition of mappings of multiple heterogeneous sources. This leads to higher integrity within datasets and richer interlinking among resources.
Conference Paper
In this paper we present a transformation between UML class diagrams and OWL 2 ontologies. We specify the transformation on the M2 level using the QVT transformation language and the meta-models of UML and OWL 2. For this purpose we analyze similarities and differences between UML and OWL 2 and identify incompatible language features.
Article
This paper presents the working draft MOF™ (Meta-Object Facility) metamodels for the Resource Description Framework (RDF Schema) and the Web Ontology Language (OWL), two of the six metamodels currently envisioned for the Ontology Definition Metamodel (ODM) standards effort in the Object Management Group (OMG®), which enable model-driven development of RDF vocabularies and OWL ontologies, respectively. We provide insight into some of the design principles used in developing these metamodels, major challenges addressed to date, and the resolution of some of these issues that has influenced the resultant products. We also briefly review ongoing and future work needed to complete the subset of the ODM specific to these representation formalisms and fully support model driven development of RDF vocabularies and OWL ontologies. Introduction: Over the course of the last five years, and more specifically, since the emergence of Semantic Web Activity from the World Wide Web Consortium (W3C) [1], the development of ontologies—explicit formal specifications of the concepts in a domain and relations among them [2]—has been moving from the research community to early adoption by industry. Increasing evidence of collaborative development of large, standardized controlled vocabularies and ontologies for specific applications and domains, such as in bioinformatics and pharmacogenomics research, is appearing in the literature. Broadly applicable general-purpose ontologies, for example those supporting the Semantic Web Services Initiative [3], are emerging as well.
Chapter
Data integration has been an important area of research for several years. However, such systems suffer from one of the main drawbacks of database systems: the need to invest significant modeling effort upfront. Dataspace support platforms (DSSP) envision a system that offers useful services on its data without any setup effort and that improves with time in a pay-as-you-go fashion. We argue that to support DSSPs, the system needs to model uncertainty at its core. We describe the concepts of probabilistic mediated schemas and probabilistic mappings as enabling concepts for DSSPs.
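A toy illustration of probabilistic mappings: each candidate mapping carries a probability and the answers it produces are weighted accordingly. The numbers and the simplified by-table semantics below are assumptions, not the paper's formalism:

```python
# Sketch of answering through probabilistic mappings (pay-as-you-go): answers
# are weighted by the probabilities of the mappings supporting them.
CANDIDATE_MAPPINGS = [  # (source field for mediated attribute 'phone', P(m))
    ("office_phone", 0.6),
    ("home_phone", 0.4),
]
record = {"office_phone": "555-0100", "home_phone": "555-0199"}

def probabilistic_answers(rec):
    """Return each possible answer with the total probability of the
    mappings that support it (simplified by-table semantics)."""
    answers = {}
    for field, p in CANDIDATE_MAPPINGS:
        answers[rec[field]] = answers.get(rec[field], 0.0) + p
    return answers

print(probabilistic_answers(record))  # {'555-0100': 0.6, '555-0199': 0.4}
```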
Article
This paper proposes a new approach to the schema translation problem. We deal with schemas whose metaschemas are instances of the OMG's MOF. Most metaschemas can be defined as an instance of the MOF; therefore, our approach is widely applicable. We leverage the well-known object-oriented concepts embedded in the MOF and its instances (object types, attributes, relationship types, operations, IsA hierarchies, refinements, invariants, pre- and postconditions, etc.) to define metaschemas, schemas and their translations. The main contribution of our approach is the extensive use of object-oriented concepts in the definition of translation mappings, particularly the use of operations (and their refinements) and invariants, both of which are formalized in OCL. Our translation mappings can be used to check that two schemas are translations of each other, and to translate one into the other, in both directions. The translation mappings are declaratively defined by means of pre- and postconditions and invariants, and they can be implemented in any suitable language. From an implementation point of view, by taking a MOF-based approach we have a wide set of tools available, including tools that execute OCL. By way of example, we have defined all schemas and metaschemas in this paper and executed all the OCL expressions in the USE tool.
Article
The article presents information on the Unified Modeling Language (UML) in 2001. As the UML reaches the ripe age of four, both its proponents and its critics are scanning the recent changes in the UML 1.3 revision. In a relatively short period of time the UML has emerged as the software industry's dominant modeling language. UML is not only a de facto modeling language standard, it is fast becoming a de jure standard. Nearly two years ago the Object Management Group (OMG) adopted UML as its standard modeling language. As an approved Publicly Available Specification (PAS) submitter to the International Organization for Standardization (ISO), the OMG is proposing the UML specification for international standardization. This article explores how the UML is faring in the international standardization process. It assumes the reader is generally familiar with the use of UML, and instead focuses on the language's recent and future evolution. The processes and architectures for UML change management are examined, followed by discussion of how these processes and architectures were used in the recent minor revision of the language (UML 1.3), and how they may be applied in the next major revision (UML 2.0), which is tentatively scheduled to be completed in 2001.
Article
HP Labs developed the Jena toolkit to make it easier to develop applications that use the semantic Web information model and languages. Jena is a Java application programming interface that is available as an open-source download from www.hpl.hp.com/semweb/jena-top.html.
Article
Model-driven approaches move development focus from third-generation programming language code to models, specifically models expressed in the Unified Modeling Language (UML) and its profiles. The objective is to increase productivity and reduce time-to-market by enabling development using concepts closer to the problem domain at hand, rather than those offered by programming languages. Model-driven development's key challenge is to transform these higher-level models to platform-specific models that tools can use to generate code. UML gives numerous options to developers. A UML model can graphically depict a system's structure and behavior from a certain viewpoint and at a certain level of abstraction. This is desirable, because typically we can better manage a complex system description through multiple models, where each captures a different aspect of the solution. We can use models not only horizontally, to describe different system aspects, but also vertically, to be refined from higher to lower levels of abstraction. At the lowest level, models use implementation technology concepts. Working with multiple, interrelated models requires significant effort to ensure their overall consistency. In addition to vertical and horizontal model synchronization, we can significantly reduce the burden of other activities, such as reverse engineering, view generation, application of patterns, or refactoring, through automation. Many of these activities are performed as automated processes that take one or more source models as input and produce one or more target models as output, while following a set of transformation rules. We refer to this process as model transformation. Here, we analyze current approaches to model transformation, concentrating on desirable characteristics of a model transformation language that can be used by modeling and design tools to automate tasks, thus significantly improving development productivity and quality.
Arenas, M., Bertails, A., Prud'hommeaux, E., Sequeda, J.: A direct mapping of relational data to RDF. W3C Recommendation (2012)
Das, S., Sundara, S., Cyganiak, R.: R2RML: RDB to RDF mapping language. W3C Recommendation (2012), https://www.w3.org/TR/r2rml/
Nadal, S., Romero, O., Abelló, A., Vassiliadis, P., Vansummeren, S.: MDM: governing evolution in big data ecosystems. In: Proceedings of the 21st International Conference on Extending Database Technology (EDBT 2018), Vienna, Austria, March 26-29, 2018, pp. 682-685. OpenProceedings (2018)
Stonebraker, M., Ilyas, I.F.: Data integration: the current status and the way forward. IEEE Data Eng. Bull. 41(2), 3-9 (2018)