Conference Paper

Schema merging and mapping creation for relational sources.

DOI: 10.1145/1353343.1353357 Conference: EDBT 2008, 11th International Conference on Extending Database Technology, Nantes, France, March 25-29, 2008, Proceedings
Source: DBLP

ABSTRACT We address the problem of generating a mediated schema from a set of relational data source schemas and conjunctive queries that specify where those schemas overlap. Unlike past approaches that generate only the mediated schema, our algorithm also generates view definitions, i.e., source-to-mediated schema mappings. Our main goal is to understand the requirements that a mediated schema and views should satisfy, such as completeness, preservation of overlapping information, normalization, and minimality. We show how these requirements influence the detailed structure of schemas and view definitions that are produced. We introduce a normal form for mediated schemas and view definitions, show how to generate them, and prove that schemas and views in this normal form satisfy our requirements. The view definitions in our normal form use stylized GLAV mappings, for which query rewriting is easier than for general GLAV mappings. We demonstrate the efficiency of query rewriting in a prototype implementation.

  • Source
    ABSTRACT: Information integration from multiple heterogeneous sources is one of the major challenges facing enterprises and service providers today, and one of the important problems in this domain is the integration of structured and unstructured (or text) data. In this paper we describe our work on a data-driven approach to integrating various sources of text data, without relying on the availability of schema information. To this end, we have used various existing tools from natural language processing, data mining and related areas in a novel manner. The tools are used at the 'preprocessing' stage to (a) characterise each set of unstructured information (or collection of text data), (b) identify the related sets of unstructured information and (c) relate these sets to various reference data sets. All these steps are based solely on the instance values of the data sets. Subsequently the information compiled in the preprocessing stage may be used at query time to query the structured and text data. We also present our results on applying our techniques for data integration across multiple unstructured data sources, relating to customer comments of a service provider.
  • Source
    ABSTRACT: Despite many innovative systems supporting the data integration process, designers advocate more abstract metaphors to master the inherent complexity of this activity. In fact, the visual notations provided in many modern data integration systems might run into scale up problems when facing the integration of big data sources. Thus, higher level visual notations and automatic schema mapping mechanisms might be the key factors to make the data integration process more tractable. In this paper we present the Conceptual Data Integration Language (CoDIL), a visual language providing conceptual level visual mechanisms to manipulate and integrate data sources, together with a formalization of the language icon operators by means of ALCN Description Logic. The formalization allowed us to define the logic-level semantics of CoDIL, providing reasoning rules for validating the correctness of a data integration process and for generating the logic-level reconciled schema.
  • Source
    ABSTRACT: Currently, there is growing interest in geo-spatial data sources that provide rich information about a huge number of interconnected geo-entities and points of interest (POIs) located in the real world. Moreover, this kind of source is among the first to be published as linked open data. Noteworthy examples are the Geonames and GeoLinkedData initiatives. On the one hand, making more data sources available as linked open data allows the sources to be queried in an integrated way. On the other hand, the content of geo-spatial data sources is known to suffer from various drawbacks, mainly concerning data quality and conflicts. In this context, relevant feedback from users with specific experience and knowledge about POIs in a certain spatial region is considered a valuable contribution to improving data quality and resolving description conflicts. We propose a conceptual framework called M-PREGeD (Multi-Providers cRowd-Enhanced Geo linked Data) devoted to collecting, organizing, and ranking user-generated corrections and completions to improve the accuracy and completeness of Geo-spatial Linked Data from different data sources. Metrics have been defined for both contributors and contents. In the framework, validated and ranked corrections and completions are stored as linked open data in a separate repository but linked to the original data sources. The repository can be queried in combination with the original data sources.
    Proceedings of the Joint EDBT/ICDT 2013 Workshops; 03/2013
