Conference Paper

Schema merging and mapping creation for relational sources

DOI: 10.1145/1353343.1353357
Conference: EDBT 2008, 11th International Conference on Extending Database Technology, Nantes, France, March 25-29, 2008, Proceedings
Source: DBLP


We address the problem of generating a mediated schema from a set of relational data source schemas and conjunctive queries that specify where those schemas overlap. Unlike past approaches that generate only the mediated schema, our algorithm also generates view definitions, i.e., source-to-mediated schema mappings. Our main goal is to understand the requirements that a mediated schema and views should satisfy, such as completeness, preservation of overlapping information, normalization, and minimality. We show how these requirements influence the detailed structure of the schemas and view definitions that are produced. We introduce a normal form for mediated schemas and view definitions, show how to generate them, and prove that schemas and views in this normal form satisfy our requirements. The view definitions in our normal form use stylized GLAV mappings, for which query rewriting is easier than for general GLAV mappings. We demonstrate the efficiency of query rewriting in a prototype implementation.


Available from: Philip A. Bernstein
  • Source
    • "In the data integration research area, several efforts have also been made to automate schema merging, i.e., the automatic generation of integrated schema and the subsequent source-to-target mappings. For example, the approach proposed by Pottinger et al. is able to merge a set of relational source schemas starting from a set of conjunctive queries specifying the overlap elements between sources [29], whereas the semantic merging approach proposed in [24] creates mediated schemas by analyzing the integrity constraints and tuple generating dependencies of the sources. The Merge operator proposed by Rizopoulos et al. produces an integrated schema by analyzing all possible semantic mappings of two sources [31]. "
    ABSTRACT: Despite many innovative systems supporting the data integration process, designers advocate more abstract metaphors to master the inherent complexity of this activity. In fact, the visual notations provided in many modern data integration systems might run into scale up problems when facing the integration of big data sources. Thus, higher level visual notations and automatic schema mapping mechanisms might be the key factors to make the data integration process more tractable. In this paper we present the Conceptual Data Integration Language (CoDIL), a visual language providing conceptual level visual mechanisms to manipulate and integrate data sources, together with a formalization of the language icon operators by means of ALCN Description Logic. The formalization allowed us to define the logic-level semantics of CoDIL, providing reasoning rules for validating the correctness of a data integration process and for generating the logic-level reconciled schema.
  • Source
    • "Conflict management in Geo data. Data integration of (non spatial) data sources in presence of semantic heterogeneity has been largely studied in literature both for the traditional [3] [20] and Web [5] [6] data sources. Inspired by these approaches and by the works of Fonseca et al. [10], Kavouras et al. [15], Laurini [16] and by the wide literature in the domain of GIS and LBS interoperability (e.g., [12]), we proposed a general solution to integrate spatial data (more precisely POIs) in presence of representation conflicts [13]. "
    ABSTRACT: Currently, there is growing interest in geo-spatial data sources that provide rich information about a huge number of interconnected geo-entities and points of interest (POIs) located in the real world. Moreover, this kind of source is among the first to be published as linked open data. Noteworthy examples are the Geonames and GeoLinkedData initiatives. On the one hand, making more data sources available as linked open data allows querying the sources in an integrated way. On the other hand, it is known that the content of geo-spatial data sources suffers from various drawbacks, mainly concerning data quality and conflicts. In this context, relevant feedback from users with specific experience and knowledge about POIs in a certain spatial region is considered a valuable contribution to improving data quality and resolving description conflicts. We therefore propose a conceptual framework called M-PREGeD (Multi-Providers cRowd-Enhanced Geo linked Data) devoted to collecting, organizing, and ranking user-generated corrections and completions to improve the accuracy and completeness of geo-spatial linked data from different data sources. Metrics have been defined for both contributors and contents. In the framework, validated and ranked corrections and completions are stored as linked open data in a separate repository but linked to the original data sources. The repository can be queried in combination with the original data sources.
    Proceedings of the Joint EDBT/ICDT 2013 Workshops; 03/2013
  • Source
    • "More recent work on schema integration builds on the research results on semiautomatic schema matching [9] and separate matching from merging. Hence, several algorithms have been proposed to merge schemas based on a pre-determined match mapping [3], [11], [6], [8]. Despite this simplification, several of these merge approaches are still not fully automatic but depend on manual intervention. "
    ABSTRACT: The proliferation of ontologies and taxonomies in many domains increasingly demands the integration of multiple such ontologies to provide a unified view on them. We demonstrate a new automatic approach to merge large taxonomies such as product catalogs or web directories. Our approach is based on an equivalence matching between a source and target taxonomy to merge them. It is target-driven, i.e. it preserves the structure of the target taxonomy as much as possible. Further, we show how the approach can utilize additional relationships between source and target concepts to semantically improve the merge result.
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on; 05/2011