Conference Paper

Schema merging and mapping creation for relational sources.

DOI: 10.1145/1353343.1353357
Conference: EDBT 2008, 11th International Conference on Extending Database Technology, Nantes, France, March 25-29, 2008, Proceedings
Source: DBLP

ABSTRACT We address the problem of generating a mediated schema from a set of relational data source schemas and conjunctive queries that specify where those schemas overlap. Unlike past approaches that generate only the mediated schema, our algorithm also generates view definitions, i.e., source-to-mediated schema mappings. Our main goal is to understand the requirements that a mediated schema and views should satisfy, such as completeness, preservation of overlapping information, normalization, and minimality. We show how these requirements influence the detailed structure of schemas and view definitions that are produced. We introduce a normal form for mediated schemas and view definitions, show how to generate them, and prove that schemas and views in this normal form satisfy our requirements. The view definitions in our normal form use stylized GLAV mappings, for which query rewriting is easier than general GLAV mappings. We demonstrate the efficiency of query rewriting in a prototype implementation.

  • ABSTRACT: Making sensible queries on databases collected from different organizations presents the challenging task of linking semantically equivalent data facts. Current techniques focus primarily on pair-wise attribute matching and pay little attention to discovering probabilistic structural dependencies by exploiting the ontological domain knowledge of tables, attributes and tuples to construct hierarchical cluster mapping trees. In this paper, we present the Ontology Guided Data Linkage (OGDL) framework for self-organizing heterogeneous data sources into homogeneous ontological clusters through multi-faceted classification. Through evaluation on real-world data, we demonstrate the robustness and accuracy of our system.
    Advanced Data Mining and Applications - 7th International Conference, ADMA 2011, Beijing, China, December 17-19, 2011, Proceedings, Part II; 01/2011
  • ABSTRACT: The proliferation of ontologies and taxonomies in many domains increasingly demands the integration of multiple such ontologies to provide a unified view on them. We demonstrate a new automatic approach to merge large taxonomies such as product catalogs or web directories. Our approach is based on an equivalence matching between a source and target taxonomy to merge them. It is target-driven, i.e. it preserves the structure of the target taxonomy as much as possible. Further, we show how the approach can utilize additional relationships between source and target concepts to semantically improve the merge result.
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on; 05/2011
  • ABSTRACT: There is growing interest in geo-spatial data sources that provide rich information about a huge number of interconnected geo-entities and points of interest (POIs) located in the real world. Moreover, sources of this kind are among the first to be published as linked open data. Noteworthy examples are the Geonames and GeoLinkedData initiatives. On the one hand, making more data sources available as linked open data allows querying the sources in an integrated way. On the other hand, it is known that the content of geo-spatial data sources suffers from various drawbacks, mainly concerning data quality and conflicts. In this context, relevant feedback from users with specific experience and knowledge about POIs in a certain spatial region is considered a valuable contribution to improving data quality and resolving description conflicts. We propose a conceptual framework called M-PREGeD (Multi-Providers cRowd-Enhanced Geo linked Data) devoted to collecting, organizing and ranking user-generated corrections and completions to improve the accuracy and completeness of Geo-spatial Linked Data from different data sources. Metrics have been defined for both contributors and contents. In the framework, validated and ranked corrections and completions are stored as linked open data in a separate repository but linked to the original data sources. The repository can be queried in a combined way with the original data sources.
    Proceedings of the Joint EDBT/ICDT 2013 Workshops; 03/2013
