Laconic schema mappings: computing core universal solutions by means of SQL queries

Source: arXiv

ABSTRACT: We present a new method for computing core universal solutions in data exchange settings specified by source-to-target dependencies, by means of SQL queries. Unlike previously known algorithms, which are recursive in nature, our method can be implemented directly on top of any DBMS. Our method is based on the new notion of a laconic schema mapping. A laconic schema mapping is a schema mapping for which the canonical universal solution is the core universal solution. We give a procedure by which every schema mapping specified by FO s-t tgds can be turned into a laconic schema mapping specified by FO s-t tgds that may refer to a linear order on the domain of the source instance. We show that our results are optimal, in the sense that the linear order is necessary and the method cannot be extended to schema mappings involving target constraints.
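The difference between a canonical universal solution and its core can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy mapping with the two s-t tgds P(x) → ∃y R(x, y) and P(x) → R(x, x), the encoding of labeled nulls as strings starting with `_N`, and the helper names `tgd_exists`, `tgd_loop`, `canonical_solution`, and `core`. The core is found here by brute force over all endomorphisms, which is exponential and only meant to make the definition concrete; it is not the method of the paper.

```python
from itertools import count, product


def is_null(value):
    # Assumption: labeled nulls are strings that start with "_N".
    return value.startswith("_N")


def tgd_exists(x, fresh):
    # P(x) -> exists y. R(x, y): the witness y becomes a fresh labeled null.
    return {("R", x, fresh())}


def tgd_loop(x, fresh):
    # P(x) -> R(x, x): no existential variable, so no null is created.
    return {("R", x, x)}


def canonical_solution(source, tgds):
    """Fire every tgd once for every source value in the unary relation P."""
    counter = count(1)

    def fresh():
        return f"_N{next(counter)}"

    target = set()
    for x in sorted(source):
        for tgd in tgds:
            target |= tgd(x, fresh)
    return target


def core(instance):
    """Return a smallest image of the instance under an endomorphism.

    Constants are kept fixed and only nulls are remapped. Every candidate
    assignment of nulls to domain values is tried (brute force).
    """
    values = {v for fact in instance for v in fact[1:]}
    nulls = sorted(v for v in values if is_null(v))
    domain = sorted(values)
    best = set(instance)
    for image in product(domain, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        mapped = {
            (fact[0],) + tuple(h.get(v, v) for v in fact[1:])
            for fact in instance
        }
        # Keep the mapping only if it is an endomorphism with a smaller image.
        if mapped <= instance and len(mapped) < len(best):
            best = mapped
    return best


source = {"a", "b"}
original = canonical_solution(source, [tgd_exists, tgd_loop])
laconic = canonical_solution(source, [tgd_loop])
print(sorted(original))
print(sorted(core(original)))
print(sorted(laconic))
```

In this toy case the mapping that keeps only P(x) → R(x, x) behaves like a laconic mapping: its canonical universal solution already coincides with the core of the original one, so no post-processing step is needed. Under that assumption, each remaining tgd could be run as a single `INSERT ... SELECT` statement on a DBMS.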


Available from: Phokion Kolaitis, Aug 25, 2014
  • ABSTRACT: We call any system that maps, translates, and exchanges data across different representations a data-transformation system. Nowadays, data architects are faced with a large variety of transformation tasks, and there is a huge number of different approaches and systems that were conceived to solve them. As a consequence, it is very important to be able to evaluate such alternative solutions, in order to pick the right ones for the problem at hand. To do this, we introduce IQ-Meter, the first comprehensive tool for the evaluation of data-transformation systems. IQ-Meter can be used to benchmark, test, and even learn the best usage of data-transformation tools. It builds on a number of novel algorithms to measure the quality of outputs and the human effort required by a given system, and ultimately measures “how much intelligence” the system brings to the solution of a data-translation task.
    2014 IEEE 30th International Conference on Data Engineering (ICDE); 03/2014
  • ABSTRACT: Research has investigated mappings among data sources from two perspectives. On the one side, there are studies of practical tools for schema mapping generation; these focus on algorithms to generate mappings based on visual specifications provided by users. On the other side, there are theoretical studies of data exchange. These study how to generate a solution, i.e., a target instance, given a set of mappings usually specified as tuple generating dependencies. Since the notion of a core solution has been formally identified as an optimal solution, it is very important to efficiently support core computations in mapping systems. In this paper, we introduce several new algorithms that help bridge the gap between the practice of mapping generation and the theory of data exchange. We show how, given a mapping scenario, it is possible to generate an executable script that computes core solutions for the corresponding data exchange problem. The algorithms have been implemented and tested using common runtime engines to show that they guarantee very good performance, orders of magnitude better than that of known algorithms that compute the core as a post-processing step.
    Information Systems 11/2012; 37(7):677–711. DOI:10.1016/ · 1.24 Impact Factor
  • Conference Paper: Mapping and cleaning
    ABSTRACT: We address the challenging and open problem of bringing together two crucial activities in data integration and data quality, i.e., transforming data using schema mappings, and fixing conflicts and inconsistencies using data repairing. This problem is made complex by several factors. First, schema mappings and data repairing have traditionally been considered as separate activities, and research has progressed in a largely independent way in the two fields. Second, the elegant formalizations and the algorithms that have been proposed for both tasks have had mixed success in scaling to large databases. In the paper, we introduce a very general notion of a mapping and cleaning scenario that incorporates a wide variety of features, such as user interventions. We develop a new semantics for these scenarios that represents a conservative extension of previous semantics for schema mappings and data repairing. Based on the semantics, we introduce a chase-based algorithm to compute solutions. Appropriate care is devoted to developing a scalable implementation of the chase algorithm. To the best of our knowledge, this is the first general and scalable proposal in this direction.
    2014 IEEE 30th International Conference on Data Engineering (ICDE); 03/2014