Reasoning about Keys for XML

School of Informatics, University of Edinburgh, Edinburgh EH9 3JZ, Scotland, UK; Department of Computer and Information Science, University of Pennsylvania, 3330 Walnut Street, Philadelphia, PA 19104-6389, USA; Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974-0636, USA; Departamento de Informatica, Universidade Federal do Parana, Centro Politecnico, Curitiba, PR 81531-990, Brazil; Department of Computer Science, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
DOI: 10.1007/3-540-46093-4_8
Source: DBLP

ABSTRACT We study absolute and relative keys for XML, and investigate their associated decision problems. We argue that these keys
are important to many forms of hierarchically structured data including XML documents. In contrast to other proposals of keys
for XML, these keys can be reasoned about efficiently. We show that the (finite) satisfiability problem for these keys is
trivial, and their (finite) implication problem is finitely axiomatizable and decidable in PTIME in the size of keys.
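
As a rough illustration of the distinction between absolute and relative keys, the sketch below checks a key against a small document. It is a simplified reading, not the paper's formal definition: the function `satisfies_key`, the sample document `DOC`, and the element names are all hypothetical, values are compared by element text only, and only the first match of each key path is considered. An absolute key is modelled by using the document root (`"."`) as the context.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this illustration.
DOC = ET.fromstring("""
<library>
  <book><isbn>1</isbn>
    <chapter><number>1</number></chapter>
    <chapter><number>2</number></chapter>
  </book>
  <book><isbn>2</isbn>
    <chapter><number>1</number></chapter>
  </book>
</library>
""")

def satisfies_key(root, context_path, target_path, key_paths):
    """Simplified key check (an assumption, not the paper's semantics).

    Under every node selected by context_path, no two distinct nodes
    selected by target_path may agree on the text of all key_paths.
    """
    for context in root.findall(context_path):
        seen = set()
        for target in context.findall(target_path):
            # First match per key path; a missing path yields None.
            values = tuple(target.findtext(p) for p in key_paths)
            if values in seen:
                return False
            seen.add(values)
    return True

# Absolute key: isbn identifies a book within the whole document.
print(satisfies_key(DOC, ".", "book", ["isbn"]))
# Relative key: number identifies a chapter within its book.
print(satisfies_key(DOC, "book", "chapter", ["number"]))
# The same constraint taken absolutely fails: two chapters share number 1.
print(satisfies_key(DOC, ".", ".//chapter", ["number"]))
```

The chapter number is a key only relative to a book, which is the kind of hierarchical scoping that relative keys are meant to express.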

  • ABSTRACT: A great deal of research into the learning of schemas from XML data has been conducted in recent years to enable the automatic discovery of XML Schemas from XML documents when no schema, or only a low-quality one, is available. Unfortunately, and in strong contrast to, for instance, the relational model, the automatic discovery of even the simplest of XML constraints, namely XML keys, has been left largely unexplored in this context. A major obstacle here is the unavailability of a theory on reasoning about XML keys in the presence of XML schemas, which is needed to validate the quality of candidate keys. The present paper embarks on a fundamental study of such a theory and classifies the complexity of several crucial properties concerning XML keys in the presence of an XSD, such as testing for consistency, boundedness, satisfiability, universality, and equivalence. Of independent interest, novel results are obtained related to cardinality estimation of XPath result sets. A mining algorithm is then developed within the framework of levelwise search. The algorithm leverages known discovery algorithms for functional dependencies in the relational model, but incorporates the above-mentioned properties to assess and refine the quality of derived keys. An experimental study on an extensive body of real-world XML data evaluating the effectiveness of the proposed algorithm is provided.
    Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data; 06/2013
  • ABSTRACT: With the growing use of XML as a format for the permanent storage of data, the study of functional dependencies in XML (XFDs) is central to understanding how to effectively design XML databases. We investigate the implication problem for 'closest node' XFDs in complete XML documents. Our first contribution is to provide an axiom system for XFD implication that we prove to be sound and complete, and we then present a quadratic time closure algorithm for XFD implication. Our second contribution is to investigate the implication problem for XFDs in the presence of a Document Type Definition (DTD). We show that for a class of DTDs called 'simple' DTDs, the implication problem for a set of XFDs and a 'simple' DTD can be converted to the implication problem for a set of XFDs alone, and so is axiomatizable and efficiently solvable by the first contribution. We do this by augmenting the original set of XFDs with additional XFDs generated from the structure of the DTD.
    Journal of Computer and System Sciences. 07/2012; 78(4).
  • ABSTRACT: Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema, by following a mapping between the two schemas. There is a rich literature on problems related to data exchange, e.g., the design of a schema mapping language, the consistency of schema mappings, operations on mappings, and query answering over mappings. Data exchange has been extensively studied for the relational model, and has recently also been discussed for XML data. This article investigates the construction of a target instance for XML data exchange, which has received far less attention. We first present a rich language for the definition of schema mappings, which allows one to use various forms of document navigation and specify conditions on data values. Given a schema mapping, we then provide an algorithm to construct a canonical target instance. The schema mapping alone is not adequate for expressing target semantics, and hence the canonical instance is in general not optimal. We recognize that target constraints play a crucial role in the generation of good solutions. In light of this, we employ a general XML constraint model to define target constraints. Structural constraints and keys are used to identify a certain entity, as rules for data merging. Moreover, we develop techniques to enforce non-key constraints on the canonical target instance, by providing a chase method to reason about data. Experimental results show that our algorithms scale well, and are effective in producing target instances of good quality.
    Information Processing & Management. 03/2013; 49(2):465–483.
