Conference Paper

XML Schema, Tree Logic and Sheaves Automata.

DOI: 10.1007/3-540-44881-0_18 Conference: Rewriting Techniques and Applications, 14th International Conference, RTA 2003, Valencia, Spain, June 9-11, 2003, Proceedings
Source: DBLP

ABSTRACT: XML documents may be roughly described as unranked, ordered trees, and it is therefore natural to use tree automata to process or validate them. This idea has already been successfully applied in the context of Document Type Definition (DTD), the simplest standard for defining document validity, but additional work is needed to take into account XML Schema, a more advanced standard, for which regular tree automata are not satisfactory. In this paper, we introduce Sheaves Logic (SL), a new tree logic that extends the syntax of the recursion-free fragment of the W3C XML Schema Definition Language (WXS). Then we define a new class of automata for unranked trees that provides decision procedures for the basic questions about SL: model-checking; satisfiability; entailment. The same class of automata is also used to answer basic questions about WXS, including recursive schemas: decidability of type-checking documents; testing the emptiness of schemas; testing that a schema subsumes another one.

Available from: Silvano Dal Zilio, Jul 10, 2015
  • Source
    • "We mention here only formalisms introduced in the context of XML. Presburger automata [36], sheaves automata [17], and the TQL logic [13] allow to express Presburger constraints on the numbers of occurrences of the different symbols among the children of some node. This is also equivalent to considering DTDs under commutative closure, similarly to [3] [30]. "
    ABSTRACT: We investigate schema languages for unordered XML having no relative order among siblings. First, we propose unordered regular expressions (UREs), essentially regular expressions with unordered concatenation instead of standard concatenation, that define languages of unordered words to model the allowed content of a node (i.e., collections of the labels of children). However, unrestricted UREs are computationally too expensive as we show the intractability of two fundamental decision problems for UREs: membership of an unordered word to the language of a URE and containment of two UREs. Consequently, we propose a practical and tractable restriction of UREs, disjunctive interval multiplicity expressions (DIMEs). Next, we employ DIMEs to define languages of unordered trees and propose two schema languages: disjunctive interval multiplicity schema (DIMS), and its restriction, disjunction-free interval multiplicity schema (IMS). We study the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, as well as twig query satisfiability, implication, and containment in the presence of schema. Finally, we study the expressive power of the proposed schema languages and compare them with yardstick languages of unordered trees (FO, MSO, and Presburger constraints) and DTDs under commutative closure. Our results show that the proposed schema languages are capable of expressing many practical languages of unordered trees and enjoy desirable computational properties.
    Theory of Computing Systems 11/2013; DOI:10.1007/s00224-014-9593-1
  • Source
    • "There have been several approaches to extend TA with arithmetic constraints on cardinalities |q| described above: the constraints can be added to transitions in order to count between siblings [17], [18] (in this case we could call them local by analogy with equality tests) or they can be global [19]. We compare in Section IV-A the latter approach (closer to our settings) with our extension of TAGC, wrt emptiness decision. "
    ABSTRACT: We define tree automata with global constraints (TAGC), generalizing the well-known class of tree automata with global equality and disequality constraints (14) (TAGED). TAGC can test for equality and disequality between subterms whose positions are defined by the states reached during a computation. In particular, TAGC can check that all the subterms reaching a given state are distinct. This constraint is related to monadic key constraints for XML documents, meaning that every two distinct positions of a given type have different values. We prove decidability of the emptiness problem for TAGC. This solves, in particular, the open question of decidability of emptiness for TAGED. We further extend our result by allowing global arithmetic constraints for counting the number of occurrences of some state or the number of different subterms reaching some state during a computation. We also allow local equality and disequality tests between sibling positions and the extension to unranked ordered trees. As a consequence of our results for TAGC, we prove the decidability of a fragment of the monadic second order logic on trees extended with predicates for equality and disequality between subtrees, and cardinality.
    Proceedings of the 25th Annual IEEE Symposium on Logic in Computer Science, LICS 2010, 11-14 July 2010, Edinburgh, United Kingdom; 01/2010
  • Source
    • "However, this would grant a power to grammars for counting the number of occurrences of attributes. Although an approach directly dealing with this has been pursued by other researchers [30], we choose to disallow such expressions since we have not found presented elsewhere [16] based on automata with both element and attribute transitions. It is also worth remarking that RELAX NG has adopted this restriction."
    ABSTRACT: The history of schema languages for XML is (roughly) an increase of expressiveness. While early schema languages mainly focused on the element structure, Clark first paid an equal attention to attributes by allowing both element and attribute constraints in a single constraint expression (we call his mechanism “attribute–element constraints”). In this paper, we investigate intersection and difference operations and inclusion test for attribute–element constraints, in view of their importance in static typechecking for XML processing programs. The contributions here are (1) proofs of closure under intersection and difference as well as decidability of inclusion test and (2) algorithm formulations incorporating a “divide-and-conquer” strategy for avoiding an exponential blow-up for typical inputs.
    Theoretical Computer Science 08/2006; 360(1-3):327-351. DOI:10.1016/j.tcs.2006.05.004