Learning n-ary Node Selecting Tree Transducers from Completely Annotated Examples

11/2006; DOI: 10.1007/11872436_21
Source: OAI

ABSTRACT We present the first algorithm for learning n-ary node selection queries in trees from completely annotated examples by methods of grammatical inference. We propose to represent n-ary queries by deterministic n-ary node selecting tree transducers (NSTTs), which are known to capture the class of MSO-definable n-ary queries. Despite this high expressiveness, we show that n-ary queries that select a polynomially bounded number of tuples per tree, represented by deterministic NSTTs, can be learned with polynomial time and data while allowing for efficient enumeration of query answers. An application to wrapper induction in Web information extraction yields encouraging results.


Available from: Rémi Gilleron, May 26, 2014
  • ABSTRACT: XML query induction is a key task in Web information extraction. Recent approaches based on grammatical inference represent node selection queries in XML trees by deterministic tree automata. In this paper, we show how to guide RPNI-based learning algorithms by XML schemas which we can infer in a preprocessing step. We hope that schema guidance will help to improve heuristics that are essential for query learning algorithms.
  • ABSTRACT: Specifying a database query in a formal query language is typically a challenging task for non-expert users. In the context of big data, this problem becomes even harder, as it requires users to deal with database instances that are large and hence difficult to visualize. Such instances usually lack a schema to help users specify their queries, or have an incomplete schema because they come from disparate data sources. In this paper, we propose a novel paradigm for interactive learning of queries on big data, without assuming any knowledge of the database schema. The paradigm can be applied to different database models and to a class of queries adequate to each database model. In particular, we present two instantiations that validate the proposed paradigm: learning relational join queries and learning path queries on graph databases. Finally, we discuss the challenges of employing the paradigm for further data models and for learning cross-model schema mappings.
  • ABSTRACT: Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice for XML information extraction.
    Journal of Machine Learning Research 01/2013; 14(1):927-964. · 2.85 Impact Factor