Dirk Van Gucht

Indiana University Bloomington | IUB · Department of Computer Science

PhD in Computer Science

About

210
Publications
25,882
Reads
4,787
Citations
Additional affiliations
September 1985 - present
Indiana University Bloomington
Position
  • Professor

Publications

Chapter
Relational query languages rely heavily on costly join operations to combine tuples from multiple tables into a single resulting tuple. In many cases, the cost of query evaluation can be reduced by manually optimizing (parts of) queries to use cheaper semi-joins instead of joins. Unfortunately, existing database products can only apply such optimiz...
Article
Fragments of Tarski's relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. In this work, we perform such a systematic study....
Article
Many graph query languages rely on composition to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting toward queries using semi-joins instead, resulting in a significant reduction of the query evaluation cost. We study techniques t...
Article
Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, proje...
Article
Full-text available
Many data sources can be represented easily by collections of sets of objects. For several practical queries on such collections of sets of objects, the answer does not depend on the precise composition of these sets, but only on the number of sets to which each object belongs. This is the case k = 1 for the more general situation where the query an...
Article
Full-text available
For any query language \(\mathcal {F}\), we consider three natural families of boolean queries. Nonemptiness queries are expressed as e ≠ ∅ with e an \(\mathcal {F}\) expression. Emptiness queries are expressed as e = ∅. Containment queries are expressed as e1 ⊆ e2. We refer to syntactic constructions of boolean queries as modalities. In first orde...
Article
Symmetric queries are introduced as queries on a sequence of sets of objects the result of which does not depend on the order of the sets. An appropriate data model is proposed, and two query languages are introduced, QuineCALC and SyCALC. They are correlated with the symmetric Boolean and relational functions, respectively. The former correlation yield...
Chapter
We identify three basic modalities for expressing boolean queries using the expressions of a query language: nonemptiness, emptiness, and containment. For the class of first-order queries, these three modalities have exactly the same expressive power. For other classes of queries, e.g., expressed in weaker query languages, the modalities may differ...
Chapter
For several practical queries on bags of sets of objects, the answer does not depend on the precise composition of these sets, but only on the number of sets to which each object belongs. This is the case k = 1 for the more general situation where the query answer only depends on the number of sets to which each group of at most k objects belongs....
Chapter
Fragments of Tarski’s relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. Our approach is to start from a basic fragment wh...
Article
Full-text available
Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, proje...
Conference Paper
Many graph query languages rely on the composition operator to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries that use semi-joins instead. In this way, the cost of evaluating queries can be significantly reduc...
Conference Paper
Motivated by the continuing interest in the tree data model, we study the expressive power of downward fragments of navigational query languages on trees. The basic navigational query language we consider expresses queries by building binary relations from the edge relations and the identity relation, using composition and union. We study the effec...
Conference Paper
Given a set-comparison predicate P and given two lists of sets A = (A1,...,Am) and B = (B1,...,Bm), with all Ai, Bj ⊆ [n], the P-set join A ⋈_P B is defined to be the set {(i, j) ∈ [m] × [m] | P(Ai,Bj)}. When P(Ai,Bj) is the condition "Ai ∩ Bj ≠ ∅" we call this the set-intersection-notempty join (a.k.a. the composition of A and B); when...
Article
Given a document D in the form of an unordered node-labeled tree, we study the expressiveness on D of various basic fragments of XPath, the core navigational language on XML documents. Working from the perspective of these languages as fragments of Tarski's relation algebra, we give characterizations, in terms of the structure of D, for when a bina...
Article
Full-text available
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; copr...
Article
Conditional independence (CI) statements occur in several areas of computer science and artificial intelligence, e.g., as embedded multivalued dependencies in database theory, disjunctive association rules in data mining, and probabilistic CI statements in probability theory. Although, syntactically, such constraints can always be represented in th...
Article
Full-text available
Many data-intensive applications have to query a database that involves sequences of sets of objects. It is not uncommon that the order of the sets in such a sequence does not affect the result of the query. Such queries are called symmetric. In this paper, the authors wish to initiate research on symmetric queries. Thereto, a data model is propose...
Conference Paper
In this work we survey the research on foundations of data-aware (business) processes that has been carried out in the database theory community. We show that this community has indeed developed over the years a multi-faceted culture of merging data ...
Article
Full-text available
Motivated by applications in databases, this paper considers various fragments of the calculus of binary relations. The fragments are obtained by leaving out, or keeping in, some of the standard operators, along with some derived operators such as set difference, projection, coprojection, and residuation. For each considered fragment, a characteriz...
Conference Paper
In this short paper, I describe six data management research challenges relevant for Big Data and the Cloud. Although some of these problems are not new, their importance is amplified by Big Data and Cloud Computing.
Article
Several established and novel applications motivate us to study the expressive power of navigational query languages on graphs, which represent binary relations. Our basic language has only the operators union and composition, together with the identity relation. Richer languages can be obtained by adding other features such as other set operators,...
Article
Full-text available
We study the expressiveness of a positive fragment of path queries, denoted Path+, on documents that can be represented as node-labeled trees. The expressiveness of Path+ is studied from two angles. First, we establish that Path+ is equivalent in expressive power to two particular subfragments, as well as to the class of tree queries, a subclass of...
Conference Paper
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; copr...
Article
Full-text available
The need to manage diverse information sources has triggered the rise of very loosely structured data models, known as "dataspace models." Such information management systems must allow querying in simple ways, mostly by a form of searching. Motivated by these developments, we propose a theory of search queries in a general model of dataspaces. I...
Article
The relevance of Byzantine fault tolerance in the context of cloud computing has been questioned [3]. While arguments against Byzantine fault tolerance seemingly make sense in the context of a single cloud, i.e., a large-scale cloud infrastructure that ...
Article
The logical and algorithmic properties of stable conditional independence (CI) as an alternative structural representation of conditional independence information are investigated. We utilize recent results concerning a complete axiomatization of stable conditional independence relative to discrete probability measures to derive perfect model prope...
Conference Paper
Full-text available
We study the expressiveness of a positive fragment of path queries, denoted Path+, on node-labeled tree documents. The expressiveness of Path+ is studied from two angles. First, we establish that Path+ is equivalent in expressive power to a particular sub-fragment as well as to the class of tree queries, a sub-class of the first-order conjunct...
Article
Full-text available
We give a language-independent characterization of the expressive power of the relational algebra on finite sets of source-target relation instance pairs. The associated decision problem is shown to be co-graph-isomorphism hard and in co NP. The main result is also applied in providing a new characterization of the generic relational queries.
Article
Full-text available
XML query languages need to provide some mechanism to inspect and manipulate nodes at all levels of an input tree. We investigate the expressive power provided in this regard by structural recursion. In particular, we show that the combination of vertical recursion down a tree combined with horizontal recursion across a list of trees gives rise t...
Conference Paper
Full-text available
The need to manage diverse information sources has triggered the rise of very loosely structured data models, known as "dataspace models." Such information management systems must allow querying in simple ways, mostly by a form of searching. Motivated by these developments, we propose a theory of search queries in a general model of dataspaces....
Article
Full-text available
Recent studies have proposed structural summary techniques for path-query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations of f...
Conference Paper
Full-text available
A lattice-theoretic framework is introduced that permits the study of the conditional independence (CI) implication problem relative to the class of discrete probability measures. Semi-lattices are associated with CI statements and a finite, sound and complete inference system relative to semi-lattice inclusions is presented. This system is...
Article
We study the implication problem of measure-based constraints. These constraints are formulated in a framework for measures generalizing that for mathematical measures. Measures arise naturally in a wide variety of domains. We show that measure constraints, for particular measures, correspond to constraints that occur in relational databases, data...
Article
A lattice-theoretic framework is introduced that permits the study of the conditional independence (CI) implication problem relative to the class of discrete probability measures. Semi-lattices are associated with CI statements and a finite, sound and complete inference system relative to semi-lattice inclusions is presented. This system is shown t...
Conference Paper
Full-text available
As the number of applications that rely on XML data increases, so does the need for performing efficient XML query evaluation. A critical part of the solution involves providing new techniques for designing XML indexes and lookup algorithms. In this paper, we leverage the results of our research on coupling the partitions induced by fragments...
Article
Full-text available
We utilize recent results concerning a complete axiomatization of stable conditional independence (CI) relative to discrete probability measures to derive perfect model properties of stable CI structures. We show that stable CI can be interpreted as a generalization of undirected graphical models and establish a connection between sets of stable...
Conference Paper
Full-text available
Supporting efficient access to XML data using XPath [3] continues to be an important research problem [6, 12]. XPath queries are used to specify node-labeled trees which match portions of the hierarchical XML data. In XPath qu...
Conference Paper
Full-text available
Complex database queries, like programs in general, can 'crash', i.e., can raise runtime errors. We want to avoid crashes without losing expressive power, or we want to correctly predict the absence of crashes. We show how concepts and techniques from programming language theory, notably type systems and reflection, can be adapted to this end. Of c...
Article
The well-definedness problem for a programming language consists of checking, given an expression and an input type, whether the semantics of the expression is defined for all inputs adhering to the input type. A related problem is the semantic type-checking problem which consists of checking, given an expression, an input type, and an output type...
Conference Paper
Full-text available
XML query languages need to provide some mechanism to inspect and manipulate nodes at all levels of an input tree. In this paper we investigate the expressive power provided in this regard by structural recursion. We show that the combination of vertical recursion down a tree combined with horizontal recursion across a list of trees gives rise to a...
Conference Paper
Full-text available
Recent studies have proposed structural summary techniques for path-query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations o...
Conference Paper
We introduce a new methodology for coupling language-induced partitions and index-induced partitions on XML documents that is aimed for the benefit of efficient evaluation of XPath queries. In particular, we identify XPath fragments which are ideally coupled with the newly introduced P(k)-partition which has its definition grounded in the well-know...
Article
Full-text available
Recent studies have proposed structural summary techniques for path query evaluation on semi-structured data sources. One major line of this research has been the introduction of the DataGuide, 1-index, 2-index, and A(k) indices, and subsequent investigations and generalizations. Another recent study has considered structural characterizations...
Conference Paper
Full-text available
We analyze algorithms that, under the right circumstances, permit efficient mining for frequent itemsets in data with tall peaks (large frequent itemsets). We develop a family of level-by-level peak-jumping algorithms, and study them using a simple probability model. The analysis clarifies why the jumping idea sometimes works well, and which proper...
Conference Paper
Full-text available
Given a document D in the form of an unordered labeled tree, we study the expressibility on D of various fragments of XPath, the core navigational language on XML documents. We give characterizations, in terms of the structure of D, for when a binary relation on its nodes is definable by an XPath expression in these fragments. Since each pair of...
Conference Paper
Full-text available
This paper explores the generation of candidates, which is an important step in frequent itemset mining algorithms, from a theoretical point of view. Important notions in our probabilistic analysis are success (a candidate that is frequent), and failure (a candidate that is infrequent). For a selection of candidate-based frequent itemset mining a...
Article
Full-text available
Technologies for overcoming heterogeneities between autonomous data sources are key in the emerging networked world. In this paper we discuss the initial results of a formal investigation into the underpinnings of technologies for alleviating structural heterogeneity. At the core of structural heterogeneity is the data mapping problem: discoverin...
Chapter
Languages for models of nested relations and complex objects have been attracting considerable attention recently. Some of these languages are algebraic, others are calculus based, some are logic programming oriented. This paper describes these languages and surveys recent results about the expressive power of these languages. The emphasis is on co...
Article
Full-text available
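Consider a nonnegative measurable function f defined on \(\Omega_1 \times \Omega_2\), where \(\Omega_j\) is a probability space with probability measure \(\mu_j\). We prove the inequality \(\left(\iint_{\Omega_1 \times \Omega_2} f \, d\mu_1 \, d\mu_2\right)^p + \iint_{\Omega_1 \times \Omega_2} f^p \, d\mu_1 \, d\mu_2 \ge \int_{\Omega_1} \left(\int_{\Omega_2} f \, d\mu_2\right)^p d\mu_1 + \int_{\Omega_2} \left(\int_{\Omega_1} f \, d\mu_1\right)^p d\mu_2\) provided that \(1 \le p \le 2\). The inequality fails in general if \(p > 2\). It also fails if one of the me...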
Conference Paper
Full-text available
We study the relative effectiveness and the efficiency of computing support-bounding rules that can be used to prune the search space in algorithms to solve the frequent item-sets mining problem (FIM). We develop a formalism wherein these rules can be stated and analyzed using the concept of differentials and density functions of the support functi...
Conference Paper
Full-text available
Differential constraints are a class of finite difference equations specified over functions from the powerset of a finite set into the reals. We characterize the implication problem for such constraints in terms of lattice decompositions, and give a sound and complete set of inference rules. We relate differential constraints to a subclass of prop...
Conference Paper
Full-text available
Two natural decision problems regarding the XML query language XQuery are well-definedness and semantic type-checking. We study these problems in the setting of a relational fragment of XQuery. We show that well-definedness and semantic type-checking are undecidable, even in the positive-existential case. Nevertheless, for a “pure” variant of XQuer...
Conference Paper
Full-text available
Two natural decision problems regarding the XML query language XQuery are well-definedness and semantic type-checking. We study these problems in the setting of a relational fragment of XQuery. We show that well-definedness and semantic type-checking are undecidable, even in the positive-existential case. Nevertheless, for a "pure" variant of XQu...
Article
Full-text available
Two natural decision problems regarding the XML query language XQuery are well-definedness and semantic type-checking. We study these problems in the setting of a relational fragment of XQuery. We show that well-definedness and semantic type-checking are undecidable, even in the positive-existential case. Nevertheless, for a "pure" variant of XQue...
Article
Full-text available
The failure rate of the Apriori Algorithm is studied both analytically and experimentally. The time needed by the Apriori Algorithm is determined by the number of item sets that are output (successes: item sets that occur in at least k baskets) and the number of item sets that are counted but not output (failures: item sets where all subsets of the...
Conference Paper
Full-text available
The need for interoperability among databases has increased dramatically with the proliferation of readily available DBMS and application software. Even within a single organization, data from disparate relational databases must be integrated. A framework for interoperability in a federated system of relational databases should be inherently relati...
Article
We extend Chandra and Harel's seminal work on computable queries for relational databases to a setting in which also spatial data may be present, using a constraint-based data model. Concretely, we introduce both coordinate-based and point-based query languages that are complete in the sense that they can express precisely all computable queries th...
Article
It is well known in descriptive computational complexity theory that fixpoint logic captures polynomial time on the class of ordered finite structures. The same is true on any class of structures on which a polynomial number of orders are definable in fixpoint logic. We call a class having this property polynomially orderable. We investigate this p...
Article
Full-text available
Object-oriented applications of database systems require database transformations involving nonstandard functionalities such as set manipulation and object creation, that is, the introduction of new domain elements. To deal with these functionalities, Abiteboul and Kanellakis [1989] introduced the “determinate” transformations as a generalization o...
Article
Full-text available
We introduce and study the concept of semi-determinism. A nondeterministic, generic query is called semi-deterministic if any two possible results of the query to a database are isomorphic. Semideterminism is a generalization of determinacy, proposed by Abiteboul and Kanellakis in the context of object-creating query languages. The framework of sem...
Article
This paper introduces a reflective extension of the relational algebra. Reflection is achieved by storing and manipulating relational algebra programs as relations and by adding a LISP-like evaluation operation to the algebra. We first show that this extension, which we call the reflective algebra, can serve as a unifying formalization of various f...
Conference Paper
Full-text available
Recently, Abiteboul and Kanellakis introduced the notion of determinate query to describe database queries having the ability to create new domain elements. As there are no natural determinate-complete query languages known, more restrictive (the constructive queries) and more general (the semi-deterministic queries) notions of query were conside...
Article
In object-based data models, complex values such as tuples or sets have no special status and are represented just as any other object. However, different objects may represent the same value, i.e., duplicates may occur. It is known that typical object-based models supporting first-order queries, standard object creation, and while-loops, cannot in...
Article
Full-text available
Various non-deterministic aspects of object creation in database transformations are discussed, from a modeling as well as from an expressive power point of view. 1 Introduction In the past few years, a lot of attention has been paid to database transformations [AK89, AV88, AV90, AV91a, GPVG90, Hul86, HWWY91, HY91]. Database transformations are bin...
Article
This paper introduces and studies the relational meta algebra, a statically typed extension of the relational algebra to allow for meta programming in databases. In this meta algebra one can manipulate database relations involving not only stored data values (as in classical relational databases) but also stored relational algebra expressions. Topi...
Article
Full-text available
The linear database model, in which semi-linear sets are the only geometric objects, has been identified as suitable for spatial database applications from both modeling-expressiveness and query-efficiency considerations. For querying linear databases, the language has been proposed. In this paper, we examine the expressiveness of this language. Fir...
Article
Full-text available
In the constraint database community, FO+poly and FO+linear have been proposed as foundations for spatial database query languages. One of the strengths of this approach is that these languages are a clean and natural generalization of Codd's relational model to a spatial setting. As a result, rigorous mathematical study of their expressiveness an...