Article

Rewriting the infinite chase


Abstract

Guarded tuple-generating dependencies (GTGDs) are a natural extension of description logics and referential constraints. It has long been known that queries over GTGDs can be answered by a variant of the chase, a quintessential technique for reasoning with dependencies. However, there has been little work on concrete algorithms and even less on implementation. To address this gap, we revisit Datalog rewriting approaches to query answering, where GTGDs are transformed to a Datalog program that entails the same base facts on each base instance. We show that the rewriting can be seen as containing "shortcut" rules that circumvent certain chase steps, we present several algorithms that compute the rewriting by simulating specific types of chase steps, and we discuss important implementation issues. Finally, we show empirically that our techniques can process complex GTGDs derived from synthetic and real benchmarks and are thus suitable for practical use.
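The "shortcut" idea can be illustrated with a toy example (hypothetical rules, not taken from the paper): the GTGDs R(x,y) → ∃z S(y,z) and S(y,z) → T(y) entail the base fact T(y) from any R-fact, so a Datalog rewriting can contain the shortcut rule R(x,y) → T(y), which derives the same base facts without ever materializing the existential witness. A minimal Python sketch of evaluating such a rewriting:

```python
# Toy illustration (hypothetical rules, not from the paper): the GTGDs
#   R(x,y) -> exists z. S(y,z)   and   S(y,z) -> T(y)
# entail the base fact T(y) from any R-fact. A Datalog rewriting can
# therefore contain the "shortcut" rule R(x,y) -> T(y), which derives
# the same base facts without inventing a witness for z.

def datalog_fixpoint(facts, rules):
    """Naive fixpoint for single-body-atom Datalog rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body_pred, head in rules:
            for pred, args in list(facts):
                if pred == body_pred:
                    derived = head(args)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts

# Shortcut rule R(x,y) -> T(y): match an R-fact, emit the T-fact on y.
shortcut = [("R", lambda args: ("T", (args[1],)))]

base = {("R", ("a", "b"))}
print(datalog_fixpoint(base, shortcut))  # includes ("T", ("b",))
```

The point of the rewriting is that this fixpoint, run on base facts only, agrees with the chase on all entailed base facts.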


... We now let A₂ be the result of chasing the view facts of these instances with the backward views. Since Σ consists of FGTGDs, we can perform the usual tree-like chase on it; see, for example, (Benedikt et al. 2022) or Section A.3 of this submission for a description. Note that when we chase an I₂ ∈ A₂, we add facts over values in I₂, but only facts that are already guarded in I₂. ...
... The fact that the approximations are treelike is Proposition 9. The fact that the chase with FGTGDs gives a low treewidth structure is well-known: see, for example, Proposition 10 in Section A.3, or (Benedikt et al. 2022). As mentioned earlier, the idea goes back to (Lukasiewicz, Calì, and Gottlob 2012). ...
... Proof. This can be proven using standard results about the treelike chase for FGTGDs, see for example (Benedikt et al. 2022): when we need a new speedy witness we create a child. In order to ensure that witness triggers for Σ always have frontier within a single node, we always propagate a new fact from child to parent if it is guarded in the parent. ...
Preprint
Full-text available
We study the interaction of views, queries, and background knowledge in the form of existential rules. The motivating questions concern monotonic determinacy of a query using views w.r.t. rules, which refers to the ability to recover the query answer from the views via a monotone function. We study the decidability of monotonic determinacy, and compare with variations that require the "recovery function" to be in a well-known monotone query language, such as conjunctive queries or Datalog. Surprisingly, we find that even in the presence of basic existential rules, the borderline between well-behaved and badly-behaved answerability differs radically from the unconstrained case. In order to understand this boundary, we require new results concerning entailment problems involving views and rules.
... To establish the correctness of the saturation procedure, we use the standard tool of the chase, but we need to restrict it to a specialized version called the one-pass chase. While the notion of one-pass chase is from [6], and some other techniques are adapted from the unpublished appendices of [1], the only published work using these techniques is [2], but there the analysis is restricted to a special case in which the side signature consists of a single unary atom. ...
... We then give the proof of Theorem 3.4. We start in Section 4 by introducing the necessary notion of a tree-like chase proof from [3] and one-pass tree-like chase proofs from [6], modified to our needs. Then we prove the main result in Section 5. We conclude in Section 6. ...
... To show our linearization result, we review in this section the tool of one-pass tree-like chase proofs and recall the result from [6] that it can be used for OWQA with GTGDs. ...
Preprint
Full-text available
We consider the complexity of the open-world query answering problem, where we wish to determine certain answers to conjunctive queries over incomplete datasets specified by an initial set of facts and a set of Guarded TGDs. This problem has been well-studied in the literature and is decidable but with a high complexity, namely, it is 2EXPTIME-complete. Further, the complexity shrinks by one exponential when the arity is fixed. We show in this paper how we can obtain better complexity bounds when considering separately the arity of the guard atom and that of the additional atoms, called the side signature. Our results make use of the technique of linearizing Guarded TGDs, introduced in a paper of Gottlob, Manna, and Pieris. Specifically, we present a variant of the linearization process, making use of a restricted version of the chase that we recently introduced. Our results imply that open-world query answering can be solved in EXPTIME with arbitrary-arity guard relations if we simply bound the arity of the side signature; and that the complexity drops to NP if we fix the side signature and bound the head arity and width of the dependencies.
... More recently, such reductions for Horn description logics have been implemented and evaluated (Carral, González, and Koopmann 2019). Such datalog rewritings have also been studied for existential rules, for guarded (Benedikt et al. 2022), nearly guarded (Gottlob, Rudolph, and Simkus 2014), warded (Berger et al. 2022) and shy (Leone et al. 2019) rule sets. ...
Conference Paper
Ontology-based query answering is a problem that takes as input a set of facts F, an ontology R (typically expressed by existential rules), a Boolean query q, and asks whether R and F entail q. This problem is undecidable in general, and a widely investigated approach to tackle it is called query rewriting: from (R,q) (a "rule query") one computes q_R such that for any set of facts F, it holds that R and F entail q iff F entails q_R. The literature mostly focused on q_R expressed as a union of conjunctive queries (UCQs), and an algorithm that computes such a q_R whenever it exists has been proposed in the literature. However, UCQ-rewritability is applicable only in restricted settings. This raises the question of whether such a generic algorithm can be designed for a more expressive language, such as datalog. We answer this question in the negative by studying the difference between datalog-expressibility and datalog-rewritability. In particular, we show that query answering under datalog-expressible rule queries is undecidable.
... Among numerous applications, Datalog has been heavily used in the context of ontological query answering. In particular, for several important ontology languages based on description logics and existential rules, ontological query answering can be reduced to the problem of evaluating a Datalog query (see, e.g., (Eiter et al. 2012; Benedikt et al. 2022)), which in turn enables the exploitation of efficient Datalog engines such as Soufflé (Jordan, Scholz, and Subotic 2016), VLog (Urbani, Jacobs, and Krötzsch 2016), RDFox (Nenov et al. 2015), and DLV (Leone et al. 2006), to name a few. ...
Article
Explaining an answer to a Datalog query is an essential task towards Explainable AI, especially nowadays where Datalog plays a critical role in the development of ontology-based applications. A well-established approach for explaining a query answer is the so-called why-provenance, which essentially collects all the subsets of the input database that can be used to obtain that answer via some derivation process, typically represented as a proof tree. It is well known, however, that computing the why-provenance for Datalog queries is computationally expensive, and thus, very few attempts can be found in the literature. The goal of this work is to demonstrate how off-the-shelf SAT solvers can be exploited towards an efficient computation of the why-provenance for Datalog queries. Interestingly, our SAT-based approach allows us to build the why-provenance in an incremental fashion, that is, one explanation at a time, which is much more useful in a practical context than the one-shot computation of the whole set of explanations as done by existing approaches.
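As a rough illustration of what why-provenance computes (a brute-force baseline, not the SAT-based procedure described above, which is far more scalable), the following sketch enumerates the minimal subsets of a toy edge database that derive a path answer under the usual transitive-closure program path(x,y) :- edge(x,y) and path(x,z) :- edge(x,y), path(y,z):

```python
from itertools import chain, combinations

# Brute-force why-provenance for a toy Datalog program (illustration only).
# Program:  path(x,y) :- edge(x,y).
#           path(x,z) :- edge(x,y), path(y,z).
# The why-provenance of an answer collects the minimal subsets of the
# database from which the answer is still derivable.

def derives(db, goal):
    """Compute path-facts from edge-facts by fixpoint and test the goal."""
    paths = set(db)                      # edge(x,y) gives path(x,y)
    changed = True
    while changed:
        changed = False
        for (x, y) in db:
            for (y2, z) in list(paths):
                if y == y2 and (x, z) not in paths:
                    paths.add((x, z))
                    changed = True
    return goal in paths

def why_provenance(db, goal):
    """All minimal subsets of db that derive the goal."""
    subsets = chain.from_iterable(
        combinations(db, r) for r in range(1, len(db) + 1))
    witnesses = [set(s) for s in subsets if derives(set(s), goal)]
    return [w for w in witnesses if not any(v < w for v in witnesses)]

db = {("a", "b"), ("b", "c"), ("a", "c")}
print(why_provenance(db, ("a", "c")))
# Two minimal witnesses: {("a","c")} and {("a","b"),("b","c")}
```

The exponential enumeration over subsets is exactly the cost that motivates the incremental, one-explanation-at-a-time SAT encoding.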
... Among numerous applications, Datalog has been heavily used in the context of ontological query answering. In particular, for several important ontology languages based on description logics and existential rules, ontological query answering can be reduced to the problem of evaluating a Datalog query (see, e.g., (Eiter et al. 2012; Benedikt et al. 2022)), which in turn enables the exploitation of efficient Datalog engines such as DLV (Leone et al. 2006) and Clingo (Gebser et al. 2016). ...
Preprint
Full-text available
Explaining why a database query result is obtained is an essential task towards the goal of Explainable AI, especially nowadays where expressive database query languages such as Datalog play a critical role in the development of ontology-based applications. A standard way of explaining a query result is the so-called why-provenance, which essentially provides information about the witnesses to a query result in the form of subsets of the input database that are sufficient to derive that result. To our surprise, despite the fact that the notion of why-provenance for Datalog queries has been around for decades and intensively studied, its computational complexity remains unexplored. The goal of this work is to fill this apparent gap in the why-provenance literature. Towards this end, we pinpoint the data complexity of why-provenance for Datalog queries and key subclasses thereof. The takeaway of our work is that why-provenance for recursive queries, even if the recursion is limited to be linear, is an intractable problem, whereas for non-recursive queries it is highly tractable. Having said that, we experimentally confirm, by exploiting SAT solvers, that making why-provenance for (recursive) Datalog queries work in practice is not an unrealistic goal.
Article
Guarded tuple-generating dependencies (GTGDs) are a natural extension of description logics and referential constraints. It has long been known that queries over GTGDs can be answered by a variant of the chase, a quintessential technique for reasoning with dependencies. However, there has been little work on concrete algorithms and even less on implementation. To address this gap, we revisit Datalog rewriting approaches to query answering, where a set of GTGDs is transformed to a Datalog program that entails the same base facts on each base instance. We show that a rewriting consists of "shortcut" rules that circumvent certain chase steps, we present several algorithms that compute a rewriting by deriving such "shortcuts" efficiently, and we discuss important implementation issues. Finally, we show empirically that our techniques can process complex GTGDs derived from synthetic and real benchmarks and are thus suitable for practical use.
Article
Full-text available
Over the past years, there has been a resurgence of Datalog-based systems in the database community as well as in industry. In this context, it has been recognized that to handle the complex knowledge-based scenarios encountered today, such as reasoning over large knowledge graphs, Datalog has to be extended with features such as existential quantification. Yet, Datalog-based reasoning in the presence of existential quantification is in general undecidable. Many efforts have been made to define decidable fragments. Warded Datalog+/- is a very promising one, as it captures PTIME complexity while allowing ontological reasoning. Yet so far, no implementation of Warded Datalog+/- was available. In this paper we present the Vadalog system, a Datalog-based system for performing complex logic reasoning tasks, such as those required in advanced knowledge graphs. The Vadalog system is Oxford's contribution to the VADA research programme, a joint effort of the universities of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the main contribution of this paper, we illustrate the first implementation of Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive termination control strategy. We also provide a comprehensive experimental evaluation.
Article
Full-text available
We consider answering queries where the underlying data is available only over limited interfaces which provide lookup access to the tuples matching a given binding, but possibly restricting the number of output tuples returned. Interfaces imposing such "result bounds" are common in accessing data via the web. Given a query over a set of relations as well as some integrity constraints that relate the queried relations to the data sources, we examine the problem of deciding if the query is answerable over the interfaces; that is, whether there exists a plan that returns all answers to the query, assuming the source data satisfies the integrity constraints. The first component of our analysis of answerability is a reduction to a query containment problem with constraints. The second component is a set of "schema simplification" theorems capturing limitations on how interfaces with result bounds can be useful to obtain complete answers to queries. These results also help to show decidability for the containment problem that captures answerability, for many classes of constraints. The final component in our analysis of answerability is a "linearization" method, showing that query containment with certain guarded dependencies -- including those that emerge from answerability problems -- can be reduced to query containment for a well-behaved class of linear dependencies. Putting these components together, we get a detailed picture of how to check answerability over result-bounded services.
Conference Paper
Full-text available
The Resource Description Framework (RDF) is a standard for conceptually describing data on the Web, and SPARQL is the query language for RDF. As RDF data continue to be published across heterogeneous domains and integrated at Web-scale such as in the Linked Open Data (LOD) cloud, RDF data management systems are being exposed to queries that are far more diverse and workloads that are far more varied. The first contribution of our work is an in-depth experimental analysis that shows existing SPARQL benchmarks are not suitable for testing systems for diverse queries and varied workloads. To address these shortcomings, our second contribution is the Waterloo SPARQL Diversity Test Suite (WatDiv) that provides stress testing tools for RDF data management systems. Using WatDiv, we have been able to reveal issues with existing systems that went unnoticed in evaluations using earlier benchmarks. Specifically, our experiments with five popular RDF data management systems show that they cannot deliver good performance uniformly across workloads. For some queries, there can be as much as five orders of magnitude difference between the query execution time of the fastest and the slowest system while the fastest system on one query may unexpectedly time out on another query. By performing a detailed analysis, we pinpoint these problems to specific types of queries and workloads.
Article
Full-text available
Definition of the modal fragments of predicate logic in terms of first-order formulas that are translations of elementary poly-modal properties. Distinguishing the finite-variable fragments from the quantifier-bounded fragments, the author develops a semantic version of the guarded fragments by replacing the syntactic restrictions with restrictions on the assignment types in generalized models. With reference to cylindric algebra, the author indicates the new directions that Tarski's theorems can take in a mathematical setting: that of special structural constraints, infinite extensions, extended modal logic, and dynamic semantics.
Conference Paper
Full-text available
Because of the semantic conflicts, the exchange of information between heterogeneous applications remains a complex task. One way to address this problem is to use ontologies for the identification and association of semantically corresponding information concepts. In the electric power industry, the IEC/CIM represents the most complete and widely accepted ontology. We attempt to show through three concrete examples how the CIM can reap advantages from a formal representation of knowledge in order to support complex processes. We present a semantic approach for finding ringlets in the distribution network, for checking specific data inconsistencies and finally for identifying CIM topological nodes. We conclude by stating that the combination of CIM and RDF has the main advantage of offering valuable flexibility in processing complex tasks.
Article
Full-text available
This paper presents several refinements of the Datalog+/- framework based on resolution and Datalog rewriting. We first present a resolution algorithm which is complete for arbitrary sets of tgds and egds. We then show that a technique of saturation can be used to achieve completeness with respect to First-Order (FO) query rewriting. We then investigate the class of guarded tgds (with a loose definition of guardedness), and show that every set of tgds in this class can be rewritten into an equivalent set of standard Datalog rules. On the negative side, this implies that Datalog+/- has (only) the same expressive power as standard Datalog in terms of query answering. On the positive side, however, this means that known results and existing optimization techniques (such as Magic-Set) may be applied in the context of Datalog+/- despite its richer syntax.
Article
Full-text available
We describe feature vector indexing, a new, non-perfect indexing method for clause subsumption. It is suitable for both forward (i.e., finding a subsuming clause in a set) and backward (finding all subsumed clauses in a set) subsumption. Moreover, it is easy to implement, but still yields excellent performance in practice. As an added benefit, by restricting the selection of features used in the index, our technique immediately adapts to indexing modulo arbitrary AC theories with only minor loss of efficiency. Alternatively, the feature selection can be restricted to result in set subsumption. Feature vector indexing has been implemented in our equational theorem prover E, and has enabled us to integrate new simplification techniques making heavy use of subsumption. We experimentally compare the performance of the prover for a number of strategies using feature vector indexing and conventional sequential subsumption.
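A minimal sketch of the idea, with illustrative features only (occurrence counts of predicate symbols; provers such as E use a richer feature set): since a subsuming clause can only map into a clause with at least as many occurrences of each counted symbol, a pointwise comparison of feature vectors is a cheap necessary condition that filters candidates before the expensive subsumption test:

```python
from collections import Counter

# Sketch of the feature-vector filter for clause subsumption.
# Features here are per-predicate literal counts (illustrative only).
# If clause C subsumes clause D, then f(C) <= f(D) must hold for every
# counted feature f, so vectors failing the pointwise test can be
# rejected without running the subsumption check itself.

def feature_vector(clause, preds):
    """Feature vector: occurrence count of each indexed predicate symbol."""
    counts = Counter(pred for (pred, _sign) in clause)
    return tuple(counts.get(p, 0) for p in preds)

def may_subsume(vec_c, vec_d):
    """Necessary condition for C subsuming D: vec_c <= vec_d pointwise."""
    return all(c <= d for c, d in zip(vec_c, vec_d))

preds = ["p", "q"]
c = [("p", True)]                   # clause {p(...)}
d = [("p", True), ("q", False)]     # clause {p(...), ~q(...)}
print(may_subsume(feature_vector(c, preds), feature_vector(d, preds)))  # True
print(may_subsume(feature_vector(d, preds), feature_vector(c, preds)))  # False
```

In an index, the vectors are stored in a trie so that all candidates passing (or failing) the pointwise test can be enumerated without scanning every clause.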
Article
Full-text available
We propose a new family of description logics (DLs), called DL-Lite, specifically tailored to capture basic ontology languages, while keeping low complexity of reasoning. Reasoning here means not only computing subsumption between concepts and checking satisfiability of the whole knowledge base, but also answering complex queries (in particular, unions of conjunctive queries) over the instance level (ABox) of the DL knowledge base. We show that, for the DLs of the DL-Lite family, the usual DL reasoning tasks are polynomial in the size of the TBox, and query answering is LogSpace in the size of the ABox (i.e., in data complexity). To the best of our knowledge, this is the first result of polynomial-time data complexity for query answering over DL knowledge bases. Notably our logics allow for a separation between TBox and ABox reasoning during query evaluation: the part of the process requiring TBox reasoning is independent of the ABox, and the part of the process requiring access to the ABox can be carried out by an SQL engine, thus taking advantage of the query optimization strategies provided by current database management systems. Since even slight extensions to the logics of the DL-Lite family make query answering at least NLogSpace in data complexity, thus ruling out the possibility of using off-the-shelf relational technology for query processing, we can conclude that the logics of the DL-Lite family are the maximal DLs supporting efficient query answering over large amounts of instances.
Conference Paper
Full-text available
As applications of description logics proliferate, efficient reasoning with large ABoxes (sets of individuals with descriptions) becomes ever more important. Motivated by the prospects of reusing optimization techniques from deductive databases, in this paper, we present a novel approach to checking consistency of ABoxes, instance checking and query answering, w.r.t. ontologies formulated using a slight restriction of the description logic SHIQ. Our approach proceeds in three steps: (i) the ontology is translated into first-order clauses, (ii) TBox and RBox clauses are saturated using a resolution-based decision procedure, and (iii) the saturated set of clauses is translated into a disjunctive datalog program. Thus, query answering can be performed using the resulting program, while applying all existing optimization techniques, such as join-order optimizations or magic sets. Equally important, the resolution-based decision procedure we present is worst-case optimal for unary coding of numbers, i.e., it runs in EXPTIME.
Conference Paper
Full-text available
We establish complexities of the conjunctive query entailment problem for classes of existential rules (i.e. Tuple-Generating Dependencies or Datalog+/- rules). Our contribution is twofold. First, we introduce the class of greedy bounded treewidth sets (gbts), which covers guarded rules, and their known generalizations, namely (weakly) frontier-guarded rules. We provide a generic algorithm for query entailment with gbts, which is worst-case optimal for combined complexity with bounded predicate arity, as well as for data complexity. Second, we classify several gbts classes, whose complexity was unknown, namely frontier-one, frontier-guarded and weakly frontier-guarded rules, with respect to combined complexity (with bounded and unbounded predicate arity) and data complexity.
Conference Paper
Full-text available
In this paper we address the problem of query answering and rewriting in global-as-view data integration systems, when key and inclusion dependencies are expressed on the global integration schema. In the case of sound views, we provide sound and complete rewriting techniques for a maximal class of constraints for which decidability holds. Then, we introduce a semantics which is able to cope with violations of constraints, and present a sound and complete rewriting technique for the same decidable class of constraints. Finally, we consider the decision problem of query answering and give decidability and complexity results.
Article
Full-text available
As applications of description logics proliferate, efficient reasoning with knowledge bases containing many assertions becomes ever more important. For such cases, we developed a novel reasoning algorithm that reduces a SHIQ knowledge base to a disjunctive datalog program while preserving the set of ground consequences. Queries can then be answered in the resulting program while reusing existing and practically proven optimization techniques of deductive databases, such as join-order optimizations or magic sets. Moreover, we use our algorithm to derive precise data complexity bounds: we show that SHIQ is data complete for NP, and we identify an expressive fragment of SHIQ with polynomial data complexity.
Article
Full-text available
Let S1, S2 be two schemas, which may overlap, C be a set of constraints on the joint schema S1 ∪ S2, and q1 be an S1-query. An (equivalent) reformulation of q1 in the presence of C is an S2-query, q2, such that q2 gives the same answers as q1 on any (S1 ∪ S2)-database instance that satisfies C. In general, there may exist multiple such reformulations and choosing among them may require, for example, a cost model.
Conference Paper
Existential rules are an expressive ontology formalism for ontology-mediated query answering; query answering under them is thus of high complexity, although several tractable fragments have been identified. Existing systems based on first-order rewriting methods can lead to queries too large for a DBMS to handle. It has been shown that datalog rewriting can result in more compact queries, yet previously proposed datalog rewriting methods are mostly inefficient to implement. In this paper, we fill the gap by proposing an efficient datalog rewriting approach for answering conjunctive queries over existential rules, and identify and combine existing fragments of existential rules for which our rewriting method terminates. We implemented a prototype system, Drewer, and experiments show that it is able to handle a wide range of benchmarks in the literature. Moreover, Drewer shows performance superior or comparable to state-of-the-art systems on both the compactness of rewriting and the efficiency of query answering.
Article
We consider the following query answering problem: given a Boolean conjunctive query and a theory in the Horn loosely guarded fragment, the aim is to determine whether the query is entailed by the theory. In this paper, we present a resolution decision procedure for the loosely guarded fragment, and use such a procedure to answer Boolean conjunctive queries against the Horn loosely guarded fragment. The Horn loosely guarded fragment subsumes classes of rules that are prevalent in ontology-based query answering, such as Horn ALCHOI and guarded existential rules. Additionally, we identify star queries and cloud queries, which, using our procedure, can be answered against the loosely guarded fragment.
Conference Paper
We consider answering queries on data available through access methods that provide lookup access to the tuples matching a given binding. Such interfaces are common on the Web; further, they often have bounds on how many results they can return, e.g., because of pagination or rate limits. We thus study result-bounded methods, which may return only a limited number of tuples. We study how to decide if a query is answerable using result-bounded methods, i.e., how to compute a plan that returns all answers to the query using the methods, assuming that the underlying data satisfies some integrity constraints. We first show how to reduce answerability to a query containment problem with constraints. Second, we show "schema simplification" theorems describing when and how result-bounded services can be used. Finally, we use these theorems to give decidability and complexity results about answerability for common constraint classes.
Conference Paper
The chase is a family of algorithms used in a number of data management tasks, such as data exchange, answering queries under dependencies, query reformulation with constraints, and data cleaning. It is well established as a theoretical tool for understanding these tasks, and in addition a number of prototype systems have been developed. While individual chase-based systems and particular optimizations of the chase have been experimentally evaluated in the past, we provide the first comprehensive and publicly available benchmark (test infrastructure and a set of test scenarios) for evaluating chase implementations across a wide range of assumptions about the dependencies and the data. We used our benchmark to compare chase-based systems on data exchange and query answering tasks with one another, as well as with systems that can solve similar tasks developed in closely related communities. Our evaluation provided us with a number of new insights concerning the factors that impact the performance of chase implementations.
Article
We present Ontop, an open-source Ontology-Based Data Access (OBDA) system that allows for querying relational data sources through a conceptual representation of the domain of interest, provided in terms of an ontology, to which the data sources are mapped. Key features of Ontop are its solid theoretical foundations, a virtual approach to OBDA, which avoids materializing triples and is implemented through the query rewriting technique, extensive optimizations exploiting all elements of the OBDA architecture, its compliance to all relevant W3C recommendations (including SPARQL queries, R2RML mappings, and OWL2QL and RDFS ontologies), and its support for all major relational databases.
Conference Paper
A new data structure, set-trie, for storing and retrieving sets is proposed. Efficient manipulation of sets is vital in a number of systems including datamining tools, object-relational database systems, and rule-based expert systems. The data structure set-trie provides efficient algorithms for set containment operations. It allows fast access to subsets and supersets of a given parameter set. The performance of operations is analyzed empirically in a series of experiments on real-world and artificial datasets. The analysis shows that sets can be accessed in O(c·|set|) time, where |set| represents the size of the parameter set and c is a constant.
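A minimal sketch of the set-trie idea described above, assuming elements are comparable (here: integers) so that sets can be stored as sorted sequences; the subset query walks the trie while skipping over elements of the query set:

```python
# Minimal set-trie sketch: sets are stored as sorted element sequences
# in a trie, so "is some stored set a subset of Q?" can be answered by
# walking the trie and skipping elements of Q as needed.

class SetTrie:
    def __init__(self):
        self.children = {}
        self.end = False          # marks that a stored set ends at this node

    def insert(self, s):
        node = self
        for x in sorted(s):
            node = node.children.setdefault(x, SetTrie())
        node.end = True

    def contains_subset_of(self, query):
        """True iff some stored set is a subset of `query`."""
        return self._subset(sorted(query), 0)

    def _subset(self, q, i):
        if self.end:
            return True
        # Try to continue with any remaining query element present as a child.
        for j in range(i, len(q)):
            child = self.children.get(q[j])
            if child is not None and child._subset(q, j + 1):
                return True
        return False

t = SetTrie()
t.insert({1, 3})
t.insert({2, 5})
print(t.contains_subset_of({1, 2, 3}))  # True: {1,3} is a subset
print(t.contains_subset_of({2, 4}))     # False
```

Superset queries work dually, by descending only through children that are elements of the query set plus any smaller elements.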
Conference Paper
This paper presents Graal, a Java toolkit dedicated to ontological query answering in the framework of existential rules. We consider knowledge bases composed of data and an ontology expressed by existential rules. The main features of Graal are the following: a basic layer that provides generic interfaces to store and query various kinds of data, forward chaining and query rewriting algorithms, structural analysis of decidability properties of a rule set, a textual format and its parser, and import of OWL 2 files. We describe in more detail the query rewriting algorithms, which rely on original techniques, and report some experiments.
Article
The so-called existential rules have recently gained attention, mainly due to their adequate expressiveness for ontological query answering. Several decidable fragments of such rules have been introduced, employing restrictions such as various forms of guardedness to ensure decidability. Some of the more well-known languages in this arena are the (weakly) guarded and (weakly) frontier-guarded fragments of existential rules. In this paper, we explore their relative and absolute expressiveness. In particular, we provide a new proof that queries expressed via frontier-guarded and guarded rules can be translated into plain Datalog queries. Since the converse translations are impossible, we develop generalizations of frontier-guarded and guarded rules to nearly frontier-guarded and nearly guarded rules, respectively, which have exactly the expressive power of Datalog. We further show that weakly frontier-guarded rules can be translated into weakly guarded rules, and thus, weakly frontier-guarded and weakly guarded rules have exactly the same expressive power. Such rules cannot be translated into Datalog since their query answering problem is ExpTime-complete in data complexity. We strengthen this result by showing that on ordered databases and with input negation available, weakly guarded rules capture all queries decidable in exponential time. We then show that weakly guarded rules extended with stratified negation are expressive enough to capture all database queries decidable in exponential time, without any assumptions about input databases. Finally, we note that the translations of this paper are, in general, exponential in size, but lead to worst-case optimal algorithms for query answering with the considered languages.
Article
Semantic query optimization is the process of finding equivalent rewritings of an input query given constraints that hold in a database instance. In this paper, we report on a Chase & Backchase (C&B) algorithm strategy that generalizes and improves on well-known methods in the field. The implementation of our approach, the Pegasus system, outperforms existing C&B systems on average by two orders of magnitude. This gain in performance is due to a combination of novel methods that lower the complexity in practical situations significantly.
Article
This article discusses the two incarnations of Otter entered in the CADE-13 Automated Theorem Proving System Competition. Also presented are some historical background, a summary of applications that have led to new results in mathematics and logic, and a general discussion of Otter.
Article
We give a new decision procedure for the guarded fragment with equality. The procedure is based on resolution with superposition. We argue that this method will be more useful in practice than methods based on the enumeration of certain finite structures. It is surprising to see that one does not need any sophisticated simplification and redundancy elimination method to make superposition terminate on the class of clauses that is obtained from the clausification of guarded formulas. Yet the decision procedure obtained is optimal with regard to time complexity. We also show that the method can be extended to the loosely guarded fragment with equality.
Article
A unification algorithm is described which tests a set of expressions for unifiability and which requires time and space only linear in the size of the input.
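The abstract does not reproduce the algorithm itself; as a rough illustration of syntactic unification, here is a textbook sketch with an occurs check (a quadratic version for clarity, not the linear-time structure-sharing algorithm the paper describes). The term representation is our own assumption: variables are capitalized strings, compound terms are tuples of the form `(functor, arg1, ..., argn)`.

```python
def is_var(t):
    # Assumption of this sketch: variables are strings starting uppercase.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings to a representative term.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Does variable v occur in term t under the current substitution?
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(s, t, subst=None):
    """Return a most general unifier extending subst, or None on failure."""
    if subst is None:
        subst = {}
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if is_var(s):
        if occurs(s, t, subst):
            return None  # occurs check fails, e.g. X = f(X)
        return {**subst, s: t}
    if is_var(t):
        return unify(t, s, subst)
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None
```

For example, `unify(('f', 'X', ('g', 'X')), ('f', 'a', 'Y'))` binds `X` to `a` and `Y` to `g(X)`, while `unify('X', ('f', 'X'))` fails on the occurs check.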
Article
Much of the work to date on the optimization of queries for relational databases has focussed on the case where the only dependencies allowed are functional dependencies. We extend this work to the case where inclusion dependencies are also allowed. We show that there are substantial special cases where the presence of inclusion dependencies does not make the basic problems of optimization any harder than they are when there are no dependencies at all. In particular, we show that the problems of query containment, equivalence, and nonminimality remain in NP when either (a) all dependencies are inclusion dependencies or (b) the set of dependencies is what we call “key-based.” These results assume that infinite databases are allowed. If only finite databases are allowed, new containments and equivalences may arise, as we illustrate by an example, and the problems may be substantially more difficult. We can, however, prove a “finite controllability” theorem that shows that no such examples exist for case (b), or for (a) when the only inclusion dependencies allowed are those having “width” equal to one.
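For reference (standard notation, not an example from the paper): an inclusion dependency asserts that a projection of one relation is contained in a projection of another; a “width one” dependency projects onto a single attribute:

```latex
% Every department id appearing in Emp must appear as an id in Dept;
% a unary ("width one") inclusion dependency.
\mathit{Emp}[\mathit{dept}] \subseteq \mathit{Dept}[\mathit{id}]
```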
Conference Paper
Data integration is a pervasive challenge faced in applications that need to query across multiple autonomous and heterogeneous data sources. Data integration is crucial in large enterprises that own a multitude of data sources, for progress in large-scale scientific projects, where data sets are being produced independently by multiple researchers, for better cooperation among government agencies, each with their own data sources, and in offering good search quality across the millions of structured data sources on the World-Wide Web. Ten years ago we published "Querying Heterogeneous Information Sources using Source Descriptions" [73], a paper describing some aspects of the Information Manifold data integration project. The Information Manifold and many other projects conducted at the time [5, 6, 20, 25, 38, 43, 51, 66, 100] have led to tremendous progress on data integration and to quite a few commercial data integration products. This paper offers a perspective on the contributions of the Information Manifold and its peers, describes some of the important bodies of work in the data integration field in the last ten years, and outlines some challenges to data integration research today. We note in advance that this is not intended to be a comprehensive survey of data integration, and even though the reference list is long, it is by no means complete.
Conference Paper
We present a family of expressive extensions of Datalog, called Datalog±, as a new paradigm for query answering over ontologies. The Datalog± family admits existentially quantified variables in rule heads, and has suitable restrictions to ensure highly efficient ontology querying. In particular, we show that query answering under so-called guarded Datalog± is PTIME-complete in data complexity, and that query answering under so-called linear Datalog± is in AC0 in data complexity. We also show how negative constraints and a general class of key constraints can be added to Datalog± while keeping ontology querying tractable. We then show that linear Datalog±, enriched with a special class of key constraints, generalizes the well-known DL-Lite family of tractable description logics. Furthermore, the Datalog± family is of interest in its own right and can, moreover, be used in various contexts such as data integration and data exchange. This work is a short version of [8].
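For intuition (an illustrative example of our own, not drawn from the paper): a linear Datalog± rule has exactly one body atom, and a negative constraint forbids certain fact combinations:

```latex
% Linear rule: a single body atom, with an existential in the head.
\forall x\;\bigl(\mathit{Employee}(x)\rightarrow \exists y\, \mathit{worksFor}(x,y)\bigr)

% Negative constraint: no value is both an employee and a department.
\forall x\;\bigl(\mathit{Employee}(x)\land \mathit{Department}(x)\rightarrow \bot\bigr)
```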
Conference Paper
The set-unification and set-matching problems, which are very restricted cases of the associative-commutative-idempotent unification and matching problems, respectively, are shown to be NP-complete. The NP-completeness of the subsumption check in first-order resolution follows from these results. It is also shown that commutative-idempotent matching and associative-idempotent matching are NP-hard, thus implying that the idempotency of a function does not help in reducing the complexity of matching and unification problems.
Conference Paper
We give a resolution based decision procedure for the guarded fragment of H. Andréka, I. Németi, and J. van Benthem [J. Philos. Log. 27, 217-274 (1998; Zbl 0919.03013)]. The relevance of the guarded fragment lies in the fact that many modal logics can be translated into it. In this way the guarded fragment acts as a framework explaining the nice properties of these modal logics. By constructing an effective decision procedure for the guarded fragment we define an effective procedure for deciding these modal logics.
Article
For many practical applications of logic-based methods there is a requirement to balance expressive power against computational tractability.
Both identifying decidable sub-classes of first-order logic, and extending modal logic to larger, but nevertheless efficiently solvable languages, has been a preeminent goal of research. The guarded fragment of first-order logic was a successful attempt to transfer key tractability properties of modal, temporal, and description logics to a larger fragment of predicate logic. Besides decidability, guarded logics inherit the finite model property, invariance under an appropriate variant of bisimulation, and other nice model-theoretic properties, including a decidable fixed-point extension. The goal of this work is to gain greater insight into the correspondence between the modal world and the guarded world. In this process, several gaps concerning basic, typically modal, features of guarded logics are closed. Guarded second-order logic and guarded relational algebra are developed. A recurring key technique consists of encoding structures and formulae used in the guarded world as their modal counterparts. This enables the transfer of various results, as well as giving greater insight into the nature of guarded logics. Topics covered include tableau-based decision procedures, a map of action-guarded logics, canonisation of structures, and model-theoretic characterisation theorems.
Article
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it, and the relevant theoretical results.
Article
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answering in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. A universal solution has no more and no less data than required for data exchange and it represents the entire space of possible solutions. We then identify fairly general, and practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of "certain answers" in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution.
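The canonical universal solution mentioned here is computed by the chase, the technique at the heart of the surrounding article. As a rough illustration (our own minimal sketch, not the paper's algorithm), a single chase step for a simple source-to-target TGD copies source facts into a target relation, inventing a fresh labelled null for each existentially quantified head position:

```python
from itertools import count

# Facts are tuples ("relation", v1, ..., vn). The counter supplies fresh
# labelled nulls N0, N1, ... for existential witnesses.
fresh = count()

def chase_step(facts, src_rel, tgt_rel, head):
    """Apply one source-to-target TGD of the form
    src_rel(x1, ..., xn) -> exists z. tgt_rel(head).

    `head` lists, per head position, either an argument index of the
    source fact (int) or the marker "z" for the existential variable.
    """
    out = set(facts)
    for f in facts:
        if f[0] != src_rel:
            continue
        null = f"N{next(fresh)}"  # one fresh null per trigger
        out.add((tgt_rel,) + tuple(
            f[1 + a] if isinstance(a, int) else null for a in head))
    return out

# Source instance: Emp(name, dept).
# TGD: Emp(n, d) -> exists m. Mgr(d, m), i.e. every department has a manager.
source = {("Emp", "ann", "sales"), ("Emp", "bob", "hr")}
target = chase_step(source, "Emp", "Mgr", [1, "z"])
```

The resulting `target` keeps all source facts and adds one `Mgr` fact per employee, with an unnamed (null) manager for each department occurrence; it has "no more and no less data than required", in the sense the abstract describes.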
Article
The data integration problem is to provide uniform access to multiple heterogeneous information sources available online (e.g., databases on the WWW). This problem has recently received considerable attention from researchers in the fields of Artificial Intelligence and Database Systems. The data integration problem is complicated by the facts that (1) sources contain closely related and overlapping data, (2) data is stored in multiple data models and schemas, and (3) data sources have differing query processing capabilities. A key element in a data integration system is the language used to describe the contents and capabilities of the data sources. While such a language needs to be as expressive as possible, it should also enable to efficiently address the main inference problem that arises in this context: to translate a user query that is formulated over a mediated schema into a query on the local schemas. This paper describes several languages for describing contents of data sources, the tradeoffs between them, and the associated reformulation algorithms.
Article
References
  • M. Baaz. Normal Form Transformations. In Handbook of Automated Reasoning. MIT Press.
  • Leo Bachmair and Harald Ganzinger. Resolution Theorem Proving. In Handbook of Automated Reasoning. MIT Press.
  • Mario Alviano, Nicola Leone, Marco Manna, Giorgio Terracina, and Pierfrancesco Veltri. 2012. Magic-Sets for Datalog with Existential Quantifiers.
  • Vince Bárány, Michael Benedikt, and Balder Ten Cate. 2013. Rewriting Guarded Negation Queries.
  • Kevin Kappelmann. 2019. Decision Procedures for Guarded Logics.
  • Shqiponja Ahmetaj, Magdalena Ortiz, and Mantas Simkus. Rewriting Guarded Existential Rules into Small Datalog Programs.
  • Boris Motik. 2022. The KAON2 System. Karlsruhe Institute of Technology.
  • Deepak Kapur and Paliath Narendran. 1986. NP-Completeness of the Set Unification and Matching Problems.
  • Andrea Calì, Domenico Lembo, and Riccardo Rosati. 2003. Query rewriting and answering under constraints in data integration systems.