Thesis

Counting queries in ontology-based data access


Abstract

Ontology-mediated query answering (OMQA) is a promising approach to data access and integration that has been actively studied in the knowledge representation and database communities for more than a decade. The vast majority of work on OMQA focuses on conjunctive queries, whereas more expressive queries that feature counting or other forms of aggregation remain largely unexplored. In this thesis, we introduce a general form of counting conjunctive query (CCQ), relate it to previous proposals, and study the complexity of answering such queries in the presence of ontologies expressed in the description logic ALCHI or its sublogics. As the general case of CCQ answering is intractable and often of high complexity over such ontologies, we consider two practically relevant restrictions, namely rooted CCQs and Boolean atomic CCQs, for which we establish improved complexity bounds.


... The answer to a counting query in a model of the KB is then obtained as the number of different assignments for the counting variables when considering every possible homomorphism of the CQ into the model. Finding uniform bounds on those answers, i.e. model-independent bounds, has been viewed as a notion of certain answers and is now well-understood for a variety of DLs (Calvanese et al. 2020; Bienvenu, Manière, and Thomazo 2022; Manière 2022). The following example highlights that even the tightest uniform bounds give, in general, a poor over-approximation of the set of answers. ...
... In DL-Lite_core, we can perform the satisfiability checks, the membership tests, and the minimality tests in polynomial time, see (Calvanese et al. 2006) and (Manière 2022) [Theorem 51] respectively, resulting in overall polynomial running time. ...
... Thus the shape of the spectrum is [m, ∞]. By (Manière 2022), there is a polynomial-time algorithm deciding Sp_K(q_C) ⊆ [i, ∞] and, thus, m can be found in polynomial time by inspecting all numbers in the interval [1, ℓ]. ...
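The first excerpt above describes the answer to a counting query in a model as the number of distinct assignments to the counting variables over all homomorphisms of the CQ into that model. The following is a minimal Python sketch of that definition for a finite model only. The fact encoding, the function names, and the toy `worksFor` data are assumptions made for illustration and are not taken from the thesis; all query terms are treated as variables, and the brute-force enumeration is exponential in the number of variables.

```python
from itertools import product


def homomorphisms(atoms, interpretation, domain):
    """Yield every assignment of query variables to domain elements
    that maps each query atom onto a fact of the interpretation."""
    variables = sorted({v for _, args in atoms for v in args})
    for values in product(sorted(domain), repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((pred, tuple(h[v] for v in args)) in interpretation
               for pred, args in atoms):
            yield h


def count_answer(atoms, counting_vars, interpretation, domain):
    """Number of distinct images of the counting variables,
    taken over all homomorphisms of the query into the model."""
    images = {tuple(h[v] for v in counting_vars)
              for h in homomorphisms(atoms, interpretation, domain)}
    return len(images)


# Hypothetical finite model: three individuals and three role facts.
domain = {"a", "b", "c"}
interpretation = {
    ("worksFor", ("a", "c")),
    ("worksFor", ("b", "c")),
    ("worksFor", ("a", "b")),
}

# Query with one atom worksFor(x, y); count distinct values of x.
atoms = [("worksFor", ("x", "y"))]
print(count_answer(atoms, ["x"], interpretation, domain))
```

Counting over `["x", "y"]` instead counts distinct pairs rather than distinct employees. The sketch evaluates a query in a single model; the uniform bounds discussed in the excerpt range over all models of the KB, which this code does not attempt.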
Preprint
Recent works have explored the use of counting queries coupled with Description Logic ontologies. The answer to such a query in a model of a knowledge base is either an integer or ∞, and its spectrum is the set of its answers over all models. While it is unclear how to compute and manipulate such a set in general, we identify a class of counting queries whose spectra can be effectively represented. Focusing on atomic counting queries, we pinpoint the possible shapes of a spectrum over ALCIF ontologies: they are essentially the subsets of ℕ ∪ {∞} closed under addition. For most sublogics of ALCIF, we show that possible spectra enjoy simpler shapes, being [m, ∞] or variations thereof. To obtain our results, we refine constructions used for finite model reasoning and notably rely on a cycle-reversion technique for the Horn fragment of ALCIF. We also study the data complexity of computing the proposed effective representation and establish the FP^NP[log]-completeness of this task under several settings.
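As a reading aid, the notion of spectrum described in this abstract can be written as a single formula. The notation below is an assumption: q^I for the answer of the counting query q in a model I is introduced here for illustration, and Sp_K(q) mirrors the symbol used in the excerpt quoted earlier on this page.

```latex
\[
  \mathrm{Sp}_{\mathcal{K}}(q)
  \;=\;
  \{\, q^{\mathcal{I}} \mid \mathcal{I} \models \mathcal{K} \,\}
  \;\subseteq\;
  \mathbb{N} \cup \{\infty\}
\]
```

Under this reading, a spectrum of shape [m, ∞] contains every value from m upward, including ∞.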
Conference Paper
Counting answers to a query is an operation supported by virtually all database management systems. In this paper we focus on counting answers over a Knowledge Base (KB), which may be viewed as a database enriched with background knowledge about the domain under consideration. In particular, we place our work in the context of Ontology-Mediated Query Answering/Ontology-based Data Access (OMQA/OBDA), where the language used for the ontology is a member of the DL-Lite family and the data is a (usually virtual) set of assertions. We study the data complexity of query answering, for different members of the DL-Lite family that include number restrictions, and for variants of conjunctive queries with counting that differ with respect to their shape (connected, branching, rooted). We improve upon existing results by providing PTIME and coNP lower bounds, and upper bounds in PTIME and LOGSPACE. For the LOGSPACE case, we have devised a novel query rewriting technique into first-order logic with counting.
Article
Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign to ontology predicates views over the data. The conventional semantics of OBDA is set-based—that is, the extension of the views defined by the mappings does not contain duplicate tuples. This treatment is, however, in disagreement with the standard semantics of database views and database management systems in general, which is based on bags and where duplicate tuples are retained by default. The distinction between set and bag semantics in databases is very significant in practice, and it influences the evaluation of aggregate queries. In this article, we propose and study a bag semantics for OBDA which provides a solid foundation for the future study of aggregate and analytic queries. Our semantics is compatible with both the bag semantics of database views and the set-based conventional semantics of OBDA. Furthermore, it is compatible with existing bag-based semantics for data exchange recently proposed in the literature. We show that adopting a bag semantics makes conjunctive query answering in OBDA CONP-hard in data complexity. To regain tractability of query answering, we consider suitable restrictions along three dimensions, namely, the query language, the ontology language, and the adoption of the unique name assumption. Our investigation shows a complete picture of the computational properties of query answering under bag semantics over ontologies in the DL-Lite family.
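The difference between set and bag semantics that this abstract builds on can be seen in a small Python sketch. This is only a toy illustration of why duplicates matter for aggregate queries, not the semantics proposed in the article; the table contents and function names are hypothetical.

```python
from collections import Counter

# Hypothetical source table: (employee, department) rows.
rows = [("alice", "sales"), ("bob", "sales"), ("carol", "hr")]


def project_department_set(rows):
    """Set semantics: duplicate tuples in the view collapse."""
    return {dept for _, dept in rows}


def project_department_bag(rows):
    """Bag semantics: each tuple keeps its multiplicity."""
    return Counter(dept for _, dept in rows)


set_view = project_department_set(rows)
bag_view = project_department_bag(rows)

# A COUNT over the view gives different results under the two semantics.
print(len(set_view))           # distinct departments
print(sum(bag_view.values()))  # departments counted with duplicates
print(bag_view["sales"])       # multiplicity of one tuple in the bag
```

`Counter` is used here simply as a convenient multiset; any structure that records multiplicities would serve the same purpose.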
Conference Paper
We present RDFox—a main-memory, scalable, centralised RDF store that supports materialisation-based parallel datalog reasoning and SPARQL query answering. RDFox uses novel and highly-efficient parallel reasoning algorithms for the computation and incremental update of datalog materialisations with efficient handling of owl:sameAs. In this system description paper, we present an overview of the system architecture and highlight the main ideas behind our indexing data structures and our novel reasoning algorithms. In addition, we evaluate RDFox on a high-end SPARC T5-8 server with 128 physical cores and 4TB of RAM. Our results show that RDFox can effectively exploit such a machine, achieving speedups of up to 87 times, storage of up to 9.2 billion triples, memory usage as low as 36.9 bytes per triple, importation rates of up to 1 million triples per second, and reasoning rates of up to 6.1 million triples per second.
Article
Description Logics (DLs) are used in knowledge-based systems to represent and reason about terminological knowledge of the application domain in a semantically well-defined manner. In this thesis, we establish a number of novel complexity results and give practical algorithms for expressive DLs that provide different forms of counting quantifiers. We show that, in many cases, adding local counting in the form of qualifying number restrictions to DLs does not increase the complexity of the inference problems, even if binary coding of numbers in the input is assumed. On the other hand, we show that adding different forms of global counting restrictions to a logic may increase the complexity of the inference problems dramatically. We provide exact complexity results and a practical, tableau based algorithm for the DL SHIQ, which forms the basis of the highly optimized DL system iFaCT. Finally, we describe a tableau algorithm for the clique guarded fragment (CGF), which we hope will serve as the basis for an efficient implementation of a CGF reasoner.
Conference Paper
We study the data complexity of instance checking and conjunctive query answering in the EL family of description logics, with a particular emphasis on the boundary of tractability. We identify a large number of intractable extensions of EL, but also show that in ELIf, the extension of EL with inverse roles and global functionality, conjunctive query answering is tractable regarding data complexity. In contrast, already instance checking in EL extended with only inverse roles or global functionality is EXPTIME-complete regarding combined complexity.
Article
Ontology-based data access is concerned with querying incomplete data sources in the presence of domain-specific knowledge provided by an ontology. A central notion in this setting is that of an ontology-mediated query, which is a database query coupled with an ontology. In this paper, we study several classes of ontology-mediated queries, where the database queries are given as some form of conjunctive query and the ontologies are formulated in description logics or other relevant fragments of first-order logic, such as the guarded fragment and the unary-negation fragment. The contributions of the paper are three-fold. First, we characterize the expressive power of ontology-mediated queries in terms of fragments of disjunctive datalog. Second, we establish intimate connections between ontology-mediated queries and constraint satisfaction problems (CSPs) and their logical generalization, MMSNP formulas. Third, we exploit these connections to obtain new results regarding (i) first-order rewritability and datalog-rewritability of ontology-mediated queries, (ii) P/NP dichotomies for ontology-mediated queries, and (iii) the query containment problem for ontology-mediated queries.
Article
The recently introduced series of description logics under the common moniker 'DL-Lite' has attracted the attention of the description logic and semantic web communities due to the low computational complexity of inference, on the one hand, and the ability to represent conceptual modeling formalisms, on the other. The main aim of this article is to carry out a thorough and systematic investigation of inference in extensions of the original DL-Lite logics along five axes: by (i) adding the Boolean connectives and (ii) number restrictions to concept constructs, (iii) allowing role hierarchies, (iv) allowing role disjointness, symmetry, asymmetry, reflexivity, irreflexivity and transitivity constraints, and (v) adopting or dropping the unique name assumption. We analyze the combined complexity of satisfiability for the resulting logics, as well as the data complexity of instance checking and answering positive existential queries. Our approach is based on embedding DL-Lite logics in suitable fragments of the one-variable first-order logic, which provides useful insights into their properties and, in particular, computational behavior.
Article
We propose a new family of description logics (DLs), called DL-Lite, specifically tailored to capture basic ontology languages, while keeping low complexity of reasoning. Reasoning here means not only computing subsumption between concepts and checking satisfiability of the whole knowledge base, but also answering complex queries (in particular, unions of conjunctive queries) over the instance level (ABox) of the DL knowledge base. We show that, for the DLs of the DL-Lite family, the usual DL reasoning tasks are polynomial in the size of the TBox, and query answering is LogSpace in the size of the ABox (i.e., in data complexity). To the best of our knowledge, this is the first result of polynomial-time data complexity for query answering over DL knowledge bases. Notably, our logics allow for a separation between TBox and ABox reasoning during query evaluation: the part of the process requiring TBox reasoning is independent of the ABox, and the part of the process requiring access to the ABox can be carried out by an SQL engine, thus taking advantage of the query optimization strategies provided by current database management systems. Since even slight extensions to the logics of the DL-Lite family make query answering at least NLogSpace in data complexity, thus ruling out the possibility of using off-the-shelf relational technology for query processing, we can conclude that the logics of the DL-Lite family are the maximal DLs supporting efficient query answering over large amounts of instances.
Conference Paper
We provide an ExpTime algorithm for answering conjunctive queries (CQs) in Horn-SHIQ, a Horn fragment of the well-known Description Logic SHIQ underlying the OWL-Lite standard. The algorithm employs a domino system for model representation, which is constructed via a worst-case optimal tableau algorithm for Horn-SHIQ; the queries are answered by reasoning over the domino system. Our algorithm not only shows that CQ answering in Horn-SHIQ is not harder than satisfiability testing, but also that it is polynomial in data complexity, making Horn-SHIQ an attractive expressive Description Logic.
Article
The OWL Web Ontology Language is a new formal language for representing ontologies in the Semantic Web. OWL has features from several families of representation languages, including primarily Description Logics and frames. OWL also shares many characteristics with RDF, the W3C base of the Semantic Web. In this paper, we discuss how the philosophy and features of OWL can be traced back to these older formalisms, with modifications driven by several other constraints on OWL. Several interesting problems have arisen where these influences on OWL have clashed.
Book
Description logics (DLs) have a long tradition in computer science and knowledge representation, being designed so that domain knowledge can be described and so that computers can reason about this knowledge. DLs have recently gained increased importance since they form the logical basis of widely used ontology languages, in particular the web ontology language OWL. Written by four renowned experts, this is the first textbook on description logics. It is suitable for self-study by graduates and as the basis for a university course. Starting from a basic DL, the book introduces the reader to their syntax, semantics, reasoning problems and model theory and discusses the computational complexity of these reasoning problems and algorithms to solve them. It then explores a variety of reasoning techniques, knowledge-based applications and tools and it describes the relationship between DLs and OWL.
Conference Paper
While ontology-mediated query answering most often adopts (unions of) conjunctive queries as the query language, some recent works have explored the use of counting queries coupled with DL-Lite ontologies. The aim of the present paper is to extend the study of counting queries to Horn description logics outside the DL-Lite family. Through a combination of novel techniques, adaptations of existing constructions, and new connections to closed predicates, we achieve a complete picture of the data and combined complexity of answering counting conjunctive queries (CCQs) and cardinality queries (a restricted class of CCQs) in ELHI⊥ and its various sublogics. Notably, we show that CCQ answering is 2EXP-complete in combined complexity for ELHI⊥ and every sublogic that extends EL or DL-Lite-pos-H. Our study not only provides the first results for counting queries beyond DL-Lite, but it also closes some open questions about the combined complexity of CCQ answering in DL-Lite.
Article
Motivated by applications in declarative data analysis, in this article, we study DatalogZ, an extension of Datalog with stratified negation and arithmetic functions over integers. This language is known to be undecidable, so we present the fragment of limit DatalogZ programs, which is powerful enough to naturally capture many important data analysis tasks. In limit DatalogZ, all intensional predicates with a numeric argument are limit predicates that keep maximal or minimal bounds on numeric values. We show that reasoning in limit DatalogZ is decidable if a linearity condition restricting the use of multiplication is satisfied. In particular, limit-linear DatalogZ is complete for Δ_2^EXP and captures Δ_2^P over ordered datasets in the sense of descriptive complexity. We also provide a comprehensive study of several fragments of limit-linear DatalogZ. We show that semi-positive limit-linear programs (i.e., programs where negation is allowed only in front of extensional atoms) capture coNP over ordered datasets; furthermore, reasoning becomes coNEXP-complete in combined and coNP-complete in data complexity, where the lower bounds hold already for negation-free programs. In order to satisfy the requirements of data-intensive applications, we also propose an additional stability requirement, which causes the complexity of reasoning to drop to EXP in combined and to P in data complexity, thus obtaining the same bounds as for usual Datalog. Finally, we compare our formalisms with the languages underpinning existing Datalog-based approaches for data analysis and show that core fragments of these languages can be encoded as limit programs; this allows us to transfer decidability and complexity upper bounds from limit programs to other formalisms. Therefore, our article provides a unified logical framework for declarative data analysis which can be used as a basis for understanding the impact on expressive power and computational complexity of the key constructs available in existing languages.
Conference Paper
Ontology-mediated query answering (OMQA) employs structured knowledge and automated reasoning in order to facilitate access to incomplete and possibly heterogeneous data. While most research on OMQA adopts (unions of) conjunctive queries as the query language, there has been recent interest in handling queries that involve counting. In this paper, we advance this line of research by investigating cardinality queries (which correspond to Boolean atomic counting queries) coupled with DL-Lite ontologies. Despite its apparent simplicity, we show that such an OMQA setting gives rise to rich and complex behaviour. While we prove that cardinality query answering is tractable (TC0) in data complexity when the ontology is formulated in DL-Lite-core, the problem becomes coNP-hard as soon as role inclusions are allowed. For DL-Lite-pos-H (which allows only positive axioms), we establish a P-coNP dichotomy and pinpoint the TC0 cases; for DL-Lite-core-H (allowing also negative axioms), we identify new sources of coNP complexity and also exhibit L-complete cases. Interestingly, and in contrast to related tractability results, we observe that the canonical model may not give the optimal count value in the tractable cases, which led us to develop an entirely new approach based upon exploring a space of strategies to determine the minimum possible number of query matches.
Conference Paper
Description Logics (DLs) support so-called anonymous objects, which significantly contribute to the expressiveness of these KR languages, but also cause substantial computational challenges. This paper investigates reasoning about upper bounds on predicate sizes for ontologies written in the expressive DL ALCHOIQ extended with closed predicates. We describe a procedure based on integer programming that allows us to decide the existence of upper bounds on the cardinality of some predicate in the models of a given ontology in a data-independent way. Our results yield a promising supporting tool for constructing higher quality ontologies, and provide a new way to push the decidability frontiers. To wit, we define a new safety condition for Datalog-based queries over DL ontologies, while retaining decidability of query entailment.
Article
Humans and intelligent computer programs must often jump to the conclusion that the objects they can determine to have certain properties or relations are the only objects that do. Circumscription formalizes such conjectural reasoning.
Conference Paper
Ontology-mediated query answering (OMQA) is a promising approach to data access and integration that has been actively studied in the knowledge representation and database communities for more than a decade. The vast majority of work on OMQA focuses on conjunctive queries, whereas more expressive queries that feature counting or other forms of aggregation remain largely unexplored. In this paper, we introduce a general form of counting query, relate it to previous proposals, and study the complexity of answering such queries in the presence of DL-Lite ontologies. As it follows from existing work that query answering is intractable and often of high complexity, we consider some practically relevant restrictions, for which we establish improved complexity bounds.
Article
Motivated by applications in declarative data analysis, we study DatalogZ, an extension of Datalog with stratified negation and arithmetics over integers. Reasoning in this language is undecidable, so we present a fragment, called limit DatalogZ, that is powerful enough to naturally capture many important data analysis tasks. In limit DatalogZ, all intensional predicates with a numeric argument are limit predicates that keep only the maximal or minimal bounds on numeric values. Reasoning in limit DatalogZ is decidable if multiplication is used in a way that satisfies our linearity condition. Moreover, fact entailment in limit-linear DatalogZ is Δ_2^EXP-complete in combined and Δ_2^P-complete in data complexity, and it drops to coNEXP and coNP, respectively, if only (semi-)positive programs are considered. We also propose an additional stability requirement, for which the complexity drops to EXP and P, matching the bounds for usual Datalog. Limit DatalogZ thus provides us with a unified logical framework for declarative data analysis and can be used as a basis for understanding the expressive power of the key data analysis constructs.
Article
This paper is the tutorial we wish we had had available when starting our own research on constant delay enumeration for conjunctive queries. It provides precise statements and detailed, self-contained proofs of the fundamental results in this area.
Conference Paper
We present the framework of ontology-based data access, a semantic paradigm for providing a convenient and user-friendly access to data repositories, which has been actively developed and studied in the past decade. Focusing on relational data sources, we discuss the main ingredients of ontology-based data access, key theoretical results, techniques, applications and future challenges.
Conference Paper
We consider ontology-mediated queries (OMQs) based on an EL ontology and an atomic query (AQ), provide an ultimately fine-grained analysis of data complexity and study rewritability into linear Datalog, aiming to capture linear recursion in SQL. Our main results are that every such OMQ is in AC0, NL-complete or PTime-complete, and that containment in NL coincides with rewritability into linear Datalog (whereas containment in AC0 coincides with rewritability into first-order logic). We establish natural characterizations of the three cases, show that deciding linear Datalog rewritability (as well as the mentioned complexities) is ExpTime-complete, give a way to construct linear Datalog rewritings when they exist, and prove that there is no constant bound on the arity of IDB relations in linear Datalog rewritings.
Conference Paper
Answering aggregate queries is a key requirement of emerging applications of Semantic Technologies, such as data warehousing, business intelligence and sensor networks. In order to fulfill the requirements of such applications, the standardisation of SPARQL 1.1 led to the introduction of a wide range of constructs that enable value computation, aggregation, and query nesting. In this paper we provide an in-depth formal analysis of the semantics and expressive power of these new constructs as defined in the SPARQL 1.1 specification, and hence lay the necessary foundations for the development of robust, scalable and extensible query engines supporting complex numerical and analytics tasks.
Chapter
Recent years have seen an increasing interest in ontology-mediated query answering, in which the semantic knowledge provided by an ontology is exploited when querying data. Adding an ontology has several advantages (e.g. simplifying query formulation, integrating data from different sources, providing more complete answers to queries), but it also makes the query answering task more difficult. In this chapter, we give a brief introduction to ontology-mediated query answering using description logic (DL) ontologies. Our focus will be on DLs for which query answering scales polynomially in the size of the data, as these are best suited for applications requiring large amounts of data. We will describe the challenges that arise when evaluating different natural types of queries in the presence of such ontologies, and we will present algorithmic solutions based upon two key concepts, namely, query rewriting and saturation. We conclude the chapter with an overview of recent results and active areas of ongoing research.
Article
OWL 2 EL is a popular ontology language that supports role inclusions, that is, axioms of the form S1 ⋯ Sn ⊆ S that capture compositional properties of roles. Role inclusions closely correspond to context-free grammars, which was used to show that answering conjunctive queries (CQs) over OWL 2 EL knowledge bases with unrestricted role inclusions is undecidable. However, OWL 2 EL inherits from OWL 2 DL the syntactic regularity restriction on role inclusions, which ensures that role chains implying a particular role can be described using a finite automaton (FA). This is sufficient to ensure decidability of CQ answering; however, the FAs can be worst-case exponential in size so the known approaches do not provide a tight upper complexity bound. In this paper, we solve this open problem and show that answering CQs over OWL 2 EL knowledge bases is PSpace-complete in combined complexity (i.e., the complexity measured in the total size of the input). To this end, we use a novel encoding of regular role inclusions using bounded-stack pushdown automata, that is, FAs extended with a stack of bounded size. Apart from theoretical interest, our encoding can be used in practical tableau algorithms to avoid the exponential blowup due to role inclusions. In addition, we sharpen the lower complexity bound and show that the problem is PSpace-hard even if we consider only role inclusions as part of the input (i.e., the query and all other parts of the knowledge base are fixed). Finally, we turn our attention to navigational queries over OWL 2 EL knowledge bases, and we show that answering positive, converse-free conjunctive graph XPath queries is PSpace-complete as well; this is interesting since allowing the converse operator in queries is known to make the problem ExpTime-hard. Thus, in this paper we present several important contributions to the landscape of the complexity of answering expressive queries over description logic knowledge bases.
Article
Conjunctive regular path queries are an expressive extension of the well-known class of conjunctive queries. Such queries have been extensively studied in the (graph) database community, since they support a controlled form of recursion and enable sophisticated path navigation. Somewhat surprisingly, there has been little work aimed at using such queries in the context of description logic (DL) knowledge bases, particularly for the lightweight DLs that are considered best suited for data-intensive applications. This paper aims to bridge this gap by providing algorithms and tight complexity bounds for answering two-way conjunctive regular path queries over DL knowledge bases formulated in lightweight DLs of the DL-Lite and EL families. Our results demonstrate that in data complexity, the cost of moving to this richer query language is as low as one could wish for: the problem is NL-complete for DL-Lite and P-complete for EL. The combined complexity of query answering increases from NP- to PSpace-complete, but for two-way regular path queries (without conjunction), we show that query answering is tractable even with respect to combined complexity. Our results reveal two-way conjunctive regular path queries as a promising language for querying data enriched by ontologies formulated in DLs of the DL-Lite and EL families or the corresponding OWL 2 QL and EL profiles.
Article
Ontology-based data access (OBDA) aims at enriching query answering by taking general background knowledge into account when evaluating queries. This background knowledge is represented by means of an ontology, that is expressed in this thesis by a very expressive class of first-order formulas, called existential rules (sometimes also tuple-generating dependencies and Datalog+/-). The high expressivity of the used formalism results in the undecidability of query answering, and numerous decidable classes (that is, restrictions on the sets of existential rules) have been proposed in the literature. The contribution of this thesis is two-fold: first, we propose a unified view of a large part of these classes, together with a complexity analysis and a worst-case optimal algorithm for the introduced generic class. Second, we consider the popular approach of query rewriting, and propose a generic algorithm that overcomes trivial causes of combinatorial explosion that make classical approaches inapplicable.
Article
While the problem of answering positive existential queries, in particular, conjunctive queries (CQs) and unions of CQs, over description logic ontologies has been studied extensively, there have been few attempts to analyse queries with negated atoms. Our aim is to sharpen the complexity landscape of the problem of answering CQs with negation and inequalities in lightweight description logics of the DL-Lite and EL families. We begin by considering queries with safe negation and show that there is a surprisingly significant increase in the complexity from AC0 to undecidability (even if the ontology and query are fixed and only the data is regarded as input). We also investigate the problem of answering queries with inequalities and show that answering a single CQ with one inequality over DL-Lite with role inclusions is undecidable. In the light of our undecidability results, we explore syntactic restrictions to attain efficient query answering with negated atoms. In particular, we identify a novel class of local CQs with inequalities, for which query answering over DL-Lite is decidable.
Article
The ontology based data access model assumes that users access data by means of an ontology, which is often described in terms of description logics. As a consequence, languages for managing ontologies now need algorithms not only to decide standard reasoning problems, but also to answer database-like queries. However, fundamental database aggregate queries, such as the ones using functions COUNT and COUNT DISTINCT, have received very little attention in this context, and even defining appropriate semantics for their answers over ontologies appears to be a non-trivial task. Our goal is to study the problem of answering database queries with aggregation in the context of ontologies. This paper presents an intuitive semantics for answering counting queries, followed by a comparison with similar approaches that have been taken in different database contexts. Afterwards, it exhibits a thorough study of the computational complexity of evaluating counting queries conforming to this semantics. Our results show that answering such queries over ontologies is decidable, but generally intractable. However, our semantics promotes awareness on the information that can be obtained by querying ontologies and raises the need to look for suitable approximations or heuristics in order to allow efficient evaluation of this widely used class of queries.
Article
Aggregate functions in relational query languages allow intricate reports to be written. In this paper aggregate functions are precisely defined. The definition does not use the notion of "duplicates." Relational algebra and relational calculus are extended in a general and natural fashion to include aggregate functions. It is shown that the languages so extended have equivalent expressive power.
Conference Paper
When answering queries in the presence of ontologies, adopting the closed world assumption for some predicates easily results in intractability. We analyze this situation on the level of individual ontologies formulated in the description logics DL-Lite and EL and show that in all cases where answering conjunctive queries (CQs) with (open and) closed predicates is tractable, it coincides with answering CQs with all predicates assumed open. In this sense, CQ answering with closed predicates is inherently intractable. Our analysis also yields a dichotomy between AC0 and CONP for CQ answering w.r.t. ontologies formulated in DL-Lite and a dichotomy between PTIME and CONP for EL. Interestingly, the situation is less dramatic in the more expressive description logic ELI, where we find ontologies for which CQ answering is in PTIME, but does not coincide with CQ answering where all predicates are open.
Conference Paper
One of the most prominent applications of description logic ontologies is their use for accessing data. In this setting, ontologies provide an abstract conceptual layer of the data schema, and queries over the ontology are then used to access the data. In this paper we focus on extensions of conjunctive queries (CQs) and unions of conjunctive queries (UCQs) with restricted forms of negations such as inequality and safe negation. In particular, we consider ontologies based on members of the DL-Lite family. We show that by extending UCQs with any form of negated atoms, the problem of query answering becomes undecidable even when considering ontologies expressed in the core fragment of DL-Lite. On the other hand, we show that answering CQs with inequalities is decidable for ontologies expressed in $\text{DL-Lite}^{\mathcal{H}}_{core}$. To this end, we provide an algorithm matching the known coNP lower bound on data complexity. Furthermore, we identify a setting in which conjunctive query answering with inequalities is tractable. We regain tractability by means of syntactic restrictions on the queries, but keeping the expressiveness of the ontology.
Conference Paper
It is a classic result in database theory that conjunctive query (CQ) answering, which is NP-complete in general, is feasible in polynomial time when restricted to acyclic queries. Subsequent results identified more general structural properties of CQs (like bounded treewidth) which ensure tractable query evaluation. In this paper, we lift these tractability results to knowledge bases formulated in the lightweight description logics DL-Lite and ELH. The proof exploits known properties of query matches in these logics and involves a query-dependent modification of the data. To obtain a more practical approach, we propose a concrete polynomial-time algorithm for answering acyclic CQs based on rewriting queries into datalog programs. A preliminary evaluation suggests the interest of our approach for handling large acyclic CQs.
Article
Description logics (DLs) have become a prominent paradigm for representing knowledge in a variety of application areas, partly due to their ability to achieve a favourable balance between expressivity of the logic and performance of reasoning. Horn description logics are obtained, roughly speaking, by disallowing all forms of disjunctions. They have attracted attention since their (worst-case) data complexities are in general lower than those of their non-Horn counterparts, which makes them attractive for reasoning with large sets of instance data (ABoxes). It is therefore natural to ask whether Horn DLs also provide advantages for schema (TBox) reasoning, that is, whether they also feature lower combined complexities. This article settles this question for a variety of Horn DLs. An example of a tractable Horn logic is the DL underlying the ontology language OWL RL, which we characterize as the Horn fragment of the description logic SROIQ without existential quantifiers. If existential quantifiers are allowed, however, many Horn DLs become intractable. We find that Horn-ALC already has the same worst-case complexity as ALC, that is, ExpTime, but we also identify various DLs for which reasoning is PSpace-complete. As a side effect, we derive simplified syntactic definitions of Horn DLs for which we exploit suitable normal form transformations.
Book
With more substantial funding from research organizations and industry, numerous large-scale applications, and recently developed technologies, the Semantic Web is quickly emerging as a well-recognized and important area of computer science. While Semantic Web technologies are still rapidly evolving, Foundations of Semantic Web Technologies focuses on the established foundations in this area that have become relatively stable over time. It thoroughly covers basic introductions and intuitions, technical details, and formal foundations. The book concentrates on Semantic Web technologies standardized by the World Wide Web Consortium: RDF and SPARQL enable data exchange and querying, RDFS and OWL provide expressive ontology modeling, and RIF supports rule-based modeling. The text also describes methods for specifying, querying, and reasoning with ontological information. In addition, it explores topics that are clearly beyond foundations, such as tools, applications, and engineering aspects. Written by highly respected researchers with a deep understanding of the material, this text centers on the formal specifications of the subject and supplies many pointers that are useful for employing Semantic Web technologies in practice. The book has an accompanying website with supplemental information.
Conference Paper
Existing definitions of the relativizations of $\mathbf{NC}^1$, $\mathbf{L}$ and $\mathbf{NL}$ do not preserve the inclusions $\mathbf{NC}^1 \subseteq \mathbf{L}$, $\mathbf{NL} \subseteq \mathbf{AC}^1$. We start by giving the first definitions that preserve them. Here for $\mathbf{L}$ and $\mathbf{NL}$ we define their relativizations using Wilson's stack oracle model, but limit the height of the stack to a constant (instead of $\log(n)$). We show that the collapse of any two classes in $\{\mathbf{AC}^0(m), \mathbf{TC}^0, \mathbf{NC}^1, \mathbf{L}, \mathbf{NL}\}$ implies the collapse of their relativizations. Next we exhibit an oracle $\alpha$ that makes $\mathbf{AC}^k(\alpha)$ a proper hierarchy. This strengthens and clarifies the separations of the relativized theories in Takeuti (1995). The idea is that a circuit whose nested depth of oracle gates is bounded by $k$ cannot compute correctly the $(k + 1)$ compositions of every oracle function. Finally, we develop theories that characterize the relativizations of subclasses of $\mathbf{P}$ by modifying theories previously defined by the second two authors. A function is provably total in a theory iff it is in the corresponding relativized class, and hence, the oracle separations imply separations for the relativized theories.
Article
The complexity class PP consists of all decision problems solvable by polynomial-time probabilistic Turing machines. It is well known that PP is a highly intractable complexity class and that PP-complete problems are in all likelihood harder than NP-complete problems. We investigate the existence of phase transitions for a family of PP-complete Boolean satisfiability problems under the fixed clauses-to-variables ratio model. A typical member of this family is the decision problem $\#3\mathrm{SAT}(\geq 2^{n/2})$: given a 3CNF-formula, is it satisfied by at least the square root of the total number of possible truth assignments? We provide evidence to the effect that there is a critical ratio $r_{3,2}$ at which the asymptotic probability of $\#3\mathrm{SAT}(\geq 2^{n/2})$ undergoes a phase transition from 1 to 0. We obtain upper and lower bounds for $r_{3,2}$ by showing that $0.9227 \leq r_{3,2} \leq 2.595$. We also carry out a set of experiments on random instances of $\#3\mathrm{SAT}(\geq 2^{n/2})$ using a natural modification of the Davis–Putnam–Logemann–Loveland (DPLL) procedure. Our experimental results suggest that $r_{3,2} \approx 2.5$. Moreover, the average number of recursive calls of this modified DPLL procedure reaches a peak around 2.5 as well.
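The decision problem described above can be stated in a few lines of code. The sketch below is a brute-force check, exponential in the number of variables, and is not the modified DPLL procedure used in the article. The clause encoding (signed integers, DIMACS style) and the function names are assumptions made for illustration.

```python
from itertools import product


def count_models(clauses, n):
    """Count truth assignments to variables 1..n that satisfy every clause.

    A clause is a tuple of non-zero integers: k stands for variable k,
    -k for its negation (an assumed, DIMACS-like encoding).
    """
    count = 0
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count


def at_least_sqrt_of_assignments(clauses, n):
    """Decide whether the number of models is at least 2^(n/2).

    Both sides are squared so that the comparison stays in exact integers.
    """
    models = count_models(clauses, n)
    return models * models >= 2 ** n


if __name__ == "__main__":
    # A single clause over three variables excludes exactly one assignment.
    print(count_models([(1, 2, 3)], 3))
    print(at_least_sqrt_of_assignments([(1, 2, 3)], 3))
```

Squaring avoids floating-point square roots when n is odd.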
Article
The addition of aggregates has been one of the most relevant enhancements to the language of answer set programming (ASP). They strengthen the modelling power of ASP in terms of natural and concise problem representations. Previous semantic definitions typically agree in the case of non-recursive aggregates, but the picture is less clear for aggregates involved in recursion. Some proposals explicitly avoid recursive aggregates, most others differ, and many of them do not satisfy desirable criteria, such as minimality or coincidence with answer sets in the aggregate-free case.
Article
It is a folk result in database theory that SQL cannot express recursive queries such as reachability; in fact, a new construct was added to SQL3 to overcome this limitation. However, the evidence for this claim is usually given in the form of a reference to a proof that relational algebra cannot express such queries. SQL, on the other hand, in all its implementations has three features that fundamentally distinguish it from relational algebra: namely, grouping, arithmetic operations, and aggregation. In the past few years, most questions about the additional power provided by these features have been answered. This paper surveys those results, and presents new simple and self-contained proofs of the main results on the expressive power of SQL. Somewhat surprisingly, tiny differences in the language definition affect the results in a dramatic way: under some very natural assumptions, it can be proved that SQL cannot define recursive queries, no matter what aggregate functions and arithmetic operations are allowed. But relaxing these assumptions just a tiny bit makes the problem of proving expressivity bounds for SQL as hard as some long-standing open problems in complexity theory.
Article
A new randomized algorithm for the maximum matching problem is presented. Unlike conventional matching algorithms which are combinatorial, our algorithm is algebraic and works on the Tutte matrix of the given graph. Although slower than the best known matching algorithm, our algorithm has the advantage of being conceptually simple and easy to program.
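As a rough illustration of the algebraic idea, the sketch below implements only the classical randomized test for the existence of a perfect matching: substitute random field elements for the indeterminates of the Tutte matrix and check whether the determinant is non-zero. It is not the algorithm of the article, which computes a maximum matching. The prime `P`, the function names, and the edge-list input format are assumptions.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; all arithmetic is done modulo P


def det_mod_p(matrix, p=P):
    """Determinant modulo the prime p, computed by Gaussian elimination."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] % p != 0), None)
        if pivot is None:
            return 0  # no pivot in this column: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)  # modular inverse (Fermat)
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            if factor:
                for c in range(col, n):
                    a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p


def probably_has_perfect_matching(n, edges, trials=3, rng=random):
    """Randomized test on vertices 0..n-1 with one-sided error.

    True is always correct; False may be wrong with tiny probability.
    """
    for _ in range(trials):
        tutte = [[0] * n for _ in range(n)]
        for u, v in edges:
            x = rng.randrange(1, P)
            tutte[u][v] = x          # skew-symmetric: T[u][v] = x,
            tutte[v][u] = (-x) % P   # T[v][u] = -x
        if det_mod_p(tutte) != 0:
            return True
    return False


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(probably_has_perfect_matching(4, cycle))
```

Working over a large prime field keeps the arithmetic exact and makes a false negative extremely unlikely.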
Article
Answering conjunctive queries (CQs) has been recognized as an important task for the widening use of Description Logics (DLs) in a number of applications. The problem has been studied by many authors, who developed a number of different techniques for its solution. We present a novel approach to CQ answering that is based on knots, which are schematic trees of depth at most one that can be used to represent the terminological information represented in a TBox. They allow us to obtain an algorithm for the DL SH that has some advantages with respect to previous approaches, proceeding as follows. We build a compilation of an input knowledge base using knots, and then use this compilation to answer CQs in two stages. In the first stage we employ knots to rewrite the input query into a set of queries (a union of CQs, short UCQ) that incorporate the terminological constraints. In the next stage we answer the query over the full knowledge base, by answering the constructed UCQ over a set of relational structures that are obtained by enriching the assertional part of the knowledge base. Since in the first stage we process the query and the taxonomy, and the assertional part of the knowledge base is only processed in the second stage, parts of the computation can be reused; in particular, answering a query over changing assertional data amounts to re-executing the last step. Notably, the algorithm handles CQs with distinguished (i.e., output) variables in a direct manner and scales down nicely: while double exponential in general, it runs in single exponential time under various restrictions on transitive roles in queries, including the case of CQ answering in the DL ALCH. This is worst-case optimal, given that CQ answering is 2ExpTime-complete for SH and ExpTime-complete already for the core expressive DL ALC. Furthermore, the last step is amenable to a realization in disjunctive Datalog, which yields a worst-case optimal implementation under data complexity.
Article
We describe CARIN, a novel family of representation languages that combine the expressive power of Horn rules and of description logics. We address the issue of providing sound and complete inference procedures for such languages. We identify existential entailment as a core problem in reasoning in CARIN, and describe an existential entailment algorithm for the ALCNR description logic. As a result, we obtain a sound and complete algorithm for reasoning in non-recursive CARIN-ALCNR knowledge bases, and an algorithm for rule subsumption over ALCNR. We show that in general, the reasoning problem for recursive CARIN-ALCNR knowledge bases is undecidable, and identify the constructors of ALCNR causing the undecidability. We show two ways in which CARIN-ALCNR knowledge bases can be restricted while obtaining sound and complete reasoning.
Article
We characterize the polynomial time computable queries as those expressible in relational calculus plus a least fixed point operator and a total ordering on the universe. We also show that even without the ordering one application of fixed point suffices to express any query expressible with several alternations of fixed point and negation. This proves that the fixed point query hierarchy suggested by Chandra and Harel collapses at the first fixed point level. It is also a general result showing that in finite model theory one application of fixed point suffices.
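The standard example of a query that a least fixed point operator adds to relational calculus is reachability. The sketch below computes the transitive closure of an edge relation by iterating a monotone operator until it stabilises. It is only an illustration of the operator and is not taken from the article; the function names are assumptions.

```python
def least_fixed_point(step, start=frozenset()):
    """Iterate a monotone operator on finite sets until nothing changes."""
    current = start
    while True:
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt


def transitive_closure(edges):
    """Least fixed point of  T(x, y) <- E(x, y) or exists z. E(x, z) and T(z, y)."""
    edges = frozenset(edges)

    def step(tc):
        joined = {(x, y) for (x, z) in edges for (z2, y) in tc if z == z2}
        return edges | joined

    return least_fixed_point(step)


if __name__ == "__main__":
    print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```

On a finite structure the iteration adds at least one tuple per round until it stops, so the number of rounds is polynomial in the size of the universe.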
Article
We exhibit several problems complete for deterministic logarithmic space under NC1 (i.e., log depth) reducibility. The list includes breadth-first search and depth-first search of an undirected tree, connectivity of undirected graphs known to be made up of one or more disjoint cycles, undirected graph acyclicity, and several problems related to representing and to operating with permutations of a finite set.
Article
KL-ONE is a system for representing knowledge in Artificial Intelligence programs. It has been developed and refined over a long period and has been used in both basic research and implemented knowledge-based systems in a number of places in the AI community. Here we present the kernel ideas of KL-ONE, emphasizing its ability to form complex structured descriptions. In addition to detailing all of KL-ONE's description-forming structures, we discuss a bit of the philosophy underlying the system, highlight notions of taxonomy and classification that are central to it, and include an extended example of the use of KL-ONE and its classifier in a recognition task. This research was supported in part by the Defense Advanced Research Projects Agency under Contract N00014-77-C-0378. Views and conclusions contained in this paper are the authors' and should not be interpreted as representing the official opinion or policy of DARPA, the U.S. Government, or any person or agency connected with them.