Conference Paper

Minimizing View Sets without Losing Query-Answering Power

Authors:

Abstract

The problem of answering queries using views has been studied extensively due to its relevance in a wide variety of data-management applications. In these applications, we often need to select a subset of views to maintain due to limited resources. In this paper, we show that traditional query containment is not a good basis for deciding whether or not a view should be selected. Instead, we should minimize the view set without losing its query-answering power. To formalize this notion, we first introduce the concept of "p-containment." That is, a view set V is p-contained in another view set W, if W can answer all the queries that can be answered by V. We show that p-containment and the traditional query containment are not related. We then discuss how to minimize a view set while retaining its query-answering power. We develop the idea further by considering p-containment of two view sets with respect to a given set of queries, and consider their relationship in terms of maximally-contained rewritings of queries using the views.
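The definition of p-containment given in the abstract can be restated symbolically. This is only a sketch: the relation symbol $\preceq_p$ is notation assumed here for illustration, and "can be answered using" is left exactly as informal as the abstract leaves it.

```latex
\[
V \preceq_p W
\quad\Longleftrightarrow\quad
\forall Q:\;
\bigl(Q \text{ can be answered using } V\bigr)
\;\Longrightarrow\;
\bigl(Q \text{ can be answered using } W\bigr)
\]
```

Read this way, p-containment compares the sets of queries that two view sets can answer, rather than comparing the view definitions themselves by query containment.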
... Finding a subschema of S1 that can still have map S1,S2 applied to it is a simplified version of the problem of materialised view selection [ACN00, LBU01, CHS02]. The objective of materialised view selection is to find a set of views that allows us to answer a given query workload. ...
... In [LBU01] it is argued that selecting views that are minimal with respect to a given set of views, but not with respect to all views, is a good solution; in other words, a non-minimal Extract result is still useful. ...
... It is not always necessary for the implementations of the operators to produce results that are minimal in the information-theoretic sense. Indeed, non-minimal solutions may be of more practical benefit than a strictly minimal one, since some minimal solutions do not justify the added complexity of creating them [LBU01, LV03]. With this in mind, the definitions of Diff, Extract and Merge do not require minimal solutions [MBHR05, MRB03]. ...
Article
Shared databases made up of numerous heterogeneous components and used by large numbers of people are widespread in both industry and academia. Writing programs to access and maintain these databases is a time-consuming and difficult task that can take up a significant proportion of an enterprise IT manager's resources. The situation has worsened recently as new Data Definition Languages (DDLs) like XML and RDFS have come to be used. In general, solutions to these problems are specified at the data level and have to be rewritten if the schema is changed, cannot be applied to other application areas, and are generally language- and implementation-specific. Model Management (MM) is an approach that provides a way of overcoming the problems with these data-level solutions. The motivation behind MM is to raise the level of abstraction in these application areas from the data level to the schema level. The key idea is to develop a set of operators that can be applied to schemas, and the mappings between them, as a whole rather than to individual data elements. The operators should be applicable to a wide range of problems in database management and work on schemas and mappings specified in a wide range of DDLs. Solutions to database management problems can then be specified at a high level of abstraction by combining these operators into a concise and reusable script.
... In [72], the authors introduce the notion of "p-containment" (where "p" stands for power): a view set V is said to be p-contained in another view set W, i.e., W has at least the answering power of V, if W can answer all queries that can be answered using V. ...
Conference Paper
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real-world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
... • Problems concerning query containment in the presence of results of views or of view definitions [44,73,414], including the problem of answering questions using authorization views in security applications [337,414];
• The problem of answering queries using views in the presence of grouping and aggregation in the definitions of the queries or views [115,118,182,183,219];
• Problems concerning answering queries using views where query or view definitions involve various other query language constructs [11,13,70,80,136,150,394], or otherwise revisiting the problem of answering queries using views [121];
• Problems concerning answering queries using views in the presence of integrity constraints [80,136,156,219];
• Problems concerning optimizing various metrics for query rewritings in terms of views [110,304];
• Problems related to the role of the "information content" of views in query answering [72,74,75,76,303,310,354];
• Problems concerning desirable transformations of a set of views [165,264,265,275]. ...
... Various approaches studied the above mentioned issues and a few examples are [CMJ01], [RHD02], [HM99], [SKSK95] and [DASY97]. ...
... • Preliminary formalization can be found elsewhere [1]; • Lattice-theoretic data model for managing hypotheses; • Views partially ordered according to their "query-answering power" (a notion similar to Li et al.'s [3]). ...
Article
This paper establishes a theory of regular representations for vertex operator algebras. In the paper, for a vertex operator algebra V and a V-module W, we construct, out of the dual space W*, a family of canonical weak V ⊗ V-modules with a nonzero complex number z as the parameter. We prove that for V-modules W, W1 and W2, a P(z)-intertwining map of type in the sense of Huang and Lepowsky exactly amounts to a V ⊗ V-homomorphism from W1 ⊗ W2 into . Combining this with Huang and Lepowsky's one-to-one linear correspondence between the space of intertwining operators and the space of P(z)-intertwining maps of the same type, we obtain a canonical linear isomorphism from the space of intertwining operators of the indicated type to . Denote by RP(z)(W) the sum of all (ordinary) V ⊗ V-submodules of . Assuming that V satisfies certain suitable conditions, we obtain a canonical decomposition of RP(z)(W) into irreducible V ⊗ V-modules. In particular, we obtain a decomposition of Peter–Weyl type for RP(z)(V). Denote by ℱP(z) the functor from the category of V-modules to the category of weak V ⊗ V-modules such that ℱP(z)(W)=RP(z)(W'). We prove that for V-modules W1, W2, a P(z)-tensor product of W1 and W2 in the sense of Huang and Lepowsky exactly amounts to a universal from W1 ⊗ W2 to the functor ℱP(z). This implies that the functor ℱP(z) is essentially a right adjoint of Huang and Lepowsky's P(z)-tensor product functor. It is also proved that RP(z)(W) for are canonically isomorphic V ⊗ V-modules.
Article
A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algorithms for conjunctive queries and/or Datalog programs. Then we compare the approaches taken by AT&T Labs’ “Information Manifold” and the Stanford “Tsimmis” project in these terms.
Conference Paper
We consider the problem of computing answers to queries by using materialized views. Aside from its potential in optimizing query evaluation, the problem also arises in applications such as Global Information Systems, Mobile Computing and maintaining physical data independence. We consider the problem of finding a rewriting of a query that uses the materialized views, the problem of finding minimal rewritings, and finding complete rewritings (i.e., rewritings that use only the views). We show that all the possible rewritings can be obtained by considering containment mappings from the views to the query, and that the problems we consider are NP-complete when both the query and the views are conjunctive and don't involve built-in comparison predicates. We show that the problem has two independent sources of complexity (the number of possible containment mappings, and the complexity of deciding which literals from the original query can be deleted). We describe a polynomial time algorithm for finding rewritings, and show that under certain conditions, it will find the minimal rewriting. Finally, we analyze the complexity of the problems when the queries and views may be disjunctive and involve built-in comparison predicates.
Conference Paper
View-based query processing requires answering a query posed to a database only on the basis of the information on a set of views, which are again queries over the same database. This problem is relevant in many aspects of database management, and has been addressed by means of two basic approaches, namely, query rewriting and query answering. In the former approach, one tries to compute a rewriting of the query in terms of the views, whereas in the latter, one aims at directly answering the query based on the view extensions. Based on recent results, we first show that already for very simple query languages, a rewriting is in general a coNP function with respect to the size of view extensions. Hence, the problem arises of characterizing which instances of the problem admit a rewriting that is PTIME. However, a tight connection between view-based query answering and constraint-satisfaction problems allows us to show that the above characterization is going to be difficult.
Article
Given a conjunctive query, it is translated into a corresponding graph in which a small cycle cutting set of nodes is used to answer the query. The principal features of the described procedure (illustrated with flow charts) are that a conjunctive query can be represented as a graph, and that it can be solved by trying all possibilities for a set of nodes that cut all cycles as against all possibilities for all nodes.
Conference Paper
We consider the problem of answering datalog queries using materialized views. More specifically, queries are rewritten to refer to views instead of the base relations over which the queries were originally written. Much work has been done on program rewriting that produces an equivalent query. In the context of information integration, though, the importance of using views to infer as many answers as possible has been pointed out. Formally, the problem is: Given a datalog program P, is there a datalog program P_v which uses only views as EDB predicates and (i) produces a subset of the answers that P produces and (ii) any other program P′_v over the views with property (i) is contained in P_v? In this paper we investigate the problem in the case of disjunctive view definitions.
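The two conditions stated in the abstract above can be written compactly. The following is a sketch in assumed notation: the symbol $\sqsubseteq$ is introduced here to mean "produces a subset of the answers of" and does not appear in the abstract; $P_v$ and $P'_v$ range over datalog programs whose only EDB predicates are views.

```latex
\[
\text{(i)}\quad P_v \sqsubseteq P,
\qquad\qquad
\text{(ii)}\quad \forall P'_v:\;
P'_v \sqsubseteq P \;\Longrightarrow\; P'_v \sqsubseteq P_v .
\]
```

Condition (i) says the rewriting returns only correct answers, and condition (ii) says no other rewriting over the views returns more of them.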
Conference Paper
Database reformulation is the process of rewriting the data and rules of a deductive database in a functionally equivalent manner. We focus on the problem of automatically reformulating a database in a way that reduces query processing time while satisfying strong storage space constraints. In previous work we have investigated database reformulation for the case of unary databases. In this paper we extend this work to arbitrary arity, while concentrating on databases with conjunctive rules. The main result of the paper is that the database reformulation problem is decidable for conjunctive databases.
Conference Paper
We consider the problem of rewriting queries using only materialized views. We first show that if the views subsume the query from the point of view of the information content, then the query can be rewritten using only the views, but the resulting query might be extremely inefficient. We then focus on aggregate views and queries over a single relation, which are fundamental in many applications such as data warehousing. We show that in this case, it is possible to guarantee that as soon as the views subsume the query, it can be rewritten in terms of the views in a simple query language. Our main contribution is the conception of rewriting algorithms which run in polynomial time, and the proof of their completeness which relies on combinatorial arguments. Finally, we consider the materialization of ratio views such as average and percentage, important for the design of materialized views.
Conference Paper
The foundational homomorphism techniques introduced by Chandra and Merlin for testing containment of conjunctive queries have recently attracted renewed interest due to their central role in information integration applications. We show that generalizations of the classical tableau representation of conjunctive queries are useful for computing query answers in information integration systems where information sources are modeled as views defined on a virtual global schema. We consider a general situation where sources may or may not be known to be correct and complete. We characterize the set of answers to a global query and give algorithms to compute a finite representation of this possibly infinite set, as well as its certain and possible approximations. We show how to rewrite a global query in terms of the sources in two special cases, and show that one of these is equivalent to the Information Manifold rewriting of Levy et al.
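The homomorphism test for conjunctive-query containment mentioned above can be illustrated with a minimal Python sketch. It is not taken from any of the listed papers. The query encoding (a head tuple plus a list of body atoms), the convention that variables start with an uppercase letter, and the names `is_var`, `extend`, and `contained_in` are all assumptions made for illustration. The sketch searches for a containment mapping from `q2` to `q1` by backtracking, which establishes that `q1` is contained in `q2`.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding: an atom is (predicate, terms); a query is (head_terms, body_atoms).
Atom = Tuple[str, Tuple[str, ...]]
Query = Tuple[Tuple[str, ...], List[Atom]]


def is_var(term: str) -> bool:
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def extend(mapping: Dict[str, str], src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend the mapping so that src maps onto dst, or return None on conflict."""
    if len(src) != len(dst):
        return None
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            # A variable must map to the same term everywhere it occurs.
            if new.setdefault(s, d) != d:
                return None
        elif s != d:
            # Constants must map to themselves.
            return None
    return new


def contained_in(q1: Query, q2: Query) -> bool:
    """Return True if q1 is contained in q2 (containment mapping from q2 to q1)."""
    head1, body1 = q1
    head2, body2 = q2
    start = extend({}, head2, head1)  # the head of q2 must map onto the head of q1
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(body2):
            return True  # every atom of q2 has been mapped into the body of q1
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred:
                nxt = extend(mapping, args, args1)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


# q1(X) :- e(X, Y), e(Y, X)      q2(X) :- e(X, Y)
q1: Query = (("X",), [("e", ("X", "Y")), ("e", ("Y", "X"))])
q2: Query = (("X",), [("e", ("X", "Y"))])
print(contained_in(q1, q2))  # True: q1 adds a condition, so its answers are a subset
print(contained_in(q2, q1))  # False: e(Y, X) has no consistent image in q2
```

The backtracking search is exponential in the worst case, which is consistent with the NP-completeness results mentioned in the abstracts above; the sketch covers only conjunctive queries without built-in comparison predicates.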