Consistent Query Answers in Inconsistent Databases
Marcelo Arenas
Pontificia Universidad Católica de Chile
Escuela de Ingeniería
Departamento de Ciencia de Computación
Casilla 306, Santiago 22, Chile
marenas@ing.puc.cl
Leopoldo Bertossi
Pontificia Universidad Católica de Chile
Escuela de Ingeniería
Departamento de Ciencia de Computación
Casilla 306, Santiago 22, Chile
bertossi@ing.puc.cl
Jan Chomicki
Monmouth University
Department of Computer Science
West Long Branch, NJ 07764
chomicki@monmouth.edu
Abstract
In this paper we consider the problem of the logical char-
acterization of the notion of consistent answer in a relational
database that may violate given integrity constraints. This
notion is captured in terms of the possible repaired versions
of the database. A method for computing consistent an-
swers is given and its soundness and completeness (for some
classes of constraints and queries) proved. The method is
based on an iterative procedure whose termination for several
classes of constraints is proved as well.
Integrity constraints capture an important normative aspect
of every database application. However, it is often the case
that their satisfaction cannot be guaranteed, allowing for the
existence of inconsistent database instances. In that case,
it is important to know which query answers are consistent
with the integrity constraints and which are not. In this pa-
per, we provide a logical characterization of consistent query
answers in relational databases that may be inconsistent with
the given integrity constraints. Intuitively, an answer to a
query posed to a database that violates the integrity con-
straints will be consistent in a precise sense: It should be the
same as the answer obtained from any minimally repaired
version of the original database. We also provide a method
for computing such answers and prove its properties. On the
basis of a query $Q$, the method computes, using an iterative
procedure, a new query $T_\omega(Q)$ whose evaluation in an arbitrary
database, consistent or inconsistent, returns the set of
consistent answers to the original query $Q$. We envision the
application of our results in a number of areas:
Data warehousing. A data warehouse contains data com-
ing from many different sources. Some of it typically does
not satisfy the given integrity constraints. The usual ap-
proach is thus to clean the data by removing inconsistencies
before the data is stored in the warehouse [6]. Our results
make it possible to determine which data is already clean
and proceed to safely remove unclean data. Moreover, a different
scenario becomes possible, in which the inconsistencies
are not removed but rather query answers are marked as
“consistent” or “inconsistent”. In this way, information loss
due to data cleaning may be prevented.
Database integration. Often many different databases
are integrated together to provide a single unified view for
the users. Database integration is difficult since it requires
the resolution of many different kinds of discrepancies of
the integrated databases. One possible discrepancy is due
to different sets of integrity constraints. Moreover, even if
every integrated database locally satisfies the same integrity
constraint, the constraint may be globally violated. For ex-
ample, different databases may assign different addresses to
the same student. Such conflicts may fail to be resolved at all
and inconsistent data cannot be “cleaned” because of the au-
tonomy of different databases. Therefore, it is important to
be able to find out, given a set of local integrity constraints,
which query answers returned from the integrated database
are consistent with the constraints and which are not.
Active and reactive databases. A violation of integrity
constraints may be acceptable under the provision that it will
be repaired in the near future. For example, the stock level in
a warehouse may be allowed to fall below the required min-
imum if the necessary replenishments have been ordered.
During this temporary inconsistency, however, query answers
should give an indication whether they are consistent with
the constraints or not. This problem is particularly acute in
active databases that allow such consistency lapses. The re-
sult of evaluating a trigger condition that is consistent with
the integrity constraints should be treated differently from
the one that isn’t.
The following example presents the basic intuitions be-
hind the notion of consistent query answer.
Example 1. Consider a database subject to the following IC:
$$\forall x\,(P(x) \supset Q(x)).$$
The instance
$$\{P(a),\ P(b),\ Q(a),\ Q(c)\}$$
violates this constraint. Now if the query asks for all $x$ such
that $Q(x)$, only $a$ is returned as an answer consistent with the
integrity constraint.
The plan of this paper is as follows. In section 2 we in-
troduce the basic notions of our approach, including those of
repair and consistent query answer. In section 3 we show
how to compute the query $T_\omega(Q)$ for a given first-order
query $Q$. In subsequent sections, the properties of this
method are analyzed: soundness in section 4, completeness
in section 5, and termination in section 6. In section 7 we
discuss related work. In section 8 we conclude and outline
some of the prospects for future work in this area. The proofs
are given in the appendix.
In this paper we assume we have a fixed database schema
and a fixed infinite database domain $D$. We also have a first-order
language based on this schema with names for the elements
of $D$. We assume that elements of the domain with different
names are different. The instances of the schema are
finite structures for interpreting the first-order language. As
such, they all share the given domain $D$; nevertheless, since
relations are finite, every instance has a finite active domain
which is a subset of $D$. As usual, we allow built-in predicates
that have infinite extensions, identical for all database
instances. There is also a set of integrity constraints $IC$,
expressed in that language, which the database instances are
expected to satisfy. We will assume that $IC$ is consistent in
the sense that there is a database instance that makes it true.
Definition 1. (Consistency) A database instance $r$ is consistent
if $r$ satisfies $IC$ in the standard model-theoretic sense,
that is, $r \models IC$; $r$ is inconsistent otherwise.
This paper addresses the issue of obtaining meaningful
and useful query answers in any, consistent or inconsistent,
database. It is well known how to obtain query answers in
consistent databases. Therefore, the challenging part is how
to deal with the inconsistent ones.
Given a database instance $r$, we denote by $\Sigma(r)$ the set of
formulas $\{P(\bar{a}) \mid r \models P(\bar{a})\}$, where the $P$s are relation names
and $\bar{a}$ is a ground tuple.
Definition 2. (Distance) The distance $\Delta(r, r')$ between database
instances $r$ and $r'$ is the symmetric difference:
$$\Delta(r, r') = (\Sigma(r) - \Sigma(r')) \cup (\Sigma(r') - \Sigma(r)).$$
Definition 3. For the instances $r, r', r''$, $r' \leq_r r''$ if
$\Delta(r, r') \subseteq \Delta(r, r'')$, i.e., if the distance between $r$ and $r'$ is less than or
equal to the distance between $r$ and $r''$.
Notice that built-in predicates do not contribute to the
Δs because they have fixed extensions, identical in every
database instance.
Definition 4. (Repair) Given database instances $r$ and $r'$, we
say that $r'$ is a repair of $r$ if $r' \models IC$ and $r'$ is $\leq_r$-minimal in
the class of database instances that satisfy the ICs.
Clearly, what constitutes a repair depends on the given
set of integrity constraints. In the following we assume that
this set is fixed.
Example 2. Let us consider a database schema with two
unary relations $P$ and $Q$ and domain $D = \{a, b, c\}$. Assume
that for an instance $r$, $\Sigma(r) = \{P(a), P(b), Q(a), Q(c)\}$, and
let $IC = \{\forall x\,(P(x) \supset Q(x))\}$. Clearly, $r$ does not satisfy $IC$
because $r \models P(b) \wedge \neg Q(b)$.
In this case we have two possible repairs for $r$. First,
we can falsify $P(b)$, obtaining an instance $r'$ with
$\Sigma(r') = \{P(a), Q(a), Q(c)\}$. As a second alternative, we can make
$Q(b)$ true, obtaining an instance $r''$ with
$\Sigma(r'') = \{P(a), P(b), Q(a), Q(b), Q(c)\}$.
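The repairs of Example 2 can be checked mechanically. The following is a minimal brute-force sketch, not a method proposed in this paper: it assumes the candidate instances are restricted to the six ground atoms over $\{a, b, c\}$, and the names `satisfies_ic`, `delta` and `repairs` are illustrative only.

```python
from itertools import combinations

DOMAIN = ["a", "b", "c"]
# All ground atoms over the unary relations P and Q (assumed finite universe).
ATOMS = [(p, d) for p in ("P", "Q") for d in DOMAIN]

def satisfies_ic(instance):
    """IC of Example 2: for all x, P(x) implies Q(x)."""
    return all(("Q", d) in instance for (p, d) in instance if p == "P")

def delta(r1, r2):
    """Distance of Definition 2: the symmetric difference of two instances."""
    return r1 ^ r2

def repairs(r):
    """Consistent instances whose delta from r is minimal under set inclusion."""
    candidates = [
        frozenset(c)
        for k in range(len(ATOMS) + 1)
        for c in combinations(ATOMS, k)
        if satisfies_ic(frozenset(c))
    ]
    return [
        c for c in candidates
        # c is a repair if no other consistent instance is strictly closer to r.
        if not any(delta(r, o) < delta(r, c) for o in candidates)
    ]

r = frozenset({("P", "a"), ("P", "b"), ("Q", "a"), ("Q", "c")})
for rep in repairs(r):
    print(sorted(rep))
```

The enumeration is exponential in the number of ground atoms, so it only serves to make Definitions 2 to 4 concrete on very small instances.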
The definition of a repair satisfies certain desirable and
expected properties. Firstly, a consistent database does not
need to be repaired, because if $r$ satisfies $IC$, then, by the
minimality condition with respect to the relation $\leq_r$, $r$ is the only repair
of itself (since $\Delta(r, r)$ is empty). Secondly, any database $r$
can always be repaired because there is a database $r'$ that
satisfies $IC$, and $\Delta(r, r')$ is finite.
Example 3. (motivated by [19]) Consider the IC saying that
$C$ is the only supplier of items of class $T_4$:
$$\forall x, y, z\,(\mathit{Supply}(x, y, z) \wedge \mathit{Class}(z, T_4) \supset x = C). \qquad (1)$$
The following database instance $r_1$ violates the IC:
    Supply           Class
    C  D1  I1        I1  T4
    D  D2  I2        I2  T4
The only repairs of this database are
    Supply           Class
    C  D1  I1        I1  T4
                     I2  T4
and
    Supply           Class
    C  D1  I1        I1  T4
    D  D2  I2
Example 4. (motivated by [19]) Consider the IC:
$$\forall x, y\,(\mathit{Supply}(x, y, I_1) \supset \mathit{Supply}(x, y, I_2)), \qquad (2)$$
saying that item $I_2$ is supplied whenever item $I_1$ is supplied;
and the following inconsistent instance, $r_2$, of the database
    Supply
    C  D1  I1
    C  D1  I3
This instance has two repairs:
    Supply
    C  D1  I1
    C  D1  I2
    C  D1  I3
and
    Supply
    C  D1  I3
Example 5. Consider a student database. $\mathit{Student}(x, y, z)$
means that $x$ is the student number, $y$ is the student's name,
and $z$ is the student's address. The two following ICs state
that the first argument is a key of the relation:
$$\forall x, y, z, u, v\,(\mathit{Student}(x, y, z) \wedge \mathit{Student}(x, u, v) \supset y = u),$$
$$\forall x, y, z, u, v\,(\mathit{Student}(x, y, z) \wedge \mathit{Student}(x, u, v) \supset z = v).$$
The inconsistent database instance $r_3$
    Student          Course
    S1  N1  D1       S1  C1  G1
    S1  N2  D1       S1  C2  G2
has two repairs:
    Student          Course
    S1  N1  D1       S1  C1  G1
                     S1  C2  G2
and
    Student          Course
    S1  N2  D1       S1  C1  G1
                     S1  C2  G2
We assume all queries are in prefix disjunctive normal form.
Definition 5. A formula $Q$ is a query if it has the following
syntactical form:
$$\bar{Q} \bigvee_{i=1}^{s} \Big( \bigwedge_{j=1}^{m_i} P_{i,j}(\bar{u}_{i,j}) \wedge \bigwedge_{j=1}^{n_i} \neg Q_{i,j}(\bar{v}_{i,j}) \wedge \psi_i \Big),$$
where $\bar{Q}$ is a sequence of quantifiers and every $\psi_i$ contains
only built-in predicates. If $\bar{Q}$ contains only universal quantifiers,
then we say that $Q$ is a universal query. If $\bar{Q}$ contains
existential (and possibly universal) quantifiers, we say that
$Q$ is a non-universal query.
Definition 6. (Query answer) A (ground) tuple $\bar{t}$ is an answer
to a query $Q(\bar{x})$ in a database instance $r$ if $r \models Q(\bar{t})$. A
(ground) tuple $\bar{t}$ is an answer to a set of queries $\{Q_1, \ldots, Q_n\}$
if $r \models Q_1 \wedge \cdots \wedge Q_n$.
Definition 7. (Consistent answer) Given a set of integrity
constraints, we say that a (ground) tuple $\bar{t}$ is a consistent
answer to a query $Q(\bar{x})$ in a database instance $r$, and we
write $r \models_c Q(\bar{t})$ (or $r \models_c Q(\bar{x})[\bar{t}]$), if for every repair $r'$ of
$r$, $r' \models Q(\bar{t})$. If $Q$ is a sentence, then true (false) is a consistent
answer to $Q$ in $r$, and we write $r \models_c Q$ ($r \not\models_c Q$), if for
every repair $r'$ of $r$, $r' \models Q$ ($r' \not\models Q$).
Example 6. (example 3 continued) The only consistent answer
to the query $\mathit{Class}(z, T_4)$, posed to the database instance
$r_1$, is $I_1$ because $r_1 \models_c \mathit{Class}(z, T_4)[I_1]$.
Example 7. (example 4 continued) The only consistent answer
to the query $\mathit{Supply}(C, D_1, z)$, posed to the database instance
$r_2$, is $I_3$ because $r_2 \models_c \mathit{Supply}(C, D_1, z)[I_3]$.
Example 8. (example 5 continued) By considering all the repairs
of the database instance $r_3$, we obtain $C_1$ and $C_2$ as the
consistent answers to the query $\exists z\, \mathit{Course}(S_1, y, z)$, posed to
$r_3$. For the query $\exists u, v\,(\mathit{Student}(u, N_1, v) \wedge \mathit{Course}(u, x, y))$,
we obtain no (consistent) answers.
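Definition 7 can be read operationally as an intersection over repairs. The sketch below illustrates this on Example 8, assuming the two repairs of $r_3$ are written out by hand rather than computed; the function names are hypothetical and the two queries are hard-coded as Python comprehensions.

```python
# The two repairs of r3 from Example 5, written out by hand.
COURSE = {("S1", "C1", "G1"), ("S1", "C2", "G2")}
REPAIRS = [
    {"Student": {("S1", "N1", "D1")}, "Course": COURSE},
    {"Student": {("S1", "N2", "D1")}, "Course": COURSE},
]

def courses_of_s1(db):
    """Answers y to: exists z . Course(S1, y, z)."""
    return {y for (s, y, z) in db["Course"] if s == "S1"}

def courses_of_n1(db):
    """Answers (x, y) to: exists u, v . Student(u, N1, v) and Course(u, x, y)."""
    return {
        (x, y)
        for (u, name, v) in db["Student"] if name == "N1"
        for (u2, x, y) in db["Course"] if u2 == u
    }

def consistent_answers(query, repairs):
    """Definition 7: keep only the tuples returned in every repair."""
    return set.intersection(*(query(rep) for rep in repairs))

print(consistent_answers(courses_of_s1, REPAIRS))
print(consistent_answers(courses_of_n1, REPAIRS))
```

The second query has answers in the first repair only, so the intersection is empty, in agreement with Example 8.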
We present here a method to compute consistent answers to
queries. Given a query $Q$, the query $T_\omega(Q)$ is defined based
on the notion of residue developed in the context of semantic
query optimization (SQO) [5]. In the context of deductive
databases, SQO is used to optimize the process of answering
queries using the semantic knowledge about the domain that
is contained in the ICs. In this case, the basic assumption is
that the ICs are satisfied by the database. In our case, since
we allow inconsistent databases, we do not assume the satisfaction
of the ICs while answering queries. A first attempt
to obtain consistent answers to a query $Q(\bar{x})$ may be to use
query modification, i.e., ask the query $Q(\bar{x}) \wedge IC$. However,
this does not work, as we obtain false as the answer if the
DB is inconsistent. Instead, we iteratively modify the query
$Q$ using the residues. As a result, we obtain the query $T_\omega(Q)$
with the property that the set of all answers to $T_\omega(Q)$ is the
same as the set of consistent answers to $Q$. (As shown
later, the property holds only for restricted classes of queries
and constraints.)
We consider only universal constraints. We begin by trans-
forming every integrity constraint to the standard format (ex-
pansion step).
Definition 8. An integrity constraint is in standard format if
it has the form
$$\forall \Big( \bigvee_{i=1}^{m} P_i(\bar{x}_i) \vee \bigvee_{i=1}^{n} \neg Q_i(\bar{y}_i) \vee \psi \Big),$$
where $\forall$ represents the universal closure of the formula, $\bar{x}_i$,
$\bar{y}_i$ are tuples of variables and $\psi$ is a formula that mentions
only built-in predicates, in particular, equality.
Notice that in such an IC there are no constants in the
$P_i$, $Q_i$; if they are needed they can be pushed into $\psi$.
Many usual ICs that appear in DBs can be transformed to
the standard format, e.g., functional dependencies, set inclusion
dependencies of the form $\forall \bar{x}\,(P(\bar{x}) \supset Q(\bar{x}))$, transitivity
constraints of the form $\forall x, y, z\,(P(x, y) \wedge P(y, z) \supset P(x, z))$.
The usual ICs that appear in SQO in deductive databases
as rules [5] can also be accommodated in this format, including
rules with disjunction and logical negation in their
heads. An inclusion dependency of the form
$\forall \bar{x}\,(P(\bar{x}) \supset \exists y\, Q(\bar{x}, y))$ cannot be transformed to the standard format.
After the expansion of $IC$, rules associated with the database
schema are generated. This could be seen as considering
an instance of the database as an extensional database expanded
with new rules, and so obtaining an associated deductive
database where semantic query optimization can
be used.
For each predicate, its negative and positive occurrences
in the ICs (in standard format) will be treated separately with
the purpose of generating corresponding residues and rules.
First, a motivating example.
Example 9. Consider the IC $\forall x\,(\neg P(x) \vee Q(x))$. If $Q(x)$ is
false, then $\neg P(x)$ must be true. Then, when asking about
$\neg Q(x)$, we make sure that $\neg P(x)$ becomes true. That is,
we generate the query $\neg Q(x) \wedge \neg P(x)$ where $\neg P(x)$ is the
residue attached to the query.
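For each IC in standard format
$$\forall \Big( \bigvee_{i=1}^{m} P_i(\bar{x}_i) \vee \bigvee_{i=1}^{n} \neg Q_i(\bar{y}_i) \vee \psi \Big) \qquad (3)$$
and each positive occurrence of a predicate $P_j(\bar{x}_j)$ in it, the
following residue for $\neg P_j(\bar{x}_j)$ is generated
$$\bar{Q} \Big( \bigvee_{i=1}^{j-1} P_i(\bar{x}_i) \vee \bigvee_{i=j+1}^{m} P_i(\bar{x}_i) \vee \bigvee_{i=1}^{n} \neg Q_i(\bar{y}_i) \vee \psi \Big), \qquad (4)$$
where $\bar{Q}$ is a sequence of universal quantifiers over all the
variables in the formula not appearing in $\bar{x}_j$.
If $R_1, \ldots, R_r$ are all the residues for $\neg P_j$, then the following
rule is generated:
$$\neg P_j(\bar{w}) \longmapsto \neg P_j(\bar{w})\,\{R_1(\bar{w}), \ldots, R_r(\bar{w})\},$$
where $\bar{w}$ are new variables. If there are no residues for $\neg P_j$,
then the rule $\neg P_j(\bar{w}) \longmapsto \neg P_j(\bar{w})$ is generated.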
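For each negative occurrence of a predicate $Q_j(\bar{y}_j)$ in (3),
the following residue for $Q_j(\bar{y}_j)$ is generated
$$\bar{Q} \Big( \bigvee_{i=1}^{m} P_i(\bar{x}_i) \vee \bigvee_{i=1}^{j-1} \neg Q_i(\bar{y}_i) \vee \bigvee_{i=j+1}^{n} \neg Q_i(\bar{y}_i) \vee \psi \Big),$$
where $\bar{Q}$ is a sequence of universal quantifiers over all the
variables in the formula not appearing in $\bar{y}_j$.
If $R'_1, \ldots, R'_s$ are all the residues for $Q_j(\bar{y}_j)$, the following
rule is generated:
$$Q_j(\bar{u}) \longmapsto Q_j(\bar{u})\,\{R'_1(\bar{u}), \ldots, R'_s(\bar{u})\}.$$
If there are no residues for $Q_j(\bar{y}_j)$, then the rule
$Q_j(\bar{u}) \longmapsto Q_j(\bar{u})$ is generated. Notice that there is exactly one new rule
for each positive predicate, and exactly one rule for each
negative predicate.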
If there is more than one positive (negative) occurrence
of a predicate, say $P$, in an IC, then more than one residue
is computed for the corresponding literal. In some cases, e.g., for functional
dependencies, the subsequent residues will be redundant. In
other cases, e.g., for transitivity constraints, multiple
residues are not redundant.
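Example 10. If we have the following ICs in standard format
$$IC = \{\forall x\,(R(x) \vee \neg P(x) \vee \neg Q(x)),\ \forall x\,(P(x) \vee \neg Q(x))\},$$
the following rules are generated:
$$P(x) \longmapsto P(x)\,\{R(x) \vee \neg Q(x)\}$$
$$Q(x) \longmapsto Q(x)\,\{R(x) \vee \neg P(x),\ P(x)\}$$
$$R(x) \longmapsto R(x)$$
$$\neg P(x) \longmapsto \neg P(x)\,\{\neg Q(x)\}$$
$$\neg Q(x) \longmapsto \neg Q(x)$$
$$\neg R(x) \longmapsto \neg R(x)\,\{\neg P(x) \vee \neg Q(x)\}$$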
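The rule generation step can be illustrated on Example 10. The following sketch makes a simplifying assumption that goes beyond the general construction: all literals of a constraint share the same variable $x$, so no quantifier prefix is needed and a residue is just the clause with one literal removed. Literals are encoded as strings such as `"P"` and `"~P"`, and the helper names are illustrative.

```python
def complement(lit):
    """Complementary literal: P <-> ~P."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def generate_rules(ics):
    """Map every literal to the list of its residues (each a tuple of literals)."""
    predicates = sorted({lit.lstrip("~") for ic in ics for lit in ic})
    rules = {lit: [] for p in predicates for lit in (p, "~" + p)}
    for ic in ics:
        for i, lit in enumerate(ic):
            # A query for the complement of lit makes lit false, so the
            # remaining literals of the clause must hold: they form a residue.
            residue = tuple(ic[:i] + ic[i + 1:])
            rules[complement(lit)].append(residue)
    return rules

# Example 10: (R or ~P or ~Q) and (P or ~Q), all over the same variable x.
ICS = [["R", "~P", "~Q"], ["P", "~Q"]]
RULES = generate_rules(ICS)
for head, residues in RULES.items():
    print(head, "->", residues)
```

A literal with an empty residue list corresponds to a rule of the form $R(x) \longmapsto R(x)$.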
Notice that no rules are generated for built-in predicates,
but such predicates may appear in the residues. They have
fixed extensions and thus cannot contribute to the violation
of an IC or be modified to make an IC true. For example, if
we have the IC $\forall x, y, z\,(\neg P(x, y) \vee \neg P(x, z) \vee y = z)$, and the
database satisfies $P(1, 2) \wedge P(1, 3)$, the IC cannot be made
true by making $2 = 3$.
Once the rules have been generated, it is possible to simplify
the associated residues. In every new rule of the form
$P(\bar{u}) \longmapsto P(\bar{u})\,\{R_1(\bar{u}), \ldots, R_r(\bar{u})\}$ the auxiliary quantifications
introduced in the expansion step are eliminated (both
the quantifier and the associated variable in the formula)
from the residues by the process inverse to the one applied
in the expansion. The same is done with rules of the form
$\neg P \longmapsto \neg P\,\{\cdots\}$.
$T_\omega(Q)$
In order to determine consistent answers to queries in arbitrary
databases, we will make use of a family of operators
consisting of $T_n$, $n \geq 0$, and $T_\omega$.
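Definition 9. The application of an operator $T_n$ to a query is
defined inductively by means of the following rules:
1. $T_n(\square) := \square$, $T_n(\neg\square) := \neg\square$, for every $n \geq 0$ ($\square$ is the
empty clause).
2. $T_0(\varphi) := \varphi$.
3. For each predicate $P(\bar{u})$, if there is a rule
$P(\bar{u}) \longmapsto P(\bar{u})\,\{R_1(\bar{u}), \ldots, R_r(\bar{u})\}$, then
$$T_{n+1}(P(\bar{u})) := P(\bar{u}) \wedge \bigwedge_{i=1}^{r} T_n(R_i(\bar{u})).$$
If $P(\bar{u})$ does not have residues, then $T_{n+1}(P(\bar{u})) := P(\bar{u})$.
4. For each negated predicate $\neg Q(\bar{v})$, if there is a rule
$\neg Q(\bar{v}) \longmapsto \neg Q(\bar{v})\,\{R'_1(\bar{v}), \ldots, R'_s(\bar{v})\}$, then
$$T_{n+1}(\neg Q(\bar{v})) := \neg Q(\bar{v}) \wedge \bigwedge_{i=1}^{s} T_n(R'_i(\bar{v})).$$
If $\neg Q(\bar{v})$ does not have any residues, then
$T_{n+1}(\neg Q(\bar{v})) := \neg Q(\bar{v})$.
5. If $\varphi$ is a formula in prenex disjunctive normal form,
that is,
$$\varphi = \bar{Q} \bigvee_{i=1}^{s} \Big( \bigwedge_{j=1}^{m_i} P_{i,j}(\bar{u}_{i,j}) \wedge \bigwedge_{j=1}^{n_i} \neg Q_{i,j}(\bar{v}_{i,j}) \wedge \psi_i \Big),$$
where $\bar{Q}$ is a sequence of quantifiers and $\psi_i$ is a formula
that includes only built-in predicates, then for every
$n \geq 0$:
$$T_n(\varphi) := \bar{Q} \bigvee_{i=1}^{s} \Big( \bigwedge_{j=1}^{m_i} T_n(P_{i,j}(\bar{u}_{i,j})) \wedge \bigwedge_{j=1}^{n_i} T_n(\neg Q_{i,j}(\bar{v}_{i,j})) \wedge \psi_i \Big).$$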
Definition 10. The application of operator $T_\omega$ on a query is
defined as $T_\omega(\varphi) = \bigcup_{n < \omega} \{T_n(\varphi)\}$.
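Example 11. (example 10 continued) For the query $\neg R(x)$
we have $T_1(\neg R(x)) = \neg R(x) \wedge (\neg P(x) \vee \neg Q(x))$,
$T_2(\neg R(x)) = \neg R(x) \wedge ((\neg P(x) \wedge \neg Q(x)) \vee \neg Q(x))$, and finally
$T_3(\neg R(x)) = T_2(\neg R(x))$. We have reached a fixed point and then
$$T_\omega(\neg R(x)) = \{\neg R(x),\ \neg R(x) \wedge (\neg P(x) \vee \neg Q(x)),\ \neg R(x) \wedge ((\neg P(x) \wedge \neg Q(x)) \vee \neg Q(x))\}.$$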
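The iteration of Example 11 can be replayed with a few lines of code. This is a sketch under the same single-variable assumption as Example 10: the rules are written out by hand as a dictionary, formulas are nested tuples tagged `"and"` / `"or"`, and the helper names are illustrative rather than part of the method.

```python
# Rules of Example 10; each residue is a disjunction of literals over x.
RULES = {
    "P": [("R", "~Q")],
    "Q": [("R", "~P"), ("P",)],
    "R": [],
    "~P": [("~Q",)],
    "~Q": [],
    "~R": [("~P", "~Q")],
}

def t(n, lit):
    """T_n applied to a literal, following items 2 to 4 of Definition 9."""
    if n == 0 or not RULES[lit]:
        return lit
    return ("and", lit) + tuple(t_clause(n - 1, res) for res in RULES[lit])

def t_clause(n, clause):
    """T_n applied to a disjunction of literals (item 5 of Definition 9)."""
    parts = tuple(t(n, lit) for lit in clause)
    return parts[0] if len(parts) == 1 else ("or",) + parts

def point_of_syntactic_finiteness(lit, limit=20):
    """Smallest n with T_n == T_{n+1}, or None if none is found below limit."""
    for n in range(limit):
        if t(n, lit) == t(n + 1, lit):
            return n
    return None

print(t(1, "~R"))
print(t(2, "~R"))
print(point_of_syntactic_finiteness("~R"))
```

The syntactic comparison of consecutive steps stops at $n = 2$, matching $T_3(\neg R(x)) = T_2(\neg R(x))$ in the example. The `limit` parameter is only a guard for rule sets on which the iteration does not stabilize.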
We show first that the operator $T_\omega$ conservatively extends
standard query evaluation on consistent databases.
Proposition 1. Given a database instance $r$ and a set of integrity
constraints $IC$, such that $r \models IC$, then for every query
$Q(\bar{x})$ and every natural number $n$: $r \models \forall \bar{x}\,(Q(\bar{x}) \equiv T_n(Q(\bar{x})))$.
Corollary 1. Given a database instance $r$ and a set of integrity
constraints $IC$, such that $r \models IC$, then for every query
$Q(\bar{x})$ and every tuple $\bar{t}$: $r \models Q(\bar{t})$ if and only if $r \models T_\omega(Q(\bar{t}))$.
Now we will show the relationship between consistent answers
to a query $Q$ in a database instance $r$ (definition 7) and
answers to the query $T_\omega(Q)$ (definition 6). We show that
$T_\omega(Q)$ returns only consistent answers to $Q$.
Theorem 1. (Soundness) Let $r$ be a database instance, $IC$ a
set of integrity constraints and $Q(\bar{x})$ a query (see definition 5)
such that $r \models T_\omega(Q(\bar{x}))[\bar{t}]$. If $Q$ is universal or non-universal
and domain independent [20], then $\bar{t}$ is a consistent answer to
$Q$ in $r$ (in the sense of definition 7), that is, $r \models_c Q(\bar{t})$.
The second condition in the theorem excludes non-universal,
but domain dependent queries like $\exists x\, \neg P(x)$.
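Example 12. (example 6 continued) The IC (1) transformed
into the standard format becomes
$$\forall x, y, z, w\,(\neg \mathit{Supply}(x, y, z) \vee \neg \mathit{Class}(z, w) \vee w \neq T_4 \vee x = C).$$
The following rule is generated:
$$\mathit{Class}(z, w) \longmapsto \mathit{Class}(z, w)\,\{\forall x, y\,(\neg \mathit{Supply}(x, y, z) \vee w \neq T_4 \vee x = C)\}.$$
Given the database instance $r_1$ that violates the IC as before,
if we pose the query $\mathit{Class}(z, T_4)$, asking for the items of
class $T_4$, directly to $r_1$, we obtain $I_1$ and $I_2$. Nevertheless, if
we pose the query $T_\omega(\mathit{Class}(z, T_4))$, that is
$$\{\mathit{Class}(z, T_4),\ \mathit{Class}(z, T_4) \wedge \forall x, y\,(\neg \mathit{Supply}(x, y, z) \vee x = C)\},$$
we obtain only $I_1$, eliminating $I_2$. $I_1$ is the only consistent
answer.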
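The effect of the residue in Example 12 can be observed by evaluating both queries directly on $r_1$. The sketch below hard-codes the instance and the rewritten query as Python comprehensions; it is an illustration of the evaluation, with hypothetical function names, and not a general query evaluator.

```python
# Instance r1 of Example 3.
SUPPLY = {("C", "D1", "I1"), ("D", "D2", "I2")}
CLASS = {("I1", "T4"), ("I2", "T4")}

def plain_answers():
    """Answers z to the original query Class(z, T4)."""
    return {z for (z, w) in CLASS if w == "T4"}

def rewritten_answers():
    """Answers z to Class(z, T4) and forall x, y (not Supply(x, y, z) or x = C)."""
    return {
        z
        for (z, w) in CLASS
        if w == "T4"
        # The universal residue only needs checking on stored Supply tuples,
        # since it holds trivially for every tuple absent from Supply.
        and all(x == "C" for (x, y, item) in SUPPLY if item == z)
    }

print(sorted(plain_answers()))
print(sorted(rewritten_answers()))
```

The rewritten query is evaluated on the original, unrepaired instance, which is the point of the method: no repair is ever materialized.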
Example 13. (example 8 continued) In the standard format,
the ICs take the form
$$\forall x, y, z, u, v\,(\neg \mathit{Student}(x, y, z) \vee \neg \mathit{Student}(x, u, v) \vee y = u),$$
$$\forall x, y, z, u, v\,(\neg \mathit{Student}(x, y, z) \vee \neg \mathit{Student}(x, u, v) \vee z = v).$$
The following rule is generated:
$$\mathit{Student}(x, y, z) \longmapsto \mathit{Student}(x, y, z)\,\{\forall u, v\,(\neg \mathit{Student}(x, u, v) \vee y = u),\ \forall u, v\,(\neg \mathit{Student}(x, u, v) \vee z = v)\}.$$
Given the inconsistent database instance $r_3$, if we pose the
query $\exists z\, \mathit{Course}(S_1, y, z)$, asking for the names of the courses
of the student with number $S_1$, we obtain $C_1$ and $C_2$. If we
pose the query
$$T_\omega(\exists z\, \mathit{Course}(S_1, y, z)) = \{\exists z\, \mathit{Course}(S_1, y, z)\},$$
we obviously obtain the same answers which, in this case,
are the consistent answers. Intuitively, in this case the $T_\omega$
operator helps us to establish that even when the name of the
student with number $S_1$ is undetermined, it is still possible
to obtain the list of courses in which he/she is registered. On
the other hand, if we pose the query
$$\exists u, v\,(\mathit{Student}(u, N_1, v) \wedge \mathit{Course}(u, x, y))$$
about the courses and grades for a student with name $N_1$, to
$r_3$, we obtain $(C_1, G_1)$ and $(C_2, G_2)$. Nevertheless, if we ask
$$T_\omega(\exists u, v\,(\mathit{Student}(u, N_1, v) \wedge \mathit{Course}(u, x, y))),$$
we obtain, in conjunction with the original query, the formula:
$$\exists u, v\,(\mathit{Student}(u, N_1, v) \wedge \forall y', z'\,(\neg \mathit{Student}(u, y', z') \vee y' = N_1) \wedge \forall y', z'\,(\neg \mathit{Student}(u, y', z') \vee z' = v) \wedge \mathit{Course}(u, x, y)).$$
From this we obtain the empty set of tuples. This answer
is intuitively consistent, because the number of the student
with name $N_1$ is uncertain, and in consequence it is not possible
to find out in which courses he/she is registered. The
set of answers obtained with the $T_\omega$ operator coincides with
the set of consistent answers, which is empty.
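The second query of Example 13 can be evaluated on $r_3$ in the same spirit. In this sketch the two universal residues are folded into one hypothetical helper, `key_respected`, which checks that every stored `Student` tuple with the same number agrees on name and address; the instance is hard-coded.

```python
# Instance r3 of Example 5.
STUDENT = {("S1", "N1", "D1"), ("S1", "N2", "D1")}
COURSE = {("S1", "C1", "G1"), ("S1", "C2", "G2")}

def plain_answers():
    """Answers (x, y) to: exists u, v . Student(u, N1, v) and Course(u, x, y)."""
    return {
        (x, y)
        for (u, n, v) in STUDENT if n == "N1"
        for (u2, x, y) in COURSE if u2 == u
    }

def key_respected(u, name, addr):
    """Both residues: all Student tuples with number u carry this name and address."""
    return all(n == name and a == addr for (s, n, a) in STUDENT if s == u)

def rewritten_answers():
    """The original query with the two residues attached to the Student atom."""
    return {
        (x, y)
        for (u, n, v) in STUDENT if n == "N1" and key_respected(u, n, v)
        for (u2, x, y) in COURSE if u2 == u
    }

print(sorted(plain_answers()))
print(sorted(rewritten_answers()))
```

The tuple $(S_1, N_2, D_1)$ falsifies the first residue for $u = S_1$, so the rewritten query returns no tuples.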
Definition 11. A binary integrity constraint (BIC) is a sentence
of the form
$$\forall (l_1(\bar{x}_1) \vee l_2(\bar{x}_2) \vee \psi(\bar{x})),$$
where $l_1$ and $l_2$ are literals, and $\psi$ is a formula that only
contains built-in predicates.
Examples of BICs include: functional dependencies, symmetry
constraints, set inclusion dependencies of the form
$\forall \bar{x}\,(P(\bar{x}) \supset Q(\bar{x}))$.
Definition 12. Given a set of sentences $\Sigma$ in the language
of the database schema $DB$, and a sentence $\varphi$, we denote by
$\Sigma \models_{DB} \varphi$ the fact that, for every instance $r$ of the database, if
$r \models \Sigma$, then $r \models \varphi$.
Theorem 2. (Completeness for BICs) Given a set $IC$ of binary
integrity constraints, if for every literal $l'(\bar{a})$,
$IC \not\models_{DB} l'(\bar{a})$, then the operator $T_\omega$ is complete, that is, for every
ground literal $l(\bar{t})$, if $r \models_c l(\bar{t})$ then $r \models T_\omega(l(\bar{t}))$.
The theorem says that every consistent answer to a query
of the form $l(\bar{x})$ is captured by the $T_\omega$ operator. Actually,
proposition 2 in the appendix and the completeness theorem
can be easily extended to the case of queries that are conjunctions
of literals. Notice that the finiteness of $T_\omega(l(\bar{x}))$ is
not a part of the hypothesis in this theorem. The hypothesis
of the theorem requires that the ICs are not enough to
answer a literal query by themselves; they do not contain
definite knowledge about the literals.
Example 14. In example 12, where BICs and queries that are
conjunctions of literals appear, we can see that the
operator $T_\omega$ gave us all the consistent answers, as implied
by the theorem.
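Corollary 2. If $IC$ is a set of functional dependencies (FDs)
$$IC = \{\forall (P_1(\bar{x}_1, y_1) \wedge P_1(\bar{x}_1, z_1) \supset y_1 = z_1),\ \ldots,\ \forall (P_n(\bar{x}_n, y_n) \wedge P_n(\bar{x}_n, z_n) \supset y_n = z_n)\}, \qquad (5)$$
then the operator $T_\omega$ is complete for consistent answers to
queries that are conjunctions of literals.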
Example 15. In example 13 we had FDs that are also BICs.
Thus the operator $T_\omega$ found all the consistent answers, even
for some queries that are not conjunctions of literals, show-
ing that this is not a necessary condition.
Example 16. Here we will show that in general completeness
is not obtained for queries that are not conjunctions of
literals. Consider the IC: $\forall x, y, z\,(\neg P(x, y) \vee \neg P(x, z) \vee y = z)$
and the inconsistent instance $r$ with $\Sigma(r) = \{P(a, b), P(a, c)\}$.
This database has two repairs: $r'$ with $\Sigma(r') = \{P(a, b)\}$; and
$r''$ with $\Sigma(r'') = \{P(a, c)\}$. We have that $r \models_c \exists x\, P(a, x)$, because
the query is true in the two repairs.
Now, it is easy to see that $T_\omega(\exists u\, P(a, u))$ is logically equivalent
to $\exists u\,(P(a, u) \wedge \forall z\,(\neg P(a, z) \vee z = u))$. So, we have
$r \not\models T_\omega(\exists x\, P(a, x))$. Thus, the consistent answer true is not captured
by the operator $T_\omega$.
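The gap described in Example 16 can be reproduced concretely. The sketch below hard-codes the instance, its two repairs, and both forms of the query; the function names are illustrative.

```python
# Instance r of Example 16 and its two repairs (relation P only).
R = {("a", "b"), ("a", "c")}
REPAIRS = [{("a", "b")}, {("a", "c")}]

def exists_p_a(p):
    """The query: exists x . P(a, x)."""
    return any(k == "a" for (k, x) in p)

def rewritten_exists_p_a(p):
    """exists u (P(a, u) and forall z (not P(a, z) or z = u))."""
    return any(
        k == "a" and all(z == u for (k2, z) in p if k2 == "a")
        for (k, u) in p
    )

# True is a consistent answer: the query holds in every repair.
consistently_true = all(exists_p_a(rep) for rep in REPAIRS)
# The rewritten query, evaluated on the inconsistent instance itself.
captured = rewritten_exists_p_a(R)
print(consistently_true, captured)
```

Each candidate witness $u$ is contradicted by the other $P(a, \cdot)$ tuple, so the rewritten query fails on $r$ although the original query holds in both repairs.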
The following theorem applies to arbitrary ICs and general-
izes Theorem 2.
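Theorem 3. (Completeness) Let $IC$ be a set of integrity constraints,
$l(\bar{x})$ a literal, and $T_n(l(\bar{x}))$ of the form
$$l(\bar{x}) \wedge \bigwedge_{i=1}^{m} \forall (\bar{x}_i, \bar{y}_i)\,(C_i(\bar{x}, \bar{x}_i) \vee \psi_i(\bar{x}, \bar{y}_i)).$$
If for every $n \geq 0$, there is $S \subseteq \{1, \ldots, m\}$ such that
1. for every $j \in S$ and every tuple $\bar{a}$: $IC \not\models_{DB} C_j(\bar{a})$, and
2. $\bigwedge_{i \in S} \forall (\bar{x}_i, \bar{y}_i)\,(C_i(\bar{x}, \bar{x}_i) \vee \psi_i(\bar{x}, \bar{y}_i))$ implies
$\bigwedge_{1 \leq i \leq m} \forall (\bar{x}_i, \bar{y}_i)\,(C_i(\bar{x}, \bar{x}_i) \vee \psi_i(\bar{x}, \bar{y}_i))$,
then $r \models_c l(\bar{t})$ implies $r \models T_\omega(l(\bar{t}))$.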
This theorem can be extended to conjunctions of literals.
Notice that the theorem requires a condition for every $n$.
Its application is obviously simplified if we know that the
iteration terminates. This is an issue to be analyzed in the
next section.
Termination means that the operator $T_\omega$ returns a finite set
of formulas. It is clearly important because then the set of
consistent answers can be computed by evaluating a single,
finite query. We distinguish between three different notions
of termination.
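Definition 13. Given a set of ICs and a query $Q(\bar{x})$, we say
that $T_\omega(Q(\bar{x}))$ is
1. syntactically finite if there is an $n$ such that $T_n(Q(\bar{x}))$
and $T_{n+1}(Q(\bar{x}))$ are syntactically the same.
2. semantically finite if there is an $n$ such that for all $m \geq n$,
$\forall \bar{x}\,(T_n(Q(\bar{x})) \equiv T_m(Q(\bar{x})))$ is valid.
3. semantically finite in an instance $r$, if there is an $n$ such
that for all $m \geq n$, $r \models \forall \bar{x}\,(T_n(Q(\bar{x})) \equiv T_m(Q(\bar{x})))$.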
The number $n$ in cases 2 and 3 is called a point of finiteness.
It is clear that 1 implies 2 and 2 implies 3. In the full
version we will show that all these implications are proper.
In all these cases, evaluating $T_\omega(Q(\bar{x}))$ gives the same result
as evaluating $T_n(Q(\bar{x}))$ for some $n$ (in the instance $r$ in case
3). If $T_\omega(Q(\bar{x}))$ is semantically finite, sound and complete,
then the set of consistent answers to $Q$ is first-order definable.
The notion of syntactical finiteness is important because then
for some $n$ and all $m \geq n$, $T_m(Q(\bar{x}))$ will be exactly the same.
In consequence, $T_\omega(Q)$ will be a finite set of formulas. In
addition, a point of finiteness $n$ can be detected (if it exists)
by syntactically comparing every two consecutive steps in
the iteration. No simplification rules need to be considered,
because the iterative procedure is fully deterministic.
Here we introduce a necessary and sufficient condition
for syntactical finiteness.
Definition 14. A set of integrity constraints $IC$ is acyclic if
there exists a function $f$ from predicate names plus negations
of predicate names in the database to the natural numbers,
that is, $f : \{p_1, \ldots, p_n, \neg p_1, \ldots, \neg p_n\} \to \mathbb{N}$, such that for
every integrity constraint $\forall \big( \bigvee_{i=1}^{k} l_i(\bar{x}_i) \vee \psi(\bar{x}) \big) \in IC$ as in (3),
and every $i$ and $j$ ($1 \leq i, j \leq k$), if $i \neq j$, then $f(\neg l_i) > f(l_j)$.
(Here $\neg l_i$ is the literal complementary to $l_i$.)
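Example 17. The set of ICs
$$IC = \{\forall x\,(\neg P(x) \vee \neg Q(x) \vee S(x)),\ \forall x, y\,(\neg Q(x) \vee \neg S(y) \vee T(x, y))\}$$
is acyclic, because the function $f$ defined by
$f(P) = 2$, $f(Q) = 2$, $f(\neg P) = 0$, $f(\neg Q) = 0$,
$f(S) = 1$, $f(T) = 0$, $f(\neg S) = 1$, $f(\neg T) = 2$,
satisfies the condition of definition 14.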
Example 18. The set of ICs
$$IC = \{\forall x\,(\neg P(x) \vee \neg Q(x) \vee S(x)),\ \forall x, y\,(Q(x) \vee \neg S(y) \vee T(x, y))\}$$
is not acyclic, because for any function $f$ that we may attempt
to use to satisfy the condition in definition 14, from
the first integrity constraint we obtain $f(Q) > f(S)$, and from
the second, we would obtain $f(S) > f(Q)$; a contradiction.
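Acyclicity in the sense of Definition 14 can be tested by a graph search. The sketch below relies on an observation that is not stated in the text: a function $f$ with $f(\neg l_i) > f(l_j)$ for all $i \neq j$ exists exactly when the directed graph with an edge from $\neg l_i$ to $l_j$ has no cycle. Constraints are encoded as lists of literal names (`"~Q"` for $\neg Q$), with Example 18 assumed to contain the positive literal $Q(x)$ in its second constraint; all helper names are illustrative.

```python
def complement(lit):
    """Complementary literal name: P <-> ~P."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def pairs(ics):
    """All (complement of l_i, l_j) with i != j inside the same constraint."""
    for ic in ics:
        for i, li in enumerate(ic):
            for j, lj in enumerate(ic):
                if i != j:
                    yield complement(li), lj

def f_is_witness(f, ics):
    """Check the condition of Definition 14 for a given function f."""
    return all(f[src] > f[dst] for src, dst in pairs(ics))

def is_acyclic(ics):
    """True iff the graph of required strict decreases has no cycle."""
    edges = {}
    for src, dst in pairs(ics):
        edges.setdefault(src, set()).add(dst)
    state = {}  # literal -> "open" while on the DFS stack, "done" afterwards

    def has_cycle(node):
        if state.get(node) == "open":
            return True
        if state.get(node) == "done":
            return False
        state[node] = "open"
        found = any(has_cycle(nxt) for nxt in edges.get(node, ()))
        state[node] = "done"
        return found

    return not any(has_cycle(node) for node in list(edges))

EX17 = [["~P", "~Q", "S"], ["~Q", "~S", "T"]]
EX18 = [["~P", "~Q", "S"], ["Q", "~S", "T"]]
F17 = {"P": 2, "Q": 2, "~P": 0, "~Q": 0, "S": 1, "T": 0, "~S": 1, "~T": 2}
print(is_acyclic(EX17), f_is_witness(F17, EX17), is_acyclic(EX18))
```

For Example 18 the search finds the cycle between $Q$ and $S$, which is the contradiction $f(Q) > f(S) > f(Q)$ derived in the example.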
Theorem 4. A set of integrity constraints $IC$ is acyclic iff
for every literal name $l$ in the database schema, $T_\omega(l(\bar{x}))$ is
syntactically finite.
The theorem can be extended to any class of queries sat-
isfying Definition 5.
Example 19. The set of integrity constraints in example 18
is not acyclic. In that case $T_\omega(Q(x))$ is infinite.
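Example 20. The ICs in example 17 are acyclic. There we
have
$$T_\omega(P(u)) = \{P(u),\ P(u) \wedge (\neg Q(u) \vee S(u)),\ P(u) \wedge (\neg Q(u) \vee (S(u) \wedge \forall v\,(\neg Q(v) \vee T(v, u))))\}$$
$$T_\omega(Q(u)) = \{Q(u),\ Q(u) \wedge (\neg P(u) \vee S(u)) \wedge \forall v\,(\neg S(v) \vee T(u, v)),\ Q(u) \wedge (\neg P(u) \vee (S(u) \wedge \forall w\,(\neg Q(w) \vee T(w, u)))) \wedge \forall v\,((\neg S(v) \wedge (\neg P(v) \vee \neg Q(v))) \vee T(u, v))\}$$
$$T_\omega(S(u)) = \{S(u),\ S(u) \wedge \forall v\,(\neg Q(v) \vee T(v, u))\}$$
$$T_\omega(T(u, v)) = \{T(u, v)\}$$
$$T_\omega(\neg P(u)) = \{\neg P(u)\}$$
$$T_\omega(\neg Q(u)) = \{\neg Q(u)\}$$
$$T_\omega(\neg S(u)) = \{\neg S(u),\ \neg S(u) \wedge (\neg P(u) \vee \neg Q(u))\}$$
$$T_\omega(\neg T(u, v)) = \{\neg T(u, v),\ \neg T(u, v) \wedge (\neg Q(u) \vee \neg S(v)),\ \neg T(u, v) \wedge (\neg Q(u) \vee (\neg S(v) \wedge (\neg P(v) \vee \neg Q(v))))\}$$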
Corollary 3. For functional dependencies and a query $Q(\bar{x})$,
$T_\omega(Q(\bar{x}))$ is always syntactically finite.
Definition 15. A constraint $C$ in clausal form is uniform if
for every literal $l(\bar{x})$ in it, the set of variables in $l(\bar{x})$ is the
same as the set of variables in $C - l(\bar{x})$. A set of constraints
is uniform if all the constraints in it are uniform.
Examples of uniform constraints include set inclusion
dependencies of the form $\forall \bar{x}\,(P(\bar{x}) \supset Q(\bar{x}))$, e.g., Example 4.
Theorem 5. If a set of integrity constraints $IC$ is uniform,
then for every literal name $l$ in the database schema, $T_\omega(l(\bar{x}))$
is semantically finite. Furthermore, a point of finiteness $n$
can be bounded from above by a function of the number of
variables in the query, and the number of predicates (and
their arities) in the query and $IC$.
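Theorem 6. Let $l$ be a literal name. If for some $n$,
$$\forall \bar{x}\,(T_n(l(\bar{x})) \supset T_{n+1}(l(\bar{x})))$$
is valid, then for all $m \geq n$,
$$\forall \bar{x}\,(T_n(l(\bar{x})) \equiv T_m(l(\bar{x})))$$
is valid.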
According to Theorem 6, we can detect a point of finite-
ness by comparing every two consecutive steps wrt logical
implication. Although this is undecidable in general, we
might try to apply semidecision procedures, for example,
automated theorem proving. We have successfully made use
of OTTER [17] in some cases that involve sets of constraints
that are neither acyclic nor uniform. Examples include multivalued
dependencies, and functional dependencies together
with set inclusion dependencies. For multivalued dependencies,
Theorem 6 together with Theorem 3 gives completeness
of $T_\omega(l(\bar{x}))$ where $l(\bar{x})$ is a negative literal. The criterion
from Theorem 6 is also applicable to uniform constraints
by providing potentially faster termination detection
than the proof of Theorem 5.
Theorem 7. If $Q(\bar{x})$ is a domain independent query, then
for every database instance $r$ there is an $n$, such that for all
$m \geq n$, $r \models \forall \bar{x}\,(T_n(Q(\bar{x})) \equiv T_m(Q(\bar{x})))$.
Notice that this theorem does not include the case of neg-
ative literals, as in the case of theorem 5.
Bry [4] was, to our knowledge, the first author to consider
the notion of consistent query answer in inconsistent databases.
He defined consistent query answers based on provability
in minimal logic, without giving, however, a proof
procedure or any other computational mechanism for obtaining
such answers. He did not address the issues of semantics,
soundness or completeness.
It has been widely recognized that in database integration
the integrated data may be inconsistent with the integrity
constraints. A typical (theoretical) solution is to augment the
data model to represent disjunctive information. The follow-
ing example explains the need for a solution of this kind.
Example 21. Consider the functional dependency
$$\forall x, y, z\,(P(x, y) \wedge P(x, z) \supset y = z).$$
If the integrated database contains both $P(a, b)$ and $P(a, c)$,
then the functional dependency is violated. Each of $P(a, b)$
and $P(a, c)$ may be coming from a different database that
satisfies the dependency. Thus, both facts are replaced by
their disjunction $P(a, b) \vee P(a, c)$ in the integrated database.
Now the functional dependency is no longer violated.
To solve this kind of problem, [1] introduced the notion
of flexible relation, a non-1NF relation that contains tuples
with sets of non-key values (with such a set standing for one
of its elements). This approach is limited to primary key
functional dependencies and was subsequently generalized
to other key functional dependencies [9]. In the same context,
[3, 12] proposed to use disjunctive Datalog and [16]
tables with OR-objects. [1] introduced flexible relational algebra
to query flexible relations, and [9] introduced flexible relational
calculus (whose subset can be translated to flexible relational
algebra). The remaining papers did not discuss query language
issues, relying on the existing approaches to query
disjunctive Datalog or tables with OR-objects. There are
several important differences between the above approaches
and ours. First, they rely on the construction of a single (dis-
junctive) instance and the deletion of conflicting tuples. In
our approach, the underlying databases are incorporated into
the integrated one in toto, without any changes. There is no
need for introducing disjunctive information. It would be
interesting to compare the scope and the computational re-
quirements of both approaches. For instance, one should
note that the single-instance approach is not incremental:
Any changes in the underlying databases require the recom-
putation of the entire instance. Second, our approach seems
to be unique, in the context of database integration, in con-
sidering tuple insertions as possible repairs for integrity vi-
olations. Therefore, in some cases consistent query answers
may be different from query answers obtained from the cor-
responding single instance.
Example 22. Consider the integrity constraint $p \supset q$ and a
fact $p$. The instance consisting of $p$ alone does not satisfy
the integrity constraint. The common solution for removing
this violation is to delete $p$. However, in our approach
inserting $q$ is also a possible repair. This has consequences
for the inferences about $\neg p$ and $\neg q$. Our approach returns
false in both cases, as $p$ (resp. $q$) is true in a possible repair.
Other approaches return true (under CWA) or undefined (under
OWA).
Our work has connections with research done on belief
revision [10]. In our case, we have an implicit notion of re-
vision that is determined by the set of repairs of the database,
and corresponds to revising the database (or a suitable cat-
egorical theory describing it) by the set of integrity con-
straints. Thus, querying the inconsistent database expect-
ing only correct answers corresponds to querying the revised
theory without restrictions.
It is easy to see that our notion of repair of a relational
database is a particular case of the local semantics intro-
duced in [8], restricted to revision performed starting from
a single model (the database). From this we obtain that our
revision operator satisfies the postulates (R1)–(R5), (R7),
(R8) in [13]. For each given database $r$, the relation $\leq_r$
introduced in definition 3 provides the partial order between
models that determines the (models of the) revised database
as described in [13]. [8] concentrates on the computation
of the models of the revised theory, i.e., the repairs in our
case, whereas we do not compute the repairs, but keep querying
the original, non-revised database and pose a modified
query. Therefore, we can view our methodology as a way
of representing and querying simultaneously all the repairs
of the database by means of a new query. Nevertheless, our
motivation and starting point is quite different from belief
revision. We attempt to take direct advantage of the seman-
tic information contained in the integrity constraints in order
to answer queries, rather than revising the database. Revising
the database means repairing all the inconsistencies in it;
we are interested instead in the information related to
particular queries. For instance, a query referring only to the
consistent portion of the database can be answered without
repairing the database.
Reasoning in the presence of inconsistency has been an
important research problem in the area of knowledge
representation. The goal is to design logical formalisms that limit
what can be inferred from an inconsistent set of formulas.
One does not want to infer all formulas (as required by the
classical two-valued logic). Also, one prefers not to infer a
formula together with its negation. The formalisms satisfy-
ing the above properties, e.g., [15], are usually propositional.
Moreover, they do not distinguish between integrity con-
straints and database facts. Thus, if the data in the database
violates an integrity constraint, the constraint itself can no
longer be inferred (which is not acceptable in the database
context).
Example 23. Assume the integrity constraint is $\neg(p \wedge q)$
and the database contains the facts $p$ and $q$. In the approach
of [15], $p \vee q$ can be inferred (minimal change is captured
correctly) but $p$, $q$ and $\neg(p \wedge q)$ can no longer be inferred
(they are all involved in an inconsistency).
Because of the above-mentioned limitations, such methods
are not directly applicable to the problem of computing con-
sistent query answers.
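For comparison, the same instance can be worked out under the repair semantics of this paper. The derivation below is a sketch: it writes $\models_c$ for consistent truth (truth in every repair) and $\mathit{Repairs}(r)$ for the set of repairs of $r$; the latter name is an assumption introduced here for readability.

```latex
\[
IC = \{\neg(p \wedge q)\}, \qquad r = \{p, q\}, \qquad
\mathit{Repairs}(r) = \{\{p\}, \{q\}\}
\]
\[
\{p\} \models p \vee q \ \text{ and } \ \{q\} \models p \vee q
\;\Longrightarrow\; r \models_c p \vee q
\]
\[
\{q\} \not\models p \ \text{ and } \ \{p\} \not\models q
\;\Longrightarrow\; r \not\models_c p, \quad r \not\models_c q
\]
\[
\{p\} \models \neg(p \wedge q) \ \text{ and } \ \{q\} \models \neg(p \wedge q)
\;\Longrightarrow\; r \models_c \neg(p \wedge q)
\]
```

Both repairs delete exactly one fact, so neither change set contains the other. Unlike the formalism discussed above, the integrity constraint itself remains consistently true, since every repair satisfies it by definition.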
Deontic logic [18, 14], a modal logic with operators capturing
permission and obligation, has been used for the specification
of integrity constraints. [14] used the obligation operator
$O$ to distinguish integrity constraints that have to hold
always from database facts that just happen to hold. [18]
used deontic operators to describe policies whose violations
can then be caught and handled. The issues of possible re-
pairs of constraint violations, their minimality and consistent
query answers are not addressed.
Gertz [11] described techniques and algorithms for computing
repairs of constraint violations. The issue of query
answering in the presence of an inconsistency is not addressed
in his work.
This paper represents a first step in the development of a
new research area dealing with the theory and applications
of consistent query answers in arbitrary, consistent or incon-
sistent, databases.
The theoretical results presented here are preliminary. We
have proved a general soundness result, but the results about
completeness and termination are still partial. Also, one
needs to look beyond purely universal constraints to include
general inclusion dependencies. In a forthcoming paper we
will also describe our methodology for using automated the-
orem proving, in particular, OTTER, for proving termina-
tion.
It appears that in order to obtain completeness for disjunctive
and existentially quantified queries one needs to move
beyond the $T_\omega$ operator on queries. Also, the upper bounds
on the size of $T_\omega$ and the lower bounds on the complexity of
computing consistent answers for different classes of queries
and constraints need to be studied. In [2] it is shown that in
the propositional case, SAT is reducible in polynomial time
to the problem of deciding if an arbitrary formula evaluated
in the propositional database does not give true as a correct
answer, that is, it becomes false in some repair. From this it
follows that this problem is NP-complete.
There is an interesting connection to modal logic. Consider
definition 7. We could write $r \models \Box Q(\bar{t})$, meaning
that $Q(\bar{t})$ is true in all repairs of $r$, the database instances
that are “accessible” from $r$. This is even more evident from
example 16, where, in essence, it is shown that $\Box \exists \bar{x}\, Q(\bar{x})$ is
not logically equivalent to $\exists \bar{x}\, \Box Q(\bar{x})$, which is what usually
happens in modal logic.
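The modal reading can be made concrete with a small instance. The block below is an illustrative sketch and is not taken from example 16: the predicate $P$, the constants $a$ and $b$, and the denial constraint are hypothetical choices made only to show why the two quantifier orders differ.

```latex
\[
r \models \Box Q(\bar{t}) \iff r' \models Q(\bar{t}) \ \text{ for every repair } r' \text{ of } r
\]
\[
IC = \{\neg(P(a) \wedge P(b))\}, \qquad r = \{P(a), P(b)\}, \qquad
\text{repairs of } r: \ \{P(a)\}, \ \{P(b)\}
\]
\[
r \models \Box\, \exists x\, P(x)
\quad \text{(each repair contains some tuple of } P\text{)}
\]
\[
r \not\models \exists x\, \Box P(x)
\quad \text{(} P(a) \text{ fails in } \{P(b)\}, \ P(b) \text{ fails in } \{P(a)\} \text{)}
\]
```

The witness for the existential quantifier changes from one repair to the other, which is why no single value of $x$ makes $P(x)$ true in all accessible instances.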
This research has been partially supported by FONDECYT
Grants (1971304 & 1980945) and NSF Grant (IRI-9632870).
Part of this research was done when the second author was
on sabbatical at the Technical University of Berlin (CIS Group)
with the financial support from DAAD and DIPUC.
[1] S. Agarwal, A.M. Keller, G. Wiederhold, and
K. Saraswat. Flexible Relation: An Approach for
Integrating Data from Multiple, Possibly Inconsistent
Databases. In IEEE International Conference on Data
Engineering, 1995.
[2] M. Arenas, L. Bertossi, and M. Kifer. APC and Query-
ing Inconsistent Databases. In preparation.
[3] C. Baral, S. Kraus, J. Minker, and V.S. Subrahma-
nian. Combining Knowledge Bases Consisting of First-
Order Theories. Computational Intelligence, 8:45–71,
1992.
[4] F. Bry. Query Answering in Information Systems with
Integrity Constraints. In IFIP WG 11.5 Working Conference
on Integrity and Control in Information Systems.
Chapman & Hall, 1997.
[5] U.S. Chakravarthy, J. Grant, and J. Minker. Logic-
Based Approach to Semantic Query Optimization.
ACM Transactions on Database Systems, 15(2):162–
207, 1990.
[6] S. Chaudhuri and U. Dayal. An Overview of
Data Warehousing and OLAP Technology. SIGMOD
Record, 26, March 1997.
[7] J. Chomicki and G. Saake, editors. Logics for
Databases and Information Systems. Kluwer Aca-
demic Publishers, Boston, 1998.
[8] T. Chou and M. Winslett. A Model-Based Belief Re-
vision System. J. Automated Reasoning, 12:157–208,
1994.
[9] Phan Minh Dung. Integrating Data from Possibly In-
consistent Databases. In International Conference on
Cooperative Information Systems, Brussels, Belgium,
1996.
[10] P. Gaerdenfors and H. Rott. Belief Revision. In D. M.
Gabbay, C. J. Hogger, and J. A. Robinson, editors,
Handbook of Logic in Artificial Intelligence and Logic
Programming, volume 4, pages 35–132. Oxford University
Press, 1995.
[11] M. Gertz. Diagnosis and Repair of Constraint Vio-
lations in Database Systems. PhD thesis, Universit¨at
Hannover, 1996.
[12] P. Godfrey, J. Grant, J. Gryz, and J. Minker. Integrity
Constraints: Semantics and Applications. In Chomicki
and Saake [7], chapter 9.
[13] H. Katsuno and A. Mendelzon. Propositional Knowl-
edge Base Revision and Minimal Change. Artificial
Intelligence, 52:263–294, 1991.
[14] K.L. Kwast. A Deontic Approach to Database In-
tegrity. Annals of Mathematics and Artificial Intelli-
gence, 9:205–238, 1993.
[15] J. Lin. A Semantics for Reasoning Consistently in the
Presence of Inconsistency. Artificial Intelligence, 86(1-
2):75–95, 1996.
[16] J. Lin and A. O. Mendelzon. Merging Databases un-
der Constraints. International Journal of Cooperative
Information Systems, 7(1):55–76, 1996.
[17] W.W. McCune. OTTER 3.0 Reference Manual and
Guide. Argonne National Laboratory, Technical Re-
port ANL-94/6, 1994.
[18] J.-J. Meyer, R. Wieringa, and F. Dignum. The Role
of Deontic Logic in the Specification of Information
Systems. In Chomicki and Saake [7], chapter 4.
[19] Jean-Marie Nicolas. Logic for Improving Integrity
Checking in Relational Data Bases. Acta Informatica,
18:227–253, 1982.
[20] J. Ullman. Principles of Database and Knowledge-
Base Systems, Vol. I. Computer Science Press, 1988.
Some technical lemmas are stated without proof. Full proofs
can be found in the file in
.
Lemma 1. If $r \models T_\omega(l(\bar{a}))$, where $l(\bar{a})$ is a ground literal,
then for every repair $r'$ of $r$, it holds $r' \models l(\bar{a})$.
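Lemma 2. If $r \models T_\omega(\bigwedge_{i=1}^{n} l_i(\bar{a}_i))$, where $l_i(\bar{a}_i)$ is a ground
literal, then for every repair $r'$ of $r$, it holds $r' \models \bigwedge_{i=1}^{n} l_i(\bar{a}_i)$.
Lemma 3. If $r \models T_\omega(\bigvee_{i=1}^{n} C_i(\bar{a}_i))$, with $C_i(\bar{a}_i)$ a conjunction
of literals, then for every repair $r'$ of $r$, $r' \models \bigvee_{i=1}^{n} C_i(\bar{a}_i)$.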
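Lemma 4. Let $Q(\bar{x})$ be a universal query. If $r \models T_\omega(Q(\bar{t}))$, for
a ground tuple $\bar{t}$, then for every repair $r'$ of $r$, $r' \models Q(\bar{t})$.
Lemma 5. Let $Q(\bar{x})$ be a domain independent query. If
$r \models T_\omega(Q(\bar{t}))$, for a ground tuple $\bar{t}$, then for every repair $r'$ of $r$,
$r' \models Q(\bar{t})$.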
Proof of Theorem 1: Lemmas 4 and 5.
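Proposition 2. Given a set IC of integrity constraints and a
ground clause $\bigvee_{i=1}^{m} l_i(\bar{t}_i)$, if $IC \not\models_{DB} \bigvee_{i=1}^{m} l_i(\bar{t}_i)$ and, for every
repair $r'$ of $r$, $r' \models \bigvee_{i=1}^{m} l_i(\bar{t}_i)$, then $r \models \bigvee_{i=1}^{m} l_i(\bar{t}_i)$.
Proof of Proposition 2: Assume that $r \not\models \bigvee_{i=1}^{m} l_i(\bar{t}_i)$. By
hypothesis $IC \not\models_{DB} \bigvee_{i=1}^{m} l_i(\bar{t}_i)$, thus there exists an instance
of the database $r'$ such that $r' \models IC \cup \{\neg \bigvee_{i=1}^{m} l_i(\bar{t}_i)\}$. Let us
consider the set of database instances
\[ R = \{ r^{*} \mid r^{*} \models IC \text{ and } \Delta(r, r^{*}) \subseteq \Delta(r, r') \}. \]
We know that $\Delta(r, r')$ is finite, therefore there exists $r_0 \in R$
such that $\Delta(r, r_0)$ is minimal. Then, $r_0$ is a repair of $r$.
For every $1 \leq i \leq m$, if $l_i(\bar{t}_i)$ is $p(\bar{t})$ or $\neg p(\bar{t})$, then
$p(\bar{t}) \notin \Delta(r, r')$. Using this fact we conclude that $p(\bar{t}) \notin \Delta(r, r_0)$.
Therefore, $r \models \bigvee_{i=1}^{m} l_i(\bar{t}_i)$ if and only if $r_0 \models \bigvee_{i=1}^{m} l_i(\bar{t}_i)$. But
we assumed that $r \not\models \bigvee_{i=1}^{m} l_i(\bar{t}_i)$, then $r_0 \not\models \bigvee_{i=1}^{m} l_i(\bar{t}_i)$; a
contradiction.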
Proof of Theorem 2: From theorem 3.
Proof of Corollary 2: In this case it holds:
1. For every tuple $\bar{a}$, $IC \not\models_{DB} P_i(\bar{a})$, because the empty
database instance (which has only empty base relations)
satisfies IC, but not $P_i(\bar{a})$.
2. For every tuple $\bar{a}$, $IC \not\models_{DB} \neg P_i(\bar{a})$, since the database
instance $r_i^{\bar{a}}$, where the relation $P_i$ contains only the tuple
$\bar{a}$ and the other relations are empty, satisfies IC, but
not $\neg P_i(\bar{a})$.
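Proof of Theorem 3: Suppose that $r \models_c l(\bar{t})$. Let $r'$ be a repair
of $r$; we have that $r' \models l(\bar{t})$. By proposition 1 we have that
$r' \models T_n(l(\bar{t}))$, that is
\[ r' \models l(\bar{t}) \wedge \bigwedge_{i=1}^{m} \forall \Big( \bigvee_{j=1}^{m_i} l_{i,j}(\bar{t}, \bar{x}_{i,j}) \vee \psi_i(\bar{t}, \bar{x}_i) \Big) \qquad (6) \]
We want to prove that for every $i$ and for every sequence
of ground tuples $\bar{a}_i, \bar{a}_{i,1}, \ldots, \bar{a}_{i,m_i}$:
\[ r \models \bigvee_{j=1}^{m_i} l_{i,j}(\bar{t}, \bar{a}_{i,j}) \vee \psi_i(\bar{t}, \bar{a}_i) \qquad (7) \]
To do this, first we are going to prove that for every $i \in S$
and for every sequence of ground tuples $\bar{a}_i, \bar{a}_{i,1}, \ldots, \bar{a}_{i,m_i}$:
\[ r \models \bigvee_{j=1}^{m_i} l_{i,j}(\bar{t}, \bar{a}_{i,j}) \vee \psi_i(\bar{t}, \bar{a}_i) \qquad (8) \]
This is immediately obtained when $r \models \psi_i(\bar{t}, \bar{a}_i)$. Assume
that $r \not\models \psi_i(\bar{t}, \bar{a}_i)$. We know that $\psi_i$ only mentions
built-in predicates, thus for every repair $r'$ of $r$ we have that
$r' \not\models \psi_i(\bar{t}, \bar{a}_i)$. Therefore, by (6) we conclude that for every
repair $r'$ of $r$:
\[ r' \models \bigvee_{j=1}^{m_i} l_{i,j}(\bar{t}, \bar{a}_{i,j}) \vee \psi_i(\bar{t}, \bar{a}_i) \]
By proposition 2 we conclude (8). Thus we have that
\[ r \models l(\bar{t}) \wedge \bigwedge_{i \in S} \forall \Big( \bigvee_{j=1}^{m_i} l_{i,j}(\bar{t}, \bar{x}_{i,j}) \vee \psi_i(\bar{t}, \bar{x}_i) \Big) \]
but by the second condition in the hypothesis of the theorem
we conclude that:
\[ r \models l(\bar{t}) \wedge \bigwedge_{i=1}^{m} \forall \Big( \bigvee_{j=1}^{m_i} l_{i,j}(\bar{t}, \bar{x}_{i,j}) \vee \psi_i(\bar{t}, \bar{x}_i) \Big) \]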
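Proof of Theorem 4: Suppose that IC is acyclic, then
there exists $f$ as in definition 14. We are going to prove
by induction on $k$ that for every literal name $l$, if $f(l) = k$,
then $T_{k+1}(l(\bar{x})) = T_{k+2}(l(\bar{x}))$.
(I) If $k = 0$. We know that for every literal name $l'$,
$f(l') \geq 0$. Therefore, every integrity constraint containing $\neg l$
is of the form $\neg l(\bar{x}) \vee \psi(\bar{y})$, where $\psi$ only mentions built-in
predicates. This is because if there were any other literal
$l'$ in the integrity constraint, we would have $f(l') < f(l) = 0$.
Then $T_1(l(\bar{x})) = T_2(l(\bar{x}))$.
(II) Suppose that the property is true for every $m < k$. We
know that $T_{k+2}(l(\bar{x}))$ is of the form:
\[ l(\bar{x}) \wedge \bigwedge_{i=1}^{m} \bar{Q}_i \Big( \bigvee_{j=1}^{m_i} T_{k+1}(l_{i,j}(\bar{x}_{i,j})) \vee \psi_i(\bar{x}_i) \Big) \]
where $\bar{Q}_i$ is a sequence of quantifiers over all the variables
$\bar{x}_{i,1}, \ldots, \bar{x}_{i,m_i}, \bar{x}_i$ not appearing in $\bar{x}$, and $T_{k+1}(l(\bar{x}))$ is of the
form:
\[ l(\bar{x}) \wedge \bigwedge_{i=1}^{m} \bar{Q}_i \Big( \bigvee_{j=1}^{m_i} T_{k}(l_{i,j}(\bar{x}_{i,j})) \vee \psi_i(\bar{x}_i) \Big) \]
By definition of $f$, we know that for every literal name $l_{i,j}$
in the previous formulas, $f(l_{i,j}) < k$. Then by induction
hypothesis $T_k(l_{i,j}(\bar{x}_{i,j})) = T_{k+1}(l_{i,j}(\bar{x}_{i,j}))$ (since if $T_m(l(\bar{x})) =
T_{m+1}(l(\bar{x}))$, then for every $n \geq m$, $T_n(l(\bar{x})) = T_{n+1}(l(\bar{x}))$).
($\Leftarrow$) Suppose that for every literal name $l$, $T_\omega(l(\bar{x}))$ is finite.
Then for every literal name $l$ there exists a first natural
number $k$ such that $T_k(l(\bar{x})) = T_{k+1}(l(\bar{x}))$. Let us define
a function $f$, from the literal names into the natural
numbers, by $f(l) = k$ ($k$ as before). We can show that this
is a well defined function that behaves as in definition 14:
if $\bigvee_{i=1}^{m} l_i(\bar{x}_i) \vee \psi(\bar{y}) \in IC$, then for every $1 \leq s \leq m$,
$T_{f(\neg l_s)}(\neg l_s(\bar{x}_s))$ is of the form
\[ \neg l_s(\bar{x}_s) \wedge \bar{Q} \Big( \bigvee_{i=1}^{s-1} T_{f(\neg l_s)-1}(l_i(\bar{x}_i)) \vee \bigvee_{i=s+1}^{m} T_{f(\neg l_s)-1}(l_i(\bar{x}_i)) \vee \psi(\bar{y}) \Big) \wedge \theta(\bar{x}_s) \qquad (9) \]
where $\bar{Q}$ is a sequence of quantifiers over all the variables
$\bar{x}_1, \ldots, \bar{x}_m, \bar{y}$ not appearing in $\bar{x}_s$, and $T_{f(\neg l_s)+1}(\neg l_s(\bar{x}_s))$ is
of the form
\[ \neg l_s(\bar{x}_s) \wedge \bar{Q} \Big( \bigvee_{i=1}^{s-1} T_{f(\neg l_s)}(l_i(\bar{x}_i)) \vee \bigvee_{i=s+1}^{m} T_{f(\neg l_s)}(l_i(\bar{x}_i)) \vee \psi(\bar{y}) \Big) \wedge \theta(\bar{x}_s) \qquad (10) \]
By definition of $f$, $T_{f(\neg l_s)}(\neg l_s(\bar{x}_s)) = T_{f(\neg l_s)+1}(\neg l_s(\bar{x}_s))$. Then,
by the form of (9) and (10), we conclude that for every $i \neq s$,
$T_{f(\neg l_s)-1}(l_i(\bar{x}_i)) = T_{f(\neg l_s)}(l_i(\bar{x}_i))$, and then, again by definition
of $f$, $f(l_i) < f(\neg l_s)$.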
Proof of Corollary 3: The following stratification function
from literals to can be defined: f Pi0 and f Pj1,
where PiPjare relation names.
Proof of Theorem 5: For uniform constraints the residues
do not contain quantifiers. Therefore $T_n(l(\bar{x}))$ for every $n \geq 0$
is quantifier-free and contains only the variables that occur
in $\bar{x}$. There are only finitely many inequivalent formulas with
this property, and thus $T_\omega(l(\bar{x}))$ is finite.
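Lemma 6. If $T_n(l(\bar{x}))$ is of the form:
\[ l(\bar{x}) \wedge \bigwedge_{i=1}^{m} \forall \bar{x}_i \bar{y}_i \big( C_i(\bar{x}, \bar{x}_i) \vee \psi_i(\bar{x}, \bar{y}_i) \big) \]
then $T_{n+1}(l(\bar{x}))$ is of the form:
\[ l(\bar{x}) \wedge \bigwedge_{i=1}^{m} \forall \bar{x}_i \bar{y}_i \big( T_1(C_i(\bar{x}, \bar{x}_i)) \vee \psi_i(\bar{x}, \bar{y}_i) \big) \]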
Lemma 7. If for a ground tuple ¯a,Tnl¯ak
j1lj¯a¯zj,
then Tn1l¯ak
j1T1lj¯a¯zj.
Proof of Theorem 6: Suppose that for a natural number n,
¯xTnl¯xTn1l¯xis a valid sentence. We are going
to prove that for every m n, ¯xTml¯xTm1l¯xis
a valid sentence, by induction on m.
(I) If m n, by hypothesis.
(II) Suppose that ¯xTml¯xTm1l¯xis a valid sen-
tence. For every clause k
j1lj¯x¯zjψ¯x¯zin Tm1l¯x
and for every ground tuple ¯awe have that
Tml¯ak
j1lj¯a¯zjψ¯a¯z
By lemma 7 and considering that ψonly mentions built-in
predicates we have that Tm1l¯ak
j1T1lj¯a¯zj
ψ¯a¯z, and from this and lemma 6 we can conclude that
¯xTm1l¯xTm2l¯xis a validsentence.
Proof of Theorem 7: Let $Q(\bar{x})$ be a domain independent
query and $r$ a database instance. Define $A_n = \{\bar{t} \mid r \models T_n(Q(\bar{t}))\}$.
We know that for every $n$: $A_{n+1} \subseteq A_n$, therefore $A = \{A_i \mid i < \omega\}$
is a family of subsets of $A_0$. But $A_0$ is finite because $Q(\bar{x})$
is a domain independent query. Thus, there exists a minimal
element $A_m$ in $A$. For this element, it holds that for every
$k \geq m$: $A_m = A_k$, since $A_k \subseteq A_m$.