# Laconic schema mappings: computing core universal solutions by means of SQL queries

**ABSTRACT** We present a new method for computing core universal solutions in data exchange settings specified by source-to-target dependencies, by means of SQL queries. Unlike previously known algorithms, which are recursive in nature, our method can be implemented directly on top of any DBMS. Our method is based on the new notion of a laconic schema mapping. A laconic schema mapping is a schema mapping for which the canonical universal solution is the core universal solution. We give a procedure by which every schema mapping specified by FO s-t tgds can be turned into a laconic schema mapping specified by FO s-t tgds that may refer to a linear order on the domain of the source instance. We show that our results are optimal, in the sense that the linear order is necessary and the method cannot be extended to schema mappings involving target constraints.



arXiv:0903.1953v1 [cs.DB] 11 Mar 2009

Laconic schema mappings: computing core universal solutions

by means of SQL queries⋆

Balder ten Cate¹, Laura Chiticariu², Phokion Kolaitis³, and Wang-Chiew Tan⁴

¹ University of Amsterdam

² IBM Almaden

³ UC Santa Cruz and IBM Almaden

⁴ UC Santa Cruz


1 Introduction

We present a new method for computing core universal solutions in data exchange settings specified by

source-to-target dependencies, by means of SQL queries. Unlike previously known algorithms, which

are recursive in nature, our method can be implemented directly on top of any DBMS. Our method is

based on the new notion of a laconic schema mapping. A laconic schema mapping is a schema mapping

for which the canonical universal solution is the core universal solution. We give a procedure by which

every schema mapping specified by FO s-t tgds can be turned into a laconic schema mapping specified

by FO s-t tgds that may refer to a linear order on the domain of the source instance.

Outline of the paper: In Section 2, we recall basic notions and facts about schema mappings. Section 3

explains what it means to compute a target instance by means of SQL queries, and we state our main

result. Section 4 introduces the notion of laconicity, and contains some initial observations. In Section 5,

we present our main result, namely a method for transforming any schema mapping specified by FO

s-t tgds into a laconic schema mapping specified by FO s-t tgds assuming a linear order. In Section 6,

we show that our results cannot be extended to the case with target constraints.

2 Preliminaries

In this section, we recall basic notions from data exchange and fix our notation.

⋆ This work was carried out during a visit of the first author to UC Santa Cruz and IBM Almaden. The work of the first author was funded by the Netherlands Organization of Scientific Research (NWO) grant 639.021.508 and NSF grant IIS-0430994. The work of the third author was partly funded by NSF grant IIS-0430994. The work of the fourth author was partly funded by NSF CAREER Award IIS-0347065 and NSF grant IIS-0430994.


2.1 Instances and homomorphisms

Fix disjoint infinite sets of constant values Cons and null values Nulls, and let < be a linear order on

Cons. We consider instances whose values are from Cons ∪ Nulls. We use dom(I) to denote the set of

values that occur in facts in the instance I. A homomorphism h : I → J, with I,J instances of the

same schema, is a function h : Cons ∪ Nulls → Cons ∪ Nulls with h(a) = a for all a ∈ Cons, such that for

all relations R and all tuples of (constant or null) values $(v_1,\ldots,v_n) \in R^I$, $(h(v_1),\ldots,h(v_n)) \in R^J$.

Instances I, J are homomorphically equivalent if there are homomorphisms h : I → J and h′ : J → I.

An isomorphism h : I ≅ J is a homomorphism that is a bijection between dom(I) and dom(J) and

that preserves truth of atomic formulas in both directions. Intuitively, nulls act as placeholders for

actual (constant) values, and a homomorphism from I to J captures the fact that J “contains more,

or at least as much information” as I.
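A brute-force search makes the definition concrete. The following Python sketch is illustrative only: it assumes that facts are encoded as `(relation, tuple)` pairs and that nulls are strings starting with `_`. Both encodings, and the name `find_homomorphism`, are choices made here and are not notation from the paper. The search is exponential in the number of nulls and is meant for tiny instances.

```python
from itertools import product

def is_null(value):
    # Assumed encoding: strings starting with "_" are nulls, everything else is a constant.
    return isinstance(value, str) and value.startswith("_")

def dom(instance):
    """The set of values occurring in the facts of an instance."""
    return {v for _, args in instance for v in args}

def find_homomorphism(I, J):
    """Return a map on the nulls of I that sends every fact of I to a fact of J,
    or None if there is none. Constants are always mapped to themselves."""
    nulls = sorted(v for v in dom(I) if is_null(v))
    targets = sorted(dom(J))
    for image in product(targets, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if all((rel, tuple(h.get(v, v) for v in args)) in J for rel, args in I):
            return h
    return None

# I = {S(a,_N1), S(a,_N2)} maps into J = {S(a,b)}, but not the other way round.
I = {("S", ("a", "_N1")), ("S", ("a", "_N2"))}
J = {("S", ("a", "b"))}
print(find_homomorphism(I, J))
print(find_homomorphism(J, I))
```

In this example J contains at least as much information as I, since both nulls of I can be instantiated by the constant b, while J cannot be mapped back because I has no fact S(a,b).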

The fact graph of an instance I is the graph whose nodes are the facts $R\mathbf{v}$ (with R a k-ary relation

and $\mathbf{v} \in (\mathsf{Cons} \cup \mathsf{Nulls})^k$, k ≥ 0) true in I, and such that there is an edge between two facts if they

have a null value in common.

We will denote by CQ, UCQ, and FO the set of conjunctive queries, unions of conjunctive queries,

and first-order queries, respectively, and CQ<, UCQ<, and FO< are defined similarly, except that the

queries may refer to the linear order. Thus, unless indicated explicitly, it is assumed that queries do

not refer to the linear order. For any query q and instance I, we denote by q(I) the answers of q in I,

and we denote by q(I)↓ the ground answers of q, i.e., $q(I){\downarrow} = q(I) \cap \mathsf{Cons}^k$ for k the arity of q.

2.2 Schema mappings, universal solutions, and certain answers

Let S and T be disjoint schemas, called the source schema and the target schema. As usual in data

exchange, whenever we speak of a source instance, we will mean an instance of S whose values belong

to Cons, and when we speak of a target instance, we will mean an instance of T whose values may come

from Cons ∪ Nulls.

A schema mapping is a triple $M = (S, T, \Sigma_{st})$, where S and T are the source and target schemas and $\Sigma_{st}$ is a finite set of sentences of some logical language defining a class of pairs of instances ⟨I, J⟩. Here, ⟨I, J⟩ denotes the union of a source instance I and a target instance J, which is itself an instance over the joint schema S ∪ T, and the logical languages we consider are presented below. Two schema mappings, $M = (S, T, \Sigma_{st})$ and $M' = (S, T, \Sigma'_{st})$, are said to be logically equivalent if $\Sigma_{st}$ and $\Sigma'_{st}$ are logically equivalent, i.e., satisfied by the same pairs of instances. Given a schema mapping $M = (S, T, \Sigma_{st})$ and a source instance I, a solution for I with respect to M is a target instance J such that ⟨I, J⟩ satisfies $\Sigma_{st}$. We denote the set of solutions for I with respect to M by $\mathrm{Sol}_M(I)$, or simply Sol(I) when the schema mapping is clear from the context.

The concrete logical languages that we will consider for the specification of $\Sigma_{st}$ are the following. A source-to-target tuple generating dependency (s-t tgd) is a first-order sentence of the form ∀x(φ(x) → ∃y.ψ(x,y)), where φ is a conjunction of atomic formulas over S and ψ is a conjunction of atomic formulas over T, such that each variable in x occurs in φ. A more general class of constraints called FO s-t tgds is defined analogously, except that the antecedent is allowed to be any FO query over S. Similarly, L s-t tgds can be defined for any query language L. A LAV s-t tgd, finally, is an s-t tgd in which φ is a single atomic formula. To simplify notation, we will typically drop the universal quantifiers when writing (L) s-t tgds.

Given a source instance I, a schema mapping M, and a target query q, we will denote by $\mathrm{certain}_{M,q}(I)$ the set of certain answers to q in I with respect to M, i.e., the intersection $\bigcap_{J \in \mathrm{Sol}_M(I)} q(J)$. In other words, a tuple of values is a certain answer to q if it belongs to the set of answers of q, no matter which solution of I one picks. There are two methods to compute certain answers to a conjunctive query. The first method uses universal solutions and the second uses query rewriting.

A universal solution for a source instance I with respect to a schema mapping M is a solution $J \in \mathrm{Sol}_M(I)$ such that, for every $J' \in \mathrm{Sol}_M(I)$, there is a homomorphism from J to J′. It was shown in [1] that the certain answers for a conjunctive target query can be obtained simply by evaluating

the query in a universal solution. Moreover, universal solutions are guaranteed to exist for schema

mappings specified by L s-t tgds, for any query language L.

Theorem 1 ([1]). For all schema mappings M, source instances I, conjunctive queries q, and universal solutions J ∈ Sol(I), $\mathrm{certain}_{M,q}(I) = q(J){\downarrow}$.

Theorem 2 ([1]). For every schema mapping M specified by L s-t tgds, with L any query language,

and for every source instance I, there is a universal solution for I with respect to M.

Theorem 2 was shown in [1] for schema mappings specified by s-t tgds but the same argument

applies to schema mappings specified by L s-t tgds, for any query language L. We will discuss concrete

methods for constructing universal solutions in Section 3.
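The first method (Theorem 1) can be illustrated with a small Python sketch: evaluate the conjunctive query on a universal solution and keep only the answers without nulls. The encoding is an assumption made for this example: nulls are strings starting with `_`, query variables are strings starting with `?`, and the helper names `evaluate_cq` and `ground_answers` are hypothetical.

```python
def is_null(value):
    # Assumed encoding: strings starting with "_" are nulls.
    return isinstance(value, str) and value.startswith("_")

def is_var(term):
    # Assumed encoding: strings starting with "?" are query variables.
    return isinstance(term, str) and term.startswith("?")

def match(terms, args, env):
    """Extend the variable binding env so that terms match args, or return None."""
    env = dict(env)
    for term, value in zip(terms, args):
        if is_var(term):
            if env.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return env

def evaluate_cq(head, atoms, instance):
    """All answers q(J) of the conjunctive query head :- atoms; nulls behave like values."""
    answers = set()

    def extend(i, env):
        if i == len(atoms):
            answers.add(tuple(env[v] for v in head))
            return
        rel, terms = atoms[i]
        for fact_rel, args in instance:
            if fact_rel == rel and len(args) == len(terms):
                new_env = match(terms, args, env)
                if new_env is not None:
                    extend(i + 1, new_env)

    extend(0, {})
    return answers

def ground_answers(answers):
    """q(J)↓: the answers that consist of constants only."""
    return {t for t in answers if not any(is_null(v) for v in t)}

# A universal solution for I = {R(a,b)} under R x1 x2 -> ∃y.(S x1 y ∧ T x2 y).
J = {("S", ("a", "_N")), ("T", ("b", "_N"))}
join = evaluate_cq(("?x1", "?x2"), [("S", ("?x1", "?y")), ("T", ("?x2", "?y"))], J)
print(ground_answers(join))
```

The join query returns the certain answer (a, b), whereas the query returning all S-tuples has the answer (a, _N) on J and therefore no certain answers.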

The second method for computing certain answers to conjunctive queries is by rewriting the given

target query to a query over the source that directly computes the certain answers to the original

query.

Theorem 3. Let L be any of UCQ, UCQ<, FO, FO<. Then for every schema mapping M specified by s-t tgds and for every L-query q over T, one can compute in exponential time an L-query over S defining $\mathrm{certain}_{M,q}$.

There are various ways in which such certain answer queries can be obtained. One possibility is to split up the schema mapping M into a composition $M_1 \circ M_2$, with $M_1$ specified by full s-t tgds and $M_2$ specified by LAV s-t tgds, and then to successively apply the known query rewriting techniques of MiniCon [8] and full s-t tgd unfolding (cf. [7]). In [9], an alternative rewriting method was given for the case of L = FO(<), which can be used to compute in polynomial time an FO(<) source query q′ defining $\mathrm{certain}_{M,q}$ over source instances whose domain contains at least two elements.

2.3 Core universal solutions

A source instance can have many universal solutions. Among these, the core universal solution plays

a special role. A target instance J is said to be a core if there is no proper subinstance J′ ⊆ J

and homomorphism h : J → J′. There is an equivalent definition in terms of retractions. A subinstance

J′ ⊆ J is called a retract of J if there is a homomorphism h : J → J′ such that for all a ∈ dom(J′),

h(a) = a. The corresponding homomorphism h is called a retraction. A retract is proper if it is a

proper subinstance of the original instance. A core of a target instance J is a retract of J that has

itself no proper retracts. Every (finite) target instance has a unique core, up to isomorphism. Moreover,

two instances are homomorphically equivalent iff they have isomorphic cores. It follows that, for every

schema mapping M, every source instance has at most one core universal solution up to isomorphism.

Indeed, if the schema mapping M is specified by FO s-t tgds then each source instance has exactly

one core universal solution up to isomorphism [3]. We will therefore freely speak of the core universal

solution.
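The definition of a core suggests a naive procedure: repeatedly look for an endomorphism that maps the instance onto a proper subinstance, and stop when none exists. The Python sketch below follows that idea under assumed encodings (facts as `(relation, tuple)` pairs, nulls as strings starting with `_`); the function name `core` is hypothetical. It takes exponential time and is not one of the polynomial-time algorithms from the literature.

```python
from itertools import product

def is_null(value):
    # Assumed encoding: strings starting with "_" are nulls.
    return isinstance(value, str) and value.startswith("_")

def apply(h, instance):
    """Image of an instance under a map h on nulls (constants stay fixed)."""
    return {(rel, tuple(h.get(v, v) for v in args)) for rel, args in instance}

def core(J):
    """Shrink J along endomorphisms until no endomorphism has a proper subinstance as image."""
    J = set(J)
    while True:
        values = sorted({v for _, args in J for v in args})
        nulls = [v for v in values if is_null(v)]
        for image in product(values, repeat=len(nulls)):
            h = dict(zip(nulls, image))
            image_of_J = apply(h, J)
            if image_of_J < J:          # h maps J into a proper subinstance of J
                J = image_of_J
                break
        else:
            return J                    # no shrinking endomorphism: J is a core

# S(a,_N1) and S(a,_N2) are subsumed by S(a,b), so the core is {S(a,b)}.
print(core({("S", ("a", "_N1")), ("S", ("a", "_N2")), ("S", ("a", "b"))}))
```

Because the result is only determined up to isomorphism, which of several interchangeable nulls survives depends on the search order.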

It has been convincingly argued that, among all universal solutions for a source instance, the core

universal solution is the preferred solution. One important reason is that the core universal solution is

the smallest universal solution: if J is the core universal solution for a source instance I with respect to

a schema mapping M, and J′ is any other universal solution for I, i.e., one that is not a core,

then |J| < |J′|. Consequently, the core universal solution is the universal solution that is least expensive

to materialize. We add to this a second virtue of the core universal solution, namely that, among all

universal solutions, it is the most conservative one in terms of the answers that it assigns to conjunctive

queries with inequalities. We propose another reason to be interested in the core universal solution,

namely that it is the solution that satisfies the most dependencies. In many practical data exchange

settings, one is interested in solutions satisfying certain target dependencies. One way to obtain such

solutions is to include the relevant target dependencies in the specification of the schema mapping.


If the target dependencies satisfy certain syntactic requirements (in particular, if they form a weakly

acyclic set of target tgds and target egds), then a solution satisfying these target dependencies can be

obtained by means of the chase. On the other hand, sometimes it happens that the universal solution

one constructs without taking into account the target dependencies happens to satisfy the target

dependencies. Whether this happens depends very much on which universal solution is constructed.

For example if M is the schema mapping specified by the s-t tgd Rx → ∃y.Syx, I is any source instance

and J a universal solution, then the first attribute of S is not necessarily a key in J. However, if J is

the core universal solution, then it will be a key. In fact, it turns out that the core universal solution is

the universal solution that maximizes the set of valid target dependencies. To make this precise, let a disjunctive target dependency be a first-order sentence of the form $\forall \mathbf{x}\,(\phi(\mathbf{x}) \to \bigvee_i \exists \mathbf{y}_i.\psi_i(\mathbf{x},\mathbf{y}_i))$, where $\phi, \psi_i$ are conjunctions of atomic formulas over the target schema T and/or equalities. Then we have:

Theorem 4. Let M be any schema mapping, I be any source instance, J the core universal solution

of I, and J′ any other universal solution of I, i.e., one that is not a core. Then

1. Every disjunctive dependency valid on J′ is valid on J, and

2. Some disjunctive dependency valid on J is not valid on J′.

Proof. The first half of the result follows from the fact that J is a retract of J′ and disjunctive dependencies are preserved when going from an instance to one of its retracts. This is shown in [5] for non-disjunctive embedded dependencies, but the same argument applies to disjunctive dependencies. To prove the second half, pick fresh variables x, one for each value (constant or null) in the domain of J′, and let ψ(x) be the conjunction of all facts that are true in J′ under the natural assignment. Consider the disjunctive dependency $\forall \mathbf{x}\,(\psi(\mathbf{x}) \to \bigvee_{i \neq j} (x_i = x_j))$. This disjunctive dependency is clearly not true in J′ but it is trivially true in J, since J, being a proper retract of J′, contains strictly fewer nulls than J′. □

Concerning the complexity of computing core universal solutions, we have the following:

Theorem 5 ([3]). For fixed schema mappings specified by FO< s-t tgds, given a source instance, a

core universal solution can be computed in polynomial time.

Strictly speaking, in [3] this was only shown for schema mappings specified by s-t tgds. However, the

same argument applies to FO< s-t tgds. In fact, this holds for richer data exchange settings, where the

schema mapping specification may also contain target constraints (specifically, target egds and weakly

acyclic target tgds). Moreover, several algorithms for obtaining core universal solutions in polynomial

time have been proposed.

3 Computing universal solutions with SQL queries

There is a discrepancy between the methods for computing universal solutions commonly presented

in the data exchange literature, and the methods actually employed by data exchange tools. In the

data exchange literature, methods for computing universal solutions are often presented in the form

of chase procedures. In practical implementations such as Clio, on the other hand, it is common to

compute universal solutions using SQL queries, thus leveraging the capabilities of existing DBMSs. We

briefly review here both approaches, and explain how canonical universal solutions can be computed

using SQL queries.

The simplest and most well-known method for computing universal solutions is the naive chase.⁵ The algorithm is described in Figure 1. For a source instance I and schema mapping M specified by FO(<) s-t tgds, the result of applying the naive chase is called the canonical universal solution of I with respect to M. Note that the result of the naive chase is unique up to isomorphism, since it depends only on the exact choice of fresh nulls. Also note that, even if two schema mappings are logically equivalent, they may assign different canonical universal solutions to a source instance.

⁵ There are also other, more sophisticated versions of the chase, but they will not be relevant for most of what we discuss, since we will be interested in computing solutions by means of SQL queries anyway. We will briefly mention one variant of the chase later on.

Input: A schema mapping M = (S,T,Σst) and a source instance I

Output: A target instance J that is a universal solution for I w.r.t. M

J := ∅;

for all ∀x(φ(x) → ∃y.ψ(x,y)) ∈ Σst do

for all tuples of constants a such that I |= φ(a) do

pick a fresh null value Ni for each yi and add the facts in ψ(a,N) to J

end for

end for;

return J

Fig. 1. Naive chase method for computing universal solutions.

We will now discuss how canonical universal solutions can be equivalently computed by means of SQL queries. The idea is very simple, and before giving a rigorous presentation, we illustrate it by an example. Consider the schema mapping specified by the s-t tgds

$$R x_1 x_2 \to \exists y.(S x_1 y \land T x_2 y)$$

$$R x x \to S x x$$

We first Skolemize the dependencies, and split them so that the right-hand side consists of a single conjunct. In this way, we get

$$R x_1 x_2 \to S x_1 f(x_1,x_2)$$

$$R x_1 x_2 \to T x_2 f(x_1,x_2)$$

$$R x x \to S x x$$

Next, for each target relation R we collect the dependencies that contain R in the right-hand side, and we interpret these as constituting a definition of R. In this way, we get the following definitions of S and T:

$$S := \{(x_1, f(x_1,x_2)) \mid R x_1 x_2\} \cup \{(x,x) \mid R x x\}$$

$$T := \{(x_2, f(x_1,x_2)) \mid R x_1 x_2\}$$
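As a cross-check of these definitions, the following Python sketch runs the naive chase of Figure 1 on the example mapping and compares the result with the Skolem-term definitions of S and T. It is a sketch under stated assumptions: tgd bodies are encoded as Python functions returning satisfying assignments, nulls are strings starting with `_`, and all names (`naive_chase`, `skolem_definitions`, `tgds`) are hypothetical.

```python
from itertools import count

def naive_chase(tgds, source):
    """Naive chase: each tgd is (body, existential_vars, head_atoms)."""
    fresh = count()
    target = set()
    for body, exist_vars, head in tgds:
        for assignment in body(source):          # every a with I |= φ(a)
            env = dict(assignment)
            for y in exist_vars:                 # one fresh null per existential variable
                env[y] = f"_N{next(fresh)}"
            for rel, variables in head:
                target.add((rel, tuple(env[v] for v in variables)))
    return target

# R x1 x2 -> ∃y.(S x1 y ∧ T x2 y)   and   R x x -> S x x
tgds = [
    (lambda I: [{"x1": a, "x2": b} for a, b in sorted(I["R"])],
     ["y"], [("S", ("x1", "y")), ("T", ("x2", "y"))]),
    (lambda I: [{"x": a} for a, b in sorted(I["R"]) if a == b],
     [], [("S", ("x", "x"))]),
]

def skolem_definitions(I):
    """S and T as defined from the Skolemized dependencies; f(x1,x2) is encoded as a null."""
    def f(x1, x2):
        return f"_f({x1},{x2})"
    S = {(x1, f(x1, x2)) for x1, x2 in I["R"]} | {(x, x) for x, y in I["R"] if x == y}
    T = {(x2, f(x1, x2)) for x1, x2 in I["R"]}
    return {("S", t) for t in S} | {("T", t) for t in T}

source = {"R": {("a", "b"), ("c", "c")}}
chased = naive_chase(tgds, source)
defined = skolem_definitions(source)
print(sorted(chased))
print(sorted(defined))
```

On this instance the two results coincide after renaming the chase null created for (a, b) to f(a,b) and the one created for (c, c) to f(c,c), which is what "unique up to isomorphism" means here.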

In general, the definition of a k-ary target relation R ∈ T will be of the shape

$$R := \{(t_1(\mathbf{x}),\ldots,t_k(\mathbf{x})) \mid \phi(\mathbf{x})\} \cup \cdots \cup \{(t'_1(\mathbf{x}'),\ldots,t'_k(\mathbf{x}')) \mid \phi'(\mathbf{x}')\} \qquad (1)$$

where $t_1,\ldots,t_k,\ldots,t'_1,\ldots,t'_k$ are terms and $\phi,\ldots,\phi'$ are first-order queries over the source schema. Since FO queries correspond to SQL queries, one can easily use a relational DBMS in order to compute the tuples in the relation R.

The general idea behind the construction of the FO queries should be clear from the example. However, giving a precise definition of what it means to compute a target instance by means of SQL queries requires a bit of care. We need to assume some structure on the set of nulls Nulls. Fix a countably infinite set of function symbols of arity n, for each n ≥ 0. For any set X, denote by Terms[X] the set of all terms built up from elements of X using these function symbols, and denote by PTerms[X] ⊆ Terms[X] the set of all proper terms, i.e., those with at least one occurrence of a function symbol. For instance, if g is a unary function and h is a binary function, then h(g(x),y), g(x) and x belong to Terms[{x,y}], but only the first two belong to PTerms[{x,y}]. It is important to distinguish between proper terms built up from constants on the one hand and constants on the other hand, as the former will be treated as nulls and the latter not. More precisely, we assume that PTerms[Cons] ⊆ Nulls. Recall that Cons ∩ Nulls = ∅.

Definition 1 (L-term interpretation). Let L be any query language. An L-term interpretation Π is a map assigning to each k-ary relation symbol R ∈ T a union of expressions of the form (1) where $t_1,\ldots,t_k \in \mathrm{Terms}[\mathbf{x}]$ and φ(x) is an L-query over S.


Given a source instance I, an L-term interpretation Π generates a target instance Π(I), in the

obvious way. Note that Π(I) may contain constants as well as nulls. Although the program specifies

exactly which nulls are generated, we will consider Π(I) only up to isomorphism, and hence the

meaning of an L-term interpretation does not depend on exactly which function symbols it uses.

The previous example shows

Proposition 1. Let L be any query language. For every schema mapping specified by L s-t tgds there

is an L-term interpretation that yields for each source instance the canonical universal solution.
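For the running example, such a term interpretation can be written directly in SQL. The sketch below uses Python's standard `sqlite3` module purely as a convenient way to run the queries; the table layout `R(a, b)`, the string encoding `'f(a,b)'` for the Skolem term f(a,b), and the function name `canonical_solution` are assumptions made for this illustration. The string encoding only behaves like a null if no constant in the source happens to have the same spelling.

```python
import sqlite3

# S := {(x1, f(x1,x2)) | R x1 x2} ∪ {(x,x) | R x x}
S_QUERY = """
    SELECT a, 'f(' || a || ',' || b || ')' FROM R
    UNION
    SELECT a, a FROM R WHERE a = b
"""

# T := {(x2, f(x1,x2)) | R x1 x2}
T_QUERY = "SELECT b, 'f(' || a || ',' || b || ')' FROM R"

def canonical_solution(r_rows):
    """Load the source relation R into an in-memory database and evaluate both definitions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R (a TEXT, b TEXT)")
    conn.executemany("INSERT INTO R VALUES (?, ?)", r_rows)
    S = set(conn.execute(S_QUERY))
    T = set(conn.execute(T_QUERY))
    conn.close()
    return S, T

S, T = canonical_solution([("a", "b"), ("c", "c")])
print(sorted(S))
print(sorted(T))
```

Each target relation is produced by a single non-recursive query over the source, which is the property that lets this approach run on top of an ordinary DBMS.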

Incidentally, even for schema mappings specified by SO tgds, as defined in [4], FO-term interpretations

can be constructed that compute the canonical universal solution. However, the above suffices

for present purposes.

On the other hand,

Proposition 2. No FO-term interpretation yields for each source instance the core universal solution

with respect to the schema mapping specified by the FO (in fact LAV) s-t tgd Rxy → ∃z.(Sxz ∧ Syz).

Proof. The argument uses the fact that FO formulas are invariant under automorphisms. Let I be the source instance whose domain consists of the constants a, b, c, d, and such that R is the total relation over this domain. Note that every permutation of the domain is an automorphism of I. Suppose for the sake of contradiction that there is an FO-term interpretation Π such that Π(I) is the core universal solution of I. Then the domain of Π(I) consists of the constants a, b, c, d and a distinct null term, call it $N_{\{x,y\}}$ ∈ PTerms[x], for each pair of distinct constants x, y ∈ {a,b,c,d}, and Π(I) contains the facts $S x N_{\{x,y\}}$ and $S y N_{\{x,y\}}$ for each of these nulls $N_{\{x,y\}}$. Now consider the term $N_{\{a,b\}}$. We can distinguish two cases. The first case is where the term $N_{\{a,b\}}$ does not contain any constants as arguments. In this case, it follows from the invariance of FO formulas under automorphisms that Π(I) contains $S x N_{\{a,b\}}$ for every x ∈ {a,b,c,d}, which is clearly not true. The second case is where $N_{\{a,b\}}$ contains at least one constant as an argument. If $N_{\{a,b\}}$ contains the constant a or b then let t′ be obtained by switching all occurrences of a and b in $N_{\{a,b\}}$, otherwise let t′ be obtained by switching all occurrences of c and d in $N_{\{a,b\}}$. Either way, we obtain that there is a second null, namely t′, which is distinct from $N_{\{a,b\}}$, and which stands in exactly the same relations to a and b as $N_{\{a,b\}}$ does. This again contradicts our assumption that Π(I) is the core universal solution of I. □

Things change in the presence of a linear order. We will show that every schema mapping specified by FO< s-t tgds is logically equivalent to a laconic schema mapping specified by FO< s-t tgds, i.e., one for which the canonical universal solution is always a core. In particular, given Proposition 1, this shows:

Theorem 6. For every schema mapping specified by FO< s-t tgds there is an FO<-term interpretation that yields for each source instance the core universal solution.

In the case of the example from Proposition 2, the FO<-term interpretation Π computing the core universal solution is given by

$$\Pi(S) = \{(x_1, f(x_1,x_2)) \mid (R x_1 x_2 \lor R x_2 x_1) \land x_1 \le x_2\} \cup \{(x_2, f(x_1,x_2)) \mid (R x_1 x_2 \lor R x_2 x_1) \land x_1 \le x_2\}$$
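This interpretation also translates directly into SQL. The sketch below transcribes Π(S) and evaluates it on one small sample instance that contains no tuples of the form (x, x). It assumes a table `R(a, b)` of text values, uses SQLite's default string comparison as the linear order, and encodes f(x1,x2) as the string `'f(x1,x2)'`; these choices and the name `evaluate_pi_s` are made for illustration. The check covers this one instance only and says nothing about other inputs.

```python
import sqlite3

# pairs = {(x1, x2) | (R x1 x2 ∨ R x2 x1) ∧ x1 ≤ x2}
PI_S_QUERY = """
    WITH pairs(x1, x2) AS (
        SELECT a, b FROM R WHERE a <= b
        UNION
        SELECT b, a FROM R WHERE b <= a
    )
    SELECT x1, 'f(' || x1 || ',' || x2 || ')' FROM pairs
    UNION
    SELECT x2, 'f(' || x1 || ',' || x2 || ')' FROM pairs
"""

def evaluate_pi_s(r_rows):
    """Evaluate the transcription of Π(S) over the given rows of R."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R (a TEXT, b TEXT)")
    conn.executemany("INSERT INTO R VALUES (?, ?)", r_rows)
    result = set(conn.execute(PI_S_QUERY))
    conn.close()
    return result

# R(a,b) and R(b,a) now share the single null f(a,b) instead of producing two nulls.
S = evaluate_pi_s([("a", "b"), ("b", "a"), ("b", "c")])
print(sorted(S))
```

The condition x1 ≤ x2 picks one representative of each unordered pair, so the two source tuples R(a,b) and R(b,a) contribute the same null.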

Furthermore, we will show that every schema mapping defined by FO s-t tgds whose right-hand side contains at most one atomic formula is equivalent to a laconic schema mapping specified by FO s-t tgds, and therefore, its core universal solutions can be computed by means of an FO-term interpretation. In other words, in this case the linear order is not needed. Note that in the example from Proposition 2, the right-hand side of the s-t tgd consists of two atomic formulas.

In the next section, we formally introduce the notion of laconicity. In Section 5, we show that every schema mapping specified by FO< s-t tgds is logically equivalent to a laconic schema mapping specified by FO< s-t tgds.


| | Non-laconic schema mapping | | Logically equivalent laconic schema mapping |
|---|---|---|---|
| (a) | Px → ∃yz.(Rxy ∧ Rxz) | (a′) | Px → ∃y.Rxy |
| (b) | Px → ∃y.Rxy; Px → Rxx | (b′) | Px → Rxx |
| (c) | Rxy → Sxy; Px → ∃y.Sxy | (c′) | Rxy → Sxy; Px ∧ ¬∃y.Rxy → ∃y.Sxy |
| (d) | Rxy → ∃z.Sxyz; Rxx → Sxxx | (d′) | Rxy ∧ x ≠ y → ∃z.Sxyz; Rxx → Sxxx |
| (e) | Rxy → ∃z.(Sxz ∧ Syz) | (e′) | (Rxy ∨ Ryx) ∧ x ≤ y → ∃z.(Sxz ∧ Syz) |

Fig. 2. Examples of non-laconic schema mappings and their laconic equivalents.

4 Laconicity

A schema mapping is laconic if the canonical universal solution of a source instance coincides with the

core universal solution. In particular, for laconic schema mappings the core universal solution can be

computed using any method for computing canonical universal solutions, such as the ones described

in Section 3. In this section, we discuss some examples and general observations concerning laconicity,

in order to make the reader familiar with the notion. In the next section we will focus on constructing

laconic schema mappings. In particular, we will show there that every schema mapping specified by

FO< s-t tgds is logically equivalent to a laconic schema mapping specified by FO< s-t tgds.

Definition 2 (Laconicity). A schema mapping M is laconic if for every source instance I, the canonical

universal solution of I with respect to M is a core.
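Since the definition quantifies over all source instances, a single instance can refute laconicity but cannot establish it. The Python sketch below performs such a spot check for the mappings (d) and (d′) of Figure 2: it chases one source instance and tests whether the result is a core. The encodings (tgd bodies as Python functions, nulls as strings starting with `_`) and the names `naive_chase`, `is_core`, `mapping_d` and `mapping_d_prime` are assumptions of this example, and the core test is a brute-force search.

```python
from itertools import count, product

def is_null(value):
    # Assumed encoding: strings starting with "_" are nulls.
    return isinstance(value, str) and value.startswith("_")

def naive_chase(tgds, source):
    """Canonical universal solution; each tgd is (body, existential_vars, head_atoms)."""
    fresh = count()
    target = set()
    for body, exist_vars, head in tgds:
        for assignment in body(source):
            env = dict(assignment)
            for y in exist_vars:
                env[y] = f"_N{next(fresh)}"
            for rel, variables in head:
                target.add((rel, tuple(env[v] for v in variables)))
    return target

def is_core(J):
    """True iff no endomorphism maps J onto a proper subinstance (exponential search)."""
    values = sorted({v for _, args in J for v in args})
    nulls = [v for v in values if is_null(v)]
    for image in product(values, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if {(rel, tuple(h.get(v, v) for v in args)) for rel, args in J} < J:
            return False
    return True

full_rule = (lambda I: [{"x": x} for x, y in sorted(I["R"]) if x == y],
             [], [("S", ("x", "x", "x"))])                      # Rxx -> Sxxx

mapping_d = [                                                    # Rxy -> ∃z.Sxyz
    (lambda I: [{"x": x, "y": y} for x, y in sorted(I["R"])],
     ["z"], [("S", ("x", "y", "z"))]),
    full_rule,
]
mapping_d_prime = [                                              # Rxy ∧ x ≠ y -> ∃z.Sxyz
    (lambda I: [{"x": x, "y": y} for x, y in sorted(I["R"]) if x != y],
     ["z"], [("S", ("x", "y", "z"))]),
    full_rule,
]

source = {"R": {("a", "a"), ("a", "b")}}
print(is_core(naive_chase(mapping_d, source)))
print(is_core(naive_chase(mapping_d_prime, source)))
```

For (d), the chase produces both S(a,a,_N) and S(a,a,a), and the null can be folded onto a, so this instance witnesses that (d) is not laconic. For (d′) the check succeeds on this instance.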

Note that the definition only makes sense for schema mappings specified by FO(<) s-t tgds, because

we have defined the notion of a canonical universal solution only for such schema mappings.

Examples of laconic and non-laconic schema mappings are given in Figure 2. It is easy to see that

every schema mapping specified by full s-t tgds only (i.e., s-t tgds without existential quantifiers) is

laconic. Indeed, in this case, the canonical universal solution does not contain any nulls, and hence is

guaranteed to be a core. Thus, being specified by full s-t tgds is a sufficient condition for laconicity,

although a rather uninteresting one. The following provides us with a necessary condition, which

explains why the schema mapping in Figure 2(a) is not laconic. Given an s-t tgd ∀x(φ → ∃y.ψ), by

the canonical instance of ψ, we will mean the target instance whose facts are the conjuncts of ψ, where

the x variables are treated as constants and the y variables as nulls.

Proposition 3. If a schema mapping (S,T,Σst) specified by s-t tgds is laconic, then for each s-t tgd

∀x(φ → ∃y.ψ) ∈ Σst, the canonical instance of ψ is a core.

Proof. We argue by contraposition. Suppose the canonical instance J of ψ is not a core. Let J′ ⊆ J be the core of J and h : J → J′ the corresponding retraction.

Take any source instance I in which φ is satisfied under an assignment g, and let K be the canonical universal solution of I. Since φ is true in I under the assignment g and by the construction of the canonical universal solution, we have that g extends to a homomorphism ĝ : J → K sending the y values to disjoint nulls. In fact, we may assume without loss of generality that $\hat{g}(y_i) = y_i$ for each $y_i \in \mathbf{y}$. Moreover, by the construction of canonical universal solutions these null values will not play any further role in subsequent steps of the chase. In particular, they do not participate in any facts of K other than those in the ĝ-image of J. By the ĝ-image of J we mean the subinstance of K containing those facts that are in the image of the homomorphism ĝ : J → K.

Finally, let K′ be the subinstance of K in which the ĝ-image of J is replaced by the ĝ-image of J′. Then h : J → J′ naturally extends to a homomorphism h′ : K → K′. Since K′ is a proper subinstance of K, we conclude that K is not a core, and therefore, M is not laconic. □

In the case of schema mapping (e) in Figure 2, the linear order is used in order to obtain a logically

equivalent laconic schema mapping (e′). Note that the schema mapping (e′) is order-invariant in the

sense that the set of solutions of a source instance I does not depend on the interpretation of the <

relation in I, as long as it is a linear order. Still, the use of the linear order cannot be avoided, as

follows from Proposition 2. What is really going on, in this example, is that the right hand side of (e)

has a non-trivial automorphism (viz. the map sending x to y and vice versa), and the conjunct x ≤ y

in the antecedent of (e′) plays, intuitively, the role of a tie-breaker, cf. Section 5.3.

Testing whether a given schema mapping is laconic is not a tractable problem:

Proposition 4. Testing laconicity of schema mappings specified by FO s-t tgds is undecidable. It is

NP-hard already for schema mappings specified by LAV s-t tgds.

Proof. The first claim is proved by a reduction from the satisfiability problem for first-order logic on
finite instances, which is undecidable by Trakhtenbrot’s theorem. For any first-order formula φ(x), let
Mφ be the schema mapping containing only one dependency, namely ∀x(φ(x) → ∃y1y2.(Py1 ∧ Py2)).
It is easy to see that Mφ is laconic iff φ is not satisfiable.

The NP-hardness in the case of LAV mappings is proved by a reduction from the core testing
problem (given a graph, is it a core?), which is known to be NP-complete [6]. Consider any graph
G = (V,E) and let ∃y.φ(y) be the Boolean canonical conjunctive query of G. Let MG be the schema
mapping whose only dependency is ∀x.(Px → ∃y.(φ(y) ∧ ⋀i Rxyi)). Then MG is laconic iff G is a
core. □

5 Making schema mappings laconic

In this section, we present a procedure for transforming any schema mapping M specified by FO^< s-t
tgds into a logically equivalent laconic schema mapping M′ specified by FO^< s-t tgds. To simplify the
notation, throughout this section, we assume a fixed input schema mapping M = (S,T,Σst), with Σst
a finite set of FO^< s-t tgds. Moreover, we will assume that the FO^< s-t tgds ∀x(φ → ∃y.ψ) ∈ Σst are
non-decomposable [3], meaning that the fact graph of ∃y.ψ(x,y) (where the facts are the conjuncts of
ψ and two facts are connected if they have an existentially quantified variable in common) is connected.
This assumption is harmless: every FO^< s-t tgd can be decomposed into a logically equivalent finite set
of non-decomposable FO^< s-t tgds (with identical left-hand-sides, one for each connected component
of the fact graph) in polynomial time.

The outline of the procedure for making schema mappings laconic is as follows (the items correspond

to subsections of the present section):

1. Construct a finite list of “fact block types”: descriptions of potential fact blocks in core universal
solutions.

2. Compute for each of the fact block types a precondition: a first-order formula over the source
schema that tells exactly when the core universal solution will contain a fact block of the given
type.

3. If any of the fact block types has non-trivial automorphisms, add an additional side condition,
consisting of a Boolean combination of formulas of the form xi < xj, in order to avoid that
multiple copies of the same fact block are created in the canonical universal solution.

4. Construct the new schema mapping M′ = (S,T,Σ′st), where Σ′st contains an FO^< s-t tgd for each
of the fact block types. The left-hand-side of the FO^< s-t tgd is the conjunction of the precondition
and side condition of the respective fact block type, while the right-hand-side is the fact block type
itself.

We illustrate the approach by means of an example. The technical notions that we use in discussing

the example will be formally defined in the next subsections.

Example 1. Consider the schema mapping M = ({P,Q},{R1,R2},Σst), where Σst consists of the

dependencies

Px → ∃y.R1xy

Qx → ∃yzu.(R2xy ∧ R2zy ∧ R1zu)

In this case, there are exactly three relevant fact block types. They are listed below, together with

their preconditions.

| Fact block type | Precondition |
|---|---|
| t1(x;y) = {R1xy} | pre_t1(x) = Px |
| t2(x;yzu) = {R2xy, R2zy, R1zu} | pre_t2(x) = Qx ∧ ¬Px |
| t3(x;y) = {R2xy} | pre_t3(x) = Qx ∧ Px |

We use the notation t(x;y) for a fact block type to indicate that the variables x stand for constants

and the variables y stand for distinct nulls.

As it happens, the above fact block types have no non-trivial automorphisms. Hence, no side
conditions need to be added, and Σ′st will consist of the following FO s-t tgds:

Px → ∃y.R1xy

Qx ∧ ¬Px → ∃yzu.(R2xy ∧ R2zy ∧ R1zu)

Qx ∧ Px → ∃y.(R2xy)

The reader may verify that in this case, the obtained schema mapping is indeed laconic. We will prove

in Section 5.4 that the output of our transformation is guaranteed to be a laconic schema mapping

that is logically equivalent to the input schema mapping.

⊣
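To make Example 1 concrete, the following sketch chases a small source instance with both the original and the transformed dependencies and compares the sizes of the two canonical universal solutions. It is an illustration under stated assumptions rather than the implementation used in the paper: the helper `canonical_solution`, the encoding of facts as tuples, and the null names `_N1`, `_N2`, ... are all hypothetical, and the chase is specialized to tgds with a single universally quantified variable, as in this example.

```python
from itertools import count

def canonical_solution(source, tgds):
    """Naive chase for s-t tgds with one universally quantified variable:
    each tgd fires once per constant satisfying its premise, with fresh nulls."""
    fresh = count(1)
    constants = sorted({const for _, const in source})
    target = set()
    for premise, num_nulls, conclusion in tgds:
        for a in constants:
            if premise(source, a):
                nulls = [f"_N{next(fresh)}" for _ in range(num_nulls)]
                target.update(conclusion(a, *nulls))
    return target

def P(src, a):
    return ("P", a) in src

def Q(src, a):
    return ("Q", a) in src

def block_t1(x, y):
    return [("R1", x, y)]

def block_t2(x, y, z, u):
    return [("R2", x, y), ("R2", z, y), ("R1", z, u)]

def block_t3(x, y):
    return [("R2", x, y)]

# The two original dependencies of Example 1.
original = [(P, 1, block_t1), (Q, 3, block_t2)]

# The transformed mapping: one tgd per fact block type, guarded by its precondition.
laconic = [
    (P, 1, block_t1),
    (lambda src, a: Q(src, a) and not P(src, a), 3, block_t2),
    (lambda src, a: Q(src, a) and P(src, a), 1, block_t3),
]

source = {("P", "a"), ("Q", "a"), ("Q", "b")}
J_original = canonical_solution(source, original)
J_laconic = canonical_solution(source, laconic)
print(len(J_original), len(J_laconic))  # 7 5
```

On this source instance the original mapping produces seven facts, two of which (R2zy and R1zu for x = a) are redundant because Pa already yields a fact R1ay. The transformed mapping produces only the five facts that remain.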

We will now proceed to define all the notions appearing in this example.

5.1 Generating the fact block types

Recall that the fact graph of an instance is the graph whose nodes are the facts of the instance, and

such that there is an edge between two facts if they have a null value in common. A fact block, or f-block

for short, of an instance is a connected component of the fact graph of the instance. We know from

[2] that, for any schema mapping M specified by FO^< s-t tgds, the size of f-blocks in core universal

solutions for M is bounded. Consequently, there is a finite number of f-block types, such that every

core universal solution consists of f-blocks of these types. This is a crucial observation that we will

exploit in our construction.
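The decomposition of an instance into f-blocks is simply a connected-components computation on the fact graph. A minimal sketch follows, assuming (for illustration only) that facts are tuples whose first entry is the relation name and that nulls are strings starting with an underscore. The function name `f_blocks` is hypothetical.

```python
def is_null(term):
    # Assumed convention of this sketch: nulls are strings starting with "_".
    return isinstance(term, str) and term.startswith("_")

def f_blocks(facts):
    """Split an instance into the connected components of its fact graph,
    where two facts are adjacent if they share a null."""
    blocks = []  # pairs (nulls of the block, facts of the block); null sets stay disjoint
    for fact in facts:
        fact_nulls = {t for t in fact[1:] if is_null(t)}
        merged_nulls, merged_facts = set(fact_nulls), {fact}
        untouched = []
        for block_nulls, block_facts in blocks:
            if block_nulls & fact_nulls:
                # The new fact links this block to the one being built.
                merged_nulls |= block_nulls
                merged_facts |= block_facts
            else:
                untouched.append((block_nulls, block_facts))
        blocks = untouched + [(merged_nulls, merged_facts)]
    return [block_facts for _, block_facts in blocks]

J = {
    ("R1", "a", "_N1"),
    ("R2", "b", "_N2"), ("R2", "_N3", "_N2"), ("R1", "_N3", "_N4"),
    ("R2", "a", "_N5"),
}
print(sorted(len(block) for block in f_blocks(J)))  # [1, 1, 3]
```

Facts without nulls share no null with any other fact, so each of them forms an f-block of its own.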

Formally, an f-block type t(x;y) will be a finite set of atomic formulas in x,y, where x and y are
disjoint sets of variables. We will refer to x as the constant variables of t and y as the null variables.
We say that an f-block type t(x;y) is a renaming of an f-block type t′(x′;y′) if there is a bijection
f between x and x′ and between y and y′, such that t′ = {R(f(v)) | R(v) ∈ t}. In this case, we
write f : t ≅ t′ and we also call f a renaming. We will not distinguish between f-block types that are
renamings of each other. We say that an f-block B has type t(x;y) if B can be obtained from t(x;y)
by replacing constant variables by constants and null variables by distinct nulls, i.e., if B = t(a,N)
for some sequence of constants a and sequence of distinct nulls N. Note that we require the relevant
substitution to be injective on the null variables but not necessarily on the constant variables. If a
target instance J contains a block B = t(a,N) of type t(x;y) then we say that t(x;y) is realized in
J at a. Note that, in general, an f-block type may be realized more than once at a tuple of constants
a, but this will not happen if the target instance J is a core universal solution.

We are interested in the f-block types that may be realized in core universal solutions. Eventually,
the schema mapping M′ that we will construct from M will contain an FO^< s-t tgd for each relevant
f-block type. Not every f-block type as defined above can be realized. We may restrict attention to
a subclass. Below, by the canonical instance of an f-block type t(x;y) we will mean the instance
containing the facts in t(x;y), considering x as constants and y as nulls.

Definition 3. The set TypesM of f-block types generated by M consists of all f-block types t(x;y)
satisfying the following conditions:

(a) Σst contains an FO^< s-t tgd ∀x′(φ(x′) → ∃y′.ψ(x′,y′)) with y ⊆ y′, and t(x;y) is the set of
conjuncts of ψ in which the variables y′ − y do not occur;

(b) The canonical instance of t(x;y) is a core;

(c) The fact graph of the canonical instance of t(x;y) is connected.

If some f-block types generated by M are renamings of each other, we add only one of them to TypesM.

The main result of this subsection is:

Proposition 5. Let J be a core universal solution of a source instance I with respect to M. Then

each f-block of J has type t(x;y) for some t(x;y) ∈ TypesM.

Proof. Let B be any f-block of J. Since J is a core universal solution, it is, up to isomorphism, an
induced subinstance of the canonical universal solution J′ of I. It follows that J′ must have an f-block
B′ such that B is the restriction of B′ to the domain of J. Since B′ is a connected component of the fact
graph of J′, it must have been created in a single step during the naive chase. In other words, there is
an FO^< s-t tgd

∀x(φ(x) → ∃y.ψ(x,y))

and an assignment g of constants to the variables x and distinct nulls to the variables y such that B′
is contained in the set of conjuncts of ψ(g(x),g(y)). Moreover, since we assume the FO^< s-t tgds of
M to be non-decomposable and B′ is a connected component of the fact graph of J′, B′ must be
exactly the set of facts listed in ψ(g(x),g(y)). In other words, if we let t(x;y) be the set of all facts
listed in ψ, then B′ has type t(x;y). Finally, let t′(x′;y′) ⊆ t(x;y) be the set of all facts from t(x;y)
containing only variables yi for which g(yi) occurs in B. Since B is the restriction of B′ to the domain
of J, we have that B is of type t′(x′;y′). Moreover, the fact graph of the canonical instance of t′(x′;y′) is
connected because B is connected, and the canonical instance of t′(x′;y′) is a core, because, if it were
not, then B would not be a core either, and hence J would not be a core either, which would lead
to a contradiction. It follows that t′(x′;y′) ∈ TypesM. □

Note that TypesM contains only finitely many f-block types. Still, the number is in general
exponential in the size of the schema mapping, as the following example shows.

Example 2. Consider the schema mapping specified by the following s-t tgds:

$$P_i x \to P'_i x \quad (\text{for each } 1 \le i \le k)$$

$$Qx \to \exists y_0 y_1 \ldots y_k \Big(Rxy_0 \wedge \bigwedge_{1 \le i \le k} (Ry_i y_0 \wedge P'_i y_i)\Big)$$

For each S ⊆ {1,...,k}, the f-block type

$$t_S(x;(y_i)_{i \in S \cup \{0\}}) = \{Rxy_0\} \cup \{Ry_i y_0,\ P'_i y_i \mid i \in S\}$$

belongs to TypesM. Indeed, each of these 2^k f-block types is realized in the core universal solution
of some source instance. The example can be modified to use a fixed schema: replace P′_i x by
Sxx_1 ∧ Sx_1x_2 ∧ ... ∧ Sx_{i−1}x_i ∧ Sx_ix_i.

⊣

The same example can be used to show that the smallest logically equivalent schema mapping that

is laconic can be exponentially longer.

5.2 Computing the precondition of an f-block type

Recall that, to simplify notation, we assume a fixed schema mapping M specified by FO^< s-t tgds.
The main result of this subsection is the following, which shows that whether an f-block type is realized
in the core universal solution at a given sequence of constants a is something that can be tested by a
first-order query on the source.

Proposition 6. For each t(x;y) ∈ TypesM there is an FO^< query precon_t(x) such that for every
source instance I with core universal solution J, and for every tuple of constants a, the following are
equivalent:

1. a ∈ precon_t(I)

2. t(x;y) is realized in J at a.

Proof. We first define an intermediate formula precon′_t(x) that almost satisfies the required properties,
but not quite yet. For each f-block type t(x;y), let precon′_t(x) be the following formula:

$$\mathrm{certain}_M\Big(\exists y.\bigwedge t\Big)(x) \;\wedge\; \bigwedge_{i \neq j} \neg\,\mathrm{certain}_M\Big(\exists y_{-i}.\bigwedge t[y_i/y_j]\Big)(x) \;\wedge\; \bigwedge_{i} \neg\exists x'.\,\mathrm{certain}_M\Big(\exists y_{-i}.\bigwedge t[y_i/x']\Big)(x,x')$$

where y_{−i} stands for the sequence y with yi removed, and t[u/v] is the result of replacing each
occurrence of u by v in t. By construction, if precon′_t(a) holds in I, then every universal solution J
satisfies t(a,N) for some sequence of distinct nulls N. Still, it may not be the case that t(x;y)
is realized at a, since it may be that t(a,N) is part of a bigger f-block. To make things more
precise, we introduce the notion of an embedding. For any two f-block types, t(x;y) and t′(x′;y′), an
embedding of the first into the second is a function h mapping x into x′ and mapping y injectively
into y′, such that whenever t contains an atomic formula R(z), then R(h(z)) belongs to t′. The
embedding h is strict if t′ contains an atomic formula that is not of the form R(h(z)) for any R(z) ∈ t.
Intuitively, the existence of a strict embedding means that t′ describes an f-block that properly contains
the f-block described by t.
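Since f-block types are small, embeddings between them can be enumerated by exhaustive search. The sketch below does this for the fact block types of Example 1. The representation of a type as a triple (constant variables, null variables, atoms) and the function name `embeddings` are assumptions made for illustration only.

```python
from itertools import product, permutations

def embeddings(t, t_prime):
    """Yield (h, strict) for every embedding h of the f-block type t into t_prime.
    A type is a triple (constant_vars, null_vars, atoms); atoms are tuples
    whose first entry is the relation name."""
    xs, ys, atoms = t
    xs2, ys2, atoms2 = t_prime
    atoms2 = set(atoms2)
    for x_image in product(xs2, repeat=len(xs)):          # constant vars: any map
        for y_image in permutations(ys2, len(ys)):        # null vars: injective map
            h = {**dict(zip(xs, x_image)), **dict(zip(ys, y_image))}
            image = {(rel,) + tuple(h[v] for v in args) for rel, *args in atoms}
            if image <= atoms2:
                # Strict: some atom of t_prime is not hit by h.
                yield h, image != atoms2

t1 = (("x",), ("y",), {("R1", "x", "y")})
t2 = (("x",), ("y", "z", "u"), {("R2", "x", "y"), ("R2", "z", "y"), ("R1", "z", "u")})
t3 = (("x",), ("y",), {("R2", "x", "y")})

print(list(embeddings(t3, t2)))  # [({'x': 'x', 'y': 'y'}, True)]
print(list(embeddings(t1, t2)))  # []
```

The type t3 embeds strictly into t2, so a realization of t2 at a subsumes the single fact R2ay. The type t1 does not embed into t2 at all, because the first argument of the R1 atom in t2 is a null variable, whereas constant variables must be mapped to constant variables.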

Let I be any source instance, J any core universal solution, t(x;y) ∈ TypesM, and a a sequence
of constants.

Claim 1: If t is realized in J at a, then a ∈ precon′_t(I).

Proof of claim: Clearly, since t is realized in J at a and J is a universal solution, the first conjunct of
precon′_t is satisfied. That the rest of the query is satisfied is also easily seen: otherwise J would not be
a core. End of proof of claim.

Claim 2: If a ∈ precon′_t(I), then either t is realized in J at a or some f-block type t′(x′;y′) ∈ TypesM
is realized at a tuple of constants a′, and there is a strict embedding h : t → t′ such that a_i = a′_j
whenever h(x_i) = x′_j.

Proof of claim: It follows from the construction of precon′_t, and the definition of TypesM, that
the witnessing assignment for its truth must send all existential variables to distinct nulls, which belong
to the same block. By Proposition 5, the diagram of this block is a specialization of an f-block type
t′ ∈ TypesM. It follows that t is embedded in t′ and a, together with possibly some additional values
in Cons, realize t′. End of proof of claim.

We now define precon_t(x) to be the following formula:

$$\mathrm{precon}'_t(x) \;\wedge \bigwedge_{\substack{t'(x';y') \,\in\, \mathrm{Types}_M \\ h \,:\, t(x;y) \to t'(x';y') \text{ a strict embedding}}} \neg\exists x'.\Big(\bigwedge_{i} \big(x_i = h(x_i)\big) \wedge \mathrm{precon}'_{t'}(x')\Big)$$

This formula satisfies the required conditions: a ∈ precon_t(I) iff t(x;y) is realized in J at a.
The left-to-right direction follows from Claim 2, while the right-to-left direction follows from Claim 1
together with the fact that J is a core. □

5.3 Computing the side conditions of an f-block type

The issue we address in this subsection, namely that of non-rigid f-block types, is best explained by
an example.

Example 3. Consider again schema mapping (e) in Figure 2. This schema mapping is not laconic,
because, when a source instance contains Rab and Rba, for distinct values a,b, the canonical universal
solution will contain two null values N, each satisfying SaN and SbN, corresponding to the two
assignments {x ↦ a, y ↦ b} and {x ↦ b, y ↦ a}. The essence of the problem is in the fact that the
right-hand-side of the dependency is, in some sense, symmetric: it is a non-trivial renaming of itself,
the renaming in question being {x ↦ y, y ↦ x}. According to the terminology that we will introduce
below, the right-hand-side of this dependency is non-rigid. Schema mapping (e′) from Figure 2 does
not suffer from this problem, because it contains x ≤ y in the antecedent, and we are assuming < to
be a linear order on the values in the source instance.

⊣

In order to formalize the intuition exhibited in the above example, we need to introduce some

terminology. We say that two f-blocks, B,B′, are copies of each other, if there is a bijection f from

Cons to Cons and from Nulls to Nulls such that f(a) = a for all a ∈ Cons and B′= {R(f(v1),...,f(vk)) |

R(v1,...,vk) ∈ B}. In other words, B′can be obtained from B by renaming null values.

Definition 4. An f-block type t(x;y) is rigid if for any two sequences of constants a,a′and for any

two sequences of distinct nulls N,N′, if t(a;N) and t(a′;N′) are copies of each other, then a = a′.

The s-t tgd from the above example is easily seen to be non-rigid. Moreover, a simple variation of

the argument in the above example shows:

Proposition 7. If an f-block type t(x;y) is non-rigid, then the schema mapping specified by the FO
(in fact LAV) s-t tgd ∀x(R(x) → ∃y.⋀t(x;y)) is not laconic.

In other words, if an f-block type is non-rigid, one cannot simply use it as the right-hand-side of

an s-t tgd without running the risk of non-laconicity. Fortunately, it turns out that f-block types can

be made rigid by the addition of suitable side conditions. By a side condition Φ(x) we will mean a

Boolean combination of formulas of the form xi < xj or xi = xj.

Definition 5. An f-block type t(x;y) is rigid relative to a side condition Φ(x) if for any two sequences
of constants a,a′ satisfying Φ(a) and Φ(a′) and for any two sequences of distinct nulls N,N′, if
t(a;N) and t(a′;N′) are copies of each other, then a = a′.

Definition 6. A side-condition Φ(x) is safe for an f-block type t(x;y) if for every f-block t(a,N) of
type t there is an f-block t(a′,N′) of type t satisfying Φ(a′) such that the two are copies of each other.
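Definitions 4 to 6 can be tested by brute force on a small ordered domain. The sketch below does so for the f-block type {Sx1y, Sx2y}, which is how the right-hand side of mapping (e) is described in Example 3. The full statement of (e) is in Figure 2 and is not reproduced here, so the exact shape of the type is an assumption. The function names, the representation of types, and the restriction to a three-element domain are likewise illustrative: this is a bounded check, not a proof.

```python
from itertools import product, permutations

def instantiate(t, consts, nulls):
    """Build the f-block t(a;N) from a type t = (constant_vars, null_vars, atoms)."""
    xs, ys, atoms = t
    env = {**dict(zip(xs, consts)), **dict(zip(ys, nulls))}
    return {(rel,) + tuple(env[v] for v in args) for rel, *args in atoms}

def are_copies(t, a, a2):
    """t(a;N) and t(a2;N') are copies iff some renaming of the nulls identifies them.
    Assumes every null variable of t occurs in some atom of t."""
    nulls = [f"_N{i}" for i in range(len(t[1]))]
    block = instantiate(t, a, nulls)
    return any(instantiate(t, a2, p) == block for p in permutations(nulls))

def is_rigid(t, side, domain):
    """Definition 5, checked only over tuples drawn from the given finite domain."""
    ok = [a for a in product(domain, repeat=len(t[0])) if side(a)]
    return all(a == a2 or not are_copies(t, a, a2) for a in ok for a2 in ok)

def is_safe(t, side, domain):
    """Definition 6, checked only over tuples drawn from the given finite domain."""
    tuples = list(product(domain, repeat=len(t[0])))
    return all(any(side(a2) and are_copies(t, a, a2) for a2 in tuples) for a in tuples)

t = (("x1", "x2"), ("y",), {("S", "x1", "y"), ("S", "x2", "y")})
domain = [1, 2, 3]

print(is_rigid(t, lambda a: True, domain))          # False: (1,2) and (2,1) give copies
print(is_rigid(t, lambda a: a[0] <= a[1], domain))  # True
print(is_safe(t, lambda a: a[0] <= a[1], domain))   # True
print(is_safe(t, lambda a: a[0] < a[1], domain))    # False: blocks t(a,a;N) are lost
```

The last line illustrates why safety matters: the strict condition x1 < x2 also makes the type rigid, but it would suppress the f-blocks in which both constant variables take the same value.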

Intuitively, safety means that the side condition is not too strong: whenever an f-block type should
be realized in a core universal solution, there will be at least one way of arranging the variables so that
the side condition is satisfied. The main result of this subsection, which will be put to use in the next
subsection, is the following:

Proposition 8. For every f-block type t(x;y) there is a side condition sidecon_t(x) such that t(x;y)
is rigid relative to sidecon_t(x), and sidecon_t(x) is safe for t(x;y).

Proof. We will construct a sequence of side conditions Φ0(x),...,Φn(x) safe for t(x;y), such that
each Φi+1 logically strictly implies Φi, and such that t(x;y) is rigid relative to Φn(x). Note that n is
necessarily bounded by a single exponential function in |x|. For Φ0(x) we pick the tautology ⊤, which
is trivially safe for t(x;y).

Suppose that t(x;y) is not rigid relative to Φi(x), for some i ≥ 0. By definition, this means that
there are two sequences of constants a,a′ satisfying Φi(a) and Φi(a′) and two sequences of distinct
nulls N,N′, such that t(a;N) and t(a′;N′) are copies of each other, but a and a′ are not the same
sequence, i.e., they differ in some coordinate. Let ψ(x) be the conjunction of all formulas of the form
xi < xj or xi = xj that are true under the assignment sending x to a, and let Φi+1(x) = Φi(x) ∧ ¬ψ(x).
It is clear that Φi+1 is strictly stronger than Φi. Moreover, Φi+1 is still safe for t(x;y): consider
any f-block t(b,M) of type t(x;y). Since Φi is safe for t, we can find an f-block t(b′,M′) of type t such
that Φi(b′) holds and the two blocks are copies of each other. If ¬ψ(b′) holds, then in fact Φi+1(b′) holds, and
we are done. Otherwise, we have that t(b′,M′) is isomorphic to t(a,N) and the preimage of t(a′,N′)
under this isomorphism will be again a copy of t(b′,M′) (and therefore also of t(b,M)) that satisfies
Φi(b′) ∧ ¬ψ(b′), i.e., Φi+1(b′). □

Incidentally, we believe the above construction of side-conditions is not the most efficient possible,

in terms of the size of the side-condition obtained. It can probably be improved.

5.4 Putting things together: constructing the laconic schema mapping

Theorem 7. For each schema mapping M specified by FO^< s-t tgds, there is a laconic schema mapping
M′ specified by FO^< s-t tgds that is logically equivalent to M.

Proof. We define M′ to consist of the following FO^< s-t tgds. For each t(x;y) ∈ TypesM, we take
the FO^< s-t tgd

$$\forall x\,\Big(\mathrm{precon}_t(x) \wedge \mathrm{sidecon}_t(x) \to \exists y.\bigwedge t(x;y)\Big)$$

In order to show that M′ is laconic and logically equivalent to M (on structures where < denotes a
linear order), it is enough to show that, for every source instance I, the canonical universal solution
J of I with respect to M′ is a core universal solution for I with respect to M. This follows from the
following three facts:

1. Every f-block of J is a copy of an f-block of the core universal solution of I. This follows from
Proposition 6.

2. Every f-block of the core universal solution of I is a copy of an f-block of J. This follows from
Proposition 5 and Proposition 6, together with the safety part of Proposition 8.

3. No two distinct f-blocks of J are copies of each other. This follows from the rigidity part of
Proposition 8 together with the fact that TypesM contains no two distinct f-block types that are
renamings of each other. □

Incidentally, if the side conditions are left out, then the resulting schema mapping is still logically

equivalent to the original mapping M, but it may not be laconic. It will still satisfy a weak form of

laconicity: a variant of the chase defined in [1], which only fires dependencies whose right hand side is

not yet satisfied, will produce the core universal solution.

6 Target constraints

In this section we consider schema mappings with target constraints and we address the question

whether our main result can be extended to this setting. The answer will be negative. However, first

we need to revisit our basic notions, as some subtle issues arise in the case with target dependencies.

It is clear that we cannot expect to compute core universal solutions for schema mappings with
target dependencies by means of FO^<-term interpretations. Even for the simple schema mapping
defined by the s-t tgd Rxy → R′xy and the full target tgd R′xy ∧ R′yz → R′xz, computing the core
universal solution means computing the transitive closure of R, which we know cannot be done in FO
logic even on finite ordered structures. Still, we can define a notion of laconicity for schema mappings
with target dependencies. Let M be any schema mapping specified by a finite set of FO^< s-t tgds Σst
and a finite set of target tgds and target egds Σt, and let I be a source instance. We define the canonical
universal solution of I with respect to M as the target instance (if it exists) obtained by taking the
canonical universal solution of I with respect to Σst and chasing it with the target dependencies Σt.
We assume a standard chase but will not make any assumptions on the chase order. Laconicity is
now defined as before: a schema mapping is laconic if for each source instance, the canonical universal
solution coincides with the core universal solution.
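The transitive closure example can be made concrete with a small fixpoint computation. The sketch below applies the s-t tgd Rxy → R′xy and then chases with the full target tgd R′xy ∧ R′yz → R′xz until no new fact is produced. The function names and the encoding of R as a set of pairs are illustrative assumptions.

```python
def chase_transitivity(r_prime):
    """Chase with the full target tgd R'xy ∧ R'yz → R'xz until a fixpoint is reached."""
    closure = set(r_prime)
    changed = True
    while changed:
        derived = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        changed = not derived <= closure
        closure |= derived
    return closure

def canonical_solution(r):
    # The s-t tgd Rxy → R'xy copies R; the target tgd is then chased to a fixpoint.
    return chase_transitivity(set(r))

R = {(1, 2), (2, 3), (3, 4)}
print(sorted(canonical_solution(R)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The number of rounds of the loop grows with the length of the paths in R. This is the operational counterpart of the statement that no single first-order query computes the result.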

Recall that, according to our main result, we have (i) every schema mapping M specified by FO^< s-t
tgds is logically equivalent to a laconic schema mapping M′ specified by FO^< s-t tgds. In particular,
this implies that (ii) for each source instance I, the core universal solution for I with respect to M
is the canonical universal solution for I with respect to M′. For the implication from (i) to (ii) the
requirement of logical equivalence turns out to be stronger than needed: it is enough that M and
M′ are CQ-equivalent, i.e., have the same core universal solution (possibly undefined) for each source
instance [2]. While CQ-equivalence and logical equivalence coincide for schema mappings specified by
FO^< s-t tgds (as follows from the closure under target homomorphisms), the first is strictly weaker
than the second in the case with target dependencies [2].

Theorem 8. There is a schema mapping M specified by finitely many LAV s-t tgds and full target
tgds, for which there is no CQ-equivalent laconic schema mapping M′ specified by FO^< s-t tgds, target
tgds and target egds.

Proof. (sketch) Let M be the schema mapping specified by the LAV s-t tgds

– Rx1x2 → R′x1x2
– Pix → ∃y.Qiy for i ∈ {1,2,3}

and the full target tgds

– R′xy ∧ R′yz → R′xz
– R′xx ∧ Q1y → Q3y
– R′xx ∧ Q2y → Q3y

For source instances I in which the relations R,P1,P2,P3 are non-empty, the core universal solution J
will have the following shape: J(R′) is the transitive closure of I(R), and J(Q1),J(Q2),J(Q3) are
non-empty. Moreover, if I(R) contains a cycle, then J(Q1) = {N1}, J(Q2) = {N2} and J(Q3) = {N1,N2}
for distinct null values N1,N2, while if I(R) is acyclic, J(Q1), J(Q2) and J(Q3) are disjoint singleton
sets of nulls.

Suppose for the sake of contradiction that there is a CQ-equivalent laconic schema mapping M′
specified by a finite set of FO^< s-t tgds Σst and a finite set of target tgds and egds Σt. In particular,
for each source instance I, the canonical universal solution of I with respect to M′ is the core universal
solution of I with respect to M. Let n be the maximum quantifier rank of the formulas in Σst.

Claim 1: There is a source instance I1 containing a cycle, such that the canonical universal solution
J1 of I1 with respect to Σst contains at least three nulls, one belonging only to Q1, one belonging only
to Q2, and one belonging only to Q3.

The proof of Claim 1 is based on the fact that acyclicity is not first-order definable on finite ordered
structures: take any two source instances I1,I2 agreeing on all FO^<-sentences of quantifier rank n
such that I1 contains a cycle and I2 does not. We may even assume that P1,P2,P3 are non-empty in
both instances.

Let J1 and J2 be the canonical universal solutions of I1 and I2 with respect to Σst. Then J2 must
contain at least three nulls, one belonging only to Q1, one belonging only to Q2 and one belonging only
to Q3. To see this, note that, first of all, J2 must be a homomorphic pre-image of the core universal
solution of I2 with respect to M. Secondly, if one of the relations Qi is empty in J2, then the
crucial information that I2(Pi) is non-empty is lost, in the sense that J2 would be a homomorphic
pre-image of the source instance that is like I2 except that the relation Pi is empty, which implies that
the result of chasing J2 with Σt must be homomorphically contained in the core universal solution of
this modified source instance with respect to M, which is different from the core universal solution of
I2.

This shows that J2 must contain at least three nulls, one belonging only to Q1, one belonging only
to Q2 and one belonging only to Q3. Each of these nulls must have been created by the application of
a dependency from Σst. Since I1 and I2 agree on all FO^<-sentences of quantifier rank n, the
left-hand-side of this dependency is also satisfied in I1, and hence the same null is also created in the canonical
universal solution of I1.

Claim 2: Let J′1 be the result of chasing J1 with Σt (assuming it exists). Then J′1 cannot be the core
universal solution of I1 with respect to M.

The proof of Claim 2 is based on a monotonicity argument. More precisely, we use the fact that the
left-hand-side of each target dependency is a conjunctive query, and hence is preserved under
homomorphisms. Let us assume for the sake of contradiction that J′1 is the core universal solution of I1
with respect to M, which contains exactly two null values, one in Q1 ∩ Q3 and one in Q2 ∩ Q3. Let
N1,N2,N3 be null values belonging only to J1(Q1), only to J1(Q2) and only to J1(Q3), respectively. It
is easy to see that, during the chase with Σt, N3 must have been identified with N1 or N2 by means
of a target egd φ. A monotonicity argument shows that the same target egd φ can be used to identify
the two null values in the core universal solution J′1 (note that the target dependencies cannot refer to
the linear order on the constants). This contradicts the fact that J′1 is the end-result of the chase with
Σt. □

We expect that similar arguments can be used to find a schema mapping M specified by a finite set

of LAV s-t tgds and target egds, such that there is no CQ-equivalent laconic schema mapping specified

by a finite set of FO^< s-t tgds, target tgds and target egds.

References

1. Ronald Fagin, Phokion G. Kolaitis, Renée J. Miller, and Lucian Popa. Data exchange: semantics and query answering. Theoretical Computer Science, 336(1):89–124, 2005.

2. Ronald Fagin, Phokion G. Kolaitis, Alan Nash, and Lucian Popa. Towards a theory of schema-mapping optimization. In Maurizio Lenzerini and Domenico Lembo, editors, PODS, pages 33–42. ACM, 2008.

3. Ronald Fagin, Phokion G. Kolaitis, and Lucian Popa. Data exchange: getting to the core. ACM Transactions on Database Systems, 30(1):174–210, 2005.

4. Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang-Chiew Tan. Composing schema mappings: Second-order dependencies to the rescue. ACM Transactions on Database Systems, 30(4):994–1055, 2005.

5. Georg Gottlob and Alan Nash. Efficient core computation in data exchange. Journal of the ACM, 55(2):1–49, 2008.

6. Pavol Hell and Jaroslav Nešetřil. The core of a graph. Discrete Mathematics, 109:117–126, 1992.

7. Maurizio Lenzerini. Data integration: A theoretical perspective. In Lucian Popa, editor, PODS, pages 233–246. ACM, 2002.

8. Rachel Pottinger and Alon Halevy. MiniCon: A scalable algorithm for answering queries using views. The VLDB Journal, 10(2-3):182–198, 2001.

9. Balder ten Cate and Phokion G. Kolaitis. Structural characterizations of schema mapping languages. In Proceedings of ICDT 2009, 2009.
