Relaxing Join and Selection Queries.
Anthony K. H. Tung3
1University of Toronto, Canada
2University of California, Irvine, USA
3National University of Singapore, Singapore
Database users can be frustrated by having an empty an-
swer to a query. In this paper, we propose a framework to
systematically relax queries involving joins and selections.
When considering relaxing a query condition, intuitively one
seeks the ’minimal’ amount of relaxation that yields an an-
swer. We first characterize the types of answers that we
return to relaxed queries. We then propose a lattice based
framework in order to aid query relaxation. Nodes in the
lattice correspond to different ways to relax queries. We
characterize the properties of relaxation at each node and
present algorithms to compute the corresponding answer.
We then discuss how to traverse this lattice in a way that
a non-empty query answer is obtained with the minimum
amount of query condition relaxation. We implemented this
framework and we present our results of a thorough perfor-
mance evaluation using real and synthetic data. Our results
indicate the practical utility of our framework.
Issuing complex queries against large databases is a rela-
tively simple task provided one has knowledge of the suitable
query conditions and constants to use. Commonly however,
although one might have a clear idea about the parameters,
the resulting query may return an empty answer. In such
cases, users often find themselves having to try
different parameters, hoping to get an answer. Essentially,
query formulation becomes a trial-and-error process. One
has to adjust the parameters until an answer is obtained
with which one is relatively comfortable. The process of pa-
rameter adjustment is not at all trivial. The more complex
a query is, in terms of predicates, the more choices one has
to conduct such an adjustment. A similar situation arises
when one is unclear about the right parameters to use, so
trying parameters in speculation seems a natural option.
§Chen Li and Rares Vernica are partially supported by NSF
CAREER Award No. IIS-0238586.
Permission to copy without fee all or part of this material is granted provided
that the copies are not made or distributed for direct commercial advantage,
and notice is given that copying is by permission of the Very Large Data
Base Endowment. To copy otherwise, or to republish, to post on servers
or to redistribute to lists, requires a fee and/or special permission from the
VLDB ‘06, September 12-15, 2006, Seoul, Korea.
Copyright 2006 VLDB Endowment, ACM 1-59593-385-9/06/09.
As an example, consider a recruitment company that has
a database with two tables. The table Jobs has records of
job postings, with information such as job ID (JID), category
(Category), company name (Company), zip code (Zipcode),
and annual salary (Salary). The table Candidates has
records of applicants, with information such as candidate ID
(CID), zip code (Zipcode), expected salary (ExpSalary), and
number of years of working experience (WorkYear). Some
sample data of the relations is shown in Tables 1 and 2.
A user issues the following query:
SELECT *
FROM Jobs J, Candidates C
WHERE J.Salary <= 95
AND J.Zipcode = C.Zipcode
AND C.WorkYear >= 5;
The query seeks job-candidate pairs, such that the job
and the candidate are in the same area (same zip code), the
job’s annual salary is at most 95K, and the candidate has at
least 5 years of working experience. Suppose the answer to
the query turns out to be empty for the database instance.
As we can see from the tables, each record in one relation
can join with a record in the other relation. However, no
such pair of records satisfies both selection conditions. In
this situation, one way to get results is to be more flexible
about the jobs, in terms of job’s annual salary. Moreover, we
can also be more flexible about the candidates, in terms of
years of experience. By relaxing both selection conditions
on Salary and WorkYear, we can get a nonempty answer.
We can also compute a nonempty answer by relaxing the
join condition, i.e., by allowing a job and a candidate to
have similar but not necessarily identical zip codes. There
are other ways to relax the conditions as well.
From this example, two observations are in order. First,
there are different ways to relax the conditions. The number
of choices for adjusting the conditions is large (exponential
in the number of the conditions). Second, how much to
adjust each condition is not obvious. For instance, for a
condition Salary <= 95, we could relax it to Salary <= 100
or Salary <= 120. The former has a smaller adjustment
than the latter, but the new query may still return an empty
answer. Although the space of possible choices is very large,
it is natural to expect that a user would be interested in
the smallest amount of adjustment to the parameters in the
query in order to compute a nonempty answer. Clearly the
semantics of such adjustments have to be precisely defined.
In our running example, would a larger adjustment to the
join condition be favored over two smaller adjustments to
the two selection conditions?
Table 1: Relation R: Jobs
Table 2: Relation S: Candidates
In this paper we put such questions into perspective and
formally reason about the process of adjusting the conditions
in a query that returns an empty answer, in order to obtain
nonempty query results. We refer to this process as query
relaxation. We make the following contributions:
• We formally define the semantics of the query relaxation
problem for queries involving numeric conditions in se-
lection and join predicates.
• We propose a lattice-based framework to aid query re-
laxation while respecting relaxation semantics that aims
to identify the relaxed version of the query that provides
a nonempty answer, while being “close” to the original
query formulated by the user.
• We propose algorithms for various versions of this prob-
lem that conduct query evaluation at each node of our
lattice framework aiming to minimize query response
time while obtaining an answer.
• We present the results of a thorough experimental evalu-
ation, depicting the practical utility of our methodology.
This paper is organized as follows. Section 2 presents our
overall framework. Section 3 presents our algorithms for re-
laxing selection conditions in equi-join queries. In Section 4
we study how to relax all conditions in a query. In Section 5
we show how to adapt our algorithms to variants of query re-
laxation. Section 6 contains the results of our experimental
evaluation. Section 7 concludes the paper.
1.1 Related Work
A study related to our work presented herein is the work
of Muslea et al. [23, 24]. In these papers they discuss how to
obtain alternate forms of conjunctive expressions in a way
that answers can be obtained. Their study however deals
primarily with expressibility issues without paying attention
to the data management issues involved. Another related
piece of work presents a method for automated ranking
of query results.
Efficient algorithms for computing skylines have been in
the center of research attention for years. The basic idea
of skyline queries came from older research topics such as the
contour problem, maximum vectors, and convex
hull. Recently there have been studies on efficient algorithms
for computing skylines, e.g., [4, 8, 31, 19, 25, 12, 26, 32, 27].
Unlike these works which aim to support answers of prefer-
ence queries, our focus is on relaxing queries with selection
conditions and join conditions. Consequently, unlike these
studies which assume the attributes and ordering of the values
are already predetermined in a single table, our work
requires us to compute the skyline dynamically for a set of tables
which are to be joined and whose attribute values (i.e.,
the amount of relaxation) must also be determined on the
fly. We are not aware of any work utilizing such structures
for query relaxation, especially join-query relaxations.
Several papers have been devoted to the problem of answering
top-k queries efficiently [21, 5, 6]. These works focus
on finding k tuples in the database that are ranked the high-
est based on a scoring function. Users can assign weights
to various attributes in the database so as to express their
preference in the scoring function. Our study involves find-
ing the skyline of relaxations for select and join conditions
such that each set of relaxations is guaranteed to return at
least 1 tuple in the result set. Both the selection and join
conditions must be considered for relaxation in order for this
to take place, unlike top-k queries which focus on only the
Some of our algorithms are related to similarity search in
multi-dimensional data, such as R-trees and multidimensional
indexing structures [13, 29, 3], and nearest neighbor
search and all pair nearest search [15, 30]. Several
approaches have been proposed in the literature to relax
queries. For example, Gaasterland studied how to control
relaxation using a set of heuristics based on semantic
query-optimization techniques. Kadlag et al. presented
a query-relaxation algorithm that, given a user’s initial
range query and a desired cardinality for the answer
set, produces a relaxed query that is expected to contain
the required number of answers based on multi-dimensional
histograms for query-size estimation. Finally, our work is
also related to the work on preference queries [18, 7, 9, 17].
In this section, we define our framework of relaxing queries
with joins and selections. For simplicity, we focus on the case
in which a query joins two relations; our results are easily
extendable to the case of multiple joins. Let R and S be two
relations. We consider join queries that are associated with
a set of selection conditions on R and S, and a set of join
conditions. Each selection condition is a range condition on
an attribute. A typical form of a range condition is “A θ v”,
where A is an attribute, v is a constant value, and θ is a
comparison operator such as =, <, >, ≤, or ≥. Examples are
Salary <= 95, WorkYear >= 5, and Age = 30. Each join
condition is in the form of “R.A θ S.B”, where A is an
attribute of R, and B is an attribute of S.
We focus on relaxing conditions on numeric attributes,
whose relaxations can be quantified as value differences.
Consider a query Q and a pair of records ⟨r,s⟩ in the two
relations. For each selection condition
C : R.A θ v
in R, the relaxation of r with respect to this condition is:
\[ \mathrm{RELAX}(r, C) = \begin{cases} 0 & \text{if } r \text{ satisfies } C; \\ |r.A - v| & \text{otherwise.} \end{cases} \]
Similarly, we can define the relaxation of record s with
respect to a selection condition on S. The relaxation of the
pair with respect to a join condition J : R.A θ S.B is:
\[ \mathrm{RELAX}(r, s, J) = \begin{cases} 0 & \text{if } r, s \text{ satisfy } J; \\ |r.A - s.B| & \text{otherwise.} \end{cases} \]
For instance, consider the query in our running exam-
ple. Let CR be the selection condition J.Salary <= 95. We
have RELAX(r1,CR) = 0, since r1 satisfies this selection
condition. In addition, RELAX(r3,CR) = 25, since record r3
does not satisfy the condition. Let J be the join condition,
J.Zipcode = C.Zipcode. We have RELAX(r2,s1,J) = 0,
since the records r2 and s1 satisfy the join condition, while
RELAX(r2,s2,J) = 1040, since the records r2 and s2 do not
satisfy this join condition.
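The two relaxation measures can be sketched in a few lines of Python. This is only an illustration: the rows below are hypothetical (the contents of Tables 1 and 2 are not reproduced here) and were chosen so that the values quoted above come out the same, and the function names `relax_selection` and `relax_join` are assumptions of this sketch.

```python
import operator

# Comparison operators allowed in a condition "A θ v".
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}

# Hypothetical rows (not the paper's Tables 1 and 2), chosen to match
# the relaxation values quoted in the text.
jobs = {
    "r1": {"Salary": 80, "Zipcode": 90210},
    "r2": {"Salary": 90, "Zipcode": 92614},
    "r3": {"Salary": 120, "Zipcode": 91125},
}
candidates = {
    "s1": {"WorkYear": 3, "Zipcode": 92614},
    "s2": {"WorkYear": 6, "Zipcode": 93654},
}

def relax_selection(r, attr, op, v):
    """RELAX(r, C) for C: attr op v. Zero if satisfied, else |r.attr - v|."""
    return 0 if OPS[op](r[attr], v) else abs(r[attr] - v)

def relax_join(r, s, attr_r, op, attr_s):
    """RELAX(r, s, J) for J: R.attr_r op S.attr_s."""
    return 0 if OPS[op](r[attr_r], s[attr_s]) else abs(r[attr_r] - s[attr_s])

print(relax_selection(jobs["r1"], "Salary", "<=", 95))                      # 0
print(relax_selection(jobs["r3"], "Salary", "<=", 95))                      # 25
print(relax_join(jobs["r2"], candidates["s1"], "Zipcode", "=", "Zipcode"))  # 0
print(relax_join(jobs["r2"], candidates["s2"], "Zipcode", "=", "Zipcode"))  # 1040
```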
Let the set of selection conditions for R in query Q be
CQ,R, for S be CQ,S, and the set of join conditions be
CQ,J. Intuitively, every tuple r ∈ R and every tuple s ∈ S
can produce an answer with respect to the query Q for
some sufficiently large relaxation on the set of conditions
CQ,R ∪ CQ,S ∪ CQ,J. We denote this set of relaxations on
different conditions as RELAX(r,s,Q). To separate out the
relaxations for CQ,R, CQ,S, and CQ,J, we will denote the
relaxations for them as RELAX(r,CQ,R), RELAX(s,CQ,S),
and RELAX(r,s,CQ,J), respectively.
Obviously, RELAX(r,s,Q) is different for different pairs
of ⟨r,s⟩. Given two tuple pairs ⟨r1,s1⟩ and ⟨r2,s2⟩, it is
possible that the first pair is “better” than the second in
terms of their relaxations. To formulate such a relationship,
we make use of the concept of “skyline” to define a partial
order among the relaxations for different tuple pairs.
Definition 1. (Dominate) We say RELAX(r1,s1,Q)
dominates RELAX(r2,s2,Q) if the relaxations in
RELAX(r1,s1,Q) are equal or smaller than the corresponding
relaxations in RELAX(r2,s2,Q) for all the conditions and
smaller in at least one condition.
Definition 2. (Relaxation Skyline) The relaxation
skyline of a query Q on two relations R and S, denoted
by SKYLINE(R,S,Q), is the set of all the tuple pairs, ⟨r,s⟩,
r ∈ R, s ∈ S, each of which has its relaxations with respect
to Q not dominated by any other tuple pair ⟨r′,s′⟩, r′ ∈ R,
s′ ∈ S.
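Definitions 1 and 2 translate directly into a quadratic-time sketch. The relaxation vectors below are hypothetical, ordered as (CR, CS, J), and the names `dominates` and `relaxation_skyline` are assumptions of this illustration rather than names used by the authors.

```python
def dominates(a, b):
    """Definition 1: a is no larger than b on every condition
    and strictly smaller on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def relaxation_skyline(relaxations):
    """Definition 2: keep the pairs whose relaxation vector is not
    dominated by the vector of any other pair.

    relaxations: dict mapping a tuple-pair label to its relaxation vector.
    """
    return {
        pair for pair, vec in relaxations.items()
        if not any(dominates(other, vec) for other in relaxations.values())
    }

# Hypothetical relaxation vectors (C_R, C_S, J) for four tuple pairs.
vectors = {
    ("r2", "s1"): (0, 2, 0),
    ("r3", "s3"): (25, 0, 0),
    ("r4", "s4"): (35, 3, 0),     # dominated by (r2, s1)
    ("r1", "s2"): (0, 0, 3444),   # needs only a (large) join relaxation
}
print(sorted(relaxation_skyline(vectors)))
```

The three surviving pairs are mutually incomparable: each one is the best on at least one condition.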
Although computing the relaxation skyline for all the conditions
of a query can ensure that at least one relaxed answer is
returned, it can sometimes return too many incomparable
answers with large processing overhead due to the need to relax
all the conditions.¹ Returning many results is not useful for
users who just want a small set of answers. In addition,
depending on the semantics of the query, often a user does
not want to relax some conditions. For instance, in the job
example, the user might not want to relax the join condition.
Therefore, we also consider the case where we do relax-
ations on a subset of the conditions, and compute the corre-
sponding relaxation skyline with respect to these conditions.
The tuple pairs on this relaxation skyline have to satisfy the
conditions that cannot be relaxed. Interestingly, the various
combinations of the options to relax these conditions form a
lattice structure. For instance, consider the three conditions
in our running example. Fig. 1 shows the lattice structure
of these different combinations. In the lattice, for each node
n, the conditions being relaxed at its descendants are a
superset of those being relaxed at this node. In such a case, it
is natural that the set of tuple pairs in the relaxation skyline
corresponding to this node n is a subset of those in the
corresponding relaxation skyline for each descendant of n.
By analyzing various factors that will result in an empty answer,
we can try to identify the highest nodes in the lattice
that can bring a user-specified number of answers.
¹ Technically a query has a final projection to return the
values for some attributes. We assume that the main computational
cost is to compute those pairs.
Figure 1: Lattice structure of various combinations
of relaxations in the query in the jobs example. “R”,
“S”, and “J” stand for relaxing the selection condi-
tion in jobs, the selection condition in candidates,
and the join condition, respectively.
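As a rough illustration, the lattice of Figure 1 can be enumerated as the subsets of the condition labels, grouped by how many conditions are relaxed. Treating the empty set (the original, unrelaxed query) as the top node is an assumption of this sketch, as are the function names.

```python
from itertools import combinations

def relaxation_lattice(conditions):
    """Level i holds the nodes that relax exactly i of the conditions."""
    return [
        [frozenset(c) for c in combinations(conditions, i)]
        for i in range(len(conditions) + 1)
    ]

def children(node, conditions):
    """Nodes one level down: relax one additional condition."""
    return [node | {c} for c in conditions if c not in node]

CONDITIONS = ["R", "S", "J"]
levels = relaxation_lattice(CONDITIONS)
for level in levels:
    print(["".join(sorted(node)) or "(none)" for node in level])
```

Every descendant of a node is a superset of that node, which is the property the text relies on when it compares the skylines of a node and its descendants.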
In our framework, a user can also assign a weight to each
of the conditions in a query, and a value k. Then the system
computes the k best answers using these weights. That is,
for all the pairs in the relaxation skyline of the query, we
return the k pairs that have the k smallest weighted sum-
mation of the relaxations on the conditions. In this way,
the user can specify preference towards different ways to
relax the conditions. In addition, computing the final an-
swers could be more efficient. In Section 5.1 we show how
to extend our algorithms for this variant.
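The weighted variant can be sketched as a ranking step over an already computed relaxation skyline. This is a minimal illustration under the assumption that the skyline is available as a dictionary of relaxation vectors; it is not the extension described in Section 5.1, and `top_k_relaxed` and the sample vectors are hypothetical.

```python
import heapq

def top_k_relaxed(skyline, weights, k):
    """Return the k pairs with the smallest weighted sum of relaxations.

    skyline: dict mapping a tuple-pair label to its relaxation vector.
    weights: one non-negative weight per condition, in the same order.
    """
    def score(item):
        _, vec = item
        return sum(w * x for w, x in zip(weights, vec))
    return [pair for pair, _ in heapq.nsmallest(k, skyline.items(), key=score)]

# Hypothetical skyline over (C_R, C_S).
skyline = {("r2", "s1"): (0, 2), ("r3", "s3"): (25, 0)}

# Penalizing missing experience more heavily flips the preferred answer.
print(top_k_relaxed(skyline, weights=(1, 10), k=1))  # scores 20 vs 25
print(top_k_relaxed(skyline, weights=(1, 20), k=1))  # scores 40 vs 25
```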
We study how to compute the relaxation skyline of a query
with equi-join conditions, when we do not want to relax its
join conditions, i.e., we only relax (possibly a subset of)
its selection conditions. The motivation is that, many join
conditions are specified on identifier attributes, such as em-
ployee ID, project ID, and movie ID. This case happens es-
pecially when we have a join between a foreign-key attribute
and its referenced key attribute. Semantically it might not
be meaningful to relax such a join attribute.
For instance, in our running example, we are allowed to
relax the selection conditions CR (Salary <= 95) and CS
(WorkYear >= 5), but we do not relax the join condition CJ
(Jobs.Zipcode = Candidates.Zipcode). That is, each pair
of records ⟨r,s⟩ of R and S in the relaxation skyline with
respect to these two selection conditions should satisfy:
• RELAX(r,s,CJ) = 0, i.e., r.Zipcode=s.Zipcode.
• This pair cannot be dominated by any other joinable
pair, i.e., there does not exist another pair ⟨r′,s′⟩ of
records such that:
– RELAX(r′,CR) ≤ RELAX(r,CR);
– RELAX(s′,CS) ≤ RELAX(s,CS).
One of the two inequalities should be strict.
The job-candidate pair ⟨r1,s1⟩ is not in the relaxation
skyline since its join relaxation is not 0. The pair ⟨r4,s4⟩ is
not in the answer since it is dominated by the pair ⟨r2,s1⟩.
The relaxation skyline with respect to the two selection
conditions should include two pairs ⟨r2,s1⟩ and ⟨r3,s3⟩. Both
pairs respect the join condition, and neither of them is dom-
inated by the other pairs. The first pair has the smallest
relaxation on condition CR, while the second has the small-
est relaxation on condition CS. In this section we develop
algorithms for computing a relaxation skyline efficiently. In
Section 4 we will study the general case where we want to
relax join conditions as well.
Let Q be a query with selection conditions and join condi-
tions on relations R and S, and the query returns an empty
answer set. To compute the relaxation skyline with respect
to the selection conditions, one might be tempted to develop
the following simple (but incorrect) algorithm. Compute the
set KR (resp. KS) of the relaxation skyline points with re-
spect to the selection conditions for relation R (resp. S).
Then join the two sets KR and KS. For instance, in our
running example, this algorithm computes the relaxation
skyline of the relation Jobs with respect to the selection con-
dition CR: J.Salary <= 95. The result includes the jobs r1
and r2, whose salary values satisfy the selection condition
CR. Similarly, it also computes the relaxation skyline of rela-
tion Candidates with respect to the selection condition CS:
C.WorkYear >= 5, and the result has two records, s2 and s3,
which satisfy the selection condition CS. It then joins the
points on the two relaxation skylines, and returns an empty
answer.² The example shows the reason why this naive approach
fails. Intuitively, the algorithm relaxes the selection
conditions of each relation locally. However, our goal is to
compute the pairs of tuples that are not dominated by any
other pair of tuples with respect to both of the selection con-
ditions, not just one selection condition of a relation. Trying
to compute the dominating points in each relation and then
joining them will lead to missing some points that might
form tuples that would not be dominated.
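The failure of the naive approach can be reproduced on a small instance. The rows below are hypothetical stand-ins for Tables 1 and 2, stored as (Salary, Zipcode) and (WorkYear, Zipcode); they were chosen so that the local skylines match the ones named above (r1, r2 and s2, s3). All function names are assumptions of this sketch.

```python
# Hypothetical rows: Jobs as (Salary, Zipcode), Candidates as (WorkYear, Zipcode).
JOBS = {"r1": (80, 90210), "r2": (90, 92614), "r3": (120, 91125), "r4": (130, 93654)}
CANDS = {"s1": (3, 92614), "s2": (6, 93654), "s3": (7, 91125), "s4": (2, 93654)}

def relax_r(job):    # C_R: Salary <= 95
    return max(0, job[0] - 95)

def relax_s(cand):   # C_S: WorkYear >= 5
    return max(0, 5 - cand[0])

def local_skyline(rows, relax):
    """With a single condition, the skyline is the set of minimum-relaxation rows."""
    best = min(relax(t) for t in rows.values())
    return {k for k, t in rows.items() if relax(t) == best}

def naive(jobs, cands):
    """Incorrect approach: relax each relation locally, then join."""
    KR = local_skyline(jobs, relax_r)
    KS = local_skyline(cands, relax_s)
    return sorted((r, s) for r in KR for s in KS if jobs[r][1] == cands[s][1])

def joinable_pairs(jobs, cands):
    return sorted((r, s) for r in jobs for s in cands if jobs[r][1] == cands[s][1])

print(naive(JOBS, CANDS))           # no pair survives the join
print(joinable_pairs(JOBS, CANDS))  # yet joinable pairs exist
```

The locally best jobs and the locally best candidates never share a zip code here, so the join of the two local skylines is empty even though joinable pairs with small relaxations exist.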
3.2 Algorithm: JoinFirst (JF)
This algorithm, called JoinFirst, starts by computing a join
of the two relations without using the selection conditions.
It then computes a skyline of these resulting tuple pairs
with respect to the relaxations on the selection conditions.
Algorithm 1 describes the pseudo code of this algorithm.
Algorithm 1 JoinFirst
1: Compute tuple pairs respecting the join conditions, without
considering the selection conditions;
2: Compute the skyline of these tuple pairs with respect to
relaxations on the selection conditions;
3: Return the pairs in the skyline (with necessary projection).
In our running example, the first step of the algorithm will
compute the join of two relations with respect to the join
condition J.Zipcode = C.Zipcode. In the second step, it computes the
job-candidate pairs in this result that cannot be dominated
by other pairs with respect to the relaxation on the CR
and CS conditions. There are different ways to implement
each step in the algorithm. In the join step, we can do a
nested-loop join, a hash-based join, a sort-based join, or an
index-based join. In the second step, we can use one of
the skyline-computing algorithms in the literature, such as
the block-nested-loops algorithm. One advantage of
this algorithm is that it can use those existing algorithms
(e.g., a hash-join operator inside a DBMS) as a black box
without any modification. However, the algorithm may not
be efficient if the join step returns a large number of pairs.
² There are examples showing that, even if the naive approach
returns a nonempty answer set, the result is still not the
corresponding relaxation skyline.
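The three steps of JoinFirst can be sketched as follows, using a hash join for step 1 and a simple quadratic skyline for step 2. The rows are hypothetical (the paper's tables are not reproduced), and the helper names are assumptions of this sketch rather than the authors' implementation.

```python
from collections import defaultdict

# Hypothetical rows: Jobs as (Salary, Zipcode), Candidates as (WorkYear, Zipcode).
JOBS = {"r1": (80, 90210), "r2": (90, 92614), "r3": (120, 91125), "r4": (130, 93654)}
CANDS = {"s1": (3, 92614), "s2": (6, 93654), "s3": (7, 91125), "s4": (2, 93654)}

def relax_r(job):    # C_R: Salary <= 95
    return max(0, job[0] - 95)

def relax_s(cand):   # C_S: WorkYear >= 5
    return max(0, 5 - cand[0])

def dominates(a, b):
    """a <= b on every condition and a differs from b somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def join_first(jobs, cands):
    # Step 1: hash join on Zipcode, ignoring the selection conditions.
    by_zip = defaultdict(list)
    for s, cand in cands.items():
        by_zip[cand[1]].append(s)
    pairs = [(r, s) for r, job in jobs.items() for s in by_zip.get(job[1], [])]
    # Step 2: skyline of the joined pairs over (RELAX(r, C_R), RELAX(s, C_S)).
    vecs = {(r, s): (relax_r(jobs[r]), relax_s(cands[s])) for r, s in pairs}
    # Step 3: return the non-dominated pairs.
    return sorted(p for p, v in vecs.items()
                  if not any(dominates(w, v) for w in vecs.values()))

print(join_first(JOBS, CANDS))
```

On this instance the join produces four pairs, of which two are pruned in step 2; the cost concern in the text is that step 1 may materialize far more pairs than survive.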
3.3 Algorithm: PruningJoin (PJ)
This algorithm tries to reduce the size of the results after
the join step in the JoinFirst algorithm by computing the re-
laxation skyline during the join step. Algorithm 2 describes
the pseudo code of this algorithm, assuming we are doing
an index-based join using an index structure on the join at-
tributes of S. The algorithm goes through all the records in
relation R. For each one of them (say r), it uses an index
structure on the join attribute of S to find those S records
that can join with this record (say s). For each such record
s, the algorithm calls a procedure “Update” by passing the
pair ⟨r,s⟩ and the current skyline. This procedure checks
if this pair is already dominated by a pair in the current
relaxation skyline K. This dominance checking is based on
Definition 1, assuming we can compute the relaxation of this
record pair for each condition in the query.³ We discard this
pair if it is already dominated. Otherwise, we discard those
pairs in K that are dominated by this new pair, before in-
serting this pair to K. The algorithm terminates when we
have processed all the records in R.
Algorithm 2 PruningJoin (Index based)
Relaxation skyline K = empty;
for each tuple r in R do
    I = index-scan(S, r); // joinable records in S
    Call Update(⟨r, s⟩, K) for each tuple s in I;
end for
return K;

procedure Update(element e, skyline K)
    if e is dominated by an element in K then
        discard e;
    else
        discard K’s elements dominated by e;
        add e to K;
    end if
end procedure
The description can be easily modified for other possi-
ble physical implementations of the join. For instance, if
we want to do a hash-based join, we first bucketize both
relations. For each pair of buckets from the two relations,
we consider each pair of records from these two buckets,
and check if this tuple pair can be inserted into the current
relaxation skyline, and potentially eliminate some existing
record pairs. The algorithm terminates when all the pairs
of buckets are processed. Extensions to other types of join
methods (e.g., nested-loop or sort-based) are similar.
³ Technically the dominance checking in the “Update” procedure
relies on a set of query conditions. For simplicity,
we assume the skyline K already includes these query conditions
and the corresponding method to do the dominance
checking, so that this procedure can be called by other algorithms.
One advantage of this algorithm (compared to the Join-
First algorithm) is that it can reduce the number of pair
records after the join (which might be stored in memory),
since this algorithm conducts dominance checking on the fly.
One disadvantage is that it needs to modify the implemen-
tations of different join methods.
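A compact sketch of PruningJoin and its Update procedure is given below. A dictionary keyed on Zipcode stands in for the index on the join attribute of S, the skyline K is kept as a dictionary from pair label to relaxation vector, and the rows are hypothetical; none of these choices come from the paper.

```python
from collections import defaultdict

# Hypothetical rows: Jobs as (Salary, Zipcode), Candidates as (WorkYear, Zipcode).
JOBS = {"r1": (80, 90210), "r2": (90, 92614), "r3": (120, 91125), "r4": (130, 93654)}
CANDS = {"s1": (3, 92614), "s2": (6, 93654), "s3": (7, 91125), "s4": (2, 93654)}

def relax_r(job):    # C_R: Salary <= 95
    return max(0, job[0] - 95)

def relax_s(cand):   # C_S: WorkYear >= 5
    return max(0, 5 - cand[0])

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def update(label, vec, K):
    """Update procedure: insert (label, vec) into skyline K unless dominated."""
    if any(dominates(v, vec) for v in K.values()):
        return                                   # discard the new element
    for dead in [l for l, v in K.items() if dominates(vec, v)]:
        del K[dead]                              # discard elements it dominates
    K[label] = vec

def pruning_join(jobs, cands):
    index = defaultdict(list)                    # stand-in for an index on S.Zipcode
    for s, cand in cands.items():
        index[cand[1]].append(s)
    K = {}
    for r, job in jobs.items():
        for s in index.get(job[1], []):          # joinable records in S
            update((r, s), (relax_r(job), relax_s(cands[s])), K)
    return K

print(pruning_join(JOBS, CANDS))
```

Unlike JoinFirst, the joined pairs are never materialized as a whole: at any time only the current skyline K is held.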
3.4 Algorithm: PruningJoin+ (PJ+)
The algorithm modifies the PruningJoin algorithm by com-
puting a “local relaxation skyline” for a set of records in one
relation that join with a specific record in the other relation,
and doing dominance checking within this local skyline. Al-
gorithm 3 describes the algorithm, and it is based on an
index-scan-based join implementation. For each record r in
R, after computing the records in S that can join with r
(stored in I in the description), the algorithm goes through
these records to compute a local relaxation skyline L with
respect to the selection conditions on S. Those locally dom-
inated S records do not need to be considered in the com-
putation of the global relaxation skyline. If both records
s1 and s2 of S can join with record r, and s1 dominates
s2 with respect to the selection conditions on S, then pair
⟨r,s1⟩ also dominates ⟨r,s2⟩ with respect to all the selection
conditions in the query. Therefore the second pair cannot be
in the global relaxation skyline. Extensions of the algorithm
to other join implementation methods are straightforward.
Algorithm 3 PruningJoin+ (Index based)
Relaxation skyline K = empty;
for each tuple r in R do
    I = index-scan(S, r); // joinable records in S
    Local relaxation skyline L = empty;
    Call Update(s, L) for each tuple s in I;
    Call Update(⟨r, s⟩, K) for each tuple s in L;
end for
return K;
Example 3.1. Consider the following query on two rela-
tions R(A, B, C) and S(C, D, E).
SELECT *
FROM R, S
WHERE R.A = 10 AND R.B = 30
AND R.C = S.C
AND S.D = 70 AND S.E = 90;
Fig. 2 shows an example. Currently there are two record
pairs (p1 and p2) in the global relaxation skyline. For the
given record r of relation R, ⟨13,34,55⟩, there are four S
records that join with record r. Among these four, record
s2 is locally dominated by record s1, since the relaxations of
s2 on the two local selection conditions are both larger than
those of record s1. The local relaxation skyline of this record
r will contain three records, s1, s3, and s4. Among the three
corresponding tuple pairs, ⟨r,s1⟩ is dominated by the existing
pair p2. The two remaining pairs, ⟨r,s3⟩ and ⟨r,s4⟩, will be
inserted into the global relaxation skyline.
Notice that this algorithm does the local pruning using
those local relaxation skylines, hoping that it can eliminate
some S records locally. This local pruning is not always
beneficial to performance, especially when the local pruning
does not eliminate many S records. As our experiments have
verified, whether the performance gains are worth the overhead
of this local pruning depends on several factors, such as
the number of conditions.
Figure 2: Example of algorithm PruningJoin+.
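One iteration of the PruningJoin+ loop, in the spirit of Example 3.1, can be sketched as follows. The record values of the original figure are not recoverable, so every relaxation vector below is hypothetical: they were picked only so that s2 is locally dominated by s1 and ⟨r,s1⟩ is dominated by the existing pair p2, as the example describes. Vectors are ordered as relaxations on (A, B, D, E).

```python
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def update(label, vec, K):
    """Update procedure of PruningJoin on a dict skyline K: label -> vector."""
    if any(dominates(v, vec) for v in K.values()):
        return
    for dead in [l for l, v in K.items() if dominates(vec, v)]:
        del K[dead]
    K[label] = vec

def process_record(r_label, r_relax, joinable, K):
    """One loop iteration of PruningJoin+ for a single R record.

    joinable: dict s_label -> relaxation vector of s on the S-side conditions.
    Returns the local relaxation skyline L.
    """
    L = {}
    for s_label, s_relax in joinable.items():
        update(s_label, s_relax, L)              # local pruning on S conditions only
    for s_label, s_relax in L.items():
        update((r_label, s_label), r_relax + s_relax, K)
    return L

# r = <13, 34, 55> against R.A = 10 and R.B = 30 gives relaxations (3, 4).
r_relax = (abs(13 - 10), abs(34 - 30))
# Hypothetical relaxations of the four joinable S records on (S.D = 70, S.E = 90).
joinable = {"s1": (2, 3), "s2": (5, 5), "s3": (0, 9), "s4": (9, 0)}
# Hypothetical pairs already in the global skyline.
K = {"p1": (0, 9, 5, 5), "p2": (2, 4, 1, 3)}

L = process_record("r", r_relax, joinable, K)
print(sorted(L))   # local skyline
print(list(K))     # global skyline after processing r
```

Only three of the four joinable records reach the global dominance check; whether that saving pays for building L is the trade-off discussed above.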
3.5 Algorithm: SortedAccessJoin (SAJ)
This algorithm adopts the main idea in Fagin’s algorithm,
originally proposed to compute answers to top-k queries.
As shown in Algorithm 4, the algorithm first constructs a
sorted list of tuple IDs for each selection condition in the
given query Q, based on the relaxation of each record on that
selection condition. Such a list can be obtained efficiently,
e.g., when the corresponding table has an indexing structure
such as B-tree. The algorithm goes through the lists in a
round-robin fashion. For each of them Li, it retrieves the
next tuple ID (in an ascending order) and the corresponding
tuple p. It then uses an available index structure on the
other table to find records that can join with this record p,
and stores them in I. For each such joinable tuple q, we
form a tuple pair ⟨p,q⟩. We insert this pair into the set of
candidate pairs P, if it is not in the set. The algorithm calls
a function “CheckStopCondition()” to check if we can stop
searching for tuple pairs. If so, we process all the candidate
pairs in P to compute a relaxation skyline.
Algorithm 4 SortedAccessJoin
Let C1,...,Cn be the selection conditions on R, and
Cn+1,...,Cn+m be the selection conditions on S;
Let Li (i = 1,...,n+m) be a sorted list of record IDs based
on their relaxation on the selection condition Ci (ascending
order);
set of candidate pairs P = empty;
StopSearching = false;
// produce a set of candidate pairs
while not StopSearching do
    Attribute j = round-robin(1,...,n + m);
    Retrieve the next tuple ID k from list Lj;
    Retrieve the corresponding tuple p using k;
    I = index-scan(the other relation, p);
    for each tuple q in I do
        if (⟨p,q⟩ not in P)
            insert ⟨p,q⟩ in P;
    end for
    StopSearching = CheckStopCondition();
end while
// compute Skyline
Relaxation skyline K = empty;
for each tuple pair ⟨r,s⟩ in P do
    Update(⟨r, s⟩, K);
end for
return K;
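The following is a simplified sketch of SortedAccessJoin for the running example, with exactly one selection condition per relation (so two sorted lists). The stop test encodes one reading of CheckStopCondition(): stop once a pair found through list Li has, on the other list, a relaxation strictly below that of the record most recently read from it. The rows are hypothetical, a linear scan stands in for the index on the other relation, and all names are assumptions of this sketch.

```python
# Hypothetical rows: Jobs as (Salary, Zipcode), Candidates as (WorkYear, Zipcode).
JOBS = {"r1": (80, 90210), "r2": (90, 92614), "r3": (120, 91125), "r4": (130, 93654)}
CANDS = {"s1": (3, 92614), "s2": (6, 93654), "s3": (7, 91125), "s4": (2, 93654)}

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and a != b

def sorted_access_join(jobs, cands):
    tables = [jobs, cands]
    relax = [lambda r: max(0, jobs[r][0] - 95),    # C_1 on R: Salary <= 95
             lambda s: max(0, 5 - cands[s][0])]    # C_2 on S: WorkYear >= 5
    # One list of tuple IDs per condition, ascending by relaxation.
    lists = [sorted(tables[i], key=relax[i]) for i in range(2)]
    pos, frontier = [0, 0], [None, None]           # frontier: relaxation last read
    P = {}                                         # candidate pairs -> vector
    i, stop = 0, False
    while not stop and (pos[0] < len(lists[0]) or pos[1] < len(lists[1])):
        if pos[i] < len(lists[i]):
            p = lists[i][pos[i]]
            pos[i] += 1
            frontier[i] = relax[i](p)
            other, j = tables[1 - i], 1 - i
            # A scan on Zipcode stands in for the index on the other relation.
            for q in [t for t in other if other[t][1] == tables[i][p][1]]:
                r, s = (p, q) if i == 0 else (q, p)
                P[(r, s)] = (relax[0](r), relax[1](s))
                if frontier[j] is not None and P[(r, s)][j] < frontier[j]:
                    stop = True                    # unseen pairs are all dominated
        i = 1 - i                                  # round-robin over the lists
    K = sorted(pr for pr, v in P.items()
               if not any(dominates(w, v) for w in P.values()))
    return K, sum(pos)

skyline, records_read = sorted_access_join(JOBS, CANDS)
print(skyline, records_read)
```

On this instance the loop stops after six sorted accesses out of eight, so r4 and s4 are never read from the lists; any pair formed only from unread records has relaxations at least as large as the current list positions and is therefore dominated by the pair that triggered the stop.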
In the “CheckStopCondition()” function, we check if the
current tuple pair ⟨p,q⟩ has a smaller relaxation than each
of the current records on the lists, except the current list
Li. That is, the function returns true only if for each list Lj
(j ≠ i), the relaxation of this tuple pair on the condition Cj