Relaxing Join and Selection Queries.

Nick Koudas1
Chen Li§2
Anthony K. H. Tung3
Rares Vernica§2
1University of Toronto, Canada
2University of California, Irvine, USA
3National University of Singapore, Singapore
ABSTRACT
Database users can be frustrated by having an empty answer
to a query. In this paper, we propose a framework to
systematically relax queries involving joins and selections.
When considering relaxing a query condition, intuitively one
seeks the 'minimal' amount of relaxation that yields an answer.
We first characterize the types of answers that we
return to relaxed queries. We then propose a lattice-based
framework in order to aid query relaxation. Nodes in the
lattice correspond to different ways to relax queries. We
characterize the properties of relaxation at each node and
present algorithms to compute the corresponding answer.
We then discuss how to traverse this lattice in a way that
a nonempty query answer is obtained with the minimum
amount of query condition relaxation. We implemented this
framework and we present the results of a thorough performance
evaluation using real and synthetic data. Our results
indicate the practical utility of our framework.
1. INTRODUCTION
Issuing complex queries against large databases is a relatively
simple task provided one has knowledge of the suitable
query conditions and constants to use. Commonly, however,
although one might have a clear idea about the parameters,
the resulting query may return an empty answer. In such
cases, users often find themselves having to try
different parameters, hoping to get an answer. Essentially,
query formulation becomes a trial-and-error process. One
has to adjust the parameters until an answer is obtained
with which one is relatively comfortable. The process of parameter
adjustment is not at all trivial. The more complex
a query is, in terms of predicates, the more choices one has
to conduct such an adjustment. A similar situation arises
when one is unclear about the right parameters to use, so
trying parameters in speculation seems a natural option.
§Chen Li and Rares Vernica are partially supported by NSF
CAREER Award No. IIS-0238586.
Permission to copy without fee all or part of this material is granted provided
that the copies are not made or distributed for direct commercial advantage,
the VLDB copyright notice and the title of the publication and its date appear,
and notice is given that copying is by permission of the Very Large Data
Base Endowment. To copy otherwise, or to republish, to post on servers
or to redistribute to lists, requires a fee and/or special permission from the
publisher, ACM.
VLDB '06, September 12-15, 2006, Seoul, Korea.
Copyright 2006 VLDB Endowment, ACM 1-59593-385-9/06/09.
As an example, consider a recruitment company that has
a database with two tables. The table Jobs has records of
job postings, with information such as job ID (JID), category
(Category), company name (Company), zip code (Zipcode),
and annual salary (Salary). The table Candidates has
records of applicants, with information such as candidate ID
(CID), zip code (Zipcode), expected salary (ExpSalary), and
number of years of working experience (WorkYear). Some
sample data of the relations is shown in Tables 1 and 2.
A user issues the following query:
SELECT *
FROM Jobs J, Candidates C
WHERE J.Salary <= 95
AND J.Zipcode = C.Zipcode
AND C.WorkYear >= 5;
The query seeks job-candidate pairs, such that the job
and the candidate are in the same area (same zip code), the
job's annual salary is at most 95K, and the candidate has at
least 5 years of working experience. Suppose the answer to
the query turns out to be empty for the database instance.
As we can see from the tables, each record in one relation
can join with a record in the other relation. However, no
such pair of records satisfies both selection conditions. In
this situation, one way to get results is to be more flexible
about the jobs, in terms of the job's annual salary. Moreover, we
can also be more flexible about the candidates, in terms of
years of experience. By relaxing both selection conditions
on Salary and WorkYear, we can get a nonempty answer.
We can also compute a nonempty answer by relaxing the
join condition, i.e., by allowing a job and a candidate to
have similar but not necessarily identical zip codes. There
are other ways to relax the conditions as well.
From this example, two observations are in order. First,
there are different ways to relax the conditions. The number
of choices for adjusting the conditions is large (exponential
in the number of conditions). Second, how much to
adjust each condition is not obvious. For instance, for a
condition Salary <= 95, we could relax it to Salary <= 100
or Salary <= 120. The former has a smaller adjustment
than the latter, but the new query may still return an empty
answer. Although the space of possible choices is very large,
it is natural to expect that a user would be interested in
the smallest amount of adjustment to the parameters in the
query in order to compute a nonempty answer. Clearly the
semantics of such adjustments have to be precisely defined.
In our running example, would a larger adjustment to the
join condition be favored over two smaller adjustments to
the two selection conditions?
JID   Category            Company     Zipcode   Salary
r1    Sales               Broadcom    92047     80
r2    Hardware Engineer   Intel       93652     95
r3    Software Engineer   Microsoft   82632     120
r4    Project Manager     IBM         90391     130
...   ...                 ...         ...       ...
Table 1: Relation R: Jobs
CID   Zipcode   ExpSalary   WorkYear
s1    93652     120         3
s2    92612     130         6
s3    82632     100         5
s4    90931     150         1
...   ...       ...         ...
Table 2: Relation S: Candidates
In this paper we put such questions into perspective and
formally reason about the process of adjusting the conditions
in a query that returns an empty answer, in order to obtain
nonempty query results. We refer to this process as query
relaxation. We make the following contributions:
• We formally define the semantics of the query relaxation
problem for queries involving numeric conditions in selection
and join predicates.
• We propose a lattice-based framework to aid query relaxation
while respecting relaxation semantics. The framework aims
to identify the relaxed version of the query that provides
a nonempty answer, while being “close” to the original
query formulated by the user.
• We propose algorithms for various versions of this problem
that conduct query evaluation at each node of our
lattice framework, aiming to minimize query response
time while obtaining an answer.
• We present the results of a thorough experimental evaluation,
depicting the practical utility of our methodology.
This paper is organized as follows. Section 2 presents our
overall framework. Section 3 presents our algorithms for relaxing
selection conditions in equi-join queries. In Section 4
we study how to relax all conditions in a query. In Section 5
we show how to adapt our algorithms to variants of query relaxation.
Section 6 contains the results of our experimental
evaluation. Section 7 concludes the paper.
1.1 Related Work
A study related to our work presented herein is the work
of Muslea et al. [23, 24]. In these papers they discuss how to
obtain alternate forms of conjunctive expressions in a way
that answers can be obtained. Their study however deals
primarily with expressibility issues without paying attention
to the data management issues involved. Another related
piece of work is [1], where a method for automated ranking
of query results is presented.
Efficient algorithms for computing skylines have been at
the center of research attention for years. The basic idea
of skyline queries came from older research topics such as the
contour problem [22], maximum vectors [20] and convex
hull [28]. Recently there have been studies on efficient algorithms
for computing skylines, e.g., [4, 8, 31, 19, 25, 12, 26, 32, 27].
Unlike these works, which aim to support answers to preference
queries, our focus is on relaxing queries with selection
conditions and join conditions. Consequently, unlike these
studies, which assume the attributes and ordering of the values
are already predetermined in a single table, our work
requires us to compute the skyline dynamically for a set of tables
which are to be joined and whose attribute values (i.e.,
the amount of relaxation) must also be determined on the
fly. We are not aware of any work utilizing such structures
for query relaxation, especially join-query relaxations.
Several papers have been devoted to the problem of answering
top-k queries efficiently [21, 5, 6]. These works focus
on finding the k tuples in the database that are ranked the highest
based on a scoring function. Users can assign weights
to various attributes in the database so as to express their
preference in the scoring function. Our study involves finding
the skyline of relaxations for select and join conditions
such that each set of relaxations is guaranteed to return at
least 1 tuple in the result set. Both the selection and join
conditions must be considered for relaxation in order for this
to take place, unlike top-k queries, which focus on only the
selection conditions.
Some of our algorithms are related to similarity search in
multiple-dimensional data, such as R-trees and multidimensional
indexing structures [13, 29, 3], and nearest neighbor
search and all pair nearest search [15, 30]. Several
approaches have been proposed in the literature to relax
queries. For example, Gaasterland [11] studied how to control
relaxation using a set of heuristics based on semantic
query-optimization techniques. Kadlag et al. [16] presented
a query-relaxation algorithm that, given a user's initial
range query and a desired cardinality for the answer
set, produces a relaxed query that is expected to contain
the required number of answers based on multidimensional
histograms for query-size estimation. Finally, our work is
also related to the work on preference queries [18, 7, 9, 17].
2. QUERY-RELAXATION FRAMEWORK
In this section, we define our framework of relaxing queries
with joins and selections. For simplicity, we focus on the case
in which a query joins two relations; our results are easily
extendable to the case of multiple joins. Let R and S be two
relations. We consider join queries that are associated with
a set of selection conditions on R and S, and a set of join
conditions. Each selection condition is a range condition on
an attribute. A typical form of a range condition is “A θ v”,
where A is an attribute, v is a constant value, and θ is a
comparison operator such as =, <, >, ≤, or ≥. Examples are
Salary <= 95, WorkYear >= 5, and Age = 30. Each join
condition is in the form of “R.A θ S.B”, where A is an
attribute of R, and B is an attribute of S.
2.1 Relaxing Conditions
We focus on relaxing conditions on numeric attributes,
whose relaxations can be quantified as value differences.
Consider a query Q and a pair of records ⟨r,s⟩ in the two
relations. For each selection condition
C : R.A θ v
in R, the relaxation of r with respect to this condition is:
\[
\mathrm{RELAX}(r, C) =
\begin{cases}
0, & \text{if } r \text{ satisfies } C;\\
|r.A - v|, & \text{otherwise.}
\end{cases}
\]
Similarly, we can define the relaxation of record s with
respect to a selection condition on S. The relaxation of the
pair with respect to a join condition J : R.A θ S.B is:
\[
\mathrm{RELAX}(r, s, J) =
\begin{cases}
0, & \text{if } r, s \text{ satisfy } J;\\
|r.A - s.B|, & \text{otherwise.}
\end{cases}
\]
For instance, consider the query in our running example.
Let CR be the selection condition J.Salary <= 95. We
have RELAX(r1,CR) = 0, since r1 satisfies this selection
condition. In addition, RELAX(r3,CR) = 25, since record r3
does not satisfy the condition. Let J be the join condition,
J.Zipcode = C.Zipcode. We have RELAX(r2,s1,J) = 0,
since the records r2 and s1 satisfy the join condition, while
RELAX(r2,s2,J) = 1040, since the records r2 and s2 do not
satisfy this join condition.
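
The relaxation values quoted above can be reproduced with a short Python sketch. The function names `relax_selection` and `relax_join` and the dictionary layout of the records are illustrative assumptions, not part of the paper; the sketch also assumes the relaxation is the absolute value difference, so that it is never negative.

```python
import operator

# Comparison operators allowed in a range condition "A θ v".
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}

def relax_selection(record, attr, op, value):
    """RELAX(r, C) for a selection condition C: attr op value."""
    if OPS[op](record[attr], value):
        return 0
    return abs(record[attr] - value)

def relax_join(r, s, attr_r, op, attr_s):
    """RELAX(r, s, J) for a join condition J: R.attr_r op S.attr_s."""
    if OPS[op](r[attr_r], s[attr_s]):
        return 0
    return abs(r[attr_r] - s[attr_s])

# Sample rows from Tables 1 and 2 (only the attributes used here).
r1 = {"Zipcode": 92047, "Salary": 80}
r2 = {"Zipcode": 93652, "Salary": 95}
r3 = {"Zipcode": 82632, "Salary": 120}
s1 = {"Zipcode": 93652, "WorkYear": 3}
s2 = {"Zipcode": 92612, "WorkYear": 6}

print(relax_selection(r1, "Salary", "<=", 95))        # 0
print(relax_selection(r3, "Salary", "<=", 95))        # 25
print(relax_join(r2, s1, "Zipcode", "=", "Zipcode"))  # 0
print(relax_join(r2, s2, "Zipcode", "=", "Zipcode"))  # 1040
```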
Let the set of selection conditions for R in query Q be
CQ,R, for S be CQ,S, and the set of join conditions be
CQ,J. Intuitively, every tuple r ∈ R and every tuple s ∈ S
can produce an answer with respect to the query Q for
some sufficiently large relaxation on the set of conditions
CQ,R ∪ CQ,S ∪ CQ,J. We denote this set of relaxations on
different conditions as RELAX(r,s,Q). To separate out the
relaxations for CQ,R, CQ,S, and CQ,J, we will denote the
relaxations for them as RELAX(r,CQ,R), RELAX(s,CQ,S),
and RELAX(r,s,CQ,J), respectively.
2.2 Relaxation Skyline
Obviously, RELAX(r,s,Q) is different for different pairs
⟨r,s⟩. Given two tuple pairs ⟨r1,s1⟩ and ⟨r2,s2⟩, it is
possible that the first pair is “better” than the second in
terms of their relaxations. To formulate such a relationship,
we make use of the concept of “skyline” [4] to define a partial
order among the relaxations for different tuple pairs.
Definition 1. (Dominate) We say RELAX(r1,s1,Q)
dominates RELAX(r2,s2,Q) if the relaxations in
RELAX(r1,s1,Q) are equal to or smaller than the corresponding
relaxations in RELAX(r2,s2,Q) for all the conditions and
smaller in at least one condition.
Definition 2. (Relaxation Skyline) The relaxation
skyline of a query Q on two relations R and S, denoted
by SKYLINE(R,S,Q), is the set of all the tuple pairs ⟨r,s⟩,
r ∈ R, s ∈ S, each of which has its relaxations with respect
to Q not dominated by any other tuple pair ⟨r′,s′⟩, r′ ∈ R,
s′ ∈ S.
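
Definitions 1 and 2 translate directly into code. The following Python sketch is a straightforward quadratic check, not one of the algorithms of this paper; the relaxation vectors and the labels p1 to p4 are hypothetical values chosen for illustration.

```python
def dominates(a, b):
    """True if relaxation vector a dominates b (Definition 1):
    no larger on every condition and strictly smaller on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(relaxations):
    """Keys whose relaxation vectors are not dominated by any other key
    (Definition 2), found by comparing every pair of vectors."""
    return [k for k, v in relaxations.items()
            if not any(dominates(w, v)
                       for j, w in relaxations.items() if j != k)]

# Hypothetical relaxation vectors (on C_R, C_S, C_J) for four tuple pairs.
vectors = {
    "p1": (0, 2, 0),
    "p2": (25, 0, 0),
    "p3": (35, 4, 0),     # dominated by p1
    "p4": (0, 0, 1040),
}
print(skyline(vectors))   # ['p1', 'p2', 'p4']
```

Note that p1, p2, and p4 are mutually incomparable: each is strictly better than the others on at least one condition.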
Although computing the relaxation skyline for all the conditions
of a query ensures that at least one relaxed answer is
returned, it can sometimes return too many incomparable answers,
with a large processing overhead due to the need to relax
all the conditions.¹ Returning many results is not useful for
users who just want a small set of answers. In addition,
depending on the semantics of the query, often a user does
not want to relax some conditions. For instance, in the job
example, the user might not want to relax the join condition.
Therefore, we also consider the case where we do relaxations
on a subset of the conditions, and compute the corresponding
relaxation skyline with respect to these conditions.
The tuple pairs on this relaxation skyline have to satisfy the
conditions that cannot be relaxed. Interestingly, the various
combinations of the options to relax these conditions form a
lattice structure. For instance, consider the three conditions
in our running example. Fig. 1 shows the lattice structure
of these different combinations. In the lattice, for each node
n, the conditions being relaxed at its descendants are a superset
of those being relaxed at this node. In such a case, it
is natural that the set of tuple pairs in the relaxation skyline
corresponding to this node n is a subset of those in the
corresponding relaxation skyline for each descendant of n.
By analyzing various factors that will result in an empty answer,
we can try to identify the highest nodes in the lattice
that can bring a user-specified number of answers.
¹ Technically a query has a final projection to return the
values for some attributes. We assume that the main
computational cost is to compute those pairs.
        {}
   R     J     S
   RJ    RS    SJ
        RSJ
Figure 1: Lattice structure of various combinations
of relaxations in the query in the jobs example. “R”,
“S”, and “J” stand for relaxing the selection condition
in jobs, the selection condition in candidates,
and the join condition, respectively.
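
The lattice of Fig. 1 is the set of all subsets of the relaxable conditions, ordered by inclusion. A minimal Python sketch that enumerates it level by level is given below; the helper names `relaxation_lattice`, `children`, and `label` are assumptions made for this illustration.

```python
from itertools import combinations

def relaxation_lattice(conditions):
    """Subsets of the relaxable conditions, grouped by size:
    level k holds the nodes that relax exactly k conditions."""
    return [[frozenset(c) for c in combinations(conditions, k)]
            for k in range(len(conditions) + 1)]

def children(node, conditions):
    """Nodes one level down: relax exactly one more condition."""
    return [node | {c} for c in conditions if c not in node]

def label(node, conditions):
    """Printable node name that keeps the order of `conditions`."""
    return "".join(c for c in conditions if c in node) or "{}"

conds = ["R", "S", "J"]
levels = relaxation_lattice(conds)
for level in levels:
    print([label(n, conds) for n in level])
# ['{}']
# ['R', 'S', 'J']
# ['RS', 'RJ', 'SJ']
# ['RSJ']
```

With three conditions the lattice has 2^3 = 8 nodes, which illustrates why the number of relaxation choices grows exponentially with the number of conditions.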
In our framework, a user can also assign a weight to each
of the conditions in a query, and a value k. Then the system
computes the k best answers using these weights. That is,
for all the pairs in the relaxation skyline of the query, we
return the k pairs that have the k smallest weighted summations
of the relaxations on the conditions. In this way,
the user can specify preference towards different ways to
relax the conditions. In addition, computing the final answers
could be more efficient. In Section 5.1 we show how
to extend our algorithms for this variant.
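
As a rough illustration of this weighted variant, the Python sketch below ranks the pairs of an already computed relaxation skyline by the weighted sum of their relaxations. It is a post-processing step only and is not the extension described in Section 5.1; the function name `top_k` and the weights are assumptions.

```python
import heapq

def top_k(skyline_pairs, weights, k):
    """Return the k skyline pairs with the smallest weighted sum
    of relaxations. skyline_pairs maps a pair to its relaxation vector."""
    def score(item):
        _, relax = item
        return sum(w * x for w, x in zip(weights, relax))
    return heapq.nsmallest(k, skyline_pairs.items(), key=score)

# Relaxations on (C_R, C_S) of two pairs from the running example.
sky = {("r2", "s1"): (0, 2), ("r3", "s3"): (25, 0)}

print(top_k(sky, weights=(1, 10), k=1))   # [(('r2', 's1'), (0, 2))]
print(top_k(sky, weights=(1, 20), k=1))   # [(('r3', 's3'), (25, 0))]
```

Raising the weight on the WorkYear condition from 10 to 20 makes a two-year shortfall in experience cost more than a 25K salary relaxation, so the preferred pair changes.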
3. ALGORITHMS FOR RELAXING SELECTION CONDITIONS
We study how to compute the relaxation skyline of a query
with equi-join conditions, when we do not want to relax its
join conditions, i.e., we only relax (possibly a subset of)
its selection conditions. The motivation is that many join
conditions are specified on identifier attributes, such as employee
ID, project ID, and movie ID. This case happens especially
when we have a join between a foreign-key attribute
and its referenced key attribute. Semantically it might not
be meaningful to relax such a join attribute.
For instance, in our running example, we are allowed to
relax the selection conditions CR (Salary <= 95) and CS
(WorkYear >= 5), but we do not relax the join condition CJ
(Jobs.Zipcode = Candidates.Zipcode). That is, each pair
of records ⟨r,s⟩ of R and S in the relaxation skyline with
respect to these two selection conditions should satisfy:
• RELAX(r,s,CJ) = 0, i.e., r.Zipcode = s.Zipcode.
• This pair cannot be dominated by any other joinable
pair, i.e., there does not exist another pair ⟨r′,s′⟩ of
records such that:
– r′.Zipcode = s′.Zipcode;
– RELAX(r′,CR) ≤ RELAX(r,CR);
– RELAX(s′,CS) ≤ RELAX(s,CS).
One of the two inequalities should be strict.
The job-candidate pair ⟨r1,s1⟩ is not in the relaxation
skyline since its join relaxation is not 0. The pair ⟨r4,s4⟩ is
not in the answer since it is dominated by the pair ⟨r2,s1⟩.
The relaxation skyline with respect to the two selection conditions
should include two pairs, ⟨r2,s1⟩ and ⟨r3,s3⟩. Both
pairs respect the join condition, and neither of them is dominated
by the other pairs. The first pair has the smallest
relaxation on condition CR, while the second has the smallest
relaxation on condition CS. In this section we develop
algorithms for computing a relaxation skyline efficiently. In
Section 4 we will study the general case where we want to
relax join conditions as well.
3.1 Pitfalls
Let Q be a query with selection conditions and join conditions
on relations R and S, and suppose the query returns an empty
answer set. To compute the relaxation skyline with respect
to the selection conditions, one might be tempted to develop
the following simple (but incorrect) algorithm. Compute the
set KR (resp. KS) of the relaxation skyline points with respect
to the selection conditions for relation R (resp. S).
Then join the two sets KR and KS. For instance, in our
running example, this algorithm computes the relaxation
skyline of the relation Jobs with respect to the selection condition
CR: J.Salary <= 95. The result includes the jobs r1
and r2, whose salary values satisfy the selection condition
CR. Similarly, it also computes the relaxation skyline of relation
Candidates with respect to the selection condition CS:
C.WorkYear >= 5, and the result has two records, s2 and s3,
which satisfy the selection condition CS. It then joins the
points on the two relaxation skylines, and returns an empty
answer.² The example shows the reason why this naive approach
fails. Intuitively, the algorithm relaxes the selection
conditions of each relation locally. However, our goal is to
compute the pairs of tuples that are not dominated by any
other pair of tuples with respect to both of the selection conditions,
not just one selection condition of a relation. Trying
to compute the dominating points in each relation and then
joining them will lead to missing some points that might
form tuples that would not be dominated.
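
The failure of the naive approach on the running example can be checked with a few lines of Python. This is only a sketch of the incorrect strategy described above; the helper names are assumptions, and the rows are those printed in Tables 1 and 2.

```python
# Rows of Tables 1 and 2 as (Zipcode, Salary) and (Zipcode, WorkYear).
jobs = {"r1": (92047, 80), "r2": (93652, 95),
        "r3": (82632, 120), "r4": (90391, 130)}
cands = {"s1": (93652, 3), "s2": (92612, 6),
         "s3": (82632, 5), "s4": (90931, 1)}

def relax_salary(row):      # C_R: Salary <= 95
    return max(0, row[1] - 95)

def relax_workyear(row):    # C_S: WorkYear >= 5
    return max(0, 5 - row[1])

def local_skyline(relation, relax):
    """With a single condition per relation, the local skyline is the
    set of records with the smallest relaxation."""
    best = min(relax(row) for row in relation.values())
    return sorted(k for k, row in relation.items() if relax(row) == best)

k_r = local_skyline(jobs, relax_salary)       # jobs satisfying C_R
k_s = local_skyline(cands, relax_workyear)    # candidates satisfying C_S
naive = [(r, s) for r in k_r for s in k_s if jobs[r][0] == cands[s][0]]
print(k_r, k_s, naive)   # ['r1', 'r2'] ['s2', 's3'] []
```

The candidate s1 is pruned locally because it does not satisfy C_S, so the pair ⟨r2,s1⟩ can never be produced, even though it belongs to the relaxation skyline.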
3.2 Algorithm: JoinFirst (JF)
This algorithm, called JoinFirst, starts by computing a join
of the two relations without using the selection conditions.
It then computes a skyline of these resulting tuple pairs
with respect to the relaxations on the selection conditions.
Algorithm 1 describes the pseudo code of this algorithm.
Algorithm 1 JoinFirst
1: Compute tuple pairs respecting the join conditions, without
   considering the selection conditions;
2: Compute the skyline of these tuple pairs with respect to
   relaxations on the selection conditions;
3: Return the pairs in the skyline (with necessary projection).
In our running example, the first step of the algorithm will
compute the join of the two relations with respect to the join
condition J.Zipcode = C.Zipcode. In the second step, it computes the
job-candidate pairs in this result that cannot be dominated
by other pairs with respect to the relaxation on the CR
and CS conditions. There are different ways to implement
each step in the algorithm. In the join step, we can do a
nested-loop join, a hash-based join, a sort-based join, or an
index-based join. In the second step, we can use one of
the skyline-computing algorithms in the literature, such as
the block-nested-loops algorithm in [4]. One advantage of
this algorithm is that it can use those existing algorithms
(e.g., a hash-join operator inside a DBMS) as a black box
without any modification. However, the algorithm may not
be efficient if the join step returns a large number of pairs.
² There are examples showing that, even if this approach
returns a nonempty answer set, the result is still not the
corresponding relaxation skyline.
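
A minimal in-memory Python sketch of JoinFirst on the running example follows. It assumes a hash join for the first step and a plain pairwise dominance check for the second; the thresholds 95 and 5 are hard-coded from the example query, and the function names are illustrative.

```python
from collections import defaultdict

def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def join_first(jobs, cands):
    # Step 1: hash join on Zipcode, ignoring the selection conditions.
    by_zip = defaultdict(list)
    for cid, (zipcode, _) in cands.items():
        by_zip[zipcode].append(cid)
    pairs = {}
    for jid, (zipcode, salary) in jobs.items():
        for cid in by_zip.get(zipcode, []):
            work_year = cands[cid][1]
            # Relaxations on C_R (Salary <= 95) and C_S (WorkYear >= 5).
            pairs[(jid, cid)] = (max(0, salary - 95), max(0, 5 - work_year))
    # Step 2: keep the pairs that no other joined pair dominates.
    return {p: v for p, v in pairs.items()
            if not any(dominates(w, v) for q, w in pairs.items() if q != p)}

# Rows exactly as printed in Tables 1 and 2: (Zipcode, Salary / WorkYear).
# As printed, r4 and s4 carry different zip codes, so they never pair here.
jobs = {"r1": (92047, 80), "r2": (93652, 95),
        "r3": (82632, 120), "r4": (90391, 130)}
cands = {"s1": (93652, 3), "s2": (92612, 6),
         "s3": (82632, 5), "s4": (90931, 1)}

print(join_first(jobs, cands))
# {('r2', 's1'): (0, 2), ('r3', 's3'): (25, 0)}
```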
3.3 Algorithm: PruningJoin (PJ)
This algorithm tries to reduce the size of the results after
the join step in the JoinFirst algorithm by computing the relaxation
skyline during the join step. Algorithm 2 describes
the pseudo code of this algorithm, assuming we are doing
an index-based join using an index structure on the join attributes
of S. The algorithm goes through all the records in
relation R. For each one of them (say r), it uses an index
structure on the join attribute of S to find those S records
that can join with this record (say s). For each such record
s, the algorithm calls a procedure “Update” by passing the
pair ⟨r,s⟩ and the current skyline. This procedure checks
if this pair is already dominated by a pair in the current
relaxation skyline K. This dominance checking is based on
Definition 1, assuming we can compute the relaxation of this
record pair for each condition in the query.³ We discard this
pair if it is already dominated. Otherwise, we discard those
pairs in K that are dominated by this new pair, before inserting
this pair into K. The algorithm terminates when we
have processed all the records in R.
Algorithm 2 PruningJoin (index-based)
Relaxation skyline K = empty;
for each tuple r in R do
    I = indexscan(S, r); // joinable records in S
    Call Update(⟨r, s⟩, K) for each tuple s in I;
end for
return K;

procedure Update(element e, skyline K)
    if e is dominated by an element in K then
        discard e;
    else
        discard K's elements dominated by e;
        add e to K;
    end if
end procedure
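
The pseudo code above can be sketched in Python as follows. A dictionary keyed by the join attribute stands in for the index on S, and the relations R and S are hypothetical data (join key, Salary) and (join key, WorkYear), chosen so that the eviction branch of Update is exercised; none of these names come from the paper.

```python
from collections import defaultdict

def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def update(key, vec, skyline):
    """Procedure Update: drop (key, vec) if it is dominated; otherwise
    evict the entries it dominates and insert it."""
    if any(dominates(w, vec) for w in skyline.values()):
        return
    for k in [k for k, w in skyline.items() if dominates(vec, w)]:
        del skyline[k]
    skyline[key] = vec

def pruning_join(R, S, relax_r, relax_s):
    index = defaultdict(list)      # stand-in for an index on S's join key
    for sid, (jkey, _) in S.items():
        index[jkey].append(sid)
    K = {}
    for rid, (jkey, rval) in R.items():
        for sid in index.get(jkey, []):          # joinable records in S
            update((rid, sid), (relax_r(rval), relax_s(S[sid][1])), K)
    return K

# Hypothetical data: (join key, Salary) and (join key, WorkYear).
R = {"a": (1, 130), "b": (1, 95), "c": (2, 120)}
S = {"x": (1, 3), "y": (1, 4), "z": (2, 5)}
relax_r = lambda salary: max(0, salary - 95)     # Salary <= 95
relax_s = lambda years: max(0, 5 - years)        # WorkYear >= 5

print(pruning_join(R, S, relax_r, relax_s))
# {('b', 'y'): (0, 1), ('c', 'z'): (25, 0)}
```

On this data the pairs (a, x), (a, y), and (b, x) each enter K at some point and are evicted later, so K never holds more than two pairs, whereas JoinFirst would materialize all five joined pairs first.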
The description can be easily modified for other possible
physical implementations of the join. For instance, if
we want to do a hash-based join, we first bucketize both
relations. For each pair of buckets from the two relations,
we consider each pair of records from these two buckets,
and check if this tuple pair can be inserted into the current
relaxation skyline, and potentially eliminate some existing
record pairs. The algorithm terminates when all the pairs
of buckets are processed. Extensions to other types of join
methods (e.g., nested-loop or sort-based) are similar.
³ Technically the dominance checking in the “Update” procedure
relies on a set of query conditions. For simplicity,
we assume the skyline K already includes these query conditions
and the corresponding method to do the dominance
checking, so that this procedure can be called by other algorithms.
One advantage of this algorithm (compared to the JoinFirst
algorithm) is that it can reduce the number of record
pairs kept after the join (which might be stored in memory),
since this algorithm conducts dominance checking on the fly.
One disadvantage is that it needs to modify the implementations
of different join methods.
3.4 Algorithm: PruningJoin+ (PJ+)
The algorithm modifies the PruningJoin algorithm by computing
a “local relaxation skyline” for a set of records in one
relation that join with a specific record in the other relation,
and doing dominance checking within this local skyline. Algorithm 3
describes the algorithm, and it is based on an
index-scan-based join implementation. For each record r in
R, after computing the records in S that can join with r
(stored in I in the description), the algorithm goes through
these records to compute a local relaxation skyline L with
respect to the selection conditions on S. Those locally dominated
S records do not need to be considered in the computation
of the global relaxation skyline. If both records
s1 and s2 of S can join with record r, and s1 dominates
s2 with respect to the selection conditions on S, then pair
⟨r,s1⟩ also dominates ⟨r,s2⟩ with respect to all the selection
conditions in the query. Therefore the second pair cannot be
in the global relaxation skyline. Extensions of the algorithm
to other join implementation methods are straightforward.
Algorithm 3 PruningJoin+ (index-based)
Relaxation skyline K = empty;
for each tuple r in R do
    I = indexscan(S, r); // joinable records in S
    Local relaxation skyline L = empty;
    Call Update(s, L) for each tuple s in I;
    Call Update(⟨r, s⟩, K) for each tuple s in L;
end for
return K;
Example 3.1. Consider the following query on two relations
R(A, B, C) and S(C, D, E).
SELECT *
FROM R, S
WHERE R.A = 10 AND R.B = 30
AND R.C = S.C
AND S.D = 70 AND S.E = 90;
Fig. 2 shows an example. Currently there are two record
pairs (p1 and p2) in the global relaxation skyline. For the
given record r of relation R, ⟨13,34,55⟩, there are four S
records that join with record r. Among these four, record
s2 is locally dominated by record s1, since the relaxations of
s2 on the two local selection conditions are both larger than
those of record s1. The local relaxation skyline of this record
r will contain three records, s1, s3, and s4. Among the three
corresponding tuple pairs, ⟨r,s1⟩ is dominated by the existing
pair p2. The two remaining pairs, ⟨r,s3⟩ and ⟨r,s4⟩, will be
inserted into the global relaxation skyline.
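
The behavior described in Example 3.1 can be mimicked with the Python sketch below. Only the record r = ⟨13,34,55⟩ is taken from the example; the four S records and the vectors of p1 and p2 are hypothetical values, picked so that the same pruning pattern occurs. A linear scan over S stands in for the index scan.

```python
def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def update(key, vec, skyline):
    if any(dominates(w, vec) for w in skyline.values()):
        return
    for k in [k for k, w in skyline.items() if dominates(vec, w)]:
        del skyline[k]
    skyline[key] = vec

def pruning_join_plus(R, S, K):
    """R rows are (A, B, C), S rows are (C, D, E); the conditions are
    R.A = 10, R.B = 30, R.C = S.C, S.D = 70, S.E = 90. K is the global
    relaxation skyline, possibly already populated."""
    for rid, (a, b, c) in R.items():
        L = {}                                    # local skyline over S
        for sid, (sc, d, e) in S.items():         # stand-in for indexscan
            if sc == c:
                update(sid, (abs(d - 70), abs(e - 90)), L)
        r_vec = (abs(a - 10), abs(b - 30))
        for sid, s_vec in L.items():              # only local survivors
            update((rid, sid), r_vec + s_vec, K)
    return K

R = {"r": (13, 34, 55)}
S = {"s1": (55, 72, 93), "s2": (55, 75, 95),      # s2 locally dominated by s1
     "s3": (55, 71, 96), "s4": (55, 78, 91)}
K = {"p1": (0, 9, 9, 0), "p2": (2, 4, 2, 2)}      # hypothetical existing pairs

print(list(pruning_join_plus(R, S, K)))
# ['p1', 'p2', ('r', 's3'), ('r', 's4')]
```

Here s2 never reaches the global skyline check, and ⟨r,s1⟩ is rejected because the hypothetical p2 dominates it.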
Notice that this algorithm does the local pruning using
those local relaxation skylines, hoping that it can eliminate
some S records locally. This local pruning is not always
beneficial to performance, especially when the local pruning
does not eliminate many S records. As our experiments have
verified, whether the overhead of this local pruning is worth
the performance gains depends on several factors, such as
the number of conditions.
Figure 2: Example of algorithm PruningJoin+.
3.5 Algorithm: SortedAccessJoin (SAJ)
This algorithm adopts the main idea in Fagin's algorithm,
originally proposed to compute answers to top-k queries [10].
As shown in Algorithm 4, the algorithm first constructs a
sorted list of tuple IDs for each selection condition in the
given query Q, based on the relaxation of each record on that
selection condition. Such a list can be obtained efficiently,
e.g., when the corresponding table has an indexing structure
such as a B-tree. The algorithm goes through the lists in a
round-robin fashion. For each of them, Li, it retrieves the
next tuple ID (in ascending order) and the corresponding
tuple p. It then uses an available index structure on the
other table to find records that can join with this record p,
and stores them in I. For each such joinable tuple q, we
form a tuple pair ⟨p,q⟩. We insert this pair into the set of
candidate pairs P, if it is not in the set. The algorithm calls
a function “CheckStopCondition()” to check if we can stop
searching for tuple pairs. If so, we process all the candidate
pairs in P to compute a relaxation skyline.
Algorithm 4 SortedAccessJoin
Let C1,...,Cn be the selection conditions on R, and
    Cn+1,...,Cn+m be the selection conditions on S;
Let Li (i = 1,...,n+m) be a sorted list of record IDs based
    on their relaxation on the selection condition Ci (ascending order);
Set of candidate pairs P = empty;
StopSearching = false;
// produce a set of candidate pairs
while not StopSearching do
    Attribute j = roundrobin(1,...,n+m);
    Retrieve the next tuple ID k from list Lj;
    Retrieve the corresponding tuple p using k;
    I = indexscan(the other relation, p);
    for each tuple q in I do
        if (⟨p,q⟩ not in P)
            insert ⟨p,q⟩ in P;
        StopSearching = CheckStopCondition();
    end for
end while
// compute skyline
Relaxation skyline K = empty;
for each tuple pair ⟨r,s⟩ in P do
    Update(⟨r,s⟩, K);
end for
return K;
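
The following Python sketch gives one possible in-memory reading of Algorithm 4. Several parts are assumptions rather than the paper's implementation: dictionaries stand in for the index structures, the loop also ends when every list is exhausted, and the stop test is one interpretation of CheckStopCondition(), namely that a candidate pair must have a strictly smaller relaxation than the last value read from every other list. The sketch assumes at least two selection conditions in total.

```python
from collections import defaultdict

def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def update(key, vec, K):
    if any(dominates(w, vec) for w in K.values()):
        return
    for k in [k for k, w in K.items() if dominates(vec, w)]:
        del K[k]
    K[key] = vec

def sorted_access_join(R, S, relax_r, relax_s):
    """R, S map id -> (join key, value); relax_r and relax_s are lists of
    functions, one per selection condition, giving the relaxation."""
    n = len(relax_r)
    def vec(rid, sid):
        return (tuple(f(R[rid][1]) for f in relax_r)
                + tuple(g(S[sid][1]) for g in relax_s))
    # L_i: (relaxation, id) in ascending order, one list per condition.
    lists = [sorted((f(v), i) for i, (_, v) in R.items()) for f in relax_r]
    lists += [sorted((g(v), i) for i, (_, v) in S.items()) for g in relax_s]
    idx_r, idx_s = defaultdict(list), defaultdict(list)
    for i, (key, _) in R.items():
        idx_r[key].append(i)
    for i, (key, _) in S.items():
        idx_s[key].append(i)

    P, pos, stop = {}, [0] * len(lists), False
    while not stop and any(p < len(l) for p, l in zip(pos, lists)):
        for j, lst in enumerate(lists):            # round-robin access
            if pos[j] == len(lst):
                continue                           # this list is exhausted
            _, pid = lst[pos[j]]
            pos[j] += 1
            if j < n:                              # p comes from R
                pairs = [(pid, q) for q in idx_s.get(R[pid][0], [])]
            else:                                  # p comes from S
                pairs = [(q, pid) for q in idx_r.get(S[pid][0], [])]
            for pair in pairs:
                v = P.setdefault(pair, vec(*pair))
                # Assumed stop test: strictly below the last value read
                # from every list other than the current one.
                if all(pos[i] > 0 and v[i] < lists[i][pos[i] - 1][0]
                       for i in range(len(lists)) if i != j):
                    stop = True
            if stop:
                break
    K = {}
    for pair, v in P.items():                      # skyline of candidates
        update(pair, v, K)
    return K

jobs = {"r1": (92047, 80), "r2": (93652, 95),
        "r3": (82632, 120), "r4": (90391, 130)}
cands = {"s1": (93652, 3), "s2": (92612, 6),
         "s3": (82632, 5), "s4": (90931, 1)}
result = sorted_access_join(jobs, cands,
                            [lambda sal: max(0, sal - 95)],
                            [lambda yrs: max(0, 5 - yrs)])
print(result)   # {('r2', 's1'): (0, 2), ('r3', 's3'): (25, 0)}
```

On the rows of Tables 1 and 2 the search stops after three records have been read from each list, before r4 and s4 are ever retrieved.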
In the “CheckStopCondition()” function, we check if the
current tuple pair ⟨p,q⟩ has a smaller relaxation than each
of the current records on the lists, except the current list
Li. That is, the function returns true only if for each list Lj
(j ≠ i), the relaxation of this tuple pair on the condition Cj