
Dynamic Programming: The Next Step

Marius Eich 1, Guido Moerkotte 2

University of Mannheim

Mannheim, Germany

1marius.eich@uni-mannheim.de

2moerkotte@uni-mannheim.de

Abstract—We fill two gaps in the literature. First, we give a comprehensive set of equivalences allowing the reordering of grouping with non-inner joins. Second, we show how to incorporate the optimal placement of grouping into a state-of-the-art dynamic programming (DP)-based plan generator.

I. INTRODUCTION

In a recent paper, Shanbhag and Sudarshan pointed out that the biggest disadvantage of DP-based plan generators (PG) (like [1]) is that they are not capable of reordering grouping and joins [2]. Here, we show how to overcome this deficit. Further, we fill another gap in the literature: the lack of equivalences useful to push grouping down non-inner joins. Consider the following query against the TPC-H schema:

select ns.n_name, nc.n_name, count(*)
from (nation ns inner join supplier s
        on (ns.n_nationkey = s.s_nationkey))
     full outer join
     (nation nc inner join customer c
        on (nc.n_nationkey = c.c_nationkey))
     on (ns.n_nationkey = nc.n_nationkey)
group by ns.n_name, nc.n_name

The main point here is that since only reorderings between grouping and inner joins are known [3], [4], [5], [6], [7], [8], the outer join constitutes a barrier to any reordering with grouping. This has severe consequences. On HyPer [9], the execution time is 2140 ms. The plan produced by the DP-based PG presented in this paper reduces it to 1.51 ms. (Similar results were obtained for two major commercial systems: for system 1 we got 8480 ms vs. 47 ms, and for system 2, 64900 ms vs. 210 ms.) Since, in general, reordering grouping and outer joins is not a correct rewrite, we eliminate the barrier by generalizing the definition of outer joins (Sec. III).

After this preliminary step, we show how to integrate these equivalences into a modern DP-based PG. Since free placement of grouping extends the search space substantially, we also present one optimality-preserving pruning technique and two heuristics.

The rest of the paper is organized as follows. The next section presents some preliminaries. Then, we discuss the new equivalences. Sec. IV starts by introducing the basic DP-based PG. Then, we show how to extend it such that the search space enlarged by the new equivalences is systematically explored. Since the enlarged search space may become quite large, we discuss two heuristics and an optimality-preserving pruning technique to make the approach feasible. Sec. V contains experimental results. We first evaluate the gain of our approach, then its costs. Sec. VI summarizes the achievements and proposes directions for future research.

II. PRELIMINARIES

A. Aggregate Functions and Their Properties

Aggregate functions are applied to a group of tuples to aggregate their values in one common attribute to a single value. Some standard aggregate functions supported by SQL are sum, count, min, max, and avg. Additionally, it is possible to specify how duplicates are treated by these functions using the distinct keyword, as in sum(distinct), count(distinct), and so on. Since several aggregate functions are allowed in the select clause of an SQL query, we deal with vectors F of aggregate functions. If F₁ and F₂ are two vectors of aggregate functions, we denote by F₁ ◦ F₂ their concatenation.

As usual, the set of attributes provided by some expression e (e.g., a base relation) is denoted by A(e), and the set of attributes referenced by some expression e (e.g., a predicate) is denoted by F(e).

The following definitions of properties of aggregate functions will be illustrated by some examples in the next section. There, it also becomes clear why they are needed.

1) Splittability: The following definition captures the intuition that we can split a vector of aggregate functions into two parts if each aggregate function accesses only attributes from one of two given alternative expressions.

Definition 1: An aggregation vector F is splittable into F₁ and F₂ with respect to arbitrary expressions e₁ and e₂ if F = F₁ ◦ F₂, F(F₁) ∩ A(e₂) = ∅, and F(F₂) ∩ A(e₁) = ∅.

In this case, we can evaluate F₁ on e₁ and F₂ on e₂. A special case S1 occurs for count(*), which accesses no attributes and can thus be added to both F₁ and F₂.
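To make Definition 1 concrete, here is a small Python sketch (ours, not part of the paper): an aggregation vector is a list of (result_name, function_name, argument_attribute) triples, and count(*) is modeled with argument None (special case S1).

```python
def split_vector(F, attrs_e1, attrs_e2):
    """Return (F1, F2) if F is splittable w.r.t. the attribute sets of
    e1 and e2, and None otherwise."""
    F1, F2 = [], []
    for name, func, arg in F:
        if arg is None:                     # count(*), case S1
            F1.append((name, func, arg))    # could equally be added to F2
        elif arg in attrs_e1 and arg not in attrs_e2:
            F1.append((name, func, arg))
        elif arg in attrs_e2 and arg not in attrs_e1:
            F2.append((name, func, arg))
        else:                               # attribute visible on both/neither side
            return None
    return F1, F2
```

For the aggregation vector of the running example, k : count(*) and b₁ : sum(a₁) end up in F₁ and b₂ : sum(a₂) in F₂.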

2) Decomposability: One property of aggregate functions that is of particular interest for the considerations in this paper is decomposability [10]:

Definition 2: An aggregate function agg is decomposable if there exist aggregate functions agg₁ and agg₂ such that

agg(Z) = agg₂(agg₁(X), agg₁(Y))

for bags of values X, Y, and Z with Z = X ∪ Y.

In other words, if agg is decomposable, agg(Z) can be computed independently on arbitrary subbags of Z, and the partial results can be aggregated to yield the correct total result.

978-1-4799-7964-6/15/$31.00 © 2015 IEEE ICDE Conference 2015

For some aggregate functions, decomposability can be seen easily:

min(X ∪ Y) = min(min(X), min(Y))
max(X ∪ Y) = max(max(X), max(Y))
count(X ∪ Y) = sum(count(X), count(Y))
sum(X ∪ Y) = sum(sum(X), sum(Y))

In contrast to their duplicate preserving counterparts,

sum(distinct) and count(distinct) are not decomposable.

The treatment of avg is only slightly more complicated. If no null values are present, SQL's avg is equivalent to avg(X) = sum(X)/count(X). Since both sum and count are decomposable, we can decompose avg as follows:

avg(X ∪ Y) = sum(sum(X), sum(Y)) / (count(X) + count(Y)).

If there exist null values, we need a slightly modified version of count that only counts tuples where the aggregated attribute is not null. We denote this by countNN. We can then use it to decompose avg as follows:

avg(X ∪ Y) = sum(sum(X), sum(Y)) / (countNN(X) + countNN(Y)).

For special case S1, the two counts have to be multiplied.
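The decomposition of avg can be sketched in a few lines of Python (ours, not from the paper): each subbag is reduced to the partial state (sum, countNN), partial states are merged, and the quotient is taken at the end. None models SQL NULL.

```python
def partial_avg(bag):
    """Reduce one subbag to (sum of non-null values, countNN)."""
    vals = [x for x in bag if x is not None]   # avg ignores NULLs
    return (sum(vals), len(vals))

def merge_avg(p1, p2):
    """Combine the partial states of two disjoint subbags X and Y."""
    return (p1[0] + p2[0], p1[1] + p2[1])

def final_avg(p):
    """avg = sum / countNN; NULL (None) if only NULLs were seen."""
    return p[0] / p[1] if p[1] else None
```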

3) Duplicate Sensitive and Agnostic: We have already seen that duplicates play a central role in correct aggregate processing. Thus, we define the following. An aggregate function f is called duplicate agnostic if its result does not depend on whether there are duplicates in its argument or not. Otherwise, it is called duplicate sensitive. Yan and Larson use the terms Class C for duplicate sensitive functions and Class D for duplicate agnostic functions [4].

For SQL aggregate functions, we have that

• min, max, sum(distinct), count(distinct), avg(distinct) are duplicate agnostic and
• sum, count, avg are duplicate sensitive.

If we want to decompose an aggregate function that is duplicate sensitive, some care has to be taken. We encapsulate this by an operator prime (′) as follows. Let F = (b₁ : agg₁(a₁), ..., bₘ : aggₘ(aₘ)) be an aggregation vector. Further, let c be some other attribute. In the context of this work, c will be an attribute holding the result of some count(*). Then, we define F ⊗ c as

F ⊗ c := (b₁ : agg′₁(e₁), ..., bₘ : agg′ₘ(eₘ))

with

agg′ᵢ(eᵢ) = aggᵢ(eᵢ)       if aggᵢ is duplicate agnostic,
            aggᵢ(eᵢ * c)   if aggᵢ is sum,
            sum(c)         if aggᵢ(eᵢ) = count(*),

and if aggᵢ(eᵢ) is count(eᵢ), then agg′ᵢ(eᵢ) := sum(eᵢ = NULL ? 0 : c).
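The case analysis behind F ⊗ c is purely syntactic and can be sketched directly (our Python, not the paper's API; the string-valued argument expressions are only illustrative):

```python
DUPLICATE_AGNOSTIC = {"min", "max", "sum(distinct)",
                      "count(distinct)", "avg(distinct)"}

def prime(agg, arg, c):
    """Rewrite one aggregate agg(arg) of F into its counterpart in F ⊗ c,
    where c is the attribute holding a pre-computed count."""
    if agg in DUPLICATE_AGNOSTIC:
        return (agg, arg)                        # result unchanged by duplicates
    if agg == "sum":
        return ("sum", f"{arg} * {c}")           # weight each value by its count
    if agg == "count" and arg is None:           # count(*)
        return ("sum", c)
    if agg == "count":                           # count(a): do not count NULLs
        return ("sum", f"{arg} = NULL ? 0 : {c}")
    raise ValueError(f"unsupported aggregate: {agg}")
```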

B. Algebraic Operators

One operator that plays an important role in the following sections is the grouping operator, which we denote by Γ. The grouping operator can be defined as

Γ^θ_{G;g:f}(e) := {y ◦ [g : x] | y ∈ Π^D_G(e), x = f({z | z ∈ e, z.G θ y.G})}

for some set of grouping attributes G, a single attribute g, an aggregate function f, and a comparison operator θ ∈ {=, ≠, ≤, ≥, <, >}. We denote by Π^D_A(e) the duplicate-removing projection onto the set of attributes A, applied to the expression e. The resulting relation contains only values for those attributes that are contained in A and no duplicate values. The function f is then applied to groups of tuples taken from this relation. The groups are determined by the comparison operator θ. Afterwards, a new tuple consisting of the grouping attributes' values and an attribute g holding the corresponding value calculated by the aggregate function f is constructed.

The grouping operator can also introduce more than one new attribute by applying several aggregate functions. We define

Γ^θ_{G;b₁:f₁,...,bₖ:fₖ}(e) := {y ◦ [b₁ : x₁, ..., bₖ : xₖ] | y ∈ Π^D_G(e), xᵢ = fᵢ({z | z ∈ e, z.G θ y.G})},

where the attribute values b₁, ..., bₖ are created by applying the aggregation vector F = (f₁, ..., fₖ), consisting of k aggregate functions, to the tuples grouped according to θ. The grouping criterion may also be defined on several attributes. If all θ equal '=', we abbreviate Γ^=_{G;g:f} by Γ_{G;g:f}.
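For θ = '=', the operator can be sketched in a few lines of Python (ours, not from the paper): relations are lists of dict tuples, and each aggregate is a function over the list of group members.

```python
def gamma(e, G, aggs):
    """Γ_{G; b1:f1,...,bk:fk}(e) for θ = '='. G is a tuple of grouping
    attributes; aggs maps a result attribute name to a function on the
    list of tuples forming the group."""
    keys = sorted({tuple(t[g] for g in G) for t in e})        # Π^D_G(e)
    result = []
    for key in keys:
        group = [t for t in e if tuple(t[g] for g in G) == key]
        row = dict(zip(G, key))                               # the tuple y
        for name, f in aggs.items():                          # extend by [b_i : x_i]
            row[name] = f(group)
        result.append(row)
    return result
```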

The map operator (χ) extends every input tuple by new attributes:

χ_{a₁:e₁,...,aₙ:eₙ}(e) := {t ◦ [a₁ : e₁(t), ..., aₙ : eₙ(t)] | t ∈ e}

As usual, selection is defined as

σ_p(e) := {x | x ∈ e, p(x)}.

The join operators we consider are the (inner) join (⋈), left semijoin (⋉), left antijoin (▷), left outerjoin (⟕), full outerjoin (⟗), and groupjoin (Z). The definitions of these join operators are given in Figure 1. There, ◦ denotes tuple concatenation. Most of these operators are rather standard. However, both the left and the full outerjoin are generalized such that for tuples not finding a join partner, default values can be provided instead of null padding. More specifically, let Dᵢ = dᵢ₁ : cᵢ₁, ..., dᵢₖ : cᵢₖ (i = 1, 2) be two vectors assigning constants cᵢⱼ to attributes dᵢⱼ. The definitions of the left and full outerjoin with defaults are given in Eqvs. 7 and 8, respectively. Fig. 2 provides examples.

The last row defines the left groupjoin (Z), introduced by von Bültzingslöwen [11]. First, for a given tuple t₁ ∈ e₁, it determines the set of all join partners for t₁ in e₂ using the join predicate p. Then, it applies the aggregate function f to these tuples and extends t₁ by a new attribute g containing the result of this aggregation. Figure 2 gives an example.


e₁ × e₂ := {r ◦ s | r ∈ e₁, s ∈ e₂}   (1)
e₁ ⋈_p e₂ := {r ◦ s | r ∈ e₁, s ∈ e₂, p(r, s)}   (2)
e₁ ⋉_p e₂ := {r | r ∈ e₁, ∃s ∈ e₂ : p(r, s)}   (3)
e₁ ▷_p e₂ := {r | r ∈ e₁, ∄s ∈ e₂ : p(r, s)}   (4)
e₁ ⟕_p e₂ := (e₁ ⋈_p e₂) ∪ ((e₁ ▷_p e₂) × {⊥_{A(e₂)}})   (5)
e₁ ⟗_p e₂ := (e₁ ⋈_p e₂) ∪ ((e₁ ▷_p e₂) × {⊥_{A(e₂)}}) ∪ ({⊥_{A(e₁)}} × (e₂ ▷_p e₁))   (6)
e₁ ⟕^{D₂}_p e₂ := (e₁ ⋈_p e₂) ∪ ((e₁ ▷_p e₂) × {⊥_{A(e₂)\A(D₂)} ◦ [D₂]})   (7)
e₁ ⟗^{D₁;D₂}_p e₂ := (e₁ ⋈_p e₂) ∪ ((e₁ ▷_p e₂) × {⊥_{A(e₂)\A(D₂)} ◦ [D₂]}) ∪ ({⊥_{A(e₁)\A(D₁)} ◦ [D₁]} × (e₂ ▷_p e₁))   (8)
e₁ Z_{p;g:f} e₂ := {r ◦ [g : G] | r ∈ e₁, G = f({s | s ∈ e₂, p(r, s)})}   (9)

Fig. 1. Join operators

e₁:
a b c
0 0 1
1 0 1
2 1 3
3 2 3

e₂:
d e f
0 0 1
1 1 1
2 2 1
3 4 2

e₁ ⋈_{e₁.b=e₂.d} e₂:
a b c d e f
0 0 1 0 0 1
1 0 1 0 0 1
2 1 3 1 1 1
3 2 3 2 2 1

e₁ ▷_{e₁.a=e₂.e} e₂:
a b c
3 2 3

e₁ ⋉_{e₁.b=e₂.d} e₂:
a b c
0 0 1
1 0 1
2 1 3
3 2 3

e₁ ⟕^{e:7}_{e₁.a=e₂.e} e₂:
a b c d e f
0 0 1 0 0 1
1 0 1 1 1 1
2 1 3 2 2 1
3 2 3 - 7 -

e₁ ⟗^{b:7;e:7}_{e₁.a=e₂.e} e₂:
a b c d e f
0 0 1 0 0 1
1 0 1 1 1 1
2 1 3 2 2 1
3 2 3 - 7 -
- 7 - 3 4 2

e₁ Z_{e₁.a=e₂.f; g:sum(e₂.f)} e₂:
a b c g
1 0 1 3
2 1 3 2

Fig. 2. Examples of different join operators

C. Keys

We use information about keys to avoid unnecessary grouping operators. Thus, if the key information is incomplete, unnecessary grouping operators are introduced. However, it is important to note that this does not affect the correctness of plans, but only their efficiency.

The keys for base relations are specified in the database schema and therefore given. By using them and the join operators and predicates contained in a given plan, it is possible to compute the keys for all intermediate results.

In the following paragraphs, we denote by κ(e) the set of keys for a relation resulting from an expression e. Note that a single key is a set of attributes. Therefore, κ is a set of sets. Each of the following paragraphs covers one of the join operators specified in Sec. II-B.

1) Inner Join: We have to distinguish three cases:

• In case A₁ is a key of e₁ and A₂ is a key of e₂, we have

κ(e₁ ⋈_{A₁=A₂} e₂) = κ(e₁) ∪ κ(e₂).

That is, each key from one of the input expressions is again a key for the join result.

• In case A₁ is a key, but A₂ is not, we have

κ(e₁ ⋈_{A₁=A₂} e₂) = κ(e₂).

The reverse case is handled analogously.

• Without any assumption on the Aᵢ or the join predicate, we have

κ(e₁ ⋈_q e₂) = { k₁ ∪ k₂ | k₁ ∈ κ(e₁), k₂ ∈ κ(e₂) }.

In other words, every pair of keys from e₁ and e₂ forms a key for the join result.

2) Left Outerjoin: Here, we have only two possible cases. If A₂ is a key of e₂, then

κ(e₁ ⟕_{A₁=A₂} e₂) = κ(e₁).

Otherwise, we have to combine two arbitrary keys from e₁ and e₂ to form a key:

κ(e₁ ⟕_q e₂) = { k₁ ∪ k₂ | k₁ ∈ κ(e₁), k₂ ∈ κ(e₂) },

where q is an arbitrary predicate.

3) Full Outerjoin: Regardless of the join predicate, we have to combine two arbitrary keys from e₁ and e₂ to form a key for the join expression:

κ(e₁ ⟗_q e₂) = { k₁ ∪ k₂ | k₁ ∈ κ(e₁), k₂ ∈ κ(e₂) },

where q is an arbitrary join predicate.

4) Left Semijoin/Left Antijoin/Left Groupjoin: Since the attributes from the right input are no longer present in the join result and the result is duplicate-free by definition, we always have

κ(e₁ ◦ e₂) = κ(e₁)

for ◦ ∈ {⋉, ▷, Z}.

Using these basic rules, the keys for every subtree of an

operator tree can be computed bottom-up. Note that the keys

resulting from the full and left outerjoin contain null values.

We therefore assume that null values are treated as suggested

in [12], i.e., two attributes are equal if they agree in value or

they are both null.
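These rules can be sketched directly (our Python, not the paper's code; keys are frozensets of attribute names, κ is a set of such frozensets):

```python
def kappa_join(k1, k2, a1_is_key=False, a2_is_key=False):
    """κ for e1 ⋈_{A1=A2} e2 following the three cases above. The flags
    state whether the join attributes A1/A2 are keys of their inputs."""
    if a1_is_key and a2_is_key:
        return k1 | k2                          # every input key survives
    if a1_is_key:
        return set(k2)
    if a2_is_key:
        return set(k1)
    # no assumption: every pair of keys forms a key
    return {x | y for x in k1 for y in k2}

def kappa_left_op(k1):
    """κ for left semijoin, left antijoin and left groupjoin:
    the keys of e1 are kept."""
    return set(k1)
```

Applying these functions bottom-up over an operator tree yields the key sets of all intermediate results.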


III. EQUIVALENCES

This section is organized into two parts. The first part shows how to push down/pull up a grouping operator, the second part shows how to eliminate an unnecessary top grouping operator.

A. Pushing Group-By

Since the work by Yan and Larson [5], [6], [8], [7], [4] is the most general one, we take it as the basis for our work. Figure 3 shows all known and new equivalences. The nine equivalences already known from Yan and Larson's work can be recognized by the inner join on their left-hand side. The different section headings within the figure are those proposed by Yan and Larson (except for Others). A special case of Eqv. 20 occurred in [13]. The proofs of all equivalences are provided in our technical report [14].

Within the equivalences, a couple of simple abbreviations as well as some conventions occur. We give them in this short paragraph and illustrate them by means of two examples afterwards. By G we denote the set of grouping attributes, by F a vector of aggregation functions, and by q a join predicate. The grouping attributes coming from expression eᵢ are denoted by Gᵢ, i.e., Gᵢ = A(eᵢ) ∩ G. The join attributes from expression eᵢ are denoted by Jᵢ, i.e., Jᵢ = A(eᵢ) ∩ F(q). The union of the grouping and join attributes from eᵢ is denoted by Gᵢ⁺ = Gᵢ ∪ Jᵢ. If F₁ and/or F₂ occur in some equivalence, then the equivalence assumes that F is splittable into F₁ and F₂. If F₁ or F₂ does not occur in some equivalence, it is assumed to be empty. If, for some i ∈ {1, 2}, Fᵢ¹ and Fᵢ² occur in some equivalence, the equivalence requires that Fᵢ is decomposable into Fᵢ¹ and Fᵢ². Last but not least, ⊥ abbreviates a special tuple that returns the NULL value for every attribute.

1) Example 1: Join: Fig. 4 shows two relations e₁ and e₂, which will be used to illustrate Eqv. 10 as well as Eqv. 12. Let us start with Eqv. 10. In order to do so, we only look at the top expression above each relation and ignore the tuples below the separating horizontal line. Relations e₁ and e₂ at the top of Fig. 4 serve as input. The calculation of the result of the left-hand side of Eqv. 10 is rather straightforward. Relation e₃ gives the result of the join e₁ ⋈_{j₁=j₂} e₂. The result is then grouped by Γ_{g₁,g₂;F}(e₃) for the aggregation vector F = k : count(*), b₁ : sum(a₁), b₂ : sum(a₂). The result is given as e₄. For our join example, it consists simply of a single tuple. We have intentionally chosen an example with a single group, since multiple groups make the example longer but do not give more insights.

Before we start the calculation of the right-hand side of Eqv. 10, we take apart the grouping attributes and the aggregation vector F. Among the grouping attributes G = {g₁, g₂}, only g₁ occurs in e₁. The only join attribute in the join predicate j₁ = j₂ from e₁ is j₁. Thus, G₁⁺ = {g₁, j₁}. The aggregation vector F can be split into F₁, which references only attributes in e₁, and F₂, which references only attributes in e₂. This gives us F₁ = k : count(*), b₁ : sum(a₁), where it does not matter whether we add k to F₁ or F₂, since it does not reference any attributes. Next, we need to decompose F₁ into F₁¹ and F₁² by applying the insights of Sec. II-A. This gives us F₁¹ = k′ : count(*), b′₁ : sum(a₁) and

e₁:
g₁ j₁ a₁
1  1  2
1  2  4
1  2  8
―――――――
1  3  7

e₂:
g₂ j₂ a₂
1  1  2
1  1  4
1  2  8
―――――――
1  4  9

e₃ := e₁ ⋈_{j₁=j₂} e₂
e′₃ := e₁ ⟗_{j₁=j₂} e₂:
g₁ j₁ a₁ g₂ j₂ a₂
1  1  2  1  1  2
1  1  2  1  1  4
1  2  4  1  2  8
1  2  8  1  2  8
――――――――――――――
1  3  7  -  -  -
-  -  -  1  4  9

e₄ := Γ_{g₁,g₂;F}(e₃)
e′₄ := Γ_{g₁,g₂;F}(e′₃):
g₁ g₂ k  b₁ b₂
1  1  4  16 22
――――――――――――
1  -  1  7  -
-  1  1  -  9

e₅ := Γ_{g₁,j₁;F_X}(e₁):
g₁ j₁ k′/c₁ b′₁
1  1  1     2
1  2  2     12
――――――――――――
1  3  1     7

e₆ := e₅ ⋈_{j₁=j₂} e₂
e′₆ := e₅ ⟗^{F₁¹({⊥}),c₁:1;−}_{j₁=j₂} e₂:
g₁ j₁ k′/c₁ b′₁ g₂ j₂ a₂
1  1  1     2   1  1  2
1  1  1     2   1  1  4
1  2  2     12  1  2  8
――――――――――――――――――――
1  3  1     7   -  -  -
-  -  1*    -   1  4  9

e₇ := Γ_{g₁,g₂;F_Y}(e₆)
e′₇ := Γ_{g₁,g₂;F_Y}(e′₆):
g₁ g₂ k  b₁ b₂
1  1  4  16 22
――――――――――――
1  -  1  7  -
-  1  1  -  9

where

F  = k : count(*), b₁ : sum(a₁), b₂ : sum(a₂)
F₁ = k : count(*), b₁ : sum(a₁)
F₂ = b₂ : sum(a₂)
F₁¹ = k′ : count(*), b′₁ : sum(a₁)
F₁² = k : sum(k′), b₁ : sum(b′₁)
F_X = F₁¹ ◦ (c₁ : count(*))
F_Y = (F₂ ⊗ c₁) ◦ F₁²
    = b₂ : sum(c₁ * a₂), k : sum(k′), b₁ : sum(b′₁)

and G = {g₁, g₂}, G₁⁺ = {g₁, j₁}. (Tuples below a horizontal line are relevant only for the outerjoin example; entries marked * are default values provided by the outerjoin, cf. Eqv. 12.)

Fig. 4. Example for Eqvs. 10 and 12

F₁² = k : sum(k′), b₁ : sum(b′₁). The inner grouping operator of Eqv. 10 requires us to add an attribute c₁ : count(*) to F₁¹, which we abbreviate by F_X. Since there already exists one count(*), the result of which is stored in k′, we keep only one of them in Fig. 4 and call it k′/c₁. This finishes our preprocessing of the aggregation functions of the inner grouping operator. Its result is given as relation e₅ in Fig. 4. It consists of two tuples. The next step consists of calculating the join e₅ ⋈_{j₁=j₂} e₂. As this is rather straightforward, we just give the result (relation e₆). The final step is again a little more complex. Eqv. 10 requires us to calculate F₂ ⊗ c₁. Looking back at the end of Sec. II-A, we see that sum is duplicate sensitive and that F₂ ⊗ c₁ = b₂ : sum(c₁ * a₂). Concatenating this aggregation vector with F₁², as demanded by Eqv. 10, gives us F_Y as specified in Fig. 4. The final result of the right-hand side of Eqv. 10, calculated as e₇ = Γ_{g₁,g₂;F_Y}(e₆), is given in Fig. 4. Note that this is the same as the result of the left-hand side (e₄).

2) Example 2: Full Outerjoin: The second example reuses the relations e₁ and e₂ given in Fig. 4. But this time, we calculate the full outerjoin instead of the inner join, and we


Eager/Lazy Groupby-Count

Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;(F₂⊗c₁)◦F₁²}(Γ_{G₁⁺;F₁¹◦(c₁:count(*))}(e₁) ⋈_q e₂)   (10)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;(F₂⊗c₁)◦F₁²}(Γ_{G₁⁺;F₁¹◦(c₁:count(*))}(e₁) ⟕_q e₂)   (11)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ_{G;(F₂⊗c₁)◦F₁²}(Γ_{G₁⁺;F₁¹◦(c₁:count(*))}(e₁) ⟗^{F₁¹({⊥}),c₁:1;−}_q e₂)   (12)
Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;(F₁⊗c₂)◦F₂²}(e₁ ⋈_q Γ_{G₂⁺;F₂¹◦(c₂:count(*))}(e₂))   (13)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;(F₁⊗c₂)◦F₂²}(e₁ ⟕^{F₂¹({⊥}),c₂:1}_q Γ_{G₂⁺;F₂¹◦(c₂:count(*))}(e₂))   (14)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ_{G;(F₁⊗c₂)◦F₂²}(e₁ ⟗^{−;F₂¹({⊥}),c₂:1}_q Γ_{G₂⁺;F₂¹◦(c₂:count(*))}(e₂))   (15)

Eager/Lazy Group-by

Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;F₁²}(Γ_{G₁⁺;F₁¹}(e₁) ⋈_q e₂)   (16)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;F₁²}(Γ_{G₁⁺;F₁¹}(e₁) ⟕_q e₂)   (17)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ_{G;F₁²}(Γ_{G₁⁺;F₁¹}(e₁) ⟗^{F₁¹({⊥});−}_q e₂)   (18)
Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;F₂²}(e₁ ⋈_q Γ_{G₂⁺;F₂¹}(e₂))   (19)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;F₂²}(e₁ ⟕^{F₂¹({⊥})}_q Γ_{G₂⁺;F₂¹}(e₂))   (20)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ_{G;F₂²}(e₁ ⟗^{−;F₂¹({⊥})}_q Γ_{G₂⁺;F₂¹}(e₂))   (21)

Eager/Lazy Count

Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;(F₂⊗c₁)}(Γ_{G₁⁺;c₁:count(*)}(e₁) ⋈_q e₂)   (22)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;(F₂⊗c₁)}(Γ_{G₁⁺;(c₁:count(*))}(e₁) ⟕_q e₂)   (23)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ_{G;(F₂⊗c₁)}(Γ_{G₁⁺;(c₁:count(*))}(e₁) ⟗^{c₁:1;−}_q e₂)   (24)
Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;(F₁⊗c₂)}(e₁ ⋈_q Γ_{G₂⁺;c₂:count(*)}(e₂))   (25)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;(F₁⊗c₂)}(e₁ ⟕^{c₂:1}_q Γ_{G₂⁺;c₂:count(*)}(e₂))   (26)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ_{G;(F₁⊗c₂)}(e₁ ⟗^{−;c₂:1}_q Γ_{G₂⁺;(c₂:count(*))}(e₂))   (27)

Double Eager/Lazy

Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;(F₁²⊗c₂)}(Γ_{G₁⁺;F₁¹}(e₁) ⋈_q Γ_{G₂⁺;c₂:count(*)}(e₂))   (28)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;(F₁²⊗c₂)}(Γ_{G₁⁺;F₁¹}(e₁) ⟕^{c₂:1}_q Γ_{G₂⁺;c₂:count(*)}(e₂))   (29)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ_{G;(F₁²⊗c₂)}(Γ_{G₁⁺;F₁¹}(e₁) ⟗^{F₁¹({⊥});c₂:1}_q Γ_{G₂⁺;(c₂:count(*))}(e₂))   (30)
Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;(F₂²⊗c₁)}(Γ_{G₁⁺;c₁:count(*)}(e₁) ⋈_q Γ_{G₂⁺;F₂¹}(e₂))   (31)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;(F₂²⊗c₁)}(Γ_{G₁⁺;c₁:count(*)}(e₁) ⟕^{F₂¹({⊥})}_q Γ_{G₂⁺;F₂¹}(e₂))   (32)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ_{G;(F₂²⊗c₁)}(Γ_{G₁⁺;(c₁:count(*))}(e₁) ⟗^{c₁:1;F₂¹({⊥})}_q Γ_{G₂⁺;F₂¹}(e₂))   (33)

Eager/Lazy Split (with Γ₂ := Γ_{G;(F₁²⊗c₂)◦(F₂²⊗c₁)}):

Γ_{G;F}(e₁ ⋈_q e₂) ≡ Γ_{G;(F₁²⊗c₂)◦(F₂²⊗c₁)}(Γ_{G₁⁺;F₁¹◦(c₁:count(*))}(e₁) ⋈_q Γ_{G₂⁺;F₂¹◦(c₂:count(*))}(e₂))   (34)
Γ_{G;F}(e₁ ⟕_q e₂) ≡ Γ_{G;(F₁²⊗c₂)◦(F₂²⊗c₁)}(Γ_{G₁⁺;F₁¹◦(c₁:count(*))}(e₁) ⟕^{F₂¹({⊥}),c₂:1}_q Γ_{G₂⁺;F₂¹◦(c₂:count(*))}(e₂))   (35)
Γ_{G;F}(e₁ ⟗_q e₂) ≡ Γ₂(Γ_{G₁⁺;F₁¹◦(c₁:count(*))}(e₁) ⟗^{F₁¹({⊥}),c₁:1;F₂¹({⊥}),c₂:1}_q Γ_{G₂⁺;F₂¹◦(c₂:count(*))}(e₂))   (36)

Others

Γ_{G;F}(e₁ ⋉_q e₂) ≡ Γ_{G;F}(e₁) ⋉_q e₂, if (F(q) ∩ A(e₁)) ⊆ G   (37)
Γ_{G;F}(e₁ ▷_q e₂) ≡ Γ_{G;F}(e₁) ▷_q e₂, if (F(q) ∩ A(e₁)) ⊆ G   (38)
Γ_{G;F}(e₁ Z_{J₁θJ₂;F} e₂) ≡ Γ_{G;(F₂⊗c₁)◦F₁²}(Γ_{G₁⁺;F₁¹◦(c₁:count(*))}(e₁) Z_{J₁θJ₂;F} e₂)   (39)
Γ_{G;F}(e₁ Z_{J₁θJ₂;F} e₂) ≡ Γ_{G;F₁²}(Γ_{G₁⁺;F₁¹}(e₁) Z_{J₁θJ₂;F} e₂)   (40)
Γ_{G;F}(e₁ Z_{J₁θJ₂;F} e₂) ≡ Γ_{G;(F₂⊗c₁)}(Γ_{G₁⁺;(c₁:count(*))}(e₁) Z_{J₁θJ₂;F} e₂)   (41)

Fig. 3. Equivalences


apply Eqv. 12. The corresponding expressions are now given in the lower header line of each relation. Now all tuples in each eᵢ are relevant, including those below the separating horizontal line. The result of e₁ ⟗_{j₁=j₂} e₂ is given as e′₃, where we denote NULL by '-'. We can reuse all the different aggregation vectors derived in the previous example. The only new calculation that needs to be done is the one for the default values of the full outerjoin on the right-hand side of Eqv. 12. Eqv. 12 defines default values for the case that a tuple t from e₂ does not find a join partner on the other side. The c₁ values of all orphaned e₂ tuples become 1. Further, F₁¹({⊥}) evaluates to 1 for k′ (count(*) on a relation with a single element) and to NULL for b′₁, since SQL's sum returns NULL for sets that contain only NULL values. Thus prepared, we can calculate the right-hand side of Eqv. 12 via e₅, which is the same as in the previous example, e′₆, which now uses a full outerjoin with defaults, and, finally, e′₇, which shows the same result as e′₄.

3) Remarks: The main equivalences are those under the heading Eager/Lazy Groupby-Count. They fall into two classes, depending on whether the grouping is pushed into the left or the right argument of the join. For commutative operators like the inner join and the full outerjoin, deriving one from the other is simple. For non-commutative operators like the left outerjoin, an additional proof is necessary. Now, instead of pushing the grouping operator into only one argument, we can combine both equivalences to push it into both arguments. The resulting equivalences are given under the heading Eager/Lazy Split. The equivalences between these two blocks are specializations for the case that the aggregation vector F only accesses attributes from one input. In this case, either F₁ or F₂ is empty, and the equivalences can be simplified. These simplifications are shown in the blocks Eager/Lazy Group-by, Eager/Lazy Count, and Double Eager/Lazy. The last block of equivalences, termed Others, shows how to push the grouping operator into the left semijoin, the left antijoin, and the groupjoin. The latter requires another arbitrary aggregation vector F. All three operators have in common that after they are applied, only the attributes from their left input are accessible. Thus, the grouping operator can only be pushed into their left argument.

B. Eliminating the Top Grouping

We wish to eliminate a top grouping from some expression of the form Γ_{G;F}(e) for some aggregation vector F = (b₁ : agg₁(a₁), ..., bₖ : aggₖ(aₖ)). Clearly, this is only possible if G is a key for e and e is duplicate-free, since in this case there exists exactly one tuple in e for each group. The only detail left is to calculate the aggregation vector F. This can be done via a map operator as in

Γ_{G;F}(e) ≡ Π_C(χ_F̂(e))   (42)

if we define F̂ to calculate the result of some aggregate function for single values, i.e., F̂ := (b₁ : agg₁({a₁}), ..., bₖ : aggₖ({aₖ})), and C = G ∪ {b₁, ..., bₖ}.

Remark. In SQL, a declaration of a primary key or a

uniqueness constraint implies not only a key but also that the

relation is duplicate-free.
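A tuple-level sketch of Eqv. 42 (our Python, not from the paper): when G contains a key and e is duplicate-free, every group is a singleton, so the grouping degenerates to a map followed by a projection.

```python
def eliminate_top_grouping(e, G, aggs):
    """Γ_{G;F}(e) ≡ Π_C(χ_F̂(e)): evaluate each aggregate on a singleton bag.
    aggs maps each result attribute b_i to (agg_i, a_i). Assumes G contains
    a key of e and that e is duplicate-free."""
    result = []
    for t in e:
        row = {g: t[g] for g in G}            # projection onto C = G ∪ {b_i}
        for b, (agg, a) in aggs.items():
            row[b] = agg([t[a]])              # χ: agg_i evaluated on {a_i}
        result.append(row)
    return result
```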

IV. DYNAMIC PROGRAMMING

A. Plan Generation Basics

We briefly repeat the basics of a bottom-up plan generator. Fig. 5 shows the basic structure of a typical dynamic programming-based plan generator. Its input consists of three major pieces: the set of relations to be joined, the set of operators to be used for this, and a hypergraph representing the query graph. Clearly, the relations and the operators are derived from the initial SQL query in a straightforward manner. The hypergraph is constructed by a conflict detector [1]. It encodes possible reordering conflicts as far as possible into the hypergraph. This is necessary since inner joins and outer joins are not freely reorderable.

The major data structure used is the DPTable, which stores (an) optimal plan(s) for a given set of relations. The basic algorithm in Fig. 5 uses a single plan per DPTable entry. Later on, multiple plans exist per DPTable entry.

The plan generator consists of four major components. The first component initializes the DPTable with plans for access paths for single relations, such as table scans and index accesses (Lines 1-2). The second component enumerates csg-cmp-pairs of the hypergraph H (Line 3), where a csg-cmp-pair (ccp for short) is defined as follows:

Definition 3: Let H = (V, E) be a hypergraph and S₁, S₂ two subsets of V. (S₁, S₂) is a csg-cmp-pair (ccp for short) if the following three conditions hold:

1) S₁ ∩ S₂ = ∅,
2) S₁ and S₂ induce connected subgraphs of H, and
3) ∃(u, v) ∈ E : u ⊆ S₁ ∧ v ⊆ S₂, that is, S₁ and S₂ are connected by some edge.

An efficient enumerator for csg-cmp-pairs has been proposed in [15].
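For ordinary (non-hyper) edges, the three conditions of Definition 3 can be checked directly; the Python sketch below (ours) represents an edge as a frozenset of two relation names and omits true hyperedges.

```python
def induces_connected(S, edges):
    """Condition 2: does S induce a connected subgraph?"""
    S = set(S)
    if not S:
        return False
    seen, todo = set(), [next(iter(S))]
    while todo:
        v = todo.pop()
        if v in seen:
            continue
        seen.add(v)
        # follow only edges that lie completely inside S
        todo += [u for e in edges if v in e and e <= S for u in e]
    return seen == S

def is_ccp(S1, S2, edges):
    """csg-cmp-pair test of Definition 3, restricted to simple edges."""
    S1, S2 = set(S1), set(S2)
    return (not (S1 & S2)                               # condition 1
            and induces_connected(S1, edges)
            and induces_connected(S2, edges)            # condition 2
            and any(e & S1 and e & S2 for e in edges))  # condition 3
```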

The third component (Line 5) is an applicability test for operators. It builds upon the conflict representation and checks whether some operator ◦_p can be safely applied. This is necessary since it is not possible to exactly cover all conflicts within a hypergraph representation of the query [1].

The fourth component (BUILDPLANS) is a procedure that builds plans using some operator ◦_p as the top operator and the optimal plans for the subsets of relations S₁ and S₂, which can be looked up in the DPTable. Finally, the optimal plan is returned (Line 9).

Subsequently, we will see that all components except for the last one can remain unmodified if we introduce the capability of pushing grouping operators down. Thus, our approach is minimally invasive.

B. Applying Eager Aggregation

Before we continue with the first plan generator, we introduce the routine OPTREES (Fig. 6), which is utilized by all plan generators in this paper. Its arguments are two join trees T₁ and T₂ and a join operator ◦_p. The result consists of a set of at most four trees which join T₁ and T₂, including all possible variants of eager aggregation.


DP-PLANGEN
Input:  a set of relations R = {R₀, ..., Rₙ₋₁}
        a set of operators O with associated predicates
        a query hypergraph H
Output: an optimal bushy operator tree
1  for all Rᵢ ∈ R
2    DPTable[Rᵢ] = Rᵢ  // initial access paths
3  for all csg-cmp-pairs (S₁, S₂) of H
4    for all ◦_p ∈ O
5      if APPLICABLE(S₁, S₂, ◦_p)
6        BUILDPLANS(S₁, S₂, ◦_p)
7        if ◦_p is commutative
8          BUILDPLANS(S₂, S₁, ◦_p)
9  return DPTable[R]

BUILDPLANS(S₁, S₂, ◦_p)
1  OptimalCost = ∞
2  S = S₁ ∪ S₂
3  T₁ = DPTable[S₁]
4  T₂ = DPTable[S₂]
5  if DPTable[S] ≠ NULL
6    OptimalCost = COST(DPTable[S])
7  if COST(T₁ ◦_p T₂) < OptimalCost
8    OptimalCost = COST(T₁ ◦_p T₂)
9    DPTable[S] = (T₁ ◦_p T₂)

Fig. 5. Basic DP Algorithm
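The control flow of Fig. 5 can be condensed into a few lines of Python (our sketch; the cost model, the applicability test, and the ccp enumeration are stubbed out, and a plan is a relation name or a nested tuple):

```python
def dp_plangen(relations, ccps, cost):
    """Basic DP over a list of csg-cmp-pairs given in bottom-up order.
    relations: list of names; ccps: list of (frozenset, frozenset) pairs;
    cost: a function on plans."""
    dptable = {frozenset([R]): R for R in relations}   # initial access paths
    for S1, S2 in ccps:
        S = S1 | S2
        for l, r in ((S1, S2), (S2, S1)):              # join is commutative
            plan = (dptable[l], "join", dptable[r])
            if S not in dptable or cost(plan) < cost(dptable[S]):
                dptable[S] = plan                      # keep the cheaper plan
    return dptable[frozenset(relations)]

def leaves(plan):
    """Set of relations referenced by a plan."""
    if isinstance(plan, str):
        return {plan}
    return leaves(plan[0]) | leaves(plan[2])
```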

The relation sets S₁ and S₂ are obtained from T₁ and T₂, respectively, by extracting their leaf nodes. This is denoted by T(T) for a tree T. The first tree is the one which joins T₁ and T₂ using ◦_p without any grouping.

One situation that requires some care is when we create a join tree containing all the relations of our query, which is equivalent to S = R, where R is the set of all relations. Then we have to add another grouping on top of ◦_p if and only if the grouping attributes do not comprise a key (cf. Sec. III-B). This is checked by calling NEEDSGROUPING, which is listed in Figure 7.

The next tree is the one that groups the left argument before the join. In order to do so, we have to make sure that the corresponding transformation is valid. This check is accomplished by calling the subroutine VALID, which implements the equivalences presented in Sec. III. Additionally, we have to avoid the case in which the grouping attributes Gᵢ⁺ form a key for the set Sᵢ, with i ∈ {1, 2}, because then the grouping would be a waste. And again, if necessary, we have to add a grouping on top.

Once the routine terminates, the returned set Trees contains up to four different join trees, which are depicted in Fig. 8. Note that the introduction of OPTREES only serves the purpose of increasing the readability of the following algorithms and should not be included in a real implementation, since it produces plans that not all of the subsequent algorithms will need.

OPTREES(T₁, T₂, ◦_p)
1  S₁ = T(T₁)
2  S₂ = T(T₂)
3  S = S₁ ∪ S₂
4  Trees = ∅
5  NewTree = (T₁ ◦_p T₂)
6  if S == R ∧ NEEDSGROUPING(G, NewTree)
7    NewTree = (Γ_G(NewTree))
8  Trees.insert(NewTree)
9  NewTree = Γ_{G₁⁺}(T₁) ◦_p T₂
10 if VALID(NewTree) ∧ NEEDSGROUPING(G₁⁺, NewTree)
11   if S == R ∧ NEEDSGROUPING(G, NewTree)
12     NewTree = (Γ_G(NewTree))
13   Trees.insert(NewTree)
14 NewTree = T₁ ◦_p Γ_{G₂⁺}(T₂)
15 if VALID(NewTree) ∧ NEEDSGROUPING(G₂⁺, NewTree)
16   if S == R ∧ NEEDSGROUPING(G, NewTree)
17     NewTree = (Γ_G(NewTree))
18   Trees.insert(NewTree)
19 NewTree = Γ_{G₁⁺}(T₁) ◦_p Γ_{G₂⁺}(T₂)
20 if VALID(NewTree) ∧ NEEDSGROUPING(G₁⁺, NewTree) ∧ NEEDSGROUPING(G₂⁺, NewTree)
21   if S == R ∧ NEEDSGROUPING(G, NewTree)
22     NewTree = (Γ_G(NewTree))
23   Trees.insert(NewTree)
24 return Trees

Fig. 6. OPTREES

NEEDSGROUPING(G, T)
1  if ∃k ∈ κ(T) : k ⊆ G ∧ the result of T is duplicate-free
2    return FALSE
3  else
4    return TRUE

Fig. 7. NEEDSGROUPING
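NEEDSGROUPING can be phrased directly in terms of the key sets computed in Sec. II-C (our Python sketch; keys are frozensets of attribute names):

```python
def needs_grouping(G, keys, duplicate_free):
    """Fig. 7: a grouping on G is redundant iff some key of the input is
    contained in G and the input is duplicate-free (cf. Sec. III-B)."""
    return not (duplicate_free and any(k <= set(G) for k in keys))
```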

C. Enumerating the Complete Search Space

Our goal is to make use of eager aggregation and the equivalences presented in the previous section in a plan generator like the basic DP algorithm described above.

To find the best possible join tree taking eager aggregation into account, we have to keep all subtrees found by our plan generator, combine them to produce all possible trees for our query, and pick the best one.

Fig. 8. Possible trees for grouping and join:
(a) S₁ ◦ S₂
(b) Γ_{G₁⁺}(S₁) ◦ S₂
(c) S₁ ◦ Γ_{G₂⁺}(S₂)
(d) Γ_{G₁⁺}(S₁) ◦ Γ_{G₂⁺}(S₂)


BUILDPLANSALL(S₁, S₂, ◦_p)
1  S = S₁ ∪ S₂
2  for each T₁ ∈ DPTable[S₁]
3    for each T₂ ∈ DPTable[S₂]
4      for each T ∈ OPTREES(T₁, T₂, ◦_p)
5        if S == R
6          INSERTTOPLEVELPLAN(S, T)
7        else
8          DPTable[S].APPEND(T)

INSERTTOPLEVELPLAN(S, T)
1  if DPTable[S] == ∅ ∨ COST(T) < COST(DPTable[S])
2    DPTable[S] = ∅
3    DPTable[S].APPEND(T)

Fig. 9. BUILDPLANSALL

To do this, we change the dynamic programming table in such a way that it stores not just one optimal join tree for every set S ⊆ R, but a list of possible trees. Figure 9 shows the routine BUILDPLANSALL, which is derived from the routine BUILDPLANS depicted in Figure 5 and illustrates the necessary modifications.

Like before, we enumerate all pairs of subsets S₁, S₂ with S = S₁ ∪ S₂ to find possible join trees for S. We then combine every tree for S₁ with every tree for S₂ using two loops. We call OPTREES for each pair of join trees, which results in up to four different trees for every combination. The newly created trees are added to the list for S.

Eventually, we face the situation where S = R holds and we need to build a join tree for the complete query. At this point, we call another subroutine named INSERTTOPLEVELPLAN. Inside this routine, we compare the join trees for S to find the one with minimal costs, because there are no subsequent join operators that need to be taken into account. Before we can do this, we have to decide whether we need a top-level grouping by calling NEEDSGROUPING (cf. Figure 7). In contrast to the other relation sets, we do not have to keep a list of trees for R; we keep only the best tree found so far and replace it if a better one is found.

Obviously, the runtime complexity of this algorithm is O(2^(2n−1) · #ccp) for n relations, where #ccp denotes the number of csg-cmp-pairs of the query.

D. A First Heuristic

In this subsection, we present a plan generator that is capable of applying eager aggregation without the exponential overhead induced by the algorithm depicted in Figure 9. The downside is that this less complex plan generator no longer guarantees an optimal solution.

The major difficulty we face in incorporating eager aggregation into a DP-based plan generator is that Bellman's Principle of Optimality is no longer valid. If we push a grouping operator into one or both arguments of a join operator, this can influence the costs of subsequent join operations. This means that it might be necessary to use a suboptimal join tree for a set S1 to construct an optimal solution for some set S ⊃ S1.

BUILDPLANSH1(S1, S2, ◦p)
1  for each T ∈ OPTREES(DPTable[S1], DPTable[S2], ◦p)
2    if COST(T) < COST(DPTable[S1 ∪ S2])
3      DPTable[S1 ∪ S2] = T

Fig. 10. BUILDPLANSH1
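In Python, the single-best-plan behaviour of BUILDPLANSH1 amounts to the following sketch, again with OPTREES and the cost function assumed to be given; an absent table entry is treated as infinitely expensive:

```python
def build_plans_h1(dptable, s1, s2, op, optrees, cost):
    """Keep only the cheapest tree per relation set; no list of candidates."""
    s = s1 | s2
    for t in optrees(dptable[s1], dptable[s2], op):
        best = dptable.get(s)
        if best is None or cost(t) < cost(best):
            dptable[s] = t   # overwrite: H1 records a single plan per class

# toy usage: the stand-in OPTREES offers a plain join and a grouped variant
dptable = {frozenset({'A'}): 'A', frozenset({'B'}): 'B'}
optrees = lambda t1, t2, op: [(t1, op, t2), ('group', (t1, op, t2))]
build_plans_h1(dptable, frozenset({'A'}), frozenset({'B'}), 'join',
               optrees, cost=lambda t: len(str(t)))
print(dptable[frozenset({'A', 'B'})])   # the shorter (cheaper) alternative wins
```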

At the top of Fig. 11 stand the two alternative operator trees:

Left (lazy):   Γ_{e1.d; d′:count(*)}(e0 ⋈_{e0.a=e2.f} (e1 ⋈_{e1.d=e2.e} e2))
Right (eager): Γ_{e1.d; d″:sum(d′)}(e0 ⋈_{e0.a=e2.f} (Γ_{e1.d; d′:count(*)}(e1) ⋈_{e1.d=e2.e} e2))

Base relations:

e0: (a, b) = {(0,0), (1,0), (2,1), (3,1)}
e1: (c, d) = {(0,1), (1,0), (2,1), (3,1), (4,4)}
e2: (e, f) = {(0,0), (1,1), (2,3), (3,4)}

Intermediate results of the left tree:

e1,2 := e1 ⋈_{e1.d=e2.e} e2: (c, d, e, f) = {(0,1,1,1), (1,0,0,0), (2,1,1,1), (3,1,1,1)}
e0,1,2 := e0 ⋈_{e0.a=e2.f} e1,2: (a, b, c, d, e, f) = {(0,0,1,0,0,0), (1,0,0,1,1,1), (1,0,2,1,1,1), (1,0,3,1,1,1)}
Γ_{e1.d; d′:count(*)}(e0,1,2): (d, d′) = {(1,3), (0,1)}

Intermediate results of the right tree:

e′1 := Γ_{d; d′:count(*)}(e1): (d, d′) = {(0,1), (1,3), (4,1)}
e′1,2 := e′1 ⋈_{e1.d=e2.e} e2: (d, d′, e, f) = {(0,1,0,0), (1,3,1,1)}
e′0,1,2 := e0 ⋈_{e0.a=e2.f} e′1,2: (a, b, d, d′, e, f) = {(1,0,1,3,1,1), (0,0,0,1,0,0)}
Γ_{e1.d; d″:sum(d′)}(e′0,1,2): (d, d″) = {(1,3), (0,1)}

Fig. 11. Exemplary query with alternative join trees

That is because the higher costs of a suboptimal subplan can, under certain circumstances, be compensated by cost savings for the subsequent joins.

Figure 10 again shows a modified version of BUILDPLANS. We refer to the resulting plan generator as our first heuristic, or H1. The modified routine is called BUILDPLANSH1. It serves to demonstrate the problems that arise from the violation of Bellman's Principle of Optimality.

The only difference from the basic version of BUILDPLANS presented in Section IV-A is that the new algorithm uses OPTREES to find all possible trees for the current csg-cmp-pair. For each of them, the cost function is called to compute the combined costs of the join and any groupings contained in the tree. If these costs are lower than those of an existing plan, or if this is the first plan for the current set of relations, the plan is added to the DP-table. In summary, H1 records only the single cheapest plan for every plan class.


To clarify why this approach can lead to problems, Fig. 11 provides a sample query. At the top of the figure there are two equivalent operator trees, both of which involve a grouping operation. The left one does not make use of eager aggregation, so the grouping remains at the top of the tree and is evaluated after all join operations. In the tree on the right side, a grouping operator has been pushed down into the left argument of ⋈_{e1.d=e2.e}. Note how the aggregation vector of the original grouping operator at the top of the tree is adjusted according to our observations from Section III: we now have to sum up the values created by the pushed-down grouping operator to obtain the originally intended count(*). Below the two operator trees, there are instances of the three relations e0, e1, and e2, and all the intermediate results for both operator trees.

Cout(e0) = Cout(e1) = Cout(e2) = 0
Cout(e1,2) = 4            Cout(e′1) = 3
Cout(e0,1,2) = 8          Cout(e′1,2) = 5
Cout(Γ(e0,1,2)) = 10      Cout(e′0,1,2) = 7
                          Cout(Γ(e′0,1,2)) = 9

TABLE I. COSTS FOR INTERMEDIATE RESULTS

Table I contains the costs of all subexpressions of both operator trees, where, for simplicity, we used the cost function Cout, which simply sums up the intermediate result sizes:

Cout(T) = 0                            if T is a single table
          |T| + Cout(T1) + Cout(T2)    if T = T1 ◦ T2
          |T| + Cout(T1)               if T = Γ(T1)
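The cost computation for both plans of Fig. 11 can be replayed with a few lines of Python. The tree encoding is an illustrative sketch, not the paper's data structure; the cardinalities are the result sizes from the example:

```python
# A minimal sketch of C_out on an explicit tree encoding:
# ('scan', card) | ('join', card, left, right) | ('group', card, child)

def c_out(tree):
    if tree[0] == 'scan':
        return 0                                  # base relation scans are free
    if tree[0] == 'join':
        _, card, left, right = tree
        return card + c_out(left) + c_out(right)  # |T| + C_out(T1) + C_out(T2)
    _, card, child = tree                         # grouping node
    return card + c_out(child)                    # |T| + C_out(T1)

e0, e1, e2 = ('scan', 4), ('scan', 5), ('scan', 4)

# left plan of Fig. 11: grouping stays on top
left = ('group', 2, ('join', 4, e0, ('join', 4, e1, e2)))

# right plan: eager aggregation on e1
right = ('group', 2, ('join', 2, e0, ('join', 2, ('group', 3, e1), e2)))

print(c_out(left), c_out(right))   # 10 9, matching Table I
```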

According to the deﬁnition of Cout, scanning the base

relations does not cause any costs at all, which is reﬂected

in the ﬁrst line of Table I. (Note that the scan costs would

be the same constant in both plans anyway.) Beginning on the

second line, the left (right) column contains the cost of the

intermediate results of the left (right) plan of Fig. 11.

Let us now go through our heuristic. It will decide against early aggregation of relation e1 because the combined costs of the grouping and the following join operation are higher than the costs of joining without prior grouping. Taking a closer look at the following lines of our table, we see that the costs for joining e1,2 with e0 amount to 8, whereas the right column states a value of 7 for the join between e′1,2 and e0. For the total costs of the query, we notice the same cost difference: the left tree causes costs of 10, the right one only 9. This means the tree that is eliminated by our naive plan generator is in fact less expensive than the other one.

The reason for this behaviour can be seen in Figure 11. The early grouping of relation e1 causes additional costs of 3, but it also reduces the cardinalities of the subsequent expressions e′1,2 and e′0,1,2 compared to e1,2 and e0,1,2. The additional costs of the first grouping operation are therefore compensated by the reduced cardinalities and costs of the following expressions. Considering only the costs of expression e′1,2, this benefit is not obvious because it becomes visible only further up in the tree.

In the example above, the inﬂuence of an early grouping on

the cardinalities of subsequent expressions is already enough

to make eager aggregation beneﬁcial. But there is also a second

aspect to it that allows for even bigger cost savings. The

introduction of new grouping operators also inﬂuences the

functional dependencies that hold for the intermediate results.

Looking back at the values for e′0,1,2 in Figure 11, we can see that the final grouping is not necessary to produce the same result as the left join tree. Instead, a projection on the attribute set {e1.d, d′} suffices because the functional dependency e1.d → A(e′0,1,2) holds, i.e., e1.d is a key for e′0,1,2, and the attribute d′ already contains the correct value for the original aggregate function count(*). We can therefore leave out the final grouping and replace it by a much cheaper duplicate-preserving projection Π_{e1.d, d′}. As our cost function does not take projection costs into account, we end up with a cost value of 7 for the tree applying eager aggregation, in contrast to a value of 10 for the other tree.

These ﬁndings lead to the conclusion that it is not sufﬁcient

to “locally” assess the proﬁtability of eager aggregation for

one join operation, as described above, if we want to consider

the whole search space. Still, this approach can be used as a

simple heuristic producing only a moderate overhead on top

of dynamic programming and at the same time exploiting at

least some of the potential beneﬁts of eager aggregation.

E. Improving the Heuristic

As we have seen in the previous subsection, the routine BUILDPLANSH1 tends to discard trees applying eager aggregation even in cases where it might be beneficial, because the accumulated costs of the aggregation and the join are higher than those of the join alone.

It is therefore possible to improve the heuristic by making

this cost comparison less strict and thereby enabling the plan

generator to prefer plans that are “more eager” even though

they might cause slightly higher costs locally. For this purpose,

we introduce the simple notion of the eagerness of a plan,

which is deﬁned as follows:

Eagerness(T) = 0    if T = T1 ⋈ T2
               1    if T = Γ(T1) ⋈ T2 or T = T1 ⋈ Γ(T2)
               2    if T = Γ(T1) ⋈ Γ(T2)
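Assuming the same illustrative tuple encoding of plans as before, ('join', left, right) and ('group', child), Eagerness can be computed as:

```python
def eagerness(tree):
    """Number of grouping operators directly below the topmost join."""
    _, left, right = tree                       # tree must be a join node
    return (left[0] == 'group') + (right[0] == 'group')

j = ('join', ('group', ('scan', 'e1')), ('scan', 'e2'))
print(eagerness(j))   # 1
```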

The eagerness of a join tree T is simply the number of grouping operators that are a direct child of the topmost join operator. Figure 12 shows the pseudocode for the routine BUILDPLANSH2, which exploits eagerness.

The main difference from BUILDPLANSH1 is the new subroutine COMPAREADJUSTEDCOSTS, which is called from line 2. It takes two join trees and compares their costs, adjusting the costs of the less eager tree by a constant factor F. The value of F determines the degree to which more eager plans are preferred over less eager plans. If the eagerness of the two join trees passed to COMPAREADJUSTEDCOSTS is equal, or if the trees form a plan for the whole query, no cost adjustment is applied. In the evaluation, we experiment with different values for F.

F. Optimality Preserving Pruning

In Subsection IV-D we showed that it is not possible to

decide whether or not a particular subtree is part of the ﬁnal

solution solely based on its costs. Instead, there are some


BUILDPLANSH2(S1, S2, ◦p)
1  for each T ∈ OPTREES(DPTable[S1], DPTable[S2], ◦p)
2    if COMPAREADJUSTEDCOSTS(T, DPTable[S1 ∪ S2])
3      DPTable[S1 ∪ S2] = T

COMPAREADJUSTEDCOSTS(T1, T2)
1  if T1 is a top-level plan ∨ EAGERNESS(T1) == EAGERNESS(T2)
2    return COST(T1) < COST(T2)
3  if EAGERNESS(T1) < EAGERNESS(T2)
4    return (F × COST(T1)) < COST(T2)
5  elseif EAGERNESS(T1) > EAGERNESS(T2)
6    return COST(T1) < (F × COST(T2))

Fig. 12. BUILDPLANSH2 and COMPAREADJUSTEDCOSTS
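The adjusted cost comparison can be sketched in Python as follows. The cost and eagerness values are assumed to be precomputed attributes of the plan objects, and F is the tolerance factor discussed above:

```python
from collections import namedtuple

Plan = namedtuple('Plan', ['cost', 'eagerness'])
F = 1.03   # tolerance factor; the evaluation tries several values

def compare_adjusted_costs(t1, t2, top_level=False):
    """Return True if plan t1 should replace plan t2."""
    if top_level or t1.eagerness == t2.eagerness:
        return t1.cost < t2.cost
    if t1.eagerness < t2.eagerness:
        return F * t1.cost < t2.cost    # handicap the less eager plan t1
    return t1.cost < F * t2.cost        # handicap the less eager plan t2

# a slightly more expensive but more eager plan wins within the tolerance
print(compare_adjusted_costs(Plan(101, 2), Plan(100, 0)))   # True: 101 < 103
```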

more properties we have to check before we can safely discard suboptimal trees.

As we have seen in Subsection IV-C, keeping all possible trees in the solution table guarantees an optimal solution but causes such a big overhead that it is impractical for most queries. This leads to the question of whether we can find a way to reduce the number of DP-table entries and still preserve the optimality of the resulting solution.

The ﬁrst observation we made for the example query shown

in Figure 11 was that we have to take the cardinalities of

intermediate results into account. That is because subtrees

with suboptimal costs caused by the introduction of additional

grouping operators can in turn produce smaller results and

thereby lower the costs of subsequent operations.

In addition to that, we discovered that the functional

dependencies that hold for the result of an intermediate join

expression can inﬂuence the costs of the ﬁnal join tree. These

functional dependencies are in turn inﬂuenced by the grouping

operators present in the expression.

As a result of these findings, we can define the dominance of one tree over another tree by means of three criteria:

Definition 4: A join tree T1 dominates another join tree T2 for the same set of relations if all of the following conditions hold:
• Cost(T1) ≤ Cost(T2)
• |T1| ≤ |T2|
• FD+(T1) ⊇ FD+(T2).

We can safely discard any join tree T2 that is dominated by another join tree T1. Note that the third item makes use of the closure of the functional dependencies, denoted by FD+, that hold in T1 and T2, respectively. Since the computation and comparison of these two sets is expensive, this condition can be weakened in an actual implementation by comparing the sets of candidate keys instead. Figure 13 shows the routine PRUNEDOMINATEDPLANS, which checks these three conditions.

The routine expects as arguments a set of relations S and a join tree T for this set. The loop beginning in line 1 iterates through the existing join trees for S taken from the DP-table

PRUNEDOMINATEDPLANS(S, T)
1  for Told ∈ DPTable[S]
2    if COST(Told) ≤ COST(T) ∧ |Told| ≤ |T| ∧ FD+(Told) ⊇ FD+(T)
3      return
4    if COST(Told) ≥ COST(T) ∧ |Told| ≥ |T| ∧ FD+(Told) ⊆ FD+(T)
5      discard Told
6  DPTable[S].APPEND(T)

Fig. 13. PRUNEDOMINATEDPLANS
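Dominance pruning per Definition 4 can be sketched as follows. As suggested in the text, the expensive FD-closure comparison is replaced by candidate-key inclusion; cost, cardinality, and keys are assumed to be attributes of the plan objects:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    cost: int
    card: int              # |T|: output cardinality
    keys: frozenset        # candidate keys, a weaker stand-in for FD+(T)

def dominates(t1, t2):
    """t1 is no worse on cost and cardinality and implies no fewer keys."""
    return t1.cost <= t2.cost and t1.card <= t2.card and t1.keys >= t2.keys

def prune_dominated_plans(plans, new):
    """Insert `new` unless it is dominated; drop any plans it dominates."""
    if any(dominates(old, new) for old in plans):
        return plans
    return [old for old in plans if not dominates(new, old)] + [new]

plans = prune_dominated_plans([], Plan(10, 5, frozenset({'a'})))
plans = prune_dominated_plans(plans, Plan(12, 6, frozenset({'a'})))   # dominated
plans = prune_dominated_plans(plans, Plan(8, 4, frozenset({'a', 'b'})))
print([p.cost for p in plans])   # [8]: the last plan dominates the first
```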

BUILDPLANSPRUNE(S1, S2, ◦p)
1  S = S1 ∪ S2
2  for each T1 ∈ DPTable[S1]
3    for each T2 ∈ DPTable[S2]
4      for each T ∈ OPTREES(T1, T2, ◦p)
5        if S == R
6          INSERTTOPLEVELPLAN(S, T)
7        else
8          PRUNEDOMINATEDPLANS(S, T)

Fig. 14. BUILDPLANSPRUNE

and compares each of them with the new tree T. If there is an existing tree Told with costs and cardinality lower than or equal to those of T, and the functional dependencies holding for the new tree are a subset of those for Told, then T cannot lead to a better solution than Told. Therefore, the routine returns without adding T to the tree list for S.

If T dominates an existing tree Told, we can safely delete Told from the DP-table. In this case, we continue to loop through the existing trees because there may be more dominated trees to discard. Eventually, the loop ends and T is added to the list for S.

This pruning routine is called by BUILDPLANSALL for every new join tree found for a set S ≠ R. Note that there is no need to prune in case of S = R, because then the list contains only one tree anyway. Summarizing, this gives us the routine BUILDPLANSPRUNE depicted in Figure 14.

V. EVALUATION

We evaluate the algorithms experimentally with respect to runtime and plan optimality. For our experiments, we extended the DP-based plan generator DPhyp [16] and generated 10,000 operator trees for each number of relations from three to twenty. To this end, we first generated random binary trees using the unranking procedure proposed by Liebehenschel [17]. Next, we randomly attached join operators to the internal nodes and relations to the leaves. Then, we randomly selected the attributes for equality join predicates and grouping. Finally, we generated random cardinalities and selectivities.

A. The Gain

First of all, we demonstrate the potential benefit that arises from the application of eager aggregation in terms of plan quality. Fig. 15 shows the average total plan cost achieved without


Fig. 15. Plan Cost DPhyp and EA-Prune (relative plan cost over the number of relations, 2–14)

eager aggregation, in relation to the values produced by EA-Prune/EA-All, i.e., the complete plan generators with and without pruning. As we stated in the previous section, our pruning criterion does not affect plan optimality; the values of the two algorithms are therefore identical.

As can be seen in the figure, the plan quality for queries with three relations is nearly equal for the two plan generators, but EA-Prune is already slightly ahead. As the number of relations increases, so does the cost difference. The curves stop at 13 relations, where the plans produced by DPhyp are on average 18 times as expensive as those produced by EA-Prune. However, there are some extreme outliers. The biggest cost difference was observed for a query with 10 tables, where the plan produced by DPhyp was 17,500 times as expensive as the one achieved with eager aggregation.

B. The Price

These gains come at the price of increased runtime and memory usage. Fig. 16 shows the runtimes of DPhyp, the two complete enumeration algorithms EA-Prune and EA-All, and our first heuristic H1. Note that the y-axis is scaled logarithmically. The curves for EA-All and EA-Prune stop at 8 and 13 relations, respectively, since running them with 10,000 different input queries for up to 20 relations was not feasible because of their extremely long runtimes. As can be seen in the figure, EA-Prune takes more than one second for a query with 11 relations. Without pruning, this threshold is already reached at 7 relations. DPhyp, on the other hand, stays below one second even for 20 relations. H1 differs from DPhyp by an almost constant factor of 2.6 on average. This leads us to the conclusion that the complete enumeration of the search space including eager aggregation is only practical for small to medium queries, even if pruning is applied.

C. The Details

Now we take a closer look at the heuristic algorithms. First of all, we are interested in how close to the optimal solution they actually come. Fig. 17 compares the total costs achieved by H1, H2, and the complete algorithm with pruning. Again, all values are relative to those produced by EA-Prune. For the second (improved) heuristic H2, the figure contains four curves, each with a different value for the tolerance factor F. These values are drawn from a wide range of alternative

Fig. 16. Runtime EA-Prune, EA-All and H1 (runtime in seconds, logarithmic scale, over the number of relations, 2–20)

Fig. 17. Plan Cost Heuristics and EA-Prune (cost relative to EA-Prune over the number of relations, 2–14; curves: EA-All/EA-Prune, H1, and H2 with F = 1.01, 1.03, 1.05, 1.1)

tolerance factors that we used in the course of our experiments.

They serve to show the inﬂuence of different factors on the

resulting plan quality.

None of the heuristic plan generators produces optimal costs for every query, but all of them are significantly closer to optimality than DPhyp. Of the plan generators run for this experiment, H2 with a tolerance factor of 1.03 is the best, as its plan quality is closest to that produced by EA-Prune. For 13 relations, the plans produced by H2 are on average only 7 percent more expensive than the optimal solution. The largest factor we observed for H1 is 10.3, and for H2 it is 9.7 (F = 1.03), both resulting from queries with 13 relations.

The runtimes of H1 and H2 are given in Fig. 18. In many cases H2 is slightly faster. The reason for this is that H2 has a tendency to apply eager aggregation more often than H1, which influences the key constraints that hold in the produced subplans. While H2 has to do more work for every plan it considers, because it has to determine the eagerness of the plan and calculate the adjusted costs, it considers fewer plans, because pushing a grouping often makes a group-by further up in the operator tree obsolete. The latter is due to the fact that the grouping attributes become a key.


Fig. 18. Runtime H1 and H2 (relative runtime over the number of relations, 2–20; curves: H1 and H2 with F = 1.03)

D. TPC-H Queries

Table II shows a comparison of DPhyp and our new algorithms with respect to optimization times and optimized plan costs for the example query from Section I (Ex) and three selected TPC-H queries (Q3, Q5, Q10). Query statistics were taken from a scale factor 1 instance of TPC-H. Since only Ex contains an outer join, it is important to stress that the presence of outer joins does not increase the complexity of EA-Prune or any of the other algorithms presented in this paper.

Among the listed queries, Ex benefits most from eager aggregation, which is also reflected by the execution times we observed on different existing systems (see Section I). TPC-H Q5, on the other hand, provides the smallest gain.

                       Ex         Q3     Q5     Q10
Time EA [ms]           0.184      0.163  2.4    0.31
Time H1 [ms]           0.15       0.13   0.333  0.183
Time H2 [ms]           0.122      0.151  0.413  0.323
Time DPhyp [ms]        0.097      0.115  0.327  0.158
Rel. Time EA/DPhyp     1.9        1.42   7.34   1.96
Rel. Time H1/DPhyp     1.55       1.13   1.02   1.16
Rel. Time H2/DPhyp     1.26       1.31   1.26   2.04
Rel. Cost EA/DPhyp     6.1×10⁻⁴   0.65   0.9    0.58
Rel. Cost H1/DPhyp     6.1×10⁻⁴   0.92   0.9    0.58
Rel. Cost H2/DPhyp     6.1×10⁻⁴   0.65   0.9    0.58

TABLE II. OPTIMIZATION TIME AND PLAN COST FOR TPC-H QUERIES

VI. CONCLUSION

We presented a complete set of equivalences that allows us to push grouping into inner joins, left outer joins, full outer joins, semijoins, antijoins, and groupjoins. Further, we introduced four novel algorithms to integrate the exploitation of these equivalences into a state-of-the-art dynamic programming-based plan generator. Both a simple complexity analysis and the experiments indicate that the complete enumeration of the extended search space is possible only for up to 7 relations. A newly introduced optimality-preserving pruning technique extends this bound to 10. Beyond that, only heuristic approaches are feasible. One of them, H2, produces competitive plans that are on average only 7% worse than the optimal plan. However, some extreme outliers exist where the plan produced by H2 is a factor of 9.7 worse than the optimal plan. Thus, two directions for future research are to discover better heuristic algorithms and to develop even more effective optimality-preserving pruning techniques.

REFERENCES

[1] G. Moerkotte, P. Fender, and M. Eich, "On the correct and complete enumeration of the core search space," in ACM SIGMOD, 2013, pp. 493–504.
[2] A. Shanbhag and S. Sudarshan, "Optimizing join enumeration in transformation-based query optimizers," Proc. of the VLDB Endowment (PVLDB), vol. 7, no. 12, pp. 1243–1254, 2014.
[3] S. Chaudhuri and K. Shim, "Including group-by in query optimization," in VLDB, 1994, pp. 354–366.
[4] W. Yan, "Rewriting optimization of SQL queries containing group-by," Ph.D. dissertation, University of Waterloo, 1995.
[5] W. Yan and P.-A. Larson, "Performing group-by before join," Dept. of Computer Science, University of Waterloo, Canada, Technical Report CS 93-46, 1993.
[6] ——, "Performing group-by before join," in IEEE ICDE, 1994, pp. 89–100.
[7] ——, "Eager aggregation and lazy aggregation," in Proc. Int. Conf. on Very Large Data Bases (VLDB), 1995, pp. 345–357.
[8] ——, "Interchanging the order of grouping and join," Dept. of Computer Science, University of Waterloo, Canada, Technical Report CS 95-09, 1995.
[9] A. K. et al., "Processing in the hybrid OLTP & OLAP main-memory database system HyPer," IEEE Data Engineering Bulletin, vol. 36, no. 2, pp. 41–47, 2013.
[10] S. Cluet and G. Moerkotte, "Efficient evaluation of aggregates on bulk types," in Int. Workshop on Database Programming Languages, 1995.
[11] G. von Bultzingsloewen, "Optimizing SQL queries for parallel execution," SIGMOD Rec., vol. 18, December 1989.
[12] G. Paulley, "Exploiting functional dependence in query optimization," Ph.D. dissertation, University of Waterloo, 2000.
[13] C. Galindo-Legaria and M. Joshi, "Orthogonal optimization of subqueries and aggregation," in Proc. of the ACM SIGMOD Conf. on Management of Data, 2001, pp. 571–581.
[14] M. Eich and G. Moerkotte, "Dynamic programming: The next step," University of Mannheim, Tech. Rep., 2014.
[15] G. Moerkotte and T. Neumann, "Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products," in VLDB, 2006, pp. 930–941.
[16] ——, "Dynamic programming strikes back," in ACM SIGMOD, 2008, pp. 539–552.
[17] J. Liebehenschel, "Lexicographical generation of a generalized Dyck language," University of Frankfurt, Tech. Rep. 5/98, 1998.

ACKNOWLEDGMENT

We thank Simone Seeger for her help preparing the

manuscript and the anonymous referees for their helpful

comments.
