A Blackboard Architecture for Query Optimization in Object Bases

Alfons Kemper*        Guido Moerkotte+        Klaus Peithner*

* Fakultät für Mathematik und Informatik, Universität Passau, W-8390 Passau, F.R.G.
+ Fakultät für Informatik, Universität Karlsruhe, W-7500 Karlsruhe, F.R.G.

{kemper,peithner}@db.fmi.uni-passau.de        moer@ira.uka.de
Abstract

Adopting the blackboard architecture from the area of Artificial Intelligence, a novel kind of optimizer enabling two desirable ideas will be proposed. Firstly, such a well-structured approach allows backpropagation of the optimized queries and thus an evolutionary improvement of (crucial) parts of the optimizer. Secondly, the A* search strategy can be applied to harmonize two contrary properties: alternatives are generated whenever necessary, yet straightforward optimization is performed whenever possible.

The generic framework for realizing a blackboard optimizer is proposed first. Then, in order to demonstrate the viability of the new approach, a simple example optimizer is presented; it can be viewed as an incarnation of the generic framework.
1 Introduction
Query optimizers -- no matter whether relational or object-oriented -- are among the most complex software systems that have been built. Therefore, it is not surprising that the design of query optimizers is still a "hot" research issue -- especially in object-oriented database systems. The following is a list of desiderata that one may expect of a "good" query optimizer:

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 19th VLDB Conference, Dublin, Ireland, 1993.
1. extensibility and adaptability: As new, advanced query evaluation techniques and/or index structures become available, the optimizer architecture should facilitate an extension or adaptation -- without undue effort.

2. evolutionary improvability: It should be possible to tune the query optimizer after gathering experience over a longer sequence of optimized queries. Ultimately, a self-tuning optimizer could be envisioned.

3. predictability of quality: Especially when optimizing interactive queries, a tradeoff between the time used for optimization and the quality of the optimized result has to be taken into account. It is, therefore, most useful if the quality of the optimization outcome can be estimated relative to the allocated optimization time.

4. graceful degradation under time constraints: This desideratum is strongly correlated to the preceding one. Allocating less time for optimization should only gracefully degrade the quality of the optimized queries. This, of course, precludes any optimizer that first generates all possible alternatives -- without any qualitative ordering -- and then evaluates each alternative in turn.

5. early assessment of alternatives: The performance of an optimizer strongly depends on the number of alternatives generated. Typically, heuristics are used to restrict the search space. However, a better, because more flexible, approach is to abandon the less promising alternatives as soon as possible. This requires a cost model that enables an estimate of the potential quality of an alternative at an early stage of optimization.

6. specialization: As in areas of (human) expertise, the optimizer architecture should support the integration of highly specialized knowledge to deal with particular (restricted) parts of the optimization process and/or with particular subclasses of queries, e.g., conjunctive or non-recursive queries.
In order to achieve -- some of -- these desiderata, different query optimizer architectures have been proposed. Unfortunately, all of the proposals fall short of meeting all criteria. It even appears that in the attempt of fulfilling some of the desiderata, others had to be neglected; e.g., rule-based systems emphasize extensibility, but the predictability of the quality in relation to the allocated optimization time becomes extremely difficult.

To support extensibility, rule-based systems were proposed [5, 22, 13, 3]. Adaptability is the main concern of the EXODUS query optimizer generator [6], the VOLCANO optimizer generator [7], and the GENESIS toolbox system [2]. Structuring the query optimizer for maintenance and specialization is a major concern of proposal [19].
A well-structured architecture is gained if the optimization process is subdivided into single, small steps [24]. The "wholistic" approaches, e.g., [26, 4], consider an optimization graph -- logical or physical -- representing the entire query. That is, at each stage a complete query evaluation plan exists, and rules are applied to transform this representation. However, in our opinion it is better to segment the query into building blocks and operations, in order to compose a query evaluation plan step by step. This building block approach has already been proposed by Lohman [18].

The cost model is an essential part of a query optimizer in order to assure high-quality output. Since it is not generally obvious which transformation has to be applied for approaching the optimal plan, alternatives are generated [6, 22]. The alternatives are graded by a cost function which has to be continually improved [18]. In [6], an "expected-cost-factor", which is controlled by monitored results of the optimization, is added to each rule. We extend that idea by introducing a mechanism of backpropagation into our architecture.
The right choice of the search strategy is essential for the performance and the extensibility of an optimizer. Randomized optimization algorithms as proposed in, e.g., [10], are very effective if the shape of the cost function forms a well, as pointed out in [9]. Further, the search strategy should be independent of the search space [17]. The search strategy that will be applied in our sample optimizer -- also proposed for multi-query optimization [25] -- is a slight modification of A*, a search technique which, in its pure form, guarantees to find the optimal solution [20].
In this paper, we present a new architecture for query optimization based on a blackboard approach which makes it possible -- in combination with a building-block, bottom-up assembling approach and early assessment by utilizing future cost estimates -- to address all the desiderata. Our approach is a general one insofar as we first devise the generic blackboard-based architecture, which can be utilized for any kind of optimizer construction. The viability of the proposed generic optimizer architecture is demonstrated by an example query optimizer; that is, we describe one sample instantiation of the generic framework which, though still incomplete, adheres to the main principles of the blackboard architecture.

The rest of the paper is organized as follows. In Section 2, the basic framework of the optimizer blackboard is introduced. We conceptually show how the optimization process works and how evolutionary improvability is integrated into the blackboard architecture. In Section 3, the running example -- i.e., an object base and an associated query -- is given. In order to establish the general ideas in our specific GOM optimizer, the basics such as the algebra, the organization of our optimizer, and the search strategy are explained in Section 4. Since the cost model is essential for every optimizer generating alternatives, it is outlined in Section 5. Having sketched our Blackboard Optimizer, Section 6 demonstrates a sample optimization process. Section 7 concludes the paper.
2 Generic Framework
2.1 The Pure Blackboard
The optimizer blackboard is organized into r successive regions R_0, ..., R_{r-1}. Each region contains a set of items representing the advances of the optimizer in deriving an optimal evaluation plan for a given query. The original query is translated into some initial internal format which is identified by ε and placed into region R_0 -- as its only item.

A knowledge source KS_i is associated with each pair (R_i, R_{i+1}) of successive regions. Each knowledge source KS_i retrieves items to process from region R_i. For each such item, the knowledge source KS_i may generate several alternative items which are emitted -- in an order determined by KS_i -- into the region R_{i+1}.
Note that there is no restriction concerning the additional data read by a knowledge source. The knowledge sources are allowed to read any information at any region -- all statistical data, schema data, indexing information, and so forth.

[Figure 1 shows the blackboard architecture: the regions R_0, ..., R_6 are stacked bottom-up, starting with the initial item ε in R_0; the knowledge sources KS_0, ..., KS_5 connect successive regions, and each item in a region is labeled by the sequence numbers it received from the knowledge sources below, e.g., items in R_6 carry one sequence number per knowledge source KS_0, ..., KS_5.]

Figure 1: Blackboard Architecture
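The region/knowledge-source organization described above can be sketched as a minimal data structure. This is an illustrative sketch only; the names `Region`, `KnowledgeSource`, and `propagate` are assumptions, not part of the paper.

```python
# Minimal sketch of a blackboard with r regions and r-1 knowledge sources.
# All names (Region, KnowledgeSource, propagate) are illustrative assumptions.

class Region:
    def __init__(self, index):
        self.index = index
        self.items = []          # items derived so far at this region

class KnowledgeSource:
    """KS_i reads items from R_i and emits ordered alternatives into R_{i+1}."""
    def __init__(self, i, generate):
        self.i = i
        self.generate = generate  # item -> ordered list of alternative items

def propagate(regions, sources):
    """Push every item from R_i through KS_i into R_{i+1}, bottom-up."""
    for ks in sources:
        dst = regions[ks.i + 1]
        for item in regions[ks.i].items:
            dst.items.extend(ks.generate(item))

regions = [Region(i) for i in range(3)]
regions[0].items = ["eps"]       # the initial internal format, identified by eps
sources = [KnowledgeSource(i, lambda it: [it + "a", it + "b"]) for i in range(2)]
propagate(regions, sources)
print(len(regions[2].items))     # two alternatives per item over two stages -> 4
```

Note that the sketch generates alternatives exhaustively; the paper's point is precisely that a real optimizer must control this fan-out, which Section 2.2 addresses.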
The knowledge sources generate sequences of alternatives. Therefore, the order in which the alternative items are generated can be used for identification. For our abstract blackboard architecture shown in Figure 1, the items at region R_6 are identified by six pairs, each consisting of a knowledge source identifier -- i.e., KS_0, ..., KS_5 -- and the sequence number indicating the position at which the particular item was generated. For example, the identifier

    #I = [KS_5: 1, KS_4: 0, KS_3: 2, KS_2: 3, KS_1: 4, KS_0: 1]

of an item I in region R_6 indicates that this particular item I -- whose identifier is denoted #I -- is the fifth alternative generated by KS_1 from the second item generated by KS_0, etc.
In Section 2.3 we will see that this particular identification mechanism is essential for evaluating the quality and for adapting/calibrating the optimizer blackboard.
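The identification mechanism amounts to a vector of zero-based sequence numbers, one per knowledge source. The following small sketch (the dict layout is an assumption) decodes such an identifier:

```python
# Sketch: an item identifier is the vector of sequence numbers assigned by
# KS_0, ..., KS_5; positions are zero-based, so 4 means "fifth alternative".
identifier = {"KS5": 1, "KS4": 0, "KS3": 2, "KS2": 3, "KS1": 4, "KS0": 1}

def ordinal(n):
    return {0: "first", 1: "second", 2: "third", 3: "fourth", 4: "fifth"}[n]

print(ordinal(identifier["KS1"]))  # fifth  (alternative generated by KS_1)
print(ordinal(identifier["KS0"]))  # second (item of KS_0 it stems from)
```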
2.2 Search Strategy
The blackboard optimizer utilizes a building block approach for generating the (alternative) query evaluation plans (QEPs). Thus, for a given query Q the successive regions of the optimizer blackboard contain more and more complete query evaluation plans -- finally, the top-most region R_{r-1} contains complete (alternative) evaluation plans that are equivalent to the user query Q.
It is essential to control the search space of the optimizer in order to avoid an exhaustive search over all possible query evaluation plans. Therefore, items at all regions have associated costs. There exist two cost functions, cost_h and cost_f, which estimate the history and future costs for evaluating a certain item. With each item, two sets of operations are associated: the set of operations which are already integrated into the item (representing a still incomplete QEP) and the set of operations which still have to be integrated. The former set determines cost_h and the latter cost_f. Based on these cost functions, the optimizer blackboard is ideally controlled by A* search [20]. That is, at any given time the knowledge source applicable to the item with the lowest total cost (cost_h + cost_f) is allowed to emit further alternatives.
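The cost bookkeeping can be sketched as follows. The per-operation cost numbers below are invented for illustration; only the structure (actual costs for integrated operations, optimistic lower bounds for pending ones) follows the text:

```python
# Sketch: each item carries the operations already integrated (determining
# cost_h) and those still pending (determining cost_f). Cost tables invented.
actual_cost = {"scan": 10, "select": 2, "join": 50}
best_case   = {"scan": 10, "select": 1, "join": 20}     # lower bounds

def cost_h(item):
    return sum(actual_cost[op] for op in item["done"])

def cost_f(item):
    return sum(best_case[op] for op in item["todo"])    # optimistic estimate

items = [
    {"done": ["scan"], "todo": ["select", "join"]},     # f = 10 + 21 = 31
    {"done": ["scan", "join"], "todo": ["select"]},     # f = 60 + 1  = 61
]
# A*-style control: the item with the lowest cost_h + cost_f is expanded next.
best = min(items, key=lambda i: cost_h(i) + cost_f(i))
print(cost_h(best) + cost_f(best))  # 31
```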
If cost_h corresponds to the actual costs for evaluating the operations of the first set and cost_f is a close lower bound of the future costs, A* search is guaranteed to find an optimal QEP efficiently. However, for query optimization a lower-bound estimate of the future costs is always based on the best case for each operation, i.e., the least evaluation cost is assumed. Hence, the total estimate of the future costs can be (far) lower than the actual costs. Then, the A* search could possibly degenerate to an (almost) exhaustive search, which leads to unacceptable optimization times. In order to straighten the optimization, the proposed A* search strategy is enhanced by the subsequently described ballooning component.
As explained before, knowledge sources retrieve an item I from their associated region and generate an ordered sequence of items I_1, ..., I_j which are emitted into the successor region. It is one of the major objectives in the design and subsequent calibration -- cf. Section 2.3, below -- of a knowledge source to ensure that the most promising alternatives are generated first. Such sophisticated knowledge sources enable the incorporation of the ballooning control component to expedite the optimization process. The basic idea of the ballooning control is to periodically and temporarily "switch off" the A* control and to process the first few alternatives generated by the knowledge sources without any cost control. Thereby, some "balloons" will "rise" through successive regions -- possibly all the way up to the top-most region where items constitute complete QEPs.
When switching back to A* search, only the balloons at the top of the derivation chains are further considered; intermediate steps generated during ballooning are discarded -- thereby reducing the resulting search space and "straightening" the optimization. Since the blackboard approach allows assessing the sequence of items generated by a knowledge source with respect to its quality for the global optimization, it is expected that integrating the ballooning component into the A* search does not substantially degrade the quality of the optimization. Ballooning will only process highly promising items, and very efficiently -- without backtracking. Further, a reconciliation of the time allocated for optimization and the quality of the solution -- recall Desideratum 4 of the Introduction -- can be achieved by increasing or decreasing the share of ballooning.

[Figure 2 shows evaluation and calibration by backpropagation: the benchmark queries Q_1, Q_2, ... are optimized through the chain of knowledge sources KS_0, ..., KS_5; the cost-ordered QEP identifiers #I are propagated back and, via quantitative analysis, aggregated into "Top-Rank" profiles of knowledge source quality.]

Figure 2: Evaluation and Calibration by Backpropagation

A simplified version of the search algorithm used in the GOM Blackboard Optimizer is given in Section 4.4.
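The ballooning idea -- suspend cost control and follow only the first alternatives upward -- can be sketched as follows. The function `expand` is an illustrative stand-in for a knowledge source, and the parameter names are assumptions (the actual algorithm, with its `b_initial`, `b_iterations`, and `b_branch` parameters, appears in Section 4.4):

```python
# Sketch of ballooning: starting from one item, repeatedly follow only the
# FIRST alternative(s) of each knowledge source, without any cost control.
def expand(item):
    # ordered alternatives; the most promising one is emitted first
    return [item + [0], item + [1], item + [2]]

def balloon(item, levels, keep=1):
    """Rise `levels` regions, keeping only the first `keep` alternatives."""
    frontier = [item]
    for _ in range(levels):
        frontier = [alt for it in frontier for alt in expand(it)[:keep]]
    return frontier   # only the topmost balloons survive; intermediate
                      # steps are discarded when A* control resumes

tops = balloon([], levels=4)
print(tops)   # [[0, 0, 0, 0]]
```

With `keep=1`, a balloon reaches the top in `levels` steps instead of the fan-out an uncontrolled expansion would create, which is exactly the "straightening" effect described above.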
2.3 Backpropagation
The structuring of our optimizer blackboard imposed by the knowledge sources operating on successive regions enables a thorough quantitative evaluation and subsequent calibration of the quality of the knowledge sources. This is achieved by backpropagating the outcome of an extensive set of benchmark queries. The principle of backpropagation is depicted in Figure 2.
Let Q = {Q_1, Q_2, ...} be a large set of representative queries, which are either extracted from user-supplied queries or generated by a query generator. For these queries, let the optimizer generate all possible alternative query evaluation plans, i.e., for this purpose all items are expanded at regions R_0, ..., R_{r-2}. It is, however, essential that the optimizer obeys the control imposed by the pure A* search -- except that the search continues even after the optimum has been generated. For a query Q_j, a sequence I^j_n, ..., I^j_2, I^j_1 of alternative items specifying a complete QEP at region R_{r-1} is obtained -- the right-most item being generated first and the left-most last. Note that the alternatives are already sorted by their cost. More specifically, #I^j_{i_1} is the identifier of the cheapest QEP and I^j_{i_n} is the most expensive one for a query Q_j.
This ordered sequence of plan identifiers is propagated back to the blackboard optimizer in order to evaluate the individual knowledge sources' quality. The quality of a knowledge source is measured in terms of the relative position at which an alternative was generated, in comparison to the position of this alternative in the QEP sequence ordered by running times. By evaluating a representative number of queries, a so-called "Top-Rank" profile can be derived. In Figure 2, e.g., the backpropagation of Q_1 increases the third column of the Top-Rank profile of KS_3, since the identifier #I^1_{i_1} of the top-rank QEP states that the appropriate QEP was generated as the third alternative by KS_3.
In Figure 2, the Top-Rank profile of knowledge source KS_3 indicates that almost all top-rank QEPs emerge from the first three alternatives of this knowledge source. Actually, in practice we are usually more interested in the so-called "Top-ε" profiles, in which all those query evaluation plans whose running time lies within ε% of the actual optimum are considered semi-optimal -- where ε may be some application-domain-specific threshold value.
Quantitative analysis of the profiles facilitates predicting the average quality of the optimization -- as envisioned in Desideratum 3 stated in the Introduction. Let BAP(KS_i, n_i) denote the probability that the first n_i alternatives emitted by knowledge source KS_i include the optimal one -- under the condition that KS_i starts with the alternative from knowledge source KS_{i-1} which ultimately leads to the optimum. This function can easily be computed from the "Top-Rank" profile. Furthermore, let b_{KS_i} denote a (limiting) branching factor of knowledge source KS_i, i.e., the maximal number of alternatives that knowledge source KS_i is allowed to generate. Then, the product

    prod_{i in {0,...,r-2}} BAP(KS_i, b_{KS_i})

derives the probability that the optimal QEP is among the prod_{i in {0,...,r-2}} b_{KS_i} alternatives that emerge at the top-most region R_{r-1}.
Further, a more qualitative analysis of the profiles facilitates tuning the individual knowledge sources -- as demanded in Desideratum 2. To give an idea of how the optimizer can be improved, consider three "hypothetical" profiles (a), (b), and (c). Profile (a) is ideal -- no improvement can be made. The worst one can think of is Profile (b); it looks like the profile of a "no-knowledge knowledge source". Usually, a profile like (c) is worth striving for: it displays that the knowledge source only has to generate few alternatives in order to carry the creation of the optimal (Top-Rank) or a semi-optimal QEP (Top-ε).

Ultimately, we envision that the profiles can be used by the optimizer for self-tuning -- Desideratum 2 -- since the analysis of the profiles as well as the generation of the hints may be carried out automatically.
2.4 Generalized Optimizer Blackboard
In the discussion of the hypothetical knowledge source profiles we already observed that it might be useful to classify queries within the regions. This allows processing them more specifically by particular, highly customized knowledge sources. The classification of queries depends on the region. As an example, consider the classification into recursive vs. non-recursive queries, which is important to know for applying the right algorithm to compute join orderings.

In the pure architecture, a knowledge source reads items from region R_i and emits the outcome into the next higher region R_{i+1}. We extend this concept such that an item leaving a special region R_{i_o} is allowed to re-enter the blackboard at a lower level R_{i_e} (i_e <= i_o). Thus, items can iterate over the regions R_{i_e} to R_{i_o}. An item will leave that iteration if it comes back to R_{i_o} without being modified.
3 Running Example
In this section, an example object base -- called Company -- is presented. Figure 3 shows ten objects belonging to the types Emp, Dept, and Manager:

    Emp:
      id_1: [name: "Sander",  worksIn: id_5, salary:  90000, sex: 'F']
      id_2: [name: "Versace", worksIn: id_5, salary: 100000, sex: 'M']
      id_3: [name: "Hinault", worksIn: id_6, salary: 260000, sex: 'M']
      id_4: [name: "LeMond",  worksIn: id_6, salary: 100000, sex: 'M']
    Dept:
      id_5: [name: "Clothes",  mgr: id_8]
      id_6: [name: "Bicycles", mgr: id_9]
      id_7: [name: "Shoes",    mgr: id_10]
    Manager:
      id_8:  [name: "Boss",   worksIn: id_5, salary: 150000, sex: 'M', backUp: id_2]
      id_9:  [name: "Chief",  worksIn: id_6, salary: 280000, sex: 'M', backUp: id_3]
      id_10: [name: "Master", worksIn: id_7, salary: 900000, sex: 'M', backUp: NULL]

Figure 3: Example Extension of Company

The type definitions are omitted -- for the further discussion it is only of importance that each object of type Emp has the attributes name: String, worksIn: Dept, salary: Float, and sex: Char, and each object of type Dept the attributes name: String and mgr: Manager. Since Manager is a subtype of Emp, it contains all the attributes of Emp and, additionally, one attribute backUp: Emp. Further, a type-associated function skill, computing a ranking number for individual Employees, is assumed.
The labels id_i for i in {1, 2, 3, ...} denote the system-wide unique object identifiers (OIDs). References via attributes are maintained uni-directionally in GOM -- as in almost all other object models. For example, in the extension of Company there is a reference from Employee id_1 to Dept id_5 via the worksIn attribute.
The Example Query   For the object model GOM, a QUEL-like query language called GOMql [13] was developed. As an example query, we want to know whenever there is a Manager -- usually called "MCP" -- who pays a female less than a male Employee (in one of his Depts) even though the female is better qualified. We want to retrieve the manager and, as evidence, the female, the male, and the difference of their salaries. In GOMql the query can be formulated as follows:

    range    u : Emp, o : Emp
    retrieve [mcp : u.worksIn.mgr, underPaid : u,
              overPaid : o, difference : o.salary - u.salary]
    where    u.worksIn.mgr = o.worksIn.mgr and u.skill > o.skill and
             u.salary < o.salary and u.sex = 'F' and o.sex = 'M'

There are three clauses. The range-clause introduces the needed variables and binds them to finite ranges -- here, the extensions of the types. The retrieve-clause specifies the final projection of the query, and the where-clause contains the selection predicate. Under the assumption that "Sander" has higher skill than "Versace", the relation {[mcp: id_8, underPaid: id_1, overPaid: id_2, difference: 10000]} is the outcome of the query with respect to the object base Company.
At this point, we would like to stress that even though we have chosen GOM and GOMql as the example data model and query language, respectively, the results obviously apply to other object-oriented data models and query languages as well.
The Index Structures   The GOM query evaluation is supported by two very general index structures tailored for object-oriented data models:

Access Support Relations (ASRs) [12] are used to materialize (frequently) traversed reference chains, and

Generalized Materialization Relations (GMRs) [11] maintain pre-computed function results.

Since these two index structures have to be taken into account in the optimization process, two index relations based on the schema Company are exemplified:

    [[Emp.worksIn.mgr]]
    #0 : OID_Emp | #1 : OID_Dept | #2 : OID_Manager
    id_1         | id_5          | id_8
    id_2         | id_5          | id_8
    ...          | ...           | ...
    id_10        | id_7          | id_10

    <<Emp.skill>>
    #0 : OID_Emp | #1 : int
    id_1         | 10
    id_2         | 4
    ...          | ...
    id_10        | 10

The extension of the ASR [[Emp.worksIn.mgr]], which contains all paths corresponding to the indicated path expression, and of the GMR <<Emp.skill>>, which maintains the pre-computed skill function for each Employee, are depicted. Note that the columns of these index relations are sequentially numbered, i.e., #0, #1, ...
4 GOM Blackboard Optimizer
4.1 The Algebra
The query evaluation plans (QEPs) are directed acyclic graphs (DAGs) consisting of algebraic operator applications. Building blocks standing for sets of OIDs of a type T (denoted by oid(T)), ASRs (denoted by [[...]]), and GMRs (denoted by <<...>>) are the leaves of the DAGs. The treatment of indexes -- like ASRs and GMRs -- as additional sources of information is already present in the notion of shadow tables as introduced in [23]. In accordance with the building block approach [18], the DAGs are successively composed bottom-up: operations are added to the DAG and common subexpressions are factorized. In order to compute a (near-)optimal DAG, the optimizer has to determine an optimal set of building blocks and an optimal order of the algebraic operations.

Our algebra mainly copes with relations. In order to refer to single columns of relations, we use so-called information units (IUs). We do not call them attributes, since we want to avoid any conflict with the attributes at the GOM object type level. Each IU is unique throughout the entire optimization process, i.e., over all alternatives that are generated, and so an unambiguous dereferencing mechanism is obtained for the algebraic operations and the cost functions.
Besides the usual set operations (∪, \), the algebra consists of the common relational selection σ, projection π, join ⋈, and renaming ρ. Further, a mapping operator χ -- called expansion -- belongs to the algebra. Let T be a type, v, v_1, v'_1, ..., v_n, v'_n be IUs, a_1, ..., a_n be attributes, θ ∈ {=, <, >, ...} be a comparison operator, and c be a constant. Then, the building blocks and the algebraic operators are informally defined as follows:

building blocks: The extension of T, oid(T), an ASR [[...]], and a GMR <<...>> are building blocks. The columns of the relations retrieved by them are denoted by self and #0, ..., #n, respectively. We assume indices on the first and last column of an ASR and on each column of a GMR.

expansions: An expansion χ_{v_1: v.a_1, ..., v_n: v.a_n} dereferences the set of OIDs denoted by IU v such that the attribute values can be obtained and assigned to the new IUs v_1, ..., v_n, respectively. The input relation is expanded by new columns denoted v_1, ..., v_n. Further, the χ operator may also expand the tuples by function invocations -- instead of attribute accesses. The parameters of a function are enclosed in parentheses following its name.

usual relational operations: ⋈_{v_1 θ v_2} denotes a join, σ_{v_1 θ c} and σ_{v_1 θ v_2} selections, π_{v_1,...,v_n} a projection onto the IUs in the subscript, and ρ_{v'_1=v_1,...,v'_n=v_n} a renaming operation where the column named v_i is renamed to v'_i (i = 1, ..., n).

Relying heavily on ordinary relational operators allows us to exploit relational optimization techniques [16, 14].
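The informal definitions above can be sketched over relations represented as lists of dicts keyed by IU names. This is a toy model, not the paper's implementation; in particular, `chi` dereferencing OIDs against a global `store` is an illustrative assumption:

```python
# Sketch of the algebra over relations as lists of dicts keyed by IU names.
store = {"id1": {"salary": 90000}, "id2": {"salary": 100000}}

def chi(rel, v, bindings):            # expansion: add columns v_i : v.a_i
    return [{**t, **{vi: store[t[v]][ai] for vi, ai in bindings.items()}}
            for t in rel]

def sigma(rel, pred):                 # selection
    return [t for t in rel if pred(t)]

def join(r, s, pred):                 # join (nested loops, theta predicate)
    return [{**t, **u} for t in r for u in s if pred(t, u)]

def rho(rel, renaming):               # renaming of IUs
    return [{renaming.get(k, k): val for k, val in t.items()} for t in rel]

oid_emp = [{"self": "id1"}, {"self": "id2"}]
r = rho(oid_emp, {"self": "u"})                   # rho_{u=self}(oid(Emp))
r = chi(r, "u", {"usa": "salary"})                # chi_{usa: u.salary}
r = sigma(r, lambda t: t["usa"] < 100000)         # sigma_{usa < 100000}
print(r)   # [{'u': 'id1', 'usa': 90000}]
```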
4.2 The Normal Forms
In object-oriented query processing it is common to translate the query into an internal representation as close to the original query as possible -- witness, e.g., [1, 4, 13, 14]. This is also valid for relational query processing where, e.g., an SQL query is translated into a σ-π-⋈-expression. However, this representation exhibits another property which the initial internal representation of object-oriented queries very often lacks: it is an (expensive) well-structured term facilitating a straightforward splitting into building blocks and operations.

Our proposed starting point -- called Most Costly Normal Form (MCNF) [14] -- has one additional χ-expansion directly following the ⋈, resulting in a ⋈-χ sequence. All the extensions whose instances are needed for the query evaluation are joined with true as join predicate. χ-expansions follow, enhancing each tuple of the resulting relation by further information needed to evaluate the selection predicate solely on the basis of this result. Thus, two vital concepts of object-orientation -- access via OIDs (implicit dereferenciation) and function invocation -- are integrated into the MCNF, and are prepared for their optimization. Then, the selections, accompanied by the final projection onto the required IUs, are appended.
The MCNF representation of the example query "MCP" is shown below:

    π_{mcp: um, underPaid: u, overPaid: o, difference: osa - usa}(
      σ_{osx = 'M'}(σ_{usx = 'F'}(σ_{usa < osa}(σ_{usk > osk}(σ_{um = om}(
        χ_{um: ud.mgr}(χ_{om: od.mgr}(
          χ_{ud: u.worksIn, usa: u.salary, usx: u.sex}(
            χ_{od: o.worksIn, osa: o.salary, osx: o.sex}(
              χ_{usk: u.skill}(χ_{osk: o.skill}(
                ρ_{u = self}(oid(Emp)) ⋈_{true} ρ_{o = self}(oid(Emp)) ...)
The MCNF is further enhanced [15] in order to obtain a convenient basis for composing the query evaluation plans. A table combining the building blocks and the operations with catalog information is derived such that it contains all information relevant for optimizing the query. Thus, we can, e.g., efficiently retrieve the building blocks and the operations in which a given IU is involved. This elaborated normal form is obtained by decomposing the MCNF term into its building blocks and operations. Each piece is then enriched by statistical data relevant to the query. For example, the cardinalities of the building blocks and the selectivities of the operations are attached. The fact which columns of a building block are supported by an index is important for an exact cost estimate; hence, this information is also maintained.
4.3 Regions and Knowledge Sources
The blackboard of our GOM Blackboard Optimizer is subdivided into seven regions -- each one completing the QEP in a particular way: R_0 (MCNF), R_1 (Decomposition), R_2 (Anchor Sets), R_3 (Introduce χ), R_4 (Introduce σ), R_5 (Introduce ⋈), and R_6 (Introduce π). Each region supplies items, each of which possesses an entry currentDAGs and an entry futureWork, where the DAGs composed so far and the remaining operations, respectively, are stored.
The knowledge sources of type KS_i read items at region R_i and write items at region R_{i+1}. What follows is an informal description of the knowledge sources at each region. We assume that the query is represented in MCNF format at region R_0.
KS_0 (to "Decomposition"): The MCNF term is decomposed into building blocks and operations. The additional information is obtained from the schema manager, which also manages the statistical data. Additionally, the ASRs and GMRs which can be integrated into the query are determined. There exists only one knowledge source of this type, and it does not produce any alternatives.

KS_1 (to "Anchor Sets"): A knowledge source of this type determines which building blocks are chosen for evaluating the query. We call such a minimal (i.e., non-redundant) set of building blocks containing enough information for answering the query an anchor set. KS_1 generates several anchor sets and sorts them according to special heuristics, e.g., considering the number of joins or the number of operations left in the futureWork entry.

KS_2 (to "Introduce χ"): Expansions are added to the currentDAGs entry. In the current implementation, the following heuristics is applied: an expansion -- or a pair of expansions -- is integrated into the DAGs if (and only if) a selection or a join directly depends on it, or the futureWork entry of the item only contains expansions and projections.

KS_3 (to "Introduce σ"): According to the heuristics "introduce selections as early as possible", selections are integrated into the query whenever possible.

KS_4 (to "Introduce ⋈"): At each iteration, the knowledge source of type KS_4 introduces at most one join. As a consequence, for each item a join order is obtained by repeated iterations. Alternatives might have different join orderings.

KS_5 (to "Introduce π"): Finally, projections are added to the DAG. We rule out the two sequences ⋈-π and χ-π, since a ⋈ followed by a π and a χ followed by a π can each be replaced by one single physical operation.
The blackboard is re-entered from region R_5 to R_2 until all expansions, selections, and joins are processed, that is, until the futureWork entry is empty except for a single projection.
In order to avoid evaluating equal expressions twice, items leaving regions R_1, R_2, R_3, R_4, and R_5 are factorized. For example, if KS_1 selects ρ_{u=self}(oid(Emp)) and ρ_{o=self}(oid(Emp)) as elements of an anchor set, they will be factorized such that the two renames ρ_{u=self} and ρ_{o=self} share one single oid(Emp) node. The full set of factorization rules applied can be found in [15]. As a result, the optimizer generates a DAG which is a "logical" query evaluation plan.
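Factorization of this kind can be pictured as hash-consing of DAG nodes. The following is a minimal sketch of that idea (class and names are ours, not taken from the GOM implementation): structurally identical subexpressions, such as the two renames over the same oid(Emp) scan, end up sharing one node.

```python
# Minimal hash-consing sketch (names are ours): structurally identical
# subexpressions map to one shared DAG node, so equal expressions are
# never evaluated twice.
class DAG:
    def __init__(self):
        self._nodes = {}  # (op, args) -> node id

    def node(self, op, *args):
        key = (op, args)
        if key not in self._nodes:           # create each node only once
            self._nodes[key] = len(self._nodes)
        return self._nodes[key]

dag = DAG()
scan = dag.node("oid", "Emp")                # shared building block
r_u = dag.node("rename", "u=self", scan)     # first rename over the scan
r_o = dag.node("rename", "o=self", scan)     # second rename, same input node
assert dag.node("oid", "Emp") == scan        # the scan node is shared
assert r_u != r_o                            # the renames stay distinct
```

Requesting the same (operator, arguments) pair twice yields the same node id, which is exactly the sharing visible in the factorized DAGs.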
4.4 Search Algorithm
The search strategy in the GOM Blackboard Optimizer consists of two parts. On the one hand, A* search advances the alternative with the minimal sum of history costs (cost_h) and future costs (cost_f); on the other hand, ballooning proceeds with the alternative(s) emitted first by a knowledge source. The actual search strategy combines these two techniques by allowing a certain ratio of optimization steps to be done under A* search and under ballooning control, respectively. The search strategy is outlined as follows:
1. Insert the starting state (item) ε into the list OPEN of unexpanded states.
2. Sort the elements I of OPEN by increasing f(I) := cost_h(I) + cost_f(I) values.
3. If the ballooning flag is raised, do
   (a) remove the first b_initial elements from OPEN and insert them into the set B,
   (b) perform the following steps b_iterations times:
      i. expand each I ∈ B by its appropriate knowledge source to I_1, ..., I_j for j ≤ b_branch,
      ii. remove I from B and insert the item into CLOSED,
      iii. insert I_1, ..., I_j into B,
   (c) transfer the items in B to OPEN, and go to Step 2.
4. Remove the left-most item I from OPEN, i.e., the item for which f(I) := cost_h(I) + cost_f(I) is minimal (ties broken arbitrarily), and place it on CLOSED.
5. If I is a goal state, i.e., I.FW = ∅, exit successfully with the solution I.
6. Let the appropriate knowledge source expand state I, generating all its successors.
7. For every successor I′ of I:
   (a) insert I′ into OPEN unless
   (b) there exists I″ ∈ OPEN ∪ CLOSED with I′.FW = I″.FW; then
      i. if cost_h(I′) < cost_h(I″), then insert I′ into OPEN and transfer I″ to PRUNED,
      ii. else, if cost_h(I′) ≥ cost_h(I″), then insert I′ into PRUNED.
8. Go to Step 2.
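The control loop above can be rendered compactly in code. The following is our own simplified sketch (the function and callback names are ours): the pruning bookkeeping of Step 7(b) and the CLOSED set are omitted, and expand, cost_h, cost_f, and is_goal stand in for the knowledge sources, the cost model, and the goal test.

```python
def blackboard_search(start, expand, cost_h, cost_f, is_goal,
                      balloon=False, b_initial=1, b_iterations=2, b_branch=1):
    """Simplified sketch of the combined A*/ballooning control.
    expand(item) returns successor items; is_goal tests for an empty
    futureWork entry.  Pruning (Step 7b) and CLOSED are omitted."""
    f = lambda item: cost_h(item) + cost_f(item)
    open_ = [start]
    while open_:
        open_.sort(key=f)                        # Step 2
        if balloon:                              # Step 3: depth-first burst
            batch, open_ = open_[:b_initial], open_[b_initial:]
            for _ in range(b_iterations):
                # expand each batch item, keep at most b_branch successors;
                # keep the old batch if everything was already fully expanded
                batch = [s for item in batch
                           for s in expand(item)[:b_branch]] or batch
            open_.extend(batch)                  # Step 3(c)
            balloon = False                      # hand control back to A*
            continue
        item = open_.pop(0)                      # Step 4: least f-value
        if is_goal(item):                        # Step 5
            return item
        open_.extend(expand(item))               # Steps 6/7 (no pruning here)
    return None

# toy usage: items are (history cost, remaining operations); any complete
# ordering of the four operations costs 2 + 2 + 1 + 1 = 6
ops = {"chi_osx": 2, "chi_usx": 2, "sel_osx": 1, "sel_usx": 1}
start = (0, frozenset(ops))
expand = lambda i: [(i[0] + ops[t], i[1] - {t}) for t in sorted(i[1])]
cost_h = lambda i: i[0]
cost_f = lambda i: sum(ops[t] for t in i[1])
is_goal = lambda i: not i[1]
goal = blackboard_search(start, expand, cost_h, cost_f, is_goal)
assert is_goal(goal) and goal[0] == 6
```

With balloon=True and b_branch set to one, the loop drives a single alternative straight to a goal state, which mirrors the fast-path behavior described below.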
The A* search algorithm is a best-first algorithm [20]. It starts by inserting ε, the initial state, into OPEN. OPEN contains all states which have been reached but have not been fully expanded, i.e., it contains all items waiting for further processing. In each iteration, A* search continues with the item of OPEN which has the least f-value, i.e., the minimal sum of cost_h and cost_f. That item is expanded, i.e., its successors are put into OPEN, and then it is promoted to CLOSED, the set of all fully expanded states. The algorithm terminates successfully as soon as an item is generated whose future work, denoted by FW, is empty and whose costs are minimal.
In Step 3, control is temporarily switched from A* search to ballooning. Ballooning might, for example, be triggered after a certain number of iterations of A* search have been performed. Then, the first b_initial items of OPEN are expanded b_iterations times, i.e., the items are expanded to lists, the first at most b_branch elements of each list (b_branch should be one in most cases) are then expanded, and so on. The numbers b_initial, b_iterations, and b_branch can be set depending on an analysis of the entire query and the current state of the optimization process. For example, the optimization of a query containing many χ-expansions and selections may be expedited by a low b_initial, a high b_iterations, and a low b_branch, since generating many alternatives is unnecessary for integrating these operations. Thus, by ballooning, fast optimization can be switched on whenever it seems acceptable.
For the pruning conditions in Step (7b), a special case of the optimality criterion [20] is presupposed: if there are two items I_1 and I_2 with equal future work entries both containing an operation op and, further, cost_h(I_1) < cost_h(I_2) holds, then integrating op into the history work entries of I_1 and I_2 will keep the cost order between the two items invariant. Therefore, all items (states) which produce higher costs than an item with the same future work are pruned by condition (7b) and transferred to a set PRUNED since, due to the optimality criterion, they cannot possibly yield a better item. Thus, the successor item I′ will cause the pruning of some items I″ ∈ OPEN ∪ CLOSED if it is less "expensive," and it will be pruned itself by an item I″ ∈ OPEN ∪ CLOSED if it is more "expensive."
The pruning conditions can be strengthened if some further properties are ensured by the cost functions [15].
5 Cost Model
From specific data extracted from the object base, the costs for scanning the building blocks and for evaluating the operations are estimated.
For the calculation of the history costs as well as the future costs, two parameters are assigned to each DAG node: the cardinality #o of the output relation, and the numbers #e = (e_{v_1}, ..., e_{v_n}) of distinct values belonging to the IUs v_1, ..., v_n of the output relation, called e-values. Their calculation from so-called basic numbers is explained below. The number of page faults #p and the CPU costs #c, which are assigned to each DAG node in addition to #o and #e, are derived from #o, #e, and the basic numbers. For estimating #p, the well-known formula of Yao [27] is used.
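Yao's formula admits a direct implementation. The following sketch (the function name is ours) computes the expected number of page accesses when k out of n records, spread uniformly over m pages, are retrieved:

```python
def yao_page_accesses(n, m, k):
    """Yao's formula [27]: expected number of page accesses when k of
    n records, uniformly distributed over m pages, are retrieved."""
    records_per_page = n / m
    untouched = 1.0  # probability that one fixed page is never touched
    for i in range(1, k + 1):
        untouched *= (n - records_per_page - i + 1) / (n - i + 1)
    return m * (1 - untouched)

# retrieving a single record touches exactly one page on average:
assert abs(yao_page_accesses(100, 10, 1) - 1.0) < 1e-9
# retrieving all records touches all pages:
assert abs(yao_page_accesses(100, 10, 100) - 10.0) < 1e-9
```

The two assertions check the boundary cases; between them the estimate grows monotonically with k.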
The estimate for #c is based on system-dependent functions which estimate the CPU costs for the building blocks and the appropriate operations, taking #o and #e as input.
Thus, the calculation of the history costs is fairly straightforward. The future cost estimate of an operation is required to be a lower bound on the actual costs. For that, we derive a lower bound on the size and the e-values of the input relations (see below). Then, we can calculate the future costs in basically the same way as the history costs.
Assigning a quadruple (#p, #c, #o, #e) to each DAG node, the costs of a DAG are computed by summing up the costs of its nodes. Then, we compute the history costs of an item by adding up the costs of the DAGs in the currentDAGs entry of the item, and the future costs by adding up the costs of the operations in the futureWork entry.
The data used for the cost calculations is stored as basic numbers on three levels: "Values from the Object Base," "Single Selectivities," and "Combined Selectivities."
For every object type T, the cardinality c_T of its extension and the values p^oid_T and p^object_T, which denote the number of pages occupied by the extension, i.e., the set of OIDs, and by the objects, respectively, are available as values from the object base. Let a be an attribute of an object type T. If a refers to an object type, def_{T,a} denotes the probability that the attribute is defined (≠ NULL). For each attribute a of type T, the parameter c_{T,a} denotes the size of its range. For each method m, the size of its range c_{T,m} and its average execution time exec_{T,m}(n), for executing the method m n times on OIDs of type T, are maintained.¹ The cardinality of an ASR [[...]] and a GMR ⟨⟨...⟩⟩, denoted c_{[[...]]} and c_{⟨⟨...⟩⟩}, respectively, and the number of pages they occupy, denoted p_{[[...]]} and p_{⟨⟨...⟩⟩}, are also available as values from the object base.
The selectivity s of a unary operation op_1(R) is defined as s(op_1(R)) = |op_1(R)| / |R|, and of a binary operation op_2 as s(op_2(R_1, R_2)) = |op_2(R_1, R_2)| / (|R_1| · |R_2|). These single selectivities can be estimated in three different ways with increasing accuracy:
1. As in [24], the selectivities might be derived from simple estimates. Thus, if the basic numbers c_{Emp,skill} = 10, c_{Emp,salary} = 10,000, and c_{Manager} = 150 are given, the selectivities for usk > osk, usa < osa, and um = om will be (1 − 1/c_{Emp,skill})/2 = 0.45, (1 − 1/c_{Emp,salary})/2 ≈ 0.5, and 1/c_{Manager} ≈ 0.007, respectively.
2. The selectivities can also be determined by histograms [21]. For that, histograms are generated by sampling the object base. The selectivities for usx = 'F' and osx = 'M' can be determined in this way.
3. During the evaluation of a query, one can gain more accurate selectivity estimates for use in future query optimization by monitoring.

¹We know that this is only a rough estimate. Future versions of the cost model will refine this.
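The simple estimates of the first method are easy to reproduce. The following sketch plugs in the basic numbers quoted above (the variable names are ours):

```python
# basic numbers from the running example
c_emp_skill  = 10
c_emp_salary = 10_000
c_manager    = 150

# Selinger-style single-selectivity estimates [24] for the join predicates
s_usk_gt_osk = (1 - 1 / c_emp_skill) / 2    # usk > osk
s_usa_lt_osa = (1 - 1 / c_emp_salary) / 2   # usa < osa
s_um_eq_om   = 1 / c_manager                # um = om

assert abs(s_usk_gt_osk - 0.45) < 1e-12
assert round(s_usa_lt_osa, 2) == 0.5
assert round(s_um_eq_om, 3) == 0.007
```

The equality predicate on um is two orders of magnitude more selective than the two range predicates, which is exactly what drives the join-order decision in Section 6.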
Since, in the current implementation, the independence of attribute values is presupposed, combined selectivities are the product of their single selectivities. In the future, this will be refined.
Knowing the selectivity s of an operation, we are able to derive the output size #o of that operation by multiplying s with the cardinality of the input relation(s). The output size of a building block, i.e., of type extensions, ASRs, and GMRs, is given by the basic numbers. Thus, the cardinalities of the (intermediate) relations of a DAG are calculated bottom-up.
Since not the total number but the number of distinct OIDs is essential for cost estimates concerning χ-expansions and the retrieval of building blocks with an index, an e-value e_v defined by |π_v(R)| is assigned to each IU v in a relation R. The bottom-up calculation of the e-values is performed as follows: the initialization is done by the basic numbers of the building blocks. The further calculation is mainly based on a formula also used for generating join orderings [8]. For example, let an expansion χ_{v′:v.a} be applied to a relation R whose e-values are known. Let c_{T_v,a} be the cardinality of the range of the attribute/type-associated function a, and let e_v be equal to |π_v(R)|. Then, the following formula determines the number e_{v′} of values being referenced:

    e_{v′} = c_{T_v,a} · (1 − (1 − 1/c_{T_v,a})^{e_v})
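The formula translates directly into code. A sketch (the function name is ours):

```python
def e_value_after_expansion(c, e_v):
    """Number of distinct values referenced after an expansion, where c
    is the cardinality of the attribute's range and e_v = |pi_v(R)| is
    the e-value of the input IU (formula from [8], as used above)."""
    return c * (1 - (1 - 1 / c) ** e_v)

# a single input value references exactly one range value:
assert abs(e_value_after_expansion(150, 1) - 1.0) < 1e-9
# the result never exceeds the range cardinality c:
assert e_value_after_expansion(150, 10**6) <= 150
```

The two assertions reflect the boundary behavior: the estimate starts at e_v for small inputs and saturates at the range cardinality c.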
Since the e-values decrease with each operation application, we can determine a non-trivial lower bound on all e-values. Let R be the relation obtained by evaluating the DAG of the MCNF where the last projection is cut off. Then, |π_v(R)| gives a lower bound on all e-values of the IU v in all (possibly unfinished) DAGs representing the query. Using the formulas for the history costs and applying them to the operations in the futureWork entry of an item, we arrive at a lower bound on the future costs.
6 Sample Optimization
In performing the optimization process for the running example, we demonstrate some of the decisions made individually at each region, as well as factorization and pruning. The normal forms were already explained in Section 4.2. Thus, the sample optimization starts with generating anchor sets. Each non-redundant set which binds the IUs u and o is a potential anchor set for our example. The values for the other IUs can be retrieved by χ-expansions. Because of the symmetry of u and o, we only give the sets resulting in bindings for u:
A_1 = { ρ_{u=self}(oid(Emp)) }
A_2 = { ρ_{u=#0, usk=#1}(⟨⟨Emp.skill⟩⟩) }
A_3 = { ρ_{u=#0, ud=#1, um=#2}([[Emp.worksIn.mgr]]) }
A_4 = { ρ_{u=#0, usk=#1}(⟨⟨Emp.skill⟩⟩), ρ_{u′=#0, ud=#1, um=#2}([[Emp.worksIn.mgr]]) }
A_5 = { ρ_{u′=#0, usk=#1}(⟨⟨Emp.skill⟩⟩), ρ_{u=#0, ud=#1, um=#2}([[Emp.worksIn.mgr]]) }
Due to the corresponding sets for o, the appropriate knowledge source generates at most 5 · 5 = 25 alternative anchor sets. Because of the cost functions, the GOM Blackboard Optimizer favors the following anchor set A_{2,2}, originating from A_2:

A_{2,2} = { ρ_{u=#0, usk=#1}(⟨⟨Emp.skill⟩⟩), ρ_{o=#0, osk=#1}(⟨⟨Emp.skill⟩⟩) }
Though A* search might backtrack to one of the alternative anchor sets, the example optimization is limited to A_{2,2}. Factorizing this anchor set results in a currentDAGs entry in which the two renames ρ_{u=#0, usk=#1} and ρ_{o=#0, osk=#1} refer to one shared ⟨⟨Emp.skill⟩⟩ node.
Now, we want to sketch the search space originating in the item I_0 containing the DAG above. In order to simplify the following considerations, the future work for that item is reduced to the operations χ_{usx:u.sex}, χ_{osx:o.sex}, σ_{usx='F'}, and σ_{osx='M'}. The GOM Blackboard Optimizer does not usually open the whole search space as depicted in Figure 4. There, the possible paths leading from I_0 to an item I_1 containing the future work of I_0 in its currentDAGs entry are illustrated. If pure A* search is applied and the evaluation costs of the operations hardly differ, all six paths from I_0 to I_1 are examined. Although some of the six alternatives are pruned every time edges come together, a further reduction of the expense can be achieved. Since, for integrating expansions and selections, the knowledge sources deliver a good sequence of the items, the trigger condition of the ballooning component can be set to true and the branching factor b_branch to one. Then, only one alternative is produced.
The other expansions, by worksIn and salary, are also integrated. Since we assume that an attribute access to an object already resident in the buffer is free of cost, the expansions dereferencing u and o, respectively, are put together. Further, the two expansions are factorized, as the lower part of the DAG in Figure 5 shows.

[Search-space diagram omitted: the six possible orderings of χ_{usx:u.sex}, χ_{osx:o.sex}, σ_{usx='F'}, and σ_{osx='M'} leading from I_0 to I_1.]
Figure 4: Example Search Space from I_0 to I_1
Two expansions, three joins, and one projection are left in the futureWork entry. The joins ⋈_{usa<osa}, ⋈_{usk>osk}, or ⋈_{um=om} can be added to the actual currentDAGs entry.² Thus, the state expansion (Step 6 of the search strategy, cf. Section 4.4) leads to three items I′_1, I″_1, and I‴_1.
The history costs of the three items I′_1, I″_1, and I‴_1 hardly differ. In contrast, the future cost estimates differ substantially, since the selectivities and, therefore, the estimates of the cardinalities are very different. As pointed out in Section 5, the selectivity estimate of the operation um = om is far lower than the other two selectivities. Thus, the future costs, and consequently the f-value, of the item where that operation is integrated into its currentDAGs entry are lowest. Hence, this item is processed further, and the two remaining joins are added to its currentDAGs entry as selections.
The final projection completes the DAG. Furthermore, projections which reduce the size of the intermediate relations are integrated into the DAG. The resulting DAG is given in Figure 5. Further optimizations will map the operations to physical operations. Since every χσ and every ⋈σ sequence entails only one physical operation, the resulting DAG is divided by dashed horizontal lines.
7 Conclusion
A novel architecture for query optimization based on a blackboard which is organized in successive regions has been devised. At every region, knowledge sources are activated that consecutively complete alternative query evaluation plans. Starting from basic building blocks, a finite set of algebraic operations is added such that finally a DAG results which constitutes a (logical) query evaluation plan.
²Actually, in order to introduce um = om, the expansions um:ud.mgr and om:od.mgr have to be added beforehand. This detail is omitted, since the comparison of the items obtained after incorporating the joins gives an idea of the importance of the future cost estimates.
[DAG omitted: it combines the shared ⟨⟨Emp.skill⟩⟩ scan, the renames for u and o, the expansions of worksIn, salary, and sex, the selections σ_{usx='F'} and σ_{osx='M'}, the expansions χ_{um:ud.mgr} and χ_{om:od.mgr}, the join ⋈_{um=om}, the selections σ_{usk>osk} and σ_{usa<osa}, and the final projection onto mcp, underPaid, overPaid, and difference.]
Figure 5: Resulting DAG of the Sample Optimization
Due to this well-structured approach, the optimizer can continually be improved. By backpropagating the optimized queries, each knowledge source can be calibrated and assessed. Thus, the weak points of the optimizer can be determined and eliminated. An evolutionary improvement takes place.
As a search strategy, A* search enriched by ballooning has been proposed. By subdividing the costs of each alternative into history and future costs, A* search is able to compare the possibly unfinished plans with each other. However, even in states where the way to build efficient plans is obvious, pure A* search might generate a large number of alternatives. To alleviate this, ballooning was designed to accelerate the optimizer without degrading its quality.
The viability of our approach was shown by the GOM Blackboard Optimizer. Based on an object-oriented algebra, a blackboard optimizer was specified. It was shown how a blackboard, its regions, and its knowledge sources can be designed. The search algorithm was explained, and the basics of a cost model were described. For illustration purposes, a sample optimization was demonstrated.
Acknowledgement
This work was supported by
the German Research Council DFG under contracts
Ke 401/6-1 and SFB 346/A1.
We thank the participants of the Dagstuhl seminar on query processing organized by J. C. Freytag, D. Maier, and G. Vossen, and the attendees of a talk one of the authors gave on invitation by U. Dayal, for fruitful discussions. We also gratefully acknowledge our students K. Hauten, A. Roemer, S. Voss, and R. Waurig, who implemented the first prototype.
References
[1] J. Banerjee, W. Kim, and K. C. Kim. Queries in object-oriented databases. In Proc. IEEE Conf. on Data Engineering, pages 31-38, L.A., USA, Feb 1988.
[2] D. S. Batory. Extensible cost models and query op-
timization in GENESIS.
IEEE Database Engineering
,
10(4), Nov 1987.
[3] L. Becker and R. H. Güting. Rule-based optimization and query processing in an extensible geometric database system. ACM Trans. on Database Systems, 17(2):247-303, Jun 1992.
[4] S. Cluet and C. Delobel. A general framework for the
optimization of object-oriented queries. In
Proc. of the
ACM SIGMOD Conf. on Management of Data
, pages
383{392, San Diego, USA, Jun 1992.
[5] J. C. Freytag. A rule-based view of query optimization.
In
Proc. of the ACM SIGMOD Conf. on Management
of Data
, pages 173{180, San Francisco, USA, 1987.
[6] G. Graefe and D. J. DeWitt. The EXODUS optimizer
generator. In
Proc. of the ACM SIGMOD Conf. on
Management of Data
, pages 160{172, San Francisco,
USA, 1987.
[7] G. Graefe and W. J. McKenna. The Volcano optimizer generator: Extensibility and efficient search. In Proc. IEEE Conf. on Data Engineering, pages 209-218, Wien, Austria, Apr 1993.
[8] T. Ibaraki and T. Kameda. Optimal nesting for computing N-relational joins. ACM Trans. on Database Systems, 9(3):482-502, 1984.
[9] Y. E. Ioannidis and Y. C. Kang. Cost wells in random
graphs. Submitted for publication, Jun 1992.
[10] Y. E. Ioannidis and E. Wong. Query optimization by
simulated annealing. In
Proc. of the ACM SIGMOD
Conf. on Management of Data
, pages 9{22, San Fran-
cisco, USA, 1987.
[11] A. Kemper, C. Kilger, and G. Moerkotte. Function
materialization in object bases. In
Proc. of the ACM
SIGMOD Conf. on Management of Data
, pages 258{
268, Denver, USA, May 1991.
[12] A. Kemper and G. Moerkotte. Access supp ort in object
bases. In
Proc. of the ACM SIGMOD Conf. on Man-
agement of Data
, pages 364{374, Atlantic City, USA,
May 1990.
[13] A. Kemper and G. Moerkotte. Advanced query process-
ing in object bases using access support relations. In
Proc. of the Conf. on Very Large Data Bases (VLDB)
,
pages 290{301, Brisbane, Australia, Aug 1990.
[14] A. Kemper and G. Moerkotte. Query optimization in
object bases: Exploiting relational techniques. In J.-
C. Freytag, D. Maier, and G. Vossen, editors,
Query
Optimization in Next-Generation Database Systems
.
Morgan-Kaufmann, 1993. (forthcoming).
[15] A. Kemper, G. Moerkotte, and K. Peithner. A blackboard architecture for query optimization in object bases. Technical Report #92-31, RWTH Aachen, 1992.
[16] R. Lanzelotte and J.-P. Cheiney. Adapting relational optimisation technology for deductive and object-oriented declarative database languages. In Database Programming Languages Workshop, pages 322-336, Nafplion, Greece, August 1991.
[17] R. S. G. Lanzelotte and P. Valduriez. Extending the
search strategy in a query optimizer. In
Proc. of the
Conf. on Very Large Data Bases (VLDB)
, pages 363{
373, Barcelona, Spain, Sep 1991.
[18] G. M. Lohman. Grammar-like functional rules for rep-
resenting query optimization alternatives. In
Proc. of
the ACM SIGMOD Conf. on Management of Data
,
pages 18{27, Chicago, USA, 1988.
[19] G. Mitchell, S. B. Zdonik, and U. Dayal. An archi-
tecture for query processing in persistent object stores.
In
Proc. Hawaii Intl. Conference on System Sciences
,
1992.
[20] J. Pearl.
Heuristics
. Addison-Wesley, Reading, Mas-
sachusetts, 1984.
[21] G. Piatetsky-Shapiro and C. Connell. Accurate estimation of the number of tuples satisfying a condition. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 256-276, Boston, USA, Jun 1984.
[22] A. Rosenthal and U. S. Chakravarthy. Anatomy of
a modular multiple query optimizer. In
Proc. of the
Conf. on Very Large Data Bases (VLDB)
, pages 230{
239, L.A., USA, Sep 1988.
[23] A. Rosenthal and D. Reiner. An architecture for query
optimization. In
Proc. of the ACM SIGMOD Conf. on
Management of Data
, pages 246{255, Jun 1982.
[24] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin,
R. A. Lorie, and T. G. Price. Access path selection
in a relational database management system. In
Proc.
of the ACM SIGMOD Conf. on Management of Data
,
pages 23{34, Boston, USA, Jun 1979.
[25] T. K. Sellis. Multiple-query optimization.
ACM Trans.
on Database Systems
, 13(1):23{52, Mar 1988.
[26] D. D. Straube and M. T. Özsu. Execution plan generation for an object-oriented data model. In Proc. of the Intl. Conf. on Database Theory (ICDT), pages 43-67, Munich, F.R.G., Dec 1991. LNCS 470, Springer-Verlag.
[27] S. B. Yao. Approximating block accesses in database or-
ganizations.
Communications of the ACM
, 20(4):260{
261, Apr 1977.