A Blackboard Architecture for Query Optimization in Object Bases

Alfons Kemper*        Guido Moerkotte+        Klaus Peithner*

* Fakultät für Mathematik und Informatik, Universität Passau, W-8390 Passau, F.R.G.
  {kemper, peithner}@db.fmi.uni-passau.de
+ Fakultät für Informatik, Universität Karlsruhe, W-7500 Karlsruhe, F.R.G.
  moer@ira.uka.de
Abstract

Adopting the blackboard architecture from the area of Artificial Intelligence, we propose a novel kind of optimizer that enables two desirable ideas. Firstly, owing to the well-structured approach, backpropagation of the optimized queries allows an evolutionary improvement of (crucial) parts of the optimizer. Secondly, the A* search strategy can be applied to harmonize two contrary properties: alternatives are generated whenever necessary, yet straight-forward optimizing is performed whenever possible.
The generic framework for realizing a blackboard optimizer is proposed first. Then, in order to demonstrate the viability of the new approach, a simple example optimizer is presented; it can be viewed as an incarnation of the generic framework.
1 Introduction

[Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 19th VLDB Conference, Dublin, Ireland, 1993.]

Query optimizers, whether relational or object-oriented, are among the most complex software systems that have been built. Therefore, it is not surprising that the design of query optimizers is still a "hot" research issue, especially in object-oriented database systems. The following is a list of desiderata that one may expect of a "good" query optimizer:
1. extensibility and adaptability: As new, advanced query evaluation techniques and/or index structures become available, the optimizer architecture should facilitate extension or adaptation without undue effort.

2. evolutionary improvability: It should be possible to tune the query optimizer after gathering experience over a longer sequence of optimized queries. Ultimately, a self-tuning optimizer could be envisioned.

3. predictability of quality: Especially when optimizing interactive queries, a trade-off between the time used for optimization and the quality of the optimized result has to be taken into account. It is, therefore, most useful if we can estimate the quality of the optimization outcome relative to the allocated optimization time.

4. graceful degradation under time constraints: This desideratum is strongly correlated with the preceding one. Allocating less time for optimization should only gracefully degrade the quality of the optimized queries. This, of course, precludes any optimizer that first generates all possible alternatives, without any qualitative ordering, and then evaluates each alternative in turn.

5. early assessment of alternatives: The performance of an optimizer strongly depends on the number of alternatives generated. Typically, a heuristic is used to restrict the search space. However, a better, because more flexible, approach is to abandon the less promising alternatives as soon as possible. This requires a cost model that enables an estimate of the potential quality of an alternative at an early stage of optimization.

6. specialization: As in areas of (human) expertise, the optimizer architecture should support the integration of highly specialized knowledge to deal with particular (restricted) parts of the optimization process and/or with particular subclasses of queries, e.g., conjunctive or non-recursive queries.
In order to achieve some of these desiderata, different query optimizer architectures have been proposed. Unfortunately, all of the proposals fall short of meeting all criteria. It even appears that in the attempt of fulfilling some of the desiderata, others had to be neglected; e.g., rule-based systems emphasize extensibility, but predicting the quality in relation to the allocated optimization time becomes extremely difficult.

To support extensibility, rule-based systems were proposed [5, 22, 13, 3]. Adaptability is the main concern of the EXODUS query optimizer generator [6], the VOLCANO optimizer generator [7], and the GENESIS tool box system [2]. Structuring the query optimizer for maintenance and specialization is a major concern of proposal [19].
A well-structured architecture is gained if the optimization process is subdivided into single, small steps [24]. The "wholistic" approaches, e.g., [26, 4], consider an optimization graph, logical or physical, representing the entire query. That is, at each stage a complete query evaluation plan exists; rules are then applied to transform this representation. However, in our opinion it is better to segment the query into building blocks and operations, in order to compose a query evaluation plan step by step. The building block approach has already been proposed by Lohman [18].
The cost model is an essential part of a query optimizer in order to assure high-quality output. Since it is not generally obvious which transformation has to be applied for approaching the optimal plan, alternatives are generated [6, 22]. The alternatives are graded by a cost function which has to be continually improved [18]. In [6] an "expected-cost-factor", which is controlled by monitored results of the optimization, is added to each rule. We extend that idea by introducing a mechanism of backpropagation into our architecture.
The right choice of the search strategy is essential for the performance and the extensibility of an optimizer. Randomized optimization algorithms, as proposed in, e.g., [10], are very effective if the shape of the cost function forms a well, as pointed out in [9]. Further, the search strategy should be independent of the search space [17]. The search strategy that will be applied in our sample optimizer, which has also been proposed for multi-query optimization [25], is a slight modification of A*, a search technique which, in its pure form, guarantees to find the optimal solution [20].
In this paper, we present a new architecture for query optimization, based on a blackboard approach, which makes it possible, in combination with a building-block, bottom-up assembling approach and early assessment utilizing future cost estimates, to address all the desiderata. Our approach is a general one insofar as we first devise the generic blackboard-based architecture, which can be utilized for any kind of optimizer construction. The viability of the proposed generic optimizer architecture is then demonstrated by an example query optimizer; that is, we describe one sample instantiation of the generic framework which, though still incomplete, adheres to the main principles of the blackboard architecture.
The rest of the paper is organized as follows. In Section 2, the basic framework of the optimizer blackboard is introduced. We conceptually show how the optimization process works and how evolutionary improvability is integrated into the blackboard architecture. In Section 3, the running example, i.e., an object base and an associated query, is given. In order to establish the general ideas in our specific GOM optimizer, the basics, e.g., the algebra, the organization of our optimizer, and the search strategy, are explained in Section 4. Since the cost model is essential for every optimizer generating alternatives, it is outlined in Section 5. Having sketched our Blackboard Optimizer, Section 6 demonstrates a sample optimization process. Section 7 concludes the paper.
2 Generic Framework

2.1 The Pure Blackboard

The optimizer blackboard is organized into r successive regions R_0, ..., R_{r-1}. Each region contains a set of items representing the advances of the optimizer in deriving an optimal evaluation plan for a given query. The original query is translated into some initial internal format, which is identified by ε and placed into region R_0 as its only item.
A knowledge source KS_i is associated with each pair (R_i, R_{i+1}) of successive regions. Each knowledge source KS_i retrieves items to process from region R_i. For each such item, the knowledge source KS_i may generate several alternative items, which are emitted, in an order determined by KS_i, into the region R_{i+1}.
Note that there is no restriction concerning the additional data read by a knowledge source: knowledge sources are allowed to read any information at any region, as well as all statistical data, schema data, indexing information, and so forth.

[Figure 1: Blackboard Architecture. Regions R_0 (holding the initial item ε) through R_6 are stacked bottom-up; each knowledge source KS_i reads items from R_i and emits alternatives into R_{i+1}, and every item carries the list of sequence numbers assigned to it by KS_0, ..., KS_5 on its way up.]
The knowledge sources generate sequences of alternatives. Therefore, the order in which the alternative items are generated can be used for identification. For our abstract blackboard architecture shown in Figure 1, the items at region R_6 are identified by six pairs, each consisting of a knowledge source identifier (KS_0, ..., KS_5) and the sequence number indicating the position at which the particular item was generated. For example, the identifier

    #I = [ KS_5 KS_4 KS_3 KS_2 KS_1 KS_0 ]
         [  1    0    2    3    4    1   ]

of an item I in region R_6 indicates that this particular item I (whose identifier is denoted #I) is the fifth alternative generated by KS_1 from the second item generated by KS_0, etc.

In Section 2.3 we will see that this particular identification mechanism is essential for evaluating the quality and for adapting/calibrating the optimizer blackboard.
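The identification mechanism described above can be sketched in a few lines. The representation of identifiers as tuples of (knowledge source, position) pairs and the helper name `emit` are our illustrative assumptions, not taken from the paper's implementation:

```python
# Identifiers as tuples of (knowledge source, emission position) pairs;
# the tuple encoding and helper name are illustrative only.

def emit(parent_id, ks_index, position):
    """Identifier of the alternative emitted at `position` (0-based)
    by knowledge source KS_ks_index from the item `parent_id`."""
    return parent_id + ((f"KS{ks_index}", position),)

item_id = ()                              # the initial item epsilon in R_0
for ks, pos in enumerate([1, 4, 3, 2, 0, 1]):
    item_id = emit(item_id, ks, pos)

# item_id now identifies an item in region R_6; its second entry
# ("KS1", 4) records that it descends from the fifth alternative
# emitted by KS_1, matching the example identifier #I above.
```

Because every emission appends one pair, an identifier uniquely names the full derivation chain of an item, which is exactly what the backpropagation of Section 2.3 relies on.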
2.2 Search Strategy

The blackboard optimizer utilizes a building block approach for generating the (alternative) query evaluation plans (QEPs). Thus, for a given query Q the successive regions of the optimizer blackboard contain more and more complete query evaluation plans; finally, the top-most region R_{r-1} contains complete (alternative) evaluation plans that are equivalent to the user query Q.
It is essential to control the search space of the optimizer in order to avoid the exhaustive search over all possible query evaluation plans. Therefore, items at all regions have associated costs. There exist two cost functions, cost_h and cost_f, which estimate the history and future costs for evaluating a certain item. With each item two sets of operations are associated: the set of operations which are already integrated into the item (representing a still incomplete QEP) and the set of operations which still have to be integrated. The former set determines cost_h and the latter cost_f. Based on these cost functions, the optimizer blackboard is ideally controlled by A* search [20]. That is, at any given time the knowledge source applicable to the item with the lowest total cost (cost_h + cost_f) is allowed to emit further alternatives.
If cost_h corresponds to the actual costs for evaluating the operations of the first set and cost_f is a close lower bound of the future costs, A* search is guaranteed to find an optimal QEP efficiently. However, for query optimization a lower bound estimate of the future costs is always based on the best case for each operation, i.e., the least cost for evaluation is assumed. Hence, the total estimate of the future costs can be (far) lower than the actual costs. Then, the A* search could possibly degenerate to an (almost) exhaustive search, which leads to unacceptable optimization times. In order to straighten the optimization, the proposed A* search strategy is enhanced by the subsequently described ballooning component.
As explained before, knowledge sources retrieve an item I from their associated region and generate an ordered sequence of items I_1, ..., I_j which are emitted into the successor region. It is one of the major objectives in the design and subsequent calibration (cf. Section 2.3 below) of a knowledge source to ensure that the most promising alternatives are generated first. Such sophisticated knowledge sources enable the incorporation of the ballooning control component to expedite the optimization process. The basic idea of the ballooning control is to periodically and temporarily "switch off" the A* control and to process the first few alternatives generated by the knowledge sources without any cost control. Thereby, some "balloons" will "rise" through successive regions, possibly all the way up to the top-most region where items constitute complete QEPs.
When switching back to A* search, only the balloons at the top of the derivation chains are further considered; intermediate steps generated during ballooning are discarded, thereby reducing the resulting search space and "straightening" the optimization. Since the blackboard approach allows one to assess the sequence of the items generated by a knowledge source with respect to its quality for the global optimization, it is expected that the integration of the ballooning component into the A* search does not substantially degrade the quality of the optimization. Ballooning will only process highly promising items, very efficiently and without backtracking. Further, a reconciliation of the time allocated for optimization and the quality of the solution (recall Desideratum 4 of the Introduction) can be achieved by increasing or decreasing the share of ballooning.

[Figure 2: Evaluation and Calibration by Backpropagation. The benchmark queries Q_1, Q_2, ... are optimized; for each query the identifiers #I of the resulting QEPs at the top-most region, ordered by running time, are propagated back through the knowledge sources KS_0, ..., KS_5, and a quantitative analysis of the top-ranked identifiers yields a "Top-Rank" profile of the quality of each knowledge source.]

A simplified version of the search algorithm used in the GOM Blackboard Optimizer is given in Section 4.4.
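The interleaving of A* control and ballooning described in this section can be sketched as follows. All names, the item encoding, and the `balloon_every` knob are our assumptions for illustration, not the GOM implementation; `expand` stands for applying the applicable knowledge source, which is assumed to emit its alternatives best-first:

```python
import heapq

def blackboard_search(start, expand, cost_h, cost_f, is_complete,
                      balloon_every=4):
    """Sketch of the combined strategy: A* search over blackboard items,
    periodically switched to 'ballooning', i.e. riding the first-emitted
    alternatives upwards without any cost control."""
    open_list = [(cost_h(start) + cost_f(start), 0, start)]
    tie, step = 1, 0
    while open_list:
        _, _, item = heapq.heappop(open_list)
        if is_complete(item):
            return item
        step += 1
        if step % balloon_every == 0:
            # ballooning: follow the first alternative of each knowledge
            # source; only the top balloon is kept, as in Section 2.2
            balloon, alts = item, expand(item)
            while alts and not is_complete(balloon):
                balloon = alts[0]
                alts = expand(balloon) if not is_complete(balloon) else []
            candidates = [balloon]
        else:
            candidates = expand(item)        # ordinary A* expansion
        for alt in candidates:
            heapq.heappush(open_list, (cost_h(alt) + cost_f(alt), tie, alt))
            tie += 1
    return None

# Toy usage: items are tuples, complete at length 3, history cost is the
# tuple sum, future cost estimate is the trivial lower bound 0.
best = blackboard_search(
    (), lambda it: [it + (0,), it + (1,)],
    sum, lambda it: 0, lambda it: len(it) == 3, balloon_every=2)
# best == (0, 0, 0), the cheapest complete item
```

Raising `balloon_every` gives A* more of the steps; lowering it trades plan quality for speed, which matches the reconciliation of optimization time and quality described above.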
2.3 Backpropagation

The structuring of our optimizer blackboard imposed by the knowledge sources operating on successive regions enables a thorough quantitative evaluation and subsequent calibration of the quality of the knowledge sources. This is achieved by backpropagating the outcome of an extensive set of benchmark queries. The principle of backpropagation is depicted in Figure 2.

Let Q = {Q_1, Q_2, ...} be a large set of representative queries, which are either extracted from user-supplied queries or are generated by a query generator. For these queries let the optimizer generate all possible alternative query evaluation plans, i.e., for this purpose all items are expanded at regions R_0, ..., R_{r-2}. It is, however, essential that the optimizer obeys the control imposed by the pure A* search, except that the search continues even after the optimum has been generated.
For a query Q_j a sequence I^j_n, ..., I^j_2, I^j_1 of alternative items specifying a complete QEP at region R_{r-1} is obtained, the right-most item being generated first and the left-most last. Note that the alternatives are already sorted by their cost. More specifically, #I^j_{i_1} is the cheapest QEP identifier and #I^j_{i_n} is the most expensive one for a query Q_j.
This ordered sequence of plan identifiers is propagated back to the blackboard optimizer in order to evaluate the individual knowledge sources' quality. The quality of a knowledge source is measured in terms of the relative position at which an alternative was generated, in comparison to the position of this alternative in the QEP sequence ordered by running times. By evaluating a representative number of queries, a so-called "Top-Rank" profile can be derived. In Figure 2, e.g., the backpropagation of Q_1 increases the third column of the Top-Rank profile of KS_3, since the identifier #I^1_{i_1} of the top-rank QEP states that the corresponding QEP was generated as the third alternative by KS_3.
In Figure 2, the Top-Rank profile of knowledge source KS_3 indicates that almost all top-rank QEPs emerge from the first three alternatives of this knowledge source. Actually, in practice we are usually more interested in the so-called "Top-ε" profiles, in which all those query evaluation plans with a running time within ε% of the actual optimum are considered semi-optimal, where ε may be some application-domain-specific threshold value.
Quantitative analysis of the profiles facilitates predicting the average quality of the optimization, as envisioned in Desideratum 3 of the Introduction. Let BAP(KS_i, n_i) denote the probability that the first n_i alternatives emitted by knowledge source KS_i include the optimal one, under the condition that KS_i starts with the alternative from knowledge source KS_{i-1} which ultimately leads to the optimum. This function can easily be computed from the "Top-Rank" profile. Furthermore, let b_{KS_i} denote a (limiting) branching factor of knowledge source KS_i, i.e., the maximal number of alternatives that knowledge source KS_i is allowed to generate. Then the product

    ∏_{i ∈ {0, ..., r-2}} BAP(KS_i, b_{KS_i})

gives the probability that the optimal QEP is among the ∏_{i ∈ {0, ..., r-2}} b_{KS_i} alternatives that emerge at the top-most region R_{r-1}.
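Under these definitions the probability computation can be illustrated as follows. The profile counts are invented for this sketch, and BAP is approximated directly by relative frequencies in the Top-Rank profile, which is one plausible reading of "computed from the profile":

```python
from math import prod

# Invented Top-Rank profiles: profiles[i][k] counts how often the
# alternative leading to the optimum was the (k+1)-th one emitted by KS_i.
profiles = [
    [90, 8, 2],     # KS_0: almost always emits the right alternative first
    [70, 20, 10],   # KS_1
    [50, 30, 20],   # KS_2
]

def bap(profile, n):
    """BAP(KS_i, n): fraction of benchmark runs in which the optimal
    alternative was among the first n emitted by KS_i."""
    return sum(profile[:n]) / sum(profile)

b = [2, 2, 3]   # branching factors b_KS_i
p_optimal = prod(bap(profiles[i], b[i]) for i in range(len(b)))
# p_optimal ≈ 0.98 * 0.90 * 1.00 = 0.882
```

Such a computation makes the trade-off explicit: tightening a branching factor b_KS_i shrinks the search space at the cost of a quantifiable drop in the probability of retaining the optimum.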
Further, a more qualitative analysis of the profiles facilitates tuning the individual knowledge sources, as demanded in Desideratum 2. To give an idea of how the optimizer can be improved, three "hypothetical" profiles are depicted:

[Three hypothetical Top-Rank profiles (a), (b), and (c), shown as histograms over the position at which the optimal alternative was generated.]

An ideal profile is Profile (a): no improvement can be made. The worst one can think of is Profile (b); it looks like the profile of a "no-knowledge knowledge source". Usually, a profile like (c) is worth striving for. It displays that the knowledge source has to generate only a few alternatives in order to carry the creation of the optimal (Top-Rank) or a semi-optimal (Top-ε) QEP.

Ultimately, we envision that the profiles can be used by the optimizer for self-tuning (Desideratum 2), since the analysis of the profiles as well as the generation of the hints may be carried out automatically.
2.4 Generalized Optimizer Blackboard

In the discussion of the hypothetical knowledge source profiles we already observed that it might be useful to classify queries within the regions. This allows processing them more specifically by particular, highly customized knowledge sources. The classification of queries depends on the region. As an example, consider the classification of recursive vs. non-recursive queries, which is important to know for applying the right algorithm to compute join orderings.

In the pure architecture a knowledge source reads items from region R_i and emits the outcome into the next higher region R_{i+1}. We extend this concept such that an item leaving a special region R_{i_o} is allowed to re-enter the blackboard at a lower level R_{i_e} (i_e <= i_o). Thus, items can iterate over the regions R_{i_e} to R_{i_o}. An item will leave that iteration if it comes back to R_{i_o} without being modified.
3 Running Example
In this section, an example object base, called Company, is presented. In Figure 3, ten objects belonging to the types Emp, Dept, and Manager are shown. The type definitions are omitted; for the further discussion it is only of importance that each object of type Emp has the attributes name: String, worksIn: Dept, salary: Float, and sex: Char, and each object of type Dept the attributes name: String and mgr: Manager. Since Manager is a subtype of Emp, it contains all the attributes of Emp and, additionally, one attribute backUp: Emp. Further, a type-associated function skill, computing a ranking number for individual Employees, is assumed.

[Figure 3: Example Extension of Company.
 Emp:     id_1 (name: "Sander",  worksIn: id_5, salary: 90000,  sex: 'F')
          id_2 (name: "Versace", worksIn: id_5, salary: 100000, sex: 'M')
          id_3 (name: "Hinault", worksIn: id_6, salary: 260000, sex: 'M')
          id_4 (name: "LeMond",  worksIn: id_6, salary: 100000, sex: 'M')
 Dept:    id_5 (name: "Clothes",  mgr: id_8)
          id_6 (name: "Bicycles", mgr: id_9)
          id_7 (name: "Shoes",    mgr: id_10)
 Manager: id_8  (name: "Boss",   worksIn: id_5, salary: 150000, sex: 'M', backUp: id_2)
          id_9  (name: "Chief",  worksIn: id_6, salary: 280000, sex: 'M', backUp: id_3)
          id_10 (name: "Master", worksIn: id_7, salary: 900000, sex: 'M', backUp: NULL)]
The labels id_i for i ∈ {1, 2, 3, ...} denote the system-wide unique object identifiers (OIDs). References via attributes are maintained uni-directionally in GOM, as in almost all other object models. For example, in the extension of Company there is a reference from Employee id_1 to Dept id_5 via the worksIn attribute.
The Example Query. For the object model GOM, a QUEL-like query language called GOMql [13] was developed. As an example query, we want to know whenever there is a Manager (usually called an "MCP") who pays a female less than a male Employee (in one of his Depts) even though the female is better qualified. We want to retrieve the manager and, as evidence, the female, the male, and the difference of their salaries. In GOMql the query can be formulated as follows:

range    u : Emp, o : Emp
retrieve [mcp : u.worksIn.mgr, underPaid : u,
          overPaid : o, difference : o.salary - u.salary]
where    u.worksIn.mgr = o.worksIn.mgr and u.skill > o.skill
         and u.salary < o.salary and u.sex = 'F' and o.sex = 'M'
There are three clauses. The range-clause introduces the needed variables and binds them to finite ranges, here the extensions of the types. The retrieve-clause specifies the final projection of the query, and the where-clause contains the selection predicate. Under the assumption that "Sander" has a higher skill than "Versace", the relation {[mcp: id_8, underPaid: id_1, overPaid: id_2, difference: 10000]} is the outcome of the query with respect to the object base Company.

At this point, we would like to stress that even though we have chosen GOM and GOMql as the example data model and query language, respectively, the results obviously apply to other object-oriented data models and query languages as well.
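The stated result can be checked with a naive nested-loop evaluation over the Figure 3 extension. The skill values for id_1 and id_2 come from the GMR exemplified below; all other skill values are assumptions chosen only so that "Sander" is the single under-paid, better-qualified female:

```python
# Figure 3 extension; skill values other than id1/id2 are assumptions.
emps = {
    'id1':  dict(worksIn='id5', salary=90000,  sex='F', skill=10),  # Sander
    'id2':  dict(worksIn='id5', salary=100000, sex='M', skill=4),   # Versace
    'id3':  dict(worksIn='id6', salary=260000, sex='M', skill=7),   # Hinault
    'id4':  dict(worksIn='id6', salary=100000, sex='M', skill=9),   # LeMond
    'id8':  dict(worksIn='id5', salary=150000, sex='M', skill=12),  # Boss
    'id9':  dict(worksIn='id6', salary=280000, sex='M', skill=11),  # Chief
    'id10': dict(worksIn='id7', salary=900000, sex='M', skill=10),  # Master
}
depts = {'id5': 'id8', 'id6': 'id9', 'id7': 'id10'}  # dept OID -> mgr OID

def mcp():
    # range u : Emp, o : Emp  (Managers are Emps by subtyping)
    return [dict(mcp=depts[u['worksIn']], underPaid=un, overPaid=ov,
                 difference=o['salary'] - u['salary'])
            for un, u in emps.items() for ov, o in emps.items()
            if depts[u['worksIn']] == depts[o['worksIn']]
            and u['skill'] > o['skill'] and u['salary'] < o['salary']
            and u['sex'] == 'F' and o['sex'] == 'M']

# mcp() yields the single tuple
# [mcp: id8, underPaid: id1, overPaid: id2, difference: 10000].
```

The nested loop mirrors the declarative semantics of the range- and where-clauses; the optimizer's task, developed in Section 4, is precisely to avoid evaluating the query this way.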
The Index Structures. GOM query evaluation is supported by two very general index structures tailored for object-oriented data models: Access Support Relations (ASRs) [12] are used to materialize (frequently) traversed reference chains, and Generalized Materialization Relations (GMRs) [11] maintain pre-computed function results.
Since these two index structures have to be taken into account in the optimization process, two index relations based on the schema Company are exemplified:

[[Emp.worksIn.mgr]]
#0: OID_Emp    #1: OID_Dept    #2: OID_Manager
id_1           id_5            id_8
id_2           id_5            id_8
...            ...             ...
id_10          id_7            id_10

<<Emp.skill>>
#0: OID_Emp    #1: int
id_1           10
id_2           4
...            ...
id_10          10

The extension of the ASR [[Emp.worksIn.mgr]], which contains all paths corresponding to the indicated path expression, and of the GMR <<Emp.skill>>, which maintains the pre-computed skill function for each Employee, are depicted. Note that the columns of these index relations are sequentially numbered, i.e., #0, #1, ...
4 GOM Blackboard Optimizer

4.1 The Algebra

The query evaluation plans (QEPs) are directed acyclic graphs (DAGs) consisting of algebraic operator applications. Building blocks standing for sets of OIDs of a type T (denoted by oid(T)), ASRs (denoted by [[...]]), and GMRs (denoted by <<...>>) are the leaves of the DAGs. The treatment of indexes like ASRs and GMRs as additional sources of information is already present in the notion of shadow tables as introduced in [23]. In accordance with the building block approach [18], the DAGs are successively composed bottom-up: operations are added to the DAG and common subexpressions are factorized. In order to compute a (near-)optimal DAG the optimizer has to determine an optimal set of building blocks and an optimal order of the algebraic operations.
Our algebra mainly copes with relations. In order to refer to single columns of relations, we use so-called information units (IUs). We do not call them attributes, since we want to avoid any conflict with the attributes at the GOM object type level. Each IU is unique throughout the entire optimization process, i.e., over all alternatives that may be generated, and so an unambiguous dereferencing mechanism is obtained for the algebraic operations and the cost functions.
Besides the usual set operations (∪, \), the algebra consists of the common relational selection σ, projection π, join ⋈, and renaming ρ. Further, a mapping operator χ, called expansion, belongs to the algebra. Let T be a type, v, v_1, v'_1, ..., v_n, v'_n be IUs, a_1, ..., a_n be attributes, θ ∈ {=, <, >, ...} be a comparison operator, and c be a constant. Then, the building blocks and the algebraic operators are informally defined as follows:
building blocks: The extension oid(T) of a type T, an ASR [[...]], and a GMR <<...>> are building blocks. The columns of the relations retrieved by them are denoted by self and by #0, ..., #n, respectively. We assume indices on the first and last column of an ASR and on each column of a GMR.
expansions: An expansion χ_{v_1: v.a_1, ..., v_n: v.a_n} dereferences the set of OIDs denoted by IU v such that the attribute values can be obtained and assigned to new IUs v_1, ..., v_n, respectively. The input relation is expanded by new columns denoted v_1, ..., v_n. Further, the χ operator may also expand the tuples by function invocations instead of attribute accesses; the parameters of a function are enclosed in parentheses following its name.
usual relational operations: ⋈_{v_1 θ v_2} denotes a join, σ_{v_1 θ c} and σ_{v_1 θ v_2} selections, π_{v_1, ..., v_n} a projection onto the IUs in the subscript, and ρ_{v'_1=v_1, ..., v'_n=v_n} a renaming operation where the column named v_i is renamed to v'_i (i = 1, ..., n).

Relying heavily on ordinary relational operators allows us to exploit relational optimization techniques [16, 14].
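To make the operator definitions concrete, here is a minimal relational sketch: relations are lists of dicts keyed by IU names, `expand` plays the role of χ and `select` of σ. The encoding and the function names are our illustration, not GOM's implementation:

```python
# Relations as lists of dicts keyed by IU names (illustrative encoding).

def expand(rel, objects, bindings):
    """chi: for each tuple, dereference the OID held in an IU and bind
    attribute values to new IUs, e.g. bindings={'usa': ('u', 'salary')}."""
    return [{**t, **{iu: objects[t[v]][attr]
                     for iu, (v, attr) in bindings.items()}}
            for t in rel]

def select(rel, pred):
    """sigma: keep only the tuples satisfying the predicate."""
    return [t for t in rel if pred(t)]

emp = {'id1': {'salary': 90000}, 'id2': {'salary': 100000}}
rel = [{'u': 'id1'}, {'u': 'id2'}]                 # rho[u=self](oid(Emp))
rel = expand(rel, emp, {'usa': ('u', 'salary')})   # chi[usa: u.salary]
well_paid = select(rel, lambda t: t['usa'] > 95000)
# well_paid == [{'u': 'id2', 'usa': 100000}]
```

Note how the expansion only ever adds columns and never drops or filters tuples; this is what lets selections and joins later refer to any IU introduced upstream.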
4.2 The Normal Forms

In object-oriented query processing it is common to translate the query into an internal representation as close to the original query as possible; witness, e.g., [1, 4, 13, 14]. This is also valid for relational query processing, where, e.g., an SQL query is translated into a πσ⋈ expression. However, this representation exhibits another property which the initial internal representation of object-oriented queries very often lacks: it is an (expensive) well-structured term facilitating a straightforward splitting into building blocks and operations.

Our proposed starting point, called the Most Costly Normal Form (MCNF) [14], has one additional χ-expansion directly following the ⋈, resulting in a πσχ⋈ sequence. All the extensions whose instances are needed for the query evaluation are joined with true as join predicate. χ-expansions follow, enhancing each tuple of the resulting relation by further information needed to evaluate the selection predicate solely on the basis of this result. Thus, two vital concepts of object-orientation, access via OIDs (implicit dereferenciation) and function invocation, are integrated into the MCNF and prepared for their optimization. Then, the selections, accompanied by the final projection onto the required IUs, are appended.
The MCNF representation of the example query "MCP" is shown below:

π_{mcp: um, underPaid: u, overPaid: o, difference: osa - usa}(
  σ_{osx='M'}(σ_{usx='F'}(σ_{usa<osa}(σ_{usk>osk}(σ_{um=om}(
    χ_{um: ud.mgr}(χ_{om: od.mgr}(
      χ_{ud: u.worksIn, usa: u.salary, usx: u.sex}(
        χ_{od: o.worksIn, osa: o.salary, osx: o.sex}(
          χ_{usk: u.skill}(χ_{osk: o.skill}(
            ρ_{u=self}(oid(Emp)) ⋈_{true} ρ_{o=self}(oid(Emp)) ... )
The MCNF is further enhanced [15] in order to obtain a convenient basis for composing the query evaluation plans. A table combining the building blocks and the operations with catalog information is derived such that it contains all information relevant for optimizing the query. Thus, we can, e.g., efficiently retrieve the building blocks and the operations in which a given IU is involved. This elaborated normal form is obtained by decomposing the MCNF term into its building blocks and operations. Each piece is then enriched by the statistical data relevant to the query. For example, the cardinalities of the building blocks and the selectivities of the operations are attached. Knowing which columns of a building block are supported by an index is important for an exact cost estimate; hence, this information is also maintained.
4.3 Regions and Knowledge Sources

The blackboard of our GOM Blackboard Optimizer is subdivided into seven regions, each one completing the QEP in a particular way: R_0 (MCNF), R_1 (Decomposition), R_2 (Anchor Sets), R_3 (Introduce χ), R_4 (Introduce σ), R_5 (Introduce ⋈), and R_6 (Introduce π). Each region supplies items, each of which possesses an entry currentDAGs and an entry futureWork, where the DAGs composed so far and the remaining operations, respectively, are stored.

The knowledge sources of type KS_i read items at region R_i and write items at region R_{i+1}. What follows is an informal description of the knowledge sources at each region. We assume that the query is represented in MCNF format at region R_0.
KS
0
(to \Decomposition"): The
MCNF
term is de-
composed into building blocks and operations. The
additional information is obtained from the schema
manager which also manages the statistical data.
Additionally, the
ASRs
and
GMRs
which can be
integrated into the query are determined. There
exists only one knowledge source of this type and
it does not produce any alternatives.
KS
1
(to \Anchor Sets"): A knowledge source of this
type determines which building blocks are chosen
for evaluating the query. We call such a minimal
(i.e., non-redundant) set of building blocks contain-
ing enough information for answering the query an
anchor set
.
KS
1
generates several anchor sets and
sorts them according to special heuristics, e.g., con-
sidering the number of joins or the number of op-
erations left in the
futureWork
entry.
KS
2
(to \Introduce
"): Expansions are added to the
currentDAGs
entry. In the current implemen-
tation, the following heuristics is applied: An
expansion|or a pair of expansions|is integrated
into the
DAGs
if (and only if ) a selection or a join
directly depends on it, or the
futureWork
entry of
the item only contains expansions and pro jections.
KS_3 (to "Introduce σ"): According to the heuristic "introduce selections as early as possible", selections are integrated into the query whenever possible.
KS_4 (to "Introduce ⋈"): At each iteration, the knowledge source of type KS_4 introduces at most one join. As a consequence, for each item a join order is obtained by repeated iterations. Alternatives might have different join orderings.
KS_5 (to "Introduce π"): Finally, projections are added to the DAG. We rule out sequences in which a projection immediately follows a join or a selection (⋈π and σπ), since such a sequence can be replaced by one single physical operation.
The blackboard is re-entered from region R_5 to R_2 until all expansions, selections, and joins are processed, that is, until the futureWork entry is empty except for a single projection.
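As a rough illustration, the items and the region loop described above might be modeled as follows; the names Item and advance and the region encoding are our own, not taken from the GOM implementation:

```python
# Hypothetical sketch of a blackboard item and the region loop;
# names are illustrative, not taken from the GOM implementation.
from dataclasses import dataclass, field

@dataclass
class Item:
    current_dags: list = field(default_factory=list)  # DAG fragments built so far
    future_work: list = field(default_factory=list)   # operations still to integrate
    region: int = 0                                   # index of the region the item sits at

def advance(item: Item) -> Item:
    """Move an item one region forward; re-enter at R2 after R5
    until only the final projection remains in futureWork."""
    nxt = item.region + 1
    if item.region == 5 and len(item.future_work) > 1:
        nxt = 2  # loop back: more expansions/selections/joins to process
    return Item(item.current_dags, item.future_work, nxt)
```

An item at R_5 with more than one pending operation thus loops back to R_2, while an item whose futureWork holds only the final projection proceeds to R_6.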
In order to avoid evaluating equal expressions twice, items leaving regions R_1, R_2, R_3, R_4, and R_5 are factorized. For example, if KS_1 selects ρ_{u=self}(oid(Emp)) and ρ_{o=self}(oid(Emp)) as elements of an anchor set, they will be factorized such that the two renames ρ_{u=self} and ρ_{o=self} share one common building block oid(Emp). The full set of factorization rules applied can be found in [15]. As a result, the optimizer generates a DAG which is a "logical" query evaluation plan.
4.4 Search Algorithm

The search strategy in the GOM Blackboard Optimizer consists of two parts. On the one hand, A* search advances the alternative with the minimal sum of history costs (cost_h) and future costs (cost_f); on the other hand, ballooning proceeds with the alternative(s) emitted first by a knowledge source. The actual search strategy combines the two techniques by allowing a certain ratio of optimization steps to be done under A* search and under ballooning control, respectively. The search strategy is outlined as follows:
1. Insert the starting state (item) ε into the list OPEN of unexpanded states.

2. Sort the elements I of OPEN by increasing f(I) := cost_h(I) + cost_f(I) values.

3. If the ballooning flag is raised, do
   (a) remove the first b_initial elements from OPEN and insert them into the set B
   (b) perform the following steps b_iterations times:
       i. expand each I ∈ B by its appropriate knowledge source to I_1, ..., I_j for j ≤ b_branch
       ii. remove I from B and insert the item into CLOSED
       iii. insert I_1, ..., I_j into B
   (c) transfer the items in B to OPEN, and go to Step 2.

4. Remove the left-most item I from OPEN, i.e., the item for which f(I) := cost_h(I) + cost_f(I) is minimal (ties broken arbitrarily), and place it on CLOSED.

5. If I is a goal state, i.e., I.FW = ∅, exit successfully with the solution I.

6. Let the appropriate knowledge source expand state I, generating all its successors.

7. For every successor I' of I:
   (a) insert I' into OPEN unless
   (b) there exists I'' ∈ OPEN ∪ CLOSED with I'.FW = I''.FW; then
       i. if cost_h(I') < cost_h(I''), then insert I' into OPEN and transfer I'' to PRUNED
       ii. else, if cost_h(I') ≥ cost_h(I''), then insert I' into PRUNED

8. Go to Step 2.
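The A* part of this loop (Steps 1, 2, 4 to 8) can be sketched as follows; the ballooning switch of Step 3 is omitted for brevity, and all names (a_star, expand, FW) are illustrative rather than taken from the GOM implementation:

```python
# Sketch of the A* part of the search loop. States carry a set FW of
# remaining operations; cost_f must be a lower bound on the remaining costs.
import heapq

def a_star(start, expand, cost_h, cost_f):
    counter = 0                    # tie-breaker, since states are not comparable
    OPEN = [(cost_h(start) + cost_f(start), counter, start)]
    best_h = {}                    # FW -> cheapest cost_h seen so far (OPEN/CLOSED)
    while OPEN:
        f, _, I = heapq.heappop(OPEN)              # Step 4: least f-value
        fw_I = frozenset(I.FW)
        if fw_I in best_h and cost_h(I) > best_h[fw_I]:
            continue                               # stale entry, already pruned
        if not I.FW:                               # Step 5: goal state
            return I
        for succ in expand(I):                     # Steps 6 and 7
            fw = frozenset(succ.FW)
            if fw in best_h and cost_h(succ) >= best_h[fw]:
                continue                           # Step 7(b)ii: prune successor
            best_h[fw] = cost_h(succ)              # Step 7(b)i: dominates old item
            counter += 1
            heapq.heappush(OPEN, (cost_h(succ) + cost_f(succ), counter, succ))
    return None                                    # no plan found
```

Here OPEN is a priority queue ordered by f, and the dictionary best_h plays the role of the dominance test of Step (7b) over OPEN ∪ CLOSED; dominated entries are pruned lazily instead of being moved to an explicit PRUNED set.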
The A* search algorithm is a best-first algorithm [20]. It starts by inserting ε, the initial state, into OPEN. OPEN contains all states which have been reached but have not been fully expanded, i.e., it contains all items waiting for further processing. In each iteration, A* search continues with the item of OPEN which has the least f-value, i.e., the minimal sum of cost_h and cost_f. That item is expanded, i.e., its successors are put into OPEN, and then it is promoted to CLOSED, the set of all fully expanded states. The algorithm terminates successfully as soon as an item is generated whose future work (denoted by FW) is empty and whose costs are minimal.
In Step 3, control is temporarily switched from A* search to ballooning. Ballooning might, for example, be triggered after a certain number of iterations of A* search have been performed. Then, the first b_initial items of OPEN are expanded b_iterations times, i.e., the items are expanded to lists, the first, at most b_branch (which should be one in most cases) elements of each list are then expanded, and so on. The numbers b_initial, b_iterations, and b_branch can be set depending on the analysis of the entire query and the current state of the optimization process. For example, the optimization of a query containing many χ-expansions and selections may be expedited by a low b_initial, a high b_iterations, and a low b_branch, since generating many alternatives is unnecessary for integrating these operations. Thus, ballooning allows fast optimization to be switched on whenever that seems acceptable.
For the pruning conditions in Step (7b), a special case of the optimality criterion [20] is presupposed: if there are two items I_1 and I_2 with equal future work entries both containing an operation op and, further, cost_h(I_1) < cost_h(I_2) holds, then integrating op into the history work entry of I_1 and I_2 keeps the cost order between the two items invariant. Therefore, all items (states) which produce higher costs than an item with the same future work are pruned by condition (7b) and transferred to a set PRUNED since, due to the optimality criterion, they cannot possibly yield a better item. Thus, the successor item I' will cause the pruning of some items I'' ∈ OPEN ∪ CLOSED if it is less "expensive", and it will itself be pruned by an item I'' ∈ OPEN ∪ CLOSED if it is more "expensive".

The pruning conditions can be strengthened if some further properties are ensured by the cost functions [15].
5 Cost Model

From specific data extracted from the object base, the costs for scanning the building blocks and evaluating the operations are estimated.
For the calculation of the history costs as well as the future costs, two parameters are assigned to each DAG node: the cardinality #o of the output relation, and the numbers #e = (e_{v_1}, ..., e_{v_n}) of distinct values belonging to the IUs v_1, ..., v_n of the output relation, called e-values. Their calculation from so-called basic numbers is explained below. The number of page faults #p and the CPU costs #c (assigned to each DAG node in addition to #o and #e) are derived from #o, #e, and the basic numbers. For estimating #p, the well-known formula of Yao [27] is used.
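Yao's estimate can be transcribed directly, assuming n records packed uniformly into m pages, of which k are selected:

```python
# Yao's formula [27]: expected number of distinct pages touched when
# k of n uniformly distributed records (packed n/m per page) are selected.
def yao(n: int, m: int, k: int) -> float:
    if k == 0:
        return 0.0
    prod = 1.0
    for i in range(k):
        # probability that a given page is still untouched after i draws
        prod *= (n - n / m - i) / (n - i)
    return m * (1.0 - prod)
```

For small k the estimate approaches k (each record on its own page); for k = n it reaches m, i.e., every page is touched.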
The estimate for #c is based on system-dependent functions which estimate the CPU costs for the building blocks and the appropriate operations, with #o and #e as input.
Thus, the calculation of the history costs is fairly straightforward. The future cost estimate of an operation is required to be a lower bound of the actual costs. For that, we derive a lower bound on the size and the e-values of the input relations (see below). Then, we can calculate the future costs in basically the same way as the history costs.
Assigning a quadruple (#p, #c, #o, #e) to each DAG node, the costs of a DAG are computed by summing up the costs of its nodes. Then, we compute the history costs of an item by adding up the costs of the DAGs in the currentDAGs entry of the item, and the future costs by adding up the costs of the operations in the futureWork entry.
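Under these definitions, the cost of an item reduces to sums over node quadruples. A minimal sketch; the weighting of page faults against CPU costs (PAGE_WEIGHT) is our own assumption, not part of the paper's cost model:

```python
# Sketch of the cost aggregation: each DAG node carries a quadruple
# (#p, #c, #o, #e); an item's f-value is history costs (over the nodes of
# its current DAGs) plus future costs (lower bounds for the remaining ops).
PAGE_WEIGHT = 1000   # assumed: one page fault costs as much as 1000 CPU units

def node_cost(p, c, o, e):
    # #o and #e drive the estimation of #p and #c; only #p and #c are charged
    return PAGE_WEIGHT * p + c

def history_cost(current_dags):
    return sum(node_cost(*node) for dag in current_dags for node in dag)

def future_cost(future_work, lower_bound):
    # lower_bound(op) yields a quadruple that underestimates op's actual cost
    return sum(node_cost(*lower_bound(op)) for op in future_work)
```
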
The data used for the cost calculations is stored as basic numbers in three levels: "Values from the Object Base", "Single Selectivities", and "Combined Selectivities".
For every object type T, the cardinality c_T of its extension and the values p^oid_T and p^object_T (which denote the number of pages occupied by the extension, i.e., the set of OIDs, and by the objects, respectively) are available as values from the object base. Let a be an attribute of an object type T. If a refers to an object type, def_{T,a} denotes the probability that the attribute is defined (≠ NULL). For each attribute a of type T, the parameter c_{T,a} denotes the size of its range. For each method m, the size of its range c_{T,m} and its average execution time¹ exec_{T,m}(n), for executing the method m n times on OIDs of type T, are maintained. The cardinality of an ASR [[...]] and a GMR ⟨⟨...⟩⟩ (denoted c_[[...]] and c_⟨⟨...⟩⟩, respectively) and the number of pages they occupy (denoted p_[[...]] and p_⟨⟨...⟩⟩) are also available as values from the object base.
The selectivity s of a unary operation op_1(R) is defined as s(op_1(R)) = |op_1(R)| / |R|, and that of a binary operation op_2 as s(op_2(R_1, R_2)) = |op_2(R_1, R_2)| / (|R_1| × |R_2|). These single selectivities can be estimated in three different ways with increasing accuracy:
1. As in [24], the selectivities might be derived from simple estimates. Thus, if the basic numbers c_{Emp,skill} = 10, c_{Emp,salary} = 10,000, and c_{Manager} = 150 are given, the selectivities for usk > osk, usa < osa, and um = om will be (1 − 1/c_{Emp,skill})/2 = 0.45, (1 − 1/c_{Emp,salary})/2 ≈ 0.5, and 1/c_{Manager} ≈ 0.007, respectively.
2. The selectivities can also be determined by histograms [21]. For that, histograms are generated by sampling the object base. The selectivities for usx = 'F' and osx = 'M' can be determined in this way.

3. During the evaluation of a query, one can gain more accurate selectivity estimates for use in future query optimization by monitoring.

¹ We know that this is only a rough estimate. Future versions of the cost model will refine this.
Since, in the current implementation, the independence of attribute values is presupposed, combined selectivities are the product of their single selectivities. In the future, this will be refined.
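The simple estimates of the first variant and the independence assumption can be transcribed directly; the function names are illustrative, and the numbers follow the example basic numbers above:

```python
# Simple selectivity estimates in the style of [24], using the example
# basic numbers c_{Emp,skill} = 10 and c_Manager = 150.
def sel_eq(c):
    """Equality predicate, e.g. um = om: 1/c for range cardinality c."""
    return 1.0 / c

def sel_ineq(c):
    """Inequality predicate (< or >), e.g. usk > osk: (1 - 1/c)/2."""
    return (1.0 - 1.0 / c) / 2.0

def sel_combined(sels):
    """Combined selectivity under the independence assumption."""
    p = 1.0
    for s in sels:
        p *= s
    return p
```
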
Knowing the selectivity s of an operation, we are able to derive the output size #o of that operation by multiplying s by the cardinality of the input relation(s). The output size of a building block, i.e., of type extensions, ASRs, and GMRs, is given by the basic numbers. Thus, the cardinalities of the (intermediate) relations of a DAG are calculated bottom-up.
Since not the total number but the number of distinct OIDs is essential for cost estimates concerning χ-expansions and the retrieval of building blocks via an index, an e-value e_v defined by |π_v(R)| is assigned to each IU v in a relation R. The bottom-up calculation of the e-values is performed as follows: the initialization is done by the basic numbers of the building blocks. The further calculation is mainly based on a formula also used for generating join orderings [8]. For example, let an expansion χ_{v':v.a} be applied to a relation R whose e-values are known. Let c_{T_v,a} be the cardinality of the range of the attribute/type-associated function a, and let e_v be equal to |π_v(R)|. Then, the following formula determines the number e_{v'} of values being referenced:

e_{v'} = c_{T_v,a} × (1 − (1 − 1/c_{T_v,a})^{e_v})
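The estimate e_{v'} = c × (1 − (1 − 1/c)^{e_v}), for a range of cardinality c referenced from e_v distinct input values, transcribes directly; the function name is illustrative:

```python
# Estimated number of distinct values referenced when following an
# attribute a with range cardinality c from e_v distinct input values
# (formula also used for join-order generation, cf. [8]).
def e_value(c: float, e_v: float) -> float:
    return c * (1.0 - (1.0 - 1.0 / c) ** e_v)
```

A single reference hits exactly one value, and for very many references the estimate saturates at the range cardinality c, matching the intuition behind the formula.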
Since the e-values decrease with each operation application, we can determine a non-trivial lower bound on all e-values. Let R be the relation obtained by evaluating the DAG of the MCNF with the last projection cut off. Then, |π_v(R)| gives a lower bound on all e-values of the IU v in all (possibly unfinished) DAGs representing the query. Using the formulas for history costs and applying them to the operations in the futureWork entry of an item, we arrive at a lower bound on the future costs.
6 Sample Optimization

Performing the optimization process for the running example, we demonstrate some decisions made individually at each region, as well as factorization and pruning.

The normal forms were already explained in Section 4.2. Thus, the sample optimization starts with generating anchor sets. Each non-redundant set which binds the IUs u and o is a potential anchor set for our example. The values for the other IUs can be retrieved by χ-expansions. Because of the symmetry of u and o, we only give the sets resulting in bindings for u:
A_1 = { ρ_{u=self}(oid(Emp)) }
A_2 = { ρ_{u=#0, usk=#1}(⟨⟨Emp.skill⟩⟩) }
A_3 = { ρ_{u=#0, ud=#1, um=#2}([[Emp.worksIn.mgr]]) }
A_4 = { ρ_{u=#0, usk=#1}(⟨⟨Emp.skill⟩⟩), ρ_{u'=#0, ud=#1, um=#2}([[Emp.worksIn.mgr]]) }
A_5 = { ρ_{u'=#0, usk=#1}(⟨⟨Emp.skill⟩⟩), ρ_{u=#0, ud=#1, um=#2}([[Emp.worksIn.mgr]]) }
Due to the corresponding sets for o, the appropriate knowledge source generates at most 5 × 5 = 25 alternative anchor sets. Because of the cost functions, the GOM Blackboard Optimizer favors the following anchor set A_{2,2}, originating from A_2:

A_{2,2} = { ρ_{u=#0, usk=#1}(⟨⟨Emp.skill⟩⟩), ρ_{o=#0, osk=#1}(⟨⟨Emp.skill⟩⟩) }
Though A* search might backtrack to one of the alternative anchor sets, the example optimization is limited to A_{2,2}. Factorizing this anchor set results in a currentDAGs entry in which the two renames ρ_{u=#0, usk=#1} and ρ_{o=#0, osk=#1} share the single building block ⟨⟨Emp.skill⟩⟩.
Now, we want to sketch the search space originating from the item I_0 containing the DAG above. In order to simplify the following considerations, the future work of that item is reduced to the operations χ_{usx:u.sex}, χ_{osx:o.sex}, σ_{usx='F'}, and σ_{osx='M'}. The GOM Blackboard Optimizer does not usually open the whole search space as depicted in Figure 4. There, the possible paths leading from I_0 to an item I_1 containing the future work of I_0 in its currentDAGs entry are illustrated. If pure A* search is applied and the evaluation costs of the operations hardly differ, all six paths from I_0 to I_1 are examined. Although some of the six alternatives are pruned whenever edges come together, a further reduction of the expense can be achieved. Since, for integrating expansions and selections, the knowledge sources deliver a good sequence of items, the trigger condition of the ballooning component can be set to true and the branching factor b_branch to one. Then, only one alternative is produced.
The other expansions, by worksIn and salary, are also integrated. Since we assume that an attribute access on an object already resident in the buffer is free of cost, the expansions dereferencing u and o, respectively, are put together. Further, the two expansions are factorized, as the lower part of the DAG in Figure 5 shows.

[Figure 4: Example Search Space from I_0 to I_1. The figure shows the six alternative orders in which the operations χ_{usx:u.sex}, χ_{osx:o.sex}, σ_{usx='F'}, and σ_{osx='M'} can be integrated on the paths from item I_0 to item I_1.]
Two expansions, three joins, and one projection are left in the futureWork entry. The joins usa < osa, usk > osk, or um = om can be added to the actual currentDAGs entry.² Thus, the state expansion (Step 6 of the search strategy, cf. Section 4.4) leads to three items I'_1, I''_1, and I'''_1.
The history costs of the three items I'_1, I''_1, and I'''_1 hardly differ. In contrast, the future cost estimates differ substantially, since the selectivities and, therefore, the estimates of the cardinalities are very different. As pointed out in Section 5, the selectivity estimate of the operation um = om is far less than the other two selectivities. Thus, the future costs, and consequently the f-value, of the item where that operation is integrated into its currentDAGs entry are lowest. Hence, this item is processed further, and the two remaining joins are added to its currentDAGs entry as selections.
The final projection completes the DAG. Furthermore, projections which reduce the size of the intermediate relations are integrated into the DAG.
The resulting DAG is given in Figure 5. Further optimizations will map the operations to physical operations. Since every σπ and every ⋈π sequence entails only one physical operation, the resulting DAG is divided by dashed horizontal lines.
7 Conclusion

A novel architecture for query optimization, based on a blackboard which is organized into successive regions, has been devised. At every region, knowledge sources are activated which consecutively complete alternative query evaluation plans. Starting from basic building blocks, a finite set of algebraic operations is added such that a DAG finally results in a (logical) query evaluation plan.
² Actually, in order to introduce um = om, the expansions ud and od have to be added first. This detail is omitted, since the comparison of the items obtained after incorporating the joins gives an idea of the importance of the future cost estimates.
[Figure 5: Resulting DAG of the Sample Optimization. The DAG scans ⟨⟨Emp.skill⟩⟩ and renames it twice (ρ_{u=#0, usk=#1, ud=id, usa=isa, usx=isx} and ρ_{o=#0, osk=#1, od=id, osa=isa, osx=isx}), expands worksIn, salary, and sex (χ_{id:#0.worksIn, isa:#0.salary, isx:#0.sex}), applies the selections σ_{usx='F'} and σ_{osx='M'}, expands the managers (χ_{um:ud.mgr} and χ_{om:od.mgr}), joins on um = om with the residual predicates usk > osk and usa < osa, and ends with the projection computing mcp:um, underPaid:u, overPaid:o, and difference:osa - usa. Dashed horizontal lines separate the physical operations.]
Due to this well-structured approach, the optimizer can continually be improved. By backpropagating the optimized queries, each knowledge source can be calibrated and assessed. Thus, the weak points of the optimizer can be determined and eliminated. An evolutionary improvement takes place.
As a search strategy, A* search enriched by ballooning has been proposed. By subdividing the costs of each alternative into history and future costs, A* search is able to compare possibly unfinished plans with each other. However, even in states where the way to build efficient plans is obvious, pure A* search might generate a large number of alternatives. To alleviate this, ballooning was designed to accelerate the optimizer without degrading the quality of its plans.
The viability of our approach was shown by the GOM Blackboard Optimizer. Based on an object-oriented algebra, a blackboard optimizer was specified. It was shown how a blackboard, its regions, and its knowledge sources can be designed. The search algorithm was explained, and the basics of a cost model were described. For illustration purposes, a sample optimization was demonstrated.
Acknowledgement

This work was supported by the German Research Council DFG under contracts Ke 401/6-1 and SFB 346/A1. We thank the participants of the Dagstuhl seminar on query processing organized by J. C. Freytag, D. Maier, and G. Vossen, and the attendees of a talk one of the authors gave on invitation by U. Dayal, for fruitful discussions. We also gratefully acknowledge our students K. Hauten, A. Roemer, S. Voss, and R. Waurig, who implemented the first prototype.
References

[1] J. Banerjee, W. Kim, and K. C. Kim. Queries in object-oriented databases. In Proc. IEEE Conf. on Data Engineering, pages 31-38, L.A., USA, Feb 1988.
[2] D. S. Batory. Extensible cost models and query optimization in GENESIS. IEEE Database Engineering, 10(4), Nov 1987.
[3] L. Becker and R. H. Güting. Rule-based optimization and query processing in an extensible geometric database system. ACM Trans. on Database Systems, 17(2):247-303, Jun 1992.
[4] S. Cluet and C. Delobel. A general framework for the optimization of object-oriented queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 383-392, San Diego, USA, Jun 1992.
[5] J. C. Freytag. A rule-based view of query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 173-180, San Francisco, USA, 1987.
[6] G. Graefe and D. J. DeWitt. The EXODUS optimizer generator. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 160-172, San Francisco, USA, 1987.
[7] G. Graefe and W. J. McKenna. The Volcano optimizer generator: Extensibility and efficient search. In Proc. IEEE Conf. on Data Engineering, pages 209-218, Wien, Austria, Apr 1993.
[8] T. Ibaraki and T. Kameda. Optimal nesting for computing N-relational joins. ACM Trans. on Database Systems, 9(3):482-502, 1984.
[9] Y. E. Ioannidis and Y. C. Kang. Cost wells in random graphs. Submitted for publication, Jun 1992.
[10] Y. E. Ioannidis and E. Wong. Query optimization by simulated annealing. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 9-22, San Francisco, USA, 1987.
[11] A. Kemper, C. Kilger, and G. Moerkotte. Function materialization in object bases. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 258-268, Denver, USA, May 1991.
[12] A. Kemper and G. Moerkotte. Access support in object bases. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 364-374, Atlantic City, USA, May 1990.
[13] A. Kemper and G. Moerkotte. Advanced query processing in object bases using access support relations. In Proc. of the Conf. on Very Large Data Bases (VLDB), pages 290-301, Brisbane, Australia, Aug 1990.
[14] A. Kemper and G. Moerkotte. Query optimization in object bases: Exploiting relational techniques. In J.-C. Freytag, D. Maier, and G. Vossen, editors, Query Optimization in Next-Generation Database Systems. Morgan-Kaufmann, 1993 (forthcoming).
[15] A. Kemper, G. Moerkotte, and K. Peithner. A blackboard architecture for query optimization in object bases. Technical Report #92-31, RWTH Aachen, 1992.
[16] R. Lanzelotte and J.-P. Cheiney. Adapting relational optimisation technology for deductive and object-oriented declarative database languages. In Database Programming Languages Workshop, pages 322-336, Nafplion, Greece, August 1991.
[17] R. S. G. Lanzelotte and P. Valduriez. Extending the search strategy in a query optimizer. In Proc. of the Conf. on Very Large Data Bases (VLDB), pages 363-373, Barcelona, Spain, Sep 1991.
[18] G. M. Lohman. Grammar-like functional rules for representing query optimization alternatives. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 18-27, Chicago, USA, 1988.
[19] G. Mitchell, S. B. Zdonik, and U. Dayal. An architecture for query processing in persistent object stores. In Proc. Hawaii Intl. Conference on System Sciences, 1992.
[20] J. Pearl. Heuristics. Addison-Wesley, Reading, Massachusetts, 1984.
[21] G. Piatetsky-Shapiro and C. Connell. Accurate estimation of the number of tuples satisfying a condition. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 256-276, Boston, USA, Jun 1984.
[22] A. Rosenthal and U. S. Chakravarthy. Anatomy of a modular multiple query optimizer. In Proc. of the Conf. on Very Large Data Bases (VLDB), pages 230-239, L.A., USA, Sep 1988.
[23] A. Rosenthal and D. Reiner. An architecture for query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 246-255, Jun 1982.
[24] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 23-34, Boston, USA, Jun 1979.
[25] T. K. Sellis. Multiple-query optimization. ACM Trans. on Database Systems, 13(1):23-52, Mar 1988.
[26] D. D. Straube and M. T. Özsu. Execution plan generation for an object-oriented data model. In Proc. of the Intl. Conf. on Database Theory (ICDT), pages 43-67, Munich, F.R.G., Dec 1991. LNCS #470, Springer-Verlag.
[27] S. B. Yao. Approximating block accesses in database organizations. Communications of the ACM, 20(4):260-261, Apr 1977.