Exploiting Gradient Information in Numerical
Multi–Objective Evolutionary Optimization
Peter A.N. Bosman
Centre for Mathematics and Computer Science
P.O. Box 94079
1090 GB Amsterdam
The Netherlands
Peter.Bosman@cwi.nl
Edwin D. de Jong
Institute of Information and Computing Sciences
Utrecht University
P.O. Box 80089
3508 TB Utrecht
The Netherlands
dejong@cs.uu.nl
ABSTRACT
Various multi–objective evolutionary algorithms (MOEAs)
have obtained promising results on various numerical multi–
objective optimization problems. The combination with gra-
dient–based local search operators has however been limited
to only a few studies. In the single–objective case it is known
that the additional use of gradient information can be bene-
ficial. In this paper we provide an analytical parametric de-
scription of the set of all non–dominated (i.e. most promis-
ing) directions in which a solution can be moved such that
its objectives either improve or remain the same. Moreover,
the parameters describing this set can be computed
efficiently using only the gradients of the individual objec-
tives. We use this result to hybridize an existing MOEA with
a local search operator that moves a solution in a randomly
chosen non–dominated improving direction. We test the re-
sulting algorithm on a few well–known benchmark problems
and compare the results with the same MOEA without lo-
cal search and the same MOEA with gradient–based tech-
niques that use only one objective at a time. The results
indicate that exploiting gradient information based on the
non–dominated improving directions is superior to using the
gradients of the objectives separately and that it can further-
more improve the result of MOEAs in which no local search
is used, given enough evaluations.
Categories and Subject Descriptors
G.1.6 [Numerical Analysis]: Optimization—Gradient meth-
ods; I.2 [Artificial Intelligence]: Problem Solving, Con-
trol Methods, and Search
General Terms
Algorithms, Performance, Experimentation, Theory
Keywords
Evolutionary Algorithms, Memetic Algorithms, Multi–Ob-
jective Optimization, Numerical Optimization, Gradients
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
GECCO’05, June 25–29, 2005, Washington, DC, USA.
Copyright 2005 ACM 1-59593-010-8/05/0006 ...$5.00.
1. INTRODUCTION
Evolutionary algorithms (EAs) seek to exploit the global
structure of the search space. Examples are the use of
crossover, the aim of which is to fruitfully exchange good
partial solutions, and the use of estimation–of–distribution
techniques, which provide a way of discovering and using
information about variable linkage. Information about the
local structure of the search space is often disregarded.
In many problems of interest however, particularly those
defined over continuous spaces, local structure can provide a
great deal of information about the directions in which im-
provement can be achieved. Indeed, a substantial part of the
methods studied in the field of machine learning are based
on the principle of following the gradient of a performance
function [14]. Hence, a sensible question is whether EAs for
numerical (i.e. real–valued) optimization can be improved
by incorporating local optimization techniques.
For single-objective problems, the notion of combining
EAs with local (possibly gradient–based) optimization is
well known. The resulting hybrid EAs have mainly become
known under the name of Memetic Algorithms [15]. Also
in multi–objective optimization, the use of such approaches
has recently come into focus. The main focus has been on
combinatorial optimization problems [9, 12, 20].
In single–objective numerical optimization, the gradient
∇f of the function f to be optimized conveys useful infor-
mation. For any l–dimensional point y, the negative gradient
at that point, −(∇f(x))(y), is the direction of greatest decrease
of f starting from y. Hence this direction can be used in an
algorithm to find local minima of f. Many such algorithms
exist, ranging from straightforward ones such as gradient de-
scent to more advanced ones such as conjugate gradients [8].
These algorithms are efficient at finding local minima. Hy-
bridizing EAs with such gradient–based optimization algo-
rithms has been shown to lead to good results for single–
objective optimization [1]. A natural question therefore is
how we can expand gradient–based techniques to a multi–
objective setting in an attempt to improve upon existing
MOEAs for numerical multi–objective optimization.
In multi-objective optimization the situation is however
more complicated. For instance, there is typically no single
direction to move a solution in so that all objectives are im-
proved simultaneously. Therefore, important first questions
for applying gradient techniques in a multi–objective setting
are whether we can and should use the gradients of the in-
dividual objective functions separately, what the gradient of
a multi–objective function looks like and how we can utilize
the information conveyed by this gradient. In this article
we will provide an answer to these questions and use these
answers to construct new hybrid MOEAs.
In this work, we compare three approaches to using gradi-
ent information. First, we consider a method that randomly
chooses an objective and optimizes it using standard single–
objective gradient–based optimization techniques. Different
individuals will thus be optimized with respect to differ-
ent objectives. This method has strong similarities with
the early multi-objective method VEGA [18]. Next, we
consider a method which, for a given individual, optimizes
the various objectives in turn, again using standard single–
objective gradient–based optimization techniques. A more
interesting question however is whether all objectives in a
multi–objective problem can be optimized simultaneously
by moving a solution in a specific direction. Unless the so-
lution is trapped in a multi–objective local–optimum, using
such a direction will always improve the quality of the solu-
tion. Our third method achieves this aim by identifying all
non–dominated directions starting from the current point in
which all objectives either improve or remain the same.
2. RELATED WORK
Currently, there exist only a few publications regarding
real–valued multi–objective memetic algorithms (or hybrid
EAs), the best–known of which is the M-PAES [10]. How-
ever, the M-PAES does not explicitly make use of gradient
information. Lahanas, Baltas and Giannouli do use gradi-
ent information explicitly [13]. However, they use weighted
aggregation to construct a single objective function which is
subsequently optimized. Hence, there is no guarantee of op-
timizing all objectives simultaneously. Brown and Smith [4]
use a hybrid EA based on an analytic description by Fliege
and Svaiter [7] of a direction that has the specific property
that it is the largest direction of simultaneous improvement.
This direction is referred to by the authors as the multi–
objective gradient. A multilevel subdivision technique that
subdivides the search space to perform local search in each
subspace based on a similar derivation of a single direction
of descent by Schäffler, Schultz and Weinzierl [19] was pro-
posed by Dellnitz, Schütze and Hestermeyer [6]. However,
if the objectives have different ranges, the largest direction
of simultaneous descent will be biased towards the objective
with the largest range. Even if the objectives are first
scaled the same way there are, as we shall show, still multiple
(typically infinitely many) directions of improvement that do not
dominate each other (for instance improving objective 0 and
leaving objective 1 unchanged versus improving objective 1
and leaving objective 0 unchanged). All of these directions
are equally useful in multi–objective optimization.
The difference with our work is that we analytically de-
scribe the complete set of non–dominated simultaneously
improving directions. Hence, we consider the multi–objective
gradient to be a set of directions (specifically an (m − 1)–di-
mensional manifold in an m–dimensional space where m is
the number of objectives). Moreover, the hybrid EA used
by Brown and Smith is only tested on a 2–dimensional, 2–
objective quadratic test problem that has very nice gradient
properties, which is not expected to be a good practical test–
case. Here we use a well–known set of 5 benchmark tests
that have a higher dimensionality and vary in difficulty [21].
3. GRADIENTS & MULTIPLE OBJECTIVES

3.1 Single objective
The gradient of f returns for any point y the direction of
greatest increase of f starting from y and it is defined as:

∇f(x) = (∂f/∂x_0, ∂f/∂x_1, ..., ∂f/∂x_{l−1})    (1)

The direction of greatest decrease of f starting from y
is just the negative gradient at y, i.e. −(∇f(x))(y). The
directional derivative of f in direction û is a function that
for any point y returns the rate of change of f in direction
û and it is defined as:

(∇_û f(x))(y) = ((∇f(x))(y))^T û    (2)

Clearly, it can be shown that the directional derivative
(in some point) is minimal if and only if û points in the
same direction as the negative gradient of f, i.e. û = ĝ
where g = −∇f(x). Hence, finding the direction of greatest
decrease of f at any point y is actually a single–objective
optimization problem defined as:

min_û {(∇_û f(x))(y)}    (3)

The answer to this optimization problem of course is the
negative gradient of f at point y, i.e. −(∇f(x))(y).
3.2 Multiple objectives
We assume to have m real–valued objective functions that,
without loss of generality, we seek to minimize simultane-
ously. We denote the objective functions by fi(x) where
i ∈ {0,1,...,m − 1} and x ∈ Rl.
To keep the analogy with the single–objective setting, we
would like to define the gradient of f = (f0,f1,...,fm−1)
for any point y as the direction of greatest increase of f
starting from y. However, in a multi–objective setting there
is in general no single such direction as we shall see.
Observe the optimization–problem definition of finding
the best direction, i.e. find the unit direction û such that
the directional derivative in direction û indicates the best
improvement. In the multi–objective case we define the di-
rectional derivative ∇_û f(x) to be a vector such that the ith
component of that vector is the directional derivative in the
ith objective, i.e. (∇_û f(x))_i = ∇_û f_i(x), or:
∇_û f(x) = G û    (4)

where G = (∇f_0(x), ..., ∇f_{m−1}(x))^T, i.e. the m × l matrix
whose rows are (∇f_0(x))^T through (∇f_{m−1}(x))^T.
To find the best direction û we must solve the optimiza-
tion problem in Equation 3 with the scalar f replaced by the
vector function f = (f_0, f_1, ..., f_{m−1}). This opti-
mization problem is now multi–objective. Intuitively, this is
because a direction indicates improvement in f if and only
if the directional derivatives of all individual objective func-
tions are non–positive (i.e. improvement in all objectives or
remain the same). Now, similar to the single–objective case
we are ultimately interested in the direction that maximizes
the improvement (i.e. the negative gradient). Hence, in the
multi–objective case we prefer the set of directions that cor-
respond to the Pareto–optimal front of improving directions.
Indeed, in the multi–objective setting there is thus in general
no single direction of greatest increase of f starting from y.
An important question now is whether we can find a para-
metric description of the set of directions that corresponds
to the Pareto–optimal front of improving directions. Oth-
erwise the only way to proceed is to run a multi–objective
optimization algorithm just to find a good simultaneously
improving direction, which clearly is less preferable.
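As a concrete illustration with hypothetical gradients (assumed values, not from the paper): stacking the objective gradients as rows gives G, and multiplying by a unit direction yields one directional derivative per objective.

```python
import numpy as np

# Hypothetical gradients of m = 2 objectives at some point y (l = 3 variables).
grad_f0 = np.array([2.0, 0.0, 1.0])
grad_f1 = np.array([0.0, 4.0, 1.0])
G = np.vstack([grad_f0, grad_f1])   # the m x l matrix G of Equation 4

u = np.array([-1.0, -1.0, 0.0])     # a candidate direction
u_hat = u / np.linalg.norm(u)
dd = G @ u_hat                      # multi-objective directional derivative
# Both components of dd are negative: u_hat improves both objectives at once.
```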
[Figure 1: Top: unit directions in a 3–dimensional parameter space (axes û_0, û_1, û_2). Bottom: mapping of the unit directions to the 2–dimensional multi–objective directional–derivative space (axes ∇_û f_0(x), ∇_û f_1(x)). The legend distinguishes the unit directions describing the complete ellipsoid, the ellipsoid perimeter, the ellipsoid extrema (û_ex,i), the ellipsoid surface between extrema (U_ex), the ellipsoid negative–subspace extrema (û_neg–ex,i) and the surface between the negative–subspace extrema (U_neg–ex).]
Each of the m objectives in the function to optimize in
Equation 4 is just a linearly weighted sum of the compo-
nents of the direction vector, i.e. the jth objective, j ∈
{0,1,...,m−1}, is −(∇f_j(x))^T û = Σ_{i=0}^{l−1} −(∇f_j(x))_i û_i.
Now because the û to optimize over lie on the surface of a
unit hypersphere of dimensionality l, the set of all objective
vectors forms an ellipsoid of dimensionality m. Moreover, if
m < l, which is usually the case, the objective vectors lie
both on the surface of the ellipsoid as well as inside the el-
lipsoid. The points on this surface are given by directions û
on the l–dimensional hypersphere that are themselves con-
nected through an m–dimensional hypersphere. For instance,
if l = 3 and m = 2, then the û lie on a unit sphere, the
objective vectors ∇_û f(x) lie on and inside a 2–dimensional
ellipsoid and the û that are projected onto the perimeter of
this ellipsoid together form a unit circle across the surface of
the unit sphere. This case is illustrated in Figure 1.

Since we are interested in extrema (i.e. non–dominated
directional derivatives), we only want to describe the surface
of the ellipsoid. Specifically, we are interested in the part
of the surface that intersects with (−∞,0]^m since we are
only interested in negative directional derivatives for our
minimization task. This part of the surface of the ellipsoid
is described by all unit vectors that are contained between
the specific unit vectors û_ex,i, i ∈ {0,1,...,m−1}, for which
the multi–objective directional derivative ∇_{û_ex,i} f(x) is an
extremum in the ith objective. This set of vectors, which
we will denote U_ex, contains all vectors that are obtained by
taking a linearly weighted sum of the extrema unit vectors
û_ex,i. Note that these vectors have to be normalized to
ensure that they are unit vectors:

U_ex = { (Σ_{i=0}^{m−1} α_i û_ex,i) / ‖Σ_{i=0}^{m−1} α_i û_ex,i‖ | α_i ∈ [0,1], Σ_{i=0}^{m−1} α_i = 1 }    (5)

However, set U_ex isn't tight enough because it can still
involve directions for which not all components of the multi–
objective directional derivative are negative simultaneously.
This is illustrated for our example case in Figure 1. Hence we
need m additional constraints to get the desired set U_neg–ex:

U_neg–ex = { (Σ_{i=0}^{m−1} α_i û_ex,i) / ‖Σ_{i=0}^{m−1} α_i û_ex,i‖ | α_i ∈ [0,1], Σ_{i=0}^{m−1} α_i = 1, ∇_{Σ_{i=0}^{m−1} α_i û_ex,i} f_j(x) ≤ 0 for all j }    (6)

The ultimate goal is to take a random direction from set
U_neg–ex. A straightforward way to do so using Equation 6 is
to repeatedly draw values for the α_i until the m inequality
constraints are met. However, the number of unit vectors for
which all directional derivatives are at most 0 may be signif-
icantly smaller than the number of unit vectors for which at
least one directional derivative is smaller than 0, especially
if m becomes larger. Hence it would be convenient to have a
sampling method without constraints. To this end we first
find all m unit vectors within set U_ex that map to corner
points of U_neg–ex, i.e. all unit vectors û_neg–ex,i for which the
multi–objective directional derivative is negative in the ith
direction and zero in all other directions. This means that
we must solve ∇_{û_neg–ex,i} f(x) = −λ_i ê_i, where ê_i is a unit
column vector with a 1 at position i and zeros at all other
positions and λ_i is a non–negative real value, λ_i ≥ 0. More-
over, there is a unique solution to this equation because in
the objective space solving this equation means finding the
intersection of the negative part of an axis with the sur-
face of an ellipsoid that is centered at the origin. Using the
definition in Equation 4 we obtain:

G û_neg–ex,i = −λ_i ê_i  ⇔  G (1/(λ_i ‖u_neg–ex,i‖)) u_neg–ex,i = −ê_i    (7)

Since 1/(λ_i ‖u_neg–ex,i‖) is just a scaling factor and since further-
more Equation 7 must have a unique solution, also G u_neg–ex,i =
−ê_i must have a unique solution, which we can use to find
û_neg–ex,i = u_neg–ex,i/‖u_neg–ex,i‖. Because matrix G is an
m × l matrix, it is generally speaking not square and can-
not be inverted. However, we know that û_neg–ex,i ∈ U_ex and
hence we can write u_neg–ex,i = Σ_{j=0}^{m−1} α^i_j û_ex,j where we do
not know the α^i_j. We now have:

G (Σ_{j=0}^{m−1} α^i_j û_ex,j) = ( Σ_{j=0}^{m−1} α^i_j (∇f_0(x))^T û_ex,j, ..., Σ_{j=0}^{m−1} α^i_j (∇f_{m−1}(x))^T û_ex,j )^T    (8)

We define a matrix D such that D_kj = (∇f_k(x))^T û_ex,j
and a vector α^i such that (α^i)_k = α^i_k. Then G u_neg–ex,i can
be rewritten as Dα^i. Moreover, because D is a square
matrix, we can find a unique solution for u_neg–ex,i through
α^i if we solve Dα^i = ê_i by inverting D, i.e. α^i = D^{−1} ê_i.
Finally, we note that û_ex,j, the extremum in the negative
direction of the multi–objective directional derivative in the
jth dimension, is, using the result of Equation 3, of course
just −(∇f_j(x)). We can therefore now write:

û_neg–ex,i = −(Σ_{j=0}^{m−1} (D^{−1} ê_i)_j ∇f_j(x)) / ‖Σ_{j=0}^{m−1} (D^{−1} ê_i)_j ∇f_j(x)‖    (9)

This latter result now allows us to rewrite Equation 6 and
define the set U_neg–ex of unit directions to choose from for
multi–objective gradient search without the inequality con-
straints, but in the same fashion as set U_ex instead:

U_neg–ex = { (Σ_{i=0}^{m−1} β_i û_neg–ex,i) / ‖Σ_{i=0}^{m−1} β_i û_neg–ex,i‖ | β_i ∈ [0,1], Σ_{i=0}^{m−1} β_i = 1 }    (10)
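The sampling scheme of Equations 5–10 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the normalization of the extremal directions û_ex,j is folded into the coefficients, so the Gram matrix G Gᵀ plays the role of the matrix D (up to column scaling), with signs arranged so that G û_neg–ex,i is negative in component i and zero elsewhere.

```python
import numpy as np

def sample_nondominated_direction(grads, rng):
    """Sample a random unit direction from the set U_neg-ex of Equation 10.

    grads: m x l array whose rows are the objective gradients at the current
    point (the matrix G of Equation 4)."""
    G = np.asarray(grads, dtype=float)
    m = G.shape[0]
    M = G @ G.T                                   # Gram matrix, stand-in for D
    corners = []
    for i in range(m):
        c = np.linalg.solve(M, np.eye(m)[i])      # coefficients for corner i
        u = -G.T @ c                              # G @ u = -e_i by construction
        corners.append(u / np.linalg.norm(u))     # u_neg-ex,i, normalized
    beta = rng.dirichlet(np.ones(m))              # random convex weights (Eq. 10)
    d = sum(b * u for b, u in zip(beta, corners))
    return d / np.linalg.norm(d)

rng = np.random.default_rng(1)
G = np.array([[2.0, 0.0, 1.0],                    # hypothetical gradient of f0
              [0.0, 4.0, 1.0]])                   # hypothetical gradient of f1
d = sample_nondominated_direction(G, rng)
# Every objective's directional derivative along d is <= 0.
```

Because each corner direction satisfies G û_neg–ex,i = −λ_i ê_i with λ_i ≥ 0, any convex combination of the corners has a componentwise non-positive multi-objective directional derivative.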
4. GRADIENT–BASED HYBRID MOEAS

4.1 Base MOEA
The base MOEA we use is the naive MIDEA [3]. This
MOEA is an EDA specifically designed for multi–objective
optimization. It has been shown to give good results on a
wide variety of problems defined in both discrete and con-
tinuous parameter spaces. Moreover, it is fast and easy to
understand, making it a good baseline algorithm. The fol-
lowing gives a brief overview of its main features. For specific
details the interested reader is referred to the literature [3].
The naive MIDEA maintains a population of size n. In
each generation it selects a subset of this population of size
⌊τn⌋, τ ∈ [1/n, 1), to perform variation with. By means of
variation, n − ⌊τn⌋ new solutions are generated which replace
the solutions in the population that were not selected.
Selection is performed using a diversity–preserving selec-
tion operator. Since the goal in multi–objective optimization
is both to get close to the Pareto optimal front and to get
a good diverse representation of that front, a good selection
operator must exert selection pressure with respect to both
aspects. The selection operator in the naive MIDEA does
this by using truncation selection on the basis of domination
count (i.e. the number of times a solution is dominated). If
the number of non–dominated solutions exceeds the targeted
selection size ⌊τn⌋, a nearest–neighbour heuristic in the ob-
jective space is used to ensure that a well–spread, represen-
tative subset of all non–dominated solutions is chosen.
The variation operator is geometrical in nature and is
specifically designed to provide an advantage over tradi-
tional variation operators. The selected solutions are first
clustered in the objective space. Subsequently, the actual
variation takes place only between individuals in the same
cluster, i.e. a mating restriction is employed. The ratio-
nale is that variation inside each cluster can process spe-
cific information about the different regions along the Pareto
front. Such a parallel exploration automatically gives a bet-
ter probability of obtaining a well–spread set of offspring
solutions. To further stimulate diversity along the Pareto
front each new offspring solution is constructed from a ran-
domly chosen cluster. Variation inside a single cluster is
done by estimating a one–dimensional normal–distribution
for each variable separately.
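The cluster-wise variation step can be sketched as follows. This is a minimal sketch, assuming the objective-space clustering has already been performed and each cluster is given as an array of its members' decision-variable vectors; the naive MIDEA's actual operator differs in details such as the clustering algorithm used.

```python
import numpy as np

def sample_offspring(clusters, n_offspring, rng):
    """Variation in the spirit of the naive MIDEA: pick a random cluster for
    each offspring and sample every variable from a univariate normal
    estimated from that cluster's members (mating restriction by cluster)."""
    offspring = []
    for _ in range(n_offspring):
        c = clusters[rng.integers(len(clusters))]   # one decision vector per row
        mu = c.mean(axis=0)                         # per-variable mean
        sigma = c.std(axis=0) + 1e-12               # per-variable std (kept > 0)
        offspring.append(rng.normal(mu, sigma))
    return np.array(offspring)

rng = np.random.default_rng(0)
clusters = [rng.random((10, 5)), rng.random((8, 5)) + 2.0]   # two toy clusters
kids = sample_offspring(clusters, 6, rng)
```

Drawing the cluster uniformly at random for every offspring mirrors the paper's device for stimulating diversity along the Pareto front.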
4.2 Hybrid MOEAs

4.2.1 General hybrid MOEA framework
The hybridization scheme that we employ is a genera-
tional one. At the end of a generation, i.e. after one step
of selection, variation and re–evaluation is finished, a set of
candidate solutions is determined. The gradient–based local
search operator is then applied with each of these candidate
solutions as a starting point. We identify three classes of
candidate–solution sets:
1. The entire population
This allows for the widest search and the largest prob-
ability of discovering of new local optima, but also has
the largest chance of finding improvements that are
not better than the current non–dominated solutions.
2. The selected solutions
Since the set of selected solutions contains all promis-
ing solutions from the viewpoint of the MOEA, this
set is a logical choice for selecting solutions to attempt
to improve. This set represents a rational trade–off
choice between the entire population and the set of
the non–dominated solutions only.
3. The non–dominated solutions
Although it is directly beneficial if solutions from this
set can be improved upon, there is also a consider-
able probability that the solutions in this set lie in a
multi–objective local optimum and hence cannot be
improved further using local search.
To control the ratio between how much search effort is
spent by the MOEA and how much search effort is spent by
the gradient–based local search operator, we introduce a ra-
tio parameter ρe. In our scheme we aim to keep the ratio of
the number of evaluations required by the gradient–based
local search operator and the total number of evaluations
required so far equal to ρe. To this end, the gradient–based
local search operator is applied only as long as the current
actual ratio is smaller than ρe. Moreover, the order in which
the candidate solutions are searched is randomized. We do
not suggest that this hybridization approach is optimal, but
it will point out whether a gradient–based local search algo-
rithm can aid in real–valued multi–objective optimization.
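A minimal sketch of this budget-gating scheme (the names `run_local_searches` and `local_search` are illustrative, not from the paper):

```python
import random

def run_local_searches(candidates, local_search, total_evals, ls_evals, rho_e):
    """Apply the gradient-based local search operator to candidate solutions
    in random order, but only while the fraction of all evaluations spent on
    local search stays below the ratio parameter rho_e."""
    order = list(candidates)
    random.shuffle(order)                    # randomized candidate order
    result = []
    for sol in order:
        if total_evals > 0 and ls_evals / total_evals >= rho_e:
            result.append(sol)               # ratio reached: leave solution as-is
            continue
        improved, used = local_search(sol)   # assumed to return (solution, evals)
        ls_evals += used
        total_evals += used
        result.append(improved)
    return result, total_evals, ls_evals

# Toy usage: a stand-in "local search" that improves a number at a cost of
# 10 evaluations; with rho_e = 0.1 only two of the four candidates get searched.
res, tot, ls = run_local_searches([1, 2, 3, 4], lambda s: (s - 1, 10),
                                  total_evals=100, ls_evals=0, rho_e=0.1)
```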
4.2.2 Random–objective conjugate gradients
In this straightforward approach single–objective gradient–
based local search is applied to a randomly chosen objective.
Because the local search is now focused on a single objec-
tive, more advanced single–objective algorithms known from
literature can be used. We use the conjugate gradients al-
gorithm and call the resulting strategy ROCG (Random–
Objective Conjugate Gradients). It depends completely on
the correlation between the objectives whether the best local
improvement in a single objective also leads to an improve-
ment in the other objectives. Since this is typically not the
case, the percentage of local searches that leads to an im-
provement (i.e. a solution that dominates the solution from
where the local search started) is expected to be small.
4.2.3 Alternating–objective repeated line–search
To reduce the probability of improving a single objec-
tive while making the value of another objective
worse, the objective that is searched locally can be altered
during local search. This still allows the use of well–known
single–objective approaches. However, care should be taken
in the design because it does not make sense to let the local
search in a single objective converge to a minimum. Doing
so results in the same approach as ROCG. Hence we propose
to perform a line–search (i.e. find a local minimum with re-
spect to a single direction) in a single, alternatingly chosen
objective in the direction of the negative gradient of that
objective. This process is repeated until a multi–objective
local minimum is found. In our multi–objective setting we
propose to switch to a randomly chosen different objective
once a line–search has terminated. We refer to this strategy
as AORL (Alternating–Objective Repeated Line–search).
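A sketch of AORL under simplifying assumptions: a basic backtracking line search instead of an exact one, analytically supplied gradients, and a fixed iteration budget rather than detection of a multi-objective local minimum.

```python
import numpy as np

def line_search(f, x, d, step=1.0, shrink=0.5, tries=20):
    """Backtracking line search: shrink the step until f improves along d."""
    fx = f(x)
    for _ in range(tries):
        y = x + step * d
        if f(y) < fx:
            return y
        step *= shrink
    return x

def aorl(objectives, gradients, x, iterations, rng):
    """Alternating-Objective Repeated Line-search (sketch): after every
    line search, switch to a randomly chosen *different* objective and
    follow its negative gradient."""
    i = int(rng.integers(len(objectives)))
    for _ in range(iterations):
        g = gradients[i](x)
        n = np.linalg.norm(g)
        if n < 1e-12:
            break
        x = line_search(objectives[i], x, -g / n)
        # jump to one of the other objectives uniformly at random
        i = (i + int(rng.integers(1, len(objectives)))) % len(objectives)
    return x

# Toy bi-objective problem whose Pareto set is the segment between (1,0) and (0,1).
fs = [lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2,
      lambda x: x[0] ** 2 + (x[1] - 1.0) ** 2]
gs = [lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * x[1]]),
      lambda x: np.array([2.0 * x[0], 2.0 * (x[1] - 1.0)])]
x = aorl(fs, gs, np.array([2.0, 2.0]), iterations=20, rng=np.random.default_rng(3))
```

Alternating objectives drives the point toward the region between the two single-objective minima, improving both objectives relative to the starting point.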
4.2.4 Combined–objectives repeated line–search
This final strategy uses the results from Section 3.2. The
approach is best described as a multi–objective version of
gradient descent. A line–search is performed in a promising
direction. When the line–search terminates, a new line–
search is initiated in a new promising direction found at the
new location. It was shown in Section 3.2 that there is a set
of non–dominated directions that are all most promising.
These directions are described by Equation 10. Hence, when
a line–search terminates, a random vector β is drawn such
that each β_i ∈ [0,1] and Σ_{i=0}^{m−1} β_i = 1. Equation 10 is then
used to obtain a new direction. Note that a line–search in
this case does not aim to minimize a single objective but
aims to find the best non–dominated solution in the given
direction, i.e. the line–search in this strategy is a multi–
objective search algorithm as well. We refer to this strategy
as CORL (Combined–Objectives Repeated Line–search).
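One simple way to instantiate such a multi-objective line search is to probe a fixed number of points along the direction and keep a point that dominates the starting solution; this is an illustrative sketch, not the exact procedure used in the paper.

```python
import numpy as np

def dominates(a, b):
    """Pareto dominance for minimization: a is nowhere worse, somewhere better."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def mo_line_search(f_multi, x, d, max_step=1.0, samples=32):
    """Multi-objective line search (sketch): probe points along direction d and
    return one that dominates the starting point, preferring probes that also
    dominate previously found candidates. f_multi returns all objective values."""
    best, best_f = x, f_multi(x)
    for s in np.linspace(max_step / samples, max_step, samples):
        y = x + s * d
        fy = f_multi(y)
        if dominates(fy, best_f):
            best, best_f = y, fy
    return best

# Toy usage on a bi-objective quadratic problem, searching toward the origin.
f_multi = lambda x: np.array([(x[0] - 1.0) ** 2 + x[1] ** 2,
                              x[0] ** 2 + (x[1] - 1.0) ** 2])
start = np.array([2.0, 2.0])
y = mo_line_search(f_multi, start, np.array([-1.0, -1.0]) / np.sqrt(2.0),
                   max_step=2.0)
```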
5. EXPERIMENTS

5.1 Setup

5.1.1 Multi–objective optimization problems
The problems we have used have been taken from the lit-
erature on designing difficult and interesting multi–objective
optimization problems and on comparing various MOEAs [5,
21]. Specifically, we have used the problems known as ECi,
i ∈ {1,2,3,4,6}. For specific details regarding the difficulty
of these problems we refer the interested reader to the indi-
cated literature. Here we only present their definition.
Name | Objectives | Domain
EC1 | f_0 = x_0,  f_1 = γ(1 − √(f_0/γ)),  γ = 1 + 9 Σ_{i=1}^{l−1} x_i/(l − 1) | [0,1]^30 (l = 30)
EC2 | f_0 = x_0,  f_1 = γ(1 − (f_0/γ)²),  γ = 1 + 9 Σ_{i=1}^{l−1} x_i/(l − 1) | [0,1]^30 (l = 30)
EC3 | f_0 = x_0,  f_1 = γ(1 − √(f_0/γ) − (f_0/γ)sin(10πf_0)),  γ = 1 + 9 Σ_{i=1}^{l−1} x_i/(l − 1) | [0,1]^30 (l = 30)
EC4 | f_0 = x_0,  f_1 = γ(1 − √(f_0/γ)),  γ = 1 + 10(l − 1) + Σ_{i=1}^{l−1} (x_i² − 10cos(4πx_i)) | [−1,1]×[−5,5]^9 (l = 10)
EC6 | f_0 = 1 − e^{−4x_0} sin^6(6πx_0),  f_1 = γ(1 − (f_0/γ)²),  γ = 1 + 9 (Σ_{i=1}^{l−1} x_i/(l − 1))^{0.25} | [0,1]^10 (l = 10)

5.1.2 Performance indicator

To measure performance we only consider the subset of
all non–dominated solutions in the population upon termi-
nation. We call such a subset an approximation set and
denote it by S. A performance indicator is a function of
approximation sets S and returns a real value that indicates
how good S is in some aspect. More detailed information
regarding the importance of using good performance indica-
tors for evaluation may be found in the literature [2, 11, 22].
Here we use a single performance indicator that is based
on knowledge of the optimum, i.e. the Pareto–optimal front.
We define the distance d(x_0, x_1) between two multi–objective
solutions x_0 and x_1 to be the Euclidean distance between
their objective values f(x_0) and f(x_1). The performance in-
dicator we use computes the average of the distance to the
closest solution in an approximation set S over all solutions
in the Pareto–optimal set P_S. We denote this indicator by
D_PF→S and refer to it as the distance from the Pareto–
optimal front to an approximation set. A smaller value for
this performance indicator is preferable and a value of 0
is obtained if and only if the approximation set and the
Pareto–optimal front are identical. This indicator is ideal
for evaluating performance if the optimum is known because
it describes how well the Pareto–optimal front is covered
and thereby represents an intuitive trade–off between the
diversity of the approximation set and its proximity (i.e.
closeness to the Pareto–optimal front). Even if all points
in the approximation set are on the Pareto–optimal front
the indicator is not minimized unless the solutions in the
approximation set are spread out perfectly.
Because the Pareto–optimal front may be continuous a
line integration over the entire Pareto front is required in
the definition of the performance indicator. In a practical
setting, it is easier to compute a uniformly sampled set of
many solutions along the Pareto optimal front and to use
this discretized representation of PF instead. We have used
this approach using 5000 uniformly sampled points. The
performance indicator now is defined as follows:
D_PF→S(S) = (1/|P_S|) Σ_{x_1 ∈ P_S} min_{x_0 ∈ S} {d(x_0, x_1)}    (11)
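With the Pareto-optimal front discretized into a finite set of objective vectors, as described above, Equation 11 can be computed directly (`d_pf_to_s` is an illustrative name):

```python
import numpy as np

def d_pf_to_s(pareto_front, approx_set):
    """D_PF->S of Equation 11: the average, over a discretized Pareto-optimal
    front, of the Euclidean distance to the closest member of the
    approximation set. Both arguments are arrays of objective vectors."""
    pf = np.asarray(pareto_front, dtype=float)
    s = np.asarray(approx_set, dtype=float)
    dists = np.linalg.norm(pf[:, None, :] - s[None, :, :], axis=2)  # |PF| x |S|
    return dists.min(axis=1).mean()

# A 3-point discretization of a toy front; covering it exactly gives 0.
front = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
```

An approximation set containing only one extreme of the front gets a positive value, reflecting the diversity term of the indicator.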
5.1.3 General algorithmic setup
For selection we set τ to 0.3, conforming to earlier work [3]
and the rule–of–thumb for FDA [16]. We allowed gradient–
based local search operators 10 iterations each time they
were called. We set ρe ∈ {0,0.1,0.25,0.5}. Note that for
ρe = 0 the pure naive MIDEA is obtained. Gradient in-
formation was approximated when required using Δx_i =
10^−13. Furthermore, we have used the Polak–Ribière vari-
ant of the conjugate gradient algorithm [17]. All reported
results were averaged over 30 runs, except for the conver-
gence plots, which were averaged over 100 runs.
It is important to note that all variables have a bounded
range. If the variables move outside of this range, some
objective values can become non–existent. It is therefore
important to keep the variables within their ranges. How-
ever, a simple repair mechanism that changes a variable to
its boundary value if it has exceeded this boundary value
gives artifacts that may lead us to draw false conclusions
about the performance of the tested MOEAs. If for instance
the search probes a solution that has all negative values for
each of the variables xi with i ≥ 1, then the repair mech-
anism in the case of all problems except problem EC6 sets
all these variables to 0. This is especially likely during
a gradient–search procedure because the gradient with
respect to the second objective points in the direction of all
negative values for variables xi with i ≥ 1. It is not hard
to see that the solution resulting after boundary repair lies
on the Pareto front. We have therefore adapted the local
search operators such that local search never changes a so-
lution into one that lies out of the problem range. Similarly,
the sampling procedure of the naive MIDEA is changed to
prevent the generation of solutions that are out of bounds.
5.2 Results

In Table 1 the average DPF→S value is shown for all
tested MOEAs with a maximum of either 20000 or 200000
evaluations. All population sizes between 0 and 1000 with a
stepsize of 25 were tested and ultimately the one that led
to the best average indicator value was chosen.
                   Max. eval. = 20000              Max. eval. = 200000
Grad.   C    EC1  EC2  EC3  EC4  EC6      EC1  EC2  EC3  EC4  EC6
None    –    0.43 0.83 1.17 3.28 1.21     1.88 2.10 0.56 1.58 1.17
ρe = 0.10
ROCG    P    0.53 1.33 1.31 4.31 1.44     2.07 2.28 0.61 1.55 1.40
ROCG    S    0.54 1.11 1.29 3.89 1.43     2.11 2.26 0.64 1.49 1.32
ROCG    N    0.52 1.08 1.28 4.07 1.38     2.13 2.28 0.63 1.71 1.37
AORL    P    0.49 1.25 1.30 4.03 1.39     2.12 2.28 0.62 1.60 1.42
AORL    S    0.54 1.29 1.28 3.75 1.41     2.08 2.32 0.62 1.36 1.55
AORL    N    0.51 1.28 1.31 3.97 1.41     2.05 2.35 0.63 1.38 1.55
CORL    P    0.50 1.14 1.29 4.47 1.42     2.04 2.26 0.61 1.07 1.40
CORL    S    0.51 1.15 1.28 4.22 1.46     2.03 2.26 0.58 1.10 1.41
CORL    N    0.50 1.06 1.27 4.10 1.42     2.02 2.27 0.59 1.25 1.40
ρe = 0.25
ROCG    P    0.73 1.54 1.49 5.22 1.62     2.50 2.75 0.72 1.78 1.88
ROCG    S    0.72 1.77 1.58 4.86 1.62     2.51 2.73 0.74 1.69 1.74
ROCG    N    0.76 1.89 1.57 4.35 1.69     2.53 2.73 0.74 1.62 1.86
AORL    P    0.74 1.50 1.56 3.50 1.54     2.50 2.83 0.72 1.57 2.08
AORL    S    0.73 1.86 1.55 3.49 1.64     2.53 2.89 0.78 1.64 2.35
AORL    N    0.74 1.69 1.61 4.64 1.65     2.45 2.84 0.77 1.44 2.14
CORL    P    0.67 1.59 1.50 3.62 1.80     2.35 2.59 0.71 1.02 1.65
CORL    S    0.68 1.76 1.43 3.37 1.67     2.28 2.53 0.64 1.20 2.16
CORL    N    0.68 1.68 1.44 3.39 1.55     2.33 2.53 0.66 1.31 1.86
ρe = 0.50
ROCG    P    1.44 3.37 2.04 6.34 2.37     3.94 4.34 1.05 1.82 4.72
ROCG    S    1.43 3.20 2.22 6.01 2.44     3.91 4.35 1.19 2.00 6.40
ROCG    N    1.39 3.22 2.20 6.60 2.34     3.95 4.54 1.20 1.60 7.01
AORL    P    1.40 3.30 2.31 3.25 2.23     3.88 4.57 1.23 1.36 3.45
AORL    S    1.43 3.19 2.28 3.80 2.19     3.94 4.52 1.26 1.92 5.93
AORL    N    1.39 3.32 2.28 5.30 2.31
CORL    P    1.19 3.08 1.92 3.15 2.16
CORL    S    1.14 2.97 1.81 3.54 2.29
CORL    N    1.05 3.11 1.74 4.76 2.20

Table 1: Average DPF→S indicator for all tested MOEAs and all benchmark problems for two different maximum numbers of evaluations. In the column listing the set of candidates, P = population, S = selected solutions, N = non–dominated solutions. (Values carry per–column scale factors of ·10^1, ·10^2 and ·10^3 as in the original table header.)
The first question to answer is which local gradient–based
search technique is the most promising. The hybridization
based on CORL almost always gives better results than the
other approaches. The superiority of CORL is emphasized
further in Table 2 where it is clear that CORL has by far the
largest probability of improving a solution. Hence, if local
gradient–based search is to be used, the CORL approach
proposed in this paper is the most promising one.
The next question to answer is whether the additional
use of local gradient–based search in MOEAs is a useful
approach at all. The results in Table 1 show a dominating
performance of the base MOEA alone for all problems except
EC4. Moreover, the performance generally deteriorates as
the local search operator is allowed to spend a larger share
of the evaluations (i.e. a larger ρe). At first sight this would
indicate that the added use of local gradient–based search
only hampers MOEAs. However, this general statement
needs to be qualified when the results are examined more
thoroughly.
                Max. eval. = 20000            Max. eval. = 200000
Grad.   C    EC1  EC2  EC3  EC4  EC6       EC1  EC2  EC3  EC4  EC6
None    –     0.0  0.0  0.0  0.0  0.0       0.0  0.0  0.0  0.0  0.0
ρe = 0.10
CORL    P    99.2 95.6 80.8 91.2 88.2      96.9 93.1 78.1 76.2 24.7
CORL    S    97.6 93.0 93.7 77.6 85.2      97.9 94.9 95.7 55.7 17.5
CORL    N    97.2 90.6 91.6 74.4 82.3      98.7 96.3 97.0 58.4 15.5
ρe = 0.25
CORL    P    97.3 93.7 78.9 87.1 82.3      93.9 89.6 73.9 69.3 22.7
CORL    S    96.1 94.6 90.7 87.5 75.8      95.2 93.3 92.9 24.7 23.6
CORL    N    96.6 91.3 91.2 53.3 75.4      97.6 94.9 94.5 42.3 20.6
ρe = 0.50
CORL    P    94.2 86.2 77.3 80.9 80.0      89.4 83.6 72.8 63.8 42.3
CORL    S    92.7 85.8 88.5 68.0 68.3      91.3 88.4 82.8 41.6 25.3
CORL    N    94.6 76.7 90.1 45.7 58.6      94.7 88.8 88.7 37.5 21.5
(The ROCG and AORL rows are not legible in the source and are omitted.)
Table 2: Average percentage of calls to the local–
search operator that resulted in an improvement
(i.e. a dominating solution) for all tested MOEAs
and all benchmark problems for two different max-
imum numbers of evaluations. C indicates the same
as in Figure 1.
First of all, local search is expensive. Finding a local min-
imum requires a relatively large amount of resources com-
pared to a global search across the entire landscape. More-
over, in the multi–objective case local search is relatively
speaking even more expensive because there are typically
many (Pareto–)optimal solutions and a single local search
improves only a single solution. An EA, on the other hand,
improves multiple solutions simultaneously (if variation is
successful). In multi–objective optimization this may in-
volve the improvement of large parts of the Pareto–front.
Hence, the benefits of local search only become visible if
more evaluations are allowed, especially in the multi–objec-
tive case. Indeed, Table 1 shows that for EC4 the improve-
ment of the base MOEA combined with CORL over the base
MOEA alone increases if 200000 evaluations are allowed in-
stead of 20000. But still, for all other problems, the base
MOEA combined with CORL does not outperform the base
MOEA alone.
Interestingly, although the final outcome is worse than
using the base MOEA alone, for problems EC1, EC2 and
EC3 the CORL approach does reach very high probabilities
of improving a solution. So local search does not fail in this
case. The reason why the base MOEA alone is still better
in the end for a fixed maximum number of evaluations is
that these three problems are, relatively speaking, “too easy”
for the base MOEA. For these problems, the base MOEA
is already able to move the Pareto–front by improving many
solutions simultaneously. Gradient search can then not provide
additional help fast enough because the number of evaluations
required before a solution is improved is relatively large,
making local search much more expensive. The ROCG
approach requires on average 72 evaluations per local–search
call, but has a very low improvement ratio. The AORL
approach has a slightly better improvement ratio but requires
124 evaluations per call on average. Although the CORL
approach has a very good improvement ratio, it does require
316 evaluations per call on average. Thus, the MOEA may
actually be hampered if local search is used, because it could
have moved many solutions simultaneously using the same
number of evaluations. The fundamental difference with
single–objective optimization should be noted here. The
added use of gradient search applied to only a single solution
can only help if the number of non–dominated solutions
moved simultaneously by the base MOEA is not too large.
The number of solutions for which this balance is tipped
towards the hybrid side or the non–hybrid side is of
[Two panels: average DPF→S versus population size (0–1000), with one
curve for None and one for each of ROCG, AORL and CORL at
ρe = 0.10, 0.25 and 0.50.]
Figure 2: Average DPF→S indicator values when us-
ing the complete population as the candidate set for
local search. Top: EC1, 20000 evaluations. Bottom:
EC6, 200000 evaluations.
course problem specific. The notion of fewer solutions to
move corresponds to smaller population sizes in the base
MOEA. Indeed, for smaller population sizes, the results of
the hybridized MOEA are better; only when the population
size increases does the base MOEA perform better. Although
this behavior was observed across the board, two illustrative
examples are provided in Figure 2.
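Throughout these comparisons, an “improvement” produced by the local–search operator means that the new solution Pareto–dominates the old one. For minimization this is the standard componentwise test, shown below as a generic sketch (not code from the paper):

```python
def dominates(a, b):
    # True iff objective vector a Pareto-dominates b under minimization:
    # a is no worse in every objective and strictly better in at least one.
    return (all(ai <= bi for ai, bi in zip(a, b))
            and any(ai < bi for ai, bi in zip(a, b)))
```

A call to the local–search operator is counted as successful in Table 2 exactly when the returned solution dominates its starting point.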
Generally speaking, the additional use of gradient–based
local search at rate ρe is only useful if the fraction of the
overall improvement contributed by the local search is also
at least ρe. The main reason why this ratio was not obtained
on problems EC1, EC2 and EC3 is that the base MOEA
could itself reach the optimal Pareto–front with relatively
few evaluations and could hence improve many non–dominated
solutions at the same time. On the EC4 problem, however,
reaching the optimal Pareto–front is hard for the base MOEA
alone. Moreover, the CORL approach reaches high probabilities
of improving a solution for this problem. Because it is hard
for the base MOEA to improve the solutions, the relative
contribution of the CORL local–search operator to the
improvement is large enough for the hybrid MOEA to lead
to better results. CORL can help to move some solutions
closer to the optimal Pareto–front, after which the MOEA is
able to find solutions that lie on a similar front as that
solution and so advance the entire front. The CORL approach
thus already helps quickly, as can be seen from Table 1. But
it helps even more in the longer run, as can be seen from the
same table as well as from Figure 3, in which convergence
graphs are shown for a population size of 500 and a maximum
number of evaluations of 1·10^6. Note again that only the
added use of CORL really makes a difference.
[Three panels: average DPF→S (log scale) versus the number of
evaluations (0 to 1·10^6), one panel per setting ρe = 0.10, 0.25, 0.50.]
Figure 3: Convergence graphs for various settings of
ρe for all tested hybrid MOEAs on the EC4 problem.
For problem EC6 the base MOEA alone comes close to the
optimal Pareto–front only after 200000 evaluations. This is
a lot more than for problems EC1, EC2 and EC3, where only
20000 evaluations are required to come close to the optimal
Pareto–front. Hence, the base MOEA is not very good on
the EC6 problem either. However, for this problem gradient
information is simply less helpful upon approaching the
optimal front, as can be seen from Table 2. The probability
of success for the CORL approach decreases heavily on
problem EC6 as more evaluations are allowed.
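The break–even criterion used in this analysis (gradient–based local search at rate ρe pays off only if it contributes at least a fraction ρe of the overall improvement) can be written as a one–line predicate. The sketch and its numbers are a purely hypothetical illustration, not data from the paper:

```python
def local_search_pays_off(rho_e, ls_improvement, total_improvement):
    # Break-even rule: spending a fraction rho_e of all evaluations on
    # local search is worthwhile only if local search contributes at
    # least that same fraction of the total improvement achieved.
    return ls_improvement >= rho_e * total_improvement
```

For example, if local search receives 25% of the budget but contributes only 10% of the improvement, the hybrid loses to the base MOEA under this rule.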
6. DISCUSSION
The investigation of the results in the previous section
has provided us with explanations for the observed behavior.
On problems EC1, EC2 and EC3, both the MOEA and the
CORL approach work well. However, because the MOEA
is capable of improving multiple solutions at the same time,
which requires fewer evaluations for a bigger overall
improvement, hybridization stands in the way of improvement.
On the EC6 problem the opposite happens: both the base
MOEA and the CORL approach do not work well. Still,
the MOEA approach is relatively more useful for the same
reasons and hence hybridization again stands in the way of
improvement, albeit to a lesser extent in this case. Finally,
on the EC4 problem, the base MOEA performs even worse,
but the CORL approach performs reasonably well. For this
reason we have observed superiority of the hybrid MOEA
on the EC4 problem.
Combined with EAs, local gradient–based optimization is
thus less fruitful in multi–objective optimization than in the
single–objective case. A good hybridization scheme should
therefore be carefully constructed. It would be interesting to
construct such a scheme that takes the observations made in
this paper into account and to test it on the same benchmark
as well as on additional problems of varying difficulty and
perhaps a larger dimensionality, both in the number of
objectives and in the number of problem variables. Such a
scheme would subsequently be interesting to apply to a
real–world application.
7. CONCLUSIONS
We have presented a parameterized, easy–to–use description
of the set of all non–dominated improving directions for any
point in the parameter space of a multi–objective optimization
problem. We have furthermore investigated the added use
of gradient–based local search operators for numerical
multi–objective optimization with MOEAs.
Our experiments show that the best way to exploit gradient
information in multi–objective optimization is to use the set
of all non–dominated maximally–improving directions
described in this paper. However, the added use of
gradient–based local search in a MOEA is only useful if the
relative contribution it makes to the overall improvement is
at least as big as the relative amount of resources it is allowed
to spend. We have indicated that for multi–objective
optimization this criterion is harder to achieve because MOEAs
have the ability to advance multiple solutions simultaneously
towards different regions of the optimal Pareto–front, giving
them a bigger relative advantage than in the single–objective
case. Hence, more often than in the single–objective case,
the added use of local gradient–based search is not efficient.
Thus, although we have provided a solid basis for exploiting
gradient information in numerical multi–objective evolutionary
optimization, it should not be disregarded that our results
also indicate that EAs really have an advantage over
non–population–based approaches if the goal is to obtain a
good approximation set (i.e. instead of only a single solution
on the optimal Pareto–front).