


Exploiting Gradient Information in Numerical Multi–Objective Evolutionary Optimization

Peter A.N. Bosman
Centre for Mathematics and Computer Science
P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
Peter.Bosman@cwi.nl

Edwin D. de Jong
Institute of Information and Computing Sciences
Utrecht University
P.O. Box 80089, 3508 TB Utrecht, The Netherlands
dejong@cs.uu.nl

ABSTRACT

Various multi–objective evolutionary algorithms (MOEAs) have obtained promising results on various numerical multi–objective optimization problems. The combination with gradient–based local search operators has however been limited to only a few studies. In the single–objective case it is known that the additional use of gradient information can be beneficial. In this paper we provide an analytical parametric description of the set of all non–dominated (i.e. most promising) directions in which a solution can be moved such that its objectives either improve or remain the same. Moreover, the parameters describing this set can be computed efficiently using only the gradients of the individual objectives. We use this result to hybridize an existing MOEA with a local search operator that moves a solution in a randomly chosen non–dominated improving direction. We test the resulting algorithm on a few well–known benchmark problems and compare the results with the same MOEA without local search and the same MOEA with gradient–based techniques that use only one objective at a time. The results indicate that exploiting gradient information based on the non–dominated improving directions is superior to using the gradients of the objectives separately and that it can furthermore improve the result of MOEAs in which no local search is used, given enough evaluations.

Categories and Subject Descriptors

G.1.6 [Numerical Analysis]: Optimization—Gradient methods; I.2 [Artificial Intelligence]: Problem Solving, Control Methods, and Search

General Terms

Algorithms, Performance, Experimentation, Theory

Keywords

Evolutionary Algorithms, Memetic Algorithms, Multi–Objective Optimization, Numerical Optimization, Gradients

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO'05, June 25–29, 2005, Washington, DC, USA.
Copyright 2005 ACM 1-59593-010-8/05/0006 ...$5.00.

1. INTRODUCTION

Evolutionary algorithms (EAs) seek to exploit the global structure of the search space. Examples are the use of crossover, the aim of which is to fruitfully exchange good partial solutions, and the use of estimation–of–distribution techniques, which provide a way of discovering and using information about variable linkage. Information about the local structure of the search space is often disregarded.

In many problems of interest however, particularly those defined over continuous spaces, local structure can provide a great deal of information about the directions in which improvement can be achieved. Indeed, a substantial part of the methods studied in the field of machine learning are based on the principle of following the gradient of a performance function [14]. Hence, a sensible question is whether EAs for numerical (i.e. real–valued) optimization can be improved by incorporating local optimization techniques.

For single–objective problems, the notion of combining EAs with local (possibly gradient–based) optimization is well known. The resulting hybrid EAs have mainly become known under the name of memetic algorithms [15]. Also in multi–objective optimization, the use of such approaches has recently come into focus. The main focus has been on combinatorial optimization problems [9, 12, 20].

In single–objective numerical optimization, the gradient ∇f of the function f to be optimized conveys useful information. For any l–dimensional point y, the negative gradient at that point, −(∇f(x))(y), is the direction of greatest decrease of f starting from point y. Hence this direction can be used in an algorithm to find local minima of f. Many such algorithms exist, ranging from straightforward ones such as gradient descent to more advanced ones such as conjugate gradients [8]. These algorithms are efficient at finding local minima. Hybridizing EAs with such gradient–based optimization algorithms has been shown to lead to good results for single–objective optimization [1]. A natural question therefore is how we can extend gradient–based techniques to a multi–objective setting in an attempt to improve upon existing MOEAs for numerical multi–objective optimization.

In multi–objective optimization the situation is however more complicated. For instance, there is typically no single direction in which a solution can be moved so that all objectives are improved simultaneously. Therefore, important first questions for applying gradient techniques in a multi–objective setting are whether we can and should use the gradients of the individual objective functions separately, what the gradient of a multi–objective function looks like, and how we can utilize the information conveyed by this gradient. In this article we provide answers to these questions and use these answers to construct new hybrid MOEAs.

In this work, we compare three approaches to using gradient information. First, we consider a method that randomly chooses an objective and optimizes it using standard single–objective gradient–based optimization techniques. Different individuals will thus be optimized with respect to different objectives. This method has strong similarities with the early multi–objective method VEGA [18]. Next, we consider a method which, for a given individual, optimizes the various objectives in turn, again using standard single–objective gradient–based optimization techniques. A more interesting question however is whether all objectives in a multi–objective problem can be optimized simultaneously by moving a solution in a specific direction. Unless the solution is trapped in a multi–objective local optimum, using such a direction will always improve the quality of the solution. Our third method achieves this aim by identifying all non–dominated directions starting from the current point in which all objectives either improve or remain the same.
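As a reminder of the single–objective building block these hybrids rely on, the following is a minimal gradient–descent sketch in Python; the quadratic objective and the step size are made-up illustrations, not taken from the paper.

```python
# Minimal single-objective gradient descent on a hypothetical quadratic
# f(x) = (x0 - 1)^2 + 4*x1^2, whose minimum lies at (1, 0).

def f(x):
    return (x[0] - 1.0) ** 2 + 4.0 * x[1] ** 2

def grad_f(x):
    return [2.0 * (x[0] - 1.0), 8.0 * x[1]]

def gradient_descent(x, step=0.1, iters=100):
    for _ in range(iters):
        g = grad_f(x)
        # move against the gradient, the direction of greatest decrease
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

x_min = gradient_descent([3.0, 2.0])
```

With a fixed small step size the iterate contracts towards (1, 0); more advanced methods such as conjugate gradients replace the fixed step with line searches and conjugate directions.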

2. RELATED WORK

Currently, there exist only a few publications regarding real–valued multi–objective memetic algorithms (or hybrid EAs), the best–known of which is M-PAES [10]. However, M-PAES does not explicitly make use of gradient information. Lahanas, Baltas and Giannouli do use gradient information explicitly [13]. However, they use weighted aggregation to construct a single objective function which is subsequently optimized. Hence, there is no guarantee of optimizing all objectives simultaneously. Brown and Smith [4] use a hybrid EA based on an analytic description by Fliege and Svaiter [7] of a direction that has the specific property that it is the largest direction of simultaneous improvement. This direction is referred to by the authors as the multi–objective gradient. A multilevel subdivision technique that subdivides the search space to perform local search in each subspace, based on a similar derivation of a single direction of descent by Schäffler, Schultz and Weinzierl [19], was proposed by Dellnitz, Schütze and Hestermeyer [6]. However, if the objectives have different ranges, the largest direction of simultaneous descent will be biased towards the objective with the largest range. Even if the objectives are first scaled the same way there are, as we shall show, still multiple (typically infinitely many) directions of improvement that do not dominate each other (for instance improving objective 0 and leaving objective 1 unchanged versus improving objective 1 and leaving objective 0 unchanged). All of these directions are equally useful in multi–objective optimization.

The difference with our work is that we analytically describe the complete set of non–dominated simultaneously improving directions. Hence, we consider the multi–objective gradient to be a set of directions (specifically an (m − 1)–dimensional manifold in an m–dimensional space, where m is the number of objectives). Moreover, the hybrid EA used by Brown and Smith is only tested on a 2–dimensional, 2–objective quadratic test problem that has very nice gradient properties, which is not expected to be a good practical test case. Here we use a well–known set of 5 benchmark tests that have a higher dimensionality and vary in difficulty [21].

3. GRADIENTS & MULTIPLE OBJECTIVES

3.1 Single objective

The gradient of f returns for any point y the direction of greatest increase of f starting from y and it is defined as:

∇f(x) = (∂f/∂x0, ∂f/∂x1, ..., ∂f/∂xl−1)    (1)

The direction of greatest decrease of f starting from y is just the negative gradient at y, i.e. −(∇f(x))(y). The directional derivative of f in direction u is a function that for any point y returns the rate of change of f in direction u and it is defined as:

(∇_u f(x))(y) = ((∇f(x))(y))ᵀû    (2)

It can be shown that the directional derivative (in some point) is minimal if and only if û points in the same direction as the negative gradient of f, i.e. û = ĝ where g = −∇f(x). Hence, finding the direction of greatest decrease of f at any point y is actually a single–objective optimization problem defined as:

min_û {(∇_u f(x))(y)}    (3)

The answer to this optimization problem of course is the negative gradient of f at point y, i.e. −(∇f(x))(y).
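The claim behind Equation 3 can be checked numerically: among unit directions, the directional derivative g·û is smallest when û points along the negative gradient. The gradient vector below is a made-up example.

```python
# Numerical check that the directional derivative is minimized by the
# normalized negative gradient (Cauchy-Schwarz: g.u >= -||g|| for unit u).
import math
import random

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def directional_derivative(grad, u):
    return sum(gi * ui for gi, ui in zip(grad, u))

grad = [2.0, -1.0, 0.5]            # hypothetical gradient (grad f)(y)
best = unit([-g for g in grad])    # u = g_hat with g = -grad f
d_best = directional_derivative(grad, best)   # equals -||grad||

random.seed(0)
samples = [unit([random.gauss(0, 1) for _ in range(3)]) for _ in range(1000)]
assert all(directional_derivative(grad, u) >= d_best - 1e-12 for u in samples)
```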

3.2 Multiple objectives

We assume to have m real–valued objective functions that, without loss of generality, we seek to minimize simultaneously. We denote the objective functions by fi(x) where i ∈ {0, 1, ..., m − 1} and x ∈ ℝˡ.

To keep the analogy with the single–objective setting, we would like to define the gradient of f = (f0, f1, ..., fm−1) for any point y as the direction of greatest increase of f starting from y. However, in a multi–objective setting there is in general no single such direction, as we shall see.

Observe the optimization–problem definition of finding the best direction, i.e. find the unit direction û such that the directional derivative in direction û indicates the best improvement. In the multi–objective case we define the directional derivative ∇_u f(x) to be a vector such that the ith component of that vector is the directional derivative in the ith objective, i.e. (∇_u f(x))i = ∇_u fi(x), or:

∇_u f(x) = Gû    (4)

where G = (∇f0(x), ..., ∇fm−1(x))ᵀ is the matrix whose rows are (∇f0(x))ᵀ, ..., (∇fm−1(x))ᵀ.

To find the best direction û we must solve the optimization problem in Equation 3 with f replaced by f. This optimization problem is now multi–objective. Intuitively, this is because a direction indicates improvement in f if and only if the directional derivatives of all individual objective functions are non–positive (i.e. all objectives improve or remain the same). Now, similar to the single–objective case, we are ultimately interested in the direction that maximizes the improvement (i.e. the analogue of the negative gradient). Hence, in the multi–objective case we prefer the set of directions that correspond to the Pareto–optimal front of improving directions. Indeed, in the multi–objective setting there is thus in general no single direction of greatest increase of f starting from y.

An important question now is whether we can find a parametric description of the set of directions that corresponds to the Pareto–optimal front of improving directions. Otherwise the only way to proceed is to run a multi–objective optimization algorithm just to find a good simultaneously improving direction, which clearly is less preferable.
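The condition "all directional derivatives non–positive" from Equation 4 translates directly into a predicate on G·û. The gradients below are illustrative values, not from the paper.

```python
# A direction u improves a minimization problem iff every component of the
# multi-objective directional derivative G.u is non-positive.

def mo_directional_derivative(G, u):
    # G has one row per objective gradient; returns (grad f_j).u per objective
    return [sum(gi * ui for gi, ui in zip(row, u)) for row in G]

def is_improving(G, u, eps=0.0):
    return all(d <= eps for d in mo_directional_derivative(G, u))

G = [[1.0, 0.0], [0.0, 1.0]]            # grad f0 and grad f1 as rows
assert is_improving(G, [-1.0, -1.0])    # both objectives decrease
assert not is_improving(G, [1.0, -1.0]) # f0 would increase
```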


[Figure 1 appears here. Caption: Top: unit directions in a 3–dimensional parameter space. Bottom: mapping of the unit directions to the 2–dimensional multi–objective directional–derivative space (∇_û f0(x) against ∇_û f1(x)); marked are the complete ellipsoid, the ellipsoid perimeter, the ellipsoid extrema (û_ex,i), the ellipsoid surface between the extrema (U_ex), the ellipsoid negative–subspace extrema (û_neg–ex,i) and the surface between the negative–subspace extrema (U_neg–ex).]

Each of the m objectives in the function to optimize in Equation 4 is just a linearly weighted sum of the components of the direction vector, i.e. the jth objective, j ∈ {0, 1, ..., m − 1}, is −(∇fj(x))ᵀû = Σ_{i=0}^{l−1} −(∇fj(x))i ûi. Now, because the û to optimize over lie on the surface of a unit hypersphere of dimensionality l, the set of all objective vectors forms an ellipsoid of dimensionality m. Moreover, if m < l, which is usually the case, the objective vectors lie both on the surface of the ellipsoid as well as inside the ellipsoid. The points on this surface are given by directions û on the l–dimensional hypersphere that are themselves connected through an m–dimensional hypersphere. For instance, if l = 3 and m = 2, then the û lie on a unit sphere, the objective vectors ∇_u f(x) lie on and inside a 2–dimensional ellipsoid, and the û that are projected onto the perimeter of this ellipsoid together form a unit circle across the surface of the unit sphere. This case is illustrated in Figure 1.

Since we are interested in extrema (i.e. non–dominated directional derivatives), we only want to describe the surface of the ellipsoid. Specifically, we are interested in the part of the surface that intersects with (−∞, 0]ᵐ since we are only interested in negative directional derivatives for our minimization task. This part of the surface of the ellipsoid is described by all unit vectors that are contained between the specific unit vectors û_ex,i, i ∈ {0, 1, ..., m − 1}, for which the multi–objective directional derivative ∇_{û_ex,i} f(x) is an extremum in the ith objective. This set of vectors, which we will denote U_ex, contains all vectors that are obtained by taking a linearly weighted sum of the extrema unit vectors û_ex,i. Note that these vectors have to be normalized to ensure that they are unit vectors:

U_ex = { Σ_{i=0}^{m−1} αi û_ex,i / ‖Σ_{i=0}^{m−1} αi û_ex,i‖  |  αi ∈ [0,1], Σ_{i=0}^{m−1} αi = 1 }    (5)

However, set U_ex isn't tight enough because it can still involve directions for which not all components of the multi–objective directional derivative are negative simultaneously. This is illustrated for our example case in Figure 1. Hence we need m additional constraints to get the desired set U_neg–ex:

U_neg–ex = { Σ_{i=0}^{m−1} αi û_ex,i / ‖Σ_{i=0}^{m−1} αi û_ex,i‖  |  αi ∈ [0,1], Σ_{i=0}^{m−1} αi = 1, ∇_{Σ_{i=0}^{m−1} αi û_ex,i} fj(x) ≤ 0 for all j }    (6)

The ultimate goal is to take a random direction from set U_neg–ex. A straightforward way to do so using Equation 6 is to repeatedly draw values for the αi until the m inequality constraints are met. However, the number of unit vectors for which all directional derivatives are at most 0 may be significantly smaller than the number of unit vectors for which at least one directional derivative is smaller than 0, especially if m becomes larger. Hence it would be convenient to have a sampling method without constraints. To this end we first find all m unit vectors within set U_ex that map to corner points of U_neg–ex, i.e. all unit vectors û_neg–ex,i for which the multi–objective directional derivative is negative in the ith direction and zero in all other directions. This means that we must solve ∇_{û_neg–ex,i} f(x) = −λi êi, where êi is a unit column vector with a 1 at position i and zeros at all other positions, and λi is a non–negative real value, λi ≥ 0. Moreover, there is a unique solution to this equation because in the objective space solving this equation means finding the intersection of the negative part of an axis with the surface of an ellipsoid that is centered at the origin. Using the definition in Equation 4 we obtain:

Gû_neg–ex,i = −λi êi  ⇔  G (1/(λi ‖u_neg–ex,i‖)) u_neg–ex,i = −êi    (7)

Since 1/(λi ‖u_neg–ex,i‖) is just a scaling factor and since furthermore Equation 7 must have a unique solution, also Gu_neg–ex,i = −êi must have a unique solution, which we can use to find û_neg–ex,i = u_neg–ex,i/‖u_neg–ex,i‖. Because matrix G is an m × l matrix, it is generally speaking not square and cannot be inverted. However, we know that û_neg–ex,i ∈ U_ex and hence we can write u_neg–ex,i = Σ_{j=0}^{m−1} αij û_ex,j, where we do not know the αij. We now have:

G (Σ_{j=0}^{m−1} αij û_ex,j) = ( Σ_{j=0}^{m−1} αij (∇f0(x))ᵀ û_ex,j , ... , Σ_{j=0}^{m−1} αij (∇fm−1(x))ᵀ û_ex,j )ᵀ    (8)

We define a matrix D such that Dkj = (∇fk(x))ᵀ û_ex,j and a vector αi such that (αi)k = αik. Then Gu_neg–ex,i can be rewritten as Dαi. Moreover, because D is a square matrix, we can find a unique solution for u_neg–ex,i through αi if we solve Dαi = êi by inverting D, i.e. αi = D⁻¹êi. Finally, we note that û_ex,j, the extremum in the negative direction of the multi–objective directional derivative in the jth dimension, is, using the result of Equation 3, of course just −(∇fj(x)). We can therefore now write:

û_neg–ex,i = − Σ_{j=0}^{m−1} (D⁻¹êi)j ∇fj(x) / ‖Σ_{j=0}^{m−1} (D⁻¹êi)j ∇fj(x)‖    (9)

This latter result now allows us to rewrite Equation 6 and define the set U_neg–ex of unit directions to choose from for multi–objective gradient search without the inequality constraints, but in the same fashion as set U_ex instead:

U_neg–ex = { Σ_{i=0}^{m−1} βi û_neg–ex,i / ‖Σ_{i=0}^{m−1} βi û_neg–ex,i‖  |  βi ∈ [0,1], Σ_{i=0}^{m−1} βi = 1 }    (10)
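The idea behind Equations 7–9 can be sketched for m = 2 objectives: find the direction u with G·u = −λ·êi (objective i strictly decreases, the other stays level) by solving a small linear system built from the objective gradients. This sketch uses the Gram matrix of the gradients and a sign convention chosen so that G·u = −êi holds exactly; it is an illustration of the underlying linear algebra, not a verbatim transcription of the paper's notation, and the gradients are hypothetical test values.

```python
# Find a corner direction of the set of non-dominated improving directions
# for two objectives, in pure Python (2x2 case).
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def neg_extreme_direction(g0, g1, i):
    # Gram matrix M[k][j] = grad_k . grad_j and its 2x2 inverse
    M = [[dot(g0, g0), dot(g0, g1)], [dot(g1, g0), dot(g1, g1)]]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    Minv = [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]
    c = [Minv[0][i], Minv[1][i]]          # solves M c = e_i
    # u = -(c0*g0 + c1*g1) gives G.u = -e_i componentwise
    u = [-(c[0] * a + c[1] * b) for a, b in zip(g0, g1)]
    return unit(u)

g0, g1 = [1.0, 0.0], [1.0, 1.0]           # hypothetical grad f0, grad f1
u0 = neg_extreme_direction(g0, g1, 0)
# along u0, f0 strictly decreases while f1 stays (numerically) level
assert dot(g0, u0) < 0 and abs(dot(g1, u0)) < 1e-12
```

Inverting the small m × m system is cheap compared to evaluating the objectives, which is the efficiency argument the section makes.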

4. GRADIENT–BASED HYBRID MOEAS

4.1 Base MOEA

The base MOEA we use is the naive MIDEA [3]. This MOEA is an EDA specifically designed for multi–objective optimization. It has been shown to give good results on a wide variety of problems defined in both discrete and continuous parameter spaces. Moreover, it is fast and easy to understand, making it a good baseline algorithm. The following gives a brief overview of its main features. For specific details the interested reader is referred to the literature [3].

The naive MIDEA maintains a population of size n. In each generation it selects a subset of this population of size ⌊τn⌋, τ ∈ [1/n, 1), to perform variation with. By means of variation, n − ⌊τn⌋ new solutions are generated, which replace the solutions in the population that were not selected.

Selection is performed using a diversity–preserving selection operator. Since the goal in multi–objective optimization is both to get close to the Pareto optimal front and to get a good diverse representation of that front, a good selection operator must exert selection pressure with respect to both aspects. The selection operator in the naive MIDEA does this by using truncation selection on the basis of domination count (i.e. the number of times a solution is dominated). If the number of non–dominated solutions exceeds the targeted selection size ⌊τn⌋, a nearest–neighbour heuristic in the objective space is used to ensure that a well–spread, representative subset of all non–dominated solutions is chosen.

The variation operator is geometrical in nature and is specifically designed to provide an advantage over traditional variation operators. The selected solutions are first clustered in the objective space. Subsequently, the actual variation takes place only between individuals in the same cluster, i.e. a mating restriction is employed. The rationale is that variation inside each cluster can process specific information about the different regions along the Pareto front. Such a parallel exploration automatically gives a better probability of obtaining a well–spread set of offspring solutions. To further stimulate diversity along the Pareto front, each new offspring solution is constructed from a randomly chosen cluster. Variation inside a single cluster is done by estimating a one–dimensional normal distribution for each variable separately.
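The truncation-on-domination-count step can be sketched as follows; the diversity-preserving nearest-neighbour tie-breaking is omitted here, and the objective vectors are made-up minimization data.

```python
# Truncation selection on domination count: prefer solutions that are
# dominated by fewer other solutions (count 0 = non-dominated).

def dominates(a, b):
    # a dominates b: no worse in every objective, strictly better in one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def truncation_select(objs, count):
    dom_count = [sum(dominates(o, p) for o in objs) for p in objs]
    order = sorted(range(len(objs)), key=lambda i: dom_count[i])
    return order[:count]

objs = [[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0], [5.0, 5.0]]
selected = truncation_select(objs, 3)
assert 4 not in selected          # (5, 5) is dominated by everything else
```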

4.2 Hybrid MOEAs

4.2.1 General hybrid MOEA framework

The hybridization scheme that we employ is a generational one. At the end of a generation, i.e. after one step of selection, variation and re–evaluation is finished, a set of candidate solutions is determined. The gradient–based local search operator is then applied with each of these candidate solutions as a starting point. We identify three classes of candidate–solution sets:

1. The entire population. This allows for the widest search and the largest probability of discovering new local optima, but also has the largest chance of finding improvements that are not better than the current non–dominated solutions.

2. The selected solutions. Since the set of selected solutions contains all promising solutions from the viewpoint of the MOEA, this set is a logical choice for selecting solutions to attempt to improve. This set represents a rational trade–off between the entire population and the set of non–dominated solutions only.

3. The non–dominated solutions. Although it is directly beneficial if solutions from this set can be improved upon, there is also a considerable probability that the solutions in this set lie in a multi–objective local optimum and hence cannot be improved further using local search.

To control the ratio between how much search effort is spent by the MOEA and how much search effort is spent by the gradient–based local search operator, we introduce a ratio parameter ρe. In our scheme we aim to keep the ratio of the number of evaluations required by the gradient–based local search operator to the total number of evaluations required so far equal to ρe. To this end, the gradient–based local search operator is applied only as long as the current actual ratio is smaller than ρe. Moreover, the order in which the candidate solutions are searched is randomized. We do not suggest that this hybridization approach is optimal, but it will point out whether a gradient–based local search algorithm can aid in real–valued multi–objective optimization.
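The budget-sharing rule can be sketched as a simple gate on the running evaluation counters. The counters, the per-candidate cost, and the loop structure below are schematic inventions for illustration, not the paper's implementation.

```python
# Apply local search to candidates only while the share of evaluations
# spent on local search stays below rho_e.

def apply_local_search_budgeted(candidates, rho_e, total_evals, ls_evals,
                                cost=10):
    improved = []
    for c in candidates:                  # order assumed pre-shuffled
        if total_evals > 0 and ls_evals / total_evals >= rho_e:
            break                         # budget share reached, stop
        ls_evals += cost                  # evaluations spent by line searches
        total_evals += cost
        improved.append(c)
    return improved, total_evals, ls_evals

done, tot, ls = apply_local_search_budgeted(list(range(20)), 0.25, 300, 50)
assert ls / tot <= 0.25 + 10 / tot        # ratio overshoots by at most one step
```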

4.2.2 Random–objective conjugate gradients

In this straightforward approach, single–objective gradient–based local search is applied to a randomly chosen objective. Because the local search is now focused on a single objective, more advanced single–objective algorithms known from the literature can be used. We use the conjugate gradients algorithm and call the resulting strategy ROCG (Random–Objective Conjugate Gradients). It depends completely on the correlation between the objectives whether the best local improvement in a single objective also leads to an improvement in the other objectives. Since this is typically not the case, the percentage of local searches that leads to an improvement (i.e. a solution that dominates the solution from where the local search started) is expected to be small.
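The ROCG idea reduces to "pick one objective at random, then run a single-objective minimizer on it". In the sketch below, plain gradient steps stand in for the conjugate-gradients routine the paper uses, and both objectives are invented quadratics.

```python
# ROCG sketch: optimize one randomly chosen objective with a
# single-objective gradient method.
import random

def f0(x): return (x[0] - 1.0) ** 2 + x[1] ** 2
def f1(x): return x[0] ** 2 + (x[1] - 1.0) ** 2
def grad_f0(x): return [2.0 * (x[0] - 1.0), 2.0 * x[1]]
def grad_f1(x): return [2.0 * x[0], 2.0 * (x[1] - 1.0)]

def rocg_step(x, rng, step=0.1, iters=50):
    f, grad = rng.choice([(f0, grad_f0), (f1, grad_f1)])
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, f

x_opt, chosen = rocg_step([0.5, 0.5], random.Random(1))
assert chosen(x_opt) < 1e-6     # the chosen objective is near its minimum
```

Note that minimizing one objective (minimum at (1, 0) or (0, 1) here) typically worsens the other, which is exactly the weakness the text describes.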

4.2.3 Alternating–objective repeated line–search

To reduce the probability of improving a single objective while worsening the value of another objective, the objective that is searched locally can be altered during local search. This still allows the use of well–known single–objective approaches. However, care should be taken in the design because it does not make sense to let the local search in a single objective converge to a minimum: doing so results in the same approach as ROCG. Hence we propose to perform a line–search (i.e. find a local minimum with respect to a single direction) in a single, alternatingly chosen objective, in the direction of the negative gradient of that objective. This process is repeated until a multi–objective local minimum is found. In our multi–objective setting we propose to switch to a randomly chosen different objective once a line–search has terminated. We refer to this strategy as AORL (Alternating–Objective Repeated Line–search).
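The alternation loop can be sketched as repeated line searches, each along the negative gradient of a freshly chosen objective. The crude step-halving line search, the numerical gradient, and the two quadratic objectives are illustrative stand-ins, not the paper's implementation.

```python
# AORL sketch: alternate single-objective line searches along the
# negative gradient of a randomly chosen objective.
import random

def f0(x): return (x[0] - 1.0) ** 2 + x[1] ** 2
def f1(x): return x[0] ** 2 + (x[1] - 1.0) ** 2

def num_grad(f, x, h=1e-6):
    fx = f(x)
    g = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        g.append((f(xp) - fx) / h)   # forward-difference gradient
    return g

def line_search(f, x, d, step=0.5, shrink=0.5, tries=30):
    best, fx = x, f(x)
    for _ in range(tries):
        cand = [xi + step * di for xi, di in zip(best, d)]
        if f(cand) < fx:
            best, fx = cand, f(cand)   # accept only improving steps
        else:
            step *= shrink             # otherwise shorten the step
    return best

def aorl(x, rounds=20, rng=None):
    rng = rng or random.Random(3)
    for _ in range(rounds):
        f = rng.choice([f0, f1])       # switch objective per line search
        d = [-g for g in num_grad(f, x)]
        x = line_search(f, x, d)
    return x

x_final = aorl([0.0, 0.0])
```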


4.2.4 Combined–objectives repeated line–search

This final strategy uses the results from Section 3.2. The approach is best described as a multi–objective version of gradient descent. A line–search is performed in a promising direction. When the line–search terminates, a new line–search is initiated in a new promising direction found at the new location. It was shown in Section 3.2 that there is a set of non–dominated directions that are all most promising. These directions are described by equation 10. Hence, when a line–search terminates, a random vector β is drawn such that each βi ∈ [0, 1] and ∑_{i=0}^{m−1} βi = 1. Equation 10 is then used to obtain a new direction. Note that a line–search in this case does not aim to minimize a single objective but aims to find the best non–dominated solution in the given direction, i.e. the line–search in this strategy is a multi–objective search algorithm as well. We refer to this strategy as CORL (Combined–Objectives Repeated Line–search).
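The direction sampling in CORL can be sketched as follows. Equation 10 itself is not reproduced in this excerpt, so the convex combination of normalized negative gradients below is only a stand–in for the actual parametric description; the simplex sampling of β is as described above:

```python
import math, random

def sample_nondominated_direction(grads):
    """Sketch of CORL's direction sampling: draw beta uniformly from the
    unit simplex (each beta_i in [0,1], sum beta_i = 1) and combine the
    objectives' gradients. NOTE: the paper's equation 10 gives the exact
    parametric form of the non-dominated directions; the convex
    combination of normalized negative gradients here is a stand-in."""
    m = len(grads)
    # Uniform sample from the simplex via sorted uniform spacings.
    cuts = sorted(random.random() for _ in range(m - 1))
    beta = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
    n = len(grads[0])
    d = [0.0] * n
    for b, g in zip(beta, grads):
        norm = math.sqrt(sum(gj * gj for gj in g)) or 1.0
        for j in range(n):
            d[j] -= b * g[j] / norm  # weighted negative normalized gradient
    return beta, d
```

Each call yields a different weighting β, so repeated line–searches explore different members of the set of promising directions instead of always descending in one fixed objective.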

5. EXPERIMENTS

5.1 Setup

5.1.1 Multi–objective optimization problems

The problems we have used have been taken from the literature on designing difficult and interesting multi–objective optimization problems and on comparing various MOEAs [5, 21]. Specifically, we have used the problems known as ECi, i ∈ {1, 2, 3, 4, 6}. For specific details regarding the difficulty of these problems we refer the interested reader to the indicated literature. Here we only present their definition.

| Name | Objectives | Domain |
|------|------------|--------|
| EC1 | f0 = x0, f1 = γ(1 − √(f0/γ)), γ = 1 + 9(∑_{i=1}^{l−1} xi)/(l − 1) | [0, 1]^30 (l = 30) |
| EC2 | f0 = x0, f1 = γ(1 − (f0/γ)²), γ = 1 + 9(∑_{i=1}^{l−1} xi)/(l − 1) | [0, 1]^30 (l = 30) |
| EC3 | f0 = x0, f1 = γ(1 − √(f0/γ) − (f0/γ)sin(10πf0)), γ = 1 + 9(∑_{i=1}^{l−1} xi)/(l − 1) | [0, 1]^30 (l = 30) |
| EC4 | f0 = x0, f1 = γ(1 − √(f0/γ)), γ = 1 + 10(l − 1) + ∑_{i=1}^{l−1} (xi² − 10cos(4πxi)) | [−1, 1]×[−5, 5]^9 (l = 10) |
| EC6 | f0 = 1 − e^(−4x0) sin⁶(6πx0), f1 = γ(1 − (f0/γ)²), γ = 1 + 9((∑_{i=1}^{l−1} xi)/(l − 1))^0.25 | [0, 1]^10 (l = 10) |

5.1.2 Performance indicator

To measure performance we only consider the subset of all non–dominated solutions in the population upon termination. We call such a subset an approximation set and denote it by S. A performance indicator is a function of approximation sets S and returns a real value that indicates how good S is in some aspect. More detailed information regarding the importance of using good performance indicators for evaluation may be found in the literature [2, 11, 22].

Here we use a single performance indicator that is based on knowledge of the optimum, i.e. the Pareto–optimal front. We define the distance d(x0, x1) between two multi–objective solutions x0 and x1 to be the Euclidean distance between their objective values f(x0) and f(x1). The performance indicator we use computes the average, over all solutions in the Pareto–optimal set PS, of the distance to the closest solution in an approximation set S. We denote this indicator by DPF→S and refer to it as the distance from the Pareto–optimal front to an approximation set. A smaller value for this performance indicator is preferable and a value of 0 is obtained if and only if the approximation set and the Pareto–optimal front are identical. This indicator is ideal for evaluating performance if the optimum is known because it describes how well the Pareto–optimal front is covered and thereby represents an intuitive trade–off between the diversity of the approximation set and its proximity (i.e. closeness to the Pareto–optimal front). Even if all points in the approximation set are on the Pareto–optimal front, the indicator is not minimized unless the solutions in the approximation set are spread out perfectly.

Because the Pareto–optimal front may be continuous, a line integration over the entire Pareto front is required in the definition of the performance indicator. In a practical setting, it is easier to compute a uniformly sampled set of many solutions along the Pareto–optimal front and to use this discretized representation of PF instead. We have used this approach with 5000 uniformly sampled points. The performance indicator is then defined as follows:

DPF→S(S) = (1/|PS|) ∑_{x1 ∈ PS} min_{x0 ∈ S} {d(x0, x1)}    (11)
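As a concrete illustration, the sketch below implements EC1 from the table above and the DPF→S indicator of equation 11, using the known Pareto–optimal front of EC1 (γ = 1, so f1 = 1 − √f0) sampled in 5000 points; all function names are our own:

```python
import math

def ec1(x):
    """EC1 from the table above (l = 30): f0 = x0,
    f1 = gamma*(1 - sqrt(f0/gamma)), gamma = 1 + 9*sum(x[1:])/(l - 1)."""
    f0 = x[0]
    gamma = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    return (f0, gamma * (1.0 - math.sqrt(f0 / gamma)))

def dist(a, b):
    """Euclidean distance between two objective vectors (the d of eq. 11)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def dpf_to_s(front, approx_set):
    """D_PF->S of equation 11: average, over the sampled Pareto-optimal
    front, of the distance to the closest member of the approximation set."""
    return sum(min(dist(s, p) for s in approx_set) for p in front) / len(front)

# Discretized Pareto-optimal front of EC1 (gamma = 1, f1 = 1 - sqrt(f0)),
# uniformly sampled in 5000 points as in the experimental setup.
front = [(f0, 1.0 - math.sqrt(f0)) for f0 in (i / 4999.0 for i in range(5000))]

# A small approximation set consisting of Pareto-optimal EC1 solutions.
S = [ec1([f0] + [0.0] * 29) for f0 in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(round(dpf_to_s(front, S), 3))
```

Note that even though all five points of S lie on the Pareto–optimal front, the indicator is positive: it only reaches 0 when the front is covered perfectly.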

5.1.3 General algorithmic setup

For selection we set τ to 0.3, conforming to earlier work [3] and the rule–of–thumb for FDA [16]. We allowed the gradient–based local search operators 10 iterations each time they were called. We set ρe ∈ {0, 0.1, 0.25, 0.5}. Note that for ρe = 0 the pure naive MIDEA is obtained. Gradient information was approximated when required using Δxi = 10^−13. Furthermore, we have used the Polak–Ribière variant of the conjugate gradient algorithm [17]. All reported results were averaged over 30 runs, except for the convergence plots, which were averaged over 100 runs.
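The forward–difference gradient approximation referred to above can be sketched as follows (the perturbation is kept as a parameter; 10^−13 is the value reported above, though larger perturbations are numerically safer in general):

```python
def approx_gradient(f, x, dx=1e-13):
    """Forward-difference approximation of the gradient of f at x,
    perturbing one variable at a time: g_i = (f(x + dx*e_i) - f(x)) / dx.
    The default dx is the value reported in the setup above."""
    f0 = f(x)
    grad = []
    for i in range(len(x)):
        x_pert = list(x)
        x_pert[i] += dx
        grad.append((f(x_pert) - f0) / dx)
    return grad
```

One call costs l + 1 objective evaluations for l variables, which is part of why the gradient–based operators consume their evaluation budget quickly.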

It is important to note that all variables have a bounded range. If the variables move outside of this range, some objective values can become non–existent. It is therefore important to keep the variables within their ranges. However, a simple repair mechanism that changes a variable to its boundary value if it has exceeded this boundary value gives artifacts that may lead us to draw false conclusions about the performance of the tested MOEAs. If, for instance, the search probes a solution that has negative values for all of the variables xi with i ≥ 1, then the repair mechanism in the case of all problems except problem EC6 sets all these variables to 0. This is especially likely during a gradient–search procedure, because the gradient with respect to the second objective points in the direction of negative values for the variables xi with i ≥ 1. It is not hard to see that the solution resulting after boundary repair lies on the Pareto front. We have therefore adapted the local search operators such that local search never changes a solution into one that lies outside the problem range. Similarly, the sampling procedure of the naive MIDEA is changed to prevent the generation of out–of–bounds solutions.
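One way to keep a gradient step inside the box constraints without the boundary–repair artifact described above is to shrink the step so that the trial point stays inside the domain. The paper does not spell out its adapted operators, so this is only an illustrative sketch:

```python
def clip_step_to_box(x, d, lower, upper):
    """Largest t in [0, 1] such that x + t*d stays inside the box
    [lower, upper]^n. Shrinking the step, instead of clamping each
    variable to its boundary afterwards, avoids the repair artifact of
    projecting an infeasible trial point onto the Pareto front."""
    t = 1.0
    for xi, di, lo, hi in zip(x, d, lower, upper):
        if di > 0:
            t = min(t, (hi - xi) / di)
        elif di < 0:
            t = min(t, (lo - xi) / di)
    return max(t, 0.0)

x = [0.9, 0.1]
d = [0.5, -0.5]                      # a full step would leave [0, 1]^2
t = clip_step_to_box(x, d, [0.0, 0.0], [1.0, 1.0])
x_new = [xi + t * di for xi, di in zip(x, d)]
```

With this scheme a line–search simply terminates at the boundary instead of being teleported onto it variable by variable.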

5.2 Results

In Table 1 the average DPF→S value is shown for all tested MOEAs with a maximum of either 20000 or 200000 evaluations. All population sizes between 0 and 1000 with a stepsize of 25 were tested and ultimately the one that led to the best average indicator value was chosen.



Table 1: Average DPF→S metric for all tested MOEAs and all benchmark problems for two different maximum numbers of evaluations. C indicates the set of candidates: P = population, S = selected solutions, N = non–dominated solutions. (The per–column scale factors of the form ·10^k printed in the original table are omitted here; comparisons within a column are unaffected.)

Max. eval. = 20000:

| ρe | Grad. | C | EC1 | EC2 | EC3 | EC4 | EC6 |
|----|-------|---|-----|-----|-----|-----|-----|
| – | None | – | 0.43 | 0.83 | 1.17 | 3.28 | 1.21 |
| 0.10 | ROCG | P | 0.53 | 1.33 | 1.31 | 4.31 | 1.44 |
| 0.10 | ROCG | S | 0.54 | 1.11 | 1.29 | 3.89 | 1.43 |
| 0.10 | ROCG | N | 0.52 | 1.08 | 1.28 | 4.07 | 1.38 |
| 0.10 | AORL | P | 0.49 | 1.25 | 1.30 | 4.03 | 1.39 |
| 0.10 | AORL | S | 0.54 | 1.29 | 1.28 | 3.75 | 1.41 |
| 0.10 | AORL | N | 0.51 | 1.28 | 1.31 | 3.97 | 1.41 |
| 0.10 | CORL | P | 0.50 | 1.14 | 1.29 | 4.47 | 1.42 |
| 0.10 | CORL | S | 0.51 | 1.15 | 1.28 | 4.22 | 1.46 |
| 0.10 | CORL | N | 0.50 | 1.06 | 1.27 | 4.10 | 1.42 |
| 0.25 | ROCG | P | 0.73 | 1.54 | 1.49 | 5.22 | 1.62 |
| 0.25 | ROCG | S | 0.72 | 1.77 | 1.58 | 4.86 | 1.62 |
| 0.25 | ROCG | N | 0.76 | 1.89 | 1.57 | 4.35 | 1.69 |
| 0.25 | AORL | P | 0.74 | 1.50 | 1.56 | 3.50 | 1.54 |
| 0.25 | AORL | S | 0.73 | 1.86 | 1.55 | 3.49 | 1.64 |
| 0.25 | AORL | N | 0.74 | 1.69 | 1.61 | 4.64 | 1.65 |
| 0.25 | CORL | P | 0.67 | 1.59 | 1.50 | 3.62 | 1.80 |
| 0.25 | CORL | S | 0.68 | 1.76 | 1.43 | 3.37 | 1.67 |
| 0.25 | CORL | N | 0.68 | 1.68 | 1.44 | 3.39 | 1.55 |
| 0.50 | ROCG | P | 1.44 | 3.37 | 2.04 | 6.34 | 2.37 |
| 0.50 | ROCG | S | 1.43 | 3.20 | 2.22 | 6.01 | 2.44 |
| 0.50 | ROCG | N | 1.39 | 3.22 | 2.20 | 6.60 | 2.34 |
| 0.50 | AORL | P | 1.40 | 3.30 | 2.31 | 3.25 | 2.23 |
| 0.50 | AORL | S | 1.43 | 3.19 | 2.28 | 3.80 | 2.19 |
| 0.50 | AORL | N | 1.39 | 3.32 | 2.28 | 5.30 | 2.31 |
| 0.50 | CORL | P | 1.19 | 3.08 | 1.92 | 3.15 | 2.16 |
| 0.50 | CORL | S | 1.14 | 2.97 | 1.81 | 3.54 | 2.29 |
| 0.50 | CORL | N | 1.05 | 3.11 | 1.74 | 4.76 | 2.20 |

Max. eval. = 200000:

| ρe | Grad. | C | EC1 | EC2 | EC3 | EC4 | EC6 |
|----|-------|---|-----|-----|-----|-----|-----|
| – | None | – | 1.88 | 2.10 | 0.56 | 1.58 | 1.17 |
| 0.10 | ROCG | P | 2.07 | 2.28 | 0.61 | 1.55 | 1.40 |
| 0.10 | ROCG | S | 2.11 | 2.26 | 0.64 | 1.49 | 1.32 |
| 0.10 | ROCG | N | 2.13 | 2.28 | 0.63 | 1.71 | 1.37 |
| 0.10 | AORL | P | 2.12 | 2.28 | 0.62 | 1.60 | 1.42 |
| 0.10 | AORL | S | 2.08 | 2.32 | 0.62 | 1.36 | 1.55 |
| 0.10 | AORL | N | 2.05 | 2.35 | 0.63 | 1.38 | 1.55 |
| 0.10 | CORL | P | 2.04 | 2.26 | 0.61 | 1.07 | 1.40 |
| 0.10 | CORL | S | 2.03 | 2.26 | 0.58 | 1.10 | 1.41 |
| 0.10 | CORL | N | 2.02 | 2.27 | 0.59 | 1.25 | 1.40 |
| 0.25 | ROCG | P | 2.50 | 2.75 | 0.72 | 1.78 | 1.88 |
| 0.25 | ROCG | S | 2.51 | 2.73 | 0.74 | 1.69 | 1.74 |
| 0.25 | ROCG | N | 2.53 | 2.73 | 0.74 | 1.62 | 1.86 |
| 0.25 | AORL | P | 2.50 | 2.83 | 0.72 | 1.57 | 2.08 |
| 0.25 | AORL | S | 2.53 | 2.89 | 0.78 | 1.64 | 2.35 |
| 0.25 | AORL | N | 2.45 | 2.84 | 0.77 | 1.44 | 2.14 |
| 0.25 | CORL | P | 2.35 | 2.59 | 0.71 | 1.02 | 1.65 |
| 0.25 | CORL | S | 2.28 | 2.53 | 0.64 | 1.20 | 2.16 |
| 0.25 | CORL | N | 2.33 | 2.53 | 0.66 | 1.31 | 1.86 |
| 0.50 | ROCG | P | 3.94 | 4.34 | 1.05 | 1.82 | 4.72 |
| 0.50 | ROCG | S | 3.91 | 4.35 | 1.19 | 2.00 | 6.40 |
| 0.50 | ROCG | N | 3.95 | 4.54 | 1.20 | 1.60 | 7.01 |
| 0.50 | AORL | P | 3.88 | 4.57 | 1.23 | 1.36 | 3.45 |
| 0.50 | AORL | S | 3.94 | 4.52 | 1.26 | 1.92 | 5.93 |
| 0.50 | AORL | N | 3.86 | 4.50 | 1.22 | 2.12 | 4.70 |
| 0.50 | CORL | P | 3.27 | 3.77 | 0.99 | 0.82 | 4.79 |
| 0.50 | CORL | S | 3.15 | 3.55 | 0.82 | 1.17 | 5.19 |
| 0.50 | CORL | N | 3.28 | 3.70 | 0.80 | 1.48 | 5.57 |

The first question to answer is which local gradient–based

search technique is the most promising. The hybridization

based on CORL almost always gives better results than the

other approaches. The superiority of CORL is emphasized

further in Table 2 where it is clear that CORL has by far the

largest probability of improving a solution. Hence, if local

gradient–based search is to be used, the CORL approach

proposed in this paper is the most promising one.

The next question to answer is whether the additional

use of local gradient–based search in MOEAs is a useful

approach at all. The results in Table 1 show a dominating

performance of the base MOEA alone for all problems except

EC4. Moreover, the performance generally deteriorates

with an increased number of evaluations allowed to be spent

by the local search operator (i.e. a larger ρe). This would at

first sight indicate that the added use of local gradient–based

search generally speaking only hampers MOEAs. However,

this general statement must be qualified when observing the results more thoroughly.

First of all, local search is expensive. Finding a local min-

imum requires a relatively large amount of resources com-

pared to a global search across the entire landscape. More-

over, in the multi–objective case local search is relatively

speaking even more expensive because there are typically

many (Pareto–) optimal solutions and a single local search

only improves a single solution. An EA on the other hand

improves multiple solutions simultaneously (if variation is

successful). In multi–objective optimization this may involve the improvement of large parts of the Pareto–front.

Hence, local search benefits are better visible if more eval-

uations are allowed, especially in the multi–objective case.

Indeed, Table 1 shows that for EC4 the improvement of the


Table 2: Average percentage of calls to the local–search operator that resulted in an improvement (i.e. a dominating solution) for all tested MOEAs and all benchmark problems for two different maximum numbers of evaluations. C indicates the same as in Table 1. All entries for the MOEA without local search (None) are 0.0. Only the CORL rows could be recovered legibly from this extraction and are reproduced below; the ROCG and AORL entries (not shown) are considerably smaller, as discussed in the text.

Max. eval. = 20000 (CORL):

| ρe | C | EC1 | EC2 | EC3 | EC4 | EC6 |
|----|---|-----|-----|-----|-----|-----|
| 0.10 | P | 99.2 | 95.6 | 80.8 | 91.2 | 88.2 |
| 0.10 | S | 97.6 | 93.0 | 93.7 | 77.6 | 85.2 |
| 0.10 | N | 97.2 | 90.6 | 91.6 | 74.4 | 82.3 |
| 0.25 | P | 97.3 | 93.7 | 78.9 | 87.1 | 82.3 |
| 0.25 | S | 96.1 | 94.6 | 90.7 | 87.5 | 75.8 |
| 0.25 | N | 96.6 | 91.3 | 91.2 | 53.3 | 75.4 |
| 0.50 | P | 94.2 | 86.2 | 77.3 | 80.9 | 80.0 |
| 0.50 | S | 92.7 | 85.8 | 88.5 | 68.0 | 68.3 |
| 0.50 | N | 94.6 | 76.7 | 90.1 | 45.7 | 58.6 |

Max. eval. = 200000 (CORL):

| ρe | C | EC1 | EC2 | EC3 | EC4 | EC6 |
|----|---|-----|-----|-----|-----|-----|
| 0.10 | P | 96.9 | 93.1 | 78.1 | 76.2 | 24.7 |
| 0.10 | S | 97.9 | 94.9 | 95.7 | 55.7 | 17.5 |
| 0.10 | N | 98.7 | 96.3 | 97.0 | 58.4 | 15.5 |
| 0.25 | P | 93.9 | 89.6 | 73.9 | 69.3 | 22.7 |
| 0.25 | S | 95.2 | 93.3 | 92.9 | 24.7 | 23.6 |
| 0.25 | N | 97.6 | 94.9 | 94.5 | 42.3 | 20.6 |
| 0.50 | P | 89.4 | 83.6 | 72.8 | 63.8 | 42.3 |
| 0.50 | S | 91.3 | 88.4 | 82.8 | 41.6 | 25.3 |
| 0.50 | N | 94.7 | 88.8 | 88.7 | 37.5 | 21.5 |

base MOEA combined with CORL over base MOEA alone

increases if 200000 evaluations are allowed instead of 20000.

But still, for all other problems, the base MOEA combined

with CORL does not outperform the base MOEA alone.

Interestingly enough, although the final outcome is worse

than using the base MOEA alone, for problems EC1, EC2

and EC3 the CORL approach does reach very high prob-

abilities of improving a solution. So, local search doesn’t

fail in this case. The reason why the base MOEA alone in

the end is still better for a fixed maximum number of eval-

uations is that these three problems are relatively speaking

“too easy” for the base MOEA. For these problems, the

base MOEA is already able to move the Pareto front by

improving many solutions simultaneously. Gradient–search

can then not provide additional help fast enough because the

number of evaluations required before a solution is improved

is relatively large, making local search much more expensive.

The ROCG approach requires on average 72 evaluations per

local search call, but has a very low improvement–ratio. The

AORL approach has a slightly better improvement–ratio but

requires 124 evaluations per call on average. Although the CORL approach has a very good improvement–ratio, it

does require 316 evaluations per call on average. Thus, the

MOEA may actually be hampered if local search is used

because it could have moved many solutions simultaneously

using the same number of evaluations. The fundamental dif-

ference with single–objective optimization should be noted

here. The added use of gradient–search applied to only a sin-

gle solution can only help if the number of non-dominated

solutions moved simultaneously by the base MOEA is not

too large. The number of solutions for which this balance is

tipped towards the hybrid side or the non–hybrid side is of



Figure 2: Average DPF→S indicator values as a function of the population size (0–1000) when using the complete population as the candidate set for local search, for None and for ROCG, AORL and CORL with ρe ∈ {0.10, 0.25, 0.50}. Top: EC1, 20000 evaluations. Bottom: EC6, 200000 evaluations.

course problem specific. The idea of fewer solutions to move corresponds to smaller population sizes in the base MOEA. Indeed, for smaller population sizes, the results of the hybridized MOEA are better. Only when the population size increases can the base MOEA perform better. Although this behavior was observed across the board, two illustrative examples are provided in Figure 2.

Generally speaking, the additional use of gradient–based

local search at rate ρe is only useful if the part of the over-

all improvement contributed by the local search is also at

least ρe. The main reason why this ratio was not obtained

on problems EC1, EC2 and EC3 is that the base MOEA could itself reach the optimal Pareto–front with relatively few evaluations and could hence improve many non–dominated solutions at the same time.

On the EC4 problem, however, reaching the optimal Pareto–front is hard for the base MOEA alone. Moreover, the CORL approach reaches high probabilities of improving a solution for this problem. Because it is hard for the base MOEA to improve the solutions, the relative contribution of the CORL local–search operator to the improvement is large enough for the hybrid MOEA to lead to better results. CORL can help to move some solutions closer to the optimal Pareto–front, after which the MOEA is able to find solutions that lie on a similar front as that solution and so advance the entire front. The CORL approach thus already helps quickly, as can be seen from Table 1. But it helps even more in the longer run, as can be seen from the same table as well as from Figure 3, in which convergence graphs are shown for a population size of 500 and a maximum number of evaluations of 1·10^6. Note again that only the added use of CORL really makes a difference.

Figure 3: Convergence graphs for various settings of ρe for all tested hybrid MOEAs on the EC4 problem (average DPF→S, on a logarithmic scale, against the number of evaluations up to 1·10^6; one panel each for ρe = 0.10, 0.25 and 0.50).

For problem EC6 the base MOEA alone comes close to the optimal Pareto–front only after 200000 evaluations. This is a lot more than for problems EC1, EC2 and EC3, where only 20000 evaluations are required to come close to the optimal Pareto–front. Hence, the base MOEA is not very good on the EC6 problem either. However, for this problem gradient information is simply less helpful upon approaching the optimal front, as can be seen from Table 2: the probability of success for the CORL approach heavily decreases on problem EC6 as more evaluations are allowed.

6. DISCUSSION

The investigation of the results in the previous section has provided us with explanations for the observed behavior. On problems EC1, EC2 and EC3, both the MOEA and the CORL approach work well. However, because the MOEA is capable of improving multiple solutions at the same time, which leads to fewer required evaluations for an overall bigger improvement, hybridization stands in the way of improvement. On the EC6 problem the opposite happens. Both the base MOEA and the CORL approach do not work well. Still, the MOEA approach is relatively more useful for the same reasons, and hence hybridization again stands in the way of improvement, albeit less so in this case. Finally, on the EC4 problem, the base MOEA performs even worse but the CORL approach performs reasonably well. For this reason we have observed superiority of the hybrid MOEA on the EC4 problem.

Compared to the single–objective case, local gradient–based optimization combined with EAs is less fruitful in multi–objective optimization. A good hybridization scheme should therefore be carefully constructed. It would be interesting to construct such a scheme that takes the observations made in this paper into account and to test it on the same benchmark as well as on additional problems of varying difficulty and perhaps of larger dimensionality, both in the number of objectives and in the number of problem variables. Such a scheme would be interesting to subsequently apply to a real–world application.

7. CONCLUSIONS

We have presented a parameterized, easy–to–use descrip-

tion of the set of all non–dominated improving directions

for any point in the parameter space of a multi–objective

optimization problem. We have furthermore investigated

the added use of gradient–based local search operators for

numerical multi–objective optimization with MOEAs.

Our experiments show that the best way to exploit gra-

dient information in multi–objective optimization is to use

the set of all non–dominated maximally–improving direc-

tions described in this paper. However, the added use of

gradient–based local search in a MOEA is only useful if the

relvative contribution it makes to the overall improvement is

at least as big as the relative amount of resources it is allowed

to spend. We have indicated that for multi–objective opti-

mization this criterion is harder to achieve because MOEAs

have the ability to advance multiple solutions simultaneously

towards different regions of the optimal Pareto–front, giv-

ing it a bigger relative advantage than in the single–objective

case. Hence, more often than in the single–objective case,

the added use of local gradient–based search is not always

efficient. Thus, although we have provided a solid basis for

exploiting gradient information in numerical multi–objective

evolutionary optimization, it should not be disregarded that

our results also indicate that EAs really have an advantage

over non–population–based approaches if the goal is to ob-

tain a good approximation set (i.e. instead of only a single

solution on the optimal Pareto–front).

8. REFERENCES

[1] P. A. N. Bosman and D. Thierens. Exploiting gradient

information in continuous iterated density estimation

evolutionary algorithms. In B. Kröse et al., editors,

Proceedings of the 13th Belgium–Netherlands Artificial

Intelligence Conference BNAIC’01, pages 69–76, 2001.

[2] P. A. N. Bosman and D. Thierens. The balance between

proximity and diversity in multi–objective evolutionary

algorithms. IEEE Transactions on Evolutionary

Computation, 7:174–188, 2003.

[3] P. A. N. Bosman and D. Thierens. The naive MIDEA: a

baseline multi–objective EA. In C. A. Coello Coello et al.,

editors, Evolutionary Multi–Criterion Optimization –

EMO’05, pages 428–442, Berlin, 2005. Springer–Verlag.

[4] M. Brown and R. E. Smith. Effective use of directional

information in multi–objective evolutionary computation.

In E. Cantú-Paz et al., editors, Proceedings of the

GECCO–2003 Genetic and Evolutionary Computation

Conference, pages 778–789, Berlin, 2003. Springer–Verlag.

[5] K. Deb. Multi-objective genetic algorithms: Problem

difficulties and construction of test problems. Evolutionary

Computation, 7(3):205–230, 1999.

[6] M. Dellnitz, O. Schütze, and T. Hestermeyer. Covering Pareto sets by multilevel subdivision techniques. J. of Optimization Theory and Applications, 124(1):113–136, 2005.

[7] J. Fliege and B. F. Svaiter. Steepest descent methods for

multicriteria optimization. Mathematical Methods of

Operations Research, 51(3):479–494, 2000.

[8] M.R. Hestenes and E. Stiefel. Methods of conjugate

gradients for solving linear systems. J. Res. Nat. Bur.

Standards, 49:409–436, 1952.

[9] H. Ishibuchi, T. Yoshida, and T. Murata. Balance between

genetic search and local search in memetic algorithms for

multiobjective permutation flowshop scheduling. IEEE

Trans. on Evolutionary Computation, 7:204–223, 2003.

[10] J. D. Knowles and D. Corne. M–PAES: a memetic

algorithm for multiobjective optimization. In Proceedings of

the 2000 Congress on Evolutionary Computation -

CEC–2000, pages 325–332, Piscataway, New Jersey, 2000.

IEEE Press.

[11] J. D. Knowles and D. Corne. On metrics for comparing

non–dominated sets. In Proceedings of the 2002 Congress

on Evol. Comp. CEC 2002, pages 666–674, Piscataway,

New Jersey, 2002. IEEE Press.

[12] J. D. Knowles and D. W. Corne. A comparison of diverse

approaches to memetic multiobjective combinatorial

optimization. In W. Hart et al., editors, Proceedings of the

Workshop on Memetic Algorithms WOMA at the Genetic

and Evolutionary Computation Conference -

GECCO–2000, pages 103–108, 2000.

[13] M. Lahanas, D. Baltas, and S. Giannouli. Global

convergence analysis of fast multiobjective gradient based

dose optimization algorithms for high-dose-rate

brachytherapy. Phys. Med. Biol., 48:599–617, 2003.

[14] T. M. Mitchell. Machine Learning. McGraw-Hill, New

York, New York, 1997.

[15] P. Moscato. On evolution, search, optimization, genetic

algorithms and martial arts: Towards memetic algorithms.

Technical Report Caltech Concurrent Computation

Program, Report. 826, California Institute of Technology,

Pasadena, California, 1989.

[16] H. Mühlenbein and T. Mahnig. FDA – a scalable

evolutionary algorithm for the optimization of additively

decomposed functions. Evolutionary Computation,

7:353–376, 1999.

[17] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P.

Flannery. Numerical Recipes In C: The Art Of Scientific

Computing. Cambridge University Press, Cambridge, 1992.

[18] J. D. Schaffer. Multiple objective optimization with vector

evaluated genetic algorithms. In J. J. Grefenstette, editor,

Proceedings of the 1st International Conference on Genetic

Algorithms, pages 93–100, Mahwah, New Jersey, 1985.

Lawrence Erlbaum Associates, Inc.

[19] S. Schäffler, R. Schultz, and K. Weinzierl. Stochastic

method for the solution of unconstrained vector

optimization problems. Journal of Optimization Theory

and Applications, 114(1):209–222, 2002.

[20] E.-G. Talbi, M. Rahoual, M.-H. Mabed, and C. Dhaenens.

New genetic approach for multicriteria optimization

problems: Application to the flow-shop. In E. Zitzler et al.,

editors, Evolutionary Multi–Criterion Optimization -

EMO’01, Berlin, 2001. Springer–Verlag.

[21] E. Zitzler, K. Deb, and L. Thiele. Comparison of

multiobjective evolutionary algorithms: Empirical results.

Evol. Computation, 8(2):173–195, 2000.

[22] E. Zitzler, M. Laumanns, L. Thiele, C. M. Fonseca, and

V. Grunert da Fonseca. Why quality assessment of

multiobjective optimizers is difficult. In W. B. Langdon

et al., editors, Proceedings of the GECCO–2002 Genetic

and Evolutionary Computation Conference, pages 666–674,

San Francisco, California, 2002. Morgan Kaufmann.
