
Surrogate-Assisted Partial Order-Based

Evolutionary Optimisation

Vanessa Volz¹, Günter Rudolph¹, and Boris Naujoks²

¹ TU Dortmund University
{vanessa.volz, guenter.rudolph}@tu-dortmund.de

² TH Köln - University of Applied Sciences
boris.naujoks@th-koeln.de

Abstract. In this paper, we propose a novel approach (SAPEO) to support the survival selection process in evolutionary multi-objective algorithms with surrogate models. The approach dynamically chooses individuals to evaluate exactly based on the model uncertainty and the distinctness of the population. We introduce multiple SAPEO variants that differ in terms of the uncertainty they allow for survival selection and evaluate their anytime performance on the BBOB bi-objective benchmark. In this paper, we use a Kriging model in conjunction with an SMS-EMOA for SAPEO. We compare the obtained results with the performance of the regular SMS-EMOA, as well as another surrogate-assisted approach. The results open up general questions about the applicability and required conditions for surrogate-assisted evolutionary multi-objective algorithms to be tackled in the future.

Keywords: partial order, multi-objective, surrogates, evolutionary algorithms, BBOB

1 Introduction

Surrogate model-assisted evolutionary multi-objective algorithms (SA-EMOAs) are a group of fairly recent but popular approaches¹ to solve multi-objective problems with expensive fitness functions. Using surrogate model predictions of the function values instead of, or to complement, exact evaluations within an evolutionary algorithm (EA) can save computational time and, in some cases, make the problem tractable in the first place.

Many EMOAs only consider objective values for the purpose of sorting and then selecting the best individuals in a population. Assuming that individuals can confidently be distinguished based on surrogate model predictions, knowing the individuals' exact objective values is not necessary. Under this assumption, the algorithm and its evolutionary path would not be affected at all by trusting the predicted sorting, and the computational budget could be reduced.

In this paper, we present a novel approach to integrate surrogate models and evolutionary (multi-objective) algorithms (dubbed SAPEO for Surrogate-Assisted Partial Order-Based Evolutionary Optimisation²) that seeks to reduce the number of function evaluations while simultaneously controlling the probability of detrimental effects on the solution quality. The idea is to choose the individuals for exact evaluation dynamically based on the model uncertainty and the distinctness of the population. Preliminary experiments on single-objective problems showed promising results, so in this paper, we investigate the approach in the currently sought-after multi-objective context. We also present different versions that allow differing levels and types of uncertainty for the survival selection process, which, in turn, can potentially affect the solution quality.

¹ Workshop on the topic in 2016: http://samco.gforge.inria.fr/doku.php

In the following, we describe our extensive analysis of the anytime performance of SAPEO using the BBOB-BIOBJ benchmark (refer to section 2.2), focusing on use cases with low budgets to simulate applications with expensive functions. Our SAPEO implementation³ uses a Kriging surrogate model [14] in conjunction with the SMS-EMOA [2]. We further compare the algorithm and its variants to the underlying SMS-EMOA and an SA-EMOA approach called pre-selection [4] (SA-SMS in the following).

We specifically investigate if and under which conditions SAPEO outperforms the SMS-EMOA and SA-SMS in terms of the hypervolume indicator that all algorithms use to evaluate populations. Surprisingly, none of the surrogate-assisted algorithms can convincingly beat the baseline SMS-EMOA, even for small function budgets. This result opens up questions about SA-EMOAs in general and about the necessary quality of the integrated surrogate models.

A potential explanation for this performance is the increased uncertainty of the surrogate model predictions when compared to the single-objective experiments. Thus, we analyse the effects of prediction uncertainty on the overall performance of the SA-EMOAs. In the future, the resulting insights could become important when (1) deciding whether using a surrogate model is beneficial on a given problem at all and (2) choosing the sample size and further parameters for the model. This is especially crucial for multi- and many-objective problems, where learning an accurate surrogate model becomes increasingly expensive and thus renders analysing the trade-off between surrogate model computation and function evaluations critical. Additionally, the stated questions and insights are also relevant for noisy optimisation problems, where uncertainty can be reduced (although not eliminated) by repeated evaluations.

In the following, we present related work in section 2 and introduce the

proposed SAPEO algorithm in section 3. The description and visualisation of

the benchmarking results are found in section 4. Section 5 concludes the paper

with an analysis of the results and lists open research problems.

² Acknowledgements: The SAPEO concept was developed during the SAMCO Workshop in March 2016 at the Lorentz Center (Leiden, NL), https://www.lorentzcenter.nl. This work is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 692286.

³ Code and visualisations available at: http://url.tu-dortmund.de/volz


2 Background and Related work

2.1 Surrogate-Assisted Evolutionary Multi-Objective Optimisation

Let X_1, ..., X_λ ∈ R^n be a population and f: R^n → R^d the corresponding fitness function. General concepts of multi-objective optimisation will not be discussed here (refer to e.g. [16]). We will be referring to Pareto dominance as ≼, and to its weak and strong versions as ⪯ and ≺, respectively. We use the same notation to compare vectors in objective space R^d, i.e. let a, b ∈ R^d, then

a ≼ b ⟺ ∀k ∈ {1...d}: a_k ≤ b_k ∧ ∃k ∈ {1...d}: a_k < b_k.
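The dominance definition above translates directly into code; a minimal sketch for minimisation (all names are illustrative):

```python
def weakly_dominates(a, b):
    # a is no worse than b in every objective (minimisation)
    return all(ak <= bk for ak, bk in zip(a, b))

def dominates(a, b):
    # Pareto dominance: no worse everywhere, strictly better somewhere
    return weakly_dominates(a, b) and any(ak < bk for ak, bk in zip(a, b))
```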

Surrogate-assisted evolutionary multi-objective optimisation is surveyed in [7,8], where several approaches for the integration of surrogates and EMOAs are described. According to the surveys, the selection approaches can generally be divided into individual-based (e.g. SAPEO), generation-based, and population-based strategies. Additionally, there is pre-selection (e.g. SA-SMS [4]), which is similar to individual-based strategies but does not retain any individuals with uncertain fitness values.

However, neither the cited surveys nor [9] feature any algorithm that chooses to propagate uncertainty instead of assuming a distribution and using aggregated metrics such as the expected value. In [10], uncertainty propagation is implemented for noisy optimisation problems using a partial order based on confidence intervals (or hypercubes in higher dimensions) as examined in [13]. The only work transferring this approach to SA-EMOAs we are aware of is GP-DEMO [11] and the authors' previous publications, which use a differential evolution algorithm.

Apart from the differences owed to the underlying optimisation algorithms (e.g. crowding distance vs. hypervolume), GP-DEMO is very similar to the SAPEO variant that allows the least uncertainty (SAPEO-uf-ho, cf. 4.1). An important difference, however, is SAPEO's dynamic adaptation of the allowed uncertainty throughout the runtime of the algorithm and its execution of different partial orders in sequence. In addition to the partial orders inspired by [13] used in both [11] and SAPEO, we propose another order that interprets the confidence interval bounds as objectives and thus deals differently with overlapping intervals. A further difference is the choice of samples for the surrogate models: GP-DEMO uses the Pareto front, whereas SAPEO always uses a local model relative to the solution in question.

2.2 Benchmarking with BBOB

BBOB-BIOBJ is a bi-objective Black-Box Optimisation Benchmarking test suite [15]. It consists of 55 bi-objective functions that are combinations of 10 of the 24 single-objective functions in the BBOB test suite established in 2009 [6]. In order to measure general algorithm performance across function types, single-objective functions were chosen such that the resulting benchmark would be diverse in terms of separability, conditioning, modality and global structure [6]. Based on these properties, the single-objective functions are divided into 5

4 Volz, Rudolph, Naujoks

[Figure 1: grid of the 55 bi-objective functions formed by pairing the 10 single-objective functions f01 (Sphere), f02 (Ellipsoid), f06 (Attractive sector), f08 (Rosenbrock), f13 (Sharp ridge), f14 (Sum of different powers), f15 (Rastrigin), f17 (Schaffer F7, condition 10), f20 (Schwefel x*sin(x)) and f21 (Gallagher 101), grouped as separable, moderate, ill-conditioned, multi-modal and weakly structured.]

Fig. 1: The 55 BBOB-BIOBJ functions are combinations of 10 single-objective functions (on the top and right). The groups the single-objective and the resulting bi-objective functions belong to are colour-coded according to the legend.

function groups, from which 2 functions are chosen each. The resulting problems and corresponding properties are visualised in figure 1.

In an effort to measure performance on the different function types more accurately, each of the functions in the bi-objective test suite has 10 instances that differ in terms of some properties, e.g. the location of optima. As a result, the scale of the optima and the achievable absolute improvement of the objective values also vary significantly across instances (and thus also between objectives). The robustness of an algorithm's performance on a function group can be evaluated with higher confidence by testing on multiple members of that group.

All of the functions in the test suites are defined for search spaces of multiple dimensions, of which we will be considering dimensions 2, 3, 5, 10 and 20 in order to be able to evaluate a wide range of problem sizes. The search space of each function is limited to [−100, 100] ⊂ R per dimension in BBOB-BIOBJ.

The performance of an algorithm on the benchmarking suite is measured using a quality indicator expressing both the size of the obtained Pareto set and the proximity to a reference front. Since the true Pareto front is not known for the functions in the test suite, an approximation is obtained by combining all known solutions from popular algorithms. The ideal and nadir points are known, however, and used to normalise the quality indicator to enable comparisons across functions [3]. The metric reported as a performance measure for the algorithm is called precision. It is the difference between the quality indicator of the reference set I_ref and the indicator value of the obtained set. 58 target precisions are fixed, and the number of function evaluations needed to achieve them is reported during a benchmark run. This way, the COCO platform enables an anytime comparison of algorithms, i.e. an evaluation of algorithm performance for each target precision and number of function evaluations [3].
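The fixed-target bookkeeping described above can be sketched as follows; `fixed_target_record` is a hypothetical helper for illustration, not part of the COCO API:

```python
def fixed_target_record(precision_trace, targets):
    """precision_trace: precision after each function evaluation (lower is better).
    Returns {target: evaluation count at which it was first reached, or None}."""
    hits = {t: None for t in targets}
    best = float("inf")
    for n_evals, prec in enumerate(precision_trace, start=1):
        best = min(best, prec)  # best precision seen so far
        for t in targets:
            if hits[t] is None and best <= t:
                hits[t] = n_evals
    return hits
```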


3 Surrogate-Assisted Partial Order-Based Evolutionary Optimisation

3.1 Formal description

Let f̃(X_i) ∈ R^d be the predicted fitness for individual X_i as computed by a local surrogate model with uncertainty σ̃_i, modelled by

f̃_k(X_i) = f_k(X_i) + e_i,  e_i ∼ N(0, σ̃_i),  k ∈ {1...d}.

Assuming (Assumption A1) that the assumptions made by Kriging models [14] hold and σ̃_i was estimated correctly, it follows that

P(f_k(x_i) ∈ [f̃_k(x_i) − u_i, f̃_k(x_i) + u_i]) = 1 − α  with  u_i = σ̃_i z_{1−α/2}    (1)

since P(|e_i| ≤ u_i) = 1 − α. Here, z denotes the quantile function of the standard normal distribution. In case the objective values are stochastically independent, which is true for the BBOB-BIOBJ benchmark, we can therefore conclude that f(X_i) lies within the hypercube (or bounding box) bounded by the described confidence interval in each dimension with probability (1 − α)^d.
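Equation (1) and the hypercube coverage can be computed directly; a sketch using the standard normal quantile from the Python standard library (function names are illustrative):

```python
from statistics import NormalDist

def interval_halfwidth(sigma, alpha):
    # u_i = sigma_i * z_{1 - alpha/2}, cf. equation (1)
    return sigma * NormalDist().inv_cdf(1 - alpha / 2)

def hypercube_coverage(alpha, d):
    # probability that f(X_i) lies in the confidence hypercube,
    # assuming stochastically independent objectives
    return (1 - alpha) ** d
```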

Assuming the function values lie within the defined hypercubes (Assumption A2), we can distinguish individuals confidently just based on the predicted hypercubes. Of course, in case of large uncertainties σ̃_i, a distinction can rarely be meaningful. To combat this problem, SAPEO introduces a threshold ε_g for the uncertainty that is adapted in each iteration g of the EMOA and decreased over the runtime of the algorithm depending on the distinctness of the population (more details in section 3.2). Individuals X_i with an uncertainty higher than the threshold (u_i > ε_g) are evaluated exactly in generation g.

To distinguish between individuals, we propose binary relations incorporating the information on confidence bounds to varying degrees. All of these relations induce a strict partial order (irreflexivity, transitivity) on a population, including and akin to the Pareto dominance commonly used in EMOAs for the first part of the selection process. We analyse the proposed relations in terms of the probability and magnitude of a sorting error e_o, which per our definition are the pairwise differences between the different orders induced by Pareto dominance and the proposed relations, respectively. We define the probability P(e^{i,j}_{o,r}) of a sorting error made by relation r on individuals X_i, X_j and the magnitude of the error e^{i,j}_{o,r}:

P(e^{i,j}_{o,r}) = P(X_i ⋠ X_j | X_i ≼_r X_j)

e^{i,j}_{o,r} = |f(X_i) − f(X_j)| if (X_i ≼_r X_j) ∧ (X_i ⋠ X_j), and 0 else.

One or more sorting errors can, but do not have to, lead to selection errors e_s, where the individuals selected differ from the baseline comparison. This type of error will not be analysed in this paper, but it is bounded by e_o.


We define:

≼_f: Pareto dominance on function values. This relation is the standard in EMOAs and, since only f is considered, it is obvious that P(e^{i,j}_{o,f}) = 0.

X_i ≼_f X_j := f(X_i) ≼ f(X_j)

≼_u: Confidence interval dominance (cf. [11,13]). Assuming A2, if

X_i ≼_u X_j := ∀k ∈ [1...d]: f̃_k(X_i) + u_i < f̃_k(X_j) − u_j

holds, it is guaranteed that X_i ≺ X_j. Assuming stochastic independence of the errors on predicted uncertainty, we can compute an upper bound for the probability of sorting errors per dimension:

P(e^{i,j}_{o,u}(k)) = P(f_k(X_i) ≥ f_k(X_j) | X_i ≼_u X_j)
                   ≤ P(f_k(X_i) ≥ f̃_k(X_i) + u_i ∨ f_k(X_j) ≤ f̃_k(X_j) − u_j)
                   ≤ α/2 + α/2 = α

Only if the confidence hypercubes of two individuals intersect is the probability of them being incomparable greater than 0. Because of the way X_i ≼_u X_j is defined, this is only possible if a sorting error is made in every dimension. It follows that P(e^{i,j}_{o,u}) ≤ α^d assuming A1, making sorting errors controllable.
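Read as code, ≼_u simply compares interval bounds per dimension; a minimal sketch with one scalar uncertainty per individual, as in the text:

```python
def dominates_u(ft_i, ft_j, u_i, u_j):
    # X_i ≼_u X_j: the upper confidence bound of i lies below the
    # lower confidence bound of j in every objective
    return all(fi + u_i < fj - u_j for fi, fj in zip(ft_i, ft_j))
```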

≼_c: Confidence interval bounds as objectives. Another way of limiting the prediction errors potentially perpetuated through the algorithm is to limit the magnitude of the sorting error. For this reason we define:

X_i ≼_c X_j := ∀k ∈ [1...d]: (f̃_k(X_i) − u_i, f̃_k(X_i) + u_i) ⪯ (f̃_k(X_j) − u_j, f̃_k(X_j) + u_j)
              ∧ ∃k ∈ [1...d]: (f̃_k(X_i) − u_i, f̃_k(X_i) + u_i) ≼ (f̃_k(X_j) − u_j, f̃_k(X_j) + u_j)

Under assumption A2, the error per dimension is bounded by the length of the intersection of the confidence intervals, which is in turn bounded by the width of the smaller interval, 2·min(u_i, u_j). Therefore, it holds that e^{i,j}_{o,c} ≤ 2√d · min(u_i, u_j).

≼_p: Pareto dominance on predicted values. This relation is the most straightforward, but it does not take the uncertainties of the predictions into account.

X_i ≼_p X_j := f̃(X_i) ≼ f̃(X_j)

Assuming A2 again, a sorting error can only be committed if the confidence intervals intersect. Because of the symmetric nature of the interval, it holds that e^{i,j}_{o,p} ≤ √d · (u_i + u_j), as the magnitude of the sorting error is again bounded by the confidence interval widths.


≼_o: Pareto dominance on lower bounds. This optimistic relation was motivated by [4], where SA-EMOAs performed better using ≼_o instead of ≼_p.

X_i ≼_o X_j := ∀k ∈ [1...d]: f̃_k(X_i) − u_i ≤ f̃_k(X_j) − u_j
              ∧ ∃k ∈ [1...d]: f̃_k(X_i) − u_i < f̃_k(X_j) − u_j

The maximum error occurs when the lower confidence interval bounds are close together, but in the wrong order, making e^{i,j}_{o,o} ≤ 2√d · max(u_i, u_j).
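The three surrogate-based relations can be sketched together; the reading of ≼_c (the two interval bounds of each dimension treated as a 2-vector compared by Pareto dominance) is our interpretation of the definition above, and all names are illustrative:

```python
def _dom(a, b):
    # Pareto dominance on real vectors (minimisation)
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def dominates_p(ft_i, ft_j):
    # ≼_p: Pareto dominance on the predictions, ignoring uncertainty
    return _dom(ft_i, ft_j)

def dominates_o(ft_i, ft_j, u_i, u_j):
    # ≼_o: optimistic Pareto dominance on the lower interval bounds
    return _dom([f - u_i for f in ft_i], [f - u_j for f in ft_j])

def dominates_c(ft_i, ft_j, u_i, u_j):
    # ≼_c: per dimension, the bound pair (lower, upper) of i weakly
    # dominates that of j, with strict dominance in at least one dimension
    pairs = [((fi - u_i, fi + u_i), (fj - u_j, fj + u_j))
             for fi, fj in zip(ft_i, ft_j)]
    weak = all(a[0] <= b[0] and a[1] <= b[1] for a, b in pairs)
    return weak and any(_dom(a, b) for a, b in pairs)
```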

Now assume we have obtained a strict partial order based on any of the given binary relations. Let r_c be the rank of the µ-th individual. Then, all individuals with rank less than r_c can confidently (with maximum errors as described above) be selected. In case a selection has to be made from the individuals with the critical rank r_c, one option is to apply another dominance relation to the individuals in question in the hope that the required distinction can be made. In case of ≼_f, this always means evaluating uncertain individuals until a confident distinction can be made according to the previous relation.
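The ranking itself can be any nondominated sorting parameterised by the chosen relation; a generic sketch (the relation passed in must induce a strict partial order, otherwise the loop would not terminate):

```python
def ranks(pop, dominates):
    # rank 0: nondominated front; rank 1: nondominated after removing
    # rank 0; and so on
    remaining = set(range(len(pop)))
    rank, r = {}, 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(pop[j], pop[i])
                            for j in remaining if j != i)}
        for i in front:
            rank[i] = r
        remaining -= front
        r += 1
    return rank
```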

Another option is to use a relation inducing a total preorder (transitivity, totality) as a secondary selection criterion (with random choice in case of further ties), as most EMOAs do. Again incorporating different information, we have tested the following hypervolume-based (hv) relations for this purpose:

– Hypervolume contribution of objective values:
X_i ≤_ho X_j := hv(f_o(X_i)) ≥ hv(f_o(X_j)), where f_o(X_i) = f(X_i) if u_i = 0, and f̃(X_i) else

– Hypervolume contribution of confidence interval bounds:
X_i ≤_hc X_j := ∏_{k ∈ [1...d]} hv(f̃_k(X_i) − u_i, f̃_k(X_i) + u_i) ≥ ∏_{k ∈ [1...d]} hv(f̃_k(X_j) − u_j, f̃_k(X_j) + u_j)
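Both relations rely on hypervolume. For the bi-objective case considered here, the hypervolume of a point set and a point's contribution (the quantity behind the SMS-EMOA's secondary criterion) can be sketched as follows, with an assumed reference point and illustrative names:

```python
def hv2d(points, ref):
    # area dominated by a 2-D point set w.r.t. a reference point (minimisation)
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:  # dominated points add no area
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area

def hv_contribution(points, i, ref):
    # hypervolume lost when removing point i from the set
    rest = points[:i] + points[i + 1:]
    return hv2d(points, ref) - hv2d(rest, ref)
```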

3.2 SAPEO algorithm

Algorithm 1 describes the basic SAPEO algorithm, into which any EMOA and surrogate model with uncertainty estimates can be plugged. As inputs, the algorithm receives the fitness function fun, the number of points considered for the surrogate model local_size and the budget of function evaluations. The order of dominance relations (strategies) and the secondary criterion (scnd_crit) are used for selection (cf. section 3.1). The output is the final population.

The algorithm starts with mandatory data structures: the population is initialised randomly (line 1) and evaluated using the considered fitness function fun (line 2), the EMOA is set up (line 3), and the error tolerance ε as well as the generation counter are set to their initial values (line 4).

The core optimisation loop starts in line 5 and stops once the considered optimiser terminates (due to the allocated budget or convergence), but not while both the error tolerance ε is larger than 0 and there are function evaluations left; this avoids convergence on imprecise values. Within the loop, new candidate solutions X are first generated by the optimisation algorithm (line 6) and evaluated


Algorithm 1 SAPEO
Input: fun, local_size, budget, strategies, scnd_crit
Output: X_final (final population)
1:  X_0.genome ⇐ [random(n) : i ∈ 1...pop_size]        ▷ random initialisation
2:  X_0.f ⇐ [fun(x) : x ∈ X_0]; X_0.e ⇐ 0              ▷ evaluate sampled individuals
3:  O ⇐ init(X_0, budget)                               ▷ init optimiser with initial population
4:  ε_0 ⇐ ∞; g ⇐ 1                                      ▷ init error tolerance, generation counter
5:  while (¬O.stop()) ∨ (ε > 0 ∧ budget > 0) do
6:      X_g.phenome ⇐ O.evolve(X_{g−1})                 ▷ get new population
7:      X_g.f, X_g.e ⇐ [model(x, k_nearest(x, X[X.e == 0], local_size)) : x ∈ X_g]
8:                              ▷ predict value, error with surrogate from evaluated neighbours
9:      ε_g ⇐ min(ε_{g−1}, α-percentiles(diff(X_g.f)))  ▷ update error tolerance
10:     for x ∈ X_g do
11:         if x.e > ε_g ∨ O.select(X_g, strategy, scnd_crit) == NULL then
12:             x.f ⇐ fun(x)                            ▷ evaluate individual
13:             bbob.recommend(X[X.e > 0, last])        ▷ recommend solution
14:         end if
15:     end for
16:     X_g ⇐ O.select(X_g, strategy, scnd_crit)        ▷ SAPEO survival selection
17:     g ⇐ g + 1                                       ▷ increase generation counter
18: end while
19: X_final ⇐ X_{g−1}; X_final.f ⇐ [fun(x) : x ∈ X_final]  ▷ evaluate final population

based on a local surrogate model trained from the local_size evaluated individuals closest in design space (line 8). The predicted function values and the expected model errors (cf. equation 1) are stored. The error tolerance threshold is then adapted (line 9). We reduce the threshold during the course of the algorithm in order to limit the probability of sorting errors with large effects on the final population. Therefore, ε_g is the minimum of the previous threshold ε_{g−1} and the α-percentiles of the Euclidean distances in objective space per dimension. The distances are a way to measure the distinctness of a population and thus the potential for overlapping confidence intervals. By adapting ε_g accordingly, we reduce the number and magnitude of potential sorting errors.
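Two of the helpers from Algorithm 1 can be sketched in Python. The signatures are hypothetical, and the ε update is simplified here to a single α-percentile of the pairwise Euclidean distances in objective space, rather than the per-dimension α-percentiles used in the paper:

```python
import math

def k_nearest(x, evaluated, k):
    # the local_size evaluated individuals closest to x in design space (line 7)
    return sorted(evaluated, key=lambda e: math.dist(x, e))[:k]

def update_threshold(prev_eps, objectives, alpha):
    # eps_g = min(eps_{g-1}, alpha-percentile of pairwise distances), cf. line 9
    dists = sorted(math.dist(a, b)
                   for i, a in enumerate(objectives)
                   for b in objectives[i + 1:])
    idx = max(0, math.ceil(alpha * len(dists)) - 1)
    return min(prev_eps, dists[idx])
```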

If any of the individuals in the population need to be evaluated, either because the predicted uncertainty is above the threshold or because the individuals cannot be distinguished (see line 11), they are evaluated in line 12 and updated accordingly. In order to simulate anytime behaviour of the algorithm, each time a solution is evaluated, an individual is recommended to the BBOB framework in line 13. This serves the purpose of measuring more accurately the solution quality the algorithm would have achieved had it been stopped at that time.

The set of candidate solutions, along with the (predicted but reasonably certain) function values and the expected prediction errors, is then passed to the optimiser in line 16. Depending on the selected strategy, the optimiser then selects the succeeding population as described above with regard to the predicted function values and uncertainties and resumes its regular process.

Finally, after the optimisation loop terminates, the function values of the individuals in the final population are computed using the real fitness function in line 19, in case there are any individuals left that have not been evaluated.


4 Evaluation

4.1 Experimental Setup

Each experiment was run with 550 parallel jobs that took less than 3 hours each, with specifications according to table 1. Since the performance is strictly measured in terms of function evaluations (target precision reached per function evaluation, cf. section 2.2), the runtime does not influence it.

Table 1: Experiment specifications and parameters

budget: 1000 per dimension (use case: expensive function)
variation operators: standard for all algorithms (cf. [2])
population size: 100 (as suggested in [4])
sample size for surrogate: 15 (due to computational concerns)⁴
number of candidate offspring: 15 (for SA-SMS, same as sample size)
correlation assumption: squared exponential
trend assumption: constant
regression weights: maximum likelihood using COBYLA (start: 10⁻², bounds: [10⁻⁴, 10¹])
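The Kriging prediction with uncertainty that table 1 parameterises can be illustrated with a minimal zero-mean Gaussian process using the squared-exponential correlation. This is a pure-Python sketch with a fixed length scale; the actual setup additionally fits a constant trend and tunes the regression weights via maximum likelihood with COBYLA:

```python
import math

def sq_exp(x1, x2, length=1.0):
    # squared-exponential correlation (cf. table 1)
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-d2 / (2 * length ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(X, y, x_star, length=1.0, nugget=1e-9):
    # predicted mean and uncertainty (standard deviation) at x_star
    n = len(X)
    K = [[sq_exp(X[i], X[j], length) + (nugget if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [sq_exp(xi, x_star, length) for xi in X]
    mean = sum(ks * a for ks, a in zip(k_star, solve(K, y)))
    var = sq_exp(x_star, x_star, length) - sum(
        ks * v for ks, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))
```

At an already evaluated point, the prediction interpolates the exact value and the reported uncertainty vanishes, which is the behaviour SAPEO exploits when deciding which individuals still need exact evaluation.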

We compare the performances with a standard SMS-EMOA [2] and a surrogate-assisted SMS-EMOA with pre-selection as proposed by [4], since we are not aware of any other SA-EMOAs using the SMS-EMOA with individual-based surrogate management strategies. Specifically, we look at the following algorithms:

SMS-EMOA Standard SMS-EMOA as baseline comparison.

SA-SMS-p Surrogate-assisted SMS-EMOA using ≼_p for pre-selection.

SA-SMS-o Surrogate-assisted SMS-EMOA using ≼_o instead (experimentally shown to improve the performance of pre-selection for the NSGA-II [4]).

SAPEO-uf-ho SAPEO using ≼_u to rank the offspring, thus accepting a risk of sorting errors of only α² (cf. 3.1). For as long as the population cannot be distinguished by ≼_u, the individuals are evaluated according to ≼_f, thus avoiding making any further sorting errors. The hypervolume relation ≤_ho is used as secondary criterion. This algorithm should therefore only take small risks and behave like the SMS-EMOA while saving function evaluations.

SAPEO-ucp-ho SAPEO using the increasingly risky relations ≼_u, ≼_c, ≼_p to avoid evaluations completely if not forced by the uncertainty threshold ε, taking the opposite approach to SAPEO-uf-ho. ≤_ho is used as secondary criterion.

SAPEO-uc-hc SAPEO fully using the multi-objectification of the confidence interval boundaries. It uses ≼_u as a first, safer way of ranking, followed by ≼_c on critical individuals. Secondary criterion is ≤_hc.

⁴ In a real-world application, the sample size should be chosen considering the trade-off between computation times for the model and the fitness function.


4.2 Visualisation and Interpretation of Results

There are two main angles to evaluating the anytime performance of algorithms:

(1) measuring the performance indicator after a predeﬁned number of function

evaluations (ﬁxed budget ) and (2) recording the function evaluation when target

performances are reached (ﬁxed target ) [6]. In the following, we use the latter.

For a detailed depiction of an algorithm’s performance for a ﬁxed target, we

use heatmaps (cf. ﬁgure 2) that show the percentage of budget used per dimen-

sion until a target was reached according to the colour scale on the right. If the

target is not reached within the allocated budget, the corresponding square is

white. The dimensions and instances of each function are shown separately to

enable analysis of the generalisation of algorithm performance across function

instances and dimensions. This is very important to justify the aggregation of

performance measures across instances. The functions have colour codes accord-

ing to the legend in ﬁgure 1 that specify their function groups.


Fig. 2: SAPEO-uf-ho performance in terms of the percentage of the budget per dimension used to reach target 10⁰ for all function instances and dimensions.

From the plot, it is apparent that for the selected target 10⁰, the algorithm SAPEO-uf-ho has trouble with a number of functions even in small dimensions. Additionally, for those functions, the algorithm's performance seems to drop with increasing dimension of the search space. The Rosenbrock function especially seems to be problematic for the algorithm: SAPEO-uf-ho rarely reaches the target for dimensions 10 or 20 when the Rosenbrock function is part of the problem (f04, f13, f21, f28-f34). A potential cause is an inaccurate representation of its narrow valley containing the optimum by the surrogate model. Another explanation could be a mismatch between the Rosenbrock function and the variation operators, causing difficulty for the underlying SMS-EMOA. As expected, some of the weakly structured problems were difficult for SAPEO-uf-ho as well.

A discussion of the potential causes of these performances is only possible with reference to other algorithms. In order to get a better overview of the performances of all algorithms and to detect patterns, we have compiled figure 3, which is an assembly of 30 heatmaps like the one in figure 2 for all algorithms and different targets. The same colour scale as in figure 2 is used. Recall that white spaces signify targets that were not reached within the allocated budget.

In figure 3, the most obvious trend is the declining performance for each target, which was of course expected. It is also apparent that the SMS-EMOA performs better in general than all other algorithms for each target precision. We

Fig. 3: Heatmaps visualising target performances for all algorithms (rows) across multiple targets (columns, 10¹ to 10⁻³). Refer to figure 2 for a detailed explanation.


can also see that all SAPEO versions are an improvement compared to the SA-SMS algorithms. Interestingly, we also see similar patterns in terms of which functions are more difficult for all algorithms, indicating that the added surrogate models do not influence the underlying optimisation behaviour significantly.

Unfortunately, while providing a good overview, figure 3 is not well suited to interpreting the performance of each algorithm per function. While very detailed, the plots are not easy to interpret due to the abundance of information displayed at once. In order to analyse the circumstances of different performance patterns, we compile a plot that aggregates the different instances of a function. This way, the general performance of an algorithm per function can be expressed without risk of overfitting, as intended by the COCO framework. To do that, we use the expected runtime (expected number of function evaluations) to reach a target [5] as a performance measure. The measure is estimated for a restart algorithm with 1000 samples. The results are again displayed in a heatmap (figure 4). The colour visualises the estimated expected runtime per dimension in log10-scale according to the scale on the right. Values higher than the maximum budget (> 3) occur if a target is not reached in all instances. White spaces occur if the target was never reached by the algorithm in any instance.

The plot displays the expected runtime for all dimensions in different columns according to the labels above. Each of these columns is again divided into three sub-columns, displaying the results for different algorithms according to the labels on the bottom. There are two algorithms per sub-column, whose results are displayed on top of each other for each row corresponding to a function. For each algorithm, the expected runtimes for targets 10¹, 10⁰, 10⁻¹, 10⁻², 10⁻³ are depicted in that order. In case a target was never reached by any algorithm in a column, it is omitted. For example, the expected runtime for SAPEO-uf-ho on function f01, target 10¹ and dimension 2 is in the top left corner and encoded in a light blue. The expected runtime to reach target 10¹ is therefore around 10⁰ · 2 = 2. The SMS-EMOA is directly below that and a shade lighter, so it has a slightly higher expected runtime. As in figure 2, the groups each function belongs to are encoded according to the colour scheme in the legend of figure 1.

The general trends seen in figure 3 can be observed here as well. However, we can also see that SAPEO-uf-ho beats the SMS-EMOA in terms of the precision reached on very rare occasions, for example on functions f03 and f41 in dimension 2 and function f20 in dimension 10. Still, the SMS-EMOA generally reaches the same number of precision targets as the other algorithms, or more. However, in most cases where a higher precision target is reached by only a single algorithm, the corresponding colour indicates a very high expected runtime. This means that the algorithm did not reach the higher target for most instances, which speaks against a robust performance of that algorithm. More importantly, the SA-SMS variants often reach fewer targets than the other algorithms, especially on higher-dimensional problems, meaning they are clearly outperformed.

The colour gradients in most functions are remarkably alike, indicating similar behaviour and difficulties experienced with each problem. This is expected, as the intention of SA-EMOAs is to avoid function evaluations with only controlled

Surrogate-Assisted Partial Order-Based Evolutionary Optimisation 13

−2

−1

0

1

2

3

4

5

●

f55

f54

f53

f52

f51

f50

f49

f48

f47

f46

f45

f44

f43

f42

f41

f40

f39

f38

f37

f36

f35

f34

f33

f32

f31

f30

f29

f28

f27

f26

f25

f24

f23

f22

f21

f20

f19

f18

f17

f16

f15

f14

f13

f12

f11

f10

f09

f08

f07

f06

f05

f04

f03

f02

f01

SAPEO−uf−ho

SMS−EMOA

SAPEO−ucp−ho

SAPEO−uc−hc

SA−SMS−p

SA−SMS−o

SAPEO−uf−ho

SMS−EMOA

SAPEO−ucp−ho

SAPEO−uc−hc

SA−SMS−p

SA−SMS−o

SAPEO−uf−ho

SMS−EMOA

SAPEO−ucp−ho

SAPEO−uc−hc

SA−SMS−p

SA−SMS−o

SAPEO−uf−ho

SMS−EMOA

SAPEO−ucp−ho

SAPEO−uc−hc

SA−SMS−p

SA−SMS−o

SAPEO−uf−ho

SMS−EMOA

SAPEO−ucp−ho

SAPEO−uc−hc

SA−SMS−p

SA−SMS−o

dim 2 dim 3 dim 5 dim 10 dim 20

Fig. 4: BBOB-BIOBJ performance results for all algorithms regarding expected

runtime (colour coded in log-scale) for targets 101,100,10−1,10−2,10−3

eﬀects on the evolutionary path. Possibly due to the aggregating nature of the

expected runtime measure, the performance contrast does not appear to be as

stark as in ﬁgure 3. The gradient and number of performance targets reached per

function is in fact relatively similar for all algorithms. In most cases, diﬀerences

occur towards the end of the gradient, indicating that the precision improvement

of the surrogate-assisted algorithms is less steep than for the SMS-EMOA. How-

ever, in order to analyse the algorithms’ behaviour appropriately in that regard,

a more thorough analysis of the separate selection steps is required.

Regarding the different functions, there seems to be no clear performance pattern. The different SAPEO versions rarely vary. The performance of all algorithms seems to be more closely tied to the underlying single-objective functions, e.g., Rosenbrock seems to pose problems whereas Schwefel seems more manageable.


5 Conclusions and Future Work

In this paper, we have proposed a novel approach to surrogate-assisted multi-objective evolutionary algorithms called SAPEO. An extensive analysis of its anytime performance using the BBOB-BIOBJ benchmark showed that it was outperformed by its underlying algorithm in this study, the SMS-EMOA. This is quite surprising, since the SAPEO-uf-ho variant allows only minimal uncertainties and should therefore rarely make different decisions. However, SAPEO still beats another SA-EMOA based on the SMS-EMOA [4] on the benchmark.

One potential source of error is the surrogate model, e.g. assumptions A1 and A2 (section 3.1) could be wrong. A large error in the predicted uncertainty could have a tremendous influence on the algorithm. However, this is controllable through the adaptation of the uncertainty threshold ε. Additionally, the uncertainties during the start of the SAPEO runs were relatively large, which could also send the algorithm in a wrong direction. These uncertainties could be mitigated by using surrogate ensembles instead, distributing the samples better, increasing the sample size or selecting a more fitting kernel. Additionally, the performance of local vs. global surrogates should be analysed more thoroughly. It is apparent that the quality of the surrogate model is a major concern for SA-EMOAs, which could be problematic for black-box optimisation in general.
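The control mechanism referred to above, evaluating an individual exactly only when the model's predicted uncertainty exceeds the threshold ε, can be sketched as follows. This is a simplified illustration with invented names, not the exact SAPEO implementation, and it considers a single objective per individual for brevity:

```python
def select_exact_evaluations(predictions, epsilon):
    """Decide which individuals need an exact (expensive) evaluation.

    predictions: list of (mean, std) tuples from a Kriging model,
                 one per individual.
    epsilon: current uncertainty threshold.
    Returns the indices of individuals whose predicted uncertainty is
    too large to rely on the surrogate alone.
    """
    return [i for i, (_, std) in enumerate(predictions) if std > epsilon]

# Example: only the second individual is predicted too uncertainly
preds = [(0.3, 0.01), (0.5, 0.2), (0.1, 0.05)]
print(select_exact_evaluations(preds, 0.1))  # [1]
```

Lowering ε over the course of the run makes the algorithm increasingly distrust the model, so a large error in the predicted standard deviation mainly affects early generations.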

Apart from the model, there are possible improvements regarding the binary relations used. For one, u could be defined without forcing strict Pareto dominance of the hypercubes. Furthermore, using hypercubes for the potential location of the fitness values is a simplification; a binary relation on hyperellipsoids might provide better results.
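For reference, strict Pareto dominance between two uncertainty hypercubes, as required by the relation u discussed above, amounts to comparing the worst corner of one box against the best corner of the other. A minimal sketch for minimisation, with illustrative names and values:

```python
def hypercube_dominates(box_a, box_b):
    """True if hypercube a strictly Pareto-dominates hypercube b.

    Each box is a list of (lower, upper) intervals, one per objective
    (minimisation). a dominates b only if a's worst corner (the uppers)
    is at least as good as b's best corner (the lowers) in every
    objective and strictly better in at least one.
    """
    at_least_as_good = all(ua <= lb for (_, ua), (lb, _) in zip(box_a, box_b))
    strictly_better = any(ua < lb for (_, ua), (lb, _) in zip(box_a, box_b))
    return at_least_as_good and strictly_better

# a's worst corner is never worse than b's best corner -> a dominates b
a = [(0.1, 0.2), (0.3, 0.4)]
b = [(0.5, 0.6), (0.45, 0.5)]
print(hypercube_dominates(a, b))  # True

# overlapping boxes are incomparable under this strict relation
c = [(0.15, 0.55), (0.35, 0.45)]
print(hypercube_dominates(a, c))  # False
```

The strictness is what makes the relation conservative: the wider the boxes, the more pairs become incomparable, which is exactly when exact evaluations are triggered.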

Furthermore, while the SAPEO approach worked well for single-objective problems, the corresponding multi-objective problems pose an incomparably larger difficulty for a surrogate model. Additionally, even slightly overestimated function values could lead to an incorrect identification of dominated individuals. This is because a large number of critical values would need to be distinguished with less certain relations or expensive function evaluations.
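To illustrate the last point, a small sketch of plain Pareto dominance (minimisation) shows how a slight overestimation can invert a dominance decision; the numeric values are invented for illustration:

```python
def dominates(a, b):
    """True if point a Pareto-dominates point b (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

true_a, true_b = (1.0, 1.0), (1.05, 1.05)
print(dominates(true_a, true_b))  # True: a dominates b

# The surrogate slightly overestimates a's objective values ...
pred_a = (1.1, 1.1)
print(dominates(pred_a, true_b))  # False: a no longer appears dominant
print(dominates(true_b, pred_a))  # True: the decision is inverted
```

With many individuals close together in objective space, such near-ties are frequent, so each one must be resolved either by a less certain relation or by an expensive exact evaluation.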

Notice that previous studies [12,1] have shown that EAs with a larger number of offspring are less vulnerable to noisy fitness functions. It may therefore be conjectured that the μ + 1 selection scheme of the SMS-EMOA causes the poor performance under the noise induced by the surrogate. This hypothesis, however, remains a question for future research. In general, the influence of the quality of surrogate models on SA-EMOAs should be analysed more carefully. With a properly noise-robust optimisation algorithm and parametrisation, SAPEO should be able to beat its underlying algorithm, as it does on single-objective problems.

References

1. Arnold, D., Beyer, H.-G.: On the Benefits of Populations for Noisy Optimization. Evolutionary Computation 11(2), 111–127 (2003)
2. Beume, N., Naujoks, B., Emmerich, M.: SMS-EMOA: Multiobjective selection based on dominated hypervolume. European Journal of Operational Research 181(3), 1653–1669 (2007)
3. Brockhoff, D., Tušar, T., Tušar, D., Wagner, T., Hansen, N., Auger, A.: Biobjective Performance Assessment with the COCO Platform. CoRR abs/1605.01746 (2016), retrieved: 22/12/2016
4. Emmerich, M., Giannakoglou, K., Naujoks, B.: Single- and Multi-objective Evolutionary Optimization Assisted by Gaussian Random Field Metamodels. IEEE Transactions on Evolutionary Computation 10(4), 421–439 (2006)
5. Hansen, N., Auger, A., Ros, R., Finck, S., Pošík, P.: Comparing results of 31 algorithms from the black-box optimization benchmarking bbob-2009. In: Companion of Genetic and Evolutionary Computation Conference (GECCO 2010). pp. 1689–1696. ACM Press, New York (2010)
6. Hansen, N., Finck, S., Ros, R., Auger, A.: Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions. Research Report RR-6829, INRIA (2009), retrieved: 22/12/2016
7. Jin, Y.: A Comprehensive Survey of Fitness Approximation in Evolutionary Computation. Soft Computing 9(1), 3–12 (2005)
8. Jin, Y.: Surrogate-assisted evolutionary computation: Recent advances and future challenges. Swarm and Evolutionary Computation 1(2), 61–70 (2011)
9. Knowles, J., Nakayama, H.: Meta-Modeling in Multiobjective Optimization. In: Branke, J., et al. (eds.) Multiobjective Optimization - Interactive and Evolutionary Approaches, pp. 245–284. Springer, Berlin (2008)
10. Limbourg, P., Aponte, D.E.S.: An Optimization Algorithm for Imprecise Multi-Objective Problem Functions. In: IEEE Congress on Evolutionary Computation (CEC 2005). IEEE Press, Piscataway, NJ (2005)
11. Mlakar, M., Petelin, D., Tušar, T., Filipič, B.: GP-DEMO: Differential Evolution for Multiobjective Optimization based on Gaussian Process models. European Journal of Operational Research 243(2), 347–361 (2015)
12. Nissen, V., Propach, J.: Optimization with noisy function evaluations. In: Parallel Problem Solving from Nature (PPSN V). pp. 159–168. Springer, Berlin (1998)
13. Rudolph, G.: A Partial Order Approach to Noisy Fitness Functions. In: IEEE Congress on Evolutionary Computation (CEC 2001). pp. 318–325. IEEE Press, Piscataway, NJ (2001)
14. Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Statistical Science 4(4), 409–423 (1989)
15. Tušar, T., Brockhoff, D., Hansen, N., Auger, A.: COCO: The Bi-objective Black Box Optimization Benchmarking (bbob-biobj) Test Suite. CoRR abs/1604.00359 (2016), retrieved: 22/12/2016
16. Zitzler, E., Knowles, J., Thiele, L.: Quality Assessment of Pareto Set Approximations. In: Branke, J., et al. (eds.) Multiobjective Optimization - Interactive and Evolutionary Approaches, pp. 373–404. Springer, Berlin (2008)