
On the Combined Impact of Population Size and Sub-problem Selection in MOEA/D

Geoffrey Pruvost¹, Bilel Derbel¹, Arnaud Liefooghe¹, Ke Li², and Qingfu Zhang³

¹ University of Lille, CRIStAL, Inria, Lille, France
{geoffrey.pruvost,bilel.derbel,arnaud.liefooghe}@univ-lille.fr
² University of Exeter, Exeter, UK
k.li@exeter.ac.uk
³ City University of Hong Kong, Kowloon Tong, Hong Kong
qingfu.zhang@cityu.edu.hk

Abstract. This paper intends to understand and improve the working principle of decomposition-based multi-objective evolutionary algorithms. We review the design of the well-established Moea/d framework to support the smooth integration of different strategies for sub-problem selection, while emphasizing the role of the population size and of the number of offspring created at each generation. By conducting a comprehensive empirical analysis on a wide range of multi- and many-objective combinatorial NK landscapes, we provide new insights into the combined effect of those parameters on the anytime performance of the underlying search process. In particular, we show that even a simple strategy selecting sub-problems at random outperforms existing sophisticated strategies. We also study the sensitivity of such strategies with respect to the ruggedness and the objective space dimension of the target problem.

1 Introduction

Context. Evolutionary multi-objective optimization (EMO) algorithms [7] have proven extremely effective in computing a high-quality approximation of the Pareto set, i.e., the set of solutions providing the best trade-offs among the objectives of a multi-objective combinatorial optimization problem (MCOP). Since the working principle of an evolutionary algorithm (EA) is to evolve a population of solutions, this population can be explicitly mapped to the target approximation set. The goal is then to improve the quality of the population, and to guide its incumbent individuals to be as close and as diverse as possible w.r.t. the (unknown) Pareto set. Existing EMO algorithms can be distinguished according to how the population is evolved. They are based on an iterative process where at each iteration: (i) some individuals (parents) from the population are selected, (ii) new individuals (offspring) are generated using variation operators (e.g., mutation, crossover) applied to the selected parents, and (iii) a replacement process updates the population with newly generated individuals. Apart from the problem-dependent variation operators, the design of selection and replacement is well understood to be the main challenge for an efficient and effective EMO algorithm, since these interdependent steps allow one to control both the convergence of the population and its diversity. In contrast with dominance-based (e.g., [7]) or indicator-based (e.g., [2]) approaches, aggregation-based approaches [16] rely on the transformation of the objective values of a solution into a scalar value that can be used for selection and replacement. In this paper, we are interested in studying the working principles of this class of algorithms, while focusing on the so-called Moea/d (multi-objective evolutionary algorithm based on decomposition) [11,22], which can be considered as a state-of-the-art framework.

© Springer Nature Switzerland AG 2020. L. Paquete and C. Zarges (Eds.): EvoCOP 2020, LNCS 12102, pp. 131–147, 2020. https://doi.org/10.1007/978-3-030-43680-3_9

Motivations. The Moea/d framework is based on the decomposition of the original MCOP into a set of smaller sub-problems that are mapped to a population of individuals. In its basic variant [22], Moea/d considers a set of single-objective sub-problems defined using a scalarizing function transforming a multi-dimensional objective vector into a scalar value w.r.t. one weight (or direction) vector in the objective space. The population is then typically structured by mapping one individual to one sub-problem targeting a different region of the objective space. Individuals from the population are evolved following a cooperative mechanism in order for each individual (i) to optimize its own sub-problem, and also (ii) to help solving its neighboring sub-problems. The population hence ends up having a good quality w.r.t. all sub-problems. Although extremely simple and flexible, the computational flow of Moea/d is constantly redesigned to deal with different issues. Different Moea/d variants have been proposed so far in the literature, e.g., to study the impact of elitist replacements [19], of generational design [13], or of stable-matching based evolution [12], among other mechanisms [1]. In this paper, we are interested in the interdependence between the population size, which is implied by the number of sub-problems defined in the initial decomposition, and the internal evolution mechanisms of Moea/d.

The population size has a deep impact on the dynamics and performance of EAs. In Moea/d, the sub-problems target diversified and representative regions of the Pareto front. They are usually defined to spread evenly in the objective space. Depending on the shape of the (unknown) Pareto front, and on the number of objectives, one may need to define a different number of sub-problems. Since the population is structured following the so-defined sub-problems, it is not clear how the robustness of the Moea/d selection and replacement strategies can be impacted by a particular setting of the population size. Conversely, it is not clear what population size shall be chosen, and how to design a selection and replacement mechanism yielding a high-quality approximation. Besides, the proper setting of the population size in EAs (see e.g. [6,8,20]) can depend on the problem properties, for example in terms of solving difficulty. EMO algorithms are no exception. In Moea/d, sub-problems may have different characteristics, and the selection and replacement mechanisms can be guided by such considerations. This is for example the case for a number of Moea/d variants where it is argued that some sub-problems might be more difficult to solve than others [18,23], and hence that the population shall be guided accordingly.

Methodology and Contribution. In this paper, we rely on the observation that the guiding principle of Moea/d can be leveraged in order to support a simple, high-level and tunable design of the selection and replacement mechanisms on the one hand, while enabling a more fine-grained control over the choice of the population size, and subsequently its impact on approximation quality, on the other hand. More specifically, our work can be summarized as follows:

– We consider a revised design of Moea/d which explicitly dissociates three components: (i) the number of individuals selected at each generation, (ii) the strategy adopted for selecting those individuals, and (iii) the setting of the population size. Although some sophisticated strategies for distributing the computational effort of sub-problem exploration were integrated within some Moea/d variants [10,18,23], to the best of our knowledge, the individual impact of such components was only loosely studied in the past.

– Based on this fine-grained revised design, we conduct a comprehensive analysis of the impact of those three components on the convergence profile of Moea/d. Our analysis is conducted in an incremental manner, with the aim of providing insights about the interdependence between those design components. In particular, we show evidence that the number of sub-problems selected at each generation plays an even more important role than the way the sub-problems are selected. Sophisticated selection strategies from the literature are shown to be outperformed by simpler, well-configured strategies.

– We consider a broad range of multi- and many-objective NK landscapes, viewed as a standard and difficult family of MCOP benchmarks, which is both scalable in the number of objectives and exposes a controllable difficulty in terms of ruggedness. Through a thorough benchmarking effort, we are then able to better elicit the impact of the Moea/d population size, and the robustness of selection strategies on the (anytime) approximation quality.

It is worth noticing that our work shall not be considered as yet another variant in the Moea/d literature. In fact, our analysis precisely aims at highlighting the main critical design parameters and components that can be hidden behind a successful Moea/d setting. Our investigations are hence to be considered as a step towards the establishment of a more advanced component-wise configuration methodology, allowing the setting up of future high-quality decomposition-based EMO algorithms for both multi- and many-objective optimization.

Outline. In Sect. 2, we recall basic definitions and we detail the working principle of Moea/d. In Sect. 3, we describe our contribution in rethinking Moea/d by explicitly dissociating between the population size and the number of selected sub-problems, thus allowing us to leverage existing algorithms as instances of the revised framework. In Sect. 4, we present our experimental study and we state our main findings. In Sect. 5, we conclude the paper and discuss further research.

2 Background

2.1 Multi-objective Combinatorial Optimization

A multi-objective combinatorial optimization problem (MCOP) can be defined by a set of M objective functions f = (f₁, f₂, ..., f_M), and a discrete set X of feasible solutions in the decision space. Let Z = f(X) ⊆ ℝ^M be the set of feasible outcome vectors in the objective space. To each solution x ∈ X is assigned an objective vector z ∈ Z, on the basis of the vector function f : X → Z. In a maximization context, an objective vector z ∈ Z is dominated by a vector z′ ∈ Z iff ∀m ∈ {1, ..., M}, z_m ≤ z′_m and ∃m ∈ {1, ..., M} s.t. z_m < z′_m. A solution x ∈ X is dominated by a solution x′ ∈ X iff f(x) is dominated by f(x′). A solution x ∈ X is Pareto optimal if there does not exist any other solution x′ ∈ X such that x is dominated by x′. The set of all Pareto optimal solutions is the Pareto set. Its mapping in the objective space is the Pareto front. The size of the Pareto set is typically exponential in the problem size. Our goal is to identify a good Pareto set approximation, for which EMO algorithms constitute a popular and effective option [7]. As mentioned before, we are interested in aggregation-based methods, and especially in the Moea/d framework, which is sketched below.
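The dominance relation above translates directly into code. The following generic sketch (function names are ours, not taken from the paper) checks dominance and extracts the non-dominated subset of a finite set of objective vectors, in a maximization setting:

```python
def dominates(z1, z2):
    """Return True iff objective vector z2 is dominated by z1 (maximization):
    z1 is at least as good in every objective and strictly better in one."""
    return (all(a >= b for a, b in zip(z1, z2))
            and any(a > b for a, b in zip(z1, z2)))

def pareto_front(points):
    """Keep only the non-dominated objective vectors of a finite set."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For instance, among the vectors (1, 2), (2, 1), (0, 0) and (2, 2), only (2, 2) is non-dominated.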

2.2 The Conventional MOEA/D Framework

Aggregation-based EMO algorithms seek good-performing solutions in multiple regions of the Pareto front by decomposing the original multi-objective problem into a number of scalarized single-objective sub-problems [16]. In this paper, we use the Chebyshev scalarizing function: g(x, ω) = max_{i∈{1,...,M}} ω_i · |z*_i − f_i(x)|, where x ∈ X, ω = (ω₁, ..., ω_M) is a positive weight vector, and z* = (z*₁, ..., z*_M) is a reference point such that z*_i > f_i(x) for all x ∈ X and i ∈ {1, ..., M}.
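In code, the Chebyshev scalarizing function is a one-liner. The sketch below assumes, as in the definition above, a reference point z* that dominates every feasible objective vector:

```python
def chebyshev(obj, weight, z_star):
    """Weighted Chebyshev scalarization g(x, w) = max_i w_i * |z*_i - f_i(x)|.
    Lower is better: it measures a weighted distance to the reference point."""
    return max(w * abs(z - f) for w, z, f in zip(weight, z_star, obj))
```

With obj = (0.5, 0.8), weight = (1.0, 1.0) and z* = (1.0, 1.0), the value is max(0.5, 0.2) = 0.5.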

In Moea/d [22], sub-problems are optimized cooperatively by defining a neighborhood relation between sub-problems. Given a set of μ weight vectors W_μ = (ω¹, ..., ω^μ), with ω^j = (ω^j₁, ..., ω^j_M) for every j ∈ {1, ..., μ}, defining μ sub-problems, Moea/d maintains a population P_μ = (x¹, ..., x^μ) where each individual x^j corresponds to one sub-problem. For each sub-problem j ∈ {1, ..., μ}, a set of neighbors B_j is defined by considering the T closest weight vectors based on Euclidean distance. All sub-problems are considered at each generation. Given a sub-problem j, two sub-problems are selected at random from B_j, and the two corresponding solutions are considered as parents. An offspring x′ is created by means of variation (e.g., crossover, mutation). For every k ∈ B_j, if x′ improves k's current solution x^k, then x′ replaces it, i.e., if g(x′, ω^k) < g(x^k, ω^k) then x^k = x′. The algorithm loops over sub-problems, i.e., weight vectors, or equivalently over the individuals in the population, until a stopping condition is satisfied. In the conventional Moea/d terminology, an iteration refers to making selection, offspring generation, and replacement for one sub-problem. By contrast, a generation consists in processing all sub-problems once, i.e., after one generation μ offspring are generated. Notice that other issues are also addressed, such as the update of the reference point z* required by the scalarizing function, and the option to incorporate an external archive for storing all non-dominated points found so far during the search process.

From the previous description, it should be clear that, at each iteration, Moea/d applies an elitist (T+1)-EA w.r.t. the sub-population B_i underlying the neighborhood of the current sub-problem. After one generation, one can roughly view Moea/d as applying a (μ+μ)-EA w.r.t. the full population. A noticeable difference is that the basic Moea/d is not a generational algorithm, in the sense that it does not handle the population as a whole, but rather in a local and greedy manner. This is actually a distinguishing feature of Moea/d, since the population is structured by the initial sub-problems and evolved accordingly.
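To make this working principle concrete, here is a minimal, self-contained sketch of the conventional Moea/d loop described above: one individual per weight vector, Chebyshev scalarization, and neighborhood-based mating and replacement. The operator choices (one-point crossover, one-bit mutation) and all names are illustrative assumptions, not the paper's exact setup:

```python
import random

def moead(f, n_bits, weights, T=3, generations=50, seed=0):
    """Sketch of conventional MOEA/D on a bit-string problem `f`
    mapping a solution to an objective tuple to be maximized."""
    rng = random.Random(seed)
    mu, M = len(weights), len(weights[0])

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # B_j: indices of the T closest weight vectors (Euclidean distance)
    B = [sorted(range(mu), key=lambda k: dist(weights[j], weights[k]))[:T]
         for j in range(mu)]
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(mu)]
    F = [f(x) for x in pop]
    z = [max(Fj[i] for Fj in F) for i in range(M)]   # reference point z*

    def g(obj, w):  # Chebyshev scalarization (lower is better)
        return max(w[i] * abs(z[i] - obj[i]) for i in range(M))

    for _ in range(generations):
        for j in range(mu):                          # one generation = mu iterations
            p1, p2 = rng.sample(B[j], 2)             # two parents from B_j
            cut = rng.randrange(1, n_bits)           # one-point crossover
            child = pop[p1][:cut] + pop[p2][cut:]
            child[rng.randrange(n_bits)] ^= 1        # one-bit mutation
            obj = f(child)
            z = [max(z[i], obj[i]) for i in range(M)]  # update reference point
            for k in B[j]:                           # neighborhood replacement
                if g(obj, weights[k]) < g(F[k], weights[k]):
                    pop[k], F[k] = child[:], obj
    return pop, F
```

Running it on a toy conflicting problem (maximize the number of ones and the number of zeros) with three weight vectors drives each individual towards the region of the front favored by its own sub-problem.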

3 Revising and Leveraging the Design of MOEA/D

3.1 Positioning and Rationale

As in any EA, both the population size and the selection and replacement mechanisms of Moea/d play a crucially important role. Firstly, a number of weight vectors, i.e., a population size, that is too small may not only be insufficient to cover the whole Pareto front well, but may also prevent the identification of high-quality solutions for the defined sub-problems. This is because the generation of new offspring is guided by the so-implied (T+1)-EA, for which the local sub-population of size T might be too different and hence too restrictive for generating good offspring. On the other hand, a too-large population may result in a substantial waste of resources, since too many sub-problems might map to the same solution. Secondly, a small population size can be sufficient to approach the Pareto front in a reduced number of steps. However, a larger population is preferable to better cover the Pareto front. As in single-objective optimization, a larger population might also help escaping local optima [20]. As a result, it is not clear what a proper setting of the population size in Moea/d is, since the previously discussed issues seem contradictory.

Although one can find different studies dealing with the impact of the population size in EAs [5,6,8,20], this issue is explicitly studied only to a small extent, especially for decomposition-based multi- and many-objective optimization [9,15]. For instance, in [8], offline and online scheduling strategies for controlling the population size are coupled with SMS-EMOA [2], a well-known indicator-based EMO algorithm, for bi-objective continuous benchmarks. Leveraging such a study to combinatorial domains with more than two objectives, and within the Moea/d framework, is however a difficult question. Tightly related to the population size, other studies investigate the distribution of the computational effort over the sub-problems [3,4,10,18,23]. The rationale is that the defined sub-problems might have different degrees of difficulty and/or that the progress over some sub-problems might be more advanced than others in the course of the search process. Hence, different adaptive mechanisms have been designed in order to detect which sub-problems to consider, or equivalently which solutions to select when generating a new offspring. A representative example of such approaches is the so-called Moea/d–Dra (Moea/d with dynamical resource allocation) [23], which can be considered as a state-of-the-art algorithm when dealing with the proper distribution of the computational effort over sub-problems. In Moea/d–Dra, a utility function is defined w.r.t. the current status of sub-problems. A tournament selection is used to decide which sub-problems to select when generating a new offspring. Despite a skillful design, such an approach stays focused on the relative difficulty of solving sub-problems, while omitting to analyze the impact of the number of selected sub-problems and its interaction with both the population size and the characteristics of the underlying problem.

We propose to revise the Moea/d framework and to study in a more explicit and systematic manner the combined effect of population size and sub-problem selection in light of the properties of the MCOP at hand. As mentioned above, this was investigated only to a small extent in the past, although, as revealed by our experimental findings, it is of critical importance to reach optimal performance when adopting the Moea/d framework.

3.2 The Proposed MOEA/D–(μ,λ,sps) Framework

In order to better study and analyze the combined effect of population size and sub-problem selection, we propose to rely on a revised framework for Moea/d, denoted Moea/d–(μ,λ,sps), as defined in the high-level template depicted in Algorithm 1. This notation is inspired by the standard (μ+λ)-EA scheme, where, starting from a population of size μ, λ new individuals are generated and merged to form a new population of size μ after replacement. In the Moea/d framework, however, this has a specific meaning, as detailed in the following.

The proposed Moea/d–(μ,λ,sps) algorithm follows the same steps as the original Moea/d. However, it explicitly incorporates an additional component, denoted sps, which stands for the sub-problem selection strategy. Initially, the population is generated and mapped to the initial μ weight vectors. An optional external archive is also incorporated in the usual way, with no effect on the search process. The algorithm then proceeds in different generations (the outer while loop). At each generation, λ sub-problems, denoted I_λ, are selected using the sps strategy. A broad range of deterministic and stochastic selection strategies can be integrated. In particular, λ can be thought of as an intrinsic parameter of the EMO algorithm itself, or implied by a specific sps strategy. The so-selected sub-problems are processed in order to update the population (the inner for loop). For the purpose of this paper, we adopt the same scheme as conventional Moea/d: selected sub-problems are processed in an iterative manner, although other generational EA schemes could be adopted. At each iteration, that is, for each selected sub-problem, denoted i, some parents are selected as usual from the T-neighborhood B_i of weight vector ω^i w.r.t. W_μ. The setting of the neighborhood B_i can be exactly the same as in conventional Moea/d and its variants.

Algorithm 1. High-level template of Moea/d–(μ,λ,sps)
Input: W_μ := (ω¹, ..., ω^μ): weights; g(·|ω): scalar function; T: neighborhood size;
1:  EP ← ∅: (optional) external archive;
2:  P_μ ← (x¹, ..., x^μ): generate and evaluate initial population of size μ;
3:  z* ← initialize reference point from P_μ;
4:  while StoppingCriteria not met do
5:      I_λ ← sps(W_μ, P_μ, history);
6:      for i ∈ I_λ do
7:          B_i ← the T-neighborhood of sub-problem i using W_μ;
8:          X ← matingSelection(B_i);
9:          x′ ← variation(X);
10:         F(x′) ← evaluate x′;
11:         EP ← update external archive using x′;
12:         z* ← update reference point using F(x′);
13:         P_μ ← replacement(P_μ, x′, B_i | g);
14:     history ← update search history;

However, at this step, it is important to emphasize that the considered neighborhood B_i is w.r.t. the whole set of available weight vectors W_μ, that is, considering all the initially designed sub-problems, and not only the selected ones. In particular, B_i may include some sub-problems that were not selected by the sps strategy. This is motivated by the fact that parents that are likely to produce a good offspring should be defined w.r.t. the population as a whole, and not solely within the subset of active sub-problems at a given generation, which might be restrictive. A new offspring x′ is then generated using standard variation operators (e.g., crossover, mutation). The reference point required by the scalarizing function and the optional external archive are updated. Thereafter, the offspring is considered for replacement as in the conventional Moea/d and its variants. Here again, this is handled using the neighborhood B_i of the current sub-problem i, computed w.r.t. the whole population. It is worth noticing that the population update is made on the basis of the scalarizing function g, which is a distinguishing feature of aggregation-based approaches.

At last, notice that we also use a history variable, referring to the evolution of the search state, and hence serving as a memory where any relevant information can be stored for the future actions of the algorithm. In particular, we explicitly integrate the history within the sps strategy, since this will allow us to leverage some existing Moea/d variants, as further discussed below.

3.3 Discussion and Outlook

It shall be clear from the previous description that the Moea/d–(μ,λ,sps) framework allows us to emphasize the interdependence between three main components in a more fine-grained manner, while following the same working principle as the original Moea/d. Firstly, the number of weight vectors, or equivalently the population size, is now made more explicit. In fact, the set of weight vectors now 'simply' plays the role of a global data structure to organize the individuals from the population. This structure can be used at the selection and replacement steps. In particular, one is not bound to iterate over all weight vectors, but might instead select a subset of individuals following a particular strategy. Secondly, the number of selected sub-problems λ directly determines the number of offspring to be generated at each generation. From an exploration/exploitation perspective, we believe this is of critical importance for (μ+λ)-EAs in general, and it is now made more explicit within the Moea/d framework. Furthermore, the λ offspring solutions are not simply generated from the individuals mapping to the selected sub-problems. Instead, parent selection interacts directly with the whole population, structured around the μ weight vectors, since the local neighborhood of each selected sub-problem may be used. Thirdly, the interaction between μ and λ is complemented more explicitly by the sub-problem selection strategy. In conventional Moea/d for instance, the selection strategy turns out to be: spsAll = 'select all sub-problems', with λ = μ. However, advanced Moea/d variants can be captured as well. For instance, Moea/d–Dra [23], focusing on the dynamic distribution of computations, can easily be instantiated as follows. For each sub-problem, we store and update the utility value as introduced in [23] by using the history variable. Let us recall that, in Moea/d–Dra, the utility of a sub-problem is simply the amount of progress made by solution x^i for sub-problem ω^i in terms of the scalarized fitness value g(·|ω^i) over different generations. In addition, M boundary weight vectors (in the objective space) are selected at each generation, and further (μ/5 − M) weight vectors are selected by means of a tournament selection of size 10. Hence, the sub-problem selection strategy turns out to be spsDra = 'select the boundary vectors and sub-problems using a tournament selection of size 10', with λ = μ/5. Notice that this choice recalls the one-fifth success rule from (μ+λ) evolution strategies [14].

Table 1. Different instantiations of the Moea/d–(μ,λ,sps) framework.

Algorithm    | Pop. size | # selected sub-prob. | Selection strategy | Ref.
Moea/d       | μ         | μ                    | spsAll             | [22]
Moea/d–Dra   | μ         | μ/5                  | spsDra             | [23]
Moea/d–Rnd   | μ         | λ ≤ μ                | spsRnd             | here

In the remainder, Moea/d–(μ,μ,spsAll) refers to the conventional Moea/d as described in [22], and Moea/d–(μ,μ/5,spsDra) refers to Moea/d–Dra [23]; see Table 1. Other settings and parameters can be conveniently investigated as well. Since we are interested in the combined effect of μ, λ and sps, we also consider a simple baseline sub-problem selection strategy, denoted spsRnd, which selects a subset of sub-problems uniformly at random. Notice that our empirical analysis shall also shed more light on the behavior and the accuracy of the existing spsDra strategy.
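The three strategies of Table 1 can be sketched as interchangeable functions returning the set I_λ of selected sub-problem indices. The convention that the M boundary weight vectors occupy the first `n_boundary` indices is our illustrative assumption, as is the simplified tournament in the spsDra sketch:

```python
import random

def sps_all(mu, lam, rng, utilities=None, n_boundary=2):
    """spsAll: select every sub-problem (conventional MOEA/D, lam = mu)."""
    return list(range(mu))

def sps_rnd(mu, lam, rng, utilities=None, n_boundary=2):
    """spsRnd: the boundary sub-problems plus (lam - n_boundary) others
    chosen uniformly at random, without replacement."""
    rest = list(range(n_boundary, mu))
    return list(range(n_boundary)) + rng.sample(rest, lam - n_boundary)

def sps_dra(mu, lam, rng, utilities, n_boundary=2, tour=10):
    """spsDra (sketch): boundary sub-problems plus (lam - n_boundary)
    picked by a size-10 tournament on the utility values."""
    sel = list(range(n_boundary))
    pool = list(range(n_boundary, mu))
    while len(sel) < lam:
        cand = rng.sample(pool, min(tour, len(pool)))
        best = max(cand, key=lambda j: utilities[j])  # tournament winner
        sel.append(best)
        pool.remove(best)
    return sel
```

Plugging any of these into line 5 of Algorithm 1 yields the corresponding instantiation of the framework.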


4 Experimental Analysis

4.1 Experimental Setup

Multi-objective NK Landscapes. We consider multi-objective NK landscapes as a problem-independent model of multi-objective multi-modal combinatorial optimization problems [17]. Solutions are binary strings of size N, and the objective vector to be maximized is defined as f : {0,1}^N → [0,1]^M. The parameter K defines the ruggedness of the problem, that is, the number of (random) variables that influence the contribution of a given variable to the objectives. By increasing K from 0 to (N − 1), problems can be gradually tuned from smooth to rugged. We consider instances with the following settings: the problem size is set to N = 100, the number of objectives to M ∈ {2,3,4,5}, and the ruggedness to K ∈ {0,1,2,4}, that is, from linear to highly rugged landscapes. We generate one instance at random for each combination.
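The construction can be sketched as follows: for each objective and each variable i, a random table gives the contribution of x_i as a function of x_i and K randomly chosen other variables, and f_m(x) averages the N contributions. This is an illustrative construction (tables drawn lazily), not the exact generator used in [17]:

```python
import random

def mnk_landscape(N, M, K, seed=0):
    """Return an evaluation function for a random M-objective NK landscape."""
    rng = random.Random(seed)
    # K epistatic links per variable, drawn independently per objective
    links = [[rng.sample([j for j in range(N) if j != i], K)
              for i in range(N)] for _ in range(M)]
    tables = [[{} for _ in range(N)] for _ in range(M)]

    def f(x):
        obj = []
        for m in range(M):
            total = 0.0
            for i in range(N):
                key = (x[i],) + tuple(x[j] for j in links[m][i])
                tbl = tables[m][i]
                if key not in tbl:        # lazily drawn contribution in [0, 1)
                    tbl[key] = rng.random()
                total += tbl[key]
            obj.append(total / N)
        return tuple(obj)

    return f
```

With K = 0 each objective is linear in the bits; larger K makes the contribution of every bit depend on K others, which is what produces rugged, multi-modal landscapes.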

Parameter Setting. For our analysis, we consider three competing algorithms extracted from the Moea/d–(μ,λ,sps) framework, as depicted in Table 1. For the conventional Moea/d, only one parameter is kept free, that is, the population size μ. For Moea/d–Dra, the sub-problem selection strategy is implemented as described in the original paper [23]. We further experiment with Moea/d–Dra using other λ values. Recall that, in the original variant, only μ/5 sub-problems are selected, while systematically including the M boundary weight vectors. For fairness, we follow the same principle when implementing the Moea/d–Rnd strategy. Notice that the boundary weight vectors were shown to impact the coordinates of the reference point z* used by the scalarizing function [18]. They are hence important to consider at each generation. To summarize, for both Moea/d–Dra and Moea/d–Rnd, two parameters are kept free, namely the population size μ and the number of selected sub-problems λ. They are chosen to cover a broad range of values, from very small to very high, namely μ ∈ {1,10,50,100,500} and λ ∈ {1,2,5,10,25,50,100,150,200,300,400,450,500}, such that λ ≤ μ.

The other common parameters are set as follows. The initial weights are generated using the methodology described in [21]. The neighborhood size is set to 20% of the population size: T = 0.2μ. Two parents are considered for mating selection, i.e., the parent selection in the neighborhood of a current sub-problem i. The first parent is the current solution x^i, and the second one is selected uniformly at random from B_i. Given that solutions are binary strings, we use a two-point crossover operator and a bit-flip mutation operator where each bit is flipped with a rate of 1/N. Moea/d–Dra involves additional parameters, which are set following the recommendations from [23].
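The two variation operators used here are standard for bit-strings; a minimal sketch, with the default mutation rate of 1/N as in the setup above:

```python
import random

def two_point_crossover(p1, p2, rng):
    """Two-point crossover: the segment between two cut points is taken
    from the second parent, the rest from the first."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    return p1[:a] + p2[a:b] + p1[b:]

def bit_flip_mutation(x, rng, rate=None):
    """Bit-flip mutation: each bit is flipped independently with
    probability `rate` (1/N by default)."""
    rate = 1.0 / len(x) if rate is None else rate
    return [b ^ 1 if rng.random() < rate else b for b in x]
```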

Performance Evaluation. Given the large number of parameter values (more than 2 000 different configurations), and in order to keep our experiments manageable in a reasonable amount of time, every configuration is executed 10 independent times, for a total of more than 20 000 runs. In order to appreciate the convergence profile and the anytime behavior of the competing algorithms, we consider different stopping conditions of 10⁰, 10¹, ..., 10⁷ calls to the evaluation function. Notice however that, due to lack of space, we shall only report our findings on the basis of a representative set of our experimental data.

Fig. 1. Convergence profile of the conventional Moea/d w.r.t. population size (μ).

For performance assessment, we use the hypervolume indicator (hv) [24] to assess the quality of the obtained approximation sets, the reference point being set to the origin. More particularly, we consider the hypervolume relative deviation, computed as hvrd(A) = (hv(R) − hv(A)) / hv(R), where A is the obtained approximation set, and R is the best Pareto front approximation, obtained by aggregating the results over all executions and removing dominated points. As such, a lower value is better. It is important to notice that we consider the external archive, storing all non-dominated points found so far during the search process, for performance assessment. This is particularly important when comparing configurations using different population sizes.
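For the bi-objective case, the hypervolume and the relative deviation hvrd can be sketched as follows (maximization, reference point at the origin as above). This is an illustration of the indicator, not the exact tool used by the authors:

```python
def hv_2d(points, ref=(0.0, 0.0)):
    """2-D hypervolume (maximization): area dominated by the points,
    measured w.r.t. the reference point."""
    pts = sorted(set(points))                 # ascending in f1
    front, best_f2 = [], float("-inf")
    for x1, x2 in reversed(pts):              # scan in descending f1,
        if x2 > best_f2:                      # keep only non-dominated points
            front.append((x1, x2))
            best_f2 = x2
    front.reverse()                           # ascending f1, descending f2
    hv, prev_x1 = 0.0, ref[0]
    for x1, x2 in front:                      # sum the rectangle slices
        hv += (x1 - prev_x1) * (x2 - ref[1])
        prev_x1 = x1
    return hv

def hvrd(approx, reference_front):
    """Hypervolume relative deviation: (hv(R) - hv(A)) / hv(R); lower is better."""
    hr = hv_2d(reference_front)
    return (hr - hv_2d(approx)) / hr
```

For example, with R = {(1, 3), (2, 2), (3, 1)} the hypervolume is 6, and the approximation {(1, 3), (3, 1)} yields hvrd = 1/6.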

4.2 Impact of the Population Size: spsAll with Varying μ Values

We start our analysis by studying the impact of the population size for the conventional Moea/d, that is, Moea/d–(μ,μ,spsAll) following our terminology. In Fig. 1, we show the convergence profile using different μ values for the considered instances. Recall that hypervolume is measured on the external archive.

For a fixed budget, a smaller population size allows the search process to focus the computational effort on fewer sub-problems, hence approaching the Pareto front more quickly. By contrast, using a larger population implies more diversified solutions/sub-problems, and hence a better spread along the Pareto front. This is typically what we observe when a small and a large budget are contrasted. In fact, a larger population size can be outperformed by a smaller one for relatively small budgets, especially when the problem is quite smooth (K ≤ 1) and the number of objectives relatively high (M ≥ 3). Notice also that it is not straightforward to quantify what is meant by a 'small' population, depending on the problem difficulty. For a linear bi-objective problem (M = 2, K = 0), a particularly small population size of μ = 10 is sufficient to provide a relatively high accuracy. However, for quadratic many-objective problems (M ≥ 4, K = 1), a small population size of μ = 10 (resp. μ = 50) is only effective up to a budget of about 10⁴ (resp. 10⁵) evaluations.

To summarize, it appears that the approximation quality depends both on the problem characteristics and on the available budget. For small budgets, a small population size is to be preferred. However, as the available budget grows, and as the problem difficulty increases in terms of ruggedness and number of objectives, a larger population performs better. These first observations suggest that the anytime behavior of Moea/d can be improved by more advanced selection strategies, avoiding the waste of resources induced by processing a large number of sub-problems at each iteration, as implied by the conventional spsAll strategy which iterates over all sub-problems. This is further analyzed next.

4.3 Impact of the Sub-problem Selection Strategy

In order to fairly compare the different selection strategies, we analyze the impact of λ, i.e., the number of selected sub-problems, independently for each strategy. It is worth noticing that both the value of λ and the selection strategy impact the probability of selecting a weight vector. Our results are depicted in Fig. 2 for spsDra and spsRnd, for different budgets and on a representative subset of instances. Other instances are not reported due to space restrictions. The main observation is that the best setting for λ depends on the considered budget, on the instance type, and on the sub-problem selection strategy itself.

Fig. 2. Quality vs. number of selected sub-problems (λ) w.r.t. budget (μ = 500).

Impact of λ on spsRnd. For the random strategy spsRnd (Fig. 2, top), and for smooth problems (K = 0), a small λ value is found to perform better for a small budget. As the available budget grows, the λ value providing the best performance starts to increase until it reaches the population size μ. In other words, for small budgets one should select very few sub-problems at each generation, whereas for large budgets selecting all sub-problems at each generation, as done in the standard Moea/d, appears to be a more reasonable choice. However, this tendency only holds for smooth many-objective problems. When the ruggedness increases, that is, when the degree of non-linearity K grows, the effect of λ changes. For the highest value of K = 4, the smallest value of λ = 1 still appears to be effective, independently of the available budget. However, the difference with a large λ value is seemingly less pronounced, especially for a relatively large budget, and the effect of λ seems to decay as the ruggedness increases. Notice also that for the 'easiest' problem instance (with K = 0 and M = 2), it is only for a small budget or for a high λ value that we observe a loss in performance. We attribute this to the fact that, when the problem is harder, search improvements are scarce within all sub-problems; it thus makes no difference to select few or many of them at each generation. By contrast, when the problem is easier, it is enough to select fewer sub-problems, as a small number of improving offspring solutions are likely sufficient to update the population.

Impact of λ on spsDra. The impact of λ appears to be different when analyzing the spsDra strategy (Fig. 2, bottom). In fact, the effect of λ seems relatively uniform, and its optimal setting less sensitive to the available budget and instance type. More precisely, the smallest value of λ = 1 is always found to perform better, while an increasing λ value leads to a decrease in the overall approximation quality. We attribute this to the adaptive nature of spsDra, for which the probability of selecting non-interesting sub-problems is smaller for lower λ values. Interestingly, in the original setting of Moea/d–Dra [23], from which spsDra is extracted, the number of selected sub-problems is fixed to μ/5. Not only did we find that this setting can be sub-optimal, but it can actually be substantially outperformed by a simple setting of λ = 1.
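As a rough illustration of how such an adaptive strategy can be instantiated, the sketch below selects λ sub-problems by tournament on per-sub-problem utility values, in the spirit of the DRA scheme. The function names are illustrative; the tournament size and the constants in the utility update reflect our reading of the DRA formulation in [23] but are not a faithful reproduction of it.

```python
import random

def sps_dra(utilities, lam, tour_size=10):
    """Pick lam sub-problems by tournament on utility values: sub-problems
    with a larger utility (recent relative improvement) are selected more
    often. Tie handling and tournament size are illustrative choices."""
    mu = len(utilities)
    selected = []
    for _ in range(lam):
        candidates = random.sample(range(mu), min(tour_size, mu))
        selected.append(max(candidates, key=lambda i: utilities[i]))
    return selected

def update_utility(old_utility, old_value, new_value, eps=1e-3):
    # Relative improvement of the sub-problem's scalarized value
    # (minimization); improvements above a small threshold reset the
    # utility to 1, otherwise it is damped, as in the DRA update.
    delta = (old_value - new_value) / max(abs(old_value), eps)
    if delta > 0.001:
        return 1.0
    return (0.95 + 0.05 * delta / 0.001) * old_utility
```

With λ = 1, each generation spends its single offspring on a sub-problem that recently improved, which is consistent with the observed sensitivity of spsDra to λ.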

spsAll vs. spsDra vs. spsRnd. Having gained insights about the effect of λ for the different selection strategies, we can fairly analyze their relative performance by using their respective optimal setting for λ. We actually show results with λ = 1 for both spsDra and spsRnd. Although this setting was shown to be optimal for spsDra, it only provides a reasonably good (but sub-optimal) performance in the case of the simple random spsRnd strategy, for which other λ values can be even more efficient. Our results are shown in Fig. 3 for a subset of instances.

Population Size and Sub-problem Selection in MOEA/D 143

Fig. 3. Convergence profile of MOEA/D–(μ, λ, sps) w.r.t. sub-problem selection strategy (μ = 500; λ = 500 for spsAll, λ ∈ {1, μ/5} for spsDra, and λ = 1 for spsRnd).

Table 2. Ranks and average hvrd value (between brackets, in percentage) obtained by the different sps strategies after 10⁴, 10⁵, 10⁶, and 10⁷ evaluations (a lower value is better). Results for spsRnd and spsDra are for λ = 1. For each budget and instance, a rank of c indicates that the corresponding strategy was found to be significantly outperformed by c other strategies w.r.t. a Wilcoxon statistical test at a significance level of 0.05. Ranks in bold correspond to approaches that are not significantly outperformed by any other, and the underlined hvrd value corresponds to the best approach on average.

M  K | 10⁴ evaluations          | 10⁵ evaluations          | 10⁶ evaluations          | 10⁷ evaluations
     | spsAll  spsDra  spsRnd   | spsAll  spsDra  spsRnd   | spsAll  spsDra  spsRnd   | spsAll  spsDra  spsRnd
2  0 | 2(11.2) 1(10.5) 0(09.1)  | 2(09.4) 1(09.3) 0(09.0)  | 2(09.1) 0(09.0) 0(09.0)  | 0(09.0) 0(09.0) 2(09.0)
2  1 | 1(14.0) 1(13.2) 0(11.4)  | 1(10.9) 1(10.5) 0(09.5)  | 2(09.8) 1(09.6) 0(09.2)  | 2(09.5) 1(09.4) 0(09.2)
2  2 | 1(17.2) 0(15.6) 0(15.4)  | 1(13.0) 0(12.5) 0(11.7)  | 0(11.3) 0(10.9) 0(10.6)  | 0(10.2) 0(10.1) 0(09.8)
2  4 | 2(22.1) 0(19.5) 0(18.9)  | 1(17.5) 0(14.9) 0(16.0)  | 2(14.6) 0(13.3) 0(13.0)  | 0(13.1) 0(12.0) 0(12.2)
3  0 | 1(15.1) 1(15.0) 0(10.2)  | 1(11.0) 1(10.9) 0(09.2)  | 0(09.1) 0(09.1) 0(09.1)  | 0(09.0) 0(09.0) 2(09.1)
3  1 | 1(18.4) 1(18.4) 0(14.2)  | 1(13.3) 1(13.0) 0(10.5)  | 1(10.8) 1(10.5) 0(09.4)  | 1(09.5) 1(09.5) 0(09.0)
3  2 | 1(24.4) 1(23.4) 0(20.6)  | 1(16.3) 0(16.1) 0(14.7)  | 2(12.8) 1(12.3) 0(11.1)  | 1(11.7) 1(11.1) 0(09.4)
3  4 | 1(31.0) 0(29.9) 0(28.0)  | 0(22.7) 0(20.5) 0(21.4)  | 0(16.7) 0(15.4) 0(15.6)  | 0(13.6) 0(13.0) 0(12.2)
4  0 | 1(20.5) 1(20.5) 0(13.2)  | 1(12.9) 1(12.8) 0(09.9)  | 0(09.4) 0(09.4) 0(09.3)  | 0(09.0) 0(09.0) 2(09.2)
4  1 | 1(25.0) 1(24.9) 0(18.5)  | 1(15.2) 1(15.4) 0(11.8)  | 1(09.9) 1(09.8) 0(08.8)  | 1(08.4) 1(08.5) 0(07.9)
4  2 | 1(29.8) 1(30.6) 0(24.7)  | 1(19.5) 1(18.7) 0(16.0)  | 1(12.9) 1(12.3) 0(10.1)  | 1(09.8) 1(09.9) 0(07.9)
4  4 | 1(38.0) 1(37.1) 0(32.5)  | 1(26.4) 1(25.4) 0(22.5)  | 1(16.8) 1(16.6) 0(15.1)  | 0(11.6) 0(11.3) 0(10.5)
5  0 | 2(26.3) 1(25.2) 0(15.4)  | 1(14.1) 1(14.7) 0(10.2)  | 0(08.0) 1(08.3) 2(08.5)  | 0(07.4) 0(07.5) 2(08.0)
5  1 | 1(29.9) 1(29.9) 0(21.5)  | 1(16.5) 1(17.1) 0(13.0)  | 1(08.3) 1(08.5) 0(07.7)  | 0(05.9) 0(06.1) 1(06.1)
5  2 | 1(35.2) 1(34.1) 0(28.1)  | 1(21.4) 0(20.0) 0(17.6)  | 0(10.9) 0(10.5) 0(09.7)  | 0(06.8) 0(06.2) 0(05.3)
5  4 | 1(41.6) 1(40.7) 0(35.5)  | 1(26.5) 1(26.9) 0(24.1)  | 0(14.7) 0(15.1) 0(14.3)  | 0(06.0) 0(07.4) 0(07.4)

The spsAll strategy, corresponding to the conventional Moea/d [22], and spsDra with λ = μ/5, corresponding to Moea/d–Dra [23], are also included. We can see that the simple random selection strategy spsRnd has a substantially better anytime behavior. In other words, selecting a single sub-problem at random is likely to enable identifying a high-quality approximation set more quickly, for a wide range of budgets, and independently of the instance type.

Pushing our analysis further, the only situation where the simple random strategy is outperformed by the conventional Moea/d, or by a Moea/d–Dra setting using an optimal λ value, is essentially for the very highest budget (10⁷ evaluations) and when the problem is particularly smooth (K = 0). This can be more clearly observed in Table 2, where the relative approximation qualities of the different strategies are statistically compared for different budgets. Remember, however, that these results are for λ = 1, which is shown to be an optimal setting for spsDra, but not necessarily for spsRnd, where higher λ values perform better.

Fig. 4. Convergence profile of MOEA/D–(μ, 1, spsRnd) w.r.t. population size (μ).

4.4 Robustness of MOEA/D–(μ, λ, spsRnd) w.r.t. μ and λ

In the previous section, the population size was fixed to the highest value of μ = 500. However, we have shown in Sect. 4.2 that the anytime behavior of the conventional Moea/d can be relatively sensitive to the setting of μ, in particular for some instance types. Hence, we complement our analysis by studying the sensitivity of the spsRnd strategy, which was found to have the best anytime behavior overall, w.r.t. the population size μ. Results for spsRnd with λ = 1 are reported in Fig. 4. In contrast with the spsAll strategy from the conventional Moea/d reported in Fig. 1, we can clearly see that the anytime behavior underlying spsRnd is much more stable. In fact, the hypervolume increases with μ, independently of the considered budget and instance type. Notice also that when using small μ values, convergence occurs much faster for smooth problems (K = 0) than for rugged ones (K = 4). This means that a larger population size μ, combined with a small value of λ, shall be preferred.

From a more general perspective, this observation is quite insightful, since it indicates that, by increasing the number of weight vectors, one can allow for a high-level structure of the population, which can eventually be very large. Notice also that such a data structure can be maintained very efficiently in terms of CPU time complexity, given the scalar nature of Moea/d. This is in contrast with, e.g., dominance-based EMO algorithms, where maintaining a large population may be computationally intensive, particularly for many-objective problems. Given such an efficient structure, the issue becomes how to select the sub-problems from which the (large) population is updated. A random strategy for sub-problem selection is found to work arguably well. However, in order to reach an optimal performance, setting the number of sub-problems λ might raise further configuration issues. Overall, our analysis reveals that a small λ value, typically ranging from 1 to 10, is recommended for relatively rugged problems, whereas a large value of λ should be preferred for smoother problems.
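To illustrate why the scalar nature of Moea/d makes updates cheap even for a large population, consider the following sketch (names are illustrative): inserting one offspring costs one scalar comparison per sub-problem, i.e., O(μ·M) time in the worst case, or O(T·M) when restricted to a neighborhood of size T, with no dominance sorting involved. The weighted Chebyshev scalarizing function is a standard choice; the global update shown here is a simplification.

```python
def tchebycheff(obj, weight, ref):
    # Weighted Chebyshev scalarization w.r.t. a reference point (minimization).
    return max(w * abs(o - r) for o, w, r in zip(obj, weight, ref))

def update_population(pop_objs, weights, ref, child_obj):
    """Replace the objective vector of every sub-problem that the child
    improves; one scalar comparison per sub-problem, nothing more."""
    replaced = []
    for j, (obj, w) in enumerate(zip(pop_objs, weights)):
        if tchebycheff(child_obj, w, ref) < tchebycheff(obj, w, ref):
            pop_objs[j] = child_obj
            replaced.append(j)
    return replaced
```

This linear per-offspring cost is what allows the population (i.e., the set of weight vectors) to grow large while keeping λ small, as recommended above.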

5 Conclusions and Perspectives

In this paper, we reviewed the design principles of the Moea/d framework by providing a high-level, but more precise, reformulation taking inspiration from the (μ + λ) scheme from evolutionary computation. We analyzed the role of three design components: the population size (μ), the number of sub-problems selected (and hence the number of offspring generated) at each generation (λ), and the strategy used for sub-problem selection (sps). Besides systematically informing about the combined effect of these components on the performance profile of the search process, as a function of problem difficulty in terms of ruggedness and objective space dimension, our analysis opens new challenging questions on the design and practice of decomposition-based EMO algorithms.

Although we are now able to derive a parameter setting recommendation according to the general properties of the problem at hand, such properties might not always be known beforehand by the practitioner, and other properties might be considered as well. For instance, one obvious perspective would be to extend our analysis to the continuous domain. More importantly, an interesting research line would be to infer the induced landscape properties in order to learn the 'best' parameter setting, either off-line or on-line, i.e., before or during the search process. This would not only avoid the need for additional algorithm configuration (tuning) efforts, but it could also lead to an even better anytime behavior. One might, for instance, consider an adaptive setting where the values of μ, λ, and sps are adjusted according to the search behavior observed over different generations. Similarly, we believe that considering problems where the objectives expose some degree of heterogeneity, e.g., in terms of solving difficulty, is worth investigating. In such a scenario, the design of an accurate sps strategy is certainly a key issue. More generally, we advocate a more systematic analysis of such considerations for improving our fundamental understanding of the design issues behind Moea/d and EMO algorithms in general, of the key differences between EMO algorithm classes, and of their success in solving challenging multi- and many-objective optimization problems.

Acknowledgments. This work was supported by the French national research agency

(ANR-16-CE23-0013-01) and the Research Grants Council of Hong Kong (RGC Project

No. A-CityU101/16).

References

1. Aghabeig, M., Jaszkiewicz, A.: Experimental analysis of design elements of scalarizing function-based multiobjective evolutionary algorithms. Soft Comput. 23(21), 10769–10780 (2018). https://doi.org/10.1007/s00500-018-3631-x
2. Beume, N., Naujoks, B., Emmerich, M.: SMS-EMOA: multiobjective selection based on dominated hypervolume. Eur. J. Oper. Res. 181(3), 1653–1669 (2007)
3. Cai, X., Li, Y., Fan, Z., Zhang, Q.: An external archive guided multiobjective evolutionary algorithm based on decomposition for combinatorial optimization. IEEE Trans. Evol. Comput. 19(4), 508–523 (2015)
4. Chiang, T., Lai, Y.: MOEA/D-AMS: improving MOEA/D by an adaptive mating selection mechanism. In: CEC 2011, pp. 1473–1480 (2011)
5. Corus, D., Oliveto, P.S.: Standard steady state genetic algorithms can hillclimb faster than mutation-only evolutionary algorithms. IEEE Trans. Evol. Comput. 22(5), 720–732 (2018)
6. Črepinšek, M., Liu, S.H., Mernik, M.: Exploration and exploitation in evolutionary algorithms: a survey. ACM Comput. Surv. 45(3), 1–33 (2013)
7. Deb, K.: Multi-objective Optimization Using Evolutionary Algorithms. Wiley, Hoboken (2001)
8. Glasmachers, T., Naujoks, B., Rudolph, G.: Start small, grow big? Saving multi-objective function evaluations. In: Bartz-Beielstein, T., Branke, J., Filipič, B., Smith, J. (eds.) PPSN 2014. LNCS, vol. 8672, pp. 579–588. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10762-2_57
9. Ishibuchi, H., Imada, R., Masuyama, N., Nojima, Y.: Two-layered weight vector specification in decomposition-based multi-objective algorithms for many-objective optimization problems. In: CEC, pp. 2434–2441 (2019)
10. Lavinas, Y., Aranha, C., Ladeira, M.: Improving resource allocation in MOEA/D with decision-space diversity metrics. In: Martín-Vide, C., Pond, G., Vega-Rodríguez, M.A. (eds.) TPNC 2019. LNCS, vol. 11934, pp. 134–146. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34500-6_9
11. Li, H., Zhang, Q.: Multiobjective optimization problems with complicated Pareto sets, MOEA/D and NSGA-II. IEEE Trans. Evol. Comput. 13(2), 284–302 (2009)
12. Li, K., Zhang, Q., Kwong, S., Li, M., Wang, R.: Stable matching-based selection in evolutionary multiobjective optimization. IEEE Trans. Evol. Comput. 18(6), 909–923 (2014)
13. Marquet, G., Derbel, B., Liefooghe, A., Talbi, E.-G.: Shake them all! Rethinking selection and replacement in MOEA/D. In: Bartz-Beielstein, T., Branke, J., Filipič, B., Smith, J. (eds.) PPSN 2014. LNCS, vol. 8672, pp. 641–651. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10762-2_63
14. Schumer, M., Steiglitz, K.: Adaptive step size random search. IEEE Trans. Autom. Control 13(3), 270–276 (1968)
15. Tanabe, R., Ishibuchi, H.: An analysis of control parameters of MOEA/D under two different optimization scenarios. Appl. Soft Comput. 70, 22–40 (2018)
16. Trivedi, A., Srinivasan, D., Sanyal, K., Ghosh, A.: A survey of multiobjective evolutionary algorithms based on decomposition. IEEE Trans. Evol. Comput. 21(3), 440–462 (2017)
17. Verel, S., Liefooghe, A., Jourdan, L., Dhaenens, C.: On the structure of multiobjective combinatorial search space: MNK-landscapes with correlated objectives. Eur. J. Oper. Res. 227(2), 331–342 (2013)
18. Wang, P., et al.: A new resource allocation strategy based on the relationship between subproblems for MOEA/D. Inf. Sci. 501, 337–362 (2019)
19. Wang, Z., Zhang, Q., Zhou, A., Gong, M., Jiao, L.: Adaptive replacement strategies for MOEA/D. IEEE Trans. Cybern. 46(2), 474–486 (2016)
20. Witt, C.: Population size versus runtime of a simple evolutionary algorithm. Theor. Comput. Sci. 403(1), 104–120 (2008)
21. Zapotecas-Martínez, S., Aguirre, H., Tanaka, K., Coello, C.: On the low-discrepancy sequences and their use in MOEA/D for high-dimensional objective spaces. In: Congress on Evolutionary Computation (CEC 2015), pp. 2835–2842 (2015)
22. Zhang, Q., Li, H.: MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007)
23. Zhou, A., Zhang, Q.: Are all the subproblems equally important? Resource allocation in decomposition-based multiobjective evolutionary algorithms. IEEE Trans. Evol. Comput. 20(1), 52–64 (2016)
24. Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., Grunert da Fonseca, V.: Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evol. Comput. 7(2), 117–132 (2003)