An Enhanced Statistical Approach for Evolutionary
Algorithm Comparison
Eduardo G. Carrano
Research Group for Intelligent Systems - GPSI
Centro Federal de Educação Tecnológica de Minas Gerais
Av. Amazonas, 5253 - Nova Suiça - 30480-000
Belo Horizonte - MG, Brazil
egcarrano@deii.cefetmg.br

Ricardo H. C. Takahashi
Dep. Mathematics
Universidade Federal de Minas Gerais
Av. Antônio Carlos, 6627 - Pampulha - 31270-010
Belo Horizonte - MG, Brazil
taka@mat.ufmg.br

Elizabeth F. Wanner
Dep. Mathematics
Universidade Federal de Ouro Preto
Campus Universitário, Morro do Cruzeiro - 35400-000
Ouro Preto - MG, Brazil
efwanner@iceb.ufop.br

ABSTRACT
This paper presents an enhanced approach for comparing evolutionary algorithms. This approach is based on three statistical techniques: (a) Principal Component Analysis, which is used to make the data uncorrelated; (b) Bootstrapping, which is employed to build the probability distribution function of the merit functions; and (c) Stochastic Dominance Analysis, which is employed to make possible the comparison between two or more probability distribution functions. Since the proposed approach is not based on parametric properties, it can be applied to compare any kind of quantity, regardless of its probability distribution function. When applied to the same problems, the proposed approach has provided better-supported decisions than former approaches.
Categories and Subject Descriptors
G.1.6 [Mathematics of Computing]: Numerical Analysis - Optimization
General Terms
Algorithms
Keywords
evolutionary algorithms, algorithm comparison, evolutionary encoding schemes, tree network design
1. INTRODUCTION
When deterministic optimization algorithms are compared, their performances are characterized by their computational complexity only. Since those algorithms always perform the same sequence of deterministic steps, it is guaranteed, under some assumptions, that, starting from a given initial point, an algorithm converges (i.e., reaches a stop criterion) in a fixed number of algorithm iterations (the solution that is reached is not necessarily the global optimum of the problem) [5].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO’08, July 12–16, 2008, Atlanta, Georgia, USA.
Copyright 2008 ACM 978-1-60558-130-9/08/07...$5.00.

The performance analysis of evolutionary algorithms cannot follow the same methodology. The stochastic nature of those algorithms introduces another issue that must be considered: it is not guaranteed that a single run reaches the same solution that has been found in another run and, even when the same solution is reached, the computational effort spent to obtain this solution varies between different runs of the same algorithm [5].
The flexible structure of evolutionary algorithms makes it possible to build them in several different ways, which leads to different algorithms that usually present different performance. This combinatorial scenario of possible algorithms motivates the development of evaluation/comparison methods for evolutionary algorithms, such as the ones presented in [5, 4, 18, 19, 1].
Some of the approaches proposed in the literature assume that, in some cases, convergence is almost ensured, and therefore could be assumed as true. Under this assumption, it is possible to compare the algorithms based on a single criterion: the computational effort required to reach the optimum. In this approach, the computational cost of the algorithm is modeled as a random variable, and statistical analyses are employed to make inferences about this variable [5]. Although this approach can be useful in some particular situations, the assumption of convergence is too strong. Notice that in large NP-hard problems, for instance, evolutionary algorithms are expected to find only “good” suboptimal solutions, and the assumption of finding the exact optimum is infeasible in practice. There are similar approaches which fix the computational cost by establishing a predetermined number of algorithm generations. This approach can also be problematic, since the computational cost becomes an arbitrary parameter that can strongly affect the algorithm analysis.
It is reasonable to consider the tradeoff between faster algorithms that lead to rough solutions and slower algorithms that deliver more accurate solutions. A feasible way of comparing algorithms for which convergence is not ensured is to employ multicriteria comparison schemes, such as those discussed in [19] and presented in [18]. In this kind of approach, the convergence ability and the computational cost of the algorithms are modeled as random variables, and a statistical estimator, such as the mean, is employed to allow comparisons. Reference [18], for example, uses the number of function evaluations to estimate the computational cost of the algorithm and the convergence rate to estimate its convergence capacity. These criteria are evaluated through their means, and a multiobjective analysis [9] is employed to find which algorithms can be considered efficient. Since a multiobjective analysis is adopted, the result of this approach is a set of efficient algorithms, instead of a single “best one”.
Although the approach described above is better suited than a single-criterion one, it still presents a weakness: the comparisons are based only on mean values, which often implies the loss of information about how the points are spread around that mean. This paper proposes a multiobjective comparison approach for evolutionary algorithms that is based on the concept of stochastic dominance [16], instead of using comparisons of the mean values only. It makes it possible to consider the deviation of the criteria around the mean, which consequently provides better-supported decisions. The approximated probability distribution functions (PDFs) are built using a Bootstrapping procedure [6]. Since this approach is nonparametric, it can be employed for comparing any data set, regardless of the distribution. The results achieved have shown that the approach has a high capacity for detecting statistically significant differences between algorithms.
The paper is structured as follows:
• Section 2 introduces the statistical concepts which are used in the comparison approach;
• Section 3 presents the methodology of the comparison proposed in this work;
• Section 4 presents the results achieved by the comparison approach in three instances of a discrete problem. The results obtained by the approach are compared with previous results for the same problem.
2. CONCEPTUAL BACKGROUND
The concepts which are employed in the comparison approach
proposed in this paper are briefly introduced in this section.
2.1 Principal Component Analysis
Principal Component Analysis (PCA)¹ is a mathematical procedure that is employed to transform a set of correlated variables into a new set (sometimes smaller) of uncorrelated variables. The new variables are sorted by variability, with the first component accounting for as much of the variability in the data as possible, and each succeeding component accounting for as much of the remaining variability as possible. The variables which represent a significant part of the variability are called principal components. Sometimes, this procedure is called Proper Orthogonal Decomposition (POD) or Hotelling Transform.
PCA is usually employed for some specific tasks:
• Reduce the dimensionality of a data set;
• Identify new meaningful underlying variables;
• Find the uncorrelated data, to analyze the variables independently.
Let X be a set of M variables with N observations each. The PCA of X can be performed in 13 steps:
1. Arrange the data in N data vectors, $X_1, \ldots, X_N$. Each vector $X_i$ is a column vector (M × 1) with a single observation of the M variables.
2. Build a matrix X (M × N) with the column vectors.
3. Find the empirical mean of each variable m = 1,...,M:
\[ u[m,1] = \frac{1}{N} \sum_{n=1}^{N} X[m,n]. \]
4. Subtract the empirical mean u from each column of matrix X:
\[ B = X - u \cdot h \]
where h is a row vector (1 × N) with $h[1,n] = 1 \;\forall\; n = 1,\ldots,N$.
5. Find the empirical covariance matrix C (M × M):
\[ C = \frac{1}{N} B \cdot B^{*} \]
where $B^{*}$ is the conjugate transpose of matrix B.
6. Find the eigenvalues of matrix C and build a column vector E (M × 1).
7. Find the matrix of eigenvectors V in which:
\[ V^{-1} \cdot C \cdot V = D \]
where D is a diagonal matrix, such that:
\[ D[m,m] = E[m,1] \;\forall\; m = 1,\ldots,M. \]
8. Sort the columns of V and D in descending order of eigenvalues.
9. Find the cumulative energy content for each eigenvector:
\[ g[m] = \sum_{i=1}^{m} D[i,i] \;\forall\; m = 1,\ldots,M \]
• Eigenvectors with higher energy represent a larger part of the variance than eigenvectors with lower energy.
10. Select a subset of the eigenvectors as basis vectors:
\[ W[p,q] = V[p,q] \;\forall\; p = 1,\ldots,M, \; q = 1,\ldots,L \]
where 1 ≤ L ≤ M.
• The vector g can be used as a guide for choosing L. For instance, L can be chosen such that the first L components retain at least 90% of the total energy: $g[L] / g[M] \geq 0.9$.
11. Find the empirical standard deviation vector s (M × 1):
\[ s[m,1] = \sqrt{C[m,m]} \;\forall\; m = 1,\ldots,M. \]
12. Calculate the z-score matrix Z (M × N):
\[ Z = \frac{B}{s \cdot h} \quad \text{(divide element by element).} \]
13. Project the z-scores of the data onto the new basis:
\[ Y = W^{*} \cdot Z. \]
The result of this procedure is a set of data Y, which is uncorrelated and is fitted to a lower dimension (L ≤ M). This procedure is useful in the analysis of high-dimensional data.
In the specific case of this work, PCA is used only to make the data uncorrelated, regardless of the dimension reduction that could be exploited. Therefore, only steps 1 to 8 are required. The uncorrelated data are obtained through (1), where $V^{*}$ is the conjugate transpose of the sorted eigenvector matrix V:
\[ Y = V^{*} \cdot X \tag{1} \]
In evolutionary algorithm comparison, at least two merit criteria should be considered: a criterion which estimates the convergence capacity of the algorithm and a criterion which estimates its computational cost [18]. Therefore, each algorithm run results in a point in a two-dimensional space (or in a higher-dimensional one, if more criteria are considered), whose coordinates are often correlated. Decoupling the data makes an independent analysis of each merit criterion meaningful.

¹ The description of Principal Component Analysis presented in this section has been adapted from references [10, 11, 13].
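The decorrelation step can be illustrated for the two-criterion case with a minimal Python sketch. It is not the authors' implementation: the function names `decorrelate_2d` and `cross_covariance` are hypothetical, and the sketch assumes exactly two merit criteria, so that the eigenvectors of the 2 × 2 covariance matrix can be written in closed form as a rotation instead of calling a general eigenvalue routine.

```python
import math


def cross_covariance(points):
    """Empirical covariance between the two coordinates of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n


def decorrelate_2d(points):
    """Project 2-D points onto the eigenvectors of their covariance matrix.

    points: list of (criterion_1, criterion_2) tuples, one per algorithm run.
    Returns a list of tuples whose two coordinates are uncorrelated.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the empirical covariance matrix C = (1/N) B B^T.
    cxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    cyy = sum((p[1] - mean_y) ** 2 for p in points) / n
    cxy = cross_covariance(points)
    # For a symmetric 2x2 matrix, the eigenvectors are the columns of a
    # rotation matrix V = [[c, -s], [s, c]] with the angle theta below.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    # Y = V^T X, applied to each observation (column of X).
    return [(c * x + s * y, -s * x + c * y) for x, y in points]
```

With more than two criteria, the same idea would require a general symmetric eigenvalue solver; the closed-form rotation is used here only to keep the sketch dependency-free.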
2.2 Bootstrapping
Bootstrapping² is a statistical procedure employed to infer the properties of an estimator (such as the mean or the variance) by sampling from an approximating distribution. The empirical distribution obtained from some observations is generally used as the approximating distribution.

² The description of Bootstrapping presented in this section has been adapted from references [2, 6, 7, 8].

The bootstrapping procedure can be useful in the following situations:
• build hypothesis tests;
• make possible inferences based on parametric assumptions;
• make possible inferences in nonparametric data sets, using distribution comparison methods, such as nonparametric tests or stochastic dominance.
Let $P = \{x_1, x_2, x_3, \ldots, x_N\}$ be a population and $S = \{X_1, X_2, X_3, \ldots, X_n\}$ be an independent sample extracted from P (where n is much smaller than N). The properties of an estimator T for the population, $\Theta = T(P)$, can be inferred from the sample, $\theta = T(S)$, by the following steps:
1. Extract one sample $S_i$ of size n from S with replacement.
2. Calculate the value of the estimator T for the sample $S_i$.
3. Repeat steps 1 and 2 a predetermined number of times.
4. Build the probability density function of the estimator T for P using the values obtained in step 2.
From the probability distribution function obtained by bootstrapping, it is possible to find confidence intervals, quantile distributions, etc., which can be useful for further analysis.
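The four steps above can be sketched in a few lines of Python for the estimator "mean". This is only an illustrative sketch: the names `bootstrap_means` and `percentile_interval`, the default of 1000 resamples and the simple percentile rule for the interval are assumptions, not choices stated in the paper.

```python
import random


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the sorted bootstrap distribution of the sample mean."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Step 1: resample n values from the sample, with replacement.
        resample = [rng.choice(sample) for _ in range(n)]
        # Step 2: evaluate the estimator (here, the mean) on the resample.
        means.append(sum(resample) / n)
    # Step 4: the sorted values form the empirical distribution of the mean.
    means.sort()
    return means


def percentile_interval(sorted_values, alpha=0.05):
    """Simple percentile interval read from a sorted bootstrap distribution."""
    last = len(sorted_values) - 1
    lower = sorted_values[int((alpha / 2.0) * last)]
    upper = sorted_values[int((1.0 - alpha / 2.0) * last)]
    return lower, upper
```

For instance, `percentile_interval(bootstrap_means(values))` gives an approximate 95% interval for the mean of `values`, and the sorted list itself can be passed to a quantile-based comparison.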
The main advantage of bootstrapping over analytical methods is its simplicity: it is easy to employ bootstrapping to estimate standard errors and confidence intervals for estimators of complex parameters of the distribution, such as percentile points, proportions and correlation coefficients.
In this work, bootstrapping is employed for estimating the independent probability density function (PDF) of the mean of each merit criterion. Since the data are uncorrelated (which is guaranteed by the use of PCA), it is possible to perform the analysis over the independent PDFs instead of using a joint PDF. The comparison of evolutionary algorithms using the PDFs can provide better-supported decisions than approaches based only on mean values, which are commonly adopted for this task.
2.3 First Order Stochastic Dominance
Stochastic dominance is a statistical concept employed to compare two (or more) distribution functions. Since it does not depend on parametric distributions, it can be employed for comparing any entity, provided that the PDFs are known. Stochastic dominance can be defined as follows:
DEFINITION 2.1. Stochastic Dominance: Consider a minimization problem. A random variable $X_1$ presents first-order stochastic dominance over another random variable $X_2$ if:
\[ \mathrm{cdf}(X_1) \geq \mathrm{cdf}(X_2) \quad \forall\; x_i \in X_1 \cup X_2 \tag{2} \]
where cdf(X) is the cumulative distribution function of X.
Stochastic dominance is illustrated in Figure 1.
Since stochastic dominance is going to be evaluated computationally, a discrete approximation of the concept becomes necessary, because it is not possible to process an infinite-dimensional variable (represented by a continuous function over a non-compact interval). This approximation can be calculated by considering a finite number of values of the cumulative distribution function of the variables, which leads to an interval comparison, as follows.

Figure 1: 1st-order stochastic dominance example: $X_1$ dominates $X_2$, since the quantiles of $X_1$ always occur before the quantiles of $X_2$ (minimization problem). The plot shows cdf($X_1$) and cdf($X_2$) against x.

Let $X_1$ be a random variable, and $x_1^{q_1}, x_1^{q_2}, \ldots, x_1^{q_i}, \ldots, x_1^{q_n}$ be the points of $X_1$ which define the quantiles $q_0, q_{\frac{1}{n-1}}, \ldots, q_{\frac{i-1}{n-1}}, \ldots, q_1$, as shown in Figure 2.³

\[ P\left(X_1 < x_1^{q_i}\right) = \frac{i-1}{n-1}, \quad i = 1, \ldots, n \]
Figure 2: Random variable $X_1$

Let $X_2$ be another random variable, and $x_2^{q_1}, x_2^{q_2}, \ldots, x_2^{q_i}, \ldots, x_2^{q_n}$ be the points of $X_2$ which define the same quantiles, $q_0, q_{\frac{1}{n-1}}, \ldots, q_{\frac{i-1}{n-1}}, \ldots, q_1$. It is said that $X_1$ “is better than” $X_2$ if, and only if:
\[ \begin{aligned} x_1^{q_i} &\leq x_2^{q_i} \quad \forall\; i \in [1,n] \\ x_1^{q_i} &< x_2^{q_i} \quad \text{for some } i \in [1,n] \end{aligned} \tag{3} \]
One should note that, when n → ∞, (3) is equivalent to First Order Stochastic Dominance [16].
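The quantile comparison in (3) can be sketched as follows. The function names, the default of n = 11 quantile points and the nearest-rank rule used to read quantiles from a finite list of values are assumptions made for illustration; the paper does not specify how the quantile points are extracted.

```python
def quantile_points(values, n):
    """Return n points of `values` at the levels (i - 1) / (n - 1), i = 1..n.

    Nearest-rank empirical quantiles are used (an assumption); n must be >= 2.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[round(i * last / (n - 1))] for i in range(n)]


def is_better(values_1, values_2, n=11):
    """Check condition (3) for a minimization problem.

    True when every quantile point of values_1 is <= the matching point of
    values_2, and at least one of them is strictly smaller.
    """
    q1 = quantile_points(values_1, n)
    q2 = quantile_points(values_2, n)
    no_worse = all(a <= b for a, b in zip(q1, q2))
    strictly_better = any(a < b for a, b in zip(q1, q2))
    return no_worse and strictly_better
```

When the two empirical distributions cross, neither `is_better(a, b)` nor `is_better(b, a)` holds, and the two variables are left unordered by this test.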

³ A point $x_k$ of X defines the quantile $q_\alpha$ if $P(X < x_k) = \alpha$.

3. COMPARISON METHODOLOGY
In the proposed approach, two merit criteria have been established for comparison:
• Number of function evaluations performed by the algorithm (nfe), which is employed for estimating the computational cost required by the algorithm. Optionally, the computation time of the algorithm could be used as an indicator of the algorithm cost [19].
• Objective function value of the best solution achieved by the algorithm (fbs), which is employed for estimating the convergence ability of the algorithm.
It is straightforward to note that each algorithm run results in a point in the two-dimensional space $[nfe, fbs]^T$.
The algorithm comparison methodology proposed in this paper is performed through the following steps:
1. Each algorithm is executed $n_r$ times on a given test problem;
2. In each run, the number of objective function evaluations performed until the stopping criterion is reached, nfe, and the best value of the objective function reached, fbs, are recorded;
3. The whole data set (all runs performed by all algorithms) is then submitted to a PCA, in order to obtain uncorrelated data with regard to the merit functions;
4. For each merit criterion:
(a) A bootstrapping procedure is performed over the uncorrelated data:
• Bootstrapping is employed to estimate the probability distribution function of the estimator mean, for all algorithms analyzed.
(b) The stochastic dominance analysis is used to rank the algorithms, based on the PDFs found by bootstrapping. The algorithms which are not dominated by any other one receive rank 1, the algorithms which are dominated only by the algorithms of rank 1 receive rank 2, and so forth.
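The ranking rule of step 4(b) amounts to peeling off successive non-dominated fronts. A minimal sketch is given below; the function name `rank_by_dominance` and the idea of passing the dominance test as a callback are assumptions, and the toy relation in the usage note is purely illustrative.

```python
def rank_by_dominance(names, dominates):
    """Assign rank 1 to non-dominated items, rank 2 to the next front, etc.

    names: iterable of algorithm labels.
    dominates: callable (a, b) -> True when algorithm a dominates b.
    Returns a dict mapping each label to its rank.
    """
    remaining = set(names)
    ranks = {}
    rank = 1
    while remaining:
        # Current front: items not dominated by any other remaining item.
        front = {a for a in remaining
                 if not any(dominates(b, a) for b in remaining if b != a)}
        if not front:
            # Guard for a cyclic relation: rank all leftovers together.
            front = set(remaining)
        for a in front:
            ranks[a] = rank
        remaining -= front
        rank += 1
    return ranks
```

As a toy usage, with hypothetical scores `{"A": 1, "B": 1, "C": 2}` and `dominates = lambda a, b: scores[a] < scores[b]`, the labels A and B receive rank 1 and C receives rank 2. In the methodology above, the callback would be the stochastic dominance test applied to the bootstrapped distributions.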
This procedure provides a rank of the algorithms in each criterion under which they are being analyzed. A multiobjective analysis can be easily derived from this rank:
• If an algorithm A is better than an algorithm B in one criterion without being worse in the other, it is possible to say that A dominates B, taking into account all the merit functions considered.
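This joint dominance rule can be sketched on top of the per-criterion ranks. The sketch assumes that "better" and "worse" in each criterion are read directly from the ranks (lower is better); the function names and the rank values used in the usage note are hypothetical and are not results from the paper.

```python
def jointly_dominates(ranks_a, ranks_b):
    """True when A is no worse than B in every criterion and better in one.

    ranks_a, ranks_b: tuples with one rank per merit criterion (lower is better).
    """
    no_worse = all(a <= b for a, b in zip(ranks_a, ranks_b))
    better_somewhere = any(a < b for a, b in zip(ranks_a, ranks_b))
    return no_worse and better_somewhere


def efficient_set(rank_table):
    """Return the sorted labels of algorithms not jointly dominated by another."""
    return sorted(
        name for name, ranks in rank_table.items()
        if not any(jointly_dominates(other_ranks, ranks)
                   for other, other_ranks in rank_table.items()
                   if other != name)
    )
```

For example, with the made-up table `{"P": (1, 3), "Q": (3, 1), "R": (2, 2), "S": (3, 3)}`, the call `efficient_set(...)` keeps P, Q and R, since S is dominated by R.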
This approach is more powerful than former methods that are based on mean values, since it considers the whole distribution instead of only the mean. It makes it possible to take into account the spread of the merit function values for each one of the algorithms.
It is important to make clear that the accuracy of this approach is strongly dependent on the quality of the estimated PDFs. When the PDFs present high variability over different bootstrapping runs, it is possible to draw conclusions which do not reflect exactly what is present in the data. Larger samples tend to provide better PDF estimates, and should be used when possible. However, as can be seen in the results section, the approach proposed in this paper has led to reliable results even for small samples (15 and 30 points), showing that this approach can be useful in complex problems (where each algorithm run requires a high computation time). Finally, the use of unidimensional PDFs instead of a joint PDF (which considers all merit functions jointly) does not introduce analysis errors, since the data have been made uncorrelated by PCA.
4. NUMERICAL RESULTS
The comparison methodology proposed has been tested on the same instances proposed in [1] for the Optimal Communication Spanning Tree problem (OCST). The comparison approach proposed in that paper has been used as a benchmark for testing the approach proposed in this work. The comparison method proposed in [1] is based on Kruskal-Wallis Nonparametric Tests and Multiple Comparisons [3]. Although this method is general, and consequently can be applied to any kind of distribution, it is less powerful than parametric approaches, due to the characteristics of nonparametric tests: the lower power of those tests increases the chance of failing to find a significant difference between algorithms that could be considered different in practical situations. It is expected that the approach proposed in this paper presents a higher capacity for discriminating between algorithms that could be significantly different for practical purposes.
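For reference, the rank-based statistic behind the Kruskal-Wallis test used by the benchmark method can be sketched as below. This is the textbook form of the H statistic with mid-ranks for tied values; it omits the tie-correction factor and the multiple-comparison step, and it is not taken from the implementation of [1]. The function name `kruskal_wallis_h` is hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie-correction factor).

    groups: list of lists, one list of observations per algorithm.
    """
    # Pool all observations, remembering the group each one came from.
    pooled = sorted((value, g) for g, group in enumerate(groups)
                    for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        # Find the block of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the ranks i+1 .. j.
        mid_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            rank_sums[pooled[k][1]] += mid_rank
        i = j
    weighted = sum(r * r / len(group) for r, group in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
```

Large values of H indicate that at least one group tends to produce different values from the others; deciding significance would additionally require comparing H against a chi-squared reference distribution, which is left out of this sketch.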
4.1 Optimal Communication Spanning Tree
In the Optimal Communication Spanning Tree problem [12], the algorithm looks for the spanning tree which presents minimum cost and complies with the communication requirements between the nodes [17]. This problem has been proved to be MAX SNP-hard [14, 17].
The problem can be stated as follows:
\[ \min \sum_{i,j \in V} R_{i,j} \cdot C^{X}_{i,j} \tag{4} \]
where:
$C^{X}_{i,j}$ is the sum of weights of edges in path i − j.
$R_{i,j}$ is the communication requirement between i and j.
V is the set of vertices.
The only constraint considered here is that the network must be a tree. Additionally, constraints which limit the maximum degree of each node could be considered, for modeling equipment used in real cases, such as hubs or switches (which present a limited number of ports).
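The objective in (4) can be evaluated for a given tree with a short sketch. It assumes that X denotes the candidate spanning tree, that nodes are numbered 0 to n − 1, and that each unordered pair (i, j) is counted once; none of these conventions is stated explicitly in the text, and the function name `ocst_cost` is hypothetical.

```python
from collections import deque


def ocst_cost(n, tree_edges, requirement):
    """Sum of requirement[i][j] times the tree path weight between i and j.

    n: number of nodes, labeled 0 .. n-1.
    tree_edges: list of (u, v, weight) tuples forming a spanning tree.
    requirement: n x n matrix of communication requirements.
    """
    adjacency = {node: [] for node in range(n)}
    for u, v, weight in tree_edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))

    total = 0.0
    for source in range(n):
        # In a tree the path between two nodes is unique, so a plain
        # breadth-first traversal accumulates the exact path weights.
        distance = {source: 0.0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, weight in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + weight
                    queue.append(v)
        # Count each unordered pair once (assumption noted above).
        for target in range(source + 1, n):
            total += requirement[source][target] * distance[target]
    return total
```

For the path 0 - 1 - 2 with edge weights 1 and 2 and unit requirements, the pairs contribute 1, 3 and 2, so the cost is 6.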
As in [1], the comparison approach has been employed to compare 12 genetic algorithms applied to 10, 25 and 50 node instances of the OCST problem.
In order to make the comparison of the approaches possible, the best function value and the computation time have been considered here as merit criteria. The algorithms have been tested on a Pentium IV (Prescott) at 3.2GHz with 1GB of RAM, using the Matlab 7 environment. Although the computation times are not comparable with other approaches, since they depend strictly on the hardware and software used, the time ratio between the methods can provide useful information about their computational cost. The 12 algorithms have been labeled as follows:
Label   Decoding                Crossover      Mutation
A       Characteristic vector   single point   swap
B       Prüfer numbers          single point   point
C       Prüfer numbers          single point   swap
D       Network random keys     single point   point
E       Network random keys     single point   swap
F       Edge sets               single point   point
G       Edge sets               single point   swap
H       Node biased             single point   point
I       Node biased             single point   swap
J       Link and node biased    single point   point
K       Link and node biased    single point   swap
L                               kruskal        kruskal
Finally, the following parameters have been adopted in all simulations:
• Number of runs: 30 runs per method;
• Population size: 50 individuals;
• Crossover probability: 0.80;
• Mutation probability: 0.45 (per individual);
• Linear ranking and roulette-wheel selection;
• Generational replacement with elitism;
• Stop criterion: 100 generations without improvement.
4.2 Comparison Results - Full Data
The results achieved by the former approach [1] (referred to here as Kruskal+MC) have been reproduced to ease the evaluation of the comparison methods. Tables 1 and 2 show the results (best function value and computation time, respectively) obtained on three instances of the OCST problem. The ordering of the algorithm performances provided by both comparison approaches is presented together with the tables. The following notation has been adopted for presenting the results: the methods are listed in descending order of performance; in this list, underlining and overlining are used to indicate which methods did not exhibit statistically significant differences from one another.
Table 1: OCST (30 runs) - Best function value
        10 nodes            25 nodes            50 nodes
Lab.    avg       sd        avg       sd        avg       sd
A       3.65e5    7.25e3    2.58e6    1.09e4    1.19e7    3.52e5
B       3.67e5    1.24e4    2.90e6    2.01e5    1.32e7    7.53e5
C       3.74e5    1.62e4    2.85e6    2.00e5    1.30e7    8.96e5
D       3.75e5    1.59e4    2.93e6    2.81e5    1.42e7    1.36e6
E       3.74e5    1.57e4    2.80e6    1.59e5    1.42e7    1.64e6
F       3.64e5              2.57e6    9.89e3    1.16e7    2.28e5
G       3.64e5              2.60e6    2.13e4    1.25e7    7.30e5
H       3.64e5              2.56e6    6.24e3    1.11e7    2.53e4
I       3.64e5    1.65e3    2.58e6    1.02e4    1.12e7    3.97e4
J       3.64e5    1.44e3    2.64e6    3.37e4    1.17e7    2.98e5
K       3.64e5    9.32e2    2.62e6    3.16e4    1.17e7    3.07e5
L       3.64e5              2.57e6    5.60e3    1.20e7    4.99e5

Kruskal+MC:
10 nodes: FG HLKIJABECD
25 nodes: HLFIAGKJECBD
50 nodes: HIFKJALGCBED
Proposed approach:
10 nodes: LF GHKIJABECD
25 nodes: HLAIFGKJECBD
50 nodes: HIFKJALGCBED

Table 2: OCST (30 runs) - Computation time (ms)
        10 nodes              25 nodes              50 nodes
Lab.    avg        sd         avg        sd         avg        sd
A       1.956e4    2.628e3    1.325e5    2.235e4    7.317e5    2.262e5
B       8.984e3    7.426e2    3.009e4    7.329e3    1.917e5    4.363e4
C       8.578e3    1.655e3    2.246e4    4.737e3    1.272e5    4.821e4
D       1.492e4    2.255e3    7.437e4    1.828e4    3.376e5    1.046e5
E       1.648e4    3.692e3    7.306e4    2.400e4    3.194e5    1.132e5
F       1.516e4    1.225e3    6.471e4    9.189e3    4.280e5    9.074e4
G       1.638e4    2.266e3    7.306e4    2.147e4    3.412e5    1.241e5
H       1.648e4    8.892e2    7.474e4    1.331e4    5.401e5    1.531e5
I       1.682e4    4.310e2    5.902e4    1.546e4    3.409e5    9.116e4
J       1.773e4    1.085e3    1.103e5    2.824e4    7.909e5    2.169e5
K       1.913e4    1.785e3    1.321e5    4.066e4    8.369e5    2.968e5
L       2.001e4    1.745e3    9.654e4    1.227e4    6.972e5    1.792e5

Kruskal+MC:
10 nodes: CB DFGEHIJKAL
25 nodes: CBFEGDIHLJAK
50 nodes: CBEDGFIHLAJK
Proposed approach:
10 nodes: CB FDGHIEJKAL
25 nodes: CB FGIEHDLAJK
50 nodes: CB IGFHEDLAJK

Tables 1 and 2 show that the comparison methods have presented coherent results: algorithms which have been classified as efficient by one method have also received a similar classification by the other one. However, it is important to see that the approach proposed in this paper has shown the capacity of distinguishing between some algorithms that have been considered equivalent by the Kruskal+MC method. This was expected, and can be credited to the higher power of the statistical tests employed. The Kruskal+MC method presents some intransitivity in classification that can make the decision process harder: for instance, in Table 1 (for 25 nodes), Kruskal+MC determined that algorithm H is equivalent to L and L is equivalent to G; however, H is considered statistically better than G. This phenomenon has not occurred in the approach proposed in this work.
From Table 1 and the results obtained by the bootstrapping-based approach, it is possible to conclude that algorithm H has presented the best convergence performance. This conclusion could not be safely drawn with the Kruskal+MC approach, since it has not detected any significant difference between H and I in any instance considered. With respect to the convergence time, the comparison approach proposed in this work has stated that algorithm C is significantly faster than the other ones. Once more, this conclusion could not be drawn by the Kruskal+MC approach, since it could not distinguish between B and C.
As discussed in Section 3, it is possible to compare the algorithms considering both criteria jointly. It is assumed that one algorithm is better than another if it is better in one criterion without being worse in the other one. This methodology leads to the following classification:
10 nodes: L FBGHCIJDKEA
25 nodes: G CFHIBLAEDJK
50 nodes: A CHIBFGJKDEL
This classification suggests that algorithms H and C are the best ones with regard to convergence capacity and time required, respectively. Algorithms F and I also represent intermediate choices, presenting good convergence (although lower than H) and requiring a smaller computation time than H. The choice between these algorithms should be made considering the time available to achieve a good solution. In the authors’ opinion, algorithms H and I are the most suitable ones in this case. Algorithm C should not be taken as a reasonable choice, since it presents very poor convergence performance, and could lead to expensive solutions.
4.3 Comparison Results - Reduced Data
A smaller set of 15 algorithm runs has been built by sampling (without replacement) 15 of the 30 runs previously performed. The intention of this test is to evaluate the effect of reducing the number of algorithm runs on the results achieved by the comparison approaches. If one comparison method requires a smaller set of runs to efficiently compare two or more algorithms, it can be considered more suitable, since the acquisition of additional data to perform the comparison carries a high computational cost (a whole algorithm run is required to find a single point for comparison).
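The construction of the reduced data set can be sketched with the Python standard library. The function name `reduce_runs` and the fixed seed are assumptions added only to make the sketch repeatable; the paper does not describe how the subsample was drawn beyond stating that it was taken without replacement.

```python
import random


def reduce_runs(runs, size=15, seed=0):
    """Draw `size` distinct runs from `runs`, without replacement."""
    rng = random.Random(seed)
    return rng.sample(runs, size)
```

Here `runs` would hold one record per algorithm run, for example the pair of merit values recorded for that run.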
From Tables 3 and 4, it is noticeable that the smaller set of algorithm runs has reduced the capacity of Kruskal+MC to detect significant differences between the algorithms. When this comparison method is applied to the reduced set, it provides a sorting similar to the one achieved with the larger set (with 30 runs). However, the uncertainty about the extent to which each method is better than the other ones increases significantly, which makes it harder to decide which algorithm should be used.
On the other hand, the smaller number of algorithm runs does not cause large losses for the bootstrapping-based approach. It is obvious that some information is lost, since a small sample often implies a less reliable approximating function. However, at least in the particular case discussed in this paper, this loss has not caused significant changes in the algorithm sorting. This is an interesting property of the proposed method, since it seems to provide efficient comparisons from small sets of algorithm runs.
5. CONCLUSION AND FUTURE WORK
5.1 Concluding Remarks
This paper has proposed a new methodology for the comparison of evolutionary algorithms that is based on the techniques of (a) Principal Component Analysis, which is used to make the data uncorrelated; (b) Bootstrapping, which is employed to build the probability distribution function of the merit functions; and (c) Stochastic Dominance Analysis, which performs a multicriteria ordering between algorithms, considering their ability to reach solution accuracy and the related computational effort.
The proposed technique has presented a high capability to discriminate between different algorithms, allowing the detection of statistically significant differences between performances which were not detected by former methodologies.

Table 3: OCST (15 runs) - Best function value
        10 nodes            25 nodes            50 nodes
Lab.    avg       sd        avg       sd        avg       sd
A       3.64e5              2.57e6    8.76e3    1.19e7    3.03e5
B       3.67e5    1.16e4    2.93e6    2.29e5    1.32e7    9.19e5
C       3.73e5    1.58e4    2.84e6    1.63e5    1.28e7    6.72e5
D       3.71e5    1.27e4    2.92e6    2.19e5    1.42e7    1.43e6
E       3.69e5    9.57e4    2.85e6    1.91e5    1.42e7    1.59e6
F       3.64e5              2.57e6    9.26e3    1.16e7    1.98e5
G       3.64e5              2.60e6    2.33e4    1.25e7    7.64e5
H       3.64e5              2.57e6    8.40e3    1.11e7    8.60e3
I       3.64e5    2.13e3    2.58e6    1.22e4    1.12e7    3.51e4
J       3.64e5    1.33e3    2.65e6    3.92e4    1.17e7    3.35e5
K       3.64e5    9.76e2    2.63e6    2.70e4    1.17e7    3.78e5
L       3.64e5              2.57e6    5.10e3    1.19e7    3.78e5

Kruskal+MC:
10 nodes: AFGHLKJIBEDC
25 nodes: HLFAIGKJCEDB
50 nodes: HIFJKALGCBED
Proposed approach:
10 nodes: AF GHLKJIBEDC
25 nodes: HL AFIGKJCEDB
50 nodes: HI FJKALGCBED

5.2 Future Work
There are some points which are being considered as future extensions of this work:
• To exploit a parametric version of bootstrapping: when applied to estimate the characteristics of the operator mean, bootstrapping often returns normal (or normal-like) distributions, which can be justified by the Central Limit Theorem [15]. This property can be exploited in order to try to differentiate algorithms which have been considered similar in the stochastic dominance analysis.
• To extend the proposed comparison approach to multiobjective problems: the approach proposed in this paper can be extended to multiobjective problems by replacing the convergence criterion by some Pareto quality indicator. Scalar Pareto-set quality metrics, such as the S-Metric and the Integrated Sphere Counting, can be used for this task.
6. ACKNOWLEDGMENTS
The authors would like to thank the Brazilian agencies Capes, CNPq and Fapemig for their financial support.

Table 4: OCST (15 runs) - Computation time (ms)
        10 nodes              25 nodes              50 nodes
Lab.    avg        sd         avg        sd         avg        sd
A       1.986e4    2.664e3    1.478e5    2.311e4    7.842e5    2.213e5
B       8.974e3    8.852e2    3.965e4    7.581e3    2.075e5    4.319e4
C       8.730e3    1.601e3    3.072e4    3.284e3    1.466e5    4.711e4
D       1.570e4    2.495e3    9.002e4    1.823e4    3.515e5    1.040e5
E       1.519e4    2.476e3    8.940e4    2.319e4    3.823e5    1.292e5
F       1.531e4    1.452e3    8.035e4    9.831e3    4.737e5    1.006e5
G       1.692e4    2.438e3    8.970e4    2.388e4    3.823e5    1.312e5
H       1.645e4    4.474e2    1.165e5    1.438e4    7.136e5    1.634e5
I       1.673e4    3.829e2    9.570e4    1.413e4    4.742e5    8.480e5
J       1.745e4    1.116e3    1.520e5    2.759e4    9.112e5    2.364e5
K       1.912e4    1.563e3    1.637e5    4.161e4    8.590e5    1.875e5
L       2.009e4    1.218e3    1.221e5    1.415e4    7.615e5    1.623e5

Kruskal+MC:
10 nodes: CBEFDHIGJKAL
25 nodes: CBFEGDIHLAJK
50 nodes: CBDEGFIHLAKJ
Proposed approach:
10 nodes: CB EFHIGDJKAL
25 nodes: CB FGIEHDLAJK
50 nodes: CB IGFDHELAKJ

7. REFERENCES
[1] E. G. Carrano, C. M. Fonseca, R. H. C. Takahashi, L. C. A. Pimenta, and O. M. Neto. A preliminary comparison of tree encoding schemes for evolutionary algorithms. In Proc. IEEE International Conference on Systems Man and Cybernetics, Vancouver, Canada, 2007.
[2] M. R. Chernick. Bootstrap Methods, A practitioner’s guide. Wiley Series in Probability and Statistics. John Wiley and Sons, 1999.
[3] W. J. Conover. Practical Nonparametric Statistics. Wiley, 3rd edition, 1999.
[4] B. G. W. Craenen, A. E. Eiben, and J. I. van Hemert. Comparing evolutionary algorithms on binary constraint satisfaction problems. IEEE Transactions on Evolutionary Computation, 7:424–444, 2003.
[5] P. Dutta and D. Dutta Majumder. Performance comparison of two evolutionary schemes. In Proc. International Conference on Pattern Recognition, pages 659–663, Vienna, Austria, 1996.
[6] B. Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7:1–26, 1979.
[7] B. Efron. Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Biometrika, 68:589–599, 1981.
[8] B. Efron. The jackknife, the bootstrap, and other resampling plans. Technical report, Society of Industrial and Applied Mathematics CBMS-NSF Monographs, 1982.
[9] M. Ehrgott. Multicriteria Optimization. Springer, 2000.
[10] K. Fukunaga. Introduction to Statistical Pattern Recognition. Elsevier, 2nd edition, 1990.
[11] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Addison Wesley, 2nd edition, 1992.
[12] T. C. Hu. Optimum communication spanning trees. SIAM Journal on Computing, 3:188–195, 1974.
[13] E. Oja. Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1:61–68, 1989.
[14] C. H. Papadimitriou and M. Yannakakis. Optimization, approximation, and complexity classes. Journal of Computer and System Sciences, 43:425–440, 1991.
[15] A. Papoulis. Probability, Random Variables and Stochastic Processes. McGraw Hill, 3rd edition, 1991.
[16] S. Pemmaraju and S. Skiena. Implementing Discrete Mathematics: Combinatorics and Graph Theory with Mathematica. Cambridge University Press, 2003.
[17] S. Soak, D. W. Corne, and B. Ahn. The edge-window-decoder representation for tree-based problems. IEEE Transactions on Evolutionary Computation, 10:124–144, 2006.
[18] R. H. C. Takahashi, J. A. Vasconcelos, J. A. Ramirez, and L. Krahenbuhl. A multiobjective methodology for evaluating genetic operators. IEEE Transactions on Magnetics, 39:1321–1324, 2003.
[19] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Fonseca. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation, 7:117–132, 2003.