Technical Report UPM-FI/DIA/2010-3
Network measures for re-using problem
information in EDAs
Roberto Santana, Concha Bielza, Pedro Larrañaga
Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid
28660 Boadilla del Monte, Madrid, Spain.
roberto.santana@upm.es, pedro.larranaga@fi.upm.es, mcbielza@fi.upm.es
June 30, 2010
Abstract
Probabilistic graphical models (PGMs) are used in estimation of distribution algorithms (EDAs) as a model of the search space. The graphical components of PGMs can also be analyzed as networks. In this paper we show that topological measures extracted from these networks capture characteristic information about the optimization problem. The measures can also be used to describe the EDA's behavior. Using a simplified protein folding optimization problem, we show that the network information extracted from a set of problem instances can be effectively used to predict characteristics of similar instances.
1 Introduction
EDAs [9] are a class of optimization algorithms based on probabilistic modeling of the search space. These algorithms represent particular characteristics of high-quality solutions using PGMs. Probabilistic modeling allows EDAs to capture, represent, and use relevant interactions between the problem variables, increasing the efficiency of the search.
In many cases, the end user is not only interested in the solution of a given optimization problem, but also in reaching a better understanding of the problem. Usually, several runs of the EDA are done and their different outputs are contrasted. As a side product, these runs produce a set of probabilistic models which store valuable information about the optimization problem. The models contain clues about the way in which the final solutions have been obtained. This information can in some cases be transformed into knowledge by the user. However, inspecting the models to detect characteristic patterns is not an easy task, so a more automatic procedure is needed.
Analysis of PGMs is not only applicable to the understanding of a single problem instance. Extracting and reusing problem information may also have application in situations in which an optimization algorithm is expected to solve several instances of the same class of problems. Commonly, these problem instances share some sort of (structural) similarity which it would be beneficial to identify and exploit.
In this paper we treat two different but closely related problems: what type of information can be automatically extracted from the PGMs, and how can this information be employed? We approach these questions by analyzing the graphical structures of the learned PGMs as networks. The networks produced by EDAs are mined to extract a set of topological measures that, conveniently processed and fed to machine learning algorithms, are used to characterize similar optimization problems.
Automatic procedures for extracting information and reusing it in the future solution of similar problems were presented in [6]. Two different approaches were introduced to extract the problem information. They were applied with good results to solve similar optimization problems. However, neither of these approaches uses the problem information to infer, predict or characterize attributes of the related instances or the EDA's behavior on those instances. Using measures that contain information about the model structure and parameters [3] can be seen as a possible way to generalize structural modeling. However, this type of measure has not yet been applied to problem characterization or instance classification within EDAs.
We argue that the use of network measures, computed from graphs representing problem structural information, can serve as a basis for the application of transfer learning in optimization. Transfer learning [13] studies how the knowledge acquired while solving a given problem can be applied to solve different but related problems. Our contribution consists of adapting results from network theory to the particular case of the probabilistic graphical models used in EDAs, and introducing network measures extracted from the PGMs learned by EDAs as a basis for transfer learning in optimization.
2 Estimation of distribution algorithms
Let $X_i$ represent a discrete random variable. A possible value of $X_i$ is denoted $x_i$. Similarly, we use $\mathbf{X} = (X_1, \ldots, X_n)$ to represent an $n$-dimensional random variable and $\mathbf{x} = (x_1, \ldots, x_n)$ to represent one of its possible values. We will work with positive probability distributions denoted by $p(\mathbf{x})$.
The type of probabilistic model and the particular classes of learning and sampling methods used are the distinguishing features of EDAs. EDAs that use Bayesian networks (BNs) [4, 12] are among the most efficient algorithms able to represent higher-order interactions. We use the estimation of Bayesian networks algorithm (EBNA) [4]. A pseudocode of EBNA is shown in Algorithm 1. The algorithm was implemented in Matlab using the MATEDA-2.0 software [14]. The scoring metric used by EBNA was the Bayesian metric with uniform priors, and each node was allowed to have a maximum of 5 parents.
Algorithm 1: EBNA
1. Generate an initial population $D_0$ of individuals and evaluate them
2. $t \leftarrow 1$
3. do {
4.   $D^{Se}_{t-1} \leftarrow$ Select $N$ individuals from $D_{t-1}$ using truncation selection
5.   Using $D^{Se}_{t-1}$ as the data set, apply local search to find one BN structure that optimizes the scoring metric
6.   Calculate the parameters of the BN using $D^{Se}_{t-1}$ as the data set
7.   $D_t \leftarrow$ Sample $M$ individuals from the BN and evaluate them
8. } until the stopping criterion is met
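For illustration, here is a minimal, hypothetical Python sketch of this loop. To stay self-contained it replaces the Bayesian-network learning and sampling of steps 5-7 with a univariate marginal model; in EBNA proper, a BN structure is found by local search over the scoring metric, and the DAG of that BN is the object analyzed in Section 3.

```python
import numpy as np

def eda_loop(fitness, n_vars, card=3, pop_size=500, n_sel=250,
             max_gens=50, seed=0):
    """Skeleton of Algorithm 1. A univariate marginal model stands in
    for the Bayesian network that EBNA would learn in steps 5-6."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(card, size=(pop_size, n_vars))        # step 1
    for t in range(1, max_gens + 1):                         # steps 2-8
        fit = np.array([fitness(ind) for ind in pop])
        # Step 4: truncation selection of the N best individuals
        # (lowest values, assuming a minimization problem such as HP).
        sel = pop[np.argsort(fit)[:n_sel]]
        # Steps 5-6 (simplified): estimate p(X_i = k) from the selected
        # set, instead of learning a BN structure and its parameters.
        probs = np.stack([(sel == k).mean(axis=0) for k in range(card)])
        # Step 7: sample M new individuals from the model.
        pop = np.array([[rng.choice(card, p=probs[:, i])
                         for i in range(n_vars)] for _ in range(pop_size)])
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmin(fit)]                               # best found
```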
3 Analyzing graphs as networks
In recent years, results from graph theory have been developed and integrated into the modern theory of networks [1, 11]. Statistical network measures, which unveil the global structure of the network, and local measures, which serve to identify local patterns in the network's topology, are both useful tools to uncover and characterize the patterns of interactions in complex systems.
Most of the graphs used in EDAs (i.e., undirected, directed and weighted graphs) can be analyzed as networks. We conduct our analysis using the directed acyclic graphs (DAGs) learned in each generation of the EDA. We have computed several measures that serve to characterize these networks. A detailed description of these measures is beyond the space constraints of this paper and can be found in [5, 10, 11, 15, 16]. An account of the network measures used in our experiments follows.
1. dagdif: Number of arcs that differ between the DAGs learned at generations $i$ and $i+1$.
2. Ndensity: Connection density of the network, i.e., the number of connections present in the network out of all $n^2 - n$ possible ones.
3. indegree: For a vertex, the number of incoming arcs.
4. outdegree: For a vertex, the number of outgoing arcs.
5. betw. conn.: Edge betweenness centrality. It is the fraction of all shortest paths in the network that traverse a given edge.
6. pair dist.: For a vertex, the average distance to the rest of the vertices. Disconnected vertices are assigned a very high, unattainable, distance value.
7. reachability: For a vertex, the average reachability to the rest of the vertices. The reachability value between vertices $i$ and $j$ is 1 if $i$ is reachable from $j$, and 0 otherwise.
8. clust. coef.: For a vertex, the clustering coefficient is the fraction of existing vertex links out of the total possible number of neighbor-neighbor links [16].
9. shortcut prob.: The shortcut probability is the fraction of shortcuts in the graph [15]. Shortcuts are edges which significantly reduce the characteristic path length.
10. n. motifs, $M = 3$: Motif frequency for all motifs of size $M = 3$. A motif [11] is a connected graph consisting of $M$ vertices and a set of edges, forming a subgraph of a larger network. Its frequency is the number of times it appears in the network.
11. n. motifs, $M = 4$: Motif frequency for all motifs of size $M = 4$.
12. max. modularity: The maximum modularity gives the modularity value corresponding to a network module decomposition computed with Newman's spectral optimization method, generalized to directed networks [10]. A module is a densely connected subset of nodes that is only sparsely linked to the remaining network.
13. vert. participation coef.: The participation coefficient [5] describes how well distributed the links of a node are among the different modules.
In the previous list, network measures 3, 4, 6, 7, and 8 are computed as the average of the local measures calculated for each vertex. Similarly, network measure 5 is the average of the measures computed for each edge. The rest of the measures are global. In the learned DAGs, there are 4 different motifs for $M = 3$ and 24 different motifs for $M = 4$. Therefore, the total number of measures extracted from each graph is 39.
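As a hedged illustration of how such descriptors can be obtained, the sketch below computes a few of them with the networkx Python library. The experiments reported here used the brain connectivity toolbox instead (see the footnote in Section 4.1), and the two libraries' definitions may differ in detail.

```python
import networkx as nx

def network_descriptors(dag: nx.DiGraph) -> dict:
    """Approximate several of the listed measures for a learned DAG.
    A networkx sketch; not the toolbox used in the experiments."""
    n = dag.number_of_nodes()
    desc = {
        # 2. Ndensity: arcs present out of the n^2 - n possible ones.
        "Ndensity": dag.number_of_edges() / max(n * n - n, 1),
        # 3-4. indegree / outdegree, averaged over all vertices.
        "indegree": sum(d for _, d in dag.in_degree()) / n,
        "outdegree": sum(d for _, d in dag.out_degree()) / n,
        # 5. betw. conn.: average edge betweenness centrality.
        "betw_conn": (sum(nx.edge_betweenness_centrality(dag).values())
                      / max(dag.number_of_edges(), 1)),
        # 7. reachability: fraction of ordered pairs (i, j) such that
        # j is reachable from i.
        "reachability": sum(len(nx.descendants(dag, v)) for v in dag)
                        / max(n * (n - 1), 1),
    }
    # 10. Size-3 subgraphs: nx.triadic_census counts the 16 directed
    # triad types, a close relative of the M = 3 motif frequencies.
    desc["triads_M3"] = nx.triadic_census(dag)
    return desc
```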
Our objective is to identify properties of the networks generated by EDAs that convey information about the problem being solved or serve as descriptors of EDA behavior. In general, we would like the analysis of the networks to serve to compare the difficulty of different problem instances and to extract problem information. The particular goal is to be able to predict problem features from the network measures derived from the DAGs learned by the EDAs.
We start from a data set of characterized optimization problems. The problem characterization is given by a set of problem characteristics, e.g., the number of suboptima. We also have the previously described network measures computed from the DAGs generated by EBNA for each problem. The network measures of a subset of the problems are used to predict the characteristics of the rest of the problems. This is a classical supervised classification problem. Classification accuracy is used as a measure of the informativeness of the network descriptors used. It also serves to evaluate the potential of our approach to reuse information extracted from the EDAs. We expect that our approach will enable us to find answers to the following general questions:
Figure 1: (a) One possible configuration of the sequence HHHPHPPPPPH in the functional HP model. Hydrophobic residues are represented by black beads and polar residues by white beads. There is one HH contact (represented by a dotted line with wide spaces), one HP contact (represented by a dashed line) and two PP contacts (represented by dotted lines). (b) Another possible configuration of the same sequence with a different pattern of interactions.
1. Can we predict the number of local optima the problem has?
2. Is it possible to determine whether the optimum has been found or not?
3. Can we identify the most similar and most different characterized problems
with respect to a given uncharacterized problem?
4 Experiments
As a problem benchmark we use a simplified protein model. The HP simplified protein model [2] is used in bioinformatics to investigate protein folding. In the HP model, a protein is considered as a sequence of hydrophobic (H) and hydrophilic or polar (P) residues, which are located on a regular lattice forming self-avoiding paths. Figure 1 shows the graphical representations of two possible configurations for the sequence HHHPHPPPPPH.
Interactions between neighboring residues (adjacent in the lattice but not connected in the sequence) contribute to the total energy of the HP lattice configuration. The energy values associated with the functional HP model [7] comprise both an attractive interaction $\epsilon_{HH} = -2$ and repulsive interactions ($\epsilon_{PP} = 1$, $\epsilon_{HP} = 1$, and $\epsilon_{PH} = 1$). The HP problem consists of finding the solution (the topological configuration of the HP chain) that minimizes the total energy. The energy that the functional model protein associates with the configuration shown in Figure 1(a) is $1 \cdot (-2) + 1 \cdot 1 + 2 \cdot 1 = 1$, because there is one HH contact, one HP contact and two PP contacts.
An HP protein configuration can be represented as a walk in the lattice (a sequence of moves). In the sequence of moves, the two initial residues are located adjacent in the lattice. Each subsequent residue is located to the left of, to the right of, or in line with the previous two residues. For a given HP sequence and lattice, $X_i$ represents the relative move of residue $i$ in relation to the previous two residues. Taking as a reference the location of the previous two residues in the lattice, $X_i$ takes values in $\{0, 1, 2\}$: $x_i = 0$ means that residue $i$ is located to their left, and $x_i = 1$ and $x_i = 2$ respectively mean that residue $i$ is located in line with the previous two residues or to their right. The values of $X_1$ and $X_2$ are meaningless; they are arbitrarily set to 0. This codification is called relative encoding [8]. The representations of the configurations in Figures 1(a) and 1(b) are $(0,0,0,2,2,0,0,2,2,0,0)$ and $(0,0,2,2,0,1,0,2,2,0,0)$, respectively.
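As a worked illustration of the encoding and the energy function, the following sketch decodes a relative-encoding vector into 2D lattice coordinates and evaluates the functional-model energy. The particular left/right turn convention is one consistent choice among the possible ones; with it, the vector for Figure 1(a) reproduces the energy of 1 computed above.

```python
EPS = {("H", "H"): -2, ("H", "P"): 1, ("P", "H"): 1, ("P", "P"): 1}

def hp_energy(sequence, moves):
    """Decode a relative encoding (0 = left turn, 1 = straight,
    2 = right turn) into 2D lattice positions and return the
    functional HP model energy. Non-self-avoiding walks get inf."""
    pos = [(0, 0), (1, 0)]        # the two initial residues are adjacent
    dx, dy = 1, 0                 # current direction of the walk
    for m in moves[2:]:           # x_1, x_2 are meaningless (set to 0)
        if m == 0:                # turn left
            dx, dy = -dy, dx
        elif m == 2:              # turn right
            dx, dy = dy, -dx
        x, y = pos[-1]
        pos.append((x + dx, y + dy))
    if len(set(pos)) != len(pos):           # the path must be self-avoiding
        return float("inf")
    energy = 0
    for i in range(len(pos)):
        for j in range(i + 2, len(pos)):    # non-consecutive residues only
            if abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1]) == 1:
                energy += EPS[(sequence[i], sequence[j])]
    return energy

# The configuration of Figure 1(a):
print(hp_energy("HHHPHPPPPPH", (0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0)))  # -> 1
```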
Protein folds corresponding to proteins from the same family usually share
common structural patterns. We expect that two similar HP sequences will
have similar optimal lattice configurations. This fact explains the choice of this
problem for the experiments.
4.1 Experimental framework
We use a data set¹ of 611 functional HP proteins corresponding to different sequences of 23 residues. These instances have a convenient property: we know their optimal value, which is reached at a single configuration (disregarding symmetric representations like the one shown in Figure 1(b)). In addition, we know the closest suboptimal value and the number of configurations at which this suboptimal value is reached. We use this information as a characterization of the problem. The optimal values of the 611 instances lie between $-26$ and $-8$; 374 instances have a number of suboptima in $\{1, \ldots, 4\}$ and the other 237 instances have a number of suboptima in $\{193, \ldots, 2532\}$.
To evaluate the EDA behavior and collect the networks, 30 independent runs of EBNA were executed for each HP protein instance. For each instance, we computed how many times the optimum was found in the 30 experiments, the average generation at which it was found, and the average fitness of the best solutions found in all runs. For 310 of the 611 problems the optimum was found at least once. For 80 instances it was found only once, and for 62 it was found 10 or more times. In most cases where the optimum is found, it is found, on average, between generations 10 and 15.
From each directed network corresponding to the structure (DAG) of the Bayesian network learned at each generation, we compute the network descriptors² introduced in Section 3.
For the HP problem, our general questions can be reformulated as follows:
1. Can we predict the number of local optima for a given HP protein instance?
2. Can we predict whether EBNA has converged to the optimum value without knowing what the value of the optimum actually is?
3. Given a predefined similarity measure between instances, can we distinguish between the characterized HP instances that are most similar and most different to a given uncharacterized instance?
¹ This set is a subset of an original database introduced in [8].
² To compute them, we use the brain connectivity toolbox, available from http://sites.google.com/a/brain-connectivity-toolbox.net/bct/metrics
[Figure 2 shows two bar charts of motif frequencies (left panel: Motifs M=3; right panel: Motifs M=4; x-axis: motif number) with legend entries Succ=0, Succ>0 and Succ>9.]
Figure 2: Motif frequencies computed from the networks of instances for which EBNA has a success rate of 0 (blue), higher than 0 (green), and equal to or higher than 9 (red).
We assume that prediction is based on the networks learned from previous, characterized problems and the networks obtained from the current, uncharacterized problem. Also, notice that the questions stated above address three distinct types of information about the problems: 1) information about the problem characteristics; 2) information about the algorithm behavior; 3) information about the similarity between the problems.
The first problems considered are the determination of the algorithm's convergence and of the number of suboptima of the problem. For these two classification problems, we specify two classes each. In the first case, the classes are: 1A) instances for which EBNA did not converge to the optimum in any of the 30 experiments; 1B) the rest of the instances. For the second classification problem, the classes are: 2A) instances with 4 or fewer suboptima; 2B) the rest of the instances, i.e., those with 193 or more suboptima. To get some clues about possible characteristic patterns associated with each of the classes, we computed and analyzed the average network descriptors from the networks in each class.
Figure 2 shows the motif frequencies for problems in classes 1A and 1B. In addition, we display information for a subset of the instances of class 1B. This subset comprises the instances for which the EDA converged 9 or more times in the 30 experiments. An initial observation is that the frequencies of all motif classes are higher for problems for which the EDA converges more often. A similar pattern is observed for the problem of classifying the number of suboptima (classes 2A and 2B) of the instance (data not shown), in which instances with a lower number of suboptima produce networks with a higher frequency of all types of motifs.
4.2 Numerical results
To evaluate predictors of the problem characteristics, we use a multivariate Gaussian classifier in which the conditional density of a solution given the class $A_i$ is computed as

$$
p(\mathbf{z} \mid A_i) = (2\pi)^{-\frac{n}{2}}\, |\Sigma_{A_i}|^{-\frac{1}{2}}\, e^{-\frac{1}{2}(\mathbf{z}-\mu_{A_i})^t \Sigma_{A_i}^{-1} (\mathbf{z}-\mu_{A_i})} \qquad (1)
$$

where $Z \subseteq \{f_1, \ldots, f_m\}$, i.e., $Z$ is a subset of components taken from the complete set of $m = 39$ features of the problem (network topological descriptors), $A_i$ denotes one of the classes described in the previous section, and $\mu_{A_i}$ and $\Sigma_{A_i}$ are the parameters of a multivariate Gaussian distribution estimated from the points in class $A_i$. In the simplest case $|Z| = 1$, i.e., only one network descriptor is used as a predictor. In this case, Equation (1) only involves univariate Gaussian distributions.
For a given set of features, we estimate the classifier accuracy using $k$-fold cross-validation with $k = 5$. The parameters of the multivariate Gaussians are learned using maximum likelihood estimation. To assign the classes, we use $p(A_i \mid \mathbf{z}) \propto p(A_i, \mathbf{z}) = p(\mathbf{z} \mid A_i)\, p(A_i)$ and assume all classes are a priori equiprobable. Therefore, the assigned class is the one with the highest $p(\mathbf{z} \mid A_i)$. The $k$-fold cross-validation procedure was repeated 50 times, and from these experiments we computed the mean and standard deviation of the classifier accuracy.
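A minimal sketch of this classification procedure follows; the helper name and the use of scipy/scikit-learn are our own illustrative choices, not the original Matlab implementation. Repeating the call with 50 different seeds yields means and standard deviations as reported in Tables 1 and 2.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.model_selection import StratifiedKFold

def gaussian_classifier_accuracy(Z, y, n_folds=5, seed=0):
    """Equiprobable-prior Gaussian classifier of Equation (1), evaluated
    with k-fold cross-validation. Z: (cases, |Z|) matrix of network
    descriptors; y: vector of class labels."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    correct = 0
    for train, test in skf.split(Z, y):
        # Maximum likelihood estimates of (mu_Ai, Sigma_Ai) per class
        # (bias=True gives the ML rather than the unbiased covariance).
        params = {c: (Z[train][y[train] == c].mean(axis=0),
                      np.cov(Z[train][y[train] == c], rowvar=False,
                             bias=True))
                  for c in np.unique(y)}
        # With equiprobable priors, assign the class maximizing p(z | A_i).
        for z, true_c in zip(Z[test], y[test]):
            pred = max(params, key=lambda c: multivariate_normal.logpdf(
                z, mean=params[c][0], cov=params[c][1], allow_singular=True))
            correct += pred == true_c
    return correct / len(y)
```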
For the first two classification problems, we independently computed the prediction accuracy given by each of the features. These results are shown in Table 1. For the sets of network motifs ($M = 3$ and $M = 4$), we only include in the table the accuracy corresponding to the network motif with the highest accuracy. It can be seen that the best accuracy is achieved by the betweenness connectivity in the first problem, and by the clustering coefficient in the second problem. Accuracies are higher for the second problem than for the first: it seems easier to predict whether the problem has few or many suboptima than to determine whether the algorithm has converged to the optimum.
In order to improve the classification accuracy, we consider interactions between the predictors. In this case, we search for a set of features that maximizes the classification accuracy. This feature subset selection problem, with 39 variables, is addressed using an EDA as implemented in MATEDA [14]. Only one run of the EDA was used to compute the best set of features; therefore, the solutions are likely to be improvable. The accuracies obtained with the best combination of features are shown in the last row of Table 1. For both problems, improvements over the best single classifiers were achieved. The classification accuracies of these sets of predictors are above 70% and 90%, respectively.
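This wrapper approach can be sketched, under our own assumptions, by reusing the two earlier sketches: the `eda_loop` skeleton from Section 2 with binary variables searches over feature masks, and the cross-validated accuracy (negated, since the skeleton minimizes) serves as fitness. `Z` and `y` are assumed to hold the descriptor matrix and class labels.

```python
# Hypothetical wiring of the earlier sketches: select feature subsets
# (a binary mask over the 39 descriptors) by maximizing CV accuracy.
def subset_fitness(mask):
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0                                        # empty subset
    return -gaussian_classifier_accuracy(Z[:, cols], y)  # minimized

best_mask = eda_loop(subset_fitness, n_vars=39, card=2)
```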
We have empirically shown that the information learned during the optimization of past problems, for which some particular features are known, can be employed to predict characteristics of new problems for which we do not have the same kind of information.
In the next step, we intend to use the DAGs to distinguish, in a data set of characterized problems, similar from dissimilar problems (question 3).
                               Convergence           Suboptima
        feature name           accuracy  std.dev.    accuracy  std.dev.
1       dagdif                 0.6023    0.0027      0.7601    0.0020
2       Ndensity               0.6635    0.0022      0.8841    0.0014
3       indegree               0.6637    0.0025      0.8838    0.0014
4       outdegree              0.6621    0.0031      0.8842    0.0018
5       betw. conn.            0.6789    0.0025      0.7323    0.0025
6       pair dist.             0.6151    0.0023      0.8593    0.0018
7       reachability           0.6137    0.0020      0.8581    0.0014
8       clust. coef.           0.6597    0.0026      0.8901    0.0017
9       shortcut prob.         0.6097    0.0043      0.6065    0.0068
10:13   n. motifs, M=3         0.6761    0.0025      0.8796    0.0024
14:37   n. motifs, M=4         0.6783    0.0022      0.8772    0.0016
38      max. modularity        0.6748    0.0034      0.7761    0.0020
39      vert. part. coef.      0.6376    0.0032      0.7875    0.0031
        Best combination       0.7084    0.0065      0.9132    0.0035

Table 1: Classification accuracy and standard deviation for each single predictor and for the best combination, for the prediction of EDA convergence to the optimum and of the number of suboptima.
We use two different measures of similarity between instances: 1) the sequence similarity, which is the number of common residues in the two sequences, and 2) the fitness correlation between problems, computed from 10000 random solutions. To construct the database of cases, we identify, for each of the 611 instances, the most similar and the most different instance in the set. Then, for each pair of instances $(i, j)$, we compute the difference $\mathbf{z}_i - \mathbf{z}_j$ between their corresponding network descriptors and associate the class value 1 if the pair is the most similar, or 0 if the pair is the most dissimilar. Notice that an instance may have more than one most-similar or most-dissimilar match. This is particularly the case for the sequence similarity measure. In such cases, we select an arbitrary instance among those being closest (respectively, most distant). As a result, for each similarity measure, there is a database of $611 \times 2 = 1222$ cases equally distributed between the two classes.
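For concreteness, a small sketch of this database construction follows; the index arrays `most_similar` and `most_different` are hypothetical names holding, for each instance, the arbitrarily chosen closest and most distant instance.

```python
import numpy as np

def build_pair_database(Z, most_similar, most_different):
    """For each instance i, add the descriptor difference z_i - z_j
    against its most similar instance (class 1) and against its most
    different instance (class 0): 2 * len(Z) = 1222 cases here."""
    X, y = [], []
    for i in range(len(Z)):
        X.append(Z[i] - Z[most_similar[i]])
        y.append(1)
        X.append(Z[i] - Z[most_different[i]])
        y.append(0)
    return np.array(X), np.array(y)
```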
We use the same type of classifier and the same experimental protocol as in the previous classification experiments. Results are shown in Table 2. In this prediction problem the single classifiers perform more similarly to one another. The best individual predictor when the sequence similarity measure is used is the reachability measure (feature 7). When the fitness correlation measure is used, the best predictor is the outdegree (feature 4). In general, single predictors do not provide a high accuracy. However, when interactions between features are considered, the prediction accuracy is much higher for both problems (an increase of 7% for the first problem and of 15% for the second). The main conclusion from this experiment is that information extracted from the networks can be used to distinguish similar and dissimilar pairs of instances.
                               Based on seq. similarity    Based on fitness correlation
        feature name           accuracy  std.dev.          accuracy  std.dev.
1       dagdif                 0.5639    0.0039             0.5608    0.0052
2       Ndensity               0.6516    0.0019             0.6207    0.0035
3       indegree               0.6516    0.0021             0.6211    0.0036
4       outdegree              0.6514    0.0024             0.6683    0.0017
5       betw. conn.            0.6126    0.0031             0.6113    0.0033
6       pair dist.             0.6554    0.0021             0.6100    0.0037
7       reachability           0.6558    0.0023             0.6457    0.0026
8       clust. coef.           0.6495    0.0026             0.6208    0.0030
9       shortcut prob.         0.5959    0.0025             0.6097    0.0039
10:13   n. motifs, M=3         0.6469    0.0026             0.6103    0.0030
14:37   n. motifs, M=4         0.6435    0.0024             0.6193    0.0027
38      max. modularity        0.6164    0.0028             0.6164    0.0030
39      vert. part. coef.      0.6056    0.0027             0.5822    0.0034
        Best combination       0.7271    0.0041             0.8143    0.0043

Table 2: Classification accuracy and standard deviation of each single predictor and of the best combination for the prediction of the most similar and most dissimilar pairs of instances.
5 Conclusions and future work
We have introduced a novel approach for re-using information in EDAs. It is
based on the use of network measures computed from networks generated by
EDAs and on the application of machine learning algorithms. We argue that the
use of these measures could serve to devise “intelligent” optimization methods,
able to learn from past experience to recognize and solve related problems.
References
[1] L. A. N. Amaral, A. Scala, M. Barthélemy, and H. E. Stanley. Classes of small-world networks. Proceedings of the National Academy of Sciences (PNAS), 97(21):11149–11152, 2000.
[2] K. A. Dill. Theory for the folding and stability of globular proteins. Biochemistry, 24(6):1501–1509, 1985.
[3] C. Echegoyen, A. Mendiburu, R. Santana, and J. A. Lozano. A quantitative analysis of estimation of distribution algorithms based on Bayesian networks. Technical Report EHU-KZAA-IK-3, Department of Computer Science and Artificial Intelligence, University of the Basque Country, October 2009.
[4] R. Etxeberria and P. Larrañaga. Global optimization using Bayesian networks. In A. Ochoa, M. R. Soto, and R. Santana, editors, Proceedings of the Second Symposium on Artificial Intelligence (CIMAF-99), pages 151–173, 1999.
[5] R. Guimerà and L. A. N. Amaral. Functional cartography of complex metabolic networks. Nature, 433:895–900, 2005.
[6] M. Hauschild, M. Pelikan, K. Sastry, and D. E. Goldberg. Using previous models to bias structural learning in the hierarchical BOA. MEDAL Report No. 2008003, Missouri Estimation of Distribution Algorithms Laboratory (MEDAL), 2008.
[7] J. D. Hirst. The evolutionary landscape of functional model proteins. Protein Engineering, 12:721–726, 1999.
[8] N. Krasnogor, B. P. Blackburne, E. K. Burke, and J. D. Hirst. Algorithms for protein structure prediction. In Parallel Problem Solving from Nature - PPSN VII, volume 2439 of Lecture Notes in Computer Science, pages 769–778. Springer, 2002.
[9] P. Larrañaga and J. A. Lozano, editors. Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers, Boston/Dordrecht/London, 2002.
[10] E. A. Leicht and M. E. J. Newman. Community structure in directed networks. Physical Review Letters, 100:118703, 2008.
[11] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298:824–827, 2002.
[12] M. Pelikan. Hierarchical Bayesian Optimization Algorithm. Toward a New Generation of Evolutionary Algorithms, volume 170 of Studies in Fuzziness and Soft Computing. Springer, 2005.
[13] R. Raina, A. Y. Ng, and D. Koller. Constructing informative priors using transfer learning. In Proceedings of the 23rd International Conference on Machine Learning (ICML-2006), pages 713–720, New York, NY, USA, 2006. ACM Press.
[14] R. Santana, C. Bielza, P. Larrañaga, J. A. Lozano, C. Echegoyen, A. Mendiburu, R. Armañanzas, and S. Shakya. MATEDA: A Matlab package for the implementation and analysis of estimation of distribution algorithms. Journal of Statistical Software, 2010.
[15] O. Sporns. Graph theory methods for the analysis of neural connectivity patterns. In Neuroscience Databases: A Practical Guide, pages 171–186. Kluwer, 2002.
[16] D. J. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440–442, 1998.