Article

Abstract

This chapter shows how Estimation of Distribution Algorithms (EDAs) can benefit from data clustering in order to optimize both discrete and continuous multimodal functions. More precisely, the advantage of incorporating clustering into EDAs is twofold: to obtain all the best solutions rather than only one of them, and to alleviate the difficulties that affect many evolutionary algorithms when more than one global optimum exists. We propose the use of Bayesian networks and conditional Gaussian networks to perform such a data clustering when EDAs are applied to optimization in discrete and continuous multimodal domains, respectively. The dynamics and performance of our approach are shown by evaluating it on a number of symmetrical functions, some of them highly multimodal.
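As a rough illustration of the idea in the abstract, the sketch below clusters the selected individuals and fits one Gaussian per cluster, so each basin of a multimodal function keeps its own search model. It is only a stand-in: the chapter learns Bayesian networks and conditional Gaussian networks rather than the scikit-learn Gaussian mixture used here, and all function names and parameters are illustrative.

# Minimal clustering-EDA sketch (assumptions: minimization, truncation
# selection, Gaussian mixture as a stand-in for the chapter's models).
import numpy as np
from sklearn.mixture import GaussianMixture

def clustered_eda_step(population, fitness, n_clusters=2, frac=0.5):
    # Truncation selection: keep the best half of the population.
    idx = np.argsort(fitness(population))[: int(len(population) * frac)]
    selected = population[idx]
    # Cluster the selected individuals and fit one Gaussian per cluster.
    gmm = GaussianMixture(n_components=n_clusters).fit(selected)
    # Sample the next population from the mixture: each cluster keeps
    # exploring its own optimum instead of averaging the optima away.
    new_pop, _ = gmm.sample(len(population))
    return new_pop

# Usage on a symmetric bimodal function with optima at -3 and +3:
f = lambda X: np.minimum((X[:, 0] - 3) ** 2, (X[:, 0] + 3) ** 2)
pop = np.random.uniform(-5, 5, size=(200, 1))
for _ in range(30):
    pop = clustered_eda_step(pop, f)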


... Therefore, using more complex models that assume neither a Gaussian density nor linear dependencies between variables may allow different subspaces of the landscape to be explored without forcing the algorithm to choose one of the two areas of the space, thus avoiding convergence to local optima. This idea of independently exploring different areas of the landscape was previously developed by Pelikan and Goldberg [51], who restricted the search to subareas obtained with k-means clustering and explored them with EDAs, and by Peña, Lozano and Larrañaga [52], who used mixtures of Gaussians to model the population at each iteration. In this paper, we present SPEDA as a generalization of these approaches in which the use of mixtures is extended to KDEs, but only in those variables that do not fit a Gaussian. ...
Article
Full-text available
Traditional estimation of distribution algorithms (EDAs) often use Gaussian densities to optimize continuous functions; an example is the estimation of Gaussian network algorithm (EGNA), which uses Gaussian Bayesian networks (GBNs). However, this assumes a parametric density function and, in GBNs, linear dependencies between variables. Furthermore, the EGNA baseline learns a GBN at each iteration from the best individuals of the previous iteration, which may lead to convergence to a local optimum or to large variance between solutions across multiple independent runs of the algorithm. In this work we propose a semiparametric EDA that relaxes the Gaussianity restriction by using semiparametric Bayesian networks, in which nodes estimated by kernels coexist with nodes that assume Gaussianity, and the algorithm itself determines where to use each type of node. Additionally, our approach takes into account information from several past iterations to learn the semiparametric Bayesian network from which the new solutions are sampled at each iteration. The empirical results show that semiparametric EDAs are a useful tool for continuous scenarios when compared with several kinds of EDAs and with other continuous optimization techniques.
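A hedged sketch of the per-variable choice this abstract describes: keep a parametric Gaussian where a normality test does not reject Gaussianity, and fall back to a kernel density estimate elsewhere. This covers only the marginals; the semiparametric Bayesian network in the paper also models dependencies between nodes. The test, threshold, and names are illustrative assumptions.

# Per-variable Gaussian-vs-KDE choice (marginals only, as a sketch).
import numpy as np
from scipy import stats

def fit_marginals(selected, alpha=0.05):
    models = []
    for j in range(selected.shape[1]):
        col = selected[:, j]
        _, p = stats.shapiro(col)              # normality test
        if p > alpha:                           # Gaussianity not rejected
            models.append(("gaussian", (col.mean(), col.std())))
        else:                                   # nonparametric fallback
            models.append(("kde", stats.gaussian_kde(col)))
    return models

def sample_marginals(models, n):
    cols = []
    for kind, m in models:
        if kind == "gaussian":
            cols.append(np.random.normal(m[0], m[1], n))
        else:
            cols.append(m.resample(n)[0])
    return np.column_stack(cols)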
... Specifically, the Mixture Tree Factorized Distribution Algorithm (MT-FDA) is presented. In [105] the authors use mixtures of distributions to perform data clustering and thus solve optimization problems over discrete and continuous multimodal functions. In [111] the results of [116,117] are extended by proposing an EDA based on mixtures of distributions with Kikuchi approximations [138,139]. ...
... This approach is therefore computationally heavy, and the more complex the model, the more computational time is needed to find the best solution. There are EDAs with mixture distributions: the Estimation of Mixtures of Distributions Algorithm (EMDA) [48], the mixed Iterated Density Estimation Evolutionary Algorithm (mIDEA) [8], and the Online Gaussian Mixture Model Algorithm (EDAOGMM) [18]. ...
Article
Estimation of Distribution Algorithms (EDAs) constitute a class of evolutionary algorithms that can extract and exploit knowledge acquired throughout the optimization process. The most critical step in EDAs is the estimation of the joint probability distribution associated with the variables from the most promising solutions determined by the evaluation function. Recently, a new approach to EDAs has been developed, based on copula theory, to improve the estimation of the joint probability distribution function. However, most copula-based EDAs still present two major drawbacks: a focus on copulas with constant parameters, and premature convergence. This paper presents a new copula-based estimation of distribution algorithm for numerical optimization problems, named EDA based on Multivariate Elliptical Copulas (EDA-MEC). This model uses multivariate copulas to estimate the probability distribution for generating a population of individuals. EDA-MEC differs from other copula-based EDAs in several aspects: the copula parameter is dynamically estimated using dependence measures; it uses a variation of the learned probability distribution to generate individuals that help to avoid premature convergence; and it uses a heuristic to reinitialize the population as an additional technique to preserve the diversity of solutions. The paper shows, by means of a set of parametric tests, that this approach improves the overall performance of the optimization process when compared with other copula-based EDAs and with other efficient heuristics such as the Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
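The basic mechanism behind copula-based EDAs such as the one above can be sketched as follows: dependence is captured by a correlation matrix estimated from rank correlations, and the marginals by the empirical distribution of the selected individuals. EDA-MEC itself uses multivariate elliptical copulas with dynamically estimated parameters plus diversity-preserving heuristics; this minimal Gaussian-copula sampler and its names are only illustrative.

# Minimal Gaussian-copula sampler (sketch; assumes the estimated
# correlation matrix is positive definite).
import numpy as np
from scipy import stats

def copula_sample(selected, n):
    d = selected.shape[1]
    # Rank-based dependence: Kendall's tau mapped to a Gaussian-copula
    # correlation via rho = sin(pi * tau / 2).
    rho = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = stats.kendalltau(selected[:, i], selected[:, j])
            rho[i, j] = rho[j, i] = np.sin(np.pi * tau / 2)
    z = np.random.multivariate_normal(np.zeros(d), rho, size=n)
    u = stats.norm.cdf(z)                      # copula sample in [0, 1]^d
    # Empirical inverse marginals: read quantiles off the selected data.
    return np.column_stack([
        np.quantile(selected[:, j], u[:, j]) for j in range(d)
    ])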
... Meiguins et al. [37] propose the use of EDAs for the automatic generation of density-based clustering algorithms, in an approach that uses an EDA as a hyper-heuristic to optimize the macro-parameters of a density-based clustering strategy. Finally, note that several papers propose clustering strategies to enhance EDAs [17], [38]–[40], though not EDAs to generate clustering partitions, which is the case of Clus-EDA. To the best of our knowledge, this paper presents the first EDA that generates data partitions following the binary medoid-based approach. ...
... Beyond multivariate interactions, mixture distributions have been used by several researchers. The mixture distributions can be obtained by clustering [38] or by using the expectation maximization algorithm [42,43]. By using mixture distributions, a powerful yet computationally tractable probability distribution is incorporated into EAs that is able to process complicated non-linear interactions between the problem variables. ...
Article
Stochastic optimization by learning and using probabilistic models has received an increasing amount of attention over the last few years. Algorithms within this field estimate the probability distribution of a selection of the available solutions and subsequently draw more samples from the estimated probability distribution. The resulting algorithms have displayed good performance on a wide variety of single-objective optimization problems, for binary as well as for real-valued variables. Mixture distributions offer a powerful tool for modeling complicated dependencies between the problem variables. Moreover, they allow for an elegant and parallel exploration of a multi-objective front. This parallel exploration aids the important preservation of diversity in multi-objective optimization. In this paper, we propose a new algorithm for evolutionary multi-objective optimization by learning and using probabilistic mixture distributions, which we name the Multi-objective Mixture-based Iterated Density Estimation Evolutionary Algorithm (MIDEA). To further improve and maintain the diversity obtained by the mixture distribution, we use a specialized diversity-preserving selection operator. We verify the effectiveness of our approach in two different problem domains and compare it with two other well-known efficient multi-objective evolutionary algorithms.
... There exist at least two other implementations using BNs to solve the graph bipartitioning problem [28,29]. But both papers are proof of concepts only. ...
Article
We present a theory of population-based optimization methods using approximations of search distributions. We prove convergence of the search distribution to the global optima for the factorized distribution algorithm (FDA) if the search distribution is a Boltzmann distribution and the size of the population is large enough. Convergence is defined in a strong sense: the global optima are attractors of a dynamical system describing the algorithm mathematically. We investigate an adaptive annealing schedule and show its similarity to truncation selection. The inverse temperature β is changed in inverse proportion to the standard deviation of the population. We extend FDA by using a Bayesian hyper-parameter. The hyper-parameter is related to mutation in evolutionary algorithms. We derive an upper bound on the hyper-parameter to ensure that FDA still generates the optima with high probability. We discuss the relation of the FDA approach to methods used in statistical physics to approximate a Boltzmann distribution and to belief propagation in probabilistic reasoning. In the last part, FDA is applied to the bipartitioning of graphs that are sparsely connected. Our empirical results are as good as or even better than any other method used for this problem.
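The adaptive annealing schedule mentioned above can be illustrated with a small Boltzmann selection step in which the inverse temperature grows in inverse proportion to the fitness standard deviation, so selection pressure rises as the population converges. The constant c and the function names are illustrative assumptions, not the paper's implementation.

# One Boltzmann-selection generation with the adaptive schedule
# Δβ ∝ 1/σ (sketch; maximization assumed).
import numpy as np

def boltzmann_step(population, f_values, beta, c=1.0):
    beta += c / max(f_values.std(), 1e-12)            # Δβ ∝ 1/σ
    w = np.exp(beta * (f_values - f_values.max()))    # Boltzmann weights
    p = w / w.sum()
    idx = np.random.choice(len(population), size=len(population), p=p)
    return population[idx], beta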
Article
Identifying the number of niches in multimodal optimization is vital to enhancing the efficiency of algorithms. This paper presents a genetic algorithm (GA)-based clustering method for determining multiple optima. The approach uses self-organizing map (SOM) neural networks to detect clusters in the GA population. After clustering the whole population and recognizing the number of niches, the phenotypic space is partitioned. Within each partition, a simple GA runs independently and evolves toward the actual optimum. Before the SOM starts, we allow the GA to run for several generations until the borders of the clusters can be identified. Our proposed algorithm is easy to implement and does not require any prior knowledge about the fitness function. The algorithm was tested on seven multimodal functions and four constrained engineering optimization functions, and the results were compared with other related algorithms based on three performance criteria. We found that the present algorithm offers acceptable diversification and a reasonable number of function evaluations.
Chapter
In this chapter, we review the Estimation of Distribution Algorithms proposed for the solution of combinatorial optimization problems and optimization in continuous domains. Different approaches for Estimation of Distribution Algorithms have been ordered by the complexity of the interrelations that they are able to express. These will be introduced using one unified notation.
Article
Full-text available
Estimation of Distribution Algorithms (EDAs) are a new tool for Evolutionary Computation. Based on Genetic Algorithms (GAs) this new class of algorithms generalizes GAs by replacing the crossover and mutation operators by learning and sampling the probability distribution of the best individuals of the population at each iteration. In this paper we present an introduction to EDAs in the field of combinatorial optimization. The algorithms are organised taking the complexity of the probabilistic model used into account. We also provide some points to the literature.
Article
Full-text available
This paper describes an aggregation pheromone system (APS), which is an extension of ACO for continuous domains, using the collective behavior of individuals that communicate using aggregation pheromones. APS is tested on several test functions. Results show APS could solve real-parameter optimization problems fairly well. The sensitivity of control parameters of APS is also studied.
Chapter
Full-text available
This chapter focuses on the parallelization of Estimation of Distribution Algorithms (EDAs). More specifically, it presents guidelines for designing efficient parallel EDAs that employ parallel fitness evaluation and parallel model building. Scalability analysis techniques are employed to identify and parallelize the main performance bottlenecks, to ensure that the achieved speedup grows almost linearly with the number of utilized processors. The proposed approach is demonstrated on the parallel Mixed Bayesian Optimization Algorithm (MBOA). We determine the time complexity of parallel MBOA and compare this complexity with experimental results obtained on a set of random instances of the spin glass optimization problem. The empirical results fit the theoretical time complexity well, so the scalability and efficiency of parallel MBOA on unknown spin glass instances can be predicted.
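A minimal illustration of the parallel fitness evaluation discussed above, assuming the evaluation cost dominates: model building stays serial here, and only the objective calls are spread over worker processes. All names are illustrative; this is not the MBOA code.

# Parallel fitness evaluation with a process pool (sketch).
from multiprocessing import Pool
import numpy as np

def expensive_fitness(x):
    # Stand-in for a costly objective, e.g. a spin-glass energy.
    return float(np.sum(x ** 2))

def evaluate_parallel(population, workers=4):
    with Pool(processes=workers) as pool:
        return np.array(pool.map(expensive_fitness, list(population)))

if __name__ == "__main__":
    pop = np.random.randn(64, 10)
    print(evaluate_parallel(pop)[:5])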
Article
Full-text available
Inference of statistical models and discovery of patterns in random data sets are problems common to many fields of investigation. In particular, in the observation and control of processes where the physical mechanisms are too complex or not well enough understood to provide a model structure a priori, the choice of model structure and model size becomes a key element of the analysis. This paper describes an unsupervised technique for the ranking of model structures and the choice of model size based on the criterion (-log likelihood + model size in bits). This criterion is shown to be equivalent to seeking a parsimonious representation of the data, and its derivation is motivated through a Bayesian argument. Limiting properties of the criterion and applications to the number of clusters, the dimension of a linear predictor, the degree of a polynomial approximation, or the order of a Markov chain are discussed.
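The criterion quoted above, (-log likelihood + model size in bits), can be applied, for example, to choosing the number of clusters. The sketch below does this for a full-covariance Gaussian mixture; the parameter count and the coding of the size term are illustrative assumptions.

# Choosing the number of clusters by (-log2 likelihood + size in bits).
import numpy as np
from sklearn.mixture import GaussianMixture

def model_cost_bits(data, k):
    gmm = GaussianMixture(n_components=k).fit(data)
    n, d = data.shape
    # Free parameters: mixture weights, means, full covariances.
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    neg_log_lik_bits = -gmm.score(data) * n / np.log(2)  # total -log2 L
    size_bits = 0.5 * n_params * np.log2(n)              # ~bits per param
    return neg_log_lik_bits + size_bits

# Three well-separated clusters; the criterion should pick k = 3.
data = np.vstack([np.random.randn(100, 2) + m for m in (-4, 0, 4)])
best_k = min(range(1, 7), key=lambda k: model_cost_bits(data, k))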
Article
Full-text available
We define and investigate classes of statistical models for the analysis of associations between variables, some of which are qualitative and some quantitative. In the cases where only one kind of variables is present, the models are well-known models for either contingency tables or covariance structures. We characterize the subclass of decomposable models where the statistical theory is especially simple. All models can be represented by a graph with one vertex for each variable. The vertices are possibly connected with arrows or lines corresponding to directional or symmetric associations being present. Pairs of vertices that are not connected are conditionally independent given some of the remaining variables according to specific rules.
Conference Paper
Full-text available
This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixtures of trees for a variety of priors, including the Dirichlet and the MDL priors.
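The core subroutine of such a tree learner is the Chow-Liu procedure: compute pairwise mutual information and take a maximum-weight spanning tree. The paper wraps this in EM over mixture components; the sketch below covers only the single-tree step, implementing the maximum spanning tree via scipy's minimum spanning tree on negated weights, and its names are illustrative.

# Chow-Liu tree skeleton for integer-coded discrete data (sketch).
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from sklearn.metrics import mutual_info_score

def chow_liu_edges(data):
    d = data.shape[1]
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            mi[i, j] = mutual_info_score(data[:, i], data[:, j])
    # Negate so the minimum spanning tree is a maximum-MI tree.
    tree = minimum_spanning_tree(-mi)
    rows, cols = tree.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))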
Article
Full-text available
This paper introduces and evaluates a new class of knowledge model, the recursive Bayesian multinet (RBMN), which encodes the joint probability distribution of a given database. RBMNs extend Bayesian networks (BNs) as well as partitional clustering systems. Briefly, an RBMN is a decision tree with component BNs at the leaves. An RBMN is learnt using a greedy, heuristic approach akin to that used by many supervised decision tree learners, but where BNs are learnt at the leaves using constructive induction. A key idea is to treat expected data as real data. This allows us to complete the database and to take advantage of a closed form for the marginal likelihood of the expected complete data that factorizes into separate marginal likelihoods for each family (a node and its parents). Our approach is evaluated on synthetic and real-world databases.
Article
Full-text available
The Breeder Genetic Algorithm (BGA) was designed according to the theories and methods used in the science of livestock breeding. The prediction of a breeding experiment is based on the response to selection (RS) equation. This equation relates the change in a population's fitness to the standard deviation of its fitness, as well as to the parameters selection intensity and realized heritability. In this paper the exact RS equation is derived for proportionate selection given an infinite population in linkage equilibrium. In linkage equilibrium the genotype frequencies are the product of the univariate marginal frequencies. The equation contains Fisher's fundamental theorem of natural selection as an approximation. The theorem shows that the response is approximately equal to the quotient of a quantity called additive genetic variance, VA, and the average fitness. We compare Mendelian two-parent recombination with gene-pool recombination, which belongs to a special class of genetic algorithms that we call univariate marginal distribution (UMD) algorithms. UMD algorithms keep the genotypes in linkage equilibrium. For UMD algorithms, an exact RS equation is proven that can be used for long-term prediction. Empirical and theoretical evidence is provided that indicates that Mendelian two-parent recombination is also mainly exploiting the additive genetic variance. We compute an exact RS equation for binary tournament selection. It shows that the two classical methods for estimating realized heritability--the regression heritability and the heritability in the narrow sense--may give poor estimates. Furthermore, realized heritability for binary tournament selection can be very different from that of proportionate selection. The paper ends with a short survey about methods that extend standard genetic algorithms and UMD algorithms by detecting interacting variables in nonlinear fitness functions and using this information to sample new points.
Article
Full-text available
Population-Based Incremental Learning (PBIL) is an abstraction of a genetic algorithm which solves optimization problems by explicitly constructing a probabilistic model of the promising regions of the search space. At each iteration the model is used to generate a population of candidate solutions and is itself modified in response to these solutions. Through the extension of PBIL to real-valued search spaces, a more powerful and general algorithmic framework arises which enables the use of arbitrary probability density estimation techniques in evolutionary optimization. To illustrate the usefulness of the framework, we propose and implement an evolutionary algorithm which uses a finite adaptive Gaussian mixture model density estimator. This method offers considerable power and flexibility in the forms of density which can be effectively modeled. We discuss the general applicability of the framework, and suggest that future work should lead to the development...
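A minimal real-valued PBIL-style loop in the spirit of the framework above: the model is a single Gaussian per variable whose parameters drift toward the best sample of each iteration. The paper generalizes this to an adaptive Gaussian mixture density estimator; the learning rate, schedule, and names here are illustrative.

# Real-valued PBIL-style loop (sketch; minimization).
import numpy as np

def real_pbil(f, dim, iters=100, pop=50, lr=0.1):
    mu, sigma = np.zeros(dim), np.ones(dim) * 2.0
    for _ in range(iters):
        X = np.random.normal(mu, sigma, size=(pop, dim))
        best = X[np.argmin([f(x) for x in X])]
        mu = (1 - lr) * mu + lr * best                       # PBIL update
        sigma = np.maximum((1 - lr) * sigma + lr * np.abs(best - mu), 1e-3)
    return mu

# e.g. minimum of the sphere function:
# real_pbil(lambda x: np.sum(x ** 2), dim=5)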
Article
Full-text available
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we examine and evaluate approaches for inducing classifiers from data, based on recent results in the theory of learning Bayesian networks. Bayesian networks are factored representations of probability distributions that generalize the naive Bayes classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness which are characteristic of naive Bayes. We experimentally tested these approaches using benchmark...
Article
This paper shows how the Gaussian network paradigm can be used to solve optimization problems in continuous domains. Some methods of structure learning from data and simulation of Gaussian networks are applied in the Estimation of Distribution Algorithm (EDA), and new methods based on information theory are proposed. Experimental results are also presented. Figure 1 shows a schematic of the EDA approach for continuous domains. We will use x = (x_1, ..., x_n) to denote individuals, and D_l to denote the population of N individuals in the l-th generation. Similarly, D_l^{Se} will represent the population of the Se individuals selected from D_l. In the EDA [9] our interest is to estimate f(x | D^{Se}), that is, the joint probability density function of an individual x being among the selected individuals. We denote by f_l(x) = f_l(x | D_{l-1}^{Se}) the joint density of the l-th generation...
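Read in code, the recursion f_l(x) = f_l(x | D_{l-1}^{Se}) amounts to fitting a density to the Se selected individuals and sampling the next population from it. The sketch below uses a single dense-covariance Gaussian as a simplifying stand-in for the Gaussian network the paper learns; the names are illustrative.

# One EGNA-style generation with a plain multivariate Gaussian (sketch).
import numpy as np

def egna_like_step(D_l, f, se_frac=0.5):
    Se = int(len(D_l) * se_frac)
    D_se = D_l[np.argsort([f(x) for x in D_l])[:Se]]   # selected set D_l^{Se}
    mu = D_se.mean(axis=0)
    cov = np.cov(D_se, rowvar=False) + 1e-6 * np.eye(D_l.shape[1])
    # Sample the (l+1)-th population from f_l(x | D_l^{Se}).
    return np.random.multivariate_normal(mu, cov, size=len(D_l))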
Conference Paper
The presence of symmetry in the representation of an optimization problem can cause the search algorithm to get stuck before reaching the global optimum. The traveling salesman problem and the graph coloring problem are well-known NP-complete optimization problems containing symmetry in their usual representation. This paper describes a class of symmetry, called spin-flip symmetry, which one finds in functions like the one-dimensional nearest neighbor interaction functions. Spin-flip symmetry means that bit-complementary strings have the same function value. This notion can be generalized in a canonical way to substrings, called spin-flip blocks. We distinguish two specific cases of spin-flip symmetry and introduce a spin-flip detection algorithm. The performance of the algorithm strongly depends on the initial sample the algorithm uses to detect the spin-flip blocks. The algorithm is designed to detect spin-flip symmetry in the whole search space, as well as in its hyperplanes. The difficulty with detection in hyperplanes lies in finding the correct hyperplane, which can be time consuming. Once the desired hyperplane is found, the spin-flip block can be detected using the normal detection procedure. The one-max problem shows that spin-flip symmetry in other types of subspaces, like the subspace in which the number of 1s equals the number of 0s, is not detected. For these kinds of subspaces the method needs to be extended.
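The basic detection idea can be sketched directly: sample strings and check whether each one and its bitwise complement receive the same function value. The block-wise and hyperplane variants described above refine this test; the sample size and names below are illustrative.

# Sampling-based spin-flip symmetry check (sketch).
import numpy as np

def has_spin_flip_symmetry(f, n_bits, samples=200, rng=np.random):
    for _ in range(samples):
        x = rng.randint(0, 2, size=n_bits)
        if f(x) != f(1 - x):           # compare with the bit-complement
            return False
    return True                         # consistent with full symmetry

# A one-dimensional NNI-style function is spin-flip symmetric:
# f = lambda x: np.sum(x[:-1] == x[1:]); has_spin_flip_symmetry(f, 20)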
Article
A scheme is presented for modeling and local computation of exact probabilities, means, and variances for mixed qualitative and quantitative variables. The models assume that the conditional distribution of the quantitative variables, given the qualitative, is multivariate Gaussian. The computational architecture is set up by forming a tree of belief universes, and the calculations are then performed by local message passing between universes. The asymmetry between the quantitative and qualitative variables sets some additional limitations for the specification and propagation structure. Approximate methods when these are not appropriately fulfilled are sketched. It has earlier been shown how to exploit the local structure in the specification of a discrete probability model for fast and efficient computation, thereby paving the way for exploiting probability-based models as parts of realistic systems for planning and decision support. The purpose of this article is to extend this computational scheme to networks, where some vertices represent entities that are measured on a quantitative and some on a qualitative scale. An extension has the advantage of unifying several known techniques, but allows more flexible and faithful modeling and speeds computation as well. To handle this more general case, the properties of (CG) conditional Gaussian distributions are exploited. A fictitious but simple example is used for illustration throughout the paper, concerned with monitoring emissions from a waste incinerator. From optical measurements of the darkness of the smoke, the concentration of CO2—which are both on a continuous scale—and possible knowledge about qualitative characteristics such as the type of waste burned, one wants to infer about the state of the incinerator and the current emission of heavy metals.
Article
This paper is an experimental study on hypergraph partitioning using the simple genetic algorithm (GA) based on the schema theorem, and the advanced algorithms based on the estimation of distribution of promising solutions. Primarily, we have implemented a simple GA based on the GAlib library [Gal94] and some hybrid variants that include a fast heuristic to speed up the convergence of the optimization process. Secondly, we have implemented the Univariate Marginal Distribution Algorithm (UMDA) and the Bivariate Marginal Distribution Algorithm (BMDA), both published recently [Pel98], and used a shared version of the superior new program BOA, based on the Bayesian Optimization Algorithm [Pel99]. We have also extended the BMDA algorithm to a new version with finite-alphabet encoding of chromosomes and a new metric that enables m-way partitioning of graphs. The aim of our paper is to test the efficiency of new approaches for discrete combinatorial problems represented by hypergraph partitioning. Key words: decomposition, hypergraph partitioning, simple and hybrid GA, estimation of distribution algorithm
Article
This paper is devoted to the proposal of two classes of compromise conditional Gaussian networks for data clustering as well as to their experimental evaluation and comparison on synthetic and real-world databases. According to the reported results, the models show an ideal trade-off between efficiency and effectiveness, i.e., a balance between the cost of the unsupervised model learning process and the quality of the learnt models. Moreover, the proposed models are very appealing due to their closeness to human intuition and computational advantages for the unsupervised model induction process, while preserving a rich enough modeling power.
Article
Summary: A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations; applications to grouped, censored or truncated data; finite mixture models; variance component estimation; hyperparameter estimation; iteratively reweighted least squares; and factor analysis.
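A compact instance of the EM scheme, for a two-component univariate Gaussian mixture: the E-step computes responsibilities, the M-step re-estimates weights, means, and standard deviations, and the likelihood increases monotonically. The initialization and names are illustrative.

# EM for a two-component 1-D Gaussian mixture (sketch).
import numpy as np
from scipy.stats import norm

def em_two_gaussians(x, iters=50):
    w, mu, sd = 0.5, np.array([x.min(), x.max()]), np.array([x.std()] * 2)
    for _ in range(iters):
        # E-step: posterior responsibility of component 0 for each point.
        p0 = w * norm.pdf(x, mu[0], sd[0])
        p1 = (1 - w) * norm.pdf(x, mu[1], sd[1])
        r = p0 / (p0 + p1)
        # M-step: weighted maximum likelihood updates.
        w = r.mean()
        mu = np.array([np.average(x, weights=r),
                       np.average(x, weights=1 - r)])
        sd = np.array([np.sqrt(np.average((x - mu[0]) ** 2, weights=r)),
                       np.sqrt(np.average((x - mu[1]) ** 2, weights=1 - r))])
    return w, mu, sd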
Conference Paper
We use the one-dimensional nearest neighbor interaction functions (NNIs) to show how the presence of symmetry in a fitness function greatly influences the convergence behavior of the simple genetic algorithm (SGA). The effect of symmetry on the SGA supports the statement that it is not the amount of interaction present in a fitness function, measured e.g. by Davidor's epistasis variance and the experimental design techniques introduced by Reeves and Wright, which is important, but the kind of interaction. The NNI functions exhibit a minimal amount of second-order interaction, are trivial to optimize deterministically, and yet show a wide range of SGA behavior. They have been extensively studied in statistical physics; results from this field explain the negative effect of symmetry on the convergence behavior of the SGA. This note intends to introduce them to the GA community.
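For concreteness, a one-dimensional NNI function sums an interaction term over adjacent bits; with the coupling chosen below, a string and its bit-complement score identically, which is exactly the symmetry under discussion. The interaction term g is one illustrative choice.

# A one-dimensional NNI function and its spin-flip symmetry (sketch).
import numpy as np

def nni(x, g=lambda a, b: 1.0 if a == b else 0.0):
    # Sum a nearest-neighbor interaction term over adjacent bits.
    return sum(g(x[i], x[i + 1]) for i in range(len(x) - 1))

x = np.random.randint(0, 2, size=16)
assert nni(x) == nni(1 - x)   # bit-complementary strings score the same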
Article
The purpose of this paper is to present and evaluate a heuristic algorithm for learning Bayesian networks for clustering. Our approach is based upon improving the Naive-Bayes model by means of constructive induction. A key idea in this approach is to treat expected data as real data. This allows us to complete the database and to take advantage of factorable closed forms for the marginal likelihood. In order to get such an advantage, we search for parameter values using the EM algorithm or another alternative approach that we have developed: a hybridization of the Bound and Collapse method and the EM algorithm, which results in a method that exhibits a faster convergence rate and more effective behaviour than the EM algorithm. We also consider the possibility of interleaving runs of these two methods after each structural change. We evaluate our approach on synthetic and real-world databases.
Article
The application of the Bayesian Structural EM algorithm to learn Bayesian networks (BNs) for clustering implies a search over the space of BN structures, alternating between two steps: an optimization of the BN parameters (usually by means of the EM algorithm) and a structural search for model selection. In this paper, we propose to perform the optimization of the BN parameters using an alternative approach to the EM algorithm: the BC+EM method, a hybridization of the Bound and Collapse method and the EM algorithm. We provide experimental results to show that our proposal results in a more effective and efficient version of the Bayesian Structural EM algorithm for learning BNs for clustering.
Article
The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.
Article
Ph.D. thesis, University of Michigan, 1975.
Article
A novel genetic algorithm (GA) using minimal representation size cluster (MRSC) analysis is designed and implemented for solving multimodal function optimization problems. The problem of multimodal function optimization is framed within a hypothesize-and-test paradigm using minimal representation size (minimal complexity) for species formation and a GA. A multiple-population GA is developed to identify different species. The number of populations, and thus the number of different species, is determined by the minimal representation size criterion. Therefore, the proposed algorithm reveals the unknown structure of the multimodal function when no a priori knowledge about the function is available. The effectiveness of the algorithm is demonstrated on a number of multimodal test functions. The proposed scheme results in a highly parallel algorithm for finding multiple local minima. In this paper, a path-planning algorithm is also developed based on the MRSC_GA algorithm. The algorithm utilizes MRSC_GA for planning paths for mobile robots, piano-mover problems, and N-link manipulators. MRSC_GA is used for generating multiple paths to provide alternative solutions to the path-planning problem. The generation of alternative solutions is especially important for planning paths in dynamic environments. A novel iterative multiresolution path representation is used as a basis for the GA coding. The effectiveness of the algorithm is demonstrated on a number of two-dimensional path-planning problems.
Conference Paper
The presence of symmetry in the representation of an optimization problem can have a positive or a negative influence on the dynamics of a search algorithm. Symmetry can cause a genetic algorithm or simulated annealing to get stuck in a local optimum, but it can also help an algorithm find the optima more quickly, as the dual genetic algorithm does in some cases. The first part of the paper describes three common types of symmetry and their effects on evolutionary algorithms. Besides the obvious permutations on the string representation that leave the objective value invariant, the interaction structure of the problem can be a source of symmetry. Typical examples can be found in the class of aggregated problems, studied in the second part of the paper. An abstract model for aggregated problems is introduced, together with a strategy to overcome their symmetry.
Conference Paper
This paper introduces clustering as a tool to improve the effects of recombination and to incorporate niching in evolutionary algorithms. Instead of processing the entire set of parent solutions, the set is first clustered and the solutions in each of the clusters are processed separately. This alleviates the problem of symmetry, which is often a major difficulty for many evolutionary algorithms in combinatorial optimization. Furthermore, it incorporates niching into genetic algorithms and, for the first time, into probabilistic model-building genetic algorithms. The dynamics and performance of the proposed method are illustrated on example problems. Symmetry is one of the major difficulties of genetic algorithms on combinatorial problems. Although the nature of the problem encourages the use of some kind of recombination, the symmetry of the problem means that recombination often slows down convergence by disrupting good solutions. Moreover, the use of niching...
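The clustering-before-recombination idea described above can be sketched as follows: parents are first clustered (k-means here, as in the paper's setting) and crossover is applied only within each cluster, so solutions from symmetric optima do not disrupt one another. Uniform crossover and all names are illustrative.

# Cluster the parents, then recombine only within clusters (sketch).
import numpy as np
from sklearn.cluster import KMeans

def clustered_recombination(parents, n_clusters=2, rng=np.random):
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(parents)
    children = []
    for c in range(n_clusters):
        group = parents[labels == c]
        for _ in range(len(group)):
            a, b = group[rng.randint(len(group), size=2)]
            mask = rng.randint(0, 2, size=len(a)).astype(bool)
            children.append(np.where(mask, a, b))   # uniform crossover
    return np.array(children)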
Article
This paper discusses the use of various scoring metrics in the Bayesian optimization algorithm (BOA), which uses Bayesian networks to model promising solutions and generate new ones. The use of decision graphs in Bayesian networks to improve the performance of the BOA is proposed. To favor simple models, a complexity measure is incorporated into the Bayesian-Dirichlet metric for Bayesian networks with decision graphs. The presented modifications are compared on a number of interesting problems.