
Test Function Generators as Embedded Landscapes


Abstract and Figures

NK-landscapes and kSAT problems have been proposed as potential test problem domains for Genetic Algorithms. We demonstrate that GAs have difficulty solving both kSAT and NK-landscape problems. The construction of random kSAT and NK-landscape problems is very similar, but the differences between kSAT and NK-landscape generation result in vastly different fitness landscapes. In this paper we introduce a parameterized model for the construction of test function generators. This model, called embedded landscapes, can be used to isolate the features of combinatorial optimization problems for more control during experimentation. We also show that common forms of embedded landscapes allow for a polynomial time Walsh analysis. This means we can also compose exact schema averages in polynomial time for schemata up to order K, where K is a constant. Yet, in the general case, this information does not allow one to infer the global optimum of a function unless the complexity classes P and NP are equal.
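The embedded landscape model described here is, concretely, a pseudo-Boolean function written as a sum of subfunctions, each of which reads at most k of the n bits. A minimal sketch of that construction, with randomly generated masks and lookup tables standing in for the paper's actual test function generators (all names are illustrative):

```python
import random

def random_embedded_landscape(n, m, k, seed=0):
    """Build f(x) = sum_j g_j(x restricted to mask_j), each g_j reading at most k of n bits."""
    rng = random.Random(seed)
    subfunctions = []
    for _ in range(m):
        mask = rng.sample(range(n), k)                  # which bits this subfunction reads
        table = [rng.random() for _ in range(2 ** k)]   # arbitrary k-bit lookup table
        subfunctions.append((mask, table))
    return subfunctions

def evaluate(subfunctions, x):
    """x is a list of 0/1 values; the landscape value is the sum of the subfunction values."""
    total = 0.0
    for mask, table in subfunctions:
        index = 0
        for bit_position in mask:
            index = (index << 1) | x[bit_position]
        total += table[index]
    return total

if __name__ == "__main__":
    f = random_embedded_landscape(n=20, m=20, k=3)
    x = [random.randint(0, 1) for _ in range(20)]
    print(evaluate(f, x))
```

Because each subfunction reads at most k bits, it can contribute non-zero Walsh coefficients only on subsets of its own mask, at most 2^k per subfunction; this is the structural fact behind the polynomial-time Walsh analysis and order-K schema averages claimed in the abstract.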
[Figure: Three algorithms run on NK-landscapes. Mean best solution found plotted against K for CHC, SGA, and RBC+.]
... As a result, for comparative analyses, there has been a recent shift in the GA community towards the use of problem generators for testing evolutionary algorithms. The reasons for this trend are: ... For these reasons, there is a growing sentiment that test generators for broad classes of problems are the correct approach for testing evolutionary algorithms [Heckendorn, Rana and Whitley, 1998]. In addition, Whitley has argued that the use of test suites should be hypothesis driven. ...
... Problem generators have been developed by several researchers to enable an operator to generate random, non-separable problems of varying degrees of epistasis [De Jong, et al., 1997] [Heckendorn, Rana and Whitley, 1998]. Note that the difficulty of measuring GA-hardness is therefore avoided due to the operational perspective taken in designing the problem generator. ...
... The degree of epistasis increases with the number of peaks [De Jong, et al., 1997]. Both NK-landscape problems and L-SAT problems have been shown to be quite difficult for SGAs, even though the two generators produce problems that are very different with respect to their fitness landscapes [Heckendorn, et al., 1998]. It has been observed, however, that both generators have the property that the variance in fitness values decreases as the amount of epistasis increases; this has led to the development of the Multi-modal problem generator, which is capable of generating problems with high epistasis without significantly decreasing fitness variance [De Jong, et al., 1997]. Assumption 4-6: For Multi-modal problems, R = 1 (one peak), R = 250 (250 peaks) and R = 500 (500 peaks) correspond to low, medium and high epistasis problems respectively. The degree of epistasis in a Multi-modal problem generator increases with the number of peaks. ...
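The Multi-modal generator mentioned above is usually described as drawing R random bit strings as peaks and scoring a candidate by its closeness to the nearest peak; the sketch below assumes that Hamming-based form and is not a verbatim reconstruction of De Jong et al.'s generator:

```python
import random

def make_multimodal_problem(n, r, seed=0):
    """Generate r random n-bit peaks; more peaks means more epistatic interaction."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(r)]

def fitness(peaks, x):
    """Score x by its closeness to the nearest peak (1.0 exactly on a peak)."""
    n = len(x)
    best_matches = max(sum(a == b for a, b in zip(x, peak)) for peak in peaks)
    return best_matches / n

if __name__ == "__main__":
    peaks = make_multimodal_problem(n=30, r=250)
    x = [random.randint(0, 1) for _ in range(30)]
    print(fitness(peaks, x))
```

With R = 1 the landscape is a single unimodal basin; increasing R toward 250 or 500 adds peaks (and hence epistasis) without shrinking the range of attainable fitness values, since a solution sitting on any peak still scores 1.0.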
... NK functions (in particular the interaction between certain variables), while exhibiting specific difficulties, notably due to a coarser discretization of the codomain of the objective function. These two models of Boolean functions are classically the most studied in evolutionary optimization when it comes to constructing fitness landscapes with the desired properties in order to analyze the behavior of search algorithms [2,3,4]. The remainder of this section presents these two problems in more detail. ...
Thesis
Solving an optimization problem consists in finding the best possible solutions. A common approach is to use dedicated algorithms, generally designed for specific classes of problems. This approach nevertheless has two drawbacks. First, for every new type of problem a new algorithm often has to be defined, which is a lengthy process requiring knowledge of the properties of the problem at hand. Second, if these algorithms are only tested on certain instances of the problem, they may turn out to be too specific and thus ultimately less effective over the whole set of instances in the class. In this thesis we explore the possibility of automatically generating optimization algorithms for a given problem. The generation process remains sufficiently generic, while the algorithms it produces can be very specific so as to be as effective as possible. More precisely, we evolve simple neighborhood search algorithms through the evaluation functions they use to explore the problem's solution space. The evolutionary process implicitly adapts the search landscape to the basic resolution strategy, while remaining consistent with the original objective function of the problem to be solved. This generation process is tested on two classes of problems with very different difficulties and obtains encouraging results. The experiments are complemented by an analysis of the generation process and of the generated algorithms.
... The proof uses the Kushilevitz-Mansour algorithm [19], which is an application of discrete Fourier (or Walsh) analysis to the approximation of Boolean functions. Fourier analysis has been extensively used in the analysis of genetic algorithms [8] and fitness landscapes [27,12,11]. It also plays an important role in the construction of a search matrix in the analysis [3] of the coin weighing problem [7], which is related to OneMax. ...
Preprint
A new class of test functions for black box optimization is introduced. Affine OneMax (AOM) functions are defined as compositions of OneMax and invertible affine maps on bit vectors. The black box complexity of the class is upper bounded by a polynomial of large degree in the dimension. The proof relies on discrete Fourier analysis and the Kushilevitz-Mansour algorithm. Tunable complexity is achieved by expressing invertible linear maps as finite products of transvections. The black box complexity of sub-classes of AOM functions is studied. Finally, experimental results are given to illustrate the performance of search algorithms on AOM functions.
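An AOM function composes OneMax with an invertible affine map over GF(2), i.e. f(x) counts the ones in Ax + b. The sketch below builds an invertible matrix by random row additions on the identity, which is only an illustrative stand-in for the transvection products described in the preprint:

```python
import random

def random_invertible_gf2(n, ops=200, seed=0):
    """Start from the identity and apply random row additions over GF(2), which preserve invertibility."""
    rng = random.Random(seed)
    a = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(ops):
        i, j = rng.sample(range(n), 2)
        a[i] = [a[i][c] ^ a[j][c] for c in range(n)]   # row_i += row_j (mod 2)
    return a

def affine_onemax(a, b, x):
    """f(x) = OneMax(A x + b over GF(2)) = number of ones in the affinely transformed vector."""
    n = len(x)
    y = []
    for i in range(n):
        bit = b[i]
        for j in range(n):
            bit ^= a[i][j] & x[j]
        y.append(bit)
    return sum(y)

if __name__ == "__main__":
    n = 16
    a = random_invertible_gf2(n)
    b = [random.randint(0, 1) for _ in range(n)]
    x = [random.randint(0, 1) for _ in range(n)]
    print(affine_onemax(a, b, x))
```

Row additions are elementary operations over GF(2), so the matrix stays invertible; the preprint instead tunes complexity through the number of transvection factors in the product.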
... The element of x at position i and K other elements are equal to one, i.e., they influence subfunction f_i. The NK model is considered hard for genetic algorithms with traditional recombination operators [14]. Traditional recombination operators do not explore the interaction between decision variables to decide which elements should be inherited from one parent or the other. ...
... In this paper we consider pseudo-Boolean vector functions with k-bounded epistasis, where the component functions are embedded landscapes [7] or Mk Landscapes [15]. We will extend the concept of Mk Landscapes to the multi-objective domain and, thus, we will base our nomenclature in that of Whitley [15]. ...
Conference Paper
Local search algorithms and iterated local search algorithms are a basic technique. Local search can be a stand-alone search method, but it can also be hybridized with evolutionary algorithms. Recently, it has been shown that it is possible to identify improving moves in Hamming neighborhoods for k-bounded pseudo-Boolean optimization problems in constant time. This means that local search does not need to enumerate neighborhoods to find improving moves. It also means that evolutionary algorithms do not need to use random mutation as an operator, except perhaps as a way to escape local optima. In this paper, we show how improving moves can be identified in constant time for multiobjective problems that are expressed as k-bounded pseudo-Boolean functions. In particular, multiobjective forms of NK Landscapes and Mk Landscapes are considered.
... In this paper we consider pseudo-Boolean vector functions with k-bounded epistasis, where the component functions are embedded landscapes [7] or Mk Landscapes [15]. We will extend the concept of Mk Landscapes to the multi-objective domain and, thus, we will base our nomenclature in that of Whitley [15]. ...
Article
Full-text available
Local search algorithms and iterated local search algorithms are a basic technique. Local search can be a stand-alone search method, but it can also be hybridized with evolutionary algorithms. Recently, it has been shown that it is possible to identify improving moves in Hamming neighborhoods for k-bounded pseudo-Boolean optimization problems in constant time. This means that local search does not need to enumerate neighborhoods to find improving moves. It also means that evolutionary algorithms do not need to use random mutation as an operator, except perhaps as a way to escape local optima. In this paper, we show how improving moves can be identified in constant time for multiobjective problems that are expressed as k-bounded pseudo-Boolean functions. In particular, multiobjective forms of NK Landscapes and Mk Landscapes are considered.
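The constant-time identification of improving moves rests on bookkeeping: keep, for every bit, the fitness change its flip would cause, and after a flip update only the entries belonging to subfunctions that read the flipped bit. The single-objective sketch below illustrates that bookkeeping; the data layout and function names are assumptions for illustration, not the authors' implementation, and the multiobjective extension described in the paper is not shown.

```python
import random

def build_function(n, m, k, seed=0):
    """A random k-bounded pseudo-Boolean function: m subfunctions, each over k of n bits."""
    rng = random.Random(seed)
    subs = []
    for _ in range(m):
        mask = rng.sample(range(n), k)
        table = [rng.random() for _ in range(2 ** k)]
        subs.append((mask, table))
    return subs

def sub_value(mask, table, x):
    index = 0
    for b in mask:
        index = (index << 1) | x[b]
    return table[index]

def flip_deltas(subs, x, n):
    """delta[b] = f(x with bit b flipped) - f(x); computed from scratch once, then patched."""
    delta = [0.0] * n
    for mask, table in subs:
        base = sub_value(mask, table, x)
        for b in mask:
            x[b] ^= 1
            delta[b] += sub_value(mask, table, x) - base
            x[b] ^= 1
    return delta

def next_ascent(subs, x, n):
    """Local search that flips improving bits; deltas are patched locally, not recomputed globally."""
    touching = [[] for _ in range(n)]
    for s, (mask, _) in enumerate(subs):
        for b in mask:
            touching[b].append(s)
    delta = flip_deltas(subs, x, n)
    improved = True
    while improved:
        improved = False
        for b in range(n):
            if delta[b] > 1e-12:
                affected = touching[b]
                # Remove the old contributions of the affected subfunctions, flip, then re-add them.
                for s in affected:
                    mask, table = subs[s]
                    base = sub_value(mask, table, x)
                    for c in mask:
                        x[c] ^= 1
                        delta[c] -= sub_value(mask, table, x) - base
                        x[c] ^= 1
                x[b] ^= 1
                for s in affected:
                    mask, table = subs[s]
                    base = sub_value(mask, table, x)
                    for c in mask:
                        x[c] ^= 1
                        delta[c] += sub_value(mask, table, x) - base
                        x[c] ^= 1
                improved = True
    return x

if __name__ == "__main__":
    n, m, k = 30, 30, 3
    subs = build_function(n, m, k)
    x = [random.randint(0, 1) for _ in range(n)]
    x = next_ascent(subs, x, n)
    print(sum(sub_value(mask, table, x) for mask, table in subs))
```

When each bit appears in a bounded number of subfunctions, the work done per accepted flip is bounded by a constant, which is the sense in which improving moves are found without enumerating the Hamming neighborhood.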
... Random NK Landscapes are considered to be difficult optimization problems for genetic algorithms (GAs) with standard recombination and mutation operators [12,18,1]. One problem is that crossover operators generally do not explore the linkage between the loci in the NK model. ...
Conference Paper
Full-text available
A partition crossover operator is introduced for use with NK landscapes, MAX-kSAT and for all k-bounded pseudo-Boolean functions. By definition, these problems use a bit representation. Under partition crossover, the evaluation of offspring can be directly obtained from partial evaluations of substrings found in the parents. Partition crossover explores the variable interaction graph of the pseudo-Boolean functions in order to partition the variables of the solution vector. Proofs are presented showing that if the differing variable assignments found in the two parents can be partitioned into q non-interacting sets, partition crossover can be used to find the best of 2^q possible offspring. Proofs are presented which show that parents that are locally optimal will always generate offspring that are locally optimal with respect to a (more restricted) hyperplane subspace. Empirical experiments show that parents that are locally optimal generate offspring that are locally optimal in the full search space more than 80 percent of the time. Experimental results also show the effectiveness of the proposed crossover when used in combination with a hybrid genetic algorithm.
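A way to read the result above: restrict the variable interaction graph to the bits where the parents differ, split those bits into connected components, and choose, independently per component, whichever parent's assignment scores better on the subfunctions touching that component. The greedy recombination below follows that recipe for a generic k-bounded function; the representation and names are illustrative assumptions rather than the authors' code:

```python
import random

def build_function(n, m, k, seed=1):
    """Random k-bounded pseudo-Boolean function as (mask, table) pairs."""
    rng = random.Random(seed)
    return [(rng.sample(range(n), k), [rng.random() for _ in range(2 ** k)])
            for _ in range(m)]

def sub_value(mask, table, x):
    index = 0
    for b in mask:
        index = (index << 1) | x[b]
    return table[index]

def partition_crossover(subs, p1, p2):
    """Split the differing bits into non-interacting components; keep the better parent per component."""
    n = len(p1)
    diff = [i for i in range(n) if p1[i] != p2[i]]
    diff_set = set(diff)
    # Union-find over differing bits; two differing bits interact if some subfunction reads both.
    parent = {i: i for i in diff}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for mask, _ in subs:
        present = [b for b in mask if b in diff_set]
        for u, v in zip(present, present[1:]):
            parent[find(u)] = find(v)
    components = {}
    for i in diff:
        components.setdefault(find(i), []).append(i)
    # Start from parent 1 and, component by component, keep whichever parent's bits score better.
    child = list(p1)
    for comp in components.values():
        touching = [(mask, table) for mask, table in subs if any(b in comp for b in mask)]
        keep_p1 = sum(sub_value(mask, table, child) for mask, table in touching)
        for b in comp:
            child[b] = p2[b]
        take_p2 = sum(sub_value(mask, table, child) for mask, table in touching)
        if keep_p1 >= take_p2:
            for b in comp:
                child[b] = p1[b]
    return child

if __name__ == "__main__":
    n, m, k = 20, 20, 3
    subs = build_function(n, m, k)
    p1 = [random.randint(0, 1) for _ in range(n)]
    p2 = [random.randint(0, 1) for _ in range(n)]
    child = partition_crossover(subs, p1, p2)
    print(sum(sub_value(mask, table, child) for mask, table in subs))
```

Because no subfunction can read differing bits from two distinct components, the per-component choices do not interact, which is why this greedy selection recovers the best of the 2^q possible offspring.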
Chapter
This chapter introduces evolutionary computation/genetic algorithms starting at a high level. It uses the schema sampling theorem to provide an intuitive understanding of how evolution, operating on a population of chromosomes (symbol strings), will produce offspring that contain variants of the symbol patterns in the more fit parents each generation, and shows how the recombination operators will be biased for and against some patterns. The No Free Lunch (NFL) theorem of Wolpert and Macready for optimization search algorithms has shown that over the space of all possible problems, there can be no universally superior algorithm. Hence, it is incumbent on any algorithm to attempt to identify the domain of problems for which it is effective and to identify its strengths and limitations. In the next section, we introduce Eshelman's CHC genetic algorithm and recombination operators that have been developed for bit string and integer chromosomes. After its strengths are shown, particularly in dealing with some of the challenges for traditional genetic algorithms, its limitations are also discussed. The final section takes up the application of CHC to subset selection problems, a domain of considerable utility for many machine learning applications. We present a series of empirical tests that lead us to the index chromosome representation and the match and mix set-subset size (MMX_SSS) recombination operator that seems well suited to this domain. Variants are shown for when the size of the desired subset is known and when it is not known. We apply this algorithm in later chapters to the feature subset selection problem that is key to our application of developing a speech-based diagnostic test for dementia.
Chapter
This chapter first reviews the simple genetic algorithm. Mathematical models of the genetic algorithm are also reviewed, including the schema theorem, exact infinite population models, and exact Markov models for finite populations. The use of bit representations, including Gray encodings and binary encodings, is discussed. Selection, including roulette wheel selection, rank-based selection, and tournament selection, is also described. This chapter then reviews other forms of genetic algorithms, including the steady-state Genitor algorithm and the CHC (cross-generational elitist selection, heterogeneous recombination, and cataclysmic mutation) algorithm. Finally, landscape structures that can cause genetic algorithms to fail are examined, and an application of genetic algorithms in the domain of resource scheduling, where genetic algorithms have been highly successful, is also presented.
Article
Full-text available
A new model of fitness landscapes suitable for the consideration of evolutionary and other search algorithms is developed and its consequences are investigated. Answers to the questions "What is a landscape?", "Are landscapes useful?" and "What makes a landscape difficult to search?" are provided. The model makes it possible to construct landscapes for algorithms that employ multiple operators, including operators that act on or produce multiple individuals. It also incorporates operator transition probabilities. The consequences of adopting the model include a "one operator, one landscape" view of algorithms that search with multiple operators. An investigation into crossover landscapes and hillclimbing algorithms on them illustrates the dual role played by crossover in genetic algorithms. This leads to the "headless chicken" test for the usefulness of crossover to a given genetic algorithm and to serious questions about the usefulness of maintaining a population. A "reverse hillclimbing" algorithm is presented that allows the determination of details of the basins of attraction of points on a landscape. These details can be used to directly compare members of a class of hillclimbing algorithms and to accurately predict how long a particular hillclimber will take to discover a given point. A connection between evolutionary algorithms and the heuristic search algorithms of Artificial Intelligence and Operations Research is established. One aspect of this correspondence is investigated in detail: the relationship between fitness functions and heuristic functions. By considering how closely fitness functions approximate the ideal heuristic functions, a measure of search difficulty is obtained. The measure, fitness distance correlation, is a remarkably reliable indicator of problem difficulty for a genetic algorithm on many problems taken from the genetic algorithms literature, even though the measure incorporates no knowledge of the operation of a genetic algorithm. This leads to one answer to the question "What makes a problem hard (or easy) for a genetic algorithm?" The answer is perfectly in keeping with what has been well known in Artificial Intelligence for over thirty years.
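Fitness distance correlation, as used above, is simply the Pearson correlation between sampled fitness values and each sample's distance to a known global optimum. A minimal sketch, assuming bit strings, Hamming distance, and a toy OneMax objective purely for illustration:

```python
import random

def fitness_distance_correlation(fitnesses, distances):
    """Pearson correlation between fitness and distance to the global optimum."""
    n = len(fitnesses)
    mf = sum(fitnesses) / n
    md = sum(distances) / n
    cov = sum((f - mf) * (d - md) for f, d in zip(fitnesses, distances)) / n
    sf = (sum((f - mf) ** 2 for f in fitnesses) / n) ** 0.5
    sd = (sum((d - md) ** 2 for d in distances) / n) ** 0.5
    return cov / (sf * sd)

if __name__ == "__main__":
    n, samples = 30, 2000
    optimum = [1] * n                      # known optimum of the toy OneMax function
    fitnesses, distances = [], []
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(n)]
        fitnesses.append(sum(x))           # OneMax fitness
        distances.append(sum(a != b for a, b in zip(x, optimum)))
    # For a maximization problem, a strongly negative FDC indicates an easy landscape.
    print(fitness_distance_correlation(fitnesses, distances))
```

For OneMax the printed value is -1, since fitness is exactly n minus the distance to the optimum, which under this measure marks the problem as easy for a maximizing search.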
Article
The hope that mathematical methods employed in the investigation of formal logic would lead to purely computational methods for obtaining mathematical theorems goes back to Leibniz and has been revived by Peano around the turn of the century and by Hilbert's school in the 1920s. Hilbert, noting that all of classical mathematics could be formalized within quantification theory, declared that the problem of finding an algorithm for determining whether or not a given formula of quantification theory is valid was the central problem of mathematical logic. And indeed, at one time it seemed as if investigations of this “decision” problem were on the verge of success. However, it was shown by Church and by Turing that such an algorithm can not exist. This result led to considerable pessimism regarding the possibility of using modern digital computers in deciding significant mathematical questions. However, recently there has been a revival of interest in the whole question. Specifically, it has been realized that while no decision procedure exists for quantification theory there are many proof procedures available—that is, uniform procedures which will ultimately locate a proof for any formula of quantification theory which is valid but which will usually involve seeking “forever” in the case of a formula which is not valid—and that some of these proof procedures could well turn out to be feasible for use with modern computing machinery. Hao Wang [9] and P. C. Gilmore [3] have each produced working programs which employ proof procedures in quantification theory. Gilmore's program employs a form of a basic theorem of mathematical logic due to Herbrand, and Wang's makes use of a formulation of quantification theory related to those studied by Gentzen. However, both programs encounter decisive difficulties with any but the simplest formulas of quantification theory, in connection with methods of doing propositional calculus. Wang's program, because of its use of Gentzen-like methods, involves exponentiation on the total number of truth-functional connectives, whereas Gilmore's program, using normal forms, involves exponentiation on the number of clauses present. Both methods are superior in many cases to truth table methods which involve exponentiation on the total number of variables present, and represent important initial contributions, but both run into difficulty with some fairly simple examples. In the present paper, a uniform proof procedure for quantification theory is given which is feasible for use with some rather complicated formulas and which does not ordinarily lead to exponentiation. The superiority of the present procedure over those previously available is indicated in part by the fact that a formula on which Gilmore's routine for the IBM 704 causes the machine to compute for 21 minutes without obtaining a result was worked successfully by hand computation using the present method in 30 minutes. Cf. §6, below. It should be mentioned that, before it can be hoped to employ proof procedures for quantification theory in obtaining proofs of theorems belonging to “genuine” mathematics, finite axiomatizations, which are “short,” must be obtained for various branches of mathematics. This last question will not be pursued further here; cf., however, Davis and Putnam [2], where one solution to this problem is given for ele
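The propositional-calculus step that defeated the earlier programs is handled in the Davis-Putnam procedure by repeated simplification of the clause set. The sketch below implements the two simplest of its rules, the one-literal (unit) clause rule and the affirmative-negative (pure literal) rule, on CNF given as sets of signed integers; the encoding and names are conventions assumed for illustration, and the procedure's third rule, elimination of atomic formulas, is omitted:

```python
def simplify(clauses):
    """Apply unit propagation and the pure-literal rule until neither applies.
    Clauses are frozensets of nonzero ints; a positive int is a variable, a negative int its negation.
    Returns (clauses, assignment), or (None, assignment) if a contradiction is found."""
    assignment = {}
    changed = True
    while changed:
        changed = False
        # One-literal clause rule: a unit clause forces its literal to be true.
        unit = next((next(iter(c)) for c in clauses if len(c) == 1), None)
        if unit is not None:
            assignment[abs(unit)] = unit > 0
            new_clauses = []
            for c in clauses:
                if unit in c:
                    continue                      # clause satisfied, drop it
                reduced = c - {-unit}             # remove the falsified literal
                if not reduced:
                    return None, assignment       # empty clause: contradiction
                new_clauses.append(reduced)
            clauses, changed = new_clauses, True
            continue
        # Affirmative-negative rule: a literal whose negation never appears can be set true.
        literals = {lit for c in clauses for lit in c}
        pure = next((lit for lit in literals if -lit not in literals), None)
        if pure is not None:
            assignment[abs(pure)] = pure > 0
            clauses = [c for c in clauses if pure not in c]
            changed = True
    return clauses, assignment

if __name__ == "__main__":
    # (x1) and (not x1 or x2) and (not x2 or x3 or not x4)
    cnf = [frozenset({1}), frozenset({-1, 2}), frozenset({-2, 3, -4})]
    remaining, assignment = simplify(cnf)
    print(remaining, assignment)
```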
Article
http://deepblue.lib.umich.edu/bitstream/2027.42/3572/5/bab2694.0001.001.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/3572/4/bab2694.0001.001.txt
Article
The concept of a "fitness landscape," a picturesque term for a mapping of the vertices of a finite graph to the real numbers, has arisen in several fields, including evolutionary theory. The computational complexities of two qualitatively similar versions of a particularly simple fitness landscape are shown to differ considerably. In one version, the question "Is the global optimum greater than a given value V?" is shown to be answerable in polynomial time by presenting an efficient algorithm that actually computes the optimum. The corresponding problem for the other version of the landscape is shown to be NP-complete. The NP-completeness of the latter problem leads to some speculations on why P is not equal to NP. Key words: rugged fitness landscape, N-K model
Conference Paper
The choice of how to represent the search space for a genetic algorithm (GA) is critical to the GA's performance. Representations are usually engineered by hand and fixed for the duration of the GA run. Here a new method is described in which the degrees of freedom of the representation, i.e. the genes, are increased incrementally. The phenotypic effects of the new genes are randomly drawn from a space of different functional effects. Only those genes that initially increase fitness are kept. The genotype-phenotype map that results from this selection during the construction of the genome allows better adaptation. This effect is illustrated with the NK landscape model. The resulting genotype-phenotype maps are much less epistatic than unselected maps would be, having extremely low values of "K", the number of fitness components affected by each gene. Moreover, these maps are exquisitely tuned to the specifics of the epistatic fitness function, creating adaptive landscapes that are much smoother than generic NK landscapes with the same genotype-phenotype maps, with fitness peaks many standard deviations higher. Thus a caveat should be made when making arguments about the applicability of generic properties of complex systems to evolved systems. This method may help to solve the problem of choice of representations in genetic algorithms.