For some time, there has been a realisation among Genetic Programming researchers that relying on a single scalar fitness value to drive evolutionary search is no longer a satisfactory approach. Instead, efforts are being made to gain richer insights into the complexity of program behaviour. To this end, particular attention has been focused on the notion of semantic space. In this paper we propose a unified hierarchical approach which decomposes program behaviour into semantic, result and adjudicated spaces, where adjudicated space sits at the top of the behavioural hierarchy and represents an abstraction of program behaviour that focuses on the success or failure of candidate solutions in solving problem sub-components. We show that better, smaller solutions are discovered when crossover is directed in adjudicated space. We investigate the effectiveness of several possible adjudicated strategies on a variety of classification and symbolic regression problems, and show that both of our novel pillage and barter tactics significantly outperform both a standard genetic programming and an enhanced genetic programming configuration on the fourteen problems studied. The proposed method is extremely effective when incorporated into a standard Genetic Programming structure but should also complement several other semantic approaches proposed in the literature.


This paper introduces the concepts of error vector and error space, directly bound to semantics, one of the hottest topics in genetic programming. Based on these concepts, we introduce the notions of optimally aligned individuals and optimally coplanar individuals. We show that, given optimally aligned, or optimally coplanar, individuals, it is possible to construct a globally optimal solution analytically. Thus, we introduce a genetic programming framework for symbolic regression called Error Space Alignment GP (ESAGP) and two of its instances: ESAGP-1, whose objective is to find optimally aligned individuals, and ESAGP-2, whose objective is to find optimally coplanar individuals. We also discuss how to generalize the approach to any number of dimensions. Using two complex real-life applications, we provide experimental evidence that ESAGP-2 outperforms ESAGP-1, which in turn outperforms both standard GP and geometric semantic GP. This suggests that “adding dimensions” is beneficial and encourages us to pursue the study in many different directions, that we summarize in the final part of the manuscript.
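The construction behind ESAGP can be made concrete. The following sketch (illustrative names, not the paper's code) shows the analytic step for two optimally aligned individuals: if their error vectors are collinear, e2 = k·e1 with k ≠ 1, then the linear combination p* = (p2 − k·p1)/(1 − k) has zero error on the training cases.

```python
def error_vector(outputs, targets):
    """Element-wise error of a program's outputs against the targets."""
    return [o - t for o, t in zip(outputs, targets)]

def combine_aligned(out1, out2, targets):
    """Given outputs of two optimally aligned programs (e2 = k * e1, k != 1),
    return the outputs of the analytically constructed combination
    (p2 - k * p1) / (1 - k), which matches the targets exactly."""
    e1 = error_vector(out1, targets)
    e2 = error_vector(out2, targets)
    # Estimate the alignment factor k from the first non-zero component.
    k = next(b / a for a, b in zip(e1, e2) if a != 0)
    return [(o2 - k * o1) / (1 - k) for o1, o2 in zip(out1, out2)]

# Example: two programs whose error vectors satisfy e2 = 2 * e1.
targets = [1.0, 2.0, 3.0]
out1 = [2.0, 3.0, 5.0]            # e1 = [1, 1, 2]
out2 = [3.0, 4.0, 7.0]            # e2 = [2, 2, 4] = 2 * e1
print(combine_aligned(out1, out2, targets))  # -> [1.0, 2.0, 3.0]
```

The algebra is direct: with out_i = t + e_i and e2 = k·e1, the combination yields ((1 − k)·t + (e2 − k·e1))/(1 − k) = t.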

Significant recent effort in genetic programming has focused on selecting and combining candidate solutions according to a notion of behaviour defined in semantic space and has also highlighted disadvantages of relying on a single scalar measure to capture the complexity of program performance in evolutionary search. In this paper, we take an alternative, yet complementary approach which directs crossover in what we call adjudicated space, where adjudicated space represents an abstraction of program behaviour that focuses on the success or failure of candidate solutions in solving problem sub-components. We investigate the effectiveness of several possible adjudicated strategies on a variety of classification and symbolic regression problems, and show that both of our novel pillage and barter tactics significantly outperform both a standard genetic programming and an enhanced genetic programming configuration on the fourteen problems studied.

In genetic programming (GP), programs are usually evaluated by applying them to tests, and the fitness function indicates only how many of them have been passed. We posit that scrutinizing the outcomes of programs' interactions with individual tests may help make program synthesis more effective. To this aim, we extend our previous work on coevolutionary algorithms and propose DOC, a method that autonomously derives new search objectives by clustering the outcomes of interactions between the programs in the population and the tests. The derived objectives are subsequently used to drive the selection process in a single- or multiobjective fashion. An extensive experimental assessment on 15 discrete program synthesis tasks representing two domains shows that DOC significantly outperforms conventional GP and implicit fitness sharing.
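The mechanism of deriving objectives from interactions can be sketched as follows. This is not DOC itself (which uses a proper clustering algorithm); as the simplest possible stand-in, tests whose pass/fail columns are identical are grouped together, and each group yields one derived objective per program: its pass rate on that group.

```python
def derive_objectives(interactions):
    """interactions[p][t] is 1 if program p passes test t, else 0.
    Returns the test clusters and, for each program, one objective per
    cluster: the program's pass rate on that cluster's tests."""
    n_tests = len(interactions[0])
    # Group tests whose outcome vectors (matrix columns) are identical.
    clusters = {}
    for t in range(n_tests):
        column = tuple(row[t] for row in interactions)
        clusters.setdefault(column, []).append(t)
    cluster_list = list(clusters.values())
    objectives = [
        [sum(row[t] for t in c) / len(c) for c in cluster_list]
        for row in interactions
    ]
    return cluster_list, objectives

# Three programs, four tests; tests 0 and 1 behave identically.
interactions = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
clusters, objs = derive_objectives(interactions)
print(clusters)  # [[0, 1], [2], [3]]
print(objs)      # [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
```

The derived objective vectors could then feed a multiobjective selection scheme such as NSGA-II in place of the single pass count.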

This paper provides a structured, unified, formal and empirical perspective on all geometric semantic crossover operators proposed so far, including the exact geometric crossover by Moraglio, Krawiec, and Johnson, as well as the approximately geometric operators. We start with presenting the theory of geometric semantic genetic programming, and discuss the implications of geometric operators for the structure of the fitness landscape. We prove that geometric semantic crossover *can* by construction produce an offspring that is not worse than the fitter parent, and that under certain conditions such an offspring *is* guaranteed to be not worse than the worse parent. We review all geometric semantic crossover operators presented to date in the literature, and conduct a comprehensive experimental comparison using a tree-based genetic programming framework and a representative suite of nine symbolic regression and nine Boolean function synthesis tasks. We scrutinize the performance (program error and success rate), generalization, computational cost, bloat, population diversity, and the operators' capability to generate geometric offspring. The experiment leads to several interesting conclusions, the primary one being that an operator's capability to produce geometric offspring is positively correlated with performance. The paper is concluded by recommendations regarding the suitability of operators for the particular domains of program induction tasks.

Several methods to incorporate semantic awareness in genetic programming have been proposed in the last few years. These methods cover fundamental parts of the evolutionary process: from the population initialization, through different ways of modifying or extending the existing genetic operators, to formal methods, until the definition of completely new genetic operators. The objectives are also distinct: from the maintenance of semantic diversity to the study of semantic locality; from the use of semantics for constructing solutions which obey certain constraints to the exploitation of the geometry of the semantic topological space aimed at defining easy-to-search fitness landscapes. All these approaches have shown, in different ways and amounts, that incorporating semantic awareness may help improve the power of genetic programming. This survey analyzes and discusses the state of the art in the field, organizing the existing methods into different categories. It restricts itself to studies where semantics is intended as the set of output values of a program on the training data, a definition that is common to a rather large set of recent contributions. It does not discuss methods for incorporating semantic information into grammar-based genetic programming or approaches based on formal methods. The objective is keeping the community updated on this interesting research track, hoping to motivate new and stimulating contributions.

Recently Quantitative Genetics has been successfully employed to understand and improve operators in some Evolutionary Algorithms (EAs) implementations. This theory offers a phenotypic view of an algorithm's behavior at a population level, and suggests new ways of quantifying and measuring concepts such as exploration and exploitation. In this paper, we extend the quantitative genetics approach for use with Genetic Programming (GP), adding it to the set of GP analysis techniques. We use it in combination with some existing diversity and bloat measurement tools to measure, analyze and predict the evolutionary behavior of several GP algorithms. GP specific benchmark problems, such as ant trail and symbolic regression, are used to provide new insight into how various evolutionary forces work in combination to affect the search process. Finally, using the tools, a multivariate phenotypic crossover operator is designed to both improve performance and control bloat on the difficult ant trail problem.

We propose a crossover operator that works with genetic programming trees and is approximately geometric crossover in the semantic space. By defining semantics as a program's evaluation profile with respect to a set of fitness cases and constraining ourselves to a specific class of metric-based fitness functions, we cause the fitness landscape in the semantic space to have perfect fitness-distance correlation. The proposed approximately geometric semantic crossover exploits this property of the semantic fitness landscape by appropriate sampling. We also demonstrate how the proposed method may be conveniently combined with hill climbing. We discuss the properties of the methods, and describe an extensive computational experiment concerning logical function synthesis and symbolic regression.

Traditional Genetic Programming (GP) searches the space of functions/programs by using search operators that manipulate their syntactic representation (e.g., parse trees), regardless of their semantics. Recently, semantically aware search operators have been shown to outperform purely syntactic operators. In this work, using a formal geometric view on search operators and representations, we bring the semantic approach to its extreme consequences and introduce a novel form of GP, Semantic GP (SGP), that searches directly the space of the underlying semantics of the programs. This perspective provides new insights into the relation between syntax, semantics, search operators and fitness landscape, and allows for the principled formal design of semantic search operators for different classes of problems. We derive specific forms of SGP for a number of classic GP domains and report preliminary experimental results. Furthermore, we show that the search of SGP is equivalent to the search of a traditional Genetic Algorithm on the One-Max problem.

We propose a class of crossover operators for genetic programming that aim at making offspring programs semantically intermediate (medial) with respect to parent programs by modifying short fragments of code (subprograms). The approach is applicable to problems that define fitness as a distance between program output and the desired output. Based on that metric, we define two measures of semantic 'mediality', which we employ to design two crossover operators: one aimed at making the semantics of offspring geometric with respect to the semantics of parents, and the other aimed at making them equidistant to parents' semantics. The operators act only on randomly selected fragments of parents' code, which makes them computationally efficient. When compared experimentally with four other crossover operators, both operators lead to a success ratio at least as good as for the non-semantic crossovers, and the operator based on equidistance proves superior to all others.
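The two mediality conditions have simple checks once semantics is a vector of outputs. A minimal sketch under Euclidean semantic distance (an assumption for illustration; the paper works with metric-based fitness in general): an offspring is geometric if its semantics lies on the segment between the parents' semantics, and equidistant if it is the same distance from both.

```python
import math

def dist(u, v):
    """Euclidean distance between two semantic (output) vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def is_geometric(p1, p2, child, tol=1e-9):
    """Child lies on the metric segment between the parents' semantics."""
    return abs(dist(p1, child) + dist(child, p2) - dist(p1, p2)) < tol

def is_equidistant(p1, p2, child, tol=1e-9):
    """Child is equally distant from both parents' semantics."""
    return abs(dist(p1, child) - dist(p2, child)) < tol

p1, p2 = (0.0, 0.0), (4.0, 0.0)
print(is_geometric(p1, p2, (1.0, 0.0)))    # True: on the segment
print(is_geometric(p1, p2, (2.0, 1.0)))    # False: off the segment
print(is_equidistant(p1, p2, (2.0, 1.0)))  # True: same distance to both
```

Note that every geometric offspring of these parents is "medial", but an equidistant one need not be geometric, which is why the paper treats the two measures separately.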

This paper establishes a link between the challenge of solving highly ambitious problems in machine learning and the goal of reproducing the dynamics of open-ended evolution in artificial life. A major problem with the objective function in machine learning is that through deception it may actually prevent the objective from being reached. In a similar way, selection in evolution may sometimes act to discourage increasing complexity. This paper proposes a single idea that both overcomes the obstacle of deception and suggests a simple new approach to open-ended evolution: Instead of either explicitly seeking an objective or modeling a domain to capture the open-endedness of natural evolution, the idea is to simply search for novelty. Even in an objective-based problem, such novelty search ignores the objective and searches for behavioral novelty. Yet because many points in the search space collapse to the same point in behavior space, it turns out that the search for novelty is computationally feasible. Furthermore, because there are only so many simple behaviors, the search for novelty leads to increasing complexity. In fact, on the way up the ladder of complexity, the search is likely to encounter at least one solution. In this way, by decoupling the idea of open-ended search from only artificial life worlds, the raw search for novelty can be applied to real world problems. Counterintuitively, in the deceptive maze navigation task in this paper, novelty search significantly outperforms objective-based search, suggesting a surprising new approach to machine learning.
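The core of novelty search is its selection criterion: a hedged sketch of the usual formulation, in which an individual's novelty is the mean distance from its behavior descriptor to its k nearest neighbors among the current population and an archive of past behaviors.

```python
import math

def novelty(behavior, others, k=3):
    """Sparseness of a behavior: mean distance to its k nearest
    neighbors among previously seen behaviors (population + archive)."""
    distances = sorted(math.dist(behavior, b) for b in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)

archive = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(novelty((0.05, 0.05), archive))  # small: a well-explored region
print(novelty((9.0, 9.0), archive))    # large: a genuinely novel behavior
```

Selection then favors high-novelty individuals, with sufficiently novel ones typically added to the archive so the same region is not rewarded twice.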

It is well-known that the crossover operator plays an important role in Genetic Programming (GP). In Standard Crossover (SC), semantics are not used to guide the selection of the crossover points, which are generated randomly. This lack of semantic information is the main cause of the destructive effects of SC (e.g., children having lower fitness than their parents). Recently, we proposed a new semantics-based crossover for GP called Semantic Aware Crossover (SAC) [25]. We show that SAC outperforms SC in solving a class of real-valued symbolic regression problems. We clarify the effect of SAC on GP search in increasing the semantic diversity of the population, thus helping to reduce the destructive effects of crossover in GP.

Keywords: Semantic Aware Crossover · Semantics · Constructive Effect · Bloat

We investigate the effects of semantically-based crossover operators in genetic programming, applied to real-valued symbolic regression problems. We propose two new relations derived from the semantic distance between subtrees, known as semantic equivalence and semantic similarity. These relations are used to guide variants of the crossover operator, resulting in two new crossover operators: semantics aware crossover (SAC) and semantic similarity-based crossover (SSC). SAC, introduced and studied previously, is included here for the purpose of comparison and analysis. SSC extends SAC by more closely controlling the semantic distance between subtrees to which crossover may be applied. The new operators were tested on some real-valued symbolic regression problems and compared with standard crossover (SC), context aware crossover (CAC), Soft Brood Selection (SBS), and No Same Mate (NSM) selection. The experimental results show on the problems examined that, with computational effort measured by the number of function node evaluations, only SSC and SBS were significantly better than SC, and SSC was often better than SBS. Further experiments were also conducted to analyse the performance sensitivity to the parameter settings for SSC. This analysis leads to a conclusion that SSC is more constructive and has higher locality than SAC, NSM and SC; we believe these are the main reasons for the improved performance of SSC.

Keywords: Genetic programming · Semantics · Crossover · Symbolic regression · Locality

In this paper, we apply the ideas from [2] to investigate the effect of some semantic-based guidance to the crossover operator of GP. We conduct a series of experiments on a family of real-valued symbolic regression problems, examining four different semantic aware crossover operators. One operator considers the semantics of the exchanged subtrees, while the other compares the semantics of the child trees to their parents. Two control operators are adopted which reverse the logic of the semantic equivalence test. The results show that on the family of test problems examined, the (approximate) semantic aware crossover operators can provide performance advantages over the standard subtree crossover adopted in Genetic Programming.

This paper extends a geometric framework for interpreting crossover and mutation [4] to the case of sequences. This representation is important because it is the link between artificial evolution and biological evolution. We define and theoretically study geometric crossover for sequences under edit distance and show its intimate connection with the biological notion of sequence homology.

A significant challenge in genetic programming is premature convergence to local optima, which often prevents evolution from solving problems. This paper introduces to genetic programming a method that originated in neuroevolution (i.e. the evolution of artificial neural networks) that circumvents the problem of deceptive local optima. The main idea is to search only for behavioral novelty instead of for higher fitness values. Although such novelty search abandons following the gradient of the fitness function, if such gradients are deceptive they may actually occlude paths through the search space towards the objective. Because there are only so many ways to behave, the search for behavioral novelty is often computationally feasible and differs significantly from random search. Counterintuitively, in both a deceptive maze navigation task and the artificial ant benchmark task, genetic programming with novelty search, which ignores the objective, outperforms traditional genetic programming that directly searches for optimal behavior. Additionally, novelty search evolves smaller program trees in every variation of the test domains. Novelty search thus appears less susceptible to bloat, another significant problem in genetic programming. The conclusion is that novelty search is a viable new tool for efficiently solving some deceptive problems in genetic programming.

This paper describes the use of a recently introduced crossover operator for GP, context-aware crossover. Given a randomly selected subtree from one parent, context-aware crossover will always find the best location to place the subtree in the other parent. We examine the performance of GP when context-aware crossover is used as an extra crossover operator, and show that standard crossover is far more destructive, and that performance is better when only context-aware crossover is used. There is still a place for standard crossover, however, and results suggest that using standard crossover in the initial part of the run and then switching to context-aware crossover yields the best performance. We show that, across a range of standard GP benchmark problems, context-aware crossover produces a higher best fitness as well as a higher mean fitness, and even manages to solve the 11-bit multiplexer problem without ADFs. Furthermore, the individuals produced this way are much smaller than standard GP, and far fewer individual evaluations are required, so GP achieves a higher fitness by evaluating fewer and smaller individuals.

Crossover forms one of the core operations in genetic programming and has been the subject of many different investigations. We present a novel technique, based on semantic analysis of programs, which forces each crossover to make candidate programs take a new step in the behavioural search space. We demonstrate how this technique results in better performance and smaller solutions in two separate genetic programming experiments.

The relationship between search space, distances and genetic operators for syntactic trees is little understood. Geometric crossover and geometric mutation are representation-independent operators that are well-defined once a notion of distance over the solution space is defined. In this paper we apply this geometric framework to the syntactic tree representation and show how the well-known structural distance is naturally associated with homologous crossover and sub-tree mutation.

In this paper we give a representation-independent topological definition of crossover that links it tightly to the notion of fitness landscape. Building around this definition, a geometric/topological framework for evolutionary algorithms is introduced that clarifies the connection between representation, genetic operators, neighbourhood structure and distance in the landscape. Traditional genetic operators for binary strings are shown to fit the framework. The advantages of this interpretation are discussed.

The Operator Equalization (OE) family of bloat control methods has achieved promising results in many domains. In particular, the Flat-OE method, which promotes a flat distribution of program sizes, is one of the simplest OE methods and achieves some of the best results. However, Flat-OE, like all OE variants, can be computationally expensive. This work proposes a simplified strategy for bloat control based on Flat-OE. In particular, bloat is studied in the NeuroEvolution of Augmenting Topologies (NEAT) algorithm. NEAT includes a very simple diversity preservation technique based on speciation and fitness sharing, and it is hypothesized that with some minor tuning, speciation in NEAT can promote a flat distribution of program size. Results indicate that this is the case in two benchmark problems, in accordance with results for Flat-OE. In conclusion, NEAT provides a worthwhile strategy that could be extrapolated to other GP systems, for effective and simple bloat control.

Bloat is one of the most interesting theoretical problems in genetic programming (GP), and one of the most important pragmatic limitations in the development of real-world GP solutions. Over the years, many theories regarding the causes of bloat have been proposed and a variety of bloat control methods have been developed. It seems that one of the underlying causes of bloat is the search for fitness; as the fitness-causes-bloat theory states, selective bias towards fitness seems to unavoidably lead the search towards programs of large size. Intuitively, however, abandoning fitness does not appear to be an option. This paper studies a GP system that does not require an explicit fitness function; instead, it relies on behavior-based search, where programs are described by the behavior they exhibit and selective pressure is biased towards unique behaviors using the novelty search algorithm. Initial results are encouraging: the average program size of the evolving population does not increase with novelty search; i.e., bloat is avoided by focusing on novelty instead of quality.

Natural evolution is an open-ended search process without an a priori fitness function that needs to be optimized. On the other hand, evolutionary algorithms (EAs) rely on a clear and quantitative objective. The Novelty Search algorithm (NS) substitutes fitness-based selection with a novelty criterion; i.e., individuals are chosen based on their uniqueness. To do so, individuals are described by the behaviors they exhibit, instead of their phenotype or genetic content. NS has mostly been used in evolutionary robotics, where the concept of behavioral space can be clearly defined. Instead, this work applies NS to a more general problem domain, classification. To this end, two behavioral descriptors are proposed, each describing a classifier's performance from two different perspectives. Experimental results show that NS-based search can be used to derive effective classifiers. In particular, NS is best suited to solve difficult problems, where exploration needs to be encouraged and maintained.

In evolutionary computation, the fitness of a candidate solution conveys only sparse feedback. Yet in many cases, candidate solutions can potentially yield more information. In genetic programming (GP), one can easily examine program behavior on particular fitness cases or at intermediate execution states. However, how to exploit this information to effectively guide the search remains unclear. In this study we apply machine learning algorithms to features describing the intermediate behavior of the executed program. We then drive the standard evolutionary search with additional objectives reflecting this intermediate behavior. The machine learning component operates independently of task-specific knowledge and discovers potentially useful components of solutions (subprograms), which we preserve in an archive and use as building blocks when composing new candidate solutions. In an experimental assessment on a suite of benchmarks, the proposed approach proves more capable of finding optimal and/or well-performing solutions than the control methods.
Available from ACM: http://dl.acm.org/citation.cfm?id=2598288

While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
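The first step of the Friedman test recommended above is easy to make concrete. A minimal sketch (not a full significance test): rank the k algorithms on each data set (1 = best, ties receive the average of the tied ranks), then compare average ranks across data sets; the post-hoc tests and CD diagrams build on these ranks.

```python
def average_ranks(scores):
    """scores[d][a] = error of algorithm a on data set d (lower is better).
    Returns the average rank of each algorithm across all data sets."""
    n_alg = len(scores[0])
    totals = [0.0] * n_alg
    for row in scores:
        order = sorted(range(n_alg), key=lambda a: row[a])
        ranks = [0.0] * n_alg
        i = 0
        while i < n_alg:
            j = i
            # Extend j over a run of tied scores.
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1      # average of tied positions
            for p in range(i, j + 1):
                ranks[order[p]] = mean_rank
            i = j + 1
        totals = [t + r for t, r in zip(totals, ranks)]
    return [t / len(scores) for t in totals]

# Three algorithms on three data sets; the last row has a tie.
errors = [
    [0.10, 0.20, 0.30],
    [0.15, 0.10, 0.40],
    [0.20, 0.25, 0.25],
]
print(average_ranks(errors))  # -> approximately [1.33, 1.83, 2.83]
```

The Friedman statistic is then a function of these average ranks, and the CD diagram marks which rank differences exceed the critical difference.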

Many seemingly different problems in machine learning, artificial intelligence, and symbolic processing can be viewed as requiring the discovery of a computer program that produces some desired output for particular inputs. When viewed in this way, the process of solving these problems becomes equivalent to searching a space of possible computer programs for a highly fit individual computer program. The recently developed genetic programming paradigm described herein provides a way to search the space of possible computer programs for a highly fit individual computer program to solve (or approximately solve) a surprising variety of different problems from different fields. In the genetic programming paradigm, populations of computer programs are genetically bred using the Darwinian principle of survival of the fittest and using a genetic crossover (sexual recombination) operator appropriate for genetically mating computer programs. This chapter shows how to reformulate seemingly different problems into a common form (i.e. a problem requiring discovery of a computer program) and, then, to show how the genetic programming paradigm can serve as a single, unified approach for solving problems formulated in this common way.

Multi-objective evolutionary algorithms (MOEAs) that use non-dominated sorting and sharing have been criticized mainly for: (1) their O(MN³) computational complexity (where M is the number of objectives and N is the population size); (2) their non-elitist approach; and (3) the need to specify a sharing parameter. In this paper, we suggest a non-dominated sorting-based MOEA, called NSGA-II (Non-dominated Sorting Genetic Algorithm II), which alleviates all of the above three difficulties. Specifically, a fast non-dominated sorting approach with O(MN²) computational complexity is presented. Also, a selection operator is presented that creates a mating pool by combining the parent and offspring populations and selecting the best N solutions (with respect to fitness and spread). Simulation results on difficult test problems show that NSGA-II is able, for most problems, to find a much better spread of solutions and better convergence near the true Pareto-optimal front compared to the Pareto-archived evolution strategy and the strength-Pareto evolutionary algorithm, two other elitist MOEAs that pay special attention to creating a diverse Pareto-optimal front. Moreover, we modify the definition of dominance in order to solve constrained multi-objective problems efficiently. Simulation results of the constrained NSGA-II on a number of test problems, including a five-objective, seven-constraint nonlinear problem, are compared with another constrained multi-objective optimizer, and the much better performance of NSGA-II is observed.
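The fast non-dominated sorting step can be sketched compactly (minimization on every objective is assumed here). Every pair of solutions is compared once, giving the O(MN²) bound; fronts are then peeled off without further comparisons.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(points):
    """Return the indices of the points grouped into Pareto fronts."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]   # solutions that i dominates
    counts = [0] * n                        # how many solutions dominate i
    for i in range(n):
        for j in range(n):
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                counts[j] -= 1
                if counts[j] == 0:          # all its dominators are placed
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]

pts = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(fast_non_dominated_sort(pts))  # [[0, 1, 2], [3], [4]]
```

NSGA-II then fills the mating pool front by front, breaking ties within the last admitted front by crowding distance (omitted in this sketch).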

- The tree has never been executed,
- The tree has never been executed in a test subsequence which subsequently failed a consistency check (nak), and it is anticipated that it would not be executed by any of the fitness test cases that have not been used (unknown).

However, a tree may be selected as the crossover location immediately if either:

- the number of times it was used in a test subsequence which subsequently passed its consistency check (ok) is less than nak, or
- both it has never been run successfully and unknown is zero.

Otherwise, the following ratio is calculated:

(nak + unknown) / (ok + unknown)    (1)

When ratios for three trees have been calculated, crossover occurs in the tree with the highest ratio. (W. B. Langdon, W.Langdon@cs.ucl.ac.uk, 17 August 1995)
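The scoring rule in ratio (1) can be sketched directly (variable names are illustrative): each candidate subtree carries its ok, nak and unknown counts, and the subtree with the highest ratio of implicated-or-untested runs to successful-or-untested runs is chosen as the crossover location.

```python
def ratio(ok, nak, unknown):
    """Ratio (1): higher means the subtree is more implicated in failures
    or less exercised, so it is a better place to apply crossover."""
    return (nak + unknown) / (ok + unknown)

# Candidate subtrees with (ok, nak, unknown) counts.
candidates = {
    "subtree_a": (10, 2, 0),   # ran often, mostly in passing sequences
    "subtree_b": (1, 5, 2),    # mostly implicated in failures
    "subtree_c": (4, 4, 1),
}
best = max(candidates, key=lambda s: ratio(*candidates[s]))
print(best)  # subtree_b
```

Here subtree_b scores 7/3 against 0.2 and 1.0 for the others, so it would be selected.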

Size fair and homologous crossover genetic operators for tree based genetic programming are described and tested. Both produce considerably reduced increases in program size and no detrimental effect on GP performance. GP search spaces are partitioned by the ridge in the number of programs versus their size and depth. A ramped uniform random initialisation is described which straddles the ridge. With subtree crossover, trees increase about one level per generation, leading to sub-quadratic bloat in length.

1 INTRODUCTION. It has been known for some time that programs within GP populations tend to rapidly increase in size as the population evolves. If unchecked this consumes excessive machine resources. This is usually addressed either by enforcing a size or depth limit on the programs or by an explicit size penalty in the fitness measure, although other techniques may be used. Both main approaches have problems. It has been shown that the protective effect of inviable code (which...

Behavioral programming: a broader and more detailed take on semantic GP

- K Krawiec
- U-M O'Reilly

Using context-aware crossover to improve the performance of GP

- H Majeed
- C M Ryan

For sale or wanted: directed crossover in adjudicated space

- J M Fitzgerald
- C Ryan

A new methodology for the GP theory toolbox

- J Bassett
- U Kamath
- K De Jong

Approximating geometric crossover in semantic space

- K Krawiec
- P Lichocki
