Chapter 1
Local Search is Underused in Genetic Programming
Leonardo Trujillo, Emigdio Z-Flores, Perla S. Juárez Smith, Pierrick Legrand, Sara Silva, Mauro Castelli, Leonardo Vanneschi, Oliver Schütze and Luis Muñoz
Abstract There are two important limitations of standard tree-based genetic programming (GP). First, GP tends to evolve unnecessarily large programs, what is referred to as bloat. Second, GP uses inefficient search operators that focus on modifying program syntax. The first problem has been studied in many works, with many bloat control proposals. Regarding the second problem, one approach is to use alternative search operators, for instance geometric semantic operators, to improve convergence. In this work, our goal is to experimentally show that both problems can be effectively addressed by incorporating a local search optimizer as an additional search operator. Using real-world problems, we show that this rather simple strategy can improve the convergence and performance of tree-based GP, while reducing program size. Given these results, a question arises: why are local search strategies so uncommon in GP? A small survey of popular GP libraries suggests to us that local search is underused in GP systems. We conclude by outlining plausible answers for this question and highlighting future work.
Leonardo Trujillo, Emigdio Z-Flores, Perla S. Juárez Smith, Luis Muñoz
Tree-Lab, Posgrado en Ciencias de la Ingeniería, Instituto Tecnológico de Tijuana, Blvd. Industrial y Av. ITR Tijuana S/N, Mesa Otay C.P. 22500, Tijuana B.C., México

Pierrick Legrand
Université de Bordeaux, Institut de Mathématiques de Bordeaux, UMR CNRS 5251, CQFD Team, Inria Bordeaux Sud-Ouest, France

Sara Silva
BioISI Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisbon, Portugal

Mauro Castelli, Leonardo Vanneschi
NOVA IMS, Universidade Nova de Lisboa, 1070-312 Lisbon, Portugal

Oliver Schütze
Computer Science Department, CINVESTAV-IPN, Av. IPN 2508, Col. San Pedro Zacatenco, 07360, Mexico City, México
Key words: Genetic Programming, Local Search, Bloat, NEAT
1.1 Introduction
Genetic programming (GP) is one of the most competitive approaches towards auto-
matic program induction and automatic programming in artificial intelligence, ma-
chine learning and soft computing (Koza, 2010). In particular, even the earliest version of GP, proposed by Koza in the 1990s and commonly referred to as tree-based GP or standard GP¹ (Koza, 1992), continues to produce strong results in applied domains over 20 years later (Olague and Trujillo, 2011; Trujillo et al, 2012). How-
ever, while tree-based GP is supported by sound theoretical insights (Langdon and
Poli, 2002; Poli and McPhee, 2003a,b, 2008), these formalisms have not allowed
researchers to completely overcome some of GP’s weaknesses.
In this work, we focus on two specific shortcomings of standard GP. The first
drawback is bloat, the tendency of GP to evolve unnecessarily large solutions. In
bloated runs the size (number of nodes) of the best solution and/or the average size
of all the individuals increases even when the quality of the solutions stagnates.
Bloat has been the subject of much research in GP, comprehensively surveyed in
(Silva and Costa, 2009). The most successful bloat control, or size control, strategies
have basically modified the manner in which fitness is assigned (Dignum and Poli,
2008; Poli and McPhee, 2008; Silva, 2011; Silva et al, 2012; Silva and Vanneschi,
2011), focusing the search towards specific regions of solution space.
A second problem of standard GP is the nature of the search operators. Sub-
tree crossover and mutation operate on syntax, but are blind to the effect that these
changes will have on the output of the programs, what is referred to as semantics
(Moraglio et al, 2012). This has led researchers to use the geometric properties of
semantic space (Moraglio et al, 2012) and define search operators that operate at
the syntax level but have a known and bounded effect on semantics, what is known
as Geometric Semantic GP (GSGP). While GSGP has achieved impressive results
in several domains (Vanneschi et al, 2014), it suffers from an intrinsic shortcoming
that is difficult to overstate. In particular, the sizes of the evolved solutions grow
¹ We will use the terms standard GP and tree-based GP interchangeably in this work, referring to the basic GP algorithm that relies on a tree representation and subtree genetic operators.
exponentially with the number of generations (Moraglio et al, 2012). Since program growth is not an epiphenomenon in GSGP, as it is in standard GP, it does not seem correct to call it bloat; it is simply the way that the GSGP search operates. Nonethe-
less, this practically eliminates one of the possible advantages of GP compared to
other machine learning techniques, that the evolved solutions might be amenable to
human interpretation (Koza, 1992, 2010; Olague and Trujillo, 2011).
The goal of this work is twofold. First, we intend to experimentally show that the
effect of these problems can be substantially mitigated, if not practically eliminated,
by integrating a powerful local search (LS) algorithm as an additional search opera-
tor. Our work analyzes the effects of LS on several variants of GP, including standard
GP, a bloat free GP algorithm called neat-GP (Trujillo et al, 2016), and GSGP. In all
cases, we will show that LS has at least one, if not several, of the following conse-
quences: improved convergence, improved performance and reduction in program
size. Moreover, we will argue that the greedy LS strategy does not increase overfit-
ting or computational cost, two common objections towards using such approaches
in meta-heuristic optimization. The second goal of this work is to pose the following
question: why are LS strategies seldom used, if at all, in GP algorithms? While we
do not claim that no previous works have integrated a local optimizer into a GP algorithm, the fact remains that most GP works do not do so, and those that do are mostly specific application papers. This is particularly notable when
we consider how ubiquitous hybrid evolutionary-LS algorithms have become, what
are commonly referred to as memetic algorithms (Chen et al, 2011; Neri et al, 2012;
Lara et al, 2010). We will attempt to give plausible answers to this question, and to
highlight important future research on the subject.
This chapter proceeds as follows. Section 1.2 discusses related work. Section
1.2.1 describes our proposal to apply LS in GP for symbolic regression with an
experimental example. Section 1.3 shows how LS combined with a bloat-free GP
can substantially reduce code growth. Afterward, Section 1.4 discusses recent works
that apply LS with GSGP, improving convergence and performance in real-world
domains. Based on the previous sections, Section 1.5 argues that LS strategies are
underused in GP search. Finally, Section 1.6 presents our conclusions and future
perspectives.
1.2 Local Search in Genetic Programming
Many works have studied how to combine evolutionary algorithms with LS (Chen
et al, 2011; Neri et al, 2012). The basic idea is to include an additional operator
that takes an individual as an initial point and searches for its optimal neighbor.
Such a strategy can help guarantee that the local region around each individual is
fully exploited. These algorithms, often called memetic algorithms, have produced
impressive results in a variety of domains (Chen et al, 2011; Neri et al, 2012).
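To make this pattern concrete, the following minimal sketch (our illustration, not code from any of the cited systems) shows one generation of such a hybrid loop in Python; the helper callables select, vary, evaluate and local_search are hypothetical placeholders, and fitness is assumed to be an error that is minimized.

# Minimal sketch of one generation of a memetic (EA + LS) loop.
# All helpers are hypothetical placeholders; fitness is an error (minimized),
# and local_search is assumed to return a refined copy with fitness set.
def memetic_generation(population, select, vary, evaluate, local_search):
    offspring = vary(select(population))      # global, evolutionary variation
    for i, ind in enumerate(offspring):
        ind.fitness = evaluate(ind)
        refined = local_search(ind)           # exploit the local neighborhood
        if refined.fitness < ind.fitness:
            offspring[i] = refined            # Lamarckian: keep the refined copy
    return offspring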
When applying a LS strategy to GP, there are basically two broad approaches
to follow: (1) apply a LS on the syntax; or (2) apply it on numerical parameters of
the program. Regarding the former, (Azad and Ryan, 2014) presents an interesting
recent example. The authors apply a greedy search on a randomly chosen GP node,
attempting to determine the best function to use in that node among all the pos-
sibilities in the function set. To reduce computational overhead the authors apply a
heuristic decision rule to decide which trees are subject to the LS, preferring smaller
trees to bias the search towards smaller solutions.
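To illustrate the spirit of this syntax-level LS, the sketch below greedily tests every function of matching arity at one randomly chosen node; it is our reconstruction under stated assumptions (a nested-list tree representation and an externally supplied error function), not the implementation of (Azad and Ryan, 2014).

import copy
import random

def internal_nodes(tree):
    # Yield every internal node of a nested-list tree [op, child, ...].
    if isinstance(tree, list):
        yield tree
        for child in tree[1:]:
            yield from internal_nodes(child)

def greedy_function_search(tree, function_arities, error):
    # Pick one internal node at random and greedily try every function of
    # matching arity in its place, keeping the best variant found.
    candidate = copy.deepcopy(tree)
    nodes = list(internal_nodes(candidate))
    if not nodes:
        return candidate
    node = random.choice(nodes)
    arity = len(node) - 1
    best_op, best_err = node[0], error(candidate)
    for op, op_arity in function_arities.items():
        if op_arity != arity:
            continue
        node[0] = op
        err = error(candidate)
        if err < best_err:
            best_op, best_err = op, err
    node[0] = best_op  # keep the best function for this node
    return candidate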
Regarding the optimization of numerical parameters within the tree, the follow-
ing works are of note. In (Topchy and Punch, 2001), gradient descent is used to optimize numerical constants for symbolic regression problems. However, that work only optimizes the value of the terminal elements (tree leaves); it does not consider param-
eters within internal nodes. Similarly, in (Zhang and Smart, 2004) and (Graff et al,
2013) a LS algorithm is used to optimize the value of constant terminal elements. In
(Zhang and Smart, 2004) gradient descent is used and tested on classification prob-
lems, applying the LS process on every individual of the population. Another recent
example is (Graff et al, 2013), where Resilient Backpropagation (RPROP) is used,
in this case applying the LS operator to the best individual of each generation.
From these examples, an important question for memetic algorithms is to deter-
mine when to apply the LS. For instance, (Zhang and Smart, 2004) applies it to
all the population, while (Graff et al, 2013) does so only for the best solution of
each generation, and (Azad and Ryan, 2014) uses a heuristic criterion. In the case
of GP for symbolic regression, this question is studied in (Z-Flores et al, 2014),
concluding that the best strategies might be to apply LS to all the individuals in
the population or a subset of the best individuals. However, that work focused on
synthetic benchmarks and did not consider specialized heuristics (Azad and Ryan,
2014). Nonetheless, (Z-Flores et al, 2014) does show that in general, including a LS
strategy improves convergence and performance, while reducing code growth.
Other works have increased the role of the local optimizer, changing the ba-
sic GP strategy. Fast Function Extraction (FFX) (McConaghy, 2011), for instance,
poses the symbolic regression problem as the search for the best linear combina-
tion of candidate basis functions. Thus, FFX builds linear models, and optimizes
these models using a modified version of the elastic net regression technique, elim-
inating the evolutionary process altogether. A similar approach can be seen in the
prioritized grammar enumeration (PGE) technique (Worm and Chiu, 2013), where
dynamic programming replaces the basic search operators of traditional GP, and
numerical parameters are optimized using the non-linear Levenberg-Marquardt al-
gorithm.
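As a toy illustration of the FFX idea, the following sketch generates a few simple candidate bases and fits a sparse linear combination with scikit-learn's elastic net; FFX itself uses a richer basis family and a path of regularized fits with model filtering, so this is a deliberate simplification.

import numpy as np
from sklearn.linear_model import ElasticNet

def ffx_style_fit(X, y):
    # Candidate bases: raw features, their squares, and (safe) logs.
    B = np.hstack([X, X ** 2, np.log(np.abs(X) + 1e-9)])
    model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000)
    model.fit(B, y)   # sparse coefficients select the useful bases
    return model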
1.2.1 Local Search in Symbolic Regression with Standard GP
In this work we focus on symbolic regression; however, we believe that some of the conclusions might be more general. For now, this section describes our proposal
to integrate a LS operator in GP in this domain, which we originally presented in
(Z-Flores et al, 2014, 2015). For symbolic regression, the goal is to search for the
symbolic expression K_O(θ_O): R^p → R that best fits a particular training set T = {(x_1, y_1), ..., (x_n, y_n)} of n input/output pairs, with x_i ∈ R^p and y_i ∈ R, stated as

(K_O, θ_O) ← arg min_{K ∈ G; θ ∈ R^m} f(K(x_i, θ), y_i)  with i = 1, ..., n,    (1.1)

where G is the solution or syntactic space defined by the primitive set P of functions and terminals, f is the fitness function, based on the difference between a program's output K(x_i, θ) and the desired output y_i, and θ is a particular parametrization of the symbolic expression K, assuming m real-valued parameters.
This dual problem of simultaneously optimizing syntax (structure) and parametrization can be addressed following two general approaches (Lohmann, 1991; Emmerich et al, 2001). The first group is hierarchical structure evolution (HSE), where θ has a strong influence on fitness, and thus a LS is required at each iteration of the global (syntactic) search as a nested process. The second group is called simultaneous structure evolution (SSE), where θ has a marginal effect on fitness; in such cases a single evolutionary loop can simultaneously optimize both syntax and parameters. These are abstract categories, but it is reasonable to state that standard GP, for instance, falls in the SSE group. On the other hand, memetic algorithms, such as the GP version we proposed in (Z-Flores et al, 2014, 2015), fall in the HSE group.
1.2.2 Proposal
First, as suggested in (Kommenda et al, 2013), for each individual K in the population we add a small linear upper tree above the root node, such that

K′ = θ_2 + θ_1(K),    (1.2)

where K′ represents the new program output, while θ_1, θ_2 ∈ R are the first two parameters from θ. Second, for all the other nodes n_k in the tree K we add a weight coefficient θ_k ∈ R, such that each node is now defined by

n′_k = θ_k · n_k,    (1.3)

where n′_k is the new modified node and k ∈ {1, ..., r}, with r the size of tree K. Notice that each node has a unique parameter that can be modified to help meet the overall optimization criteria of the non-linear expression. At the beginning of the GP run each parameter is initialized to θ_i = 1. During the GP syntax search, subtrees belonging to different individuals are swapped, added or removed together with their corresponding parameters, an approach often called Lamarckian inheritance (Z-Flores et al, 2014, 2015). We consider each tree as a non-linear expression, and the local search operator must find the best-fit parameters of the model K′. This problem could be solved using a variety of techniques, but following (Z-Flores et al, 2014, 2015) we use a trust region algorithm.
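To illustrate the parameter-fitting step, the sketch below hard-codes the parametrization of Equations (1.2) and (1.3) for one example tree, K = sin(x1) + x2, and fits θ with SciPy's trust region solver (method 'trf' in least_squares). This is an illustrative reconstruction, not the GPLab-based implementation used in our experiments.

import numpy as np
from scipy.optimize import least_squares

def k_prime(theta, X):
    # Example tree K = sin(x1) + x2 with one weight per node (Eq. 1.3)
    # wrapped in the linear upper tree theta2 + theta1*(K) (Eq. 1.2).
    t1, t2, t3, t4, t5, t6 = theta
    k = t3 * (t4 * np.sin(t5 * X[:, 0]) + t6 * X[:, 1])
    return t2 + t1 * k

def fit_parameters(X, y, n_params=6, max_evals=100):
    residuals = lambda theta: k_prime(theta, X) - y
    theta0 = np.ones(n_params)                 # each theta_i = 1 initially
    res = least_squares(residuals, theta0, method="trf", max_nfev=max_evals)
    rmse = np.sqrt(np.mean(res.fun ** 2))      # res.fun holds final residuals
    return res.x, rmse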
Table 1.1 GP parameters

Parameter                   Value
Runs                        30
Population                  100
Function evaluations        2,500,000
Training set                70% of complete data
Testing set                 30% of complete data
Crossover operator          Standard subtree crossover, 0.9 prob.
Mutation operator           Mutation probability per node 0.05
Tree initialization         Full, max. depth 6
Function set                +, −, ×, ÷, exp, sin, cos, log, sqrt, tan, tanh
Terminal set                Input features, constants
Selection for reproduction  Tournament selection of size 7
Elitism                     Best individual survives
Maximum tree depth          17
Finally, it is important to consider that the LS could increase the computational cost of the search, particularly when individual trees are very large. While applying the LS strategy to all trees can produce good results (Z-Flores et al, 2014, 2015), it is preferable to reduce the number of trees to which it is applied. Therefore, we use the heuristic proposed in (Azad and Ryan, 2014), where the LS is applied stochastically based on a probability p(s) determined by the tree size s and the average size of the population (details in (Azad and Ryan, 2014; Z-Flores et al, 2015)). In this way, smaller trees are more likely to be optimized than larger trees, which reduces the computational cost and improves the convergence of the optimizer by keeping the parameter vectors relatively small. We refer to this version of GP as GP-LS.
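A minimal sketch of one plausible form of this size-based rule follows; the exact probability used in (Azad and Ryan, 2014; Z-Flores et al, 2015) may differ in its details.

import random

def ls_probability(size, avg_size):
    # Smaller-than-average trees are always optimized; larger trees are
    # optimized with a probability that decays with their size.
    return min(1.0, avg_size / float(size))

def select_for_ls(population_sizes):
    avg = sum(population_sizes) / len(population_sizes)
    return [random.random() < ls_probability(s, avg) for s in population_sizes]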
1.2.3 Experiments and Results
We evaluate this proposal on a real-world symbolic regression task, the Yacht problem, which has 6 features and 308 input/output samples (Ortigosa et al, 2007).
The experiments are carried out using a modified version of the Matlab GP toolbox
GPLab (Silva and Almeida, 2005). The GP parameters used are given in Table 1.1.
In what follows, we will present results based on the median performance over all
runs. The fitness function used is the RMSE, and the stopping criterion is the total
number of fitness function evaluations. Function evaluations are used to account for
the computational cost of the trust region optimizer, which in this case is allowed to
run for 100 iterations. Results are compared with a standard GP.
Figure 1.1 summarizes the main results. The convergence plots of GP and GP-LS
are shown in Figure 1.1(a), showing the median training and testing performance.
The figure clearly shows that GP-LS converges faster to a lower error, and at the end
of the run it substantially outperforms standard GP, consistent with (Z-Flores et al,
2014, 2015). Figure 1.1(b) presents a scatter plot (each point is one individual) of
all individuals generated in all runs. The individuals are plotted based on function
evaluations and size. Each individual is color coded using a heat map based on test
performance, with the best individuals (lowest error) in red.

Fig. 1.1 Experimental results for GP-LS on the Yacht problem: (a) fitness convergence (RMSE vs. objective function evaluations) for GP and GP-LS on training and testing partitions, with final median values of 2.06 (GP) and 1.14 (GP-LS) annotated; (b) size (nodes) of every individual vs. function evaluations, color coded by test RMSE; (c) delta in population fitness (testing) after the LS; (d) delta in population rank (training) after the LS.

Figure 1.1(b) shows
that the best performance is achieved by the largest individuals.
However, our strategy is to apply the LS on the smallest individuals of the pop-
ulation. This is clearly validated in Figures 1.1(c) and 1.1(d). Figure 1.1(c) shows
the raw improvement of test fitness for each individual before and after the LS. A
similar plot is shown in Figure 1.1(d), however instead of showing the raw improve-
ment, this figure plots the improvement in rank within the population. In both plots
the average program size is plotted with a white line, and individuals that were not
passed to the local optimizer have a zero value. These plots reveal that: (1) most individuals below the average size are improved by the LS; and (2) the largest improvement is exhibited by individuals that are only slightly smaller than the average
program size. While the effect on program size by the LS process will be further
discussed in Section 1.3, for now it is important to state that the median of the av-
erage program size produced by standard GP on this problem is 123.576, which is
substantially higher than what is shown by GP-LS.
These results present three interesting and complementary findings. First, GP-LS
clearly outperforms standard GP, in terms of convergence, solution quality and av-
erage solution size. Second, the LS is clearly improving the quality of the smallest
individuals in the population, in some cases substantially. Third, however, the best solutions are still the largest trees in the population. This means that while the LS operator improves smaller solutions, the best solutions are not necessarily those subjected to the LS process; the LS should therefore be seen as an important complementary operator. While many previous works have applied a LS process only to the best solutions found, our results indicate that this is insufficient: the LS should be applied more broadly to achieve the best results.

Fig. 1.2 Influence of the LS operator on the construction of the best solution found for the Yacht problem: (a) per-run proportion of times that the best solution was selected by the LS heuristic (LS selected vs. non LS selected; median = 0.4629); (b) percentage of the best solution's ancestors, per generation and run, that were and were not subjected to the LS.
Figure 1.2 summarizes how the LS operator influences the construction of the
best solution. First, Figure 1.2(a) shows how many times the best solution in the
population was chosen by the LS selection heuristic for each run. The plot indicates
that the best solution was chosen about 50% of the time. Second, we track all of the
ancestors of the best individual from each run, and Figure 1.2(b) plots the percentage
of ancestors that were subjected to the LS. This plot also suggests that, on average,
about half of the ancestors were subjected to the LS and half were not.
1.3 Bloat Control and Local Search
The goal of this section is to analyze the effect that the LS has on program size.
We use a recently proposed bloat-free GP algorithm called neat-GP (Trujillo et al,
2016), which is based on the operator equalization (OE) family of methods (Dignum
and Poli, 2008; Silva et al, 2012).
The OE approach is to control the distribution of program sizes, defining a spe-
cific shape for the distribution and then enforcing heuristic rules to fit the population
to the goal distribution. Surprisingly, some of the best results are achieved by using
a uniform or flat distribution; this method is called Flat-OE (Silva and Vanneschi,
2011). One of the main drawbacks of OE methods has been the difficulty of effi-
ciently controlling the shape of the distribution without modifying the nature of the
1 Local Search is Underused in Genetic Programming 9
search. Recently, neat-GP was proposed to approximate the behavior of Flat-OE in
a simpler manner, exploiting well-known EC principles such as speciation, fitness
sharing and elitism (Trujillo et al, 2016). As the name suggests, neat-GP is designed
following the general principles of the NeuroEvolution of Augmenting Topologies
(NEAT) algorithm (Stanley and Miikkulainen, 2002). While NEAT has been used
in a variety of domains, its applicability for GP in general, and for bloat control in
particular, was not fully exploited until recently (Trujillo et al, 2014, 2016).
The main features of neat-GP are the following. (a) The initial population contains trees of small depth (3 levels); the NEAT approach is to start with simple (small) solutions, and to progressively build complexity (increasing size) only if the problem requires it. (b) As the search progresses, the population is divided into subsets called species, such that each species contains individuals of similar size and shape; this process is called speciation, and it protects innovation during the search. (c) The algorithm uses fitness sharing, such that individuals from very large species are penalized more than individuals that belong to smaller species. This allows the search to maintain a heterogeneous population of individuals based on their size, following Flat-OE. The only exceptions are the best individuals in each species, which are not penalized, allowing the search to maintain the best solutions. (d) Crossover operations mostly take place between individuals from the same species, such that the offspring will have a very similar size and shape to their parents. For a full description of neat-GP the reader is referred to (Trujillo et al, 2016).
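The sketch below captures the intent of point (c), including the exemption of each species' best individual; the precise penalization formulas of neat-GP are given in (Trujillo et al, 2016), so this should be read as an assumption-laden illustration rather than the algorithm itself.

def share_fitness(species, error_of):
    # Scale each individual's error by its species size, so larger species
    # are penalized more; the species' best member is exempt (elitism).
    # `species` is a list of individuals; `error_of` maps individual -> error.
    best = min(species, key=error_of)
    return [error_of(ind) if ind is best else error_of(ind) * len(species)
            for ind in species]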
1.3.1 Experiments and Results
The proposal made in this work is straightforward: combine neat-GP with the GP-LS strategy; hereafter this hybrid method will be referred to as neat-GP-LS. The experimental work centers on comparing four GP variants: standard GP, neat-GP with subtree crossover, GP-LS and neat-GP-LS. Each variant was applied to the real-world problems summarized in Table 1.2, which specifies the number of input variables and the size of each data set, along with a brief description and the appropriate reference. In this case, 10 runs of each algorithm were performed, using the parameters specified in Table 1.3. For neat-GP the parameters were set according to (Trujillo et al, 2016). For all algorithms, fitness and performance are computed using the RMSE. The algorithms are implemented using the DEAP library for Python (De Rainville et al, 2012) and are available for download².
Several changes were made to the GP-LS approach used in the previous section.
First, the LS operator is applied randomly with a 0.5 probability to every individual
in the population, based on the results shown in Figure 1.2. Second, the LS operator
was only allowed to run for 40 iterations, to reduce the total computational cost.
Third, the termination criterion was set to 100,000 function evaluations.
² http://www.tree-lab.org/index.php/resources-2/downloads/open-source-tools/item/145-neat-gp
Table 1.2 Real-world regression problems used to compare all algorithms.

Problem                                        Features  Samples  Brief description
Housing (Quinlan, 1993)                        14        506      Housing values in suburbs of Boston
Energy Cooling Load (Tsanas and Xifara, 2012)  8         768      Energy analysis using different
                                                                  building shapes simulated in Ecotect
Table 1.3 Parameters used in all experiments.

Parameter                     GP Std and GP-LS                  neat-GP
Runs                          10                                10
Population                    100                               100
Function evaluations          100,000                           100,000
Training set                  70%                               70%
Testing set                   30%                               30%
Operator probabilities:
  crossover (pc), mutation (pm)  pc = 0.9, pm = 0.1             pc = 0.7, pm = 0.3
Tree initialization           Ramped half-and-half,             Full initialization,
                              max. depth 6                      max. depth 3
Function set                  +, −, ×, sin, cos, log, sqrt, tan, tanh (both variants)
Terminal set                  Input variables of each real-world problem (both variants)
Selection for reproduction    Tournament selection of size 7    Eliminate the worst individuals
                                                                of each species
Elitism                       Best individual survives          Best individual of each species
                                                                is not penalized
Maximum tree depth            17                                17
Survival threshold            –                                 0.5
Species threshold value       –                                 h = 0.15 with α = 0.5
LS optimizer probability      Ps = 0.5                          Ps = 0.5
The algorithms are compared based on the following performance criteria: best
training fitness, test fitness of the best solution, average size (number of nodes) of all
individuals in the population, and size of the best solution. In all cases we report the median performance, summarized using convergence plots (performance with respect to function evaluations).
Figure 1.3 summarizes the results of the tested techniques on both problems, showing convergence plots for training fitness, testing fitness and average program size, each with respect to the number of fitness function evaluations, using the median over 10 independent runs. Notice that in this case, performance is more or less equal for all algorithms. This might be due to the different GP implementations used (GPLab and DEAP) or to the different parametrizations of the LS strategy. Nevertheless, when we inspect program size we clearly see the benefits of using the LS strategy. First, GP evolves substantially larger solutions for both problems, about one order of magnitude larger than the other methods. Second, surprisingly, GP-LS is able to control code growth just as well as neat-GP. In other words, GP-LS has the same parsimonious effect on evolution as an algorithm explicitly designed for bloat control. Finally, when we combine both algorithms in neat-GP-LS, the reduction in solution size is even more drastic.
Fig. 1.3 Results for the real-world problems Housing (a, c, e) and Energy Cooling Load (b, d, f), plotted with respect to total function evaluations: (a, b) fitness on training data; (c, d) fitness on test data; and (e, f) average population tree size. Plots show median values over 10 independent runs.
1.4 Local Search in Geometric Semantic GP
In this section we briefly summarize our recent results on integrating a LS operator into GSGP. Our approach was originally presented in (Castelli et al, 2015f), which we summarize first. In (Moraglio et al, 2012) two new genetic operators were proposed, Geometric Semantic Mutation (GSM) and Geometric Semantic Crossover (GSC). Both operators define syntax transformations, but their effect on program semantics is determined by the semantics of the parents within certain geometric bounds. While other semantic approaches have been proposed (Vanneschi et al, 2014), GSGP is probably the most promising given these properties. In (Castelli et al, 2015f), we extended the GSM operator to integrate a greedy LS optimizer; we call this operator GSM-LS.
The advantage of GSM-LS relative to GSM is that at every mutation event, the semantics of the offspring T_M is not restricted to lie within a ball around the parent T. Indeed, GSM sometimes produces offspring that are closer to the target semantics, but sometimes it does not. On the other hand, with GSM-LS the semantics of T_M is the closest we can get to the target from the semantics of T using the GSM construction, limited only by the particular random semantics of T_R1 and T_R2.

The first effect of GSM-LS is that it inherently improves the convergence speed of the search process, which was experimentally confirmed in (Castelli et al, 2015f). In several test cases GSGP reaches the same performance as GSGP-LS, but requires many more iterations. This is an important difference since, as we stated above, code growth in GSGP is an intrinsic property of applying the GSGP operators; i.e., the size of the offspring is always (substantially) larger than the size of the parents. This fact does not necessarily increase the computational cost of the search, for instance when using clever implementations that exploit the nature of the search operators (Castelli et al, 2015a). However, it does limit the possibility of extracting parsimonious solutions that might be amenable to further human analysis or interpretation. Therefore, by using GSM-LS in practice, it is possible to reduce the number of iterations required by the algorithm to achieve the same level of performance.
This means that the solutions can be vastly smaller (Castelli et al, 2015f). Moreover,
real-world experimental work has shown that GSGP-LS also outperforms GSGP in
overall performance on several noteworthy examples.
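The core of GSM-LS can be sketched as a linear least squares fit over program semantics: treating the mutated program as θ_0 + θ_1·T + θ_2·(T_R1 − T_R2) and fitting the θ_i to the target outputs. The exact formulation is given in (Castelli et al, 2015f); the code below is our assumption-laden reconstruction, not the published implementation.

import numpy as np

def gsm_ls_coefficients(sem_t, sem_r1, sem_r2, target):
    # Columns: constant term, parent semantics T, and the mutation
    # direction (TR1 - TR2); all arguments are output vectors of equal length.
    A = np.column_stack([np.ones_like(sem_t), sem_t, sem_r1 - sem_r2])
    theta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return theta  # offspring semantics: theta[0] + theta[1]*T + theta[2]*(TR1 - TR2)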
In (Castelli et al, 2015d), we applied GSGP-LS to the prediction of the energy performance of residential buildings, predicting the heating load and cooling load for efficient power consumption. In this work, we used a hybrid algorithm, where GSM-LS is used at the beginning of the run and GSM is used during the remainder of the search, while also performing linear scaling of the input variables. Experimental results showed that the algorithm outperformed methods such as iteratively reweighted least squares and random forests. A similar application domain was ad-
dressed in (Castelli et al, 2015c), where GSGP-LS was used for energy consumption
forecasting. Accurate forecasting can have many benefits for electric utilities, with
errors increasing costs and reducing efficiency. Here, GSGP-LS outperformed GSGP and standard GP, the former considered a state-of-the-art method in this domain.
Then, in (Castelli et al, 2015e) we applied GSGP-LS to predict the relative posi-
tion of computerized tomography slices, an important medical application for ma-
chine learning methods. In this work, GSGP-LS was compared with GSGP, standard
GP and state-of-the-art results reported on the same problem dataset. GSGP-LS out-
performed all other methods, sometimes by a large margin, as much as 22% relative to other published results. A final example is presented in Castelli et al (2015b), where GSGP-LS was used to predict the per capita violent crimes
in urban areas. In this case GSGP-LS was compared with linear regression, radial
basis function networks, isotonic regression, neural networks and support vector
machines (SVM). The only conventional algorithm that achieved equivalent perfor-
mance was SVM. These examples are meant to illustrate the benefits of integrating
LS into GSGP, with clear real-world examples of the state-of-the-art performance
that can be achieved.
1.5 Discussion
Based on the results presented and discussed in Sections 1.2, 1.3.1 and 1.4, we draw two major conclusions, limiting our discussion to the real-valued symbolic regression domain. First, integrating a numerical LS operator within a GP search brings several benefits, including improved convergence, improved (or at least not reduced) performance, and substantially reduced code growth. We stress the last point: the reduction in solution size is perhaps the most important benefit, considering the attention that bloat has received in the GP literature and the potential of GP as a machine learning paradigm to generate human-interpretable solutions. Moreover, program size is reduced even further when LS is combined with an explicit bloat control strategy. Second, the LS approach should be seen as an additional genetic operator, and not as a post-processing step. It seems that by subjecting GP individuals to a numerical optimization process, the search is able to unlock the full potential of each individual. It is common to see that small individuals usually have substantially lower fitness than larger ones; indeed, this is understood as one of the reasons for bloat to appear (Dignum and Poli, 2008). Our results make this observation more nuanced: it is not small individuals, but small individuals with sub-optimal parametrizations, that will usually perform poorly. Therefore, the LS operator should be seen as a way of extracting the full potential of each GP expression, before it is kept or filtered by selection.
These conclusions seem to be supported by our experimental evidence, but we do not feel that we have hit upon a deeply hidden truth or property of the GP search process. In fact, these observations seem relatively obvious and simple, particularly if (as we have said) we keep to symbolic regression, the most common application domain for GP. Therefore, a question comes to mind: why are LS strategies so uncommon in GP systems? Take, for instance, some of the most popular GP platforms: lilGP (Punch and Zongker, 1998), TinyGP (Poli, 2004), DEAP (De Rainville et al, 2012), GPLab (Silva and Almeida, 2005), ECJ (White, 2012), Open BEAGLE (Gagne and Parizeau, 2002), Evolving Objects (Keijzer et al, 2002), JGAP (Chen et al, 2001) and HeuristicLab (Wagner and Affenzeller, 2005). None of these software systems (to the authors' knowledge, and based on current descriptions on their respective websites) includes an explicit mechanism for applying a memetic algorithmic framework as discussed here, where
a greedy numerical optimizer performs parameter tuning for GP expressions. Some
of these algorithms include numerical constants, and associated mutations for these
constants to search for optimal values, or post-processing functions for solution sim-
plification and/or optimization. But even these features are quite uncommon, and are
not equivalent to the type of approach we describe here.
We speculate that several different reasons might be causing this, some practical and some conceptual. First, integrating this functionality might be overly complex given specific implementation details. If this is the case, we highly recommend that future versions of these or other libraries be made amenable to numerical LS operators. Second, it might be assumed that a LS operator will make the algorithm converge to local optima or make the solutions overfit the training data. While this is a valid concern, our results, in this chapter and the previous publications discussed above, suggest that this does not occur. Though we admit that further evidence should be obtained, it is reasonable to expect that a GP system should at the very least allow for a LS operator to be included. Third, some may consider the LS to be a computational bottleneck, increasing the already long running time of GP. While we have not fully explored this, we find that the evidence might actually point in the opposite direction. Consider that when given the same number of fitness function calls, GP-LS seems to outperform standard GP. If we factor in the size of the evolved solutions, it is clear that integrating a LS operator allows GP to evaluate much smaller, and by definition more efficient, GP trees (when loops are not included). Moreover, in the case of GSGP our results suggest that no extra cost is incurred by performing the LS process (Castelli et al, 2015f,e). Finally, it may be that LS is not used because the evolutionary process is expected to find both the solution syntax and its optimal parametrization. While the first three reasons are practical concerns, the final one is conceptual, regarding the nature of what GP is expected to do. We believe that the design of GP algorithms should exploit all of the tools at our disposal to search for optimal models in learning problems, and that greedy LS operators, particularly those from the well-established mathematical optimization community, should be considered an integral part of GP search.
1.6 Conclusions and Future Work
The first major conclusion to draw from this chapter, including the experimental evidence and related literature analysis, is that integrating a numerical LS operator helps to substantially improve GP-based symbolic regression, in terms of performance, convergence and, most notably, program size. The second conclusion is that numerical LS and memetic search are seldom integrated into GP systems. Numerical LS optimizers should be considered an important part of any GP-based search, allowing the search process to fully evaluate the usefulness of a particular GP tree before discarding it, since it may very well be that a low fitness value is due to a suboptimal parametrization of the solution.
Going forward, we identify several areas of opportunity for the GP community. The fact that memetic approaches have not been fully explored in the GP literature opens up several future lines of inquiry. For instance, determining the best memetic strategy for GP, Lamarckian or Baldwinian, a choice that might be domain dependent. Another topic is to study the effect of using different LS optimization algorithms. Numerical optimizers are well suited for real-valued symbolic regression, but this approach might not generalize to other domains. Moreover, while we are assuming that the GP tree is always a non-linear expression, this may not always be the case. Therefore, other numerical optimization methods should be evaluated, based on the application domain.
The combination of both syntactic and numerical LS should also be the subject
of future work, allowing us to fully exploit the local neighborhood around each so-
lution. Moreover, while we believe that computational cost is not an issue relative
to standard GP, it would still be advantageous to reduce any associated costs of the
LS operator. One way, of course, is to use very efficient LS techniques or efficient
implementations of these algorithms. Another possibility is to explore the develop-
ment of surrogate models of LS optimization. The most important effect of the LS
process is the change in relative rank for the individuals. It may be possible to derive
predictive models that allow us to determine the expected effect that the LS process
will have on a particular program.
Acknowledgements Funding for this work was provided by CONACYT Basic Science Research Project No. 178323, TecNM (México) Research Projects 5414.14-P and 5621.15-P, and by the FP7-Marie Curie-IRSES 2013 European Commission program through project ACoBSEC with contract No. 612689. The second, third and ninth authors were supported by CONACYT (México) scholarships No. 294213, No. 332554 and No. 302526.
References
Azad R, Ryan C (2014) A simple approach to lifetime learning in genetic
programming-based symbolic regression. Evolutionary computation 22(2):287–
317
Castelli M, Silva S, Vanneschi L (2015a) A c++ framework for geometric semantic
genetic programming. Genetic Programming and Evolvable Machines 16(1):73–
81
Castelli M, Sormani R, Trujillo L, Popovič A (2015b) Predicting per capita vio-
lent crimes in urban areas: an artificial intelligence approach. Journal of Ambient
Intelligence and Humanized Computing pp 1–8
Castelli M, Trujillo L, Vanneschi L (2015c) Energy consumption forecasting using
semantic-based genetic programming with local search optimizer. Intell Neuro-
science 2015
Castelli M, Trujillo L, Vanneschi L, Popovič A (2015d) Prediction of energy performance of residential buildings: A genetic programming approach. Energy and Buildings 102:67–74
Castelli M, Trujillo L, Vanneschi L, Popovič A (2015e) Prediction of relative position of CT slices using a computational intelligence system. Applied Soft Computing
Castelli M, Trujillo L, Vanneschi L, Silva S, Z-Flores E, Legrand P (2015f) Geomet-
ric semantic genetic programming with local search. In: Proceedings of the 2015
Annual Conference on Genetic and Evolutionary Computation, ACM, GECCO
’15, pp 999–1006
Chen DY, Chuang TR, Tsai SC (2001) Jgap: A java-based graph algorithms plat-
form. Softw Pract Exper 31(7):615–635
Chen X, Ong YS, Lim MH, Tan KC (2011) A multi-facet survey on memetic com-
putation. Evolutionary Computation, IEEE Transactions on 15(5):591–607
De Rainville FM, Fortin FA, Gardner MA, Parizeau M, Gagné C (2012) DEAP: A
python framework for evolutionary algorithms. In: Proceedings of the 14th An-
nual Conference Companion on Genetic and Evolutionary Computation, ACM,
GECCO ’12, pp 85–92
Dignum S, Poli R (2008) Operator Equalisation and Bloat Free GP, Springer Berlin
Heidelberg, chap Genetic Programming: 11th European Conference, EuroGP
2008. Proceedings, pp 110–121
Emmerich M, Grötzner M, Schütz M (2001) Design of graph-based evolutionary
algorithms: A case study for chemical process networks. Evol Comput 9(3):329–
354
Gagne C, Parizeau M (2002) Open BEAGLE: A new versatile C++ framework for evolutionary computations. In: Late-Breaking Papers of the 2002 Genetic and Evolutionary Computation Conference (GECCO 2002), pp 161–168
Graff M, Peña R, Medina A (2013) Wind speed forecasting using genetic program-
ming. In: 2013 IEEE Congress on Evolutionary Computation, pp 408–415
Keijzer M, Merelo JJ, Romero G, Schoenauer M (2002) Artificial Evolution: 5th In-
ternational Conference, Evolution Artificielle, Springer Berlin Heidelberg, chap
Evolving Objects: A General Purpose Evolutionary Computation Library, pp
231–242
Kommenda M, Kronberger G, Winkler S, Affenzeller M, Wagner S (2013) Effects
of constant optimization by nonlinear least squares minimization in symbolic re-
gression. In: Proceedings of the 15th Annual Conference Companion on Genetic
and Evolutionary Computation, ACM, GECCO ’13 Companion, pp 1121–1128
Koza JR (1992) Genetic Programming: On the Programming of Computers by
Means of Natural Selection. MIT Press
Koza JR (2010) Human-competitive results produced by genetic programming. Ge-
netic Programming and Evolvable Machines 11(3-4):251–284
Langdon WB, Poli R (2002) Foundations of Genetic Programming. Springer-Verlag
Lara A, Sanchez G, Coello CAC, Schütze O (2010) HCS: A new local search strat-
egy for memetic multiobjective evolutionary algorithms. IEEE Transactions on
Evolutionary Computation 14(1):112–132
Lohmann R (1991) Application of evolution strategy in parallel populations. In: Pro-
ceedings of the 1st Workshop on Parallel Problem Solving from Nature, Springer-
Verlag, pp 198–208
McConaghy T (2011) FFX: Fast, Scalable, Deterministic Symbolic Regression
Technology, Springer New York, chap Genetic Programming Theory and Prac-
tice IX, pp 235–260
Moraglio A, Krawiec K, Johnson CG (2012) Geometric semantic genetic program-
ming. In: Proceedings of the 12th international conference on Parallel Problem
Solving from Nature - Volume Part I, Springer-Verlag, pp 21–31
Neri F, Cotta C, Moscato P (eds) (2012) Handbook of Memetic Algorithms, Studies
in Computational Intelligence, vol 379. Springer
Olague G, Trujillo L (2011) Evolutionary-computer-assisted design of image opera-
tors that detect interest points using genetic programming. Image Vision Comput
29(7):484–498
Ortigosa I, López R, García J (2007) A neural networks approach to residuary re-
sistance of sailing yachts prediction. Proceedings of the international conference
on marine engineering MARINE 2007:250
Poli R (2004) TinyGP. See the Genetic and Evolutionary Computation Conference (GECCO-2004) competition at http://cswww.essex.ac.uk/staff/sml/gecco/TinyGP.html
Poli R, McPhee NF (2003a) General schema theory for genetic programming with
subtree-swapping crossover: Part i. Evol Comput 11(1):53–66
Poli R, McPhee NF (2003b) General schema theory for genetic programming with
subtree-swapping crossover: Part ii. Evol Comput 11(2):169–206
Poli R, McPhee NF (2008) Parsimony pressure made easy. In: Proceedings of
the 10th Annual Conference on Genetic and Evolutionary Computation, ACM,
GECCO ’08, pp 1267–1274
Punch B, Zongker D (1998) lil-gp 1.1. A genetic programming system. Available at
http://garage.cse.msu.edu/software/lil-gp/
Quinlan JR (1993) Combining instance-based and model-based learning. Machine
Learning 76:236–243
Silva S (2011) Reassembling operator equalisation: A secret revealed. In: Proceed-
ings of the 13th Annual Conference on Genetic and Evolutionary Computation,
ACM, GECCO ’11, pp 1395–1402
Silva S, Almeida J (2005) GPLAB - a genetic programming toolbox for MATLAB. In: Proc. of the Nordic MATLAB Conference (NMC-2003), pp 273–278
Silva S, Costa E (2009) Dynamic limits for bloat control in genetic programming
and a review of past and current bloat theories. Genetic Programming and Evolv-
able Machines 10(2):141–179
Silva S, Vanneschi L (2011) The Importance of Being Flat–Studying the Program
Length Distributions of Operator Equalisation, Springer New York, chap Genetic
Programming Theory and Practice IX, pp 211–233
Silva S, Dignum S, Vanneschi L (2012) Operator equalisation for bloat free genetic
programming and a survey of bloat control methods. Genetic Programming and
Evolvable Machines 13(2):197–238
Stanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting
topologies. Evolutionary Computation 10(2):99–127
Topchy A, Punch WF (2001) Faster genetic programming based on local gradient search of numeric leaf values. In: Proceedings of the Genetic and Evolutionary Computation Conference GECCO'01, Morgan Kaufmann, pp 155–162
Trujillo L, Legrand P, Olague G, Lévy-Véhel J (2012) Evolving estimators of the pointwise Hölder exponent with genetic programming. Inf Sci 209:61–79
Trujillo L, Muñoz L, Naredo E, Martínez Y (2014) NEAT, There's No Bloat, Lecture Notes in Computer Science, vol 8599, Springer Berlin Heidelberg, pp 174–185
Trujillo L, Muñoz L, Galván-López E, Silva S (2016) neat genetic programming: Controlling bloat naturally. Information Sciences 333:21–43
Tsanas A, Xifara A (2012) Accurate quantitative estimation of energy performance
of residential buildings using statistical machine learning tools. Energy and Build-
ings 49:560–567
Vanneschi L, Castelli M, Silva S (2014) A survey of semantic methods in genetic
programming. Genetic Programming and Evolvable Machines 15(2):195–214
Wagner S, Affenzeller M (2005) Adaptive and Natural Computing Algorithms: Pro-
ceedings of the International Conference in Coimbra, Portugal, 2005, Springer
Vienna, chap HeuristicLab: A Generic and Extensible Optimization Environment,
pp 538–541
White DR (2012) Software review: the ecj toolkit. Genetic Programming and Evolv-
able Machines 13(1):65–67
Worm T, Chiu K (2013) Prioritized grammar enumeration: Symbolic regression by
dynamic programming. In: Proceedings of the 15th Annual Conference on Ge-
netic and Evolutionary Computation, ACM, GECCO ’13, pp 1021–1028
Z-Flores E, Trujillo L, Schütze O, Legrand P (2014) Evaluating the Effects of
Local Search in Genetic Programming, Springer International Publishing, chap
EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolu-
tionary Computation V, pp 213–228
Z-Flores E, Trujillo L, Schütze O, Legrand P (2015) A local search approach to ge-
netic programming for binary classification. In: Proceedings of the 2015 Annual
Conference on Genetic and Evolutionary Computation, ACM, GECCO ’15, pp
1151–1158
Zhang M, Smart W (2004) Genetic Programming with Gradient Descent Search
for Multiclass Object Classification, Springer Berlin Heidelberg, chap Genetic
Programming: 7th European Conference, EuroGP 2004, Coimbra, Portugal, April
5-7, 2004. Proceedings, pp 399–408
... There are two limitations of existing GP-based symbolic regression methods [31]. Firstly, GP tends to evolve unnecessarily large programs, which is known as bloat. ...
... Secondly, the success of the so-called memetic algorithms (MAs 1 ) has proved the importance of local search in augmenting the global search of evolutionary algorithms (EAs) [24]. As one of EAs, local search is underused in GP (especially multi-objective GP) to further improve its search ability [31]. ...
... However, it is well-known that GP tends to evolve unnecessarily large solutions, what is called as bloat. Moreover, local search that can further improve the search ability of GP is underused in the existing works [31]. ...
Article
Full-text available
Symbolic regression is to search the space of mathematical expressions to find a model that best fits a given dataset. As genetic programming (GP) with the tree representation can represent solutions as expression trees, it is popularly-used for regression. However, GP tends to evolve unnecessarily large programs (known as bloat), causing excessive use of CPU time/memory and evolving solutions with poor generalization ability. Moreover, even though the importance of local search has been proved in augmenting the search ability of GP (termed as memetic algorithms), local search is underused in GP-based methods. This work aims to handle the above problems simultaneously. To control bloat, a multi-objective (MO) technique (NSGA-II, Non-dominant Sorting Genetic Algorithm) is selected to incorporate with GP, forming a multi-objective GP (MOGP). Moreover, three mutation-based local search operators are designed and incorporated with MOGP respectively to form three multi-objective memetic algorithms (MOMA), i.e. MOMA_MR (MOMA with Mutation-based Random search), MOMA_MF (MOMA with Mutation-based Function search) and MOMA_MC (MOMA with Mutation-based Constant search). The proposed methods are tested on both benchmark functions and real-world applications, and are compared with both GP-based (i.e. GP and MOGP) and nonGP-based symbolic regression methods. Compared with GP-based methods, the proposed methods can reduce the risk of bloat with the evolved solutions significantly smaller than GP solutions, and the local search strategies introduced in the proposed methods can improve their search ability with the evolved solutions dominating MOGP solutions. In addition, among the three proposed methods, MOMA_MR performs best in RMSE for testing, yet it consumes more training time than others. Moreover, compared with six reference nonGP-based symbolic regression methods, MOMA_MR generally performs better than or similar to them consistently.
... Nonetheless, GP was able to successfully solve problems in different domains [13]. Despite that, the impossibility to optimize the parameters of the model translates into significant limitations, such as search stagnation, bloat [19] and solutions that are poorly understandable [14,6]. This is mostly due to the fact that GP performs a highly-exploratory search, characterized by large fitness changes when a modest syntactic modification occurs and vice-versa [6]. ...
... Considering the particular case of GP, it is possible to distinguish two main methods for applying a local search (LS) strategy: apply LS either on the syntax or on the numerical parameters of the program [14,6]. ...
... The use of LS in GP for symbolic regression was also proposed in [14], where the authors integrated a LS optimizer as an additional search operator. The results showed that the use of the LS operator helps improving the convergence and performance of tree-based GP, while reducing the size (i.e., the number of nodes) of the trees. ...
Chapter
Full-text available
In this paper we continue the investigation of the effect of local search in geometric semantic genetic programming (GSGP), with the introduction of a new general local search operator that can be easily customized. We show that it is able to obtain results on par with the current best-performing GSGP with local search and, in most cases, better than standard GSGP.
... Very recently, Trujillo et al. [46] have argued that local search is necessary to allow GP to reach its full potential; these authors also note that local search seems comparatively little utilized by the GP community. ...
... This reinforces the approach of applying local search to the whole of the final population of the global search algorithm rather then just the best-performing individual produced by the global algorithm. Recently, Trujillo et al. [46] have made a similar observation for local search in the context of single-objective GP. More generally, the same sort of phenomenon has been previously seen in decision trees, which are typically trained to overfitting and then heuristically pruned to improve generalization [42]. ...
... The work described in the paper has been deliberately constrained to local search methods that change the 'shapes' of GP trees by altering sub-trees. Successful local search has also been reported using methods that introduce additional 'tuning' parameters into the tree nodes-for example, [46]. An obvious area for future work is a quantitative comparison between these different approaches to local search, or indeed possible hybridization between them. ...
Article
Full-text available
We report a series of experiments that use semantic-based local search within a multiobjective genetic programming (GP) framework. We compare various ways of selecting target subtrees for local search as well as different methods for performing that search; we have also made comparison with the random desired operator of Pawlak et al. using statistical hypothesis testing. We find that a standard steady state or generational GP followed by a carefully-designed single-objective GP implementing semantic-based local search produces models that are mode accurate and with statistically smaller (or equal) tree size than those generated by the corresponding baseline GP algorithms. The depth fair selection strategy of Ito et al. is found to perform best compared with other subtree selection methods in the model refinement.
... The general area of combining EC methods with local search is referred to as memetic algorithms, and the idea of combining local search mechanisms specifically with GP has been looked at in [12]. ...
Preprint
Full-text available
Image classification is an essential task in computer vision, which aims to categorise a set of images into different groups based on some visual criteria. Existing methods, such as convolutional neural networks, have been successfully utilised to perform image classification. However, such methods often require human intervention to design a model. Furthermore, such models are difficult to interpret and it is challenging to analyse the patterns of different classes. This paper presents a hybrid (memetic) approach combining genetic programming (GP) and Gradient-based optimisation for image classification to overcome the limitations mentioned. The performance of the proposed method is compared to a baseline version (without local search) on four binary classification image datasets to provide an insight into the usefulness of local search mechanisms for enhancing the performance of GP.
Article
We present an approach for regression problems that employs analytic continued fractions as a novel representation. Comparative computational results using a memetic algorithm are reported in this work. Our experiments included fifteen other machine learning approaches: five genetic programming methods for symbolic regression and ten further machine learning methods. The comparison of training and test generalization was performed using 94 datasets of the Penn Machine Learning Benchmarks. The statistical tests showed that the generalization results using analytic continued fractions provide a powerful and interesting new alternative in the quest for compact and interpretable mathematical models for artificial intelligence.
Conference Paper
This research aims to address the practical difficulties of computational heuristics for symbolic regression, which models data with algebraic expressions. In particular, we are motivated by cases in which the unknown target function may be best represented as a ratio of functions. We propose an alternative general approach based on a different representation of mathematical models, an analytic continued fraction representation, from which rational function models can be extracted. A memetic algorithm, a meta-heuristic optimization paradigm based on the evolution of solutions by a set of computational agents, is implemented to generate solutions in this representation. A population of computational agents with problem-domain knowledge improves feasible solutions using local search heuristics and produces models that fit the data better. In addition, the agents compete in searching for function models with fewer variables. Agent interactions are constrained by a population structure that has previously been used in several successful memetic algorithms for other combinatorial optimization problems. We utilize a tree-based population structure to improve the algorithm's consistency and performance. Data from real-world applications are used to measure the potential of our approach and benchmark its performance against other approaches to symbolic regression.
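To make the representation used in the two works above concrete, the following sketch evaluates an analytic continued fraction whose terms are linear functions of the feature vector. The term layout, coefficient values and helper names are illustrative assumptions, not the authors' exact formulation; in a memetic algorithm, the coefficients would be tuned by local search against the training data.

```python
import numpy as np

def linear_term(coeffs, x):
    # coeffs = [w_1, ..., w_d, b]: the linear function w . x + b
    return np.dot(coeffs[:-1], x) + coeffs[-1]

def continued_fraction(x, g, h):
    """Evaluate f(x) = g0(x) + h0(x) / (g1(x) + h1(x) / (g2(x) + ...)),
    where g has one more term than h and every term is linear in the
    features. The value is built from the innermost term outward.
    (A real implementation would also guard against zero denominators.)"""
    value = linear_term(g[-1], x)
    for gi, hi in zip(reversed(g[:-1]), reversed(h)):
        value = linear_term(gi, x) + linear_term(hi, x) / value
    return value

# Depth-2 model over two features, with placeholder coefficients.
x = np.array([0.5, -1.0])
g = [np.array([1.0, 0.0, 0.2])] * 3   # g0, g1, g2
h = [np.array([0.0, 1.0, 1.5])] * 2   # h0, h1
print(continued_fraction(x, g, h))
```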
Chapter
In most genetic programming systems, the candidate solution programs themselves serve as the genome upon which variation operators act. However, because of the hierarchical structure of computer programs and the syntactic constraints they must obey, it is difficult to implement variation operators that affect different parts of programs with uniform probability. This lack of uniformity can have detrimental effects on evolutionary search, such as increases in code bloat. In prior work, structured programs were linearized prior to variation in order to facilitate uniform variation. However, this necessitated syntactic repair after variation, which reintroduced non-uniformities. In this chapter we describe a new approach that uses linear genomes that are translated into hierarchical programs for execution. We present the new approach in detail and show how it facilitates both uniform variation and the evolution of programs with meaningful structure.
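The essence of such a translation step can be sketched in a few lines: a flat instruction list is folded into a nested program using operator arities, and an operator that arrives before enough arguments exist is simply skipped, so every linear genome maps to a valid program without a separate repair pass. The instruction set and the skip rule below are illustrative assumptions, not the chapter's exact scheme.

```python
# Arity table for a toy instruction set (illustrative).
ARITY = {'add': 2, 'mul': 2, 'neg': 1, 'x': 0, 'one': 0}

def translate(genome):
    """Fold a flat genome (a list of instruction names) into a nested
    program. Operators pop their arguments from a stack; under-supplied
    operators are skipped rather than repaired, so translation always
    succeeds."""
    stack = []
    for token in genome:
        k = ARITY[token]
        if len(stack) < k:
            continue  # not enough arguments yet: skip this operator
        args = [stack.pop() for _ in range(k)]
        stack.append((token, args) if k else token)
    return stack[-1] if stack else None

print(translate(['x', 'one', 'add', 'neg']))
# -> ('neg', [('add', ['one', 'x'])])
```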
Chapter
The recent Crossover Bias theory has shown that bloat in Genetic Programming can be caused by the proliferation of small unfit individuals in the population. Inspired by this theory, Operator Equalisation is the most recent and successful bloat control method available. A recent work attempted to replicate the evolutionary dynamics of Operator Equalisation by joining two key ingredients found in older and newer bloat control methods. However, the resulting dynamics were very different from those expected, which prompted further investigation into the reasons that make Operator Equalisation so successful. It was revealed that, at least for complex symbolic regression problems, the distribution of program lengths enforced by Operator Equalisation is nearly flat, contrasting with the peaky and well-delimited distributions of the other approaches. In this work we study the importance of having flat program length distributions for bloat control. We measure the flatness of the distributions found in previous and new Operator Equalisation variants and correlate it with the amount of search performed by each approach. We also analyze where this search occurs and how bloat correlates with these properties. We conclude by presenting a possible explanation for the unique behavior of Operator Equalisation.
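The acceptance rule at the heart of a flat Operator Equalisation variant can be pictured with a short sketch: offspring are admitted only while the histogram bin for their program length has free capacity, which pushes the realized length distribution toward a flat shape. Bin width and capacity here are illustrative parameters, and real implementations add exceptions, such as always admitting a new best-of-run individual.

```python
import random

def flat_oe_accept(length, bin_counts, bin_width=5, capacity=3):
    """Admit an offspring only if the histogram bin for its program
    length is not yet full; equal capacities across bins flatten the
    population's length distribution. Simplified sketch."""
    b = length // bin_width
    if bin_counts.get(b, 0) < capacity:
        bin_counts[b] = bin_counts.get(b, 0) + 1
        return True
    return False  # rejected: the system would generate another offspring

# Toy usage: filter 50 random program lengths through the equaliser.
lengths = [random.randint(1, 40) for _ in range(50)]
bins = {}
kept = [n for n in lengths if flat_oe_accept(n, bins)]
print(len(kept), sorted(bins.items()))
```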
Article
A major challenge facing all law-enforcement organizations is to accurately and efficiently analyze the growing volumes of crime data in order to extract useful knowledge for decision makers. This is an increasingly important task, considering the fast growth of urban populations in most countries. In particular, to reconcile urban growth with the need for security, a fundamental goal is to optimize the allocation of law-enforcement resources. Moreover, optimal allocation can only be achieved if we can predict the incidence of crime within different urban areas. To answer this call, in this paper we propose an artificial intelligence system for predicting per capita violent crimes in urban areas starting from socio-economic data, law-enforcement data and other crime-related data obtained from different sources. The proposed framework blends a recently developed version of genetic programming that uses the concept of semantics during the search process with a local search method. To analyze the appropriateness of the proposed computational method for crime prediction, different urban areas of the United States have been considered. Experimental results confirm the suitability of the proposed method for addressing the problem at hand. In particular, the proposed method produces a lower error than existing state-of-the-art techniques and is particularly suitable for analyzing large amounts of data, an extremely important feature in a world that is currently moving towards the development of smart cities.
Article
One of the most common techniques in radiology is the computerized tomography (CT) scan. Automatically determining the relative position of a single CT slice within the human body can be very useful. It enables efficient retrieval of slices from the same body region taken in other volume scans and provides useful information to the non-expert user. This work addresses the problem of determining which portion of the body is shown by a stack of axial CT image slices. To tackle this problem, this work proposes a computational intelligence system that combines semantics-based operators for Genetic Programming with a local search algorithm, coupling the exploration ability of the former with the exploitation ability of the latter. This allows the search process to quickly converge towards (near-)optimal solutions. Experimental results, using a large database of CT images, have confirmed the suitability of the proposed system for predicting the relative position of a CT slice. In particular, the new method achieves a median localization error of 3.4 cm on unseen data, outperforming standard Genetic Programming and other techniques that have been applied to the same dataset. In summary, this paper makes two contributions: (i) in the radiology domain, the proposed system outperforms current state-of-the-art techniques; (ii) from the computational intelligence perspective, the results show that including a local searcher in Geometric Semantic Genetic Programming can speed up convergence without degrading test performance.
Conference Paper
Since its introduction, Geometric Semantic Genetic Programming (GSGP) has aroused the interest of numerous researchers, and several studies have demonstrated that GSGP is able to effectively fit the training data by means of small variation steps, which also have the effect of limiting overfitting. In order to speed up the search process, in this paper we propose a system that integrates a local search strategy into GSGP (called GSGP-LS). Furthermore, we present a hybrid approach that combines GSGP and GSGP-LS, aimed at exploiting both the optimization speed of GSGP-LS and the ability of GSGP to limit overfitting. The experimental results we present, performed on a set of complex real-life applications, show that GSGP-LS achieves the best training fitness while converging very quickly, but severely overfits. On the other hand, GSGP converges slowly relative to the other methods, but is essentially unaffected by overfitting. The best overall results were achieved with the hybrid approach, allowing the search to converge quickly while also exhibiting a noteworthy ability to limit overfitting. These results are encouraging, and suggest that future GSGP algorithms should focus on finding the correct balance between the greedy optimization of a local search strategy and the more robust geometric semantic operators.
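One way to picture how a local search step can be embedded in GSGP, following the formulation reported in the GSGP-LS literature as we read it, is to extend the standard geometric semantic mutation $T'(x) = T(x) + ms\,(R_1(x) - R_2(x))$ with tunable linear parameters:

$$T'(x) = \alpha_0 + \alpha_1\, T(x) + \alpha_2\, \big(R_1(x) - R_2(x)\big),$$

where $T$ is the parent, $R_1$ and $R_2$ are random trees, and the $\alpha_i$ are fit by least squares on the training semantics, so every mutation event embeds a one-step local search over a small linear subspace.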
Conference Paper
In standard genetic programming (GP), a search is performed over a syntax space defined by the set of primitives, looking for the best expressions that minimize a cost function based on a training set. However, most GP systems lack a numerical optimization method to fine-tune the implicit parameters of each candidate solution. Instead, GP relies on more exploratory search operators at the syntax level. This work proposes a memetic GP tailored for binary classification problems. In the proposed method, each node in a GP tree is weighted by a real-valued parameter; these parameters are then numerically optimized using a continuous transfer function, with the Trust Region algorithm as the local search method. Experimental results show that potential classifiers produced by GP are improved by the local searcher, and hence the overall search improves, achieving significant performance gains that are competitive with state-of-the-art methods on well-known benchmarks.
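A minimal sketch of this kind of node-weight refinement is given below, assuming a fixed toy expression and SciPy's trust-region least-squares solver in place of the paper's exact setup (the continuous transfer function used for classification is omitted, and all names and data are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

# Toy weighted tree: each node's output is scaled by a parameter
# theta_i, mirroring the node-weighting scheme described above.
# Expression: theta0 * (theta1 * x0 + theta2 * x1)
def weighted_tree(theta, X):
    return theta[0] * (theta[1] * X[:, 0] + theta[2] * X[:, 1])

def residuals(theta, X, y):
    return weighted_tree(theta, X) - y

# Hypothetical training data; in a GP run, X and y come from the
# training set and the expression from the individual being refined.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1]

theta0 = np.ones(3)  # weights start at 1, leaving the tree unchanged
fit = least_squares(residuals, theta0, args=(X, y), method='trf')
print(fit.x)         # refined node weights
```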
Article
Energy consumption forecasting (ECF) is an important policy issue in today's economies. An accurate ECF has great benefits for electric utilities, since both negative and positive errors lead to increased operating costs. The paper proposes a semantics-based genetic programming framework to address the ECF problem. In particular, we propose a system that finds (quasi-)perfect solutions with high probability and that generates models able to produce near-optimal predictions on unseen data as well. The framework blends a recently developed version of genetic programming that integrates semantic genetic operators with a local search method. The main idea in combining semantic genetic programming and a local searcher is to couple the exploration ability of the former with the exploitation ability of the latter. Experimental results confirm the suitability of the proposed method for predicting energy consumption. In particular, the system produces a lower error than existing state-of-the-art techniques used on the same dataset. More importantly, this case study has shown that including a local searcher in the geometric semantic genetic programming system can speed up the search process and can result in fitter models that produce accurate forecasts on unseen data as well.
Article
Energy consumption has long been emphasized as an important policy issue in today's economies. In particular, the energy efficiency of residential buildings is considered a top priority of a country's energy policy. The paper proposes a genetic programming-based framework for estimating the energy performance of residential buildings. The objective is to build a model able to predict the heating load and the cooling load of residential buildings. An accurate prediction of these parameters facilitates better control of energy consumption and, moreover, helps in choosing the energy supplier that best fits the energy needs, an important issue in the deregulated energy market. The proposed framework blends a recently developed version of genetic programming with a local search method and linear scaling. The resulting system enables us to build a model that produces an accurate estimation of both considered parameters. Extensive simulations on 768 diverse residential buildings confirm the suitability of the proposed method for predicting heating load and cooling load. In particular, the proposed method is more accurate than existing state-of-the-art techniques.
Article
Geometric semantic operators are new and promising genetic operators for genetic programming. They have the property of inducing a unimodal error surface for any supervised learning problem, i.e., any problem consisting of finding the match between a set of input data and known target values (like regression and classification). Thanks to an efficient implementation of these operators, it was possible to apply them to a set of real-life problems, obtaining very encouraging results. We have now made this implementation publicly available as open source software, and here we describe how to use it. We also reveal details of the implementation and investigate its efficiency in terms of running time and memory consumption, both theoretically and experimentally. The source code and documentation are available for download at http://gsgp.sourceforge.net.
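The efficiency trick such an implementation relies on can be sketched as follows: each individual is represented by its semantics (the vector of its outputs on the training cases) plus references to its parents, so the geometric semantic operators never materialize the exponentially growing syntax trees. The sigmoid bounding of the random trees follows common GSGP practice; the rest of the sketch is an illustrative assumption rather than the library's actual code.

```python
import numpy as np

def sigmoid(v):
    # Squash a random tree's raw outputs into [0, 1], as is common
    # in GSGP implementations of geometric semantic mutation.
    return 1.0 / (1.0 + np.exp(-v))

def gsm_offspring(parent_sem, r1_raw, r2_raw, ms=0.1):
    """Geometric semantic mutation computed directly on semantics
    vectors; the offspring is stored as this vector plus bookkeeping
    pointers to its parents, and its syntax tree is never built."""
    return parent_sem + ms * (sigmoid(r1_raw) - sigmoid(r2_raw))

# Toy usage over 5 training cases.
rng = np.random.default_rng(1)
parent = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
r1, r2 = rng.normal(size=5), rng.normal(size=5)
print(gsm_offspring(parent, r1, r2))
```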
Conference Paper
Genetic programming (GP) is an evolutionary computation paradigm for the automatic induction of syntactic expressions. In general, GP performs an evolutionary search within the space of possible program syntaxes for the expression that best solves a given problem. The most common application domain for GP is symbolic regression, where the goal is to find the syntactic expression that best fits a given set of training data. However, canonical GP employs only a syntactic search, and is thus intrinsically unable to efficiently adjust the (implicit) parameters of a particular expression. This work studies a Lamarckian memetic GP that incorporates a local search (LS) strategy to refine GP individuals expressed as syntax trees. In particular, a simple parametrization of GP trees is proposed, and different heuristics for determining which individuals should be subject to LS are tested over several benchmark and real-world problems. The experimental results provide needed insight into this insufficiently studied aspect of GP, suggesting promising directions for future work aimed at developing new memetic GP systems.
Article
Bloat is one of the most widely studied phenomena in Genetic Programming (GP); it is normally defined as an increase in mean program size without a corresponding improvement in fitness. Several theories have been proposed in the specialized GP literature that explain why bloat occurs. In particular, the Crossover-Bias Theory states that the cause of bloat is that the distribution of program sizes during evolution is skewed in a way that encourages bloat to appear, by punishing small individuals and favoring larger ones. Therefore, several bloat control methods have been proposed that attempt to explicitly control the size distribution of programs within the evolving population. This work proposes a new bloat control method called neat-GP, which implicitly shapes the program size distribution during a GP run. neat-GP is based on two key elements: (a) the NeuroEvolution of Augmenting Topologies (NEAT) algorithm, a robust heuristic originally developed to evolve neural networks; and (b) the Flat Operator Equalisation bloat control method, which explicitly shapes the program size distribution toward a uniform or flat shape. Experimental results are encouraging in two domains, symbolic regression and classification of real-world data: neat-GP can curtail the effects of bloat without sacrificing performance, outperforming both standard GP and the Flat-OE method without incurring the computational overhead reported for some state-of-the-art bloat control methods.