Evolutionary tree genetic programming

Abstract

We introduce a clustering-based method of subpopulation management in genetic programming (GP) called Evolutionary Tree Genetic Programming (ETGP). The biological motivation behind this work is the observation that natural evolution follows a tree-like phylogenetic pattern. Our goal is to simulate similar behavior in artificial evolutionary systems such as GP. To test our model we use three common GP benchmarks: the Ant Algorithm, 11-Multiplexer, and Parity problems. The performance of the ETGP system is empirically compared to that of the standard GP system. Code size and variance are consistently reduced by a small but statistically significant percentage, resulting in a slight speedup in the Ant and 11-Multiplexer problems, while the same comparisons on the Parity problem are inconclusive.
Ján Antolík
Department of Software Engineering
Charles University
Malostranské nám. 25
118 00 Praha 1, Czech Republic
jan.antolik@matfyz.cz

William H. Hsu
Department of Computing and Information Sciences
Kansas State University
234 Nichols Hall
Manhattan, Kansas 66506-2302
bhsu@cis.ksu.edu
Categories and Subject Descriptors: I.2.3 [Computing Methodologies]: Artificial Intelligence Problem Solving, Control Methods, and Search
General Terms: Algorithms
Keywords: genetic programming
1. INTRODUCTION
In the field of evolutionary computation, the problem of managing subpopulations in order to improve the convergence efficiency of a genetic algorithm (GA) or genetic programming (GP) system has proven challenging. Work on this problem has led to some research on biologically plausible models of speciation. This paper describes an approach using a clustering method inspired by evolutionary biology to reorganize subpopulations in genetic programming, with the goal of producing more highly fit individuals within a set number of fitness evaluations.
2. MOTIVATION
An interesting observation about natural evolution is that it proceeds in a tree-like pattern. Scientists believe that at the beginning of life on Earth only a single, extremely simple life form existed. The members of this life form gradually evolved until two distinct species separated. Over the millennia of terrestrial evolution the same process was repeated in all the once-existing species, leading to the creation of a structure known as the Tree of Life. This work represents an attempt to simulate this process in a very simplified manner. It is motivated by the hypothesis that modeling a phylogenetically well-structured evolutionary process can increase the efficiency of speciation and adaptation in a GP system. Several existing EA systems have already been motivated by biological foundations of species formation [2], such as the adaptive landscape and shifting balance theory or the more recent theory of punctuated equilibria.

Copyright is held by the author/owner.
GECCO’05, June 25–29, 2005, Washington, DC, USA.
ACM 1-59593-010-8/05/0006.
3. EVOLUTIONARY TREE GENETIC PROGRAMMING
3.1 Basic algorithm
Our system consists of a population P that is separated into a number of subpopulations S_i. The ETGP algorithm starts with a single large subpopulation. After every cf generations, where cf is a parameter that we will refer to as the clustering frequency, we divide each subpopulation into a new set of subpopulations. The sizes of the subpopulations are constrained to be proportional to the average fitness of the individuals they contain.
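The proportional-size constraint can be illustrated with a minimal sketch. The paper does not say how fractional sizes are rounded or whether larger fitness values are better, so the function name `allocate_sizes`, the largest-remainder rounding, and the "larger is better, non-negative" fitness convention are all assumptions made for illustration only.

```python
from typing import List


def allocate_sizes(avg_fitness: List[float], total_size: int) -> List[int]:
    """Split total_size slots among subpopulations in proportion to average fitness.

    ASSUMPTIONS (not from the paper): fitness values are non-negative with
    larger meaning better, and rounding uses the largest-remainder rule.
    """
    if any(f < 0 for f in avg_fitness):
        raise ValueError("average fitness values must be non-negative")
    total_fitness = sum(avg_fitness)
    if total_fitness <= 0:
        raise ValueError("at least one subpopulation needs positive fitness")

    # Ideal (fractional) share of each subpopulation.
    quotas = [total_size * f / total_fitness for f in avg_fitness]
    sizes = [int(q) for q in quotas]

    # Hand the slots lost to truncation to the largest fractional remainders.
    leftover = total_size - sum(sizes)
    by_remainder = sorted(
        range(len(quotas)), key=lambda i: quotas[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes
```

For example, three subpopulations with average fitness 10, 30, and 60 sharing 1000 individuals would receive 100, 300, and 600 slots under this scheme.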
The division of a subpopulation S proceeds in two steps. First, we determine whether S is large enough to undergo division. If it is, we cluster S according to some metric. We then insert each of the resulting clusters into the new population as a new subpopulation in place of the original subpopulation S. The reader can find a detailed description of this process in [1].
As the clustering method for the ETGP algorithm, we use Hierarchical Agglomerative Clustering (HAC) with minimum variance as the agglomerative clustering criterion. We have introduced some specific modifications into the original HAC algorithm. In summary, the final modified clustering algorithm is:
find clusters A and B such that the following property holds:
    |A| ≥ |B| ∧ ∀x ∈ S: (|x| ≤ |B| ∨ x = A)
if (|A| + |B| ≥ α Σ_{x ∈ S} |x|) ∧ (|A| / |B| > β)
then
    let U := A and V := B
    for each cluster C ∈ S except clusters A and B do
        if dist(A, C) ≤ dist(B, C) then U := U ∪ C else V := V ∪ C
    return (U, V)
else continue with the next level of clustering
where S refers to the current set of clusters, the parameter α was set to 0.7, and the parameter β was set to 0.4 in all the experiments.
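The split test applied at one level of the clustering can be sketched as follows. This is an illustrative reading, not the authors' ECJ implementation: the names `try_split` and `centroid_dist` are hypothetical, clusters are modeled as non-empty lists of numbers, and the distance function is a stand-in for whatever metric the system uses on GP trees. The size-ratio test is written as |B| / |A| > β on the assumption that it is meant as a balance check; the typeset fraction is ambiguous in the extracted text. The comparison between the two distances (≤ here) is likewise an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Cluster = List[float]  # stand-in for a list of GP individuals


def centroid_dist(x: Cluster, y: Cluster) -> float:
    """Toy distance between clusters: gap between their means (illustration only)."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


def try_split(
    clusters: Sequence[Cluster],
    dist: Callable[[Cluster, Cluster], float],
    alpha: float = 0.7,
    beta: float = 0.4,
) -> Optional[Tuple[Cluster, Cluster]]:
    """Return (U, V) if this clustering level passes the split test, else None."""
    if len(clusters) < 2:
        return None

    # A and B are the two largest clusters, with |A| >= |B|.
    order = sorted(range(len(clusters)), key=lambda i: len(clusters[i]), reverse=True)
    a_idx, b_idx = order[0], order[1]
    a, b = clusters[a_idx], clusters[b_idx]

    total = sum(len(c) for c in clusters)
    covers_enough = len(a) + len(b) >= alpha * total
    # ASSUMPTION: the ratio test is a balance check on the two largest clusters.
    balanced = len(b) / len(a) > beta
    if not (covers_enough and balanced):
        return None  # caller continues with the next level of clustering

    # Every remaining cluster joins whichever of A or B is nearer.
    u, v = list(a), list(b)
    for i, c in enumerate(clusters):
        if i in (a_idx, b_idx):
            continue
        if dist(a, c) <= dist(b, c):
            u.extend(c)
        else:
            v.extend(c)
    return u, v
```

With the clusters `[[1.0, 1.2, 0.9, 1.1], [5.0, 5.1, 4.9], [1.5], [4.5]]`, the two largest clusters hold 7 of 9 individuals and have a size ratio of 0.75, so the test passes and the two singletons are absorbed by their nearer neighbor. A level where one cluster dominates, or where the two largest clusters cover too little of the subpopulation, returns `None`.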
4. RESULTS
First, we briefly discuss the methodology used in our experiments. We conducted a large number of different tests and, to make them easier to follow, divided them into three series. Because of the scope of this paper, only the first two are discussed. All tests were repeated 100 times, and the averages over these 100 runs are reported. Everything was implemented as an extension of the Evolutionary Computation in Java (ECJ) package by Luke [3]. The 11-multiplexer, Artificial Ant (Santa Fe Trail), and Odd 11-Parity problems were selected as benchmark problems.
4.1 First experiment series
The goal of the first series of experiments was to find the optimal values of the branch factor and clustering frequency of the ETGP method. We conducted two experiments. In each, one of these parameters was varied while the other was fixed to a predetermined value; the interaction between the two parameters was therefore not explored. We also used only the 11-multiplexer and Artificial Ant problems in these tests. The following table summarizes the best values of these parameters for both benchmark problems:
                  Clustering frequency   Branch factor
Artificial Ant    3, 4                   0.2, 0.25
11-multiplexer    7, 10                  0.2, 0.3, 0.4
Selected value    7                      0.2

Table 1: Best performance values
Based on this table, we decided that a reasonable compromise is to use 0.2 as the branch factor and 7 as the clustering frequency in all subsequent experiments.
4.2 Second experiment series
The second series of experiments compares the ETGP algorithm with the basic version of the GP system. In these tests we also switched on two additional features of the ETGP algorithm that are not present in standard GP systems: mutation and depth control. We have not mentioned the second of these modifications so far, because its motivation follows from a constructive argument that did not fit into our motivation section given the scope of this paper. We therefore only briefly describe what the depth control modification does. Two parameters are related to this extension. The first, Start depth S, defines the maximal allowed depth of any GP tree in the population at the beginning of evolution. The second, End depth E, defines the maximal depth of GP trees in the population at the end of evolution. The actual maximal allowed depth m in generation g is computed from these two parameters as

    m = S + (E − S) · g / numGen

where numGen stands for the maximal number of generations in the given run.
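The depth limit is a linear interpolation between the start and end depths over the course of the run. A minimal sketch follows; the function names are hypothetical, the parameter values in the usage note are illustrative, and flooring the result to an integer depth is an assumption, since the text does not say how fractional limits are handled.

```python
import math


def max_allowed_depth(generation: int, num_gen: int,
                      start_depth: int, end_depth: int) -> float:
    """Linear depth schedule m = S + (E - S) * g / numGen."""
    if num_gen <= 0:
        raise ValueError("num_gen must be positive")
    return start_depth + (end_depth - start_depth) * generation / num_gen


def depth_limit(generation: int, num_gen: int,
                start_depth: int, end_depth: int) -> int:
    """Integer depth limit; ASSUMPTION: fractional values are rounded down."""
    return math.floor(max_allowed_depth(generation, num_gen, start_depth, end_depth))
```

With illustrative values S = 7, E = 17, and numGen = 50, the schedule gives m = 7 at generation 0, m = 12 at generation 25, and m = 17 at generation 50.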
In our experiments we also explored the relationship between the mutation ratio and the Start depth parameter. The mutation ratio was explored in the range [0.05, 0.4] with a step of 0.05. For depth growth control, the Start depth parameter was varied in the range [3, 17] with a step of 2. Because of the scope of this paper, we summarize the results of these experiments in Table 2. Note that the table reports only the best results with respect to the two varied parameters.
                  ETGP                      GP
                  Best hits   MR     SD     Best hits   Speedup
Artificial Ant    74.5        0.35   7      71.01       +4.91%
11-multiplexer    1996.06     0.25   7      1960.5      +1.81%
11-parity         559.11      0.05   7      562.18      -0.5%

Table 2: This table summarizes the performance of the ETGP and standard GP algorithm over the three benchmark problems. The results for the ETGP algorithm are those achieved with the best combination of Start Depth (SD) and Mutation Ratio (MR) parameters.
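The Speedup column is consistent with the relative difference in best hits, (ETGP − GP) / GP. This reading is inferred from the reported numbers rather than stated in the text, so the short check below should be taken as a sketch under that assumption; the function name `relative_gain` is hypothetical.

```python
def relative_gain(etgp_hits: float, gp_hits: float) -> float:
    """Relative difference in best hits, in percent (assumed meaning of 'Speedup')."""
    return 100.0 * (etgp_hits - gp_hits) / gp_hits


# Best-hits values taken from Table 2: (ETGP, GP).
table2 = {
    "Artificial Ant": (74.5, 71.01),
    "11-multiplexer": (1996.06, 1960.5),
    "11-parity": (559.11, 562.18),
}

gains = {name: relative_gain(etgp, gp) for name, (etgp, gp) in table2.items()}
```

Under this definition the three gains come out at roughly +4.91%, +1.81%, and -0.55%, matching the table to the precision it reports.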
5. CONCLUSIONS
The focus of this work was to explore the possibility of simulating the evolution of species as we observe it in nature. We extended the basic GP algorithm by incorporating dynamic formation of species. The goal was to build a system in which the evolution of species forms an evolutionary tree, just as in natural evolution. As the mechanism for separating new species we employed a clustering technique that divides a given subpopulation into two new ones according to the genetic similarity of the individuals. We then compared the system with the standard GP algorithm. As the previous section indicates, the speedup in the convergence curve, while statistically significant and robust, is minor. A third experiment series, which provided some insights into the dynamics of species formation in the ETGP system, is omitted for brevity. The overall conclusion following from these experiments is that while the system does successfully use clustering to manage subpopulations, it does not fully simulate the natural properties of 'tree-like' evolution.
6. REFERENCES
[1] J. Antolík. Evolutionary tree genetic programming. Master's thesis, Computing and Information Sciences department, Kansas State University, Apr. 2004.
[2] S. Baluja. A massively distributed parallel genetic algorithm, 1992.
[3] S. Luke. Evolutionary computation in Java. http://www.cs.umd.edu/projects/plus/ec/ecj/, 2001.