
Evolutionary Tree Genetic Programming

Ján Antolík

Department of Software Engineering

Charles University

Malostranské nám. 25

118 00 Praha 1, Czech Republic

jan.antolik@matfyz.cz

William H. Hsu

Department of Computing and Information

Kansas State University

234 Nichols Hall

Manhattan, Kansas 66506-2302

bhsu@cis.ksu.edu

ABSTRACT

We introduce a clustering-based method of subpopulation management in genetic programming (GP) called Evolutionary Tree Genetic Programming (ETGP). The biological motivation behind this work is the observation that natural evolution follows a tree-like phylogenetic pattern. Our goal is to simulate similar behavior in artificial evolutionary systems such as GP. To test our model we use three common GP benchmarks: the Artificial Ant, 11-Multiplexer, and Parity problems. The performance of the ETGP system is empirically compared to that of a standard GP system. Code size and variance are consistently reduced by a small but statistically significant percentage, resulting in a slight speedup on the Ant and 11-Multiplexer problems, while the same comparisons on the Parity problem are inconclusive.

Categories and Subject Descriptors: I.2.3 [Computing Methodologies]: Artificial Intelligence Problem Solving, Control Methods, and Search

General Terms: Algorithms

Keywords: genetic programming

1. INTRODUCTION

In the field of evolutionary computation, the problem of managing subpopulations in order to improve the convergence efficiency of a genetic algorithm (GA) or genetic programming (GP) system has proven challenging. Work on this problem has led to some research on biologically plausible models of speciation. This paper describes an approach using a clustering method inspired by evolutionary biology to reorganize subpopulations in genetic programming, with the goal of producing more highly fit individuals within a set number of fitness evaluations.

2. MOTIVATION

An interesting observation about natural evolution is that it proceeds in a tree-like pattern. Scientists believe that at the beginning of life on Earth only a single, extremely simple life form existed. The members of this life form gradually evolved until two distinct species separated. During the millennia of terrestrial evolution the same process was repeated in all the once-existing species, leading to the creation of a structure known as the Tree of Life. This work represents an attempt to simulate this process in a very simplified manner. It is motivated by the hypothesis that modeling a phylogenetically well-structured evolutionary process can increase the efficiency of speciation and adaptation in a GP system. Several existing EA systems have already been motivated by biological foundations of species formation [2], such as the adaptive landscape and shifting balance theory or the more recent theory of punctuated equilibria.

Copyright is held by the author/owner.
GECCO'05, June 25–29, 2005, Washington, DC, USA.
ACM 1-59593-010-8/05/0006.

3. EVOLUTIONARY TREE GENETIC PROGRAMMING

3.1 Basic algorithm

Our system consists of a population P that is separated into a number of subpopulations $S_i$. The ETGP algorithm starts with a single large subpopulation. After every cf generations, where cf is a parameter that we will refer to as the clustering frequency, we divide each subpopulation into a new set of subpopulations. The sizes of the subpopulations are constrained to be proportional to the average fitness of the individuals they contain.
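The size constraint can be illustrated with a minimal sketch. It assumes that fitness is non-negative and that higher values are better; the function name `allocate_sizes` and the largest-remainder rounding are illustrative choices, not details taken from the ETGP implementation.

```python
def allocate_sizes(avg_fitness, total_size):
    """Split total_size slots among subpopulations in proportion to
    their average fitness (assumed non-negative, higher is better)."""
    total_fitness = sum(avg_fitness)
    if total_fitness <= 0:
        raise ValueError("average fitness values must sum to a positive number")

    # Ideal (fractional) share of each subpopulation.
    raw = [total_size * f / total_fitness for f in avg_fitness]
    # Start from the floor of each share.
    sizes = [int(r) for r in raw]

    # Hand out the remaining slots to the largest fractional remainders.
    leftover = total_size - sum(sizes)
    by_remainder = sorted(range(len(raw)),
                          key=lambda i: raw[i] - sizes[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Three subpopulations sharing 1000 individuals.
    print(allocate_sizes([10.0, 30.0, 60.0], 1000))
```

The rounding step only guarantees that the sizes add up to the total population size; any other tie-breaking rule would serve equally well.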

The division of a subpopulation S proceeds in two steps. First, we determine whether S is large enough to undergo division. If it is, we cluster S according to some metric. We then insert each of the resulting clusters into the new population as a new subpopulation, in place of the original subpopulation S. The reader can find a detailed description of this process in [1].
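The two-step division can be sketched as a small driver. The size test used here (a fixed `min_size` threshold) and the pluggable `split` callback are assumptions made for illustration; the actual criteria are described in [1].

```python
def should_cluster(generation, cf):
    """True on every cf-th generation (cf is the clustering frequency)."""
    return generation > 0 and generation % cf == 0


def divide_population(population, split, min_size):
    """Return a new population in which every sufficiently large
    subpopulation is replaced by the clusters produced by `split`.

    population : list of subpopulations (each a list of individuals)
    split      : hypothetical callback returning a tuple of clusters,
                 or None when no acceptable division exists
    min_size   : hypothetical threshold for "large enough to divide"
    """
    new_population = []
    for sub in population:
        # Step 1: decide whether the subpopulation may be divided.
        if len(sub) < min_size:
            new_population.append(sub)
            continue
        # Step 2: cluster it and insert the clusters in its place.
        parts = split(sub)
        if parts is None:
            new_population.append(sub)
        else:
            new_population.extend(list(p) for p in parts if p)
    return new_population


if __name__ == "__main__":
    # Toy split that simply cuts a subpopulation in half.
    halve = lambda s: (s[:len(s) // 2], s[len(s) // 2:])
    print(divide_population([[1, 2, 3, 4], [5]], halve, min_size=2))
```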

As the clustering method for the ETGP algorithm, we use Hierarchical Agglomerative Clustering (HAC) with minimum variance as the agglomerative clustering criterion. We have introduced some specific modifications into the original HAC algorithm. In summary, the final modified clustering algorithm is:

find clusters A and B such that the following property holds:
    $|A| \ge |B| \;\wedge\; \forall x \in S : (|x| \le |B| \,\vee\, x \equiv A)$
if $\left(|A| + |B| \ge \alpha \sum_{x \in S} |x|\right) \wedge \left(\frac{|A|}{|B|} > \beta\right)$
then
    let $U := A$ and $V := B$
    for each cluster $C \in S$ except clusters A and B do
        if $\mathrm{dist}(A, C) \le \mathrm{dist}(B, C)$ then $U := U \cup C$ else $V := V \cup C$
    return (U, V)
else continue with the next level of clustering

where $S$ refers to the current set of clusters, the parameter $\alpha$ was set to 0.7, and the parameter $\beta$ was set to 0.4 in all the experiments.
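The following is a minimal sketch of this modified clustering step, under several stated assumptions. Individuals are represented as numeric vectors, whereas the source only says that clustering uses "some metric". Ward's minimum-variance distance is used both to choose merges and as dist(·, ·). The ratio test is read as smaller-over-larger, |B|/|A| > β, so that β acts as a balance threshold. The function names are hypothetical.

```python
def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def ward_dist(a, b):
    """Increase in within-cluster variance caused by merging a and b
    (Ward's minimum-variance criterion)."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def etgp_split(points, alpha=0.7, beta=0.4):
    """Run agglomerative clustering bottom-up and stop at the first
    level where the two largest clusters A and B pass both tests.
    Returns (U, V), or None if no level qualifies."""
    clusters = [[p] for p in points]      # every individual starts alone
    total = len(points)                   # sum of |x| over all clusters
    while len(clusters) >= 2:
        ranked = sorted(clusters, key=len, reverse=True)
        a, b = ranked[0], ranked[1]       # the two largest clusters
        # ASSUMPTION: the ratio is taken as |B| / |A| (smaller over larger).
        if len(a) + len(b) >= alpha * total and len(b) / len(a) > beta:
            u, v = list(a), list(b)
            for c in ranked[2:]:
                # Attach each remaining cluster to the nearer of A and B.
                target = u if ward_dist(a, c) <= ward_dist(b, c) else v
                target.extend(c)
            return u, v
        # Otherwise merge the pair with the smallest variance increase.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda ij: ward_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return None


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(etgp_split(pts))
```

This brute-force version recomputes all pairwise distances at every level, which is adequate for illustration but not for large subpopulations.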

4. RESULTS

First, we briefly discuss the methodology used in our experiments. We conducted a large number of different tests and, to make them easier to follow, divided them into three series. Because of the limited scope of this paper, only the first two are discussed. All tests were repeated 100 times, and the averages over these 100 runs are reported. Everything was implemented as an extension of the Evolutionary Computation in Java (ECJ) package by Luke [3]. As benchmark problems we selected the 11-multiplexer, Artificial Ant (Santa Fe Trail), and Odd 11-Parity problems.

4.1 First experiment series

The goal of the first series of experiments was to find the optimal values of the branch factor and clustering frequency of the ETGP method. We conducted two experiments. In each of them, one of these parameters was varied while the other was fixed to a predetermined value; the interaction between the two parameters was therefore not explored. We used only the 11-multiplexer and Artificial Ant problems in these tests. The following table summarizes the best values of these parameters found for both benchmark problems:

                  Clustering frequency   Branch factor
Artificial Ant    3, 4                   0.2, 0.25
11-multiplexer    7, 10                  0.2, 0.3, 0.4
Selected value    7                      0.2

Table 1: Best performance values

Based on this table, we decided that a reasonable compromise is to use 0.2 as the branch factor and 7 as the clustering frequency in all subsequent experiments.

4.2 Second experiment series

The second series of experiments compares the ETGP algorithm with the basic version of the GP system. In these tests we also switched on two additional features of the ETGP algorithm that are not present in standard GP systems: mutation and depth control. We have not mentioned the second of these modifications so far, because its motivation follows from a constructive argument that did not fit into our motivation section given the scope of this paper. Let us nevertheless briefly describe what the depth control modification does. Two parameters are related to this extension. The first, Starting depth $S$, defines the maximal allowed depth of any GP tree in the population at the beginning of evolution. The second, End depth $E$, defines the maximal depth of GP trees in the population at the end of evolution. The actual maximal allowed depth $m$ in generation $g$ is computed from these two parameters as

$$m = S + (E - S)\,\frac{g}{\mathit{numGen}},$$

where $\mathit{numGen}$ stands for the maximal number of generations in the given run.
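The depth schedule is a linear interpolation between the two depths and can be sketched in a few lines. The function name and the example values (an End depth of 17 and 50 generations) are hypothetical; the source does not say whether $m$ is rounded to an integer, so the sketch returns the raw value.

```python
def max_allowed_depth(generation, start_depth, end_depth, num_generations):
    """Linear depth schedule m = S + (E - S) * g / numGen."""
    return start_depth + (end_depth - start_depth) * generation / num_generations


if __name__ == "__main__":
    # Hypothetical run: Starting depth 7, End depth 17, 50 generations.
    for g in (0, 10, 25, 50):
        print(g, max_allowed_depth(g, 7, 17, 50))
```

With $E > S$ the limit grows over the run; with $E = S$ it reduces to a fixed depth limit.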

In our experiments we also explored the relationship between two parameters, the mutation ratio and the Start depth. The mutation ratio was explored in the range [0.05, 0.4] with a step of 0.05. For depth growth control, the Start depth parameter was varied in the range [3, 17] with a step of 2. Because of the scope of this paper, we summarize the results of these experiments in Table 2. Note that the table reports only the best results with respect to the two varied parameters.

                  ETGP                      GP
                  Best hits   MR     SD     Best hits   Speedup
Artificial Ant    74.5        0.35   7      71.01       +4.91%
11-multiplexer    1996.06     0.25   7      1960.5      +1.81%
11-parity         559.11      0.05   7      562.18      -0.5%

Table 2: Performance of the ETGP and standard GP algorithms over the three benchmark problems. The results for the ETGP algorithm are those achieved with the best combination of the Start Depth (SD) and Mutation Ratio (MR) parameters.
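The Speedup column appears consistent with the relative difference between the ETGP and GP best-hits values. The sketch below recomputes it under that assumption; the formula is inferred from the reported numbers rather than stated in the text, and the function name is hypothetical.

```python
def relative_speedup(etgp_hits, gp_hits):
    """Relative difference of ETGP over GP best hits, in percent
    (assumed definition of the Speedup column)."""
    return (etgp_hits / gp_hits - 1.0) * 100.0


if __name__ == "__main__":
    # (ETGP best hits, GP best hits) as reported in Table 2.
    results = {
        "Artificial Ant": (74.5, 71.01),
        "11-multiplexer": (1996.06, 1960.5),
        "11-parity": (559.11, 562.18),
    }
    for name, (etgp, gp) in results.items():
        print(f"{name}: {relative_speedup(etgp, gp):+.2f}%")
```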

5. CONCLUSIONS

The focus of this work was to explore the possibility of simulating the evolution of species as we observe it in nature. We extended the basic GP algorithm by incorporating dynamic formation of species. The goal was to build a system in which the evolution of species forms an evolutionary tree, just as in natural evolution. As the mechanism for separating new species we employed a clustering technique that divides a given subpopulation into two new ones according to the genetic similarity of the individuals. We compared the system with the standard GP algorithm. As the previous section indicates, the speedup in the convergence curve, while statistically significant and robust, is minor. A third experiment series, which provided some insights into the dynamics of species formation in the ETGP system, is omitted for brevity. The overall conclusion following from these experiments is that while the system does successfully use clustering to manage subpopulations, it does not fully simulate the natural properties of 'tree-like' evolution.

6. REFERENCES

[1] J. Antolík. Evolutionary tree genetic programming. Master's thesis, Computing and Information Sciences department, Kansas State University, Apr. 2004.

[2] S. Baluja. A massively distributed parallel genetic algorithm, 1992.

[3] S. Luke. Evolutionary computation in Java. http://www.cs.umd.edu/projects/plus/ec/ecj/, 2001.
