for my parents
Acknowledgements
From this page I would like to acknowledge the people who have contributed
in some way to this thesis:
To my parents; their support was fundamental at the beginning of the
research.
To Concha, without whose patience, understanding and encouragement this
thesis would never have been written.
To my supervisors, for accepting an unknown student and leading him to the
objective.
To Laura, for her support at the beginning of the research.
And finally, to everyone who has in some way influenced my research.
Contents
Preface

PART I

1. Introduction
1.1. Machine Learning
1.2. Evolution
1.2.1. Some definitions
1.3. Evolutionary algorithms: A summary
1.3.1. Genetic Algorithms
1.3.2. Evolutionary Strategies and Evolutionary Programming
1.3.3. Genetic Programming: An overview
1.4. Genetic Programming (GP): Basic Concepts
1.4.1. Genetic Operators
1.4.1.1. Crossover
1.4.1.2. Mutation
1.4.1.3. Reproduction
1.4.2. Fitness
1.4.2.1. The Fitness Function
1.4.3. Selection
1.4.4. GP Algorithm
1.4.4.1. Generational GP Algorithms
1.4.4.2. Steady State GP Algorithms
1.5. Parallel and Distributed Models
1.5.1. Global Parallel Evolutionary Algorithms
1.5.1.1. Parallelising at the level of fitness
1.5.1.2. Parallelising at the level of populations
1.5.2. Our Multipopulation Genetic Programming Model
1.5.3. Performance analysis
2. A Parallel and Distributed Genetic Programming (PADGP) Tool
2.1. Software tool based on PVM: PADGP version #1
2.2. Software tool based on MPI: PADGP version #2
2.2.1. Graphical user interface and monitoring tool
3. Parallel and Distributed Genetic Programming: Experimental Study
3.1. Introduction
3.2. New parameters
3.3. Designing experiments: Measuring Effort
3.4. Benchmark problems
3.4.1. The even parity 5 problem
3.4.2. The symbolic regression problem
3.4.3. The ant on the Santa Fe trail
3.5. Experimental Results
3.6. Isolated Multipopulation Genetic Programming
3.6.1. Results
3.6.1.1. The ant on the Santa Fe trail problem
3.6.1.2. The even parity 5 problem
3.6.1.3. The symbolic regression problem
3.6.2. Conclusions
3.7. PADGP: Studying communication topology
3.7.1. Results
3.7.1.1. The ant on the Santa Fe trail
3.7.1.2. The even parity 5 problem
3.7.1.3. The symbolic regression problem
3.7.2. Conclusions
3.8. PADGP: The relationship between parameters (region of effort)
3.8.1. Results
3.8.1.1. The even parity 5 problem
3.8.1.2. The symbolic regression problem
3.8.1.3. The ant on the Santa Fe trail
3.8.2. Conclusions
3.9. PADGP vs. GP
3.9.1. Results
3.9.1.1. The even parity 5 problem
3.9.1.2. The ant on the Santa Fe trail problem
3.9.1.3. The symbolic regression problem
3.9.2. Conclusions
4. Applying GP to Medical Knowledge Representation
4.1. Introduction
4.2. The problem of diagnosing
4.3. Burn diagnosing
4.4. Rule Extraction by means of Genetic Programming
4.4.1. Classifying by means of decision trees
4.4.2. Decision trees
4.4.3. Genetic algorithms and genetic programming in medical tasks
4.4.4. Parallel and Distributed Genetic Programming (PADGP)
4.4.5. A case study: Burns unit, Virgen del Rocío Hospital, Seville, Spain
4.4.6. The parameter set
4.5. Experimental Results
4.6. Conclusions
4.7. Experimental Studies
4.7.1. PADGP vs. GP
4.7.2. Studying regions of effort

PART II

5. Synthesis of Logic Functions
5.1. Introduction
5.2. VLSI Design Cycle
5.2.1. Physical Design Cycle
5.2.2. Design styles
5.3. Design based on FPGAs
5.3.1. Row-based FPGAs
5.3.2. Island-based FPGAs
5.4. Design cycle for FPGA technology
5.5. Encoding circuits: Background
5.5.1. Encoding circuits as graphs
6. Methodology
6.1. Representing graphs by means of trees
6.2. Representing wires as branches on trees
6.3. Codifying physical connections
6.4. Genetic Programming applied to circuit description
6.4.1. First representation
6.4.2. Second representation
6.4.2.1. Evaluating trees
6.5. Applying the methodology
6.5.1. Some simplifications
6.5.2. GP sets
6.5.3. The fitness function
6.5.4. Providing circuits to the GP algorithm
6.6. Testing the methodology
6.6.1. First Experiment: A simple example
6.6.1.1. Describing the problem
6.6.1.2. Providing the circuit
6.6.1.3. Results
6.6.1.4. Conclusions
6.6.2. Second Experiment: A more difficult problem
6.6.2.1. Describing the problem
6.6.2.2. Providing the circuit
6.6.2.3. Results
6.6.2.4. Conclusions
6.6.3. Third example: Increasing the difficulty
6.6.3.1. Describing the problem
6.6.3.2. Providing the circuit
6.6.3.3. Results
6.6.3.4. Conclusions
6.6.4. The last experiment: A jump in the problem’s difficulty
6.6.4.1. Describing the problem
6.6.4.2. Providing the circuit
6.6.4.3. Results
6.6.4.3.1. Solution #1
6.6.4.3.2. Solution #2
6.6.4.3.3. Solution #3
6.6.4.4. Conclusions
6.7. Experimental studies on the FPGA problem
6.7.1. Studying the region of effort
6.7.1.1. Region of effort with IMGP
6.7.1.2. Region of effort with PADGP
6.7.2. IMGP vs. GP
6.7.3. PADGP vs. GP
7. Conclusions and final remarks
8. Future work
References
Preface: Nature’s lessons
Throughout history human beings have made incredible discoveries, and
these have been used to solve problems that had previously been considered
impossible. In his book “The Blind Watchmaker”, Richard Dawkins
analyses the discovery and application of sonar and radar. Early pioneers in
this field could not imagine that the technique had been used by nature for
millions of years. In fact, one of our distant relatives in the mammal family
– the bat – employs sonar to navigate during flight. Such an example shows
how nature can confirm the results of a theoretical study, while also applying
this theory to practical problems.
Nevertheless, the process more often works the other way round: the success
of an invention often depends on a prior exhaustive study of natural processes.
The moral is that nature – the origin of everything – can show us new
techniques. A series of natural experiments enables us to envisage new
theories.
Nature is an infinitely patient teacher. We simply have to be humble enough
to learn its lessons. More than a century ago, Darwin and Wallace were able
to keep their minds open to its wonders. They consequently managed to
develop one of the most marvellous theories in scientific history. Life on
earth was a puzzle, and their work almost definitively solved it. These two
men sailed the world’s oceans in the course of their studies. They took us on
a voyage into the unknown, into something beyond all previous human
beliefs.
Many researchers have since found inspiration in Darwin’s “Theory of
Evolution”. His ideas have been a catalyst for the writing of many brilliant
pages in computer science history.
The relevance of evolutionary theory to artificial intelligence came to light
fifty years ago. Since then, we have lived through the birth of several
artificial intelligence systems. They are capable of playing games,
reproducing, driving vehicles and so on.
The Earth took 4500 million years to generate carbon-life intelligent
organisms. How much longer will it take the Earth, in partnership with
ourselves, to develop new forms of life that are as intelligent as those
derived from carbon? Is there anything beyond carbon-life intelligence? As
in the Triassic period, will the extinction of numerous species be necessary
for other kinds of organisms to reach a new level of intelligence? These
questions may not be answered until such an event takes place.
Nevertheless, we must try to get involved in the search for new forms of
intelligence. Our work should contribute to improvements in this field.
PART I
1 Introduction
1.1 Machine Learning
Making computers learn. This is the key issue behind the term machine
learning. Samuel coined the phrase machine learning in 1959. He was the
first person to endow a computer program with learning abilities. The term
was then used to refer to computers programming themselves [Samuel
1963]. Nevertheless, due to the difficulty of that objective, a contemporary
definition is: “[machine learning] is the study of computer algorithms that
improve automatically through experience” [Mitchell, 1996].
Three decades ago, researchers noticed that Darwin’s evolution theory
[Darwin 1859] could be almost directly applied to solving problems by
means of computers. New algorithms relying on natural processes were
developed. An emergent interest in evolutionary computation spread and a
number of associated techniques entered the machine learning field. The
best known algorithms in this class include evolution strategies [Rechenberg
1973], [Schwefel 1975], evolutionary programming [Fogel 1962], genetic
algorithms [Holland 1975], and genetic programming [Koza 1992]. All of
them are based on the principle of evolution (survival of the fittest), although
each one represents individuals in a different way and also focuses on a
different facet of natural evolution. Genetic algorithms and genetic
programming stress chromosomal operators. Evolution strategies emphasise
behavioural changes at the level of the individual. Evolutionary
programming points out behavioural change at the level of the species. All
of them use a population of individuals (potential solutions) which
undergoes a series of transformations. After a certain number of
generations, the program converges and the best individual hopefully
represents a satisfactory or optimal solution.
1.2 Evolution
The most widely accepted collection of evolutionary theories is the neo-
Darwinian paradigm. These arguments assert that the history of life can be
fully accounted for by physical processes operating on and within
populations and species [Hoffman 1988].
For natural selection to act, four different conditions must be fulfilled:
1. Individuals in the population must be able to reproduce.
2. Survival of individuals depends upon characters affected by
variations.
3. Characters are passed on from parents to children through
heredity.
4. Individuals in the population compete for resources.
These factors result in natural selection. The characteristics of a population
evolve over time. The processes operating in the background of this
evolutionary process are: reproduction, mutation, competition and selection.
Characters are transmitted from parents to offspring by means of sexual or
asexual reproduction. Mutation allows new genetic material to be generated
while reproduction takes place. As natural resources are limited,
competition keeps badly adapted individuals from surviving, and species
thus improve.
Several experiments show that evolution occurs even in non-living systems
[Orgel 1979]. These experiments lead us to conclude that the power of
evolutionary search resides in populations. Many individuals work in
parallel, each exploring a possible solution, a portion of the search space.
Several lessons can be learnt from these experiments, as stated in
[Banzhaf et al. 1998]:
A simple system may evolve as long as the elements of
multiplication, variance, and heredity exist.
Evolutionary learning may occur in the absence of life or self-
replicating entities.
Evolutionary learning may be a very efficient way to explore
learning landscapes.
Evolution may stagnate unless the system retains the ability to
continue to evolve.
The selection mechanism for evolutionary learning may be
implicit in the experimental setup (...) or may be explicitly
defined by the experimenter (...)
1.2.1 Some Definitions
All living beings possess DNA. DNA can be regarded as a set of instructions
capable of creating an organism. In computer programs instructions are
actually made up of bits, i.e. units of information. The same thing occurs in
DNA. Each instruction comprises a sequence of three consecutive bases,
called a codon. A codon is a template for the production of a particular
amino acid, or it is a sequence-termination codon. Not all codons
produce amino acids, so there is some redundancy in DNA code. This
redundancy is helpful when constructing evolving systems with computer
programs.
A gene is a location on the DNA. Each gene is usually associated with one
or more features of an individual, such as eye colour. The different
possibilities each feature can present are called alleles.
Much of the DNA of many organisms seems to do nothing, to express no
feature in individuals. This is referred to as junk DNA and as introns. Why
do introns appear? It is not yet clear, although they could be related to the
process of protein translation. Later on, we will see that introns also appear
in Genetic Programming. Obviously, their role in Genetic Programming will
be different.
It is important to distinguish between the appearance of an organism and its
genetic constitution. Two different words were coined to label the two
concepts: the former is the phenotype; the latter the genotype
[Johannsen, 1911]. In biological evolution, phenotypes and genotypes play
different roles in interaction with the environment, survival and reproduction.
The genotype is the DNA of the individual. The phenotype is composed of the
set of features we can observe in an individual.
Ontogeny is the development of the organism from fertilization to maturity.
It is the bridge between genotypes and phenotypes. Even when no physical
distinction exists between genotypes and phenotypes, evolution can take
place. Orgel's experiments show that evolution can also occur without
ontogeny. In fact, some evolutionary techniques do not distinguish
between genotype and phenotype by means of an ontogenetic process.
In order for a population to evolve, some genetic variation is required in that
population. This is usually achieved by means of mutation, genetic transfer
and sexual reproduction. All of them try to generate new genetic material
from other existing sources. Evolutionary methods also use some of these
techniques.
Organisms that reproduce sexually with resulting viable offspring are usually
said to belong to the same species. Sexual reproduction has an important
advantage when compared with asexual reproduction: favourable mutations
are much more rapidly combined into one new individual. Reproduction is a
key issue in evolutionary techniques.
1.3 Evolutionary Algorithms: A summary
Before focussing on the methodology involved in the thesis, this section will
look at different approaches that have been widely adopted by researchers.
All of these approaches use evolution to solve optimisation problems. As we
saw above, the algorithms contrast in the way individuals are represented,
while at the same time different aspects of evolution are emphasised. At the
first sight these distinctions could seem slight. Nevertheless they influence
the way different evolutionary algorithms work and they suggest the
importance of studying how they perform. This is one of the main areas of
study in our research.
All the following methods try to solve problems for which no reasonably fast
algorithms have been developed. Many of these problems are optimisation
problems. For some thorny optimisation problems we can use probabilistic
algorithms. They can find an approximate though not optimal solution, with
a minimal error. An example of this kind of awkward optimisation problem
is wire routing and logic block placement when dealing with FPGAs.
Another example is the travelling salesman problem (TSP). Both can be
approximately solved by means of simulated annealing (a probabilistic
algorithm). In this thesis we will see how to apply genetic programming
(one kind of EA) to solving optimisation problems.
As stated in [Michalewicz 1996], any evolutionary program must possess the
following attributes:
•a genetic representation for potential solutions to the
problem.
•a way to create an initial population of potential solutions.
•an evaluation function that plays the role of the
environment, rating solutions in terms of their “fitness”.
•genetic operators that alter the composition of children.
•values for various parameters that the genetic algorithm
uses (population size, probabilities of applying genetic
operators, etc.).
[Fogel 2000] summarises the techniques that will be briefly described in the
following sections.
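As a minimal illustration of these five attributes, the following toy sketch (our own, not an algorithm from the text) evolves a fixed-length string towards a target word; the representation, the character-matching fitness function, the mutation operator and all parameter values are arbitrary illustrative choices:

```python
import random
import string

random.seed(42)

TARGET = "darwin"                 # toy problem: evolve a string to match this
ALPHABET = string.ascii_lowercase

# attribute 1 — genetic representation: a fixed-length lowercase string
def random_solution():
    return "".join(random.choice(ALPHABET) for _ in TARGET)

# attribute 3 — evaluation function ("fitness"): number of matching characters
def fitness(s):
    return sum(a == b for a, b in zip(s, TARGET))

# attribute 4 — genetic operator: point mutation of a single character
def mutate(s):
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

# attribute 5 — parameter values
POP_SIZE, GENERATIONS = 40, 200

# attribute 2 — an initial population of potential solutions
population = [random_solution() for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP_SIZE // 2]      # the "environment" rates solutions
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(POP_SIZE // 2)]

best = max(population, key=fitness)
```

Even this crude scheme (truncation selection and mutation only) steadily accumulates matching characters, illustrating how the five attributes cooperate.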
1.3.1 Genetic Algorithms
Genetic Algorithms (GAs) can be considered to be the most important
predecessor of genetic programming. Holland developed GAs in 1975
[Holland 1975]. Although GAs are a kind of optimisation method, several
differences distinguish them from classic optimisation and search methods:
1. They do not work directly with the problem’s parameters. Instead,
they use a codified version of the parameters.
2. They search from a set of points, using a population of candidate
solutions.
3. They employ probabilistic transition rules.
Genetic Algorithms are stochastic. Their search method resembles genetic
inheritance and the Darwinian fight for survival.
GAs are typically implemented as follows:
1. The problem to be addressed is defined in terms of an objective
function. This function will be applied to each individual in the
population in order to estimate how well this individual solves the
problem. The function is called the fitness function, and the computed
value, the fitness value.
2. A population of individuals is generated within some constraints.
Each individual is usually coded as a binary vector, termed a
chromosome. Each element – bit – is described as a gene, and refers
to some of the parameters that will be optimised in a given problem.
The length of the vector thus depends on the number of parameters,
but it is fixed for a given problem (this is one of the most important
differences between GAs and genetic programming, where the length is
variable). Sometimes the vector represents real values of a given
variable.
3. Each chromosome is decoded and evaluated by means of the fitness
function. The fitness value obtained from this evaluation is assigned
to the chromosome.
4. Each chromosome is given a probability of reproduction proportional
to its fitness value.
5. According to the assigned probability of reproduction, a new
population of chromosomes is generated by probabilistically selecting
strings from the current population.
6. Two classic genetic operators are employed when producing
offspring: mutation and crossover. Mutation alters one or more genes,
and acts with a probability equal to the mutation rate. Crossover
selects a pair of individuals, which will act as parents. It generates a
new individual by mixing some parts of each parent. The individuals
acting as parents have more chance of being selected if their fitness
values are high. In the following generation, the newly created
individuals replace some individuals from the current population.
Figure 1.1 visually depicts the way individuals are crossed over when they
produce offspring.
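The six steps above can be sketched as follows. This is a hypothetical toy implementation, not the thesis's software: the bit-string encoding, the "count the 1-bits" objective, and all parameter values are illustrative assumptions.

```python
import random

random.seed(0)

GENOME_LEN, POP_SIZE = 12, 30
CROSSOVER_RATE, MUTATION_RATE = 0.9, 0.02

# step 1: the fitness function — toy objective: maximise the number of 1-bits
def fitness(chrom):
    return sum(chrom)

# steps 4-5: probabilistic, fitness-proportional selection of parents
def select(population, fits):
    return random.choices(population, weights=fits, k=1)[0]

# step 6: one-point crossover — a new individual mixes parts of each parent
def crossover(a, b):
    if random.random() < CROSSOVER_RATE:
        point = random.randint(1, GENOME_LEN - 1)
        return a[:point] + b[point:]
    return a[:]

# step 6: mutation alters each gene with probability MUTATION_RATE
def mutate(chrom):
    return [1 - g if random.random() < MUTATION_RATE else g for g in chrom]

# step 2: a population of binary chromosomes generated within constraints
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(40):
    fits = [fitness(c) for c in population]     # step 3: decode and evaluate
    population = [mutate(crossover(select(population, fits),
                                   select(population, fits)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
```

Note how the chromosome length is fixed throughout the run, in keeping with the difference from genetic programming mentioned in step 2.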
Some important parameter values affecting the performance of genetic
algorithms must be carefully selected: the number of individuals in the
population, crossover probability, mutation probability and so on. Most of
these parameters are shared with other kinds of evolutionary algorithms.
Although the theoretical foundations of GAs are quite clear (see the notion
of schemata [Goldberg 1989]), in genetic programming – the focus of our
study – matters are not so cut and dried. In our work we have thus
concentrated on introducing new parameters, and experimenting with
them in genetic programming frameworks.
[Michalewicz 1996] should be consulted for a more in-depth study of
genetic algorithms.
1.3.2 Evolutionary Strategies and Evolutionary Programming
Although some authors distinguish between evolutionary strategies and
evolutionary programming, we can see the two approaches as belonging to
the same class, due to their similarities.
This paradigm in evolutionary computation was developed by [Rechenberg
1973], [Schwefel 1975] and [Fogel 1962], and is commonly described by the
terms evolution strategies (ES) and evolutionary programming, or more
broadly evolutionary algorithms. Because of the limitations of their basic
experimental setup, only one individual could be considered at a time, so the
population consisted of just one individual. A selection process based on a
fitness value and variation due to random mutation was also applied.
[Figure 1.1: GA crossover. Two binary fixed-size genomes exchange the
segments on either side of a crossover point, shown before and after the
operation.]
Soon, as computers became valuable tools, the benefit of using several
individuals in the population was recognised. Individuals were represented
as vectors with real values. In ES, selection is a deterministic operator: a
number of individuals are chosen to constitute the population in the next
generation.
The simplest method can be implemented in the following steps, as
described in [Fogel 2000]:
1. The problem is defined as finding the real-valued n-
dimensional vector x that is associated with the extremum of
a functional F: ℜⁿ → ℜ. Without loss of generality, let
the procedure be implemented as a minimisation process.
2. An initial population of parent vectors, xi, i=1, ... P, is
selected at random from a feasible range in each dimension.
The distribution of initial trials is typically uniform.
3. An offspring vector, xi’, i=1, ..., P, is created for each
parent xi, by adding a Gaussian random variable with zero
mean and preselected standard deviation to each component
of xi.
4. Selection then determines which of these vectors to maintain
by comparing the error F(xi ) and F(xi’), i=1, ..., P. The P
vectors that possess the least error become the new parents
for the next generation.
5. The process of generating new trials and selecting those
with least error continues until a sufficient solution is
reached or the available computation is exhausted.
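A minimal sketch of these five steps, assuming the toy functional F(x) = Σ xᵢ² minimised over n = 3 dimensions; the population size, standard deviation and generation budget are arbitrary choices, not values from the text:

```python
import random

random.seed(0)

# step 1: the functional to minimise — here the toy F(x) = sum(x_i^2), n = 3
N, P, SIGMA, GENERATIONS = 3, 10, 0.3, 200

def F(x):
    return sum(xi * xi for xi in x)

# step 2: initial parent vectors drawn uniformly from a feasible range
parents = [[random.uniform(-5.0, 5.0) for _ in range(N)] for _ in range(P)]

for _ in range(GENERATIONS):
    # step 3: one offspring per parent — add zero-mean Gaussian noise with a
    #         preselected standard deviation to each component
    offspring = [[xi + random.gauss(0.0, SIGMA) for xi in x] for x in parents]
    # step 4: the P vectors with least error become the new parents
    parents = sorted(parents + offspring, key=F)[:P]
    # step 5: repeat until the available computation budget is exhausted

best = parents[0]
```

Because selection here is deterministic (the best P of the pooled 2P vectors survive), the least error found never increases from one generation to the next.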
Each component of an individual is understood as a trait, not as a gene.
Nowadays, another interesting aspect of ES is the possibility of assigning
different strategy parameters, like mutation rates, to each individual. In ES
we are therefore working in both the domain of object variables and the
domain of strategy parameters.
1.3.3 Genetic Programming: an overview.
Genetic Programming (GP) focuses on inducing a population of computer
programs. This population improves automatically as it is trained. The
training process takes place by means of a set of examples concerning the
problem at hand. So, GP is part of the body of research called machine
learning (ML).
GP is often used to generate tree structures. This is due to the influence of
Koza’s treatise entitled “Genetic Programming: On the Programming of
Computers by Means of Natural Selection” [Koza 1992]. Koza was the first
to recognise that GP was something different from other kinds of
evolutionary techniques. Due to the computer language he used for
constructing individuals, LISP, the tree structure has since been present in
almost all further developments in GP. In the next chapter we will see the
importance of the tree structure for the problem we have addressed during
our research.
The term genetic programming is usually associated with the evolution of
tree structures. Nevertheless, some research has been done with GP using
linear structures [Nordin 1994]. A characterisation of what belongs to GP is
given in [Banzhaf et al 1998]:
The term “genetic programming” shall include systems that
constitute or contain explicit references to programs
(executable code) or to programming language expressions...
... all means of representing programs will be included.
...Not all algorithms running on computers are primarily
programs... Nevertheless, we shall not exclude these algorithms
from being legitimate members of the GP family.
The learning algorithm used by GP is based on an analogy with natural
evolution. GP evolves programs for the purpose of inductive learning. The
ultimate goal of GP is that we merely tell the computer what we want done,
and the computer learns how to carry out the task. Although this goal is
still far off, GP outperforms other machine learning methods on several
kinds of problems [Koza et al 1999].
1.4 Genetic Programming: Basic Concepts.
When dealing with genetic programming, individuals are usually program
trees. These trees make up the population. Each individual - program tree -
is composed of terminals and functions. Terminals are always leaves on the
trees, while functions are internal nodes.
The terminal set includes the input parameters, constants supplied to
individuals, and zero-argument functions with side-effects. The process of
choosing the inputs – the important features of the problem at hand – is a
decisive task: they make up the training set, and a good training set will
speed up the learning process. Real constants are often incorporated into
the terminal set; these are called ephemeral random constants – represented
by ℜ. Their value remains fixed during the execution of a program.
Nevertheless, mutation can affect constant values when it is applied to
individuals containing constants.
The function set includes the functions, statements and operators that are
available for each problem. The function set should be chosen so as to suit
the problem’s domain, while the different components of the set are selected
in order to optimise the problem’s solution. Any kind of function available
in programming languages could be included into the function set. For
instance, boolean and arithmetic functions; conditional statements; functions
for managing memory; loop statements and subroutines; etc.
Of course, any solution to the problem is composed of elements from the
function and the terminal set. These sets should thus include the necessary
elements to represent the desired solution. Usually, a small set of primitives
can solve an extraordinary number of problems. However, selecting the
components of these sets is a crucial step in solving a problem.
Functions and terminals are assembled by making up trees (programs).
Programs are structures of terminals and functions, together with some rules
or conventions for when and how each function or terminal is to be
executed. The way of executing programs affects phenotypes, while genetic
operations affect genotypes.
Although other kinds of structures are permitted [Banzhaf et al 1998], in
this thesis we will deal with trees, like that depicted in figure 1.2.
Figure 1.2 : A GP tree.
1.4.1 Genetic Operators
Before beginning the evolution process, a population of individuals must be
created. GP must first initialise the population. A number of programs have
to be created, by assembling functions and terminals into tree structures.
The size of programs is an important issue, since individuals must run on a
computer that has a limited amount of memory. A parameter – depth – is
used for controlling the size of programs. The depth of a node is the
minimal number of nodes that must be traversed to get from the root of the
tree to that node. No node may go beyond the maximum depth established
for a given problem.
Two different ways of initialising trees have commonly been used. The first,
known as grow, randomly selects components from both functions and
terminals throughout the entire tree; a branch ends when a terminal is
placed on a node, and no terminal may be placed beyond the maximum
depth. The second method, called full, chooses only functions until a node
is at the maximum depth, and then chooses only terminals. Used alone,
these two methods produce populations of rather uniform shapes and sizes.
Another method has been proposed to solve this problem. The
ramped-half-and-half technique operates as
follows: let us suppose the maximum depth parameter is 4. The population
is then divided into 4-1=3 groups, each with the same number of individuals.
One group is initialised with individuals having depth 2, the second with
depth 3 and the third with depth 4. For each group, half the individuals are
initialised with the full technique and half with the grow technique.
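The grow, full and ramped-half-and-half methods can be sketched as follows, with trees encoded as nested tuples; the function and terminal sets, and the 30% chance of closing a branch early in grow, are arbitrary illustrative choices.

```python
import random

FUNCTIONS = ('+', '-', '*')   # illustrative two-argument functions
TERMINALS = ('a', 'b', 'c')

def grow(depth, max_depth):
    # grow: pick freely from functions and terminals; a branch ends when a
    # terminal is placed, and never beyond the maximum depth.
    if depth == max_depth or random.random() < 0.3:
        return random.choice(TERMINALS)
    f = random.choice(FUNCTIONS)
    return (f, grow(depth + 1, max_depth), grow(depth + 1, max_depth))

def full(depth, max_depth):
    # full: choose only functions until the maximum depth, then terminals.
    if depth == max_depth:
        return random.choice(TERMINALS)
    f = random.choice(FUNCTIONS)
    return (f, full(depth + 1, max_depth), full(depth + 1, max_depth))

def ramped_half_and_half(pop_size, max_depth=4):
    # Depth groups 2..max_depth (max_depth-1 groups); within each group,
    # half the individuals use full and half use grow.
    population = []
    per_group = pop_size // (max_depth - 1)
    for d in range(2, max_depth + 1):
        for i in range(per_group):
            method = full if i < per_group // 2 else grow
            population.append(method(0, d))
    return population

pop = ramped_half_and_half(30, max_depth=4)
```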
The evolution process transforms the initial population by means of genetic
operators. In fact, they are the search operators. We shall now describe the
most important ones.
1.4.1.1 Crossover
The aim of this operator is to generate a new individual by crossing over two
individuals - the parents. As in nature, genetic material is being combined,
thus producing a new genotype. In GP the way of crossing over two trees is
the following.
1. Two individuals from the population are selected as parents.
The probability that an individual will be selected is
proportional to its fitness.
2. A subtree in each parent is randomly selected. Internal nodes
are more likely to be chosen as crossover points than leaves.
3. Selected subtrees are swapped. Two new individuals emerge
from the process.
Figure 1.3 shows the operation.
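On tuple-encoded trees, subtree crossover can be sketched as below; for brevity the crossover points are chosen uniformly over all nodes, ignoring the bias towards internal nodes mentioned in step 2, and the two example parents are only illustrations.

```python
import random

def paths(tree, prefix=()):
    """Yield every node position in a tuple-encoded tree as a path of indices."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(parent1, parent2):
    # Pick a random subtree in each parent and swap them; two children emerge.
    p1 = random.choice(list(paths(parent1)))
    p2 = random.choice(list(paths(parent2)))
    s1, s2 = get(parent1, p1), get(parent2, p2)
    return replace(parent1, p1, s2), replace(parent2, p2, s1)

# Usage with two small example trees; t1 encodes (b * (a + b)) - c:
t1 = ('-', ('*', 'b', ('+', 'a', 'b')), 'c')
t2 = ('/', 'b', ('-', ('+', 'b', 'c'), 'a'))
child1, child2 = crossover(t1, t2)
```

Note that the total number of nodes in the two children always equals that of the two parents, since the operation only exchanges material.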
1.4.1.2 Mutation
Mutation affects just one individual. The mutation probability decides
whether the new individual (which has emerged from the crossover) should
be mutated or not. Mutation acts on GP by randomly choosing a subtree of
the individual, and then replacing it by a new subtree which is also randomly
generated. The modified individual is then placed back into the population
(see figure 1.4).
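Subtree mutation admits an equally small sketch in the same tuple encoding; the grow-style generator and its node sets are illustrative assumptions.

```python
import random

def random_subtree(max_depth, functions=('+', '-', '*'), terminals=('a', 'b', 'c')):
    # Grow-style generator for the replacement subtree (illustrative sets).
    if max_depth == 0 or random.random() < 0.3:
        return random.choice(terminals)
    f = random.choice(functions)
    return (f, random_subtree(max_depth - 1), random_subtree(max_depth - 1))

def mutate(tree, max_depth=3):
    """Replace one randomly chosen subtree with a freshly generated one."""
    def paths(t, prefix=()):           # every node position as an index path
        yield prefix
        if isinstance(t, tuple):
            for i, child in enumerate(t[1:], start=1):
                yield from paths(child, prefix + (i,))
    def replace(t, path, sub):
        if not path:
            return sub
        i = path[0]
        return t[:i] + (replace(t[i], path[1:], sub),) + t[i + 1:]
    point = random.choice(list(paths(tree)))   # the mutation point
    return replace(tree, point, random_subtree(max_depth))

mutant = mutate(('-', ('*', 'b', ('+', 'a', 'b')), 'c'))
```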
1.4.1.3 Reproduction
Reproduction consists of duplicating an individual from the population: the
individual is simply copied into the new population.
Figure 1.3 : Crossover. Subtrees at the selected crossover points are
exchanged between the two parents (shown before and after the operation).
1.4.2 Fitness
GP can be seen as a kind of beam search: the population is the beam, and
each individual is a point that guides the search.
When applying crossover to individuals from the population, GP must
decide which individuals should become parents. This is done by means of
fitness-based selection. The function employed for evaluating individuals is
called the fitness function. Fitness functions differ from problem to
problem, because in each case they are specially adapted to carry out the
search as efficiently as possible.
Figure 1.4 : Mutation. A randomly chosen subtree is deleted and replaced
by a new, randomly generated subtree (shown before and after the operation).
1.4.2.1 The Fitness Function
As defined in [Banzhaf et al 1998]:
Fitness is the measure used by GP during simulated evolution
of how well a program has learned to predict the output(s) from
the input(s) – that is, the features of the learning domain.
The learning algorithm must have a mechanism to evaluate how good an
individual is. The fitness function applies an individual – a program – to the
training set. The values returned by the individual are then compared with
the expected results, and the fitness value is computed from this comparison.
Several kinds of fitness values are usually employed: standardised fitness
and normalised fitness being the most common. In the latter, the fitness
values vary from zero to one.
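One common recipe for obtaining a normalised fitness from a standardised one (an error to be minimised) follows Koza's definitions; the intermediate "adjusted fitness" step used here is an assumption for illustration, not something stated in the text.

```python
def normalised_fitness(errors):
    """Map standardised fitness values (errors, lower is better) to
    normalised values in [0, 1] that sum to one over the population."""
    adjusted = [1.0 / (1.0 + e) for e in errors]  # higher is now better
    total = sum(adjusted)
    return [a / total for a in adjusted]

# Usage: three individuals with raw errors 0, 1 and 3.
probs = normalised_fitness([0.0, 1.0, 3.0])  # best individual gets 4/7
```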
1.4.3 Selection
After the fitness function has been applied to all of the individuals, selection
takes place. The selection operator decides which individuals will undergo
crossover. Several types of selection can be applied. In nature selection
allows the best individuals to preserve their genetic material by surviving
and creating descendants. In GP, and in any other evolutionary technique,
the fitness value decides the worth of an individual, thus allowing it to
survive or not and cross over or not.
Several types of selection have been used. In fitness-proportional selection
the probability of an individual being selected is proportional to its fitness
value. Well-adapted individuals will in this way have a greater chance of
reproducing than poorly adapted ones. This type of selection was first used
by Holland [Holland, 1975].
In truncation or (μ,λ) selection [Schwefel, 1995], a number μ of parents are
allowed to breed λ offspring, out of which the μ best are used as parents for
the next generation.
On the other hand, ranking selection [Grefenstette and Baker, 1989]
[Whitley, 1989] uses the order that results from sorting individuals
according to their fitness value: the selection probability of an individual
depends on its rank in the population.
Finally, in tournament selection some individuals from the population
compete. The number of individuals selected for competition is called the
tournament size. This parameter allows researchers to adjust selection
pressure. A small size brings about low pressure and vice-versa. This
method does not require a centralised fitness comparison between all
individuals, thus providing an easy way to parallelise the algorithm.
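Fitness-proportional and tournament selection can be sketched as follows; both sketches assume non-negative fitness values where higher is better.

```python
import random

def proportional(fitnesses):
    """Roulette-wheel selection: the chance of picking individual i is
    fitnesses[i] / sum(fitnesses). Returns the selected index."""
    r = random.uniform(0.0, sum(fitnesses))
    acc = 0.0
    for i, f in enumerate(fitnesses):
        acc += f
        if r <= acc:
            return i
    return len(fitnesses) - 1  # guard against floating-point round-off

def tournament(fitnesses, size=3):
    """Tournament selection: 'size' random competitors, the fittest wins.
    A larger tournament size means higher selection pressure."""
    competitors = random.sample(range(len(fitnesses)), size)
    return max(competitors, key=lambda i: fitnesses[i])

winner = tournament([0.1, 0.9, 0.4, 0.7], size=2)
```

Tournament selection only ever compares the sampled competitors, which is what makes it attractive for parallel implementations.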
1.4.4 GP Algorithm
Several preliminary steps must be taken before running the basic algorithm.
They are necessary for preparing GP to work with the problem we want to
solve. These are the steps:
1. Define the terminal and function sets.
2. Define the fitness function.
3. Define parameters such as population size, maximum individual
depth, crossover, selection and mutation probabilities and termination
criterion.
Depending on the way individuals are generated, two different GP
algorithms can be used: generational or steady-state algorithms.
1.4.4.1 Generational GP Algorithm
In generational GP, the concept of generation is present. A population of
individuals represents each generation. In the creation of each generation a
new population is generated from the older population. This new generation
replaces the previous one. The execution cycle consists of the following
steps:
1. Initialise the population.
2. Evaluate all of the individuals in the population and
assign a fitness value to each one.
3. Until the new population is fully populated, repeat the
following steps:
4. Select individuals in the population using the
selection algorithm.
5. Apply genetic operations to the selected individuals.
6. Insert the result of the genetic operations into the
new population.
7. If the termination criterion is reached, then present
the best individual as the output. Otherwise, replace
the existing population with the new population and
repeat steps 2-7.
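The cycle above can be sketched as a generic loop; the operator arguments, parameter values and the toy OneMax bit-string problem in the usage example are stand-ins for the GP components described earlier, not part of the text.

```python
import random

def generational_gp(init, fitness, select, crossover, mutate,
                    pop_size=50, generations=30, p_cross=0.9, p_mut=0.1):
    """Generic generational loop over pluggable, problem-specific operators."""
    population = [init() for _ in range(pop_size)]                     # step 1
    for _ in range(generations):
        fits = [fitness(ind) for ind in population]                    # step 2
        new_population = []
        while len(new_population) < pop_size:                          # step 3
            a, b = select(population, fits), select(population, fits)  # step 4
            if random.random() < p_cross:                              # step 5
                a = crossover(a, b)
            if random.random() < p_mut:
                a = mutate(a)
            new_population.append(a)                                   # step 6
        population = new_population                                    # replace
    fits = [fitness(ind) for ind in population]
    return population[fits.index(max(fits))]                           # step 7

# Usage on OneMax (maximise the number of 1-bits), standing in for trees:
init = lambda: [random.randint(0, 1) for _ in range(16)]
select = lambda pop, fits: max(random.sample(list(zip(pop, fits)), 3),
                               key=lambda t: t[1])[0]
cross = lambda a, b: a[:8] + b[8:]
mut = lambda a: [1 - g if random.random() < 0.1 else g for g in a]
best = generational_gp(init, sum, select, cross, mut)
```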
1.4.4.2 Steady-State GP Algorithm
No fixed generation intervals exist in this approach. The reproduction of
individuals is a constant flow, and the offspring replace existing individuals
in the same population. Results reported when using this method present
good general convergence. The basic GP algorithm using this method
consists of the following steps:
1. Initialise the population.
2. Select a subset of individuals from the population.
3. Evaluate the fitness value of each individual in the
subset.
4. Select the winners.
5. Perform genetic operations on the winners.
6. Replace the losers in the tournament with the results
obtained in step 5.
7. Repeat steps 2-6 until the termination criterion is met.
8. Present the best individual in the population as the
output.
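A single steady-state iteration (steps 2-6) might look as follows; the subset size, operators and bit-string problem are again illustrative stand-ins.

```python
import random

def steady_state_step(population, fitness, crossover, mutate, subset_size=4):
    """One iteration: a random subset competes, the two winners breed and
    the offspring replace the two losers, all within the same population."""
    idx = random.sample(range(len(population)), subset_size)        # step 2
    idx.sort(key=lambda i: fitness(population[i]), reverse=True)    # step 3
    w1, w2 = idx[0], idx[1]                                         # step 4
    child1 = mutate(crossover(population[w1], population[w2]))      # step 5
    child2 = mutate(crossover(population[w2], population[w1]))
    population[idx[-1]], population[idx[-2]] = child1, child2       # step 6

# Usage on the same kind of toy bit-string problem:
population = [[random.randint(0, 1) for _ in range(16)] for _ in range(30)]
cross = lambda a, b: a[:8] + b[8:]
mut = lambda a: [1 - g if random.random() < 0.05 else g for g in a]
for _ in range(500):                                                # step 7
    steady_state_step(population, sum, cross, mut)
```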
1.5 Parallel and Distributed Models
Parallel and distributed computing is a key technological resource in present-
day networked and high-performance systems. The goal of increased
performance can be reached in principle by adding processors, memory and
interconnection networks and putting them to work together on a given
problem. By sharing the workload, it is hoped that an N-processor system
will reduce computation time. However, it is well-known that things are not
so simple, since in most cases several overhead factors contribute to
significantly lowering the expected theoretical performance improvement.
In any case, many important problems are sufficiently regular in their spatial
and temporal dimensions to be suitable for parallel or distributed
computing. Evolutionary algorithms can also be parallelised and applied to
the solution of such problems.
As we have seen above, working with GP means evaluating populations. If
the difficulty of the problem is very high, we will need a large number of
individuals and generations. The time required for evaluating such huge
populations will thus be extremely high when using only one processor. We
can save computation time by using more processors and parallelising the
application. New models of GP are thus required that allow populations to
be evaluated and operated on in as many processors as necessary. For
applications that can be parallelised efficiently (such as GP), parallelisation
provides an effective means of increasing computing power.
Regardless of the type of parallel computer, we must study ways of
parallelising genetic programming. In GP, individuals are built up and
evaluated differently from other kinds of evolutionary techniques;
consequently, specific algorithms must be developed.
As described in [Tomassini 1999] there are two main reasons for
parallelising an evolutionary algorithm: one is to save time by distributing
the computational effort and the second is to benefit from a parallel setting in
algorithmical terms, along with the natural parallel evolution of spatially
distributed populations.
A first type of parallel evolutionary algorithm makes use of the available
processors or machines to run independent problems. This is trivial, as there
is no communication between the different processes and for this reason it is
sometimes called an embarrassingly parallel algorithm. This extremely
simple method of doing simultaneous work can be very useful. For
example, this setting can be used to run several versions of the same problem
with different initial conditions, consequently allowing statistics to be
gathered on the problem. As evolutionary algorithms are stochastic in
nature, being able to collect this kind of statistics is very important. This
method is generally better than a very long single run, since improvements
are less likely at later stages of the simulated evolution. The model can also
be used to solve N different versions of the same problem or to run N copies
of the same problem but with different parameters, such as crossover or
mutation rates. Neither of the above adds anything new to the nature of the
evolutionary algorithms, but the time savings can be great.
1.5.1 Global parallel evolutionary algorithms
There are several levels at which an evolutionary algorithm can be
parallelised: the population level, the individual level or the fitness-
evaluation level. The differences in implementing the parallel algorithm for
GAs as opposed to GP stem from how individuals are represented.
1.5.1.1 Parallelising at the level of fitness
Parallelisation can be easily done at the level of fitness, because the
algorithm remains untouched. In many problems, calculating individuals’
fitness is by far the most time consuming step of the algorithm. The time
spent in communications can thus be considered to be negligible. A simple
way of parallelising is to evaluate individuals on different processors: a
master process manages the population, sends individuals to be evaluated to
different processors, collects results and applies genetic operators (figure
1.5).
Individuals in GP are of different sizes and complexities. As a consequence
some experiments have dealt with the problem of load imbalance, which
decreases the usage of processors [Oussaidène et al 97]. Load balancing can
automatically be achieved if we use steady state reproduction.
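The master/worker scheme of figure 1.5 can be sketched with a worker pool; here a thread pool and a trivial fitness function stand in for the separate processors and the expensive GP evaluation, and the GP algorithm itself is untouched.

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(individual):
    # Stand-in for an expensive GP fitness evaluation.
    return sum(individual)

def evaluate_population(population, workers=4):
    """Master: send each individual to a pool of workers and collect the
    resulting fitness values in population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

fits = evaluate_population([[i, i + 1] for i in range(8)])
```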
Very often, several fitness cases must be applied to each individual. Another
simple way of parallelising at the level of individuals consists of evaluating
each fitness case on a different processor for the same program.
1.5.1.2 Parallelising at the level of populations
Usually, natural populations tend to possess a spatial structure in which
so-called demes appear. Demes are semi-independent groups of individuals,
or subpopulations, that are only loosely coupled to neighbouring demes.
This coupling takes the form of the migration of individuals from one deme
to another. Several models based on this idea have been proposed; the two
most important ones are the island and grid models.
The grid model is also called the fine-grained model. Individuals are placed
in a one or two-dimensional grid, one individual per grid location. The
model is also sometimes called cellular because of its resemblance to
cellular automata with stochastic transition rules [Tomassini 1993], [Whitley
1993]. Implementation of GP is difficult when using this model, since
individuals may vary widely in size and complexity.
Figure 1.5 : Parallelising at the fitness level. A master process sends
individuals to be evaluated to processors #1…#n and collects them once
evaluated.
On the other hand, the island model [Whitley et al 1997], [Tomassini 1999]
features geographically separated subpopulations of relatively large sizes.
Individuals are allowed to migrate among demes. Two ideas are behind this
model: exploring different search areas via different subpopulations, and
maintaining diversity within populations thanks to the exchange of
individuals with other populations.
Several patterns of exchange have traditionally been used. The most
common ones are rings, 2-D and 3-D meshes, and hypercubes. Figure 1.6
depicts these distributed models.
Furthermore, several replacement policies can be adopted. One of the most
common involves the migration of k individuals. These individuals
substitute the worst k individuals from the population where they are
arriving. Several parameters must be taken into account when working with
this model: the subpopulation size, the frequency of exchange, the number
of migrating individuals and the migration topology.
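A sequential sketch of one migration round in an island model with a ring topology; the migration rate k and the replace-the-k-worst policy follow the description above, while the bit-string individuals and fitness = sum are illustrative stand-ins.

```python
import random

def migrate_ring(islands, k=2):
    """Each island sends copies of its k best individuals to the next island
    in the ring, where they replace that island's k worst individuals."""
    emigrants = [sorted(isl, key=sum, reverse=True)[:k] for isl in islands]
    for i, isl in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]  # from the previous island
        isl.sort(key=sum)                             # worst individuals first
        isl[:k] = [list(ind) for ind in incoming]     # replace the k worst

# Usage: four islands of ten random bit-string individuals each.
islands = [[[random.randint(0, 1) for _ in range(8)] for _ in range(10)]
           for _ in range(4)]
migrate_ring(islands)
```

Because emigrants are copied rather than moved, the best individual found so far is never lost by a migration round.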
Figure 1.6 : a) Ring; b) Mesh. Arrows indicate the direction of exchange
between subpopulations.

Some papers deal with the usefulness of this model [Fuchs 1999], [Punch
1998]. In our research we focus on three important parameters: the number
of subpopulations, the number of individuals per subpopulation and the
topology we employed. In later chapters we will describe experiments and
results. We must point out that, in the island model, each population may
have different parameters, or even a different encoding of individuals [Lin
and Punch 1994]. These models are
labelled heterogeneous, and more research must be carried out to prove their
usefulness. In this research we have limited ourselves to homogeneous
models.
In general, island parallel genetic programming and genetic algorithms may
help in alleviating the premature convergence problem and are effective for
multimodal optimization. Furthermore, some arguments have been published
supporting the usefulness of the model when using genetic algorithms on
specific kinds of problems [Whitley et al]. In our research we also
investigated whether the island model is useful even when no
communication is allowed among subpopulations. Section 3.6 deals with
this question.
Several ways of implementing the island model can be found in [Koza et al
1999]. Some are based on multiprocessors, while others are implemented on
computer networks.
1.5.2 Our Multipopulation Genetic Programming Model.
In order to experiment with parallel models of Genetic Programming we
decided to implement our version of the Island Model, bearing in mind that
the tool should be able to work with as many populations as we like.
Furthermore, we wanted the tool to be flexible enough to work with different
communications topologies. We will thus propose in this section a new
topology for parallel GP. We will then analyse and compare it with another
commonly applied topology.
Figure 1.7 : Client/Server Model. Clients are Subpopulations.
Our model is based on the client/server paradigm (figure 1.7) for
establishing the communication topology. Every population acts as a client
process, sending its best individual to the server every time an internal
measuring-point is reached. The server is in charge of deciding where each
received individual must travel, thus establishing the communication
topology [Fernández et al 1999].
Some of our research showed that a random topology, in which the
destination of an individual is always stochastically chosen (see figure 1.8),
gives good results on some benchmark problems [Fernández et al 2000i].
We consequently decided to use such a topology. Results from later
research have confirmed the usefulness of the random topology (chapter 4).

Figure 1.8 : The random topology changes the communication pattern at
every exchange. Arrows indicate senders and receivers.
1.5.3 Performance analysis
In [Fernández et al 2000a] we find that the island model achieves better
performance: it reduces computing time when using parallel GP.
The sequential GP runtime Tseq of a population of P individuals is
determined by the genetic operators and by fitness evaluation. Selection,
crossover and mutation take time O(P) since all these operators act on each
individual and perform a transformation independent of the program size.
The fitness calculation is the most important part and its complexity Tfitness is
O(PCm): each individual is evaluated m times and each evaluation takes on
average C arithmetic or logical operations, where C is the average program
complexity (i.e. the number of nodes in the tree structure) and m is the
number of fitness cases. The total sequential time is therefore
Tseq = g·Tpopulation + g·Tfitness, where g is the number of generations.
Let us consider the island model where N populations, of P/N individuals,
are equally distributed on a system of N machines. Now the genetic
operators take time O(P/N) while the fitness evaluation is O((P/N)Cm). The
only overhead is caused by the communication of migrating individuals.
This takes time O(k(P/N)C), where k ≈ 0.05 is an empirical constant
representing the fraction of migrating individuals; communication time is
therefore limited in the context of the algorithm as a whole.
Finally, if we consider an asynchronous island model, where migration takes
place with non-blocking primitives, the communication time almost
completely overlaps with computation and can be disregarded at first.
Therefore,

    T^par_fitness = T^seq_fitness / N   and   T^par_population = T^seq_population / N    (1.1)

We achieve an acceleration which is nearly linear (the complete expression
contains a constant term that prevents the time from reaching zero as N
tends to infinity; the simplified expression is an approximation and, in
practice, we can only use from one to several thousand processors). Of
course, the preceding argument only holds for dedicated parallel machines
and unloaded clusters, and does not take into account process spawning,
message latency and distributed termination. Nevertheless, GP is an
excellent candidate for parallelisation, as shown by the results presented
here and in [Andre and Koza 1996].
2 A Parallel and Distributed
Genetic Programming Tool
In this chapter we will describe the software tool developed by ourselves and
employed in all the studies we are presenting. The tool implements a
parallel and distributed model of genetic programming. The kernel of the
tool is based on the public domain code for Genetic Programming
experimentation developed by Weinbrenner [Weinbrenner 1997], GPC++,
which is implemented in C++.
2.1 Software tool based on PVM: PADGP version #1
The first version of the software tool was developed using Parallel Virtual
Machine (PVM) communication primitives [Sunderam 1990] to connect
processes, which in this case are subpopulations. Although the first name
for the methodology was Parallel Genetic Programming [Fernández et al
1999], we think a more precise name is Parallel and Distributed Genetic
Programming (PADGP), because of PVM’s features.
This tool allows us not only to decide classic parameters like the number of
individuals per population or mutation and crossover probabilities, but also
to choose the number of populations involved in the experiment, the
migration rate, the number of migrating individuals and the communication
topology.
The tool works by means of the client/server model, in which each client is a
subpopulation and the server is a process that takes charge both of the input/
output buffers and also of establishing the communication topology (figure
1.7). In this way, the server can either work with a predefined topology or
dynamically change it during the run.
The tool can run on a broad range of architectures and operating systems. In
our experiments we have worked with both PC-Linux and Sun-Solaris.
Computation in PADGP can basically be thought of as a collection of
processes, each process representing a population. The
processes/populations proceed in parallel and exchange information using
PVM primitives. The messages exchanged by these processes are GP
individuals, while communication takes place via another process, called
the master, which runs in parallel with the others and implements a given
communication topology. The master also sends termination signals to the
other processes at the end of the evolution. In this configuration, each
process/population executes the following steps:
While termination condition not reached do in parallel for
each population
-Create a random population of programs;
-Assign a fitness value to each individual;
-Select the best individual and send it to the master;
-Receive a set of n individuals from the master and replace
the n worst individuals in the population;
-Select a set of individuals for reproduction;
-Recombine the new population with crossover;
-Mutate individuals.
And the master process executes the following steps:
For each population do
-Receive individual;
-Send them to another population according to the chosen
topology;
In order to implement random topology, each time the master receives a
block of individuals from a population, it calculates a random number
between 1 and the total number of processes, and sends the block to the
population whose process ID corresponds to that number (the process ID of
the master is 0).
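The master's random routing rule admits a one-function sketch; the (sender, individuals) block representation below is an assumption made for illustration.

```python
import random

def master_route(blocks, num_populations):
    """For each (sender, individuals) block received by the master, draw a
    random destination process ID between 1 and the number of populations
    (the master itself has process ID 0)."""
    return [(random.randint(1, num_populations), individuals)
            for sender, individuals in blocks]

# Usage: the master forwards two received blocks among 5 populations.
routed = master_route([(1, ['ind_a']), (2, ['ind_b'])], num_populations=5)
```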
2.2 Software tool based on MPI: PADGP version #2
The second version of the tool has been developed in collaboration with the
Genetic Programming group at the Institute of Computer Science of the
University of Lausanne. The tool is now based on the MPI standard [MPI
1995], and has been implemented using the MPICH package.
The implementation of the tool described here can be divided into two
components: a parallel genetic programming kernel implemented in C++
and endowed with MPI message passing, and a graphical user interface
written in Java. The parallel system was designed starting from the first
version of the tool [Fernández et al 1999]. Nevertheless, the new tool
allows populations to send several individuals simultaneously [Fernández et
al 2000b], [Vanneschi et al 2000c]. The algorithm now takes the following
shape:
While termination condition not reached do in parallel for
each population
-Create a random population of programs;
-Assign a fitness value to each individual;
-Select the best n individuals (with n≥0) and send them to
the master;
-Receive a set of n individuals from the master and replace
the n worst individuals in the population;
-Select a set of individuals for reproduction;
-Recombine the new population with crossover;
-Mutate individuals.
And the master process executes the following steps:
For each population do
-Receive n individuals;
-Send them to another population according to the chosen
topology;
Before sending the individuals to the master, each population packs these
trees into a message buffer. The master receives the buffer and directly
sends it to another population. The data can consequently be exchanged
between processes with only one send and receive operation, and the
packing and unpacking activities are performed by population managing
processes. The user can parameterize the execution by setting, among other
parameters, the value of n, the number N of individuals in each population
and the communication topology. The communication between the
processes/populations and the process/master is synchronous in the sense
that all the processes/populations wait until they have received all the n new
individuals before going on to the next iteration.
2.2.1 Graphical User Interface and Monitoring Tool
Most evolutionary computation environments do not feature a Graphical
User Interface (GUI). This is inconvenient, since parameter setting and
other choices have to be made in an old-fashioned file-based way, which is
obscure and difficult for beginners to work with. Even an experienced
researcher may benefit from a more user-friendly environment, especially
when closely monitoring the complex evolutionary process, since this might
shed light on the nature of the evolution itself.
The GUI is written in standard Java and is designed to be easy to use. It
communicates with the computation kernel through bi-directional channels,
and also starts the distributed computation. Information is displayed in a
window that presents the actions users can take (an example is given in
figure 2.1). There follows a brief description of the actions and information
available to the user. The parameters for the run can be entered using the
text fields designed for that purpose, and the system warns the user if some
parameter is not provided.
Pre-defined standard default parameter settings are also proposed. Some less
important parameters, which also have default values, can be set from a
second window that appears by clicking the “options” button. Run-time
quantities such as the best and average fitnesses, the average program
complexity and size of the tree can also be calculated and displayed at any
time during the run. The example in figure 2.1 shows a graph of the average
and best fitnesses for the population as a whole. The interface can also
display in an unprocessed or simplified version the tree which corresponds to
the best current solution. The topology can be chosen from a list in the panel
“connection topology” and an icon on the window shows the type selected
(“circle” in the figure). Facilities for end-of-run calculation of several useful
statistics are also provided. We can also examine statistics for each single
node by clicking on the corresponding icon or by using a node number in a
list. Colour codes are also useful for representing different states of the
evolution process, or to visualize nodes that are receiving or sending
messages.
Figure 2.1 : The monitoring graphical user interface.
3 Parallel and Distributed Genetic
Programming: Experimental Study
In this chapter we will study GP from the perspective of parallel and
distributed systems. GP has been successfully applied to a broad range of
optimisation problems, and some theoretical studies have dealt with the
foundations of GP [Poli 2000], but until now little experimental or
theoretical research has been done on the performance of Parallel and
Distributed Genetic Programming. In this chapter we apply Parallel and
Distributed Genetic Programming (PADGP) to some benchmark problems and
draw several conclusions from the results. The same experiments are then
applied to real-life problems in order to test whether those conclusions
also hold for a wider set of problems.
3.1 Introduction
Many researchers have supported the use of several populations when
working with Evolutionary Algorithm (EA) techniques. Different
experimental and theoretical studies have reported the efficiency of parallel
genetic algorithms and have studied the relationship between the classic
model and the island model [Cantú-Paz, Goldberg 1997] [Whitley et al
1997].
But in the Genetic Programming (GP) domain, matters are less clear. Some
researchers talk about the usefulness of working with multiple large
populations [Andre and Koza 1996], while others question both the size of
populations [Fuchs 1999] and the efficiency of multiple populations in GP
[Punch 1998], at least for the few problems that have been intensively
studied.
In the following sections we will describe how the nature of the new
algorithm, PADGP, differs from that of GP.
3.2 New parameters
All evolutionary methods require a set of parameters in order to solve a
given problem. Some of the parameters are shared among several
methodologies and others are specific to one of them. For instance, the
mutation rate parameter is employed by both GP and GA, and the same holds
for the crossover rate. On the other hand, among the parameters specific
to GP we find the maximum depth and the maximum depth for crossover.
These parameters exist in GP and not in other methodologies because of
the way individuals are represented in GP. While in GA and other
methodologies the length of individuals is defined and fixed at the
beginning of the experiments, in GP the length of individuals varies as
the generations run. GP represents its individuals as trees, and the
crossover operation can produce individuals longer than those previously
created (see figure 1.4). These characteristics of GP make it necessary
to establish a limit on the length of individuals, specifically in order
to avoid the exponential growth of trees.
We may state that despite similarities among evolutionary methods, different
parameters are necessary in different methodologies. The same thing occurs
when changes are made to a given algorithm.
Tuning parallel GP parameters (e.g. the number of populations, the size of
each population, the topological connections between populations) is
important in obtaining maximum performance.
If we want to use the parallel and distributed version of GP, we have to
decide the following points:
•How we will distribute individuals.
•How many populations we will use.
•The communication topology.
•The exchange rate.
•The number of migrating individuals.
The first two issues are related and important. Is it better to employ
just one population with all the individuals, or is it better to
distribute the individuals into several smaller populations? If so, should
individuals migrate among populations? What is the best migration rate,
and how many individuals should migrate each time? Once we have decided
to use several populations, what should the communication topology be?
This is an interesting set of questions. In this chapter we will apply
PADGP to some benchmark problems in order to find some answers; we will
see that, although we found reasonable answers in some cases, interesting
new issues also appeared.
In the following sections our goal is to study the parameters that affect
parallel performance and their interactions in common problems. By doing
so we hope to develop a more robust model of how these parameters might
be set, and in this way maximise the performance of PADGP on many
different types of problems.
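As a summary of this section, the parameters a PADGP run needs might be gathered as follows. This is only an illustrative sketch: the field names are hypothetical and do not correspond to the tool's actual option names.

```python
from dataclasses import dataclass

@dataclass
class PADGPConfig:
    """Hypothetical container for the PADGP-specific parameters
    discussed in this section, together with the classic GP parameters
    shared with the sequential version."""
    num_populations: int = 5        # how many subpopulations
    individuals_per_pop: int = 100  # size of each subpopulation
    topology: str = "random"        # "random", "grid", "circle", ...
    exchange_period: int = 1        # migrate every k generations
    num_migrants: int = 1           # individuals sent per exchange
    # Classic parameters shared with sequential GP:
    crossover_rate: float = 0.98
    mutation_rate: float = 0.02
    max_depth: int = 17             # tree-depth limit, GP-specific
```

A run description is then a single object that can be handed to the engine or saved with the results.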
3.3 Designing Experiments: Measuring Effort
When designing the experiments, we were not interested in obtaining
perfect solutions for each of the benchmark problems; indeed, some of the
problems cannot even be solved in a reasonable amount of time. We were
concerned with the way evolution occurred and how the convergence process
took place. But, due to the stochastic nature of the evolutionary
process, a set of executions rather than just one should be studied if
the idea is to analyse the results of evolution. All the experiments were
thus executed 60 times, and in the following graphs every curve has been
obtained by averaging 60 different runs of the same experiment.
On the other hand, we were interested in comparing PADGP to GP. Given the
different nature of the new algorithm, we considered it meaningless to
compare fitness values across generations, because the parallel version
of GP may affect not only fitness but also the length of solutions, and
length directly influences the evaluation time of each individual.
Furthermore, we cannot compare execution times, because PADGP is suited
to being computed on several machines simultaneously, which certainly
saves time. Instead, we wanted to measure the convergence process itself,
without bearing execution times in mind.
We consequently decided to analyse data by means of the computational
effort, calculated as the number of nodes that are evaluated in a GP
tree. Let us suppose that we have executed the same experiment n times;
we can compute the average size of individuals per generation in each of
the executions (the number of nodes per individual solution tree), and
then calculate the average of those n values. What we really have in each
generation is the average number of nodes per individual over the n
executions of a particular problem. Once this average has been computed,
the required effort in a particular generation will be i*n*avg_length,
where i is the number of individuals in the population and avg_length is
the average number of nodes previously calculated.
The computed effort is not necessarily useful for comparing results between
very distinct problems, because different functions, nodes, etc, may require
different time values. Nevertheless, it is a helpful measure when comparing
different results obtained with the same problem.
We are working with several populations instead of just one; however, the
effort is still easy to compute. In each execution we compute the average
number of nodes, taking into account the average number of nodes per
individual per generation in each of the populations. We can then proceed
as in the classic model, by averaging those values over the n different
executions per generation. We finally multiply the number of populations
by the number of individuals and by the average number of nodes. If we
add up the efforts of a number of generations, we obtain the
computational effort required to produce a result in a particular
generation.
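One reading of this definition can be sketched as follows. An assumption is made here: the constant factor n of the single-population formula is omitted, since multiplying every generation by the same number of runs does not change any comparison between configurations; the function names are illustrative.

```python
def effort_per_generation(avg_sizes_per_run, pop_size, num_pops=1):
    """Effort at each generation, following the chapter's definition:
    average the mean tree size over the runs, then multiply by the
    number of populations and the individuals per population.  The
    input is a list of runs, each a list of per-generation average
    tree sizes (nodes per individual)."""
    num_runs = len(avg_sizes_per_run)
    num_gens = len(avg_sizes_per_run[0])
    efforts = []
    for g in range(num_gens):
        avg_length = sum(run[g] for run in avg_sizes_per_run) / num_runs
        efforts.append(num_pops * pop_size * avg_length)
    return efforts

def cumulative_effort(efforts):
    """Summing generations' efforts gives the computational effort
    spent to reach a given generation."""
    total, out = 0.0, []
    for e in efforts:
        total += e
        out.append(total)
    return out
```

The cumulative values are what the x-axes of the convergence graphs in this chapter measure.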
Once the effort was defined, we had to decide which parameters we should
study. Due to the number of new parameters to be studied in PADGP, we
decided to fix some of them and then gradually change others while running
the experiments.
3.4 Benchmark problems
The next step was to decide the set of problems to be studied. We
decided to address a set of problems that have classically been used
for testing different evolutionary methodologies. We will now describe each
of them.
3.4.1 The Even Parity 5 problem.
This problem was studied by Koza by means of GP [Koza 92].
The boolean even-k-parity function of k boolean arguments returns T
(true) if an even number of its arguments is T, and otherwise returns
NIL. If k=5, there are 32 possible combinations, so 32 fitness cases must
be checked to evaluate the accuracy of a particular program in the
population. The fitness can be computed as the number of hits over the 32
cases.
We have scaled the fitness value in the following way: a random program
will generate on average 16 correct values out of 32. If we subtract 16,
we obtain the improvement over a random solution: a value of 0 thus means
that the program is no better than random, while a value of 16 means that
all the computed values are correct. Graphs will consequently show values
in this range.
Every problem to be solved by means of GP needs a set of functions and
terminals. In the case of the Evenp-5 problem, the function set we have
employed is F={NAND,NOR}. The terminal set is made up of 5 different
variables, T={a,b,c,d,e}, each taking the value 0 or 1 (NIL or T) of one
of the 5 input bits.
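The fitness evaluation just described can be sketched as follows; `program` stands for any candidate individual, here modelled as a plain Python callable rather than a GP tree.

```python
from itertools import product

def even_parity_fitness(program):
    """Hits over the 32 fitness cases of even-5-parity, shifted so that
    a random program scores about 0 and a perfect one scores 16, as in
    the graphs of this chapter."""
    hits = 0
    for bits in product([False, True], repeat=5):
        target = (sum(bits) % 2 == 0)   # even number of T arguments
        if program(*bits) == target:
            hits += 1
    return hits - 16  # improvement over a random guesser

# NAND and NOR, the only functions in the set F:
nand = lambda x, y: not (x and y)
nor = lambda x, y: not (x or y)
```

A perfect individual scores 16; a constant program matches exactly half of the 32 cases and scores 0.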
3.4.2 The Symbolic Regression Problem.
Regression techniques try to find an appropriate expression for a function g
so that, given a variable y depending on the value of another x, g(x) is a good
approximation to y. This means we know a set of pairs (xi,yi) and we look
for a function g such that the mean-square error

(1/N) Σ_{i=1..N} [y_i - g(x_i)]^2

is minimum.
The goal here is to find an individual, i.e. a program, which matches a given
equation. For each of the values in the input set, the program must be able to
compute the output obtained by means of the equation. We have employed
the classic polynomial equation:
f(x) = x^4 + x^3 + x^2 + x (3.1)
And the input set is composed of the values 1 to 1000.
For this problem, the set of functions is the following: F={*,//,+,-} where //
is like / but returns 0 instead of ERROR when the divisor is equal to 0, thus
allowing syntactic closure.
We have defined the fitness value as proportional to the sum of the
squared errors at each of the test points; a value of 0 means a perfect
solution.
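A sketch of the pieces just defined: the protected division, the target polynomial of equation (3.1), and the sum-of-squared-errors fitness. As above, a candidate individual is modelled as a plain callable.

```python
def pdiv(a, b):
    """Protected division '//' from the function set: returns 0 when
    the divisor is 0, so every subtree yields a valid number."""
    return 0.0 if b == 0 else a / b

def target(x):
    """The classic polynomial of equation (3.1)."""
    return x**4 + x**3 + x**2 + x

def regression_fitness(program, xs=range(1, 1001)):
    """Fitness proportional to the sum of squared errors over the
    input set (here the values 1 to 1000); 0 means a perfect solution."""
    return sum((target(x) - program(x)) ** 2 for x in xs)
```

Because of the protected division, syntactic closure holds: any tree built from F={*, //, +, -} evaluates without error.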
3.4.3 The Ant on the Santa Fe Trail
In this problem, an ant is placed on a toroidal grid. Some of the cells
contain food, laid out along a particular path (see figure 3.1). The
problem consists of driving the ant along the path so that it eats all
the food in as few steps as possible.
We use the same function set as [Koza 92]: F={if_food_ahead, left, right,
forward}.
Figure 3.1 : The Santa Fe trail.
We have defined the best fitness value in this problem as 70, which means
that all the food has been eaten by the ant (this value does not
represent the total number of pieces of food eaten). Smaller positive
values indicate the number of pieces eaten on the path, and a value of 0
means that no food has been eaten.
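The ant's primitives can be sketched as follows. This is a minimal model, not the chapter's implementation: the trail cells are supplied by the caller instead of hard-coding the Santa Fe trail of figure 3.1, and the 32x32 grid is the usual setting for this benchmark.

```python
class Ant:
    """Minimal toroidal-grid ant exposing the primitives of the
    function set F = {if_food_ahead, left, right, forward}."""
    DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # E, S, W, N

    def __init__(self, food, width=32, height=32):
        self.food = set(food)       # (x, y) cells holding food
        self.w, self.h = width, height
        self.x = self.y = 0
        self.d = 0                  # facing east
        self.eaten = 0

    def _ahead(self):
        dx, dy = self.DIRS[self.d]
        return ((self.x + dx) % self.w, (self.y + dy) % self.h)

    def if_food_ahead(self, then_branch, else_branch):
        (then_branch if self._ahead() in self.food else else_branch)()

    def left(self):
        self.d = (self.d - 1) % 4

    def right(self):
        self.d = (self.d + 1) % 4

    def forward(self):
        self.x, self.y = self._ahead()
        if (self.x, self.y) in self.food:
            self.food.discard((self.x, self.y))
            self.eaten += 1
```

An evolved tree is then just a nesting of these calls; fitness is read from `eaten` after a fixed number of steps.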
3.5 Experimental Results
In the following sections we will study each of the previously selected
parameters (see section 3.3) by means of a set of experiments on the
problems described.
The classic parameter values used in these experiments are the following:
generational GP, crossover rate 98%, mutation rate 2%.
3.6 Isolated Multipopulation Genetic Programming
One of the main parameters affecting PADGP is the frequency of the
exchange of individuals. Bearing in mind the way PADGP works, several
populations exchange individuals at a certain rate; if an experiment is
run for, say, g generations, we can exchange individuals every
generation, every two generations, and so on. One of the limit cases
takes place when no individuals migrate at all. In our first experiment
we were interested in answering the following question: what happens when
no migration occurs? If there is no migration phase, the communication
topology does not matter, because there is no actual communication, and
the different subpopulations taking part in the experiment are isolated.
Each of the subpopulations can run on a different processor or even at
different times on the same processor, because the runs are independent.
Of course, the solution to the problem will be the best individual,
obtained by comparing the best individual from each run, i.e.
subpopulation (see figure 3.2).
We call this version of GP, with its several isolated subpopulations,
Isolated Multipopulation Genetic Programming (IMGP) [Fernández et al
2000d].
We will now present some results obtained when comparing classic GP to
IMGP. Our first discovery was that, given a certain number of
individuals, it is sometimes useful to distribute them among several
populations even when no communication is allowed. This discovery led to
research concentrating on three main questions: firstly, how to
distribute individuals according to the problem in hand; secondly, how
many populations should be employed in relation to the effort and fitness
involved in solving a problem; and finally, how to use IMGP in the
classification of problems.
It is useful to recall what we stated earlier: using a large population
is not always the best way of solving a problem when working with GP
[Fuchs 1999]. Although simple, the idea of dividing a large population
into several smaller ones and distributing them over a set of processors
is interesting because it allows us to perform several tasks in parallel.
The same algorithm can in this way be run simultaneously on several
processors. This method enables us to study an evolutionary algorithm
without changing it while carrying out experiments, and this is an
important point: the basic GP algorithm, without modifications, can be
used for these experiments.
The algorithm is still the same as in the classic version; we are simply
saving computing time. In fact, executing one population of i individuals
n times on one processor will require n*t units of time (t being the time
required to execute one population of i individuals on that processor).
If we use m processors instead, the execution will last (n/m)*t units,
which is obviously less. In both cases statistics can be obtained in the
same way: by calculating the mean of the best individual from each
execution.
Nevertheless, we could think in terms of populations and subpopulations
rather than repeating the same experiment n times. We can execute a
population of i individuals on a processor, which will need t units of
time, or we can divide the population into m subpopulations and execute
them on m different processors, employing t/m units of time. The time
saved is the same as in the previous case. However, something changes:
the results of the m subpopulations are no longer used to obtain a mean.
We now consider that all the m subpopulations make up one bigger
population. The best fitness must thus be found by comparing the best
fitness of each subpopulation and selecting the best from among those
values (see again
Figure 3.2 : Left) One processor runs all the populations. Right) Each
population runs on a different processor. Other intermediate cases can be
used. The best individual of each population at the end of the execution
is collected, and the solution is the “best” of all the best individuals.
figure 3.2). No mean is calculated. This idea is different from that of
finding the best fitness value using just one population on a processor,
because in the latter case we have no “best individuals” to compare. On
the other hand, we are aware that the number of individuals in each
subpopulation is lower than when using only one population; consequently,
the region of the search space explored by a subpopulation is smaller
than that explored by the whole population.
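The best-of-bests selection that replaces the mean can be sketched as follows; `run_gp` is a hypothetical stand-in for any GP engine that returns the best (fitness, individual) pair of one isolated run.

```python
def imgp_best(run_gp, total_individuals, num_subpops, seeds):
    """Isolated Multipopulation GP as described above: split the budget
    into num_subpops independent runs, never migrate, and return the
    best of the per-subpopulation best individuals.  `run_gp` takes a
    population size and a seed and returns (fitness, individual)."""
    subpop_size = total_individuals // num_subpops
    bests = [run_gp(subpop_size, s) for s in seeds[:num_subpops]]
    return max(bests)  # no mean is computed: just the best of bests
```

Since the runs are independent, each can execute on its own processor (or at a different time on the same one) without changing the result.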
So, what is the best choice, taking into account the drawbacks and
advantages of each method? We have to bear in mind not just the computing
time but also the intrinsic differences between the two methods. The
shape of the search space probably has a lot to do with the way a
particular problem is solved. There may be search spaces which are
uniform enough to be equally well explored by any number k of individuals
with k>l, where l is a constant for each problem; values k<l will not be
enough to explore the search space efficiently. Of course, very large
populations need more time to be computed. Smaller populations, with
approximately l individuals, are more useful because the time needed to
process them is shorter and the results are the same as with larger
populations [Fuchs 1999].
Moreover, we can execute a number of these smaller populations on several
processors and then choose the best individual from among several best
individuals, all in the time required to compute a single small
population. Choosing from among several “best individuals” obviously
enhances results. A particular problem can thus be solved equally well by
l or by l+x (x>0) individuals; consequently, m isolated populations, each
with about l individuals, will solve the problem better than one
population with m*l individuals (there is thus no advantage in using more
than l individuals per population).
So, we could execute a set of experiments, each one with a different
number of individuals. Let us suppose we execute n experiments, with Xi
individuals in experiment i:
X1, X2, ..., Xn, where Xi > Xi+1. (3.2)
When each experiment ends we have the best fitness value Fi. We therefore
have a series of values:
F1, F2, ..., Fm, fm+1, ..., fn, where Fi = Fj > fk. (3.3)
This means that using a larger number of individuals than Xm will give us
the same result as using just Xm; on the other hand, using Xk < Xm will
produce worse results.
If we employ several populations instead of just one, with k individuals
each, we gain an improvement from the possibility of choosing the best
individual from among those obtained by each subpopulation; at the same
time, results worsen if k is too small. Nevertheless, there may exist
other search spaces which are not uniform enough to be explored with a
small number of individuals. In such spaces, using n individuals is
always better than using n/m individuals and taking the best of m cases,
regardless of the value of m. The statistics obtained back up this
statement.
3.6.1 Results
All the following graphs show the statistics obtained for the problems
studied. As stated before, we always executed each configuration 60
times, so each graph is the result of approximately 240 executions of
IMGP. The total number of executions employed to confirm the ideas
proposed in this section was about 3500. They were executed on a
multiprocessor system with 6 processors. We used the first version of the
tool described in the previous chapter to carry out experimental studies
with Multipopulation GP [Fernández et al 1999], using PVM as the
communication library (see section 2.1).
In the following graphs we compare several configurations. All the
figures compare the classic GP version of the problem, with, say, n
individuals, against several different configurations of IMGP, each with
m subpopulations and n/m individuals per subpopulation. In a given graph,
all the configurations use the same total number of individuals.
In order to compare results, we employed the computational effort,
calculated as the number of nodes of a GP tree that are evaluated (see
section 3.3).
Even when the results provided by classic GP and the IMGP version are the
same, the time spent computing the latter is shorter because several
processors are used. However, we are not particularly interested here in
the time needed to solve a problem, but rather in how well the problem is
solved, i.e., how the algorithm changes and how these differences may
help us to find solutions.
3.6.1.1 The Ant on the Santa Fe Trail Problem
The statistics obtained for the ant problem show several types of
behaviour. In figure 3.3 (Ant-5000) we see that when the number of
individuals is very high, 5000 individuals, all the configurations obtain
approximately the same results, which confirms the previously proposed
idea. This probably occurs because in all the configurations studied the
number of individuals per subpopulation, K, is always bigger than the
limit value for this problem; all the configurations consequently give
the same results.
If the total number of individuals is small, 100 (figure 3.3, Ant-100),
the classic version of GP obtains results similar to IMGP with 2 or 5
populations. Nevertheless, employing 10 populations with 10 individuals
each is not useful, probably because of the small number of individuals
in each population.
When working with 250 or 1000 individuals (see figure 3.4), almost all
IMGP configurations proved to outperform classic GP; the limit value for
this problem seems to lie between 250 and 1000. Differences among
configurations depend not only on the number of individuals per
population but also on the number of populations and the shape of the
search space.
We can thus observe that when the total number of individuals is not too
high (smaller than 2000), several populations achieve better results
provided each population has enough individuals. We are not taking into
account the time required to run each configuration, just the results
obtained: even when IMGP proves worse than classic GP, the time required
to obtain a given result is shorter if we use as many processors as
populations.
Figure 3.3 : The Ant problem solved by means of classic GP and IMGP using
100 and 5000 individuals.
Figure 3.4 :The Ant problem solved by means of classic GP and IMGP
using 250 and 1000 individuals.
3.6.1.2 The Even Parity 5 Problem
Figure 3.5 shows the results we obtained when solving the Even Parity 5
problem with 500 individuals. Several configurations were employed, with
1, 2, 5 and 10 subpopulations (using 1 population is, of course, classic
GP). We can observe that classic GP outperforms IMGP when using 500
individuals (figure 3.5). Furthermore, there are no significant
differences when using 2500 individuals (figure 3.6, Evenp 2500).
Nevertheless, using 1000 individuals seems to benefit IMGP (see figure
3.6): using 2 or 5 subpopulations improves results, but using 10 damages
them. The differences here are not as important as in the ant problem:
the improvement achieved by using several populations is offset by the
drawback of the lower number of individuals making up each subpopulation.
An important issue arises again: the number of populations is crucial. We
must balance the number of individuals per subpopulation so as to carry
out a good search while, on the other hand, exploring different areas of
the search space. These results are in agreement with other studies which
also employ Multipopulation GP, but with communication among
subpopulations [Fernández et al 2000e].
Figure 3.5 : The even parity-5 problem solved by means of classic GP
and IMGP using 500 individuals.
Figure 3.6 : The even parity-5 problem solved by means of classic
GP and IMGP using 1000 and 2500 individuals.
3.6.1.3 The Symbolic Regression Problem
The following figures, 3.7 and 3.8, compare the results obtained by means
of classic GP and IMGP when solving the symbolic regression problem using
1000 points. Figure 3.7 compares several configurations using 125
individuals, and figure 3.8 shows results for 250 and 500 individuals. We
can observe that the differences between classic GP and IMGP are very
slight when using 125 and 500 individuals. More significant differences
appear when using 250 individuals: in this case results improve if the
original population is divided into several smaller ones. This could
indicate that exploring the search space with about 5 subpopulations of
50 individuals each is better than using 1 population, because every
subpopulation may explore a different area and we can then select from
among the best individuals of each area.
When the number of individuals employed is very high, dividing the
population into several smaller ones still produces large subpopulations,
and the number of individuals per subpopulation probably remains above
the limit number for this problem. This could be why the results of GP
and IMGP are similar. Dividing the population into smaller ones is
helpful when the total number of individuals is neither large nor small,
as in the ant problem.
On the other hand, if the number of individuals is very low, using
several subpopulations has the drawback of an even smaller number of
individuals per subpopulation, and the results do not improve on those
achieved by the classic version.
Figure 3.7 : Symbolic regression problem solved by means of GP
and IMGP using 125 individuals.
Figure 3.8 : Symbolic regression problem solved by means of GP
and IMGP using 250 and 500 individuals.
3.6.2 Conclusions
Returning to the questions we asked at the beginning of this section, we
can draw several conclusions. First of all, testing a problem with IMGP
as compared to classic GP is useful for studying the nature of the
problem's search space. IMGP does not always help to obtain better
results, but for some problems it is a good choice.
Secondly, the results for each problem depend on the total number of
individuals we employ and, at the same time, on the number of populations
we distribute them over. Results improve as the number of populations
grows, but simultaneously they worsen as the number of individuals per
population shrinks; given a total number of individuals, a balance must
be struck. Different problems show different results when using the same
configuration, which leads us to believe that the shape of the search
space is a decisive factor in the usefulness of IMGP versus classic GP.
The number of individuals I and the number of populations P depend on the
particular problem to be solved and are interrelated. This observation
confirms the results of other studies that deal with communicating
populations [Punch 1998], [Fernández et al 2000e].
As yet, we cannot predict the best number of individuals or populations to be
used for solving a problem when working with IMGP. Nevertheless, we can
use IMGP to study a problem in terms of its search space. We can study the
results of problems using several configurations, which could be helpful
when classifying those problems.
Finally, the results from these benchmark problems allow us to state that
IMGP is useful: even when the results obtained with IMGP are quite
similar to those obtained with GP, the fact that IMGP works with several
populations, each of which can be run on a different processor, allows us
to save computing time. The same results will thus be obtained in a
shorter time.
3.7 PADGP: Studying the communication topology
Now that the simplest limit case has been experimentally studied, it is
time to look at what happens when communication is allowed. We can now
really begin to talk about distributed populations working and
collaborating in parallel, for at this stage migration of the best
individuals among subpopulations takes place.
According to the model we use, several parameters must be studied. First
of all we will concentrate on the communication topology. We proposed a
random model in previous sections, in which the topology changes as the
generations run. We will now compare our random topology with the one
used by Koza [Andre and Koza 1996], which was based on a toroidal grid
(figure 3.9).
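The two topologies just described can be sketched as follows, for the 9 populations used in the experiments of this section; the function names are illustrative.

```python
import random

def grid_neighbours(idx, side=3):
    """The four neighbours of population idx on a side x side toroidal
    grid (3 x 3 = 9 populations here): each population sends its
    migrant to all four in every migration phase."""
    r, c = divmod(idx, side)
    return [((r - 1) % side) * side + c,   # north
            ((r + 1) % side) * side + c,   # south
            r * side + (c - 1) % side,     # west
            r * side + (c + 1) % side]     # east

def random_destinations(num_pops, rng=random.Random(0)):
    """Random topology: the master receives one migrant from every
    population and picks each destination at random (here, any
    population other than the sender)."""
    return [rng.choice([p for p in range(num_pops) if p != src])
            for src in range(num_pops)]
```

Under the random topology the destinations change every migration phase, whereas the grid neighbourhood is fixed for the whole run.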
3.7.1 Results
All the following graphs show the statistics obtained for the problems
studied. As stated earlier, we always executed each configuration 60
times. We used 9 populations in all the executions, and changed the
number of individuals for each experiment in order to be able to draw
conclusions from a broad set of results. Given that we are using 9
populations, the communication topologies employed are depicted in figure
3.10.
We use the previously defined effort measure to compare results (see
section 3.3). Other important parameters in all the experiments shown in
this section were the following:
•Frequency of exchange: Each generation.
•Number of migrating individuals: 1.
Figure 3.10 : a) Random Topology: all the populations send their
individuals to the master process, which randomly decides the
destination. b) Toroidal Grid: populations send and receive individuals
from their neighbours; individuals are sent to all the neighbours in each
migration phase.
Figure 3.9 : Toroidal grid.
3.7.1.1 The Ant on the Santa Fe Trail
Figure 3.11 compares convergence results when using 500 individuals per
population; in each generation 9*500 = 4500 individuals were evaluated.
We can see that the results are similar, although slightly better for the
grid topology, and we must not forget that all the results are averages
over 60 executions, which gives them statistical significance. With the
grid topology, each individual is sent four times, because each
population has 4 neighbours.
Figures 3.12 and 3.13 show several statistics, each computed with a
different number of individuals per subpopulation. We can again see that,
although the differences are not very important, the random topology
almost always obtains better results than the grid topology. The most
important differences appeared when using 60 individuals per population.
Figure 3.11 : Comparing topologies using 9 populations with 500 individuals
each. In each generation 1 individual is sent to another population.
Figure 3.12 : Comparing topologies using 9 populations with 250 and 125
individuals each. In each generation 1 individual is sent to another population.
3.7.1.2 The Even Parity-5 Problem
The same kind of experiments were also carried out for the even parity-5
problem. We again employed 9 populations and varied the number of
individuals. Figures 3.14, 3.15 and 3.16 show the results obtained when
using 25, 50, 100, 250 and 500 individuals. We can see in all the graphs
that the random topology achieves results similar to the grid topology;
the differences are not important enough to say that the random topology
allows solutions to converge more quickly. Of course we have not found
the perfect solution, but only because we did not run the experiments
long enough to do so: we simply wanted to compare different evolutions,
not to find solutions. For this problem we cannot conclude which topology
is best.
Figure 3.13 :Comparing topologies using 9 populations with 60 individuals each.
In each generation 1 individual is sent to another population.
Figure 3.14 : Comparing topologies using 250 and 500 individuals.
Each population sends one individual per generation.