Self Modifying Cartesian Genetic Programming:
Fibonacci, Squares, Regression and Summing
Simon Harding1, Julian F. Miller2, and Wolfgang Banzhaf1
1Department of Computer Science, Memorial University, Canada
{simonh,banzhaf}@cs.mun.ca
http://www.cs.mun.ca
2Department Of Electronics, University of York, UK
jfm7@ohm.york.ac.uk
http://www.elec.york.ac.uk
Abstract. Self Modifying CGP (SMCGP) is a developmental form of Cartesian Genetic Programming (CGP). It is able to modify its own phenotype during execution of the evolved program. This is done by the inclusion of modification operators in the function set. Here we present the use of the technique on several different sequence generation and regression problems.
Keywords: Genetic programming, developmental systems.
1 Introduction
In biology, the process whereby a genotype gives rise to a phenotype can be
regarded as a form of self-modification. At each decoding stage, the expressed
genes, the local environment and the cellular machinery influence the subsequent
genetic expression [1,2]. The concept of self-modification can be a unifying way of
looking at development that allows the inclusion of genetic regulatory processes
inside single cells, graph re-writing and multi-cellular systems.
In evolutionary computation self-modification has not been widely considered, but Spector and Stoffel examined its potential in their ontogenetic programming paper [3]. It has also been studied in the graph re-writing system of Gruau [4] and was implicitly considered in Miller [5]. However, recently, much work in computational development has concentrated at a multi-cellular level and the aim has been to show that evolution could produce developmental cellular programs that could construct various cellular patterns [6]. A common motivation for evolving developmental programs is that they may allow the evolution of arbitrarily large systems which are infeasible to evolve using a direct genetic representation. However, many of the proposed developmental approaches are not explicitly computational, in that often one must apply some other mapping process from the developed cellular structure into a computation. Further discussion of our motivation, and how it relates to previous work, can be found in [7].
In our previous work, we showed that by utilizing self-modification operations within an existing computational method (a form of genetic programming called Cartesian Genetic Programming, CGP) we could obtain a system that (a) could develop over time in interaction with environmental inputs and (b) would at every stage provide a computational function [7]. It could stop its own development, if required, without external input. Another interesting feature of the approach is that, in principle, programs could be evolved which allow the replication of the original code.

L. Vanneschi et al. (Eds.): EuroGP 2009, LNCS 5481, pp. 133–144, 2009. © Springer-Verlag Berlin Heidelberg 2009
Here we demonstrate SMCGP on a number of different tasks. The problems
illustrate different aspects and capabilities of SMCGP, and are intended to be
representative of the types of problems we will investigate in future.
2 Self Modifying CGP
2.1 Cartesian Genetic Programming (CGP)
Cartesian Genetic Programming represents programs as directed graphs [8].
Originally CGP used a program topology defined by a rectangular grid of nodes
with a user defined number of rows and columns. However, later work on CGP
always chose the number of rows to be one, thus giving a one-dimensional topology, as used in this paper. In CGP, the genotype is a fixed-length representation
and consists of a list of integers which encode the function and connections of
each node in the directed graph.
CGP uses a genotype-phenotype mapping that does not require all of the
nodes to be connected to each other, resulting in a bounded variable length
phenotype. This allows areas of the genotype to be inactive and have no influence
on the phenotype, leading to a neutral effect on genotype fitness called neutrality.
This type of neutrality has been investigated in detail [8,9,10] and found to be
beneficial to the evolutionary process on the problems studied.
2.2 SMCGP
In this paper, we use a slightly different genotype representation from that used in previously published work on CGP.
Fig. 1. The genotype maps directly to the initial graph of the phenotype. The genes
control the number, type and connectivity of each of the nodes. The phenotype graph
is then iterated to perform computation and produce subsequent graphs.
Fig. 2. Example program execution, showing the DUP(licate) operator being activated and inserting a copy of a section of the graph (itself and the neighboring functions on either side) elsewhere in the graph in the next iteration. Each node is labeled with a function, the relative addresses of the nodes to connect to and the parameters for the function (see Section 2.4).
As in CGP, an integer gene (in our case the first) encodes the function the node represents. This is followed by a number of integer connection genes that indicate the location in the graph from which the node takes its inputs. However, in SMCGP there are also three real-valued genes that encode parameters required for the node function. Also there is a binary gene that indicates if the node should be used as a program output. In this paper all nodes take two inputs, hence each node is specified by 7 genes. An example genotype is shown in Figure 1.
Like CGP, nodes take their inputs in a feed-forward manner from either the
output of a previous node or from a program input (terminal). The actual number of inputs to a node is dictated by the arity of its function. However, unlike
previous implementations of CGP, nodes are addressed relatively and specify
how many nodes back in the graph they are connected to. Hence, if the connection gene is 1 it means that the node will connect to the previous node in the
list, if the gene has value 2 then the node connects 2 nodes back and so on. All
such genes are constrained to be greater than 0, to avoid nodes referring directly
or indirectly to themselves.
If a gene specifies a connection pointing outside of the graph, i.e. with a
larger relative address than there are nodes to connect to, then this is treated as
connecting to a zero value. Unlike classic CGP, inputs arise in the graph through
special functions. This is described in section 2.3. The relative addressing of
connection genes allows sub-graphs to be placed or duplicated elsewhere in the
graph whilst retaining their semantic validity.
This encoding is demonstrated visually in Figure 2.
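The relative-addressing scheme can be sketched in a few lines. The helper below is our own illustration (not the authors' implementation): each connection gene is interpreted as "this many nodes back", and any connection that points past the front of the graph reads as 0.

```python
# A minimal sketch of SMCGP-style relative addressing (our own
# illustration, not the reference implementation).

def evaluate_graph(graph):
    """graph: list of (function, c1, c2) with relative connections >= 1.
    Returns the output value of every node, evaluated left to right
    (the graph is feed-forward, so earlier outputs are always cached)."""
    outputs = []
    for i, (func, c1, c2) in enumerate(graph):
        vals = []
        for c in (c1, c2):
            src = i - c                                   # c nodes back
            vals.append(outputs[src] if src >= 0 else 0)  # off-graph -> 0
        outputs.append(func(*vals))
    return outputs

# Three nodes: a constant-like node, then two chained adders.
graph = [
    (lambda a, b: 5, 1, 1),      # both connections fall off the front: 0, 0
    (lambda a, b: a + b, 1, 2),  # 1 back -> node 0 (5); 2 back -> off graph (0)
    (lambda a, b: a + b, 1, 2),  # 1 back -> node 1 (5); 2 back -> node 0 (5)
]
# evaluate_graph(graph) == [5, 5, 10]
```

Because every address is relative, the same three-node fragment would compute the same function if a self-modification operator copied it elsewhere in the graph, which is the point of this encoding.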
The three node function parameter genes are primarily used in performing modifications to the phenotype. In the genotype they are represented as real numbers, but they are cast (truncated) to integers if certain functions require it.
Section 4 details the available functions and any associated parameters.
2.3 Inputs and Outputs
The way we handled inputs in our original paper on SMCGP was flawed. We found it did not scale well: sub-graphs became disconnected from inputs as self-modifying functions moved them away from the beginning of the graph, and they lost their semantic validity. The new input strategy required two simple changes from conventional CGP and our previous work in SMCGP.
The first, was to make all negative addressing return false (or 0 for non-binary
versions of SMCGP). In previous work [7], we used negative addresses to connect
nodes to input values.
The second was to change how the INPUT function works. When a node is of
type INP (shorthand for INPUT), each successive call gets the next input from
the available set of inputs. If the INP node is called more times than there are
inputs, the counting starts from the beginning again, and the first input is used.
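A minimal sketch of this assumed INP behaviour (class and method names are our own): a shared counter advances on every call and wraps around the input list.

```python
# Sketch of the INP function's wraparound behaviour (hypothetical
# helper names; the wrapping rule itself is from the text).

class InputPointer:
    def __init__(self, inputs):
        self.inputs = inputs
        self.calls = 0

    def next_input(self):
        # each call fetches the next input, wrapping when exhausted
        value = self.inputs[self.calls % len(self.inputs)]
        self.calls += 1
        return value

ptr = InputPointer([10, 20, 30])
seq = [ptr.next_input() for _ in range(5)]
# seq == [10, 20, 30, 10, 20]  -- wraps after the third call
```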
Outputs are handled slightly differently to inputs. We added another gene to
the SMCGP node that decides whether the phenotype could use that node as
an output. In previous work, we used the last n-nodes in the graph to represent
the n-outputs. However, as with the inputs, we felt this approach did not scale
as the graph changes size. When an individual is evaluated, the first stage is to
identify the nodes in the graph that have their output gene set to 1. Once these
are found, the graph can be evaluated from each of these nodes in a recursive
manner.
If no nodes are flagged as outputs, the last n nodes in the graph are used as
the n-outputs. Essentially, this reverts the system back to the previous approach.
If there are more nodes flagged as outputs than are required, then the leftmost
nodes that have flagged outputs are used until the required number of outputs
is reached. If there are fewer nodes in the graph than required outputs, the
individual is deemed to be corrupt and it is not evaluated (it is given a bad
fitness score to ensure that it is not selected for).
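The output-selection rules above can be sketched as follows (hypothetical helper; the case of fewer flagged nodes than required outputs is not specified in the text, so the sketch simply returns whatever flags it finds).

```python
# Sketch of the output-selection rules (our own reading of the text):
# prefer nodes whose output gene is set, fall back to the last n
# nodes, and reject graphs with fewer than n nodes.

def select_outputs(output_genes, n):
    """output_genes: binary output gene of each node, in graph order.
    Returns indices of the nodes used as the n program outputs, or
    None if the individual is corrupt (fewer nodes than outputs)."""
    if len(output_genes) < n:
        return None                       # corrupt: assign a bad fitness
    flagged = [i for i, g in enumerate(output_genes) if g == 1]
    if not flagged:                       # nothing flagged: last n nodes
        return list(range(len(output_genes) - n, len(output_genes)))
    return flagged[:n]                    # too many flags: leftmost n used

# select_outputs([0, 1, 0, 1, 1], 2) == [1, 3]   (leftmost flagged pair)
# select_outputs([0, 0, 0, 0], 2) == [2, 3]      (reverts to last n)
# select_outputs([1], 2) is None                  (corrupt individual)
```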
2.4 Evaluation of the SMCGP Graph
From a high level perspective, when a genotype is evaluated the process is as
follows. The initial phenotype graph is a copy of the genotype graph. This graph
is then executed, and if there are any modifications to be made, they alter the
phenotype graph.
The genotype is invariant during the entire evaluation of the individual. All
modifications are made to the phenotype which starts out as a copy of the
genotype. In subsequent iterations, the phenotype will usually gradually diverge
from the genotype.
The encoded graph is executed in the same manner as standard CGP, but
with changes to allow for self-modification. The graph is executed by recursion,
starting from the output nodes down through the functions, to the input nodes.
In this way, nodes that are unconnected ('junk') are not processed and do not
affect the behavior of the graph at that stage.
For non-self modification function nodes the output value, as in GP in general,
is the result of the mathematical operation on input values.
On executing a self-modification node, a comparison is made of the two input values. If the second value is less than the first, the second value is passed through. Otherwise, the node is activated: the first value is passed through, and the self-modification function in that node is added to a “To Do” list of pending modifications. This makes the execution of the self-modifying function dependent on the data passing through the program.
After each iteration, the “To Do” list is parsed, and all manipulations are
performed (provided they do not exceed the number of operations specified in the
user defined “To Do” list length). The parsing is done in order of the instructions
being appended to the list, i.e. first in is first to be executed.
The length of the list can be limited as manipulations are relatively computationally expensive to perform. Here, we limit the length to just 2 instructions.
All graph manipulation functions require a number of parameters, as described
in section 4.
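The activation rule and the “To Do” list can be sketched as follows. This is our minimal reading of the mechanism, not the reference implementation; in particular, dropping instructions once the list is full is an assumption about how the length limit is enforced.

```python
# Sketch of the data-dependent activation rule and "To Do" list
# (our own reading; function names are illustrative).

TODO_LENGTH = 2   # the paper limits pending manipulations to 2

def execute_sm_node(first, second, sm_function, todo):
    """A self-modification node compares its two inputs. If the second
    is less than the first, it just passes the second through.
    Otherwise it activates: the first value is passed through and the
    node's SM function is queued on the To Do list (if there is room,
    an assumption on our part)."""
    if second < first:
        return second
    if len(todo) < TODO_LENGTH:
        todo.append(sm_function)
    return first

todo = []
out1 = execute_sm_node(3, 1, "DUP", todo)   # 1 < 3: not activated
out2 = execute_sm_node(2, 5, "MOV", todo)   # activated, queued
out3 = execute_sm_node(0, 9, "DEL", todo)   # activated, queued
out4 = execute_sm_node(0, 9, "ADD", todo)   # list already full: dropped
# After the iteration, todo == ["MOV", "DEL"] is applied first-in,
# first-out to the phenotype graph.
```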
3 Evolutionary Algorithm and Parameters
We use a (1+4) evolutionary strategy for the experiments in this paper. We
bootstrap the process by testing a population of 50 random individuals. We
then select the best individual and generate four offspring. We test these new
individuals, and use the best of these to generate the next population. If there is
more than one best in the population and one of these is the parent, we always
choose the offspring to encourage neutral drift (see section 2.1).
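The strategy above can be sketched as follows; `fitness`, `mutate` and `random_individual` stand in for the paper's operators, and the toy landscape at the end is purely illustrative.

```python
# Minimal sketch of the (1+4) evolutionary strategy with the
# tie-breaking rule that prefers an equally fit offspring over the
# parent, to encourage neutral drift. Operator names are placeholders.
import random

def evolve(fitness, mutate, random_individual, max_evals=10000):
    # bootstrap: pick the best of 50 random individuals
    population = [random_individual() for _ in range(50)]
    parent = max(population, key=fitness)
    evals = 50
    while evals < max_evals:
        offspring = [mutate(parent) for _ in range(4)]
        evals += 4
        best_child = max(offspring, key=fitness)
        # ">=" means an equally good offspring replaces the parent
        if fitness(best_child) >= fitness(parent):
            parent = best_child
    return parent

# Toy usage: maximise -(x - 7)^2 over integers with +/-1 mutations.
random.seed(0)
best = evolve(lambda x: -(x - 7) ** 2,
              lambda x: x + random.choice([-1, 0, 1]),
              lambda: random.randint(-50, 50),
              max_evals=2000)
# best converges to 7 on this trivial landscape
```

The `>=` comparison is the whole point of the tie-break: a mutant on a neutral plateau always displaces its parent, so the search keeps moving through genotype space even when fitness is flat.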
We have used a relatively high (for CGP) mutation rate of 0.1. This means
that each gene has a probability of 0.1 of being mutated. Mutations for the
function type and relative addresses themselves are unbiased; a gene can be
mutated to any other valid value.
For the real-valued genes, the mutation operator can choose to randomize the value (with probability 0.1) or add normally distributed noise (σ = 20).
Evolution is limited to 10,000,000 evaluations. Trials that fail to find a solution
in this time are considered to have failed.
The evolutionary parameter values have not been optimized, and we would
expect performance increases if more suitable values were used.
4 Function Set
The function set is defined in two parts. The first is a set of modification operators, as shown in table 1. These are common to all data types used in SMCGP. The functions chosen are intended to give coverage to as many modification operations as possible. The remainder of the set are the computational operations. The data type these functions manipulate is determined by the problem definition. Table 2 contains definitions for all the numerical (i.e. non-modifying) operators that are available. Depending on the experiment, different sub-sets of this set are used.
The way self modifying functions act is defined by 4 variables. The three genes that are double precision numbers, here we call them “parameters” and denote them P0, P1, P2. The other variable is the integer position of the node in the phenotype graph that contained the self modifying function (i.e. the leftmost node is position 0); we denote this x. In the definitions of the SM functions we often need to refer to the values taken by node connection genes (which are all relative addresses). We denote the jth connection gene on the node at position i by cij.

Table 1. Self modification functions

Duplicate and scale addresses (DU4): Starting from position (P0 + x), copy P1 nodes and insert after the node at position (P0 + x + P1). During the copy, cij of copied nodes are multiplied by P2.
Shift Connections (SHIFTCONNECTION): Starting at node index (P0 + x), add P2 to the values of the cij of the next P1 nodes.
Shift By Mult Connections (MULTCONNECTION): Starting at node index (P0 + x), multiply the cij of the next P1 nodes by P2.
Move (MOV): Move the nodes between (P0 + x) and (P0 + x + P1) and insert after (P0 + x + P2).
Duplication (DUP): Copy the nodes between (P0 + x) and (P0 + x + P1) and insert after (P0 + x + P2).
Duplicate, Preserving Connections (DU3): Copy the nodes between (P0 + x) and (P0 + x + P1) and insert after (P0 + x + P2). When copying, this function modifies the cij of the copied nodes so that they continue to point to the original nodes.
Delete (DEL): Delete the nodes between (P0 + x) and (P0 + x + P1).
Add (ADD): Add P1 new random nodes after (P0 + x).
Change Function (CHF): Change the function of node P0 to the function associated with P1.
Change Connection (CHC): Change the (P1 mod 3)th connection of node P0 to P2.
Change Parameter (CHP): Change the (P1 mod 3)th parameter of node P0 to P2.
Overwrite (OVR): Copy the nodes between (P0 + x) and (P0 + x + P1) to (P0 + x + P2), replacing existing nodes in the target position.
Copy To Stop (COPYTOSTOP): Copy from x to the next “COPYTOSTOP” function node, “STOP” node or the end of the graph. Nodes are inserted at the position the operator stops at.

There are several rules that decide how addresses and parameters are treated:
When Pi are added to x, the result is treated as an integer.
Address indexes are corrected if they are not within bounds. Addresses below 0 are treated as 0. Addresses that reach beyond the end of the graph are truncated to the graph length.
Start and end indexes are sorted into ascending order (if appropriate).
Operations that are redundant (e.g. copying 0 nodes) are ignored; however, they are taken into account in the ToDo list length.
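These rules can be collected into a small sanitising helper. The function name is hypothetical, and truncation toward zero via Python's `int()` is our assumption about the casting rule; the clamping and sorting behaviour follows the text.

```python
# Sketch of the address/parameter sanitisation rules listed above
# (hypothetical helper; int() truncation is an assumption).

def resolve_range(p0, p1, x, graph_length):
    """Turn the (P0, P1) parameters of an SM node at position x into a
    sanitised [start, end) range of node indices: cast to int, clamp
    to the graph bounds, and sort into ascending order."""
    start = int(p0 + x)                        # P_i + x treated as integer
    end = int(p0 + x + p1)
    start = max(0, min(start, graph_length))   # below 0 -> 0; past end -> length
    end = max(0, min(end, graph_length))
    if start > end:
        start, end = end, start                # sort into ascending order
    return start, end

# resolve_range(-3.7, 2.0, 1, 10) == (0, 0)    both ends clamp below zero;
#                                              an empty range, so the
#                                              operation would be ignored
# resolve_range(5.2, 100.0, 2, 10) == (7, 10)  end truncated to graph length
# resolve_range(6.0, -4.0, 0, 10) == (2, 6)    indexes sorted ascending
```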
Table 2. Numeric and other functions

No operation (NOP): Passes through the first input.
Add, Subtract, Multiply, Divide (DADD, DSUB, DMULT, DDIV): Performs the relevant mathematical operation on the two inputs.
Const (CONST): Returns a numeric constant as defined in parameter P0.
√x, 1/x, Cos, Sin, TanH, Absolute (SQRT, DRCP, COS, SIN, TANH, DABS): Performs the relevant operation on the first input (ignoring the second input).
Average (AVG): Returns the average of the two inputs.
Node index (INDX): Returns the index of the current node, 0 being the first node.
Input count (INCOUNT): Returns the number of program inputs.
Min, Max (MIN, MAX): Returns the minimum/maximum of the two inputs.
5 Experiments
5.1 Mathematical Functions and Sequences
We can use SMCGP to produce numeric sequences, where the program provides
the first number on the first iteration, the 2nd on the next and continues to
generate the next number in a sequence as we iterate the program. For these
sequences, the input value to the program is fixed to 1. This forces the program
to modify itself to produce a new program that produces the next number. We
demonstrate this ability on two sequences of integers; Fibonacci and a list of
square numbers.
Squares. In this task, a program is evolved that generates the sequence of squares 0, 1, 4, 9, 16, 25, ... without using multiplication or division operations. As Spector
(who first devised this problem) points out, this task can only be successfully
performed if the program can modify itself, as it needs to add new functionality in the form of additions to produce the next integer in the sequence [3].
Hence, conventional genetic programming, including CGP, will be unable to find
a general solution to this problem.
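The argument can be made concrete with a hand-built analogue (our own illustration, not an evolved SMCGP phenotype): since n² is the sum of the first n odd numbers, a program that appends one addition to itself each iteration can emit the next square, whereas a fixed program cannot.

```python
# The paper's argument in miniature: each "self-modification" step
# appends one more addition to the program, and running the grown
# program yields the next square (n^2 = sum of the first n odd numbers).

def iterate_squares(n_terms):
    program = []      # a growing list of "addition" steps
    outputs = []
    for n in range(n_terms):
        # self-modify: append the next odd number as a new addition step
        program.append(2 * n - 1 if n > 0 else 0)
        # run the current program: sum all its addition steps
        outputs.append(sum(program))
    return outputs

# iterate_squares(6) == [0, 1, 4, 9, 16, 25]
```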
The function set for this experiment includes all the self modification functions and DADD, CONST, INP, MIN, MAX, INDX, SQRT and INCOUNT.
Table 3 shows a summary of results for the squares problem (based on 110
runs). Programs were evolved to produce the first 10 terms in the sequence,
with the fitness score being the number of correctly produced numbers. After
successfully evolving a program that generates this sequence, the programs were
tested on their ability to generalize to produce the first 100 numbers in the
sequence. It was found that 84.3% of solutions generalised.
We found that as the squaring program iterates, the length of the phenotype
graph increased linearly. However, the number of active nodes inside the graph
fluctuated a lot on early iterations but stabilized after about 15 iterations.
Table 3. Evaluations required to evolve a program that can generate the squares sequence

Avg. evaluations | Std. dev. | Min. evaluations | Max. evaluations | % generalize
141,846 | 513,008 | 392 | 3,358,477 | 84.3
Table 4. Success rate and evaluations required to evolve programs that generate the Fibonacci sequence. The starting condition and the sequence length appear to have little influence on the success rate or time taken. The percentage of solutions that generalize to 74 terms is also shown.

Start | Max. iterations | % success | Avg. evals | % generalise
0,1 | 12 | 89.1 | 1,019,981 | 88.6
0,1 | 50 | 87.4 | 774,808 | 94.5
1,2 | 12 | 86.9 | 965,005 | 90.4
1,2 | 50 | 90.8 | 983,972 | 94.4
Fibonacci. Koza demonstrated that recursive tree structures could be evolved
that generate the Fibonacci sequence [11]. Huelsbergen evolved machine language programs in approximately 1 million evaluations, and found that all his
evolved programs were able to generalise [12]. Nishiguchi [13] successfully evolved
recursive solutions with a success rate of 70%. The algorithm quickly evolves
solutions in approximately 200,000 evaluations. However, the authors do not appear to test for generalisation. More recently, Agapitos and Lucas used a form
of object oriented GP to solve this problem [14]. They tested their programs on
the first 12 numbers in the sequence, and tested for generality. Generalization
was achieved with 25% success and on average required 20 million evaluations.
In addition, they note that their approach is computationally expensive. In [15]
a graph based GP (similar to CGP) was demonstrated on this problem, however
the approach achieved a success rate of only 8% on the training portion (first
12 integers) of the sequence. Wilson and Heywood evolved recursive programs
using Linear GP that solved up to order-3 Fibonacci sequences [16]. On average
solutions were discovered in 2.12 × 10^5 evaluations, with a generalization rate of
83%. We evolve for both the first 12 and 50 numbers in the sequence and test
for generality to 74 numbers (after which the value exceeds a long int).
Table 4 shows the success rate and number of evaluations required to evolve
programs that produce the Fibonacci sequence (based on 287 runs). As with
the squares problem, the fitness function is the number of correctly outputted
numbers. The starting condition (either 0,1 or 1,2) and the length of the target
sequence appear to have little influence on the success rate or time taken. The
results show that sequences that are evolved to produce the first 50 numbers
do better at generalizing to produce the first 74 numbers. However, the starting
condition again makes no difference.
Sum of numbers. Here we wished to evolve a program that could sum an arbitrarily long list of numbers. At the n-th iteration, the evolved program should be able to take n inputs and compute the sum of all the inputs. We devised this problem because we thought it would be difficult for genetic programming, but relatively easy for a technique such as neural networks. The premise being that neural networks appear to perform well when combining input values, whereas genetic programming seems to prefer feature selection on the inputs.

Table 5. Evaluations required to evolve a program that can add a set of numbers

Average | Minimum | Maximum | Std. deviation
6,922 | 227 | 29,603 | 4,998

Table 6. Evaluations required to evolve a SMCGP program that can add a set of numbers of a given size. 100% of SMCGP experiments were successful. The % success rate for conventional CGP is also shown.

Size of set | Average | Minimum | Maximum | Std. dev | % CGP
2 | 50 | 50 | 50 | 0 | 100
3 | 618 | 54 | 3,248 | 566 | 80
4 | 1,266 | 64 | 9,334 | 1,185 | 95.8
5 | 1,957 | 116 | 9,935 | 1,699 | 48
6 | 2,564 | 120 | 11,252 | 2,151 | 38.1
7 | 3,399 | 130 | 17,798 | 2,827 | 0
8 | 4,177 | 184 | 17,908 | 3,208 | 0
9 | 5,138 | 190 | 18,276 | 3,871 | 0
10 | 5,918 | 201 | 22,204 | 4,401 | 0
Input vectors consist of random sequences of integers. The fitness is defined
as the absolute cumulative error between the output of the program and the
expected sum of the values. We evolved programs which were evaluated on input
sequences of 2 to 10 numbers. The function set consists of the self modifying
functions and just the ADD operator.
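One plausible growth pattern for this task (our own illustration; the evolved programs need not look like this) duplicates an INP/ADD stage each iteration, so the phenotype at iteration n sums n inputs, with the INP wraparound of section 2.3 if the list runs short.

```python
# Sketch of how a duplication operator could solve the summing task:
# each iteration appends a copy of the INP/ADD stage, so iteration n
# sums n inputs. An illustration of the mechanism, not an evolved
# SMCGP program.

def grown_sum_program(n_iterations):
    phenotype = ["INP", "ADD"]
    for _ in range(n_iterations - 1):
        phenotype += ["INP", "ADD"]       # DUP appends a copy of the stage
    return phenotype

def run(phenotype, inputs):
    acc = 0
    value = 0
    calls = 0
    for node in phenotype:
        if node == "INP":                 # INP wraps as in section 2.3
            value = inputs[calls % len(inputs)]
            calls += 1
        elif node == "ADD":
            acc += value
    return acc

# run(grown_sum_program(3), [4, 5, 6, 7]) == 15   (sums first 3 inputs)
# run(grown_sum_program(5), [1, 2, 3]) == 9       (wraps: 1+2+3+1+2)
```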
Table 5 summarizes the results for this problem. All experiments were found to be successful, in that they evolved programs that could sum between 2 and 10 numbers (depending on the number of iterations the program is iterated for). Table 6 shows the number of evaluations required to reach the nth sum (where n is from 2 to 10).
After evolution, the best individual for each run was tested to see how well
it generalized. This test involved summing a sequence of 100 numbers. It was
found that most solutions generalized, however in 0.07% of cases, they did not.
We also tested the ability of conventional CGP to sum a set of numbers. Here CGP could only be evolved for a given size of set of input numbers. The results (based on 500 runs) are also shown in table 6. They reveal that CGP is able to solve this problem only for smaller sets of numbers. This shows a clear benefit of the self-modification approach in comparison with the direct encoding.
5.2 Regression and Classification
Bioinformatics Classification. In this experiment, SMCGP is applied to the
protein classification problem described in [17]. The task is to predict the location
of a protein in a cell, from the amino acids in the particular protein. The entire
dataset was used for the training set. The set consisted of 2,427 entries, with
19 variables each and 1 output. The function set for SMCGP includes all the
mathematical operators in addition to the self modifying command set. The
CGP function set contained just the mathematical operators (see section 4). For
this type of problem, it is not clear that a self-modification approach would have
advantages compared with classic CGP. Also, we added the number of iterations
to the genotype so that the phenotype is iterated that many times before being
executed on the training set.
Table 7 shows the summary of results for this problem (based on 100 runs of
each representation). Both CGP and SMCGP perform similarly. The addition of
the self modification operations does not appear to hinder evolvability - despite
the increase in search space size.
Table 7. Results summary for the bioinformatics classification problem

 | CGP | SMCGP
Average fitness (training) | 66.81 | 66.83
Std. dev. fitness (training) | 6.35 | 6.45
Average fitness (validation) | 66.10 | 66.18
Std. dev. fitness (validation) | 5.96 | 6.46
Avg. evaluations to best fitness (training) | 7,679 | 7,071
Std. dev. evaluations to best fitness (training) | 2,452 | 2,644
Avg. evaluations to best fitness (validation) | 7,357 | 7,161
Std. dev. evaluations to best fitness (validation) | 2,386 | 2,597
Powers Regression. A problem was devised that tests the ability of SMCGP to learn a 'modular' regression problem. The task is to evolve a program that, depending on the iteration, approximates the expression x^n, where n is the iteration number. The fitness function applies x as integers from 0 to 20. The fitness is defined as the number of wrong outputs (i.e. lower is better).

The function set contains all the modification operators (section 4) and NOP, DADD, DSUB, DMULT, DDIV, CONST, INDX and INCOUNT from the numeric operations (section 4).
Programs are evolved to n = 10 and then tested for generality up to n = 20. As with the other experiments, the program is evolved incrementally: it first tries to solve n = 1 and, if successful, is evaluated and evolved for n = 1 and n = 2, until eventually it is evaluated on n = 1 to 10.
Table 8 shows the results summary for the powers regression problem. All 337 runs were successful. In this instance, there is an interesting difference between the two starting conditions. If the fitness function starts with n = 1 to 5, it is found that fewer evaluations are required to reach n = 10. However, this leads to reduced generalization. Using a Kolmogorov-Smirnov test, it was found that the difference in the evaluations required is statistically significant (p = 0.0).

Table 8. Summary of results for the powers regression problem

Number of initial test sets | Average evaluations | Std. dev. | Percentage generalize
1 | 687,156 | 869,699 | 60.4
5 | 527,334 | 600,800 | 55.6
6 Conclusions

We have examined and discussed a developmental form of Cartesian Genetic Programming called Self Modifying CGP and evaluated and compared it with classic CGP on a number of diverse problems. We found that it is more efficient than classic CGP at solving four of the test problems: the Fibonacci sequence, the sequence of squares, summing inputs, and a power function. In addition, it appears that it was able to obtain general solutions for all these problems, although full confirmation of this will require further analysis of the evolved programs. On the fifth problem, classification, it performed no worse than CGP despite a larger search space.
Other approaches to solving problems such as Fibonacci produce a computer program. Instead, at each iteration we produce a structure that produces the output. This could be considered as unrolling the loops (or recursion) in a program. In a related paper [18], we use this structure building approach to construct digital circuits. In future work, we will investigate SMCGP when used as a continuous program. We believe combining both approaches will be beneficial.
References
1. Banzhaf, W., Beslon, G., Christensen, S., Foster, J.A., Képès, F., Lefort, V., Miller, J.F., Radman, M., Ramsden, J.J.: From artificial evolution to computational evolution: A research agenda. Nature Reviews Genetics 7, 729–735 (2006)
2. Kampis, G.: Self-modifying Systems in Biology and Cognitive Science. Pergamon
Press, Oxford (1991)
3. Spector, L., Stoffel, K.: Ontogenetic programming. In: Koza, J.R., Goldberg, D.E.,
Fogel, D.B., Riolo, R.L. (eds.) Genetic Programming 1996: Proceedings of the First
Annual Conference, pp. 394–399. MIT Press, Stanford University (1996)
4. Gruau, F.: Neural network synthesis using cellular encoding and the genetic algorithm. Ph.D. dissertation, Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de Lyon, France (1994)
5. Miller, J.F., Thomson, P.: A developmental method for growing graphs and circuits.
In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606,
pp. 93–104. Springer, Heidelberg (2003)
6. Kumar, S., Bentley, P.: On Growth, Form and Computers. Academic Press, London
(2003)
7. Harding, S.L., Miller, J.F., Banzhaf, W.: Self-modifying cartesian genetic program-
ming. In: Thierens, D., Beyer, H.-G., et al. (eds.) GECCO 2007: Proceedings of
the 9th annual conference on Genetic and evolutionary computation, vol. 1, pp.
1021–1028. ACM Press, London (2007)
8. Miller, J.F., Thomson, P.: Cartesian genetic programming. In: Poli, R., Banzhaf,
W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000.
LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
9. Vassilev, V.K., Miller, J.F.: The advantages of landscape neutrality in digital circuit
evolution. In: Miller, J.F., Thompson, A., Thompson, P., Fogarty, T.C. (eds.) ICES
2000. LNCS, vol. 1801, pp. 252–263. Springer, Heidelberg (2000)
10. Yu, T., Miller, J.: Neutrality and the evolvability of boolean function landscape.
In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tetamanzi, A.G.B., Langdon,
W.B. (eds.) EuroGP 2001. LNCS, vol. 2038, pp. 204–217. Springer, Heidelberg
(2001)
11. Koza, J.: Genetic Programming: On the Programming of Computers by Natural
Selection. MIT Press, Cambridge (1992)
12. Huelsbergen, L.: Learning recursive sequences via evolution of machine-language
programs. In: Koza, J.R., Deb, K., et al. (eds.) Genetic Programming 1997: Pro-
ceedings of the Second Annual Conference, pp. 186–194. Morgan Kaufmann, Stan-
ford University (1997)
13. Nishiguchi, M., Fujimoto, Y.: Evolution of recursive programs with multi-niche
genetic programming (mnGP). In: Evolutionary Computation Proceedings, 1998.
IEEE World Congress on Computational Intelligence, pp. 247–252 (1998)
14. Agapitos, A., Lucas, S.M.: Learning recursive functions with object oriented genetic programming. In: Collet, P., Tomassini, M., Ebner, M., Gustafson, S., Ekárt, A. (eds.) EuroGP 2006. LNCS, vol. 3905, pp. 166–177. Springer, Heidelberg (2006)
15. Shirakawa, S., Ogino, S., Nagao, T.: Graph structured program evolution. In: Pro-
ceedings of the 9th annual conference on Genetic and evolutionary computation,
pp. 1686–1693. ACM, London (2007)
16. Wilson, G., Heywood, M.: Learning recursive programs with cooperative coevolu-
tion of genetic code mapping and genotype. In: GECCO 2007: Proceedings of the
9th annual conference on Genetic and evolutionary computation, pp. 1053–1061.
ACM Press, New York (2007)
17. Langdon, W.B., Banzhaf, W.: Repeated sequences in linear genetic programming
genomes. Complex Systems 15(4), 285–306 (2005)
18. Harding, S., Miller, J.F., Banzhaf, W.: Self modifying cartesian genetic program-
ming: Parity. In: CEC 2009 (2009) (submitted)