MT-CGP: Mixed Type Cartesian Genetic Programming
Vincent Graziano
Jürgen Leitner
6928 Manno, Switzerland
ABSTRACT
The majority of genetic programming implementations build
expressions that only use a single data type. This is in con-
trast to human engineered programs that typically make use
of multiple data types, as this provides the ability to ex-
press solutions in a more natural fashion. In this paper, we
present a version of Cartesian Genetic Programming that
handles multiple data types. We demonstrate that this al-
lows evolution to quickly ﬁnd competitive, compact, and
human readable solutions on multiple classiﬁcation tasks.
Categories and Subject Descriptors
I.2.2 [ARTIFICIAL INTELLIGENCE]: Automatic Pro-
gramming; D.1.2 [Software]: Automatic Programming
Keywords
Cartesian Genetic Programming, Classifiers

1. INTRODUCTION
The use of Genetic Programming (GP) for classiﬁcation
tasks is a reasonably well studied problem [7]. Typically,
the objects to be classiﬁed are represented by vectors in
some n-dimensional space. In this scenario, a program is
found that takes up to n inputs, each a real number, and
outputs a single value which is used to represent the class
of the input object. That is, each component of the vector
is presented independently to the program as a real and
the functions of the program operate on pairs of reals and
output reals. Although this approach has some beneﬁts, it
ultimately imposes some severe limitations.
One beneﬁt is that GP can discover which components of
the input are important for the classiﬁcation problem and
use precisely those. This is convenient if there is redundancy
in the input values, or if the entire vector is not needed to
solve the classiﬁcation problem.
However, if all the components are required to classify the
objects, then evolution has to ﬁnd a way to incorporate each
component individually. Successful classiﬁcation of objects
in high-dimensional space could unnecessarily require very
large and very complicated programs. Consider the follow-
ing example: Suppose we have a classiﬁcation task that is
solved by thresholding the norm-squared of the vector repre-
sentation of the object. Evolution is then faced with the task
of ﬁnding a program that individually squares each of the
inputs and then sequentially sums the results. Such a pro-
gram is not easily found. The approach does not scale well;
typical evolutionary approaches are likely to fail in these set-
tings. Further, even in the case that a solution is found it
is unlikely to be compact or human readable—losing one of
the attractive features of GP.
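To make the contrast concrete, the following sketch (ours, in Python; the paper gives no code for this example) shows the program shape a scalar-only GP must assemble node by node, next to the two-operation vector form:

def classify_scalar_style(x, threshold):
    # A single-type GP must wire in each of the n components
    # individually: square x[0], square x[1], ..., then sum them.
    total = 0.0
    for component in x:              # one square-and-add per component
        total += component * component
    return total > threshold

def classify_vector_style(x, threshold):
    # With vector primitives this is just Sum(Multiply(x, x)).
    squared = [c * c for c in x]     # element-wise Multiply(x, x)
    return sum(squared) > threshold  # Sum, then threshold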
When hand building classiﬁers, the natural approach is to
gather statistics on the entire object and then later combine
these values with the values of speciﬁc components. For
example, we might deﬁne a pixel of an image as important
if it is outside one standard deviation from the mean pixel
value. An implementation of this program would treat the
entire image as a vector to produce a real value, and then
compare this value to each pixel value for classiﬁcation. This
example suggests the method we introduce in this paper—
allow GP to use mixed data types, to work with data in the
same manner that a human programmer does, performing
operations on vectors and reals in tandem.
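A minimal sketch of this hand-built classifier, mixing a whole-vector statistic with per-component reals (Python; the function name is our own):

import statistics

def important_pixels(image):
    # Gather statistics on the entire object (a vector) ...
    mu = statistics.mean(image)
    sigma = statistics.stdev(image)
    # ... then compare those reals against each individual component.
    return [abs(p - mu) > sigma for p in image]

print(important_pixels([0.1, 0.2, 0.15, 0.9]))  # [False, False, False, True]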
The PushGP [25] implementation handled multiple data
types by using multiple stacks. Floating point numbers and
vectors were allocated their own stacks, and depending on
the calling function, values were popped from the appropriate
stack. This was further expanded in Push3 [27], where sup-
port for additional data types (and appropriate stacks) was
added, including: integers, ﬂoating point numbers, Boolean
values, symbolic names and code (PushGP supports the
ability to generate new programs). The authors validated
this approach on various problems, such as sorting, parity
circuits and generating the Fibonacci sequence [25, 27].
In Strongly Typed Genetic Programming (ST-GP) [22],
multiple data types (ﬂoats, integers, vectors) are used within
a strongly typed representation. Functions have expected
input types, and these deﬁne the output type of that func-
tion. In order for valid programs to be generated, trees are
constructed so that the input values are of expected types.
Figure 1: An example of an MT-CGP genotype (node functions shown: CONST, INPUT, DIVIDE, ADD, COS). Not all of the nodes in this genotype are connected. This is an example of neutrality in CGP. A mutation at the DIVIDE node could result in a connection to the COS node, making it part of the program. The inputs to the COS node are not shown since the node is not active.
Functions can be ‘generic’ (or overloaded) so that varying
data types can be handled. To demonstrate its eﬃcacy, ST-
GP was tested on a regression problem and on evolving a
Kalman Filter. The author compared the strongly typed
representation with a dynamically typed version, and found
that the strongly typed version solved problems more efficiently [22].
PolyGP is a ‘polymorphic’ GP built using a λ-calculus
approach [29, 5]. The approach specializes in list processing
tasks, including the ability to execute a function on each
element of a list (i.e., a map operation). Although the sys-
tem should be general purpose, it only appears to have been
used to evolve simple list processing operations, such as re-
turning the N-th element from a list. A similar system, also
built on λ-calculus improves on PolyGP by allowing the GP
to explicitly evolve the types in tandem with the program
and allows for arbitrary types to be used [2]. This approach
was successfully used to evolve recursive list operations and
basic Boolean operations.
In this paper we present Mixed Type Cartesian Genetic
Programming (MT-CGP). As the name implies, this is an
extension of the familiar CGP representation. Functions dy-
namically select the data type of their inputs, which in turn
determines the output type. If they are unable to ﬁnd suit-
able input types, a default value is returned. This ensures
that all randomly generated programs are valid. We show
that this method produces compact, human readable (C-
style) programs that are able to eﬃciently solve a number
of classiﬁcation problems.
2. CARTESIAN GENETIC PROGRAMMING
Cartesian Genetic Programming (CGP) is a form of Ge-
netic Programming in which programs are encoded in par-
tially connected feed forward graphs [18, 19]. The genotype,
given by a list of nodes, encodes the graph. For each node in
the genome there is a vertex, represented by a function, and
a description of the nodes from where the incoming edges are
attached. This representation has a number of interesting properties.
For instance, not all of the nodes of a solution representa-
tion (the genotype) need to be connected to the output node
of the program. As a result there are nodes in the represen-
tation that have no eﬀect on the output, a feature known in
GP as ‘neutrality’. This has been shown to be very useful [21] in the evolutionary process. Also, because the genotype
encodes a graph, there can be reuse of nodes, which makes
the representation distinct from a classical tree-based GP
representation. See Figure 1.
More recent versions of CGP have borrowed some key fea-
tures of the Self Modifying CGP (SMCGP) representation
[14, 15]. In keeping with SMCGP, each node contains four
genes. One speciﬁes the function that the node performs.
Two genes are used to specify the nodes from which the
function obtains its inputs. These connection addresses are
deﬁned relative to the current node and specify how many
nodes backwards in the genotype/graph to connect. The ﬁ-
nal gene in each node is a ﬂoating point number that can
be interpreted as either a parameter to a function, or used
to generate values within the program. The input/output
strategy used here is the same as SMCGP, where special
functions are used to indicate which node to use as an out-
put or how to obtain an input.
If a relative address extends beyond the extent of the
genome it is connected to one of the inputs. If there are
no output nodes in the genotype, then the output is taken
from the last node in the genome. An illustrative example
is shown in Figure 2.
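The following sketch (ours, not the authors' implementation) illustrates how a relative connection address can be resolved during evaluation; the exact mapping of out-of-range addresses onto the inputs is an assumption:

def node_input(node_index, offset, inputs, node_outputs):
    # Connect `offset` nodes backwards in the genotype/graph.
    target = node_index - offset
    if target >= 0:
        return node_outputs[target]         # value of an earlier node
    # The relative address extends beyond the genome, so the
    # connection falls through to one of the program inputs
    # (here we simply wrap; the authors' mapping is not specified).
    return inputs[(-target - 1) % len(inputs)]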
3. CGP WITH MIXED DATA TYPES
To generate programs with multiple data types, MT-CGP
uses an internal data type that can be cast to either a real
number (double precision) or a vector of real numbers. The
type of casting performed is dependent on a ﬂag stored in
the data type. That is, the casting used is explicit.
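One way to realize such a type, sketched in Python (the field names and the cast rules for vectors are our assumptions):

from dataclasses import dataclass
from typing import List, Union

@dataclass
class MTValue:
    # One internal value, explicitly flagged as either a real
    # (double precision) or a vector of reals.
    data: Union[float, List[float]]
    is_vector: bool

    def as_real(self) -> float:
        if not self.is_vector:
            return self.data
        # Cast rule for vectors is an assumption: first element, 0 if empty.
        return self.data[0] if self.data else 0.0

    def as_vector(self) -> List[float]:
        # A real is viewed as a single-element vector (assumption).
        return list(self.data) if self.is_vector else [self.data]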
In MT-CGP the functions inspect the types of the input values being passed to them, and determine the most suitable
operation to perform. This in turn determines the output
type of the function. A default value is returned in cases
where there is no suitable operation for the types being
passed to the function.
How this works in practice is best illustrated with an example. Consider the ‘addition’ function. This function takes two
inputs and returns one output. There are four input cases
that need to be considered: two reals, two vectors of the
same dimension, a real and a vector, and two vectors of
different dimension. The first two cases are trivial: the sum of two reals and vector addition. If the inputs are a real and a vector, the function returns a vector that has the real value added to every component of the vector. That is, x + [a, b, c] = [a + x, b + x, c + x]. In the case that the vectors
lie in two diﬀerent spaces the vector from the higher dimen-
sional space is projected down into the space of the other
vector. Vector addition is then performed and the output of
the function is a vector. For example, [a, b, c] + [d, e, f, g, h] = [a + d, b + e, c + f].
In general, each function will have a number of special cases that need to be carefully handled. Typically this is handled by limiting the calculations to the number of elements in the shortest vector, as was done in the above example.
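A sketch of this dispatch for the ‘addition’ function, following the four cases described above (Python; ours, not the authors' code; reals are floats and vectors are lists of floats):

def mt_add(a, b):
    a_vec, b_vec = isinstance(a, list), isinstance(b, list)
    if not a_vec and not b_vec:
        return a + b                            # two reals
    if a_vec and b_vec:
        n = min(len(a), len(b))                 # project to the lower dimension
        return [a[i] + b[i] for i in range(n)]
    v, x = (a, b) if a_vec else (b, a)          # a real and a vector:
    return [c + x for c in v]                   # broadcast the real

assert mt_add(2.0, [1.0, 2.0, 3.0]) == [3.0, 4.0, 5.0]
assert mt_add([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0, 8.0]) == [5.0, 7.0, 9.0]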
Figure 2: A worked example of an MT-CGP genotype (node functions shown include INPUT, ADD, SUM, and HEAD; the output value of each node appears beneath it in the figure). Each node has a function, and is connected to at most two other nodes. All the nodes in this example graph are connected; however, not all input values are used. Data moves from left to right; the node labeled ‘Output’ specifies which node produces the program’s output. The first two nodes are of type ‘Input’, which indicates they each read the next available input value. The first ‘Add’ node adds element-wise to the components of the vector, producing another vector. ‘Sum’ returns the total of all the values in the vector. ‘Head’ returns the first value in the vector. The second ‘Add’ node receives two real numbers as inputs, and therefore returns a real number.
Since MT-CGP can handle multiple and mixed data types
it enjoys the property that it can draw from a wide range of functions. Table 1 reproduces the complete function set
used in the experiments in this paper. Clearly, this is only
a rudimentary function set for the MT-CGP and it can be
expanded to include domain speciﬁc functionality.
As with all forms of GP, MT-CGP can simply ignore func-
tions that are not useful. By analyzing the evolved programs
it is possible to see how the use of mixed data types is ben-
eﬁcial to CGP. We carry out such an analysis in Section 6.
4. EVOLUTIONARY ALGORITHM
CGP is often used with a simple evolutionary strategy (ES), typically 1 + n, with n = 4. Here, parallel populations of evolution strategies are used. Individuals migrate between the populations through a single server node.
Replacement in the evolutionary strategy has two important modifications.
As usual, the best performing individual is selected as
the parent. However, if the best performance is achieved
by more than a single individual the ES chooses the one
with the shortest program. In cases where this choice is not unique, one of the newest individuals is selected. Choosing the
shortest program (i.e. the individual with the fewest con-
nected nodes) pushes evolution to ﬁnd more compact pro-
grams. The preference for newer individuals has been shown to help CGP find solutions.
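The resulting selection rule can be summarized in a few lines (a sketch; the `birth` tie-breaker field is our stand-in for creation order):

from dataclasses import dataclass

@dataclass
class Individual:
    fitness: float   # 1 - |MCC|; lower is better
    length: int      # number of connected nodes
    birth: int       # creation counter; larger means newer (assumption)

def select_parent(population):
    # Best fitness first, then the shortest program, then the newest.
    return min(population, key=lambda i: (i.fitness, i.length, -i.birth))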
In keeping with the 1 + n evolutionary strategy, MT-CGP
does not use crossover. This ‘mutation-only’ approach is
typical in CGP and SMCGP. The introduction of mixed data
types does not preclude its usage, however. As with SMCGP,
it appears a high mutation rate (10%) performs best. The
mutation rate is deﬁned as the probability that a particular
gene will be modiﬁed when generating an oﬀspring.
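As a sketch, per-gene mutation at this rate amounts to the following (the `new_gene` callable is hypothetical, standing in for drawing a fresh legal value for a given gene position):

import random

def mutate(genotype, new_gene, rate=0.10):
    # Each gene is independently modified with probability `rate`;
    # `new_gene(i)` draws a new legal value for position i (function
    # index, connection offset, or the real-valued parameter).
    return [new_gene(i) if random.random() < rate else g
            for i, g in enumerate(genotype)]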
MT-CGP requires relatively few parameters, even in a
distributed system. Table 2 lists the most important pa-
rameters. It is important to note that these values have not
been optimized, and therefore it may be possible to improve
performance by selecting diﬀerent values.
Table 2: Key parameters in MT-CGP.

Number of populations      24
Maximum evaluations        10,000,000
                           100 evaluations (per client)
Genotype length            50 nodes
Mutation rate              10%
Runs per experiment        50
5. EXPERIMENTS
To demonstrate the validity of the mixed data type ap-
proach we have tested MT-CGP on four well-known bi-
nary classiﬁcation tasks: Wisconsin breast cancer dataset,
phoneme cr dataset, diabetes1 dataset, and heart1 dataset.
The parameters for the MT-CGP are unchanged across all
experiments and are as given in Table 2. Likewise the func-
tion set is the same for all experiments and is reported in
full in Table 1.
For each of the classiﬁcation tasks, the inputs presented
to the program were both the vector representing the object
as well as the individual component values of the vector.
For example, the Wisconsin breast cancer data uses data-
points in 9-dimensional space. The evolved programs were
given these values as a vector and also as 9 individual real
numbers, making a total of 10 inputs available to the pro-
gram. Evolved programs are then able to select the most
appropriate inputs to use in the classiﬁer.
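For example (a sketch, with arbitrary values):

def make_inputs(datapoint):
    # Present the object both as a whole vector and as its components:
    # a 9-dimensional data point yields 10 available inputs.
    return [list(datapoint)] + [float(c) for c in datapoint]

inputs = make_inputs([5, 1, 1, 1, 2, 1, 3, 1, 1])
assert len(inputs) == 10          # 1 vector + 9 reals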
We use the Matthews Correlation Coefficient (MCC) [28] to measure the fitness of our classifiers. First the ‘confusion matrix’ is found: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The MCC is calculated as follows:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
An MCC of 0 indicates that the classiﬁer is working no better
than chance. A score of 1 is achieved by a perfect classiﬁer,
Table 1: Functions available in MT-CGP. Due to space issues, the complete behavior of each function cannot be presented here. Arity is the number of inputs used by the function. A ‘yes’ value in the parameter column indicates that the function makes use of the real number value stored in the calling CGP node.

Function Name | Arity | Uses parameter | Description
Head | 1 | | Returns the first element of a vector
Last | 1 | | Returns the last element of a vector
Length | 1 | | Returns the length of the vector
Tail | 1 | | Returns all but the first element of a vector
Differences | 1 | | Returns a vector containing the differences between each pair in the input vector
AvgDifferences | 1 | | Returns the average of the differences between pairs in the vector
Rotate | 1 | Yes | Rotates the indexes of the elements by the parameter
Reverse | 1 | | Reverses the elements in the list
PushBack | 2 | | Inserts the values of the second input at the end of the first input
PushFront | 2 | | Inserts the values of the second input at the front of the first input
Set | 2 | | If one input is a vector, and the other is a real, returns a new vector where all values are the real
Sum | 1 | | Sums all the elements in the input vector
Add, Subtract, Multiply, Divide, Abs, Sqrt, Pow, PowInt, Sin, Tan, Cos, Tanh, Exp, <, > | 2 | | Returns the mathematical result. Element-wise if at least one input is a vector.
Kurtosis, Mean, Median | 1 | | Returns the given statistic for an input vector
Range | 1 | | Returns the max value minus the min value of a vector
Max1 | 1 | | Returns the maximum value of a vector
Max2 | 2 | | Element-wise, returns the max of two vectors or a vector and a real
Min1 | 1 | | Returns the minimum value of a vector
Min2 | 2 | | Element-wise, returns the min of two vectors or a vector and a real
VecFromDouble | 1 | | Tries to cast the input to a double, and then casts that to a single-element vector
NOP | 1 | | Passes through the first input
GetInput | 0 | | Returns the current input, moves the input pointer forward
GetPreviousInput | 0 | | Moves the input pointer backwards, returns that input
SkipInput | 0 | Yes | Moves the input pointer by the parameter positions, returns that input
Const | 0 | Yes | Returns the parameter value
ConstVectorD | 0 | Yes | Returns a vector of the default length, all values set to the parameter value
EmptyVectorD | 0 | | Returns an empty vector of a default length
Output | 1 | | Indicates that this node should be used as an output
a score of −1 indicates that the classifier is perfect, but has inverted the output classes. Finally, the fitness of an individual is given by:

fitness = 1 − |MCC|,

with values closer to 0 being more fit.
The MCC is insensitive to differences in class size, making it a nice choice for many binary classification tasks. The confusion matrix also allows for easy calculation of the classification rate as the percentage of correctly classified fitness cases.
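A direct transcription of the fitness calculation (a sketch; treating a zero denominator as MCC = 0 is a common convention we assume here):

import math

def mcc(tp, tn, fp, fn):
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def fitness(tp, tn, fp, fn):
    # fitness = 1 - |MCC|; values closer to 0 are more fit
    return 1.0 - abs(mcc(tp, tn, fp, fn))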
6. RESULTS
MT-CGP was run on four different datasets. A summary
of the results can be seen in Table 3. Table 3 also shows how
many evaluations were required to ﬁnd the best generalizing
individual. Detailed analysis of the results, and comparisons
to other work, can be found in Table 4.
To ensure accurate statistics are found, experiments were
run 50 times. Again, the parameters and function set were
kept consistent through each experiment.
Table 3: Results from the validation set of each of the datasets.

            Cancer   Phoneme   Diabetes1   Heart1
Max MCC     0.98     0.575     0.55        0.72
Min MCC     0.943    0.434     0.44        0.51
Avg. MCC    0.959    0.515     0.50        0.61
S.D.        0.009    0.0384    0.0233      0.0394
Max Acc.    99.3     80.4      79.2        85.3
Min Acc.    97.9     75.9      75.0        76.0
Avg. Acc.   98.5     78.2      77.3        80.2
S.D.        0.3      1.2       0.77        1.8
Min Evals   7k       17k       35k         25k
Max Evals   1,997k   1,973k    9,540k      9,647k
Avg. Evals  1,204k   879k      3,204k      1,539k
S.D.        555k     601k      2,94k       2,011k
6.1 Wisconsin Breast Cancer Dataset
The Wisconsin Breast Cancer Dataset is a well-tested benchmark in classification and is part of the UCI Machine Learning Repository [9]. The dataset is split into 400 train-
ing examples and 283 validation examples (examples with
missing values are discarded). It was found that MT-CGP
works well for this problem, and compares well to other re-
cent work published on the same dataset. A summary of
the statistical results can be seen in Table 3. In Table 4 the
results are compared to a range of other recent approaches.
(Note: there are diﬀerent methodologies for dividing the
data into training and validation sets. The diﬀerent ap-
proaches report mean results and therefore may not be di-
rectly comparable. Further, classiﬁers can be sensitive to
how the dataset is split when there is a class imbalance or
the problem is relatively easy to solve).
Table 4: Comparison of various classifiers on the four datasets.

Breast Cancer                          % Accuracy
Radial Basis Function Networks         49.8
Probabilistic Neural Networks          49.8
ANN (Back Propagation)                 51.9
Recurrent Neural Network               52.7
Competitive Neural Network             74.5
Learning Vector Quantization           95.8
Support Vector Machine¹                96.9
Memetic Pareto ANN                     98.1
ANN (Back Propagation)                 98.1
Genetic Programming²                   98.2
Support Vector Machine²                98.4
MT-CGP Maximum Accuracy                99.3
MT-CGP Minimum Accuracy                97.9
MT-CGP Avg. Accuracy                   98.5

Phoneme CR                             % Accuracy
Linear Bayes                           73.0
Quadratic Bayes                        75.4
Fuzzy Classifier                       76.9
Quadratic Bayes                        78.7
Neural Network                         79.2
Piecewise Linear                       82.2
C4.5                                   83.9
MLP Neural Network                     86.3
k-Nearest Neighbor                     87.8
MT-CGP Maximum Accuracy                80.4
MT-CGP Minimum Accuracy                75.9
MT-CGP Avg. Accuracy                   78.2

Diabetes1                              % Accuracy
Self-generating Neural Tree (SGNT)     68.6
Learning Vector Quantization (LVQ)     69.3
1-Nearest Neighbor                     69.8
Self-generating Prototypes (SGP2)      71.9
Linear Genetic Programming             72.2
k-Nearest Neighbor                     72.4
Self-generating Prototypes (SGP1)      72.9
Gaussian Mixture Models                72.9, 68.2
Neural Network                         75.9
Infix Form Genetic Programming³        77.6
Support Vector Machine²                77.6
Principal Curve Classifier             78.2
MT-CGP Maximum Accuracy                79.2
MT-CGP Minimum Accuracy                75.0
MT-CGP Avg. Accuracy                   77.3

Heart1                                 % Accuracy
Neural Network                         80.3
Linear GP                              81.3
Support Vector Machine²                83.2
Infix Form GP                          84.0
MT-CGP Maximum Accuracy                85.3
MT-CGP Minimum Accuracy                76.0
MT-CGP Avg. Accuracy                   80.2

¹Results are based on leave-one-out validation, and so may not be directly comparable.
²Results are based on 10-fold validation, and so may not be directly comparable.
³This paper does not use the pre-defined training and validation split, so the comparison may not be accurate.
Below is the program, found by evolution, that gives the
best result (99.3% accuracy) on the validation data:
node0 = Inputs;
node1 = Multiply(node0,node0);
node2 = Mean(node1);
node4 = Sum(node2);
node39 = Max1(node4);
node45 = ConstVectorD(5.98602127283812);
node49 = PowInt(node45, node39);
if ((int)Head(node49) <= 0)
    class = 0;
else
    class = 1;
The ‘if’ statement at the end of the program is from the
ﬁtness function, and each of the other lines corresponds to a
node in the genotype. The genotype contains 50 nodes, but
only 7 of them were connected to form a program. ‘nodeN’
can be treated as a variable which can be either a vector
or a real number. The ﬁrst line sets the variable node0 to
the input vector. The values are then squared, and stored
in node1. node2 is the average value of the elements in the
vector node1. node4 is largely redundant, as node2 is a
single value. node45 is a vector which is the same length as
the input vector, but all elements are set to a constant value.
node49’s value is a vector, made by raising each element in
node45 to the truncated (integer) value of node39. The ‘if’
statement is used to turn the actual output into either 0 or
1, which is then compared against the expected class. In this
example, the (int) type cast in the ‘if’ statement turns out
to be a crucial step. All values of node49 are positive and
much larger than 0. However, when casting to an integer,
the very large values cannot ﬁt and cause an overﬂow. This
causes the integer version of some values to be less than 0.
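For concreteness, here is a Python transcription of this program (ours; the 32-bit wrap at the end emulates the overflow described above, and the exact overflow semantics of the original system are an assumption):

import statistics

def classify(inputs):                             # inputs: the 9 reals
    node0 = list(inputs)                          # Inputs
    node1 = [x * x for x in node0]                # Multiply(node0, node0)
    node2 = statistics.mean(node1)                # Mean(node1)
    node4 = node2                                 # Sum of a single value
    node39 = node4                                # Max1 of a single value
    node45 = [5.98602127283812] * len(node0)      # ConstVectorD
    node49 = [c ** int(node39) for c in node45]   # PowInt(node45, node39)
    head = int(node49[0])                         # (int)Head(node49)
    head &= 0xFFFFFFFF                            # wrap to 32 bits ...
    if head >= 0x80000000:                        # ... and reinterpret
        head -= 0x100000000                       # as a signed integer
    return 0 if head <= 0 else 1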
6.2 Phoneme CR
The phoneme cr dataset is another benchmark dataset for classifiers¹. It consists of five real-valued input variables and one binary class. There are 5,404 fitness cases, and this set is split into two equal parts (at random) to produce a training and a validation set. From previous results (and by design), it can be seen that this is a more challenging dataset, where the highest reported classification accuracy is 87.8%.
From Table 4 it can be seen that whilst MT-CGP is com-
petitive, it does not produce the best results. This may
be because the parameters chosen do not allow MT-CGP to
ﬁnd a good program. The best solution is 20 operations long,
and therefore is quite large with respect to the size of the
genotype. Previous experience with CGP indicates better
performance may be achieved by using a larger genotype.
6.3 Proben Diabetes1
This dataset is ‘diabetes1’ from the Proben1 machine learning repository [24]. It is a binary classification problem consisting of 576 training cases and 192 validation test cases. Each case contains 8 real-valued inputs. Table 4 shows that
MT-CGP does well compared to previous work.
6.4 Proben Heart1
‘Heart1’ is another Proben1 dataset, based on data for predicting heart disease². It consists of 690 training examples and 230 validation cases. There are 35 real-valued inputs, and it is a binary classification problem.
Table 4 shows that MT-CGP compares favorably to previous work.
7. PROGRAM LENGTH
The ES used prefers the shortest program among those of
equal ﬁtness. The aim was to get MT-CGP to ﬁnd solutions
that are parsimonious. There are two immediate advantages of such solutions: the chances that humans can understand
and learn from the resultant programs are increased [26],
and shorter, simpler solutions tend to generalize to larger
problems more easily [30].
Consider the following program:
node0 = Mean(input);
node1 = Exp(node0);
node49 = Pow(node0, node1);
if ((int)Head(node49) <= 0)
    class = 0;
else
    class = 1;
On the breast cancer dataset it achieves a classiﬁcation ac-
curacy of 97.9%, competitive with the state-of-the-art classi-
fiers on this dataset. Although this wasn’t the best performing MT-CGP program on the dataset, the evolved program is extremely simple and easy for humans to follow. (Note: it also abuses the double-int conversion feature, discussed above.)
For the problems investigated here, there was a general
trend that the best performing individuals had longer pro-
grams. An example of this can be seen in Figure 3 where
the average classiﬁcation error for various program lengths
is plotted. Although longer programs generally do better,
a trade oﬀ may be made in terms of generalization, human
interpretability, and processing speed. It may be acceptable,
and indeed preferable, to use shorter programs even if they
do not perform as well.
8. CONCLUSIONS
The results show that MT-CGP is a promising technique
for use in classiﬁcation problems. In the example datasets it
performs competitively with, and in some cases outperforms,
other established machine learning techniques. In principle,
MT-CGP has some advantages over previous methodologies.
It is able to produce human readable output. In the case of
the breast cancer data, the program showed that a good
classification rate can be achieved through a simple calculation.
In all the programs inspected, evolution was seen to take
advantage of the vector as an input. The ability to collect
statistics on the vector input and then perform further pro-
cessing on components of the vector appears to be very use-
ful. This suggests that GP methods which mix data types will, in most cases, have an edge over GP techniques that cannot. The latter would have to build significantly more complex programs to achieve the same functionality.

²Data collected by: Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.; University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.; University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.; and V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.

Figure 3: For the best individuals found by evolution, the classification accuracy is related to the program length. The plot shows average classification error versus program length on the phoneme dataset. Although longer programs generally do better, a trade-off may be made in terms of generalization, human interpretability, and processing speed.
MT-CGP should also be successful on problems other than
classiﬁcation. Indeed, it should work on any problem where
CGP has been used. Given its ability to statistically analyze streams of values and simultaneously handle vectors of different dimension, problems such as time series prediction seem to be ideal candidates for future exploration.
There are two aspects of the algorithm that we plan to
examine in detail in the future. By incorporating domain
speciﬁc functions into the set, we expect to sharpen pre-
vious results and to be able to tackle more diﬃcult tasks.
For example, there are many metrics used in ﬁnance, e.g.,
volatility, whose incorporation could prove useful in highly
diﬃcult ﬁnancial series classiﬁcation problems.
More generally, we will investigate how best to constrain
the ES so that program length is optimized. The solution
used herein is likely to be too aggressive. Initially, it seems
better to explore programs of varying lengths, and then only
later exploit the knowledge contained in the gene pool to
find short solutions. We anticipate that borrowing ideas from [30] will help us to better balance performance with parsimony.
9. REFERENCES
[1] H. A. Abbass. An evolutionary artificial neural networks approach for breast cancer diagnosis. Artificial Intelligence in Medicine, 25:265–281, 2002.
[2] F. Binard and A. Felty. Genetic programming with polymorphic types and higher-order functions. In Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO '08, pages 1187–1194, New York, NY, USA, 2008. ACM.
[3] M. Brameier and W. Banzhaf. A comparison of linear genetic programming and neural networks in medical data mining. IEEE Transactions on Evolutionary Computation, 5(1):17–26, Feb. 2001.
[4] K. Chang and J. Ghosh. Principal curve classifier: a nonlinear approach to pattern classification. In Proceedings of the 1998 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), volume 1, pages 695–700, May 1998.
[5] C. Clack and T. Yu. Performance enhanced genetic programming. In Proceedings of the Sixth Conference on Evolutionary Programming, pages 87–100. Springer, 1997.
[6] A. Eftekhari, H. A. Moghaddam, M. Forouzanfar, and J. Alirezaie. Incremental local linear fuzzy classifier in Fisher space. EURASIP Journal on Advances in Signal Processing, 2009:15:1–15:9, January 2009.
[7] P. Espejo, S. Ventura, and F. Herrera. A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(2):121–144, March 2010.
[8] H. A. Fayed, S. Hashem, and A. F. Atiya. Self-generating prototypes for pattern classification. Pattern Recognition, pages 1498–1509, 2007.
[9] A. Frank and A. Asuncion. UCI machine learning repository, 2010.
[10] G. Giacinto and F. Roli. Dynamic classifier selection based on multiple classifier behaviour. Pattern Recognition, 34:1879–1881, 2001.
[11] A. Guerin-Dugue et al. Deliverable R3-B4-P Task B4: Benchmarks. Technical report.
[12] P.-F. Guo, P. Bhattacharya, and N. Kharma. Automated synthesis of feature functions for pattern detection. In 2010 23rd Canadian Conference on Electrical and Computer Engineering (CCECE), pages 1–4, May 2010.
[13] P. Hao, L. Tsai, and M. Lin. A new support vector classification algorithm with parametric-margin model. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pages 420–425. IEEE, 2008.
[14] S. Harding, J. F. Miller, and W. Banzhaf. Developments in Cartesian genetic programming: self-modifying CGP. Genetic Programming and Evolvable Machines, 11(3–4):397–439, 2010.
[15] S. Harding, J. F. Miller, and W. Banzhaf. A survey of self modifying CGP. In Genetic Programming Theory and Practice, 2010.
[16] R. Janghel, A. Shukla, R. Tiwari, and R. Kala. Intelligent decision support system for breast cancer. In Y. Tan, Y. Shi, and K. Tan, editors, Advances in Swarm Intelligence, volume 6146 of Lecture Notes in Computer Science, pages 351–358. Springer Berlin / Heidelberg, 2010.
[17] H. X. Liu, R. S. Zhang, F. Luan, X. J. Yao, M. C. Liu, Z. D. Hu, and B. T. Fan. Diagnosing breast cancer based on support vector machines. Journal of Chemical Information and Computer Sciences.
[18] J. F. Miller. An empirical study of the efficiency of learning Boolean functions using a Cartesian genetic programming approach. In Proceedings of the 1999 Genetic and Evolutionary Computation Conference (GECCO), pages 1135–1142, Orlando, Florida, 1999.
[19] J. F. Miller, editor. Cartesian Genetic Programming. Natural Computing Series. Springer, 2011.
[20] J. F. Miller, D. Job, and V. K. Vassilev. Principles in the evolutionary design of digital circuits - part I. Genetic Programming and Evolvable Machines, 1(1/2):7–35, 2000.
[21] J. F. Miller and S. L. Smith. Redundancy and computational efficiency in Cartesian genetic programming. IEEE Transactions on Evolutionary Computation, 10(2):167–174, 2006.
[22] D. J. Montana. Strongly typed genetic programming. Evolutionary Computation, 3(2):199–230, 1995.
[23] M. Oltean and C. Grosan. Solving classification problems using infix form genetic programming. In M. Berthold et al., editors, The 5th International Symposium on Intelligent Data Analysis, LNCS 2810, pages 242–252. Springer, 2003.
[24] L. Prechelt. PROBEN1: a set of benchmarks and benchmarking rules for neural network training algorithms. Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, D-76128 Karlsruhe, Germany, Sept. 1994.
[25] A. Robinson and L. Spector. Using genetic programming with multiple data types and automatic modularization to evolve decentralized and coordinated navigation in multi-agent systems. In E. Cantú-Paz, editor, Late Breaking Papers at the Genetic and Evolutionary Computation Conference (GECCO-2002), pages 391–396, New York, NY, July 2002.
[26] M. Schmidt and H. Lipson. Distilling free-form natural laws from experimental data. Science, 324(5923):81–85, 2009.
[27] L. Spector, J. Klein, and M. Keijzer. The Push3 execution stack and the evolution of control. In H.-G. Beyer et al., editors, GECCO 2005: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, volume 2, pages 1689–1696, Washington DC, USA, June 2005. ACM Press.
[28] Wikipedia. Matthews correlation coefficient. Wikipedia, the free encyclopedia, 2011. [Online].
[29] T. Yu and C. Clack. PolyGP: a polymorphic genetic programming system in Haskell. In Proceedings of the 3rd Annual Conference on Genetic Programming, pages 416–421. Morgan Kaufmann, 1998.
[30] B.-T. Zhang and H. Muhlenbein. Balancing accuracy and parsimony in genetic programming. Evolutionary Computation, 3:17–38, 1995.