Combining Neuro-Evolution of Augmenting Topologies with
Convolutional Neural Networks
Jan Nils Ferner, Mathias Fischler, Sara Zarubica, Jeremy Stucki
November 23, 2018
Current deep convolutional networks are ﬁxed in their topology.
We explore the possibilities of making the convolutional topology a parameter itself by combining NeuroEvolution
of Augmenting Topologies (NEAT) with Convolutional Neural Networks (CNNs) and propose such a system using
blocks of Residual Networks (ResNets).
We then explain how our suggested system can only be built once additional optimizations have been made, as
genetic algorithms are far more demanding than training by backpropagation.
On the way there we explain most of those buzzwords and oﬀer a gentle and brief introduction to the most important
modern areas of machine learning.
1 Introduction to neural networks
1.1 What is a neural network?
1.2 How does a neural network learn
1.2.1 Traditional
1.2.2 Genetic algorithm
2 What is NEAT
2.1 Topology
2.2 Speciation
3 Convolutional Neural Networks
3.1 Problems with image recognition
3.2 Subsampling
3.2.1 Kernels
3.2.2 Poolers
3.2.3 Activation function
4 Hippocrates, a NEAT implementation
4.1 Motivation
4.2 Technology
4.3 Discrepancies
4.3.1 Paper
4.3.2 Original implementation
4.4 Visualizing Neural Networks
4.4.1 Traditional Neural Networks
4.4.2 Navigating through generations and species
4.4.3 Convolutional Neural Networks
4.4.4 Technologies used for our visualizer
4.4.5 Interoperability
5 Build tools
5.1 Version control
5.1.1 Git
5.1.2 GitHub
5.2 Integration tests
5.2.1 Travis
5.2.2 AppVeyor
5.3 CMake
5.4 Challenges
5.5 CLion
6 Combining Neuro-Evolution of Augmenting Topologies with Convolutional Neural Networks
6.1 Challenges & Solutions
6.2 Definition
6.3 Implementation
7 Further enhancements
7.1 Optimisation
7.2 Safety concerns
7.3 HyperNEAT
8 Our Work
8.1 Collaborators
8.1.1 Project Group
8.1.2 Acknowledgements
8.1.3 Medical Support
8.2 Our goals
8.3 Initial position
8.4 Opening questions
8.5 Working programs and tools
8.5.1 Visual Studio
8.5.2 LaTeX
8.5.3 GitHub
8.6 Procedure
8.6.1 The beginning
8.6.2 The planning
8.6.3 The realisation
8.6.4 The result
8.6.5 Our conclusion
8.7 Progress
8.8 Contact with doctors
1 Introduction to neural networks
1.1 What is a neural network?
The most famous neural network is you. Or, in other words, the human brain.
It is, simply put, a clever arrangement of tiny units, each capable of processing simple logic.
These smallest units are called neurons, and our brain consists of approximately 100 billion of them.
They are interwoven through a complex series of incoming and outgoing extensions called dendrites
and axons, respectively, of which some transport electricity faster than others. Most of the components of
a brain are unfortunately still not understood well enough to be used productively in computer science.
An artiﬁcial neural network (ANN) tries to emulate the immense success of its biological counterpart
by abstracting the complex chemical reactions responsible for our thoughts to much more graspable math.
The feedforward version of such an ANN consists of two simple components: neurons and connections.
Each neuron has inputs, which are the incoming connections. It applies a simple mathematical operation
to this set of inputs and returns the result.
Connections connect neurons to each other. Each connection has a weight, which determines how weak
or strong the connection is.
The neurons are typically organized into layers. The ﬁrst is referred to as the input layer and the last one
as the output layer. The remaining layers are called hidden layers. (Anderson, James. 1995)
Here is a basic example of a neural network:
Each connection is represented as an arrow and has an associated weight. Every neuron is connected to
all neurons in the previous and in the next layer.
The conﬁguration of how all neurons and layers are interconnected, as well as the number of layers, is
called the topology of the network. (Anderson, James. 1995)
For simple networks, you can also write down the inputs and the corresponding outputs.
This network was trained to solve the XOR problem, which can be simpliﬁed as "are my inputs diﬀerent?".
We defined the output to represent yes if it is >= 0.5 and no otherwise.
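To make the interplay of neurons, weighted connections and layers concrete, here is a minimal sketch of a feedforward pass through a tiny network that answers the XOR question. The weights, the bias terms and the sigmoid function are hand-picked assumptions for illustration; they are not the values of the trained network described above.

```python
import math

def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # A neuron sums its weighted inputs and applies a simple function.
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_network(a, b):
    # Hidden layer: two neurons, each connected to both inputs.
    h1 = neuron([a, b], [20.0, 20.0], -10.0)   # fires if at least one input is 1
    h2 = neuron([a, b], [-20.0, -20.0], 30.0)  # fires unless both inputs are 1
    # Output layer: fires only if both hidden neurons fire.
    return neuron([h1, h2], [20.0, 20.0], -30.0)

for a in (0, 1):
    for b in (0, 1):
        answer = "yes" if xor_network(a, b) >= 0.5 else "no"
        print(a, b, answer)
```

The threshold of 0.5 in the loop is the same yes/no rule as defined above.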
It’s also possible for a neural network to have multiple outputs.
For the rest of this paper, we assume that our network has one output per possible answer.
Example: We have a picture of a flower. It can either be a poppy, a lily or a dandelion. Our neural
network looking at the flower would have three outputs.
In this case we still wish to have one definitive output. For this we'll use the softmax function, which
squashes all our outputs in a manner that lets them add up to exactly one. One can think of it as a
normalization that represents confidence. (Anderson, James. 1995)
It is defined as follows:
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \dots, K$$
where z is the vector of the K raw outputs.
Example: If our outputs are [1,2,3,4,1,2,3], the softmax of that is [0.024,0.064,0.175,0.475,0.024,0.064,0.175].
We then simply take the highest one as our main output. This is called the winner-takes-all principle and
is modeled after how the brain works. (Anderson, James. 1995)
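The softmax step and the winner-takes-all choice can be sketched in a few lines. This is an illustrative sketch, not code from Hippocrates; subtracting the maximum before exponentiating is a common numerical safeguard that we add here and that does not change the result.

```python
import math

def softmax(values):
    """Map raw outputs to positive numbers that add up to one."""
    # Shifting by the maximum avoids overflow and leaves the result unchanged.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

outputs = [1, 2, 3, 4, 1, 2, 3]
confidences = softmax(outputs)
# Winner takes all: the index of the highest confidence is the answer.
winner = max(range(len(confidences)), key=confidences.__getitem__)
print([round(c, 3) for c in confidences])
print(winner)
```

For the example outputs this reproduces the rounded values given above and selects the fourth output (index 3).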
1.2 How does a neural network learn
1.2.1 Traditional
A traditional approach to optimizing the connection weights to improve the network's accuracy is backpropagation:
"The Backpropagation algorithm is a supervised learning method for multilayer feed-forward
networks from the ﬁeld of Artiﬁcial Neural Networks.
Feed-forward neural networks are inspired by the information processing of one or more neural
cells, called a neuron. A neuron accepts input signals via its dendrites, which pass the electrical
signal down to the cell body. The axon carries the signal out to synapses, which are the
connections of a cell’s axon to other cell’s dendrites." (Brownlee, Jason. 2016)
The backpropagation algorithm is an algorithm for supervised learning. In supervised learning, the
performance of a network is measured by testing it against a given dataset, over and over again.
Such a dataset defines input values and the expected outputs for these values.
The discrepancies between the expected outputs in the dataset and the actual outputs are called the errors
of the network. (Saimadhu Polamuri. 2014)
Using basic calculus, one can determine how the error changes with each weight and adjust the weights to
reduce it. This is also known as solving the error minimization problem. (Shashi Sathyanarayana. 2014)
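The idea of minimizing the error with calculus can be illustrated on the smallest possible case: a single neuron with one weight and no activation function. This is a deliberately reduced sketch of gradient descent, not full backpropagation through several layers; the dataset, the learning rate and the function names are assumptions made for this example.

```python
def total_error(weight, dataset):
    # Sum of squared discrepancies between actual and expected outputs.
    return sum((weight * x - expected) ** 2 for x, expected in dataset)

def error_slope(weight, dataset):
    # Derivative of total_error with respect to the weight.
    return sum(2 * (weight * x - expected) * x for x, expected in dataset)

def train(dataset, weight=0.0, learning_rate=0.01, steps=200):
    for _ in range(steps):
        # Move the weight a small step against the slope of the error.
        weight -= learning_rate * error_slope(weight, dataset)
    return weight

# Each pair is (input, expected output); the expected output is 2 * input.
dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
weight = train(dataset)
print(weight, total_error(weight, dataset))
```

With this dataset the weight moves towards 2, the value at which the error vanishes.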
"In the most popular version of backpropagation, called stochastic backpropagation, the weights
are initially set to small random values." (Shashi Sathyanarayana. 2014)
Stochastic methods are used because "properly scaled random initialization can deal with the vanishing
gradient problem" (Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell. 2016).
With enough complexity, neural networks can approximate any continuous function. (Nielsen, Michael. 2016)
There are methods for picking initial weights so that problems with local maxima of derivatives do
not limit the backpropagation algorithm. (Derrick Nguyen and Bernard Widrow. 1990)
However, as Dr. Geoffrey E. Hinton states, backpropagation is often limited by the sheer size of the networks
that are required today:
"Backpropagation was the first computationally efficient model of how neural networks could
learn multiple layers of representation, but it required labeled training data and it did not work
well in deep networks." (Geoﬀrey E. Hinton. 2007)
1.2.2 Genetic algorithm
The training starts with a number of genomes, typically referred to as the population. For each of these
genomes a network is built and it is tested against the expected outputs. From these results we can assign
a ﬁtness to the genome. A higher ﬁtness indicates that the genome was able to solve a problem better
than another. (Anderson, James. 1995)
The initial set of genomes is the first generation. The weights of all genes are set to random values.
To get to the next generation, all genomes have to be tested. Before that, each genome has a chance that
a random gene mutates, e.g. the gene is assigned a new random weight.
After that, we select the genomes for the next generation. To select a genome, a so-called roulette wheel
selection is performed. This means that every genome has a chance to get to the next generation, based
on its ﬁtness. (Bäck, Thomas. 1996)
We always select two genomes at a time, so that we can perform a crossover. This means that we swap a
part of the genes in the ﬁrst genome with the second. (Buckland, Mat)
This process is repeated until a genome reaches the target ﬁtness, which is set by the trainer.
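The loop described above (mutate, test, select by roulette wheel, cross over) can be sketched as follows. This is a simplified illustration, not the Hippocrates implementation: genomes are plain lists of weights, and the fitness function, the target weights, the mutation chance and the target fitness of 0.99 are all hypothetical choices.

```python
import random

def fitness(genome, target):
    # Hypothetical fitness: the closer the weights are to the target, the higher (at most 1.0).
    error = sum((g - t) ** 2 for g, t in zip(genome, target))
    return 1.0 / (1.0 + error)

def mutate(genome, rng, chance=0.2):
    # With some probability, one random gene is assigned a new random weight.
    genome = list(genome)
    if rng.random() < chance:
        genome[rng.randrange(len(genome))] = rng.uniform(-1.0, 1.0)
    return genome

def roulette_select(population, fitnesses, rng):
    # Each genome's chance of being picked is proportional to its fitness.
    return rng.choices(population, weights=fitnesses, k=1)[0]

def crossover(first, second, rng):
    # Swap the tails of the two genomes at a random cut point.
    cut = rng.randrange(1, len(first))
    return first[:cut] + second[cut:], second[:cut] + first[cut:]

def next_generation(population, target, rng):
    population = [mutate(g, rng) for g in population]
    fitnesses = [fitness(g, target) for g in population]
    children = []
    while len(children) < len(population):
        a = roulette_select(population, fitnesses, rng)
        b = roulette_select(population, fitnesses, rng)
        children.extend(crossover(a, b, rng))
    return children[:len(population)]

rng = random.Random(0)
target = [0.5, -0.25, 0.75]  # hypothetical ideal weights
population = [[rng.uniform(-1.0, 1.0) for _ in target] for _ in range(20)]
for generation in range(100):
    if max(fitness(g, target) for g in population) >= 0.99:  # target fitness
        break
    population = next_generation(population, target, rng)
```

The cap of 100 generations only keeps the sketch from running forever if the target fitness is never reached.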
2 What is NEAT
NEAT stands for NeuroEvolution of Augmenting Topologies and is a technique first proposed by Kenneth O.
Stanley in the paper "Evolving Neural Networks through Augmenting Topologies". (Stanley, Kenneth. 2002)
It presents an elegant way to combine genetic algorithms with evolving topologies.
2.1 Topology
In traditional neural networks, the topology is fixed. The number of hidden layers and the number
of neurons in each hidden layer are given. This makes it very easy to see the difference between two
networks, since the only differences are the weights.
The downside is that the performance of these networks heavily depends on the chosen topology, which
leads to the conclusion that many networks would perform better if one had chosen a different topology.
NEAT proposes a technique to evolve the topology over time, which allows the network to be better
structured for a specific task than a fixed configuration chosen through hyperparameters.
The main problem of such a network, called Topology and Weight Evolving Artiﬁcial Neural Network, or
TWEANN for short, is the competing conventions problem (Stanley, Kenneth. 2002). It means that two
networks may generate the same solution to a problem at diﬀerent points in time, thus appearing to be
two distinct topologies.
This makes the algorithm mark them as not compatible for a genetic crossover during the mating phase.
NEAT solves this problem by assigning each connection a historical marking, which can be imagined as a
The first gene ever created corresponds to a historical marking of one, the next one to two, and so on.
Every new gene is first compared with all existing genes. If an identical match is found, the new gene
gets the same historical marking as its twin. If not, it is assigned the next unused number.
This way, during crossover, the algorithm doesn't have to check any complicated structural compatibility,
but instead simply compares the historical markings of the two networks. If they are largely the same,
the networks are suitable for a genetic exchange.
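The bookkeeping behind historical markings can be sketched as a small registry. The class and function names are hypothetical, and the sketch assumes that two genes count as identical when they connect the same pair of neurons.

```python
class InnovationRegistry:
    """Hands out historical markings, starting at one."""

    def __init__(self):
        self._markings = {}

    def marking_for(self, source, target):
        # Assumption: two genes are identical if they connect the same neurons.
        key = (source, target)
        if key not in self._markings:
            # No twin exists yet, so the next unused number is assigned.
            self._markings[key] = len(self._markings) + 1
        return self._markings[key]

def shared_fraction(markings_a, markings_b):
    # Fraction of markings that both networks have in common.
    a, b = set(markings_a), set(markings_b)
    return len(a & b) / max(len(a), len(b))

registry = InnovationRegistry()
first = [registry.marking_for(0, 2), registry.marking_for(1, 2)]
second = [registry.marking_for(0, 2), registry.marking_for(0, 3)]
print(first, second, shared_fraction(first, second))
```

Both networks receive marking 1 for the connection from neuron 0 to neuron 2, even though they created it independently, so comparing them is reduced to comparing lists of numbers.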
2.2 Speciation
Another difficulty in evolving topologies lies in the way the topology is encoded in the genome. When a
new connection is introduced in a network, the network often performs a bit worse at first, because it
needs some time to adjust and show its real potential. Traditional TWEANNs tend to throw these kinds of
topologies out of the gene pool prematurely, as they appear to make the network worse.
NEAT solves this, again, by using historical markings. The more markings a network shares with
another, the more related it is to that other network. Based on this principle, NEAT groups similar
networks into species, which share their ﬁtness with each other. This means that weak individuals that
are only marginally diﬀerent from a proven concept are guaranteed to be temporarily protected in their
niche. (Stanley, Kenneth. 2002)
3 Convolutional Neural Networks
3.1 Problems with image recognition
Most neural networks are unable to handle the amount of data contained in an image. For example an
image with a resolution of 3264x2448 (8 Megapixels) would result in almost 24 million inputs, as each
pixel is split into its red, green and blue parts.
Another challenge is the detection of so called "features" across an image. Traditional neural networks
only detect a feature at a speciﬁc location in the image. This is a big issue in image recognition, as you
almost always want the entire image to be handled equally. A self-driving car should recognize a stop sign,
regardless of its position in the image.
3.2 Subsampling
Subsampling is, broadly speaking, the act of taking values from a source, observing them and combining
these into a smaller dataset that is still representative.
It's a bit like compressing, really.
Traditional use cases of subsampling include the JPEG format. It makes use of the fact that the human eye
cannot differentiate colors as well as luminance, and simplifies parts of the image that the average
human cannot tell apart anyway. (Christian J. van den Branden Lambrecht. 2001)
The subsampling that CNNs perform is not related specifically to the human eye but to animal visual systems in
general. (Masakazu Matsugu, Katsuhiko Mori, Yusuke Mitari, Yuji Kaneda. 2003)
The main goal of a CNN is to "see" structures in images. These can be geometric (line, square, circle, etc.),
typical human recognitions (face, smile, house, cat), and also totally inhuman and unintuitive structures
(wiggly lines pointing to the left, three stripes ending up in a sharp point).
3.2.1 Kernels
Kernels are little matrices (rectangular tables of numbers) that go through an image and filter a certain
structure out of it as they multiply their weights with the individual pixels. Because of this behavior, they
are sometimes referred to as filters.
An aggregation of filters with an equal size is called a convolution.
When working with convolutions, we refer to the inputs and outputs as tensors.
A tensor is, in layman's terms, a matrix with more than two dimensions. A tensor with three dimensions,
which is called a tensor of rank three in maths, can be imagined as a cube.
A convolution takes a tensor of variable dimensionality as an input and returns a tensor of depth n, where
n equals the number of filters in the convolution. The exact size of the input tensor is irrelevant, as the
convolution reapplies its filters over the whole input.
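How a kernel slides over an image can be sketched in plain Python. The sketch assumes a single-channel image stored as a list of rows, a step size of one pixel and no padding at the borders; the function names and the two example kernels are hypothetical.

```python
def apply_kernel(image, kernel):
    """Slide one kernel over the image and return the filtered result."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            # Multiply the kernel weights with the pixels underneath and add them up.
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows) for j in range(k_cols))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

def convolution(image, kernels):
    # One output layer per filter: n filters give an output of depth n.
    return [apply_kernel(image, kernel) for kernel in kernels]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [[[1, 0], [0, -1]],   # reacts to diagonal differences
           [[1, 1], [1, 1]]]    # adds up each 2x2 neighbourhood
print(convolution(image, kernels))
# [[[-4, -4], [-4, -4]], [[12, 16], [24, 28]]]
```

The same two kernels would work unchanged on a larger image, which is why the exact input size does not matter.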
3.2.2 Poolers
Despite the kernels doing a great job at making the image smaller, the resulting data is still too big
to work with. For that reason one can use poolers, which are nothing but dull compression algorithms.
The most used pooler is the max pooler. (Graham, Benjamin. 2014)
This simple unit traditionally takes four adjacent pixels, determines the one with the highest value, and
replaces the four original pixels with this single one. Repeat this process over the whole
image, and you have scaled it to one fourth of its original size.
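A max pooler over blocks of 2x2 pixels can be sketched as follows. The sketch assumes that the image is a list of rows whose width and height are both even; the function name is hypothetical.

```python
def max_pool(image):
    """Replace every 2x2 block of pixels with its largest value."""
    return [
        [
            max(image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]

image = [[1, 3, 2, 0],
         [4, 2, 1, 5],
         [7, 0, 3, 3],
         [1, 8, 2, 6]]
print(max_pool(image))  # [[4, 5], [8, 6]]
```

The 16 input values shrink to 4, one fourth of the original size.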
3.2.3 Activation function
Every procedure and concept that we described so far is a linear function. To make a combination of
layers meaningful, we need to introduce nonlinearities after each convolution, as a stack of layers would
otherwise behave like just one big linear layer.
This is done by an activation function layer.
The most commonly used one in the field of image recognition is the rectified linear unit, in short ReLU
(Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton. 2012). Its definition is extremely simple:
f(x) = max(x, 0)
In other words, it just replaces every negative value in a feature map with a zero.
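Applied to a whole feature map, ReLU is a one-liner. The feature map below is a made-up example, stored as a list of rows.

```python
def relu(feature_map):
    # f(x) = max(x, 0), applied to every value of the feature map.
    return [[max(value, 0) for value in row] for row in feature_map]

print(relu([[-4, 2], [0, -1]]))  # [[0, 2], [0, 0]]
```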
4 Hippocrates, a NEAT implementation
4.1 Motivation
The currently available implementations of NEAT are suboptimal.
Most machine learning frameworks and libraries are focused on training by backpropagation and only offer
limited support for genetic algorithms.
Dr. Stanley's original implementation in C++ (Kenneth Stanley. 2010) was written before the major
revisions of the C++ language, which made the language very different to use. (Bjarne Stroustrup. 2013)
The code is no longer effectively usable, as it is riddled with experimental features, afterthoughts, dead
code and patterns of thought that are no longer in use.
The most usable implementations are all written in Python, which makes them very easy to use but also
very slow when compared to optimized C++.
This is why in 2014 Mr. Ferner decided to work on an "actually usable" implementation of NEAT, which
he called Hippocrates.
4.2 Technology
At the beginning, a big question was which language we should write our library in.
The main contestants were C++ and C#. We had to weigh different pros and cons.
One aspect is how easy the actual writing would be. There is a concept in programming languages which
is called memory safety. It is closely related to how and when objects end their lifetime.
Just like real life, a program is made out of various objects, each of which has a distinct state and
a set of possible actions.
One such object could be a dog. Its state, which is divided into a set of variables, could for
example consist of his age, his hair color, his character and so forth.
His possible actions, which are called functions in the programming world, could include bark, walk or
lick. Some of his functions might even alter his state: a function for celebrating a birthday might
increase the age variable by one. Just like in real life, however, the objects that compose a program
have to die off at some point.
Our program might spawn hundreds or thousands of objects. If we do not do something about it, these
objects would clog up our entire memory and slow every process down. The question becomes "when does
their lifetime end"?
So-called safe languages like C# answer by saying "whenever absolutely no one needs them anymore
anywhere". This very hedonistic principle is enforced by a garbage collector. This is a program that carefully
inspects a running process and its objects and finds out whether an object is really not used anymore. Modern
garbage collectors have become very efficient at what they do, but still cost performance.
Another disadvantage is that garbage collectors are non-deterministic, which means that a programmer
can never know for sure at what exact point the objects get destroyed. If the garbage collector decides that it's
time to free up some space, it will do so no matter what. If this happens during a performance-critical
part of the application, that part is going to be slowed down considerably.
The counterpart are unsafe languages. C++ is called unsafe because before 2011 it didn't have a standard
way to manage the lifetimes of objects except for forcing the programmer to watch over the memory manually,
often leading to corrupt data and undefined behaviour at runtime. (Bjarne Stroustrup. 2013)
In modern C++ however, lifetimes of complex objects can be managed by so-called smart pointers, some of
which are implemented as reference counters.
This means that every time a function takes hold of an object, its reference counter goes up by one. If the
function is done with it and doesn't need the memory anymore, this counter goes down by one. As soon
as the reference counter hits zero, the object is destroyed.
This gives the programmer determinism, which means that he now knows exactly when the memory is
going to get freed (provided he designed his application carefully). This however comes at the cost of
requiring more design skill than using a garbage collector.
In certain edge cases it is possible that reference counters cost more performance than a garbage collector,
as the latter is free to do more optimizations provided it can prove that the end effect
is the same.
Additional considerations are that the most used machine learning libraries are written in C++, whereas
C# has a far better system for actually distributing libraries.
This gives us a tough decision: Do we want the comfort and stability of C# for increased productivity
or the absolute control and performance power of C++?
In the end, Hippocrates was written in C++, as we deemed the performance of the library to be of crucial
importance to the usability in the future.
Mr. Ferner and Ms. Zarubica had already been writing C++ for years at Messerli Informatik
AG, and Mr. Ferner had a lot of experience teaching apprentices the ins and outs of the language, which
is why he was happy to assist Mr. Fischler and Mr. Stucki in learning the common syntax and semantics
of modern C++.
4.3 Discrepancies
4.3.1 Paper
To determine if two organisms are compatible for reproduction with each other, one measures the
difference of their genomes by a distance function. The original paper describes it as follows:
Therefore, we can measure the compatibility distance δ of different structures in NEAT as a
simple linear combination of the number of excess E and disjoint D genes, as well as the
average weight differences of matching genes $\overline{W}$, including disabled genes:
$$\delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3 \cdot \overline{W}$$
The coefficients c1, c2, and c3 allow us to adjust the importance of the three factors, and the
factor N, the number of genes in the larger genome, normalizes for genome size.
Typical settings for the coefficients are c1 = 1.0, c2 = 1.0, c3 = 0.4.
(Stanley, Kenneth. 2002)
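The distance function as quoted from the paper can be sketched as follows. The sketch covers only the paper's normalized formula. It assumes that a genome is a dictionary mapping historical markings to connection weights, that excess genes are the non-matching genes whose marking lies beyond the highest marking of the other genome, and that disjoint genes are the remaining non-matching ones; the function name and the example genomes are hypothetical.

```python
def compatibility_distance(genome_a, genome_b, c1=1.0, c2=1.0, c3=0.4):
    """delta = c1 * E / N + c2 * D / N + c3 * mean weight difference."""
    markings_a, markings_b = set(genome_a), set(genome_b)
    matching = markings_a & markings_b
    non_matching = markings_a ^ markings_b
    # Genes beyond the range of the other genome are excess, the rest disjoint.
    cutoff = min(max(markings_a), max(markings_b))
    excess = sum(1 for marking in non_matching if marking > cutoff)
    disjoint = len(non_matching) - excess
    if matching:
        mean_weight_diff = sum(
            abs(genome_a[m] - genome_b[m]) for m in matching) / len(matching)
    else:
        mean_weight_diff = 0.0
    # N is the number of genes in the larger genome.
    n = max(len(genome_a), len(genome_b))
    return c1 * excess / n + c2 * disjoint / n + c3 * mean_weight_diff

genome_a = {1: 0.5, 2: -0.3, 4: 0.8}
genome_b = {1: 0.7, 3: 0.1, 5: 0.2, 6: -0.4}
print(compatibility_distance(genome_a, genome_b))
```

In this example there are two excess genes (5 and 6), three disjoint genes (2, 3 and 4) and one matching gene with a weight difference of 0.2, so the distance is 2/4 + 3/4 + 0.4 * 0.2, or about 1.33.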
However, if we look at Stanley’s code (Kenneth Stanley. 2010), the actual formula we ﬁnd is
Where W is the sum of absolute weight differences.
The same function is used by all the NEAT implementations that we looked at. This deviation is most
likely intentional, although not explicitly documented by Stanley himself. In the original function, the
importance of excess and disjoint genes is limited to a sum of 1 (because there can be at most N non-matching
genes). This means that for two completely different networks, our function results in
δ = 1 · W
where W is unlimited. This means that the weight differences would be a lot more important than
the topological ones, which stands in contrast to the usage of the function as an indicator of topological
compatibility. (Colin D. Green. 2009)
Because of this, we use the second version of the function without normalization.
4.3.2 Original implementation
We didn’t implement the ability for neurons to form recurrent connections, i.e. connect to previous layers.
This feature is traditionally used to simulate short-term memory in e.g. speech recognition, where one
word alters the meaning of another. (Haşim Sak, Andrew Senior, Françoise Beaufays. 2014)
As our images are not sequentially interconnected (as e.g. in a movie), we do not need this.
4.4 Visualizing Neural Networks
4.4.1 Traditional Neural Networks
In the following section, visualizing refers to visualizing the structure of a network, not to visualizing
what neural networks see. (Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs and Hod Lipson. 2015)
Traditional neural networks are relatively easy to visualize.
An example of this is such a network with two inputs, three hidden neurons and one output neuron.
For us, the minimum of visible structure for a neural network to be readable is knowledge of the input
layer location, the neurons (displayed as circles) and the connections, displayed as lines.
Additional information that we found useful in understanding the network would be showing the exact
weight of the connections.
We created an algorithm to calculate the scaling of the differently sized networks automatically for a fixed
For this algorithm, we need information about the number of layers and the maximum number of neurons in any layer.
For the definition of the algorithm, we simply assume that the network will be shown from left to right,
with all input neurons on the left.
The x-axis is horizontal, the y-axis vertical.
width is the available width (also xSize), height is the available height (also ySize).
layerCount is the number of layers, maxNeuronCount the maximum number of neurons in any layer.
A step (x or y) is the distance between the centers of neurons, in x or in y direction, respectively.
xStep = xSize − (layerCount + 2) ∗ (minMargin + neuronRadius ∗ 2)
yStep = ySize − (maxNeuronCount + 2) ∗ (minMargin + neuronRadius ∗ 2)
where minMargin is the margin that should be kept between neurons to make sure they do not
overlap, and neuronRadius is the radius of the neurons to be drawn.
The term layerCount + 2 is there because borders also have to be kept at the edges of the drawing
area, exactly 2 per dimension.
Of course, this means that if (minMargin + neuronRadius) ∗ (layerCount + 2) > width, the neurons will
overlap anyway and the structure will be hard to read.
This will result in such a structure (taken straight from our software NEAT_Visualizer):
4.4.2 Navigating through generations and species
For us, not only the end result was interesting when analysing a run, but also the evolution
itself. Inspecting the evolution is very interesting, but also complex.
For every generation, there are multiple species, which in turn contain multiple organisms themselves. (Stanley,
In our application for inspecting these structures, NEAT_Visualizer, we can read a JSON dump with
logging data from Hippocrates, which is then loaded into the application. The user sees the interfaces
Both views, the left and the right, show a selected generation, a selected species and no organism
selected yet. The controls read from left to right: generation, species, organism.
The numbers are always, from left to right, the index and then the fitness. As an example, the left picture above has
generation 15 with a maximum fitness of 3 selected, and its species number 12 with a maximum fitness of
Here is a full view of the visualizer’s graphical user interface (and the network drawing algorithm conﬁgured
to have all the inputs at the bottom):
4.4.3 Convolutional Neural Networks
"To visualize the function of a speciﬁc unit in a neural network, we synthesize inputs that cause
that unit to have high activation." (Jason Yosinski, Jeﬀ Clune, Anh Nguyen, Thomas Fuchs
and Hod Lipson. 2015)
This way, an artificial picture can be created that represents what the network "sees". (Jason Yosinski, Jeff
Clune, Anh Nguyen, Thomas Fuchs and Hod Lipson. 2015), (Karen Simonyan, Andrea Vedaldi, Andrew
This is an example of such artiﬁcially created images:
4.4.4 Technologies used for our visualizer
To create a visualizer, we had to choose technologies for code and graphics.
For convenience, we decided almost immediately that C# would be our language of choice. It offers
high productivity with a very concise syntax, and is well known to some of our team
members. C# runs easily on the most used operating systems.
C# also offers a very healthy ecosystem that allows developers and engineers to choose freely between
competing products, most of which are free.
The decision about which graphics/GUI system to use was harder.
The prime choice would have been WPF. However, it depends on Windows drivers
for DirectX. This rules WPF out, because we are convinced that, if possible, our tools should
be available for everyone, not just Windows users.
Other possibilities include Gtk-Sharp, WinForms and Avalonia.
The latter is still in alpha and was discovered by Mr. Fischler while researching possibilities.
However, it seemed to have similar features and approaches as WPF.
Gtk-Sharp has many appealing features, but no good scalable drawing area. It runs well on Windows,
Linux and macOS.
WinForms is very stable due to its age, but will only run on Linux with the help of a compatibility layer called Wine.
Wine can be found under https://www.winehq.org/.
With that, Avalonia seemed to be the most exciting and still the best option for development.
Avalonia has an interesting modular system of rendering subsystems, currently supporting Gtk and Cairo
(Windows, Linux, macOS) and Win32 with Direct2D (Windows only). Skia support is planned and is
meant to replace Gtk.
4.4.5 Interoperability
While creating a tool to visualize the structures of networks generated by NEAT, we faced multiple problems.
We decided to use Avalonia (https://github.com/AvaloniaUI/Avalonia) as a framework for the visualizer.
Because Avalonia is based on C# (a safe, just-in-time compiled language that is not compiled to native code
ahead of time), it cannot natively exchange data with Hippocrates, which is written in C++ and compiled for a certain platform.
There is a method called interop marshalling that would provide a solution to such a problem. (Microsoft.
This method, however, has been designed for Windows and works differently on Linux.
It would also make the implementation of the visualizer dependent on the memory layout, which is a huge
constraint to be taken into consideration. That's why we decided against using interop marshalling.
Another possibility is using the ﬁle system to exchange data. All data that belongs together will be
contained in a folder with a ﬁle per logical unit that it represents.
This is the approach we took to avoid having memory incompatibilities For saving data in a ﬁle however,
you need a common representation for the data you want to exchange between programs.
The keyword to that problem is serialization. Serializing data is the procedure of converting data from
the native memory to a more general (human readable or non-readable) format.
We decided to use a humanly readable and well known serialization format for Hippocrates, because it
allows us more ﬂexibility and automation in terms of serializing and deserializing (reading the data into
native memory again, but maybe diﬀerently structured).
The two most often used and most famous humanly readable data formats are XML and JSON.(Tom
We decided to go with JSON, because it is more lightweight and by now more often used than XML.
Running out of memory is a problem when reading large amounts of data from a file system. It can be avoided
by not reading all the data at once, but piece by piece and only when needed, and by discarding as much as
possible once it is no longer required.
This is often called lazy loading or lazy initialization, and it ended up being what we implemented
to ensure that the visualizer would not collapse under big Hippocrates dumps.
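The idea can be sketched as follows. This is not the visualizer's actual code (which is written in C#), only an illustration in Python under assumed names: the class `LazyDump`, the file naming scheme `generation_N.json` and the cache limit are all hypothetical.

```python
import json
import os
import tempfile


class LazyDump:
    """Loads one JSON file per logical unit, only when it is first requested."""

    def __init__(self, folder, max_cached=2):
        self.folder = folder
        self.max_cached = max_cached
        self.cache = {}   # file name -> parsed content
        self.loads = 0    # number of real file reads, kept for illustration

    def get(self, name):
        if name not in self.cache:
            if len(self.cache) >= self.max_cached:
                # Discard the oldest cached entry to bound memory use.
                oldest = next(iter(self.cache))
                del self.cache[oldest]
            with open(os.path.join(self.folder, name), encoding="utf-8") as f:
                self.cache[name] = json.load(f)
            self.loads += 1
        return self.cache[name]


def demo():
    with tempfile.TemporaryDirectory() as folder:
        for generation in range(3):
            path = os.path.join(folder, f"generation_{generation}.json")
            with open(path, "w", encoding="utf-8") as f:
                json.dump({"generation": generation}, f)
        dump = LazyDump(folder)
        first = dump.get("generation_0.json")
        again = dump.get("generation_0.json")  # served from the cache
        return first, again, dump.loads
```

Only the files that are actually requested are parsed, and the cache limit keeps the number of units held in memory bounded.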
5 Build tools
5.1 Version control
A version control system is a computer program that tracks every file change in a directory. It allows one to
revert to an earlier version of a file in order to undo something. It is also great for collaboration, as it
records who made which change.
Git is a version control system that was ﬁrst proposed by Linus Torvalds in 2005. (Torvalds, Linus)
It is a free, open-source version control system, which we use for our entire source code and documentation.
We ﬁrst separated our code into multiple repositories (Torvalds, Linus et al) as we thought it would
make sense to keep NEAT and CNN separated. Later we decided that it would make more sense to keep
everything in a single repository, as we had to use both parts simultaneously.
A repository is like a project folder, but it is synced across multiple computers.
Git has many powerful tools that outperform their predecessors in other version control systems by a big
margin in terms of usability, performance and stability.
One of these tools is the merge tool. It merges files from different branches or repositories, either
automatically if no conflicts occur, or manually. This is very useful when working with
multiple team members: as long as no redundant work is done, you do not have to worry much about working
in the same files, because the merge tool can often fix a lot of
collisions automatically. If it cannot, it marks the colliding parts so users have an easier time fixing the
When commits are created and pushed to a Git repository (a commit is a set of changes), everyone gets
a copy of them as soon as they ask for it by "pulling" (getting the latest changes from a remote
Because of that, commit messages are important. They ought to explain what the commit changed on the
To make sure everyone can understand what has been done, we adopted a rule set for naming commits
(Chris Beams. 2014):
•Separate subject from body with a blank line
•Limit the subject line to 50 characters
•Capitalize the subject line
•Do not end the subject line with a period
•Use the imperative mood in the subject line
•Wrap the body at 72 characters
•Use the body to explain what and why vs. how
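The mechanical part of these rules can be checked automatically. The following Python sketch is a hypothetical helper, not something we used in the project. It covers only the rules that can be tested on the text itself. The imperative mood and the "what and why" rule still need a human reader.

```python
def check_commit_message(message):
    """Returns a list of violated rules for a commit message (hypothetical helper)."""
    problems = []
    lines = message.split("\n")
    subject = lines[0]
    if len(subject) > 50:
        problems.append("subject longer than 50 characters")
    if subject[:1] != subject[:1].upper():
        problems.append("subject not capitalized")
    if subject.endswith("."):
        problems.append("subject ends with a period")
    if len(lines) > 1 and lines[1] != "":
        problems.append("no blank line between subject and body")
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("body line longer than 72 characters")
    return problems
```

A message such as "Add implementation" passes all checks, while "added implementation." is reported for its capitalization and trailing period.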
The rule "Limit the subject line to 50 characters" is very useful:
It guarantees that on GitHub the commit message is readable without requiring the user to expand an
area of the page.
The rule "Use the imperative mood in the subject line" is useful because it makes commits more readable. As
we used this rule, it became more and more clear to us that not using the imperative means having
As an example, instead of "Add Implementation" the commit message could be "Added Implementation".
That is two more characters without any gain in insight or readability. That is why we found this rule
GitHub is a web-based Git repository hosting service. (Preston-Werner, Tom)
For us, this means that we have a central place where our data is located. It also enables us to
work on the code simultaneously, which increases the speed of development.
GitHub provides all services for free when developing an open-source application. (Preston-Werner, Tom)
We use some of the GitHub features to improve the quality of our documentation. We have set it up so
that any change to the documentation has to happen on a new branch, and before it gets merged into
the main one, another member of the team has to approve the changes. (GitHub, Inc.)
5.2 Integration tests
We use automated testing to check each code change for issues. This means that the code everyone works
on is located in a separate git branch (Torvalds, Linus) and has to pass all integration tests before it gets
merged into the main branch. A branch can be seen as a copy of the project that one works on in parallel
to the original. When the work is done, the changes are copied over to the original version.
But before the changes can be added to the main branch, they have to pass our tests, which are basically
dummy programs that get executed on Travis and AppVeyor and require different parts of the software
to work. This ensures that we always have a stable version and improves code quality.
These tests are performed on remote servers and the status is visible on GitHub. (GitHub, Inc.)
Travis and AppVeyor both use the same technology. They create a virtual machine to simulate a computer
and run our software on that machine.
Travis is a German service provider for automating integration tests that can be found under https://travis-
Travis offers its services for free to open source projects. (Travis CI, GmbH)
We use it to compile and test our code on Linux. Travis also supports macOS, but since both use
the same compiler we chose to just use Linux.
Travis also generates the PDFs for our documentation and warns us if a citation is missing a bibliography
This automatic generation allows us to control the provided PDF remotely, without the need for building
The services we used from Travis have one big downside: they have no caching or preinstalled configurations.
This means that LaTeX, or modern compilers that are still under development and not fully
released, have to be installed first on every run, which takes time.
Knowing whether the PDF of the documentation still builds is something we value a
lot, and we have learned to value it even more when multiple people work at the same time.
AppVeyor is a Canadian service provider for automating integration tests on Windows that can be found under
AppVeyor also provides its services for free to open source projects. (Appveyor Systems Inc.)
We use it to compile and test our code on Windows with Visual Studio.
We struggled a lot with AppVeyor, because our Visual Studio configurations were based on Visual Studio 2017
RC and required this version to run.
However, when we first used it, Visual Studio 2017 RC had only just been released in closed beta as a
pre-installation for continuous integration. We had to get access to it by requesting access through the public repository
Once correctly configured, however, we never had any problems with AppVeyor.
CMake is a tool to control processes of software compilation and testing. (Kiteware)
It allows us to write a fairly simple configuration file which can then be used on multiple platforms. It is
a high-level configuration that has to be converted into a platform-specific one. This conversion is
done automatically by the build system (CMake) and thus does not cost us any time.
This allows us to support almost every operating system, as it was important for us to be platform
The problem with platform-dependent solutions is that they are not as accessible to everyone, and we
really want to support all major operating systems to make our code and work as accessible as possible.
CMake supports a hierarchical setup, which allows you to move parts of the build configuration into
subfolders and then chain the build scripts together with a root build script. (Kiteware)
CLion is an integrated development environment for C++ developers. (JetBrains s.r.o.)
We decided to use it over other available tools because it is the best tool available for macOS and Linux;
after trying several other IDEs, we felt that none of them would raise our productivity.
One of the problems with CLion is that it is not up to date with all of the latest developments in the C++
programming language. This makes it almost impossible to use for modern C++ development.
In edge cases we even started using plain text editors on Linux so as not to be limited by the editor,
and compiled our code using the new C++ features with a compiler that we accessed via the command-line
This was only a problem on Linux and macOS, because for Windows, the very well-known Visual Studio
IDE is available, which supports the features we wanted in a release candidate that is publicly available.
6 Combining Neuro-Evolution of Augmenting Topologies with Convolutional Neural Networks
6.1 Challenges & Solutions
The goal of NEAT is to make topological units modular. These can then be combined in a way that is not
predetermined. So our two questions when combining NEAT with CNNs become:
1. How can we make CNNs modular?
2. How can these units be combined in a meaningful way?
Our first approach was simply taking NEAT and exchanging some of the neurons for filters.
An example network can be seen here:
This approach is probably as modular as it gets, however it brings various problems when combining.
1. We ignore one of the main advantages of CNNs: Being able to drastically lower the number of inputs
2. We don’t use Pooling or ReLU layers
3. The signiﬁcance of a single classic neuron in such a system is questionable
4. The ﬁlters in the same layer have to have some way of communicating to form a convolution
5. Adding a new ﬁlter in a convolution conﬂicts with previous learned parameters
We could not address all of these issues in a satisfactory way, so we moved on to a second approach.
We addressed issue 3 by separating the whole network into a convolutional and a fully connected part. This
allows us to tackle issue 1 by adding the concept of a minimal network, inspired by NEAT's practice of
always starting by connecting all inputs to all outputs.
In our case, the minimal network would incorporate some combination of convolution and pooling to reduce
the input space. While its exact form is debatable, we think a good starting point is LeNet, as
it proved itself to be ﬂexible in its application. (Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick
The overhauled version would start out like this:
And could evolve into something like this:
This setup is problematic because filters are supposed to work together to form convolutions.
To process the same input (issue 4), the filters need to have the same size, which we cannot guarantee
once we randomly insert new filters or, as per issue 2, pooling layers.
We can only scale the weight matrices in the filters to the same size by either padding the smaller ones with
meaningless zeros or pooling the bigger ones down, which, being a lossy compression,
makes our matrices less accurate.
We came to the conclusion that we have to limit the modularity of the filters, as doing otherwise brings
too many drawbacks. Instead of letting the filters connect to whatever they want, we group them in convolutions.
These can alter the dimensionality of all filters in them at once, guaranteeing homogeneity and encapsu-
With the filters now being synchronized in their convolutions, we have no more problems introducing
poolers or ReLUs, as a convolution as a whole does not care about the size of its input matrix.
Our updated pool of available units for stochastic insertion is now:
Convolutional Fully connected
Our starting topology now looks like this:
The possible developments consist of a chain of random units right after LeNet.
This raises a new question: How is the meaning of the fully connected part altered when we add a new unit
in the convolutional part?
After detailed evaluation, we came to the conclusion that all of the parameters in the fully connected
part are fine-tuned to a specific expected input. This expectation, however, ceases to be met once
the dimensionality of the convolutions changes, as this shifts a lot of weight parameters towards a new
This means that we have two choices on how to process the fully connected part in case of a topological
change in the convolutional part:
1. Adjust weights for the new meaning
2. Trash the fully connected part and train it anew
Neither of these possibilities is satisfactory. Option 1 will take a long time, since the already trained fully
connected structure is basically meaningless now. Option 2 throws away big, otherwise perfectly usable, parts of
After some research into this problem we found a recent paper describing how to get rid of
the fully connected layer completely by using a global average pooler (Min Lin, Qiang Chen, Shuicheng
If we treat the feature map matrix $F$ at the $l$-th dimension as a vector $F'_l$ with $n$ elements, the global average pooler is
defined as follows:
$$\mathrm{GAP}(F'_l) = \frac{1}{n} \sum_{i=1}^{n} F'_{l,i}$$
We then forward the results of every layer to the softmax layer.
Provided the last layer of the convolutional part outputs a tensor with exactly as many dimensions as the
number of possible network outputs, we can exchange the complete fully connected part for this global
average pooler while achieving the same results with drastically improved performance in both evaluation
and search space. (Min Lin, Qiang Chen, Shuicheng Yan. 2014)
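To make the mechanism concrete, here is a small Python sketch of a global average pooler followed by a softmax. It is an illustration rather than our implementation: the function names and the example values are assumptions, and it presumes one feature map per possible output, as required above.

```python
import math


def global_average_pool(feature_maps):
    """Averages each feature map (a 2D list) down to a single number."""
    pooled = []
    for feature_map in feature_maps:
        values = [v for row in feature_map for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def softmax(scores):
    """Turns raw scores into probabilities that sum to one."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two 2x2 feature maps, one per output class (illustrative values).
maps = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
pooled = global_average_pool(maps)
probabilities = softmax(pooled)
```

Note that the pooler itself has no trainable weights, which is what makes it attractive for an evolving topology: nothing in it needs to be retrained when the convolutional part changes.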
The reason is, in a nutshell, that we stop imagining the output of a ﬁlter as detection of a feature.
We now treat it as a rate of conﬁdence: The bigger the numbers, the more conﬁdent we are that the
feature is present.
This means that the feature detection is no longer performed by the fully connected part, but instead by
every single ﬁlter in the network together (Min Lin, Qiang Chen, Shuicheng Yan. 2014). Our standard
network now looks like this:
And could develop into something like this:
We now seem to have resolved all issues. However, when looking at the layers of the example, we see that
it has a depth of 8 logical layers (ReLU layers are not counted because they do not result in a feature
extraction, as they are merely activation functions). Such a depth is very atypical and has been
shown to result in various problems such as very high hardware requirements and lower accuracies (Karen
Simonyan and Andrew Zisserman. 2015).
The fundamental problem is that the effect of a change in the parameters in a lower layer becomes vanishingly small
compared to a change in the higher ones (Karen Simonyan and Andrew Zisserman. 2015). A network
of this size is not realistically trainable by us.
A very recent paper, whose lead author has since joined the Facebook AI Research group, deals with these issues.
It introduces the concept of Residual Networks, in short ResNets. (Kaiming He, Xiangyu Zhang, Shaoqing
Ren and Jian Sun. 2015)
Their goal was to create a convolutional network by combining an arbitrary number of well-defined residual
units for which these problems are of no concern. Greatly simplified, they address the problem of varying
influence by adding a new kind of connection, called a shortcut.
What it does is simply add matrices. If they have different dimensionalities, the smaller matrix gets
projected onto the bigger one by being processed by a one by one convolution with the respective number of filters.
A residual block looks like this:
On the left side, a convolutional action takes place (in this case two convolutions with one ReLU activation
in between). On the right side, the original input of the residual block is added to its output.
This overlay guarantees that the convolutions cannot alter the original state too much, as they now merely
highlight features as opposed to extracting them.
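The effect of the shortcut can be sketched in a few lines of Python. To keep the example short, each convolution is replaced by a stand-in that merely scales its input by one weight. This simplification, as well as all names and values, is an assumption made for illustration.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def scale(values, weight):
    """Stand-in for a convolution: multiplies every value by one weight."""
    return [weight * v for v in values]


def residual_block(x, w1, w2):
    """Computes F(x) + x, where F is two stand-in convolutions with a ReLU in between."""
    residual = scale(relu(scale(x, w1)), w2)
    return [r + original for r, original in zip(residual, x)]


x = [1.0, -2.0, 3.0]
# With zero weights the block passes its input through unchanged.
identity = residual_block(x, 0.0, 0.0)
# With small weights the output stays close to the input.
nudged = residual_block(x, 0.5, 0.2)
```

A block whose weights are all zero behaves like the identity, so inserting a fresh block does not destroy what the rest of the network has already learned.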
The issue of performance is addressed by applying a bottleneck.
This means downsampling the input dimensionality of the residual block by applying one by one
convolutions before performing the convolution and then upscaling it again. This procedure is inspired by
Google's Network In Network Inception structure (Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian
Sun. 2015) (Min Lin, Qiang Chen, Shuicheng Yan. 2014).
The overhauled residual block now looks like this:
While more convolutions would in theory be possible, only one is used, as the bottleneck dimension poolers
introduce new parameters themselves.
This method has been demonstrated to achieve very similar levels of accuracy while removing a big chunk
of the computational cost (Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. 2015).
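A rough count of weights shows where the saving comes from. The sketch below compares a plain block of two 3 by 3 convolutions with a bottleneck block at the same outer width. The widths 256 and 64 are illustrative assumptions, biases are ignored, and the comparison at equal width is our own simplification.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one convolution, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


def plain_block(channels):
    """Two 3x3 convolutions at full width."""
    return 2 * conv_weights(3, channels, channels)


def bottleneck_block(channels, reduced):
    """1x1 reduction, one 3x3 convolution, 1x1 expansion."""
    return (conv_weights(1, channels, reduced)
            + conv_weights(3, reduced, reduced)
            + conv_weights(1, reduced, channels))


plain = plain_block(256)                # 1179648 weights
bottleneck = bottleneck_block(256, 64)  # 69632 weights
```

Under these assumptions the bottleneck block needs roughly one seventeenth of the weights, because the expensive 3 by 3 convolution only ever sees the reduced dimensionality.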
Residual blocks are modular by nature, so they are a perfect fit for our NEAT algorithm.
When analyzing them, we can easily extract the following parameters:
1. Weights of ﬁrst dimension pooler
2. Weights of convolution pooler
3. Weights of second dimension pooler
4. Weights of shortcut projection (if needed)
5. Downscaled number of dimensions in each residual block
6. Upscaled number of dimensions in each residual block
7. Number of convolutions in each residual block
8. Total number of residual blocks
Through traditional means we can adjust parameters 1 to 4.
Parameters 5 to 8 are predefined in ResNet. Their exact values are determined empirically and experimentally.
This is of course suboptimal, as we already asserted in chapter two.
We think NEAT can optimize these by encoding them as genes in the genome.
However, because of the nature of our smallest building blocks, it does not make sense to store these genes
on a per-connection basis.
All parameters can be described as the state of a residual block. For the last one, we just abstract it as a link
to the next block. If the algorithm decides to add a new residual block, it can be inserted in a random
For the parameter tuning, we treat parameters 1 to 4 as one big vector of weights inside the genome of the
residual block and apply the same chances and rules of change to them as in standard NEAT, which are:
•Chance of selecting this genome to change weights: 80%
•Chance for each weight to be uniformly perturbed: 90%
•Chance for each weight to be set to a random new value: 10%
(Stanley, Kenneth. 2002)
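These rules can be sketched as follows in Python. The three chances are the ones listed above. The perturbation step of 0.1 and the range of -1 to 1 for fresh weights are assumptions made for this illustration, as is the function name.

```python
import random


def mutate_weights(weights, rng, select_chance=0.8, perturb_chance=0.9, step=0.1):
    """Applies the weight mutation rules listed above to one genome's weight vector."""
    if rng.random() >= select_chance:
        return list(weights)  # genome not selected: weights stay untouched
    mutated = []
    for weight in weights:
        if rng.random() < perturb_chance:
            # Uniform perturbation around the current value (step is an assumption).
            mutated.append(weight + rng.uniform(-step, step))
        else:
            # Otherwise the weight is replaced by a fresh random value (range is an assumption).
            mutated.append(rng.uniform(-1.0, 1.0))
    return mutated


rng = random.Random(42)
child = mutate_weights([0.2, -0.4, 0.9], rng)
```

Passing the random generator in explicitly keeps runs reproducible, which helps when comparing different mutation settings.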
Parameters 5 to 7 are more critical, as they greatly affect the computational cost.
We limited changes to +1 in each, in keeping with NEAT's idea of starting with a small topology
and only growing if necessary. The chances are thus taken straight from how NEAT treats extra neurons:
Each genome has a 3% chance of mutating one of the mentioned parameters at birth.
Lastly, parameter 8 is the one directly in control of the network's depth. As ResNet proved, deep networks
have great advantages over shallow ones (Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. 2015).
So we made this value very prone to grow by one. The chance is analogous to NEAT's chance of adding a
new connection, which is 30% at birth.
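A minimal Python sketch of these structural mutations could look as follows. The class `BlockGene`, its field names, the starting values of 1 and the choice of a uniformly random insertion position are assumptions for illustration. Only the chances of 3% and 30% and the +1 rule come from the description above.

```python
import random
from dataclasses import dataclass


@dataclass
class BlockGene:
    """State of one residual block; field names are illustrative."""
    reduced_dims: int = 1    # parameter 5
    expanded_dims: int = 1   # parameter 6
    convolutions: int = 1    # parameter 7


def mutate_structure(blocks, rng, param_chance=0.03, block_chance=0.30):
    """Grows a genome (a list of BlockGene) at birth; values only ever increase."""
    child = [BlockGene(b.reduced_dims, b.expanded_dims, b.convolutions) for b in blocks]
    if child and rng.random() < param_chance:
        # Increase one of parameters 5 to 7 in one random block by exactly one.
        gene = rng.choice(child)
        field = rng.choice(["reduced_dims", "expanded_dims", "convolutions"])
        setattr(gene, field, getattr(gene, field) + 1)
    if rng.random() < block_chance:
        # Parameter 8: insert a new residual block (position chosen at random here).
        child.insert(rng.randint(0, len(child)), BlockGene())
    return child
```

The parent genome is copied before mutation, so the same parent can produce several different children within one generation.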
Additionally, we changed the compatibility function's $c_3$ parameter to 0.06 to account for the higher number
of potential weight differences per genome.
While programming according to our algorithm, we continuously tested our code.
Unfortunately, we found out during one of those tests that our implementation of matrix multiplication
while applying a filter is not nearly fast enough to process high-quality mammography scans.
A simple test with 32 by 32 images confirmed our fear: Deep networks with a dimensionality higher than
10 are not realistically computable in the time available to us. By contrast, the deepest ResNet uses more than 1000
dimensions in its lowest layers. Given that genetic algorithms go even further by not training one
network, but 100 at a time, and considering our limited time, we had to halt further research.
7 Further enhancements
The single biggest challenge we faced was performance.
We had estimated that for training on a full set of 800 pictures at a mere 400 by 400 pixels, we would need
months for just training the network once. This held us back from efficiently measuring our algorithms
"Currently, large-scale CNN experiments require specialized hardware, such as NVidia GPUs,
and specialized APIs, such as NVidia’s CuDNN library, to achieve adequate training perfor-
mance." (Firas Abuzaid. 2015)
Firas Abuzaid also mentions that "at runtime, the convolution operations are computationally expensive
and take up about 67% of the time; other estimates put this figure around 95%".
We were (unfortunately) able to confirm these numbers as realistic: one line of code (the multiplication
of the matrices' values) took up to 93% of the execution time when testing our code, and the loop executing
these multiplications took another 6% of the execution time.
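Why a single line dominates becomes clear from a naive implementation. The Python sketch below is not our C++ code, only an illustration with assumed names. It applies one filter without padding and counts how often the innermost multiplication runs. As is common for CNNs, the kernel is not flipped.

```python
def convolve_valid(image, kernel):
    """Naive 2D convolution without padding; also counts the multiplications."""
    kernel_size = len(kernel)
    out_size = len(image) - kernel_size + 1
    output = [[0.0] * out_size for _ in range(out_size)]
    multiplications = 0
    for row in range(out_size):
        for col in range(out_size):
            total = 0.0
            for i in range(kernel_size):
                for j in range(kernel_size):
                    # The hot line: one multiplication per kernel weight and position.
                    total += image[row + i][col + j] * kernel[i][j]
                    multiplications += 1
            output[row][col] = total
    return output, multiplications


image = [[1.0] * 32 for _ in range(32)]
kernel = [[1.0] * 3 for _ in range(3)]
result, count = convolve_valid(image, kernel)
```

A single 3 by 3 filter on one 32 by 32 image already needs 8100 multiplications. This number is multiplied by the number of filters, layers, images, training steps and, in a genetic algorithm, individuals in the population.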
Here are some improvements that could be made to optimise the performance of convolutional neural
networks:
•Using GPUs to accelerate matrix multiplications (Robert Hochberg. 2012)
Using the power of GPUs for complex and computation-heavy calculations has been very important
in the industry in recent years. GPU toolkits seem to consistently perform the same tasks five to ten
times faster than their CPU counterparts. (Firas Abuzaid. 2015)
•Using the CcT method to optimize CPU usage
The CcT method has proven to be up to 4 times faster than Caffe, one of the commonly used CPU toolkits for
machine learning. Utilizing this method would allow us to improve the performance of CNNs
by a big margin without having to use expensive GPUs. (Firas Abuzaid. 2015)
There are other approaches of optimizing CNNs to be more eﬃcient, such as Low Rank Expansions(Max
Jaderberg, Andrea Vedaldi, Andrew Zisserman. 2014), the approach of Optimizing a FPGA-based Accel-
erator Design for Deep Convolutional Neural Networks(Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan,
Bingjun Xiao, Jason Cong. 2015) and Convexiﬁed Convolutional Neural Networks(Yuchen Zhang, Percy
Liang, Martin J. Wainwright. 2016).
All these approaches share the same limitations for us: it is unclear whether they are even compatible with
our NEAT-based evolutionary algorithm, and if they are, the changes to the inner workings of our algorithm
would be so drastic that benchmarking would be hard. Due to the recency of these developments, it is
hard to fully estimate their impact on our model and performance.
7.2 Safety concerns
At the SGAICO Annual Meeting and Workshop - Deep Learning and Beyond in Luzern, we learned about the
concept of safety constraints in the presentation of Dr. Krause.
He offered insight into his current studies about how to train systems that have influence over real life and
can cause harm. Examples were:
•A quadcopter learning to fly around a stationary object. It could potentially fly in a manner resulting
in a crash, damaging itself or property and causing financial damage.
•A system learning how to apply a new experimental treatment to patients. This can end in life
A big point to consider here is the Bayesian concept of false positives vs. false negatives. In other words:
"What is more critical, telling a patient he is sick when he is not (false positive) or telling him he is ﬁne
when he is actually pretty ill (false negative)?"
Of course, the answer to that depends on multiple factors such as treatment cost and lethality of the
condition. Dr. Krause proposes mechanisms that do not allow damaging decisions once you have settled
on a definition of what "damaging" means in the context of the training. We think this is very relevant to the
field of medical diagnostics and therefore a good improvement to consider in the future.
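The trade-off between the two kinds of error can be made explicit by weighting them differently. The following Python sketch is our own illustration and not a mechanism proposed by Dr. Krause. The function names and the cost values, in which a missed illness counts five times as much as a false alarm, are assumptions.

```python
def count_errors(predictions, truths):
    """Counts false positives and false negatives for boolean labels."""
    false_positives = sum(1 for p, t in zip(predictions, truths) if p and not t)
    false_negatives = sum(1 for p, t in zip(predictions, truths) if not p and t)
    return false_positives, false_negatives


def weighted_cost(false_positives, false_negatives, fp_cost=1.0, fn_cost=5.0):
    """Total cost when a missed illness is assumed to weigh more than a false alarm."""
    return false_positives * fp_cost + false_negatives * fn_cost


predicted_sick = [True, False, True, False]
actually_sick = [True, True, False, False]
fp, fn = count_errors(predicted_sick, actually_sick)
cost = weighted_cost(fp, fn)
```

Such a weighted cost could serve as part of a fitness function, so that networks producing dangerous errors are selected against more strongly than networks producing harmless ones.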
HyperNEAT is a further adjusted version, often also called an extension, of the original NEAT algorithm.
(Jessica Lowell, Kir Birger and Sergey Grabkovsky. 2011)
HyperNEAT's major problem is that it performs worse than the original version of NEAT.
With that, the already performance-limited system of combining NEAT with CNNs would be too slow:
"Finally, one major problem with HyperNEAT is that it is very slow, even on a multi-core
processor."(Jessica Lowell, Kir Birger and Sergey Grabkovsky. 2011)
8 Our Work
We have dedicated our project work to the subject of image recognition by artificial intelligence.
Image recognition by artificial intelligence interests us very much, because it allows us to create
something that does not exist yet. Our project is mainly concerned with computer science (artificial
intelligence) and medicine. An important aspect of our work was to create something that will be needed in
the future and that can be of benefit to other people. Because we are software engineers, it is also a good
opportunity to train ourselves in our field.
8.1.1 Project Group
Mr. Ferner had the idea of realizing this project quite a while ago. Mr. Stucki, Mr. Fischler and Ms.
Zarubica found a mutual interest in this topic and wanted to form a group to help Mr. Ferner
implement the idea.
Our motivation is to create something new together that can help people with their lives.
Mr. Benno Piller was the administrative supervisor of the project and helpfully advised us whenever
we had administrative questions or were in need of an external opinion.
Ms. Polina Terzieva, Bachelor of English philology, proofread multiple sections and provided occasional
support in linguistic and stylistic questions.
8.1.3 Medical Support
We contacted two medical specialists who are willing to look at our project and help us by
providing us with data.
PD Dr. med. univ. Christoph Tausch
General surgeon with a focus on the diagnosis and treatment of breast diseases
Dr. med. Serafino Forte
Deputy head doctor, radiology
Both have requested a study protocol, which can be found in the attachments.
Dr. Tausch also wanted to know about the type of data needed (which age, gender, cancer type, etc.)
8.2 Our goals
While working on our project we had to keep our goals in mind. We identified many of them; here
are our main goals.
•Read mammography correctly
•Probability indication to 95%
•Present the knowledge to the layman
•Reach platform independency
The ability to read mammographies is very important because the project applies image
recognition specifically to mammography. Certainty is also really important for the project, and it puts a
"stamp" on it, because one has to be very sure before making a cancer diagnosis.
A visual display is also a goal of the project, as it makes the software easier to use and more appealing
to the eye. A presentation for the layman could be a good supplement, providing them with the information
they need to get a general idea.
Finally, our project should be platform-independent so that one can
use it on any device.
8.3 Initial position
- The project took oﬀ with a semi-ﬁnished NEAT-library
- There is a library named "Hippocrates" consisting of approximately a half a year of work
8.4 Opening questions
•Is it possible to combine NEAT and CNN?
•Is it possible to explain the complex subjects carefully and, where possible, simplify them?
•Is it possible to cooperate and work with the hospital?
•Is it possible to evaluate mammographies at home with the software we possess?
•Is our software capable of emulating human evolution?
•Is our software usable as an assistance system in the future?
8.5 Working programs and tools
We used the following platforms and programs for our work: Visual Studio, LaTeX, and GitHub.
8.5.1 Visual Studio
Visual Studio is a development environment made by Microsoft for Windows, in which we programmed all of our
software. We worked with it quite a lot. Since Visual Studio is very modern, it did not
cause us problems and without a doubt helped us with the project changes that we made.
Changes applied to the software structure were not too difficult.
The tool we used for our documentation and design is LaTeX. Its advantage is that it
takes complete care of the presentation of the documentation, including the citations. This guarantees
that we waste as little time as possible on visually designing the documentation.
Even rather special formulas can be used, which the program formats by itself, so we do not have to
worry about them.
GitHub is the online platform we used to store the code we have written. On GitHub people can work on
projects together and view other people's work. People can also assign tasks to each other
or request a review of a task. GitHub is and has been an enormous help to our project, because it has
given us an overview of the things we have programmed. On top of that, GitHub gives us reassurance
because all work has been checked and reviewed by another member of the team.
On GitHub you can also make branches. A branch is a copy of a project. On the one hand, it offers additional
security for the programmer, because changes made directly in the original project could
break it. On the other hand, if one is working on the branch, nothing can be
Changes and edits in the branch can of course be carried over to the original project, but only if they are in order.
Since our project was public on GitHub, people who did not know about it, but were interested in it, could
take a look at it.
Specialists in our field could look at our program and give us tips, for example on how to
improve it. GitHub has many advantages and offers the user many possibilities.
8.6.1 The beginning
Our starting point was picking a topic for the project.
This did not take us long, because we already had a concrete idea of what
the project was going to be about.
Jan Nils Ferner (project member) had an idea that truly inspired us.
Roughly one year before the project he had the idea of recognizing different images
using artificial intelligence.
In the course of that year, he collected information and experience, and came up with a concept for
image recognition in the field of medicine, more precisely breast cancer (mammography).
When we started the project, we therefore had a very interesting topic and also someone who had previously
dealt with it.
8.6.2 The planning
At the beginning, we made a rough plan of how the project was going to develop.
We looked at the aspects that are very time-consuming and important. After this, we proceeded
to make a weekly plan.
Thanks to the planning we knew what we had to do and how to do it, but since that would not
be enough, we had to meet each week and discuss tasks carefully.
We also discussed how we would like to cooperate with the doctors.
8.6.3 The realisation
Once the planning had been completed, it was time for the realization.
As already mentioned in the planning, we worked on a weekly schedule.
This means that we divided all the big tasks into smaller ones, each taking about a week.
Having a really good overview of our project, we knew exactly where we stood.
We also had a meeting every week and discussed our concerns about the project.
The biggest problem with the realization was that we lacked the computing power to run our
project and visualize it.
This made it difficult to present our project correctly and to make clear to people what we had
achieved.
Apart from this, the realization went very well.
Our job was mainly to work on NEAT (NeuroEvolution of Augmenting Topologies) and CNNs (Convolutional
Neural Networks).
When both parts were completed, our goal was to connect them.
In addition, we cooperated with a few doctors who helped us by providing the data we needed.
We searched for them on the internet and in books, focusing on specialists in the field of
oncology (tumor diseases).
We prepared a study protocol to present our project to the doctors.
After checking our study protocol, Dr. Serafino Forte from the Hospital of Baden invited us to introduce
him to our project.
He gave us some helpful tips and we discussed the next steps.
8.6.4 The result
The result is twofold.
The two largest parts, CNN (Convolutional Neural Network) and NEAT (NeuroEvolution of Augmenting
Topologies), have been successfully connected and work together. However, our
computing power is much too low to test our software.
One of our main concerns is to explain to people who have not encountered our program or this type of work
that it would still run successfully, even though we are not able to show it to them or visualize it.
Even with the performance of a really good laptop, training the network would take about half a
year, which is unfortunately time we do not have.
8.6.5 Our conclusion
Our software worked successfully and we are very satisfied with it. Unfortunately, we used an outdated
mammography technology that is hardly needed anymore. As a result, our software can serve as an
assistance system in only a few facilities. This is not necessarily negative, however; it can even be
an advantage. Adding support for the new technology is definitely possible, and our software would
then support all technologies.
In conclusion, our software can be used as an assistance system, but we want to optimize the data to
make it easier for the doctors to work with it and eventually be able to help people thanks to the provided
The project was a very good opportunity to let our ideas run free and to deepen our knowledge.
We are very optimistic that our software will be used and further optimized in the future.
Progress made until the end of October
Implement the load of networks (ZAR)
Complete the visualizer (MFI)
Complete the documentation part «How does a neural network work?» (JST)
Progress made until the middle of November
Complete the study journal (ZAR)
Complete CNN (JNF, JST)
Complete NEAT (ZAR, MFI)
Progress made until the end of November
Complete the documentation part «Automatic testing of software» (JST)
Complete the documentation part «What is NEAT?» (MFI)
Progress made from the middle of November until the middle of December
Connect CNN with NEAT (All)
Testing of diﬀerent variations (All)
Execute benchmarks (All)
Possible tests and cooperation with other people (All)
Progress made from mid to end of December
Evaluation of tests, writing of conclusions (All)
16.11.2016: SGAICO Meeting: Deep Learning and Beyond
23.11.2016: Delivery of the ﬁne concept (Table of contents)
18.01.2017: Delivery of the project and presentation
25.01.2017: Project exhibition
8.8 Contact with doctors
Shortly after the planning, we began looking for doctors who could cooperate with us on the project.
For this purpose we sought doctors who were active exclusively in the field of oncology.
The term oncology refers to the branch of medicine that deals with the diagnosis, treatment and
prevention of different types of cancer (tumor diseases).
Since our project is almost entirely about mammography, we needed doctors who specialize in
oncology, more specifically in breast oncology. According to this categorization we then
After some research on the internet and in books, we finally narrowed the list down to 26 doctors who
could be considered qualified for the project.
We sent all of these doctors a brief description of our project, explaining its core and its
content. Some of them answered our e-mails.
Most of the doctors were too busy with their own research to spare much time.
We arranged a few appointments, which unfortunately fell outside the time span of our project.
Two doctors sent us a study protocol request, which means they required a protocol of our project that is
We wrote the following protocol and sent it to the doctors.
Dr. Serafino Forte of the Hospital of Baden invited us to the hospital after examining the study protocol.
We discussed all the medical and organizational details with him. He first looked at our mammography
records and immediately realized that our mammograms are obsolete, because this technology
is no longer needed and probably outdated.
Our data was in the old format of the screen-film system. On the one hand this was good for our project,
but on the other hand it also had a drawback.
It is definitely an advantage, because it gives us the possibility to have different kinds of mammograms evaluated.
However, there is a disadvantage too.
Newer mammograms are very difficult to obtain: because we need a large number of them to train our
software, we have to submit an ethics application.
This application is even more detailed than the study protocol and checks whether our project is
trustworthy.
The application is examined by the Swiss Ethics Committees for Human Research.
The application also examines the financial resources.
These are checked because the application is not free of charge.
Submitting an application costs a basic flat rate of CHF 800.00.
It also assesses how well the team is put together, that is, whether four individuals
working as a team could realistically carry out such a project, since in the end it is a product
resulting from months of work.
Apart from the organizational, we have also looked at the medical aspects as mentioned earlier in the
For Dr. Serafino Forte, it was quite important that our software could not only tell whether a person
has cancer, but also which type of cancer it is.