
Born to Learn: the Inspiration, Progress, and Future of Evolved Plastic Artificial Neural Networks



Biological neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifetime learning. The interplay of these elements leads to the emergence of adaptive behavior and intelligence, but the complexity of the whole system of interactions is an obstacle to the understanding of the key factors at play. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) use simulated evolution in-silico to breed plastic neural networks, artificial systems composed of sensors, outputs, and plastic components that change in response to sensory-output experiences in an environment. These systems may reveal key algorithmic ingredients of adaptation, autonomously discover novel adaptive algorithms, and lead to hypotheses on the emergence of biological adaptation. EPANNs have seen considerable progress over the last two decades. Current scientific and technological advances in artificial neural networks are now setting the conditions for radically new approaches and results. In particular, the limitations of hand-designed structures and algorithms currently used in most deep neural networks could be overcome by more flexible and innovative solutions. This paper brings together a variety of inspiring ideas that define the field of EPANNs. The main computational methods and results are reviewed. Finally, new opportunities and developments are presented.
Andrea Soltoggio, Kenneth O. Stanley, Sebastian Risi
Abstract—Biological neural networks are systems of extraor-
dinary computational capabilities shaped by evolution, develop-
ment, and lifelong learning. The interplay of these elements leads
to the emergence of biological intelligence. Inspired by such
intricate natural phenomena, Evolved Plastic Artificial Neural
Networks (EPANNs) employ simulated evolution in-silico to breed
plastic neural networks with the aim to autonomously design and
create learning systems. EPANN experiments evolve networks
that include both innate properties and the ability to change
and learn in response to experiences in different environments
and problem domains. EPANNs’ aims include autonomously
creating learning systems, bootstrapping learning from scratch,
recovering performance in unseen conditions, testing the com-
putational advantages of particular neural components, and de-
riving hypotheses on the emergence of biological learning. Thus,
EPANNs may include a large variety of different neuron types
and dynamics, network architectures, plasticity rules, and other
factors. While EPANNs have seen considerable progress over the
last two decades, current scientific and technological advances in
artificial neural networks are setting the conditions for radically
new approaches and results. Exploiting the increased availability
of computational resources and of simulation environments,
the often challenging task of hand-designing learning neural
networks could be replaced by more autonomous and creative
processes. This paper brings together a variety of inspiring
ideas that define the field of EPANNs. The main methods and
results are reviewed. Finally, new opportunities and possible
developments are presented.
Index Terms—Artificial Neural Networks, Lifelong Learning,
Plasticity, Evolutionary Computation.
Over the course of millions of years, evolution has led to the
emergence of innumerable biological systems, and intelligence
itself, crowned by the evolution of the human brain. Evolution,
development, and learning are the fundamental processes that
underpin biological intelligence. Thus, it is no surprise that
scientists have tried to engineer artificial systems to reproduce
such phenomena (Sanchez et al., 1996; Sipper et al., 1997;
Dawkins, 2003). The fields of artificial intelligence (AI) and
artificial life (AL) (Langton, 1997) are inspired by nature and
biology in their attempt to create intelligence and forms of
life from human-designed computation: the main idea is to
abstract the principles from the medium, i.e., biology, and
utilize such principles to devise algorithms and devices that
reproduce properties of their biological counterparts.
Department of Computer Science, Loughborough University, LE11 3TU,
Loughborough, UK
Department of Computer Science, University of Central Florida, Orlando
IT University of Copenhagen, Copenhagen, Denmark
One possible way to design complex and intelligent systems,
compatible with our natural and evolutionary history,
is to simulate natural evolution in-silico, as in the field of
evolutionary computation (Holland, 1975; Eiben and Smith,
2015). Sub-fields of evolutionary computation such as evo-
lutionary robotics (Harvey et al., 1997; Nolfi and Floreano,
2000), learning classifier systems (Lanzi et al., 2003; Butz,
2015), and neuroevolution (Yao, 1999) specifically research
algorithms that, by exploiting artificial evolution of physical,
computational, and neural models, seek to discover principles
behind intelligent and learning systems.
In the past, research in evolutionary computation, particu-
larly in the area of neuroevolution, was predominantly focused
on the evolution of static systems or networks with fixed
neural weights: evolution was seen as an alternative to learning
rules to search for optimal weights in an artificial neural
network (ANN). Also, in traditional and deep ANNs, learning
is often performed during an initial training phase, so that
weights are static when the network is deployed. Recently,
however, inspiration has originated more strongly from the
fact that intelligence in biological organisms considerably
relies on powerful and general learning algorithms, designed
by evolution, that are executed during both development and
continuously throughout life.
As a consequence, the field of neuroevolution is now pro-
gressively moving towards the design and evolution of lifelong
learning plastic neural systems, capable of discovering learn-
ing principles during evolution, and thereby able to acquire
knowledge and skills through the interaction with the envi-
ronment (Coleman and Blair, 2012). This paper reviews and
organizes the field that studies evolved plastic artificial neural
networks, and introduces the acronym EPANN. EPANNs are
evolved because parts of their design are determined by an
evolutionary algorithm; they are plastic because parts of their
structures or functions, e.g. the connectivity among neurons,
change at various time scales while experiencing sensory-
motor information streams. The final capabilities of such
networks are autonomously determined by the combination
of evolved genetic instructions and learning that takes place
as the network interacts with an environment.
EPANNs’ ambitious motivations and aims, centered on the
autonomous discovery and design of learning systems, also
entail a number of research problems. One problem is how
to set up evolutionary experiments that can discover learning,
and then to understand the subsequent interaction of dynamics
across the evolutionary and learning dimensions. A second
open question concerns the appropriate neural model abstrac-
tions that may capture essential computational principles to
enable learning and, more generally, intelligence. One further
problem is the size of very large search spaces, and the high
computational cost required to simulate even simple models
of lifelong learning and evolution. Finally, experiments to
autonomously discover intelligent learning systems have a
wide range of performance metrics, as their objectives are
sometimes loosely defined as the increase of behavioral com-
plexity, intelligence, adaptability, evolvability (Miconi, 2008),
and general learning capabilities (Tonelli and Mouret, 2011).
Thus, EPANNs explore a larger search space, and address
broader research questions, than machine learning algorithms
specifically designed to improve performance on well-defined
and narrow problems.
arXiv:1703.10371v3 [cs.NE] 8 Aug 2018
The power of EPANNs, however, derives from two au-
tonomous search processes: evolution and learning, which
arguably place them among the most advanced AI and machine
learning systems in terms of open-endedness, autonomy, po-
tential for discovery, creativity, and human-free design. These
systems rely the least on pre-programmed instructions because
they are designed to autonomously evolve while interacting
with a real or simulated world. Plastic networks, in particular
recurrent plastic networks, are known for their computational
power (Cabessa and Siegelmann, 2014): evolution can be a
valuable tool to explore the power of those computational
structures.
In recent years, progress in a number of relevant areas has
set the stage for renewed advancements of EPANNs: ANNs,
in particular deep networks, are becoming increasingly more
successful and popular; there has been a remarkable increase
in available computational power by means of parallel GPU
computing and dedicated hardware; a better understanding of
search, complexity, and evolutionary computation allows for
less naive approaches; and finally, neuroscience and genetics
provide us with an increasingly large set of inspirational
principles. This progress has changed the theoretical and
technological landscape in which EPANNs first emerged,
providing greater research opportunities than in the past.
Despite a considerable body of work, research in EPANNs
has never been unified through a single description of its
motivations and inspiration, achievements and ambitions. This
paper aims firstly to outline the inspirational principles that
motivate EPANNs (Section II). The main properties and aims
of EPANNs, and suitable evolutionary algorithms are pre-
sented in Section III. The body of research that advanced
EPANNs is brought together and described in Section IV. Fi-
nally, the paper outlines new research directions, opportunities,
and challenges for EPANNs (Section V).
EPANNs are inspired by a particularly large variety of ideas
from biology, computer science, and other areas (Floreano
and Mattiussi, 2008; Downing, 2015). It is also the nature of
inspiration to be subjective, and some of the topics described
in this section will resonate differently with different readers. We
will touch upon large themes and research areas with the intent
to provide the background and motivations to introduce the
properties, the progress, and the future directions of EPANNs
in the remainder of the paper.
The precise genetic make-up of an organism, acquired
through millions of years of evolution, is now known to
determine the ultimate capabilities of complex biological
neural systems (Deary et al., 2009; Hopkins et al., 2014):
different animal species manifest different levels of skills
and intelligence because of their different genetic blueprints
(Schreiweis et al., 2014). The intricate structure of the brain
emerges from one single zygote cell through a developmental
process (Kolb and Gibb, 2011), which is also strongly affected
by input-output learning experiences throughout early life
(Hensch et al., 1998; Kolb and Gibb, 2011). Yet high levels
of plasticity are maintained throughout the entire lifespan
(Merzenich et al., 1984; Kiyota, 2017). These dimensions,
evolution, development, and learning, also known as the phylogenetic
(evolution), ontogenetic (development) and epigenetic
(learning) (POE) dimensions (Sipper et al., 1997), are essential
for the emergence of biological plastic brains.
The POE dimensions lead to a number of research ques-
tions. Can artificial intelligence systems be entirely engineered
by humans, or do they need to undergo a less human-controlled
process such as evolution? Do intelligent systems need to
learn, or could they be born already knowing? Is there an
optimal balance between innate and acquired knowledge?
Opinions and approaches are diverse. Additionally, artificial
systems do not need to implement the same constraints and
limitations as biological systems (Bullinaria, 2003). Thus,
inspiration is not simple imitation.
EPANNs assume that both evolution and learning, if not
strictly necessary, are conducive to the emergence of a strongly
bio-inspired artificial intelligence. While artificial evolution is
justified by the remarkable achievements of natural evolution,
the role of learning has gathered significance in recent years.
We are now more aware of the high level of brain plasticity,
and its impact on the manifestation of behaviors and skills
(LeDoux, 2003; Doidge, 2007; Grossberg, 2012). Concur-
rently, recent developments in machine learning (Michalski
et al., 2013; Alpaydin, 2014) and neural learning (Deng
et al., 2013; LeCun et al., 2015; Silver et al., 2016), have
highlighted the importance of learning from large input-output
data and extensive training. Other areas of cognition such as
the capabilities to make predictions (Hawkins and Blakeslee,
2007), to establish associations (Rescorla, 2014) and to reg-
ulate behaviors (Carver and Scheier, 2012) are also based on
learning from experience. Interestingly, skills such as reading,
playing a musical instrument, or driving a car, are mastered
even if none of those behaviors existed during evolutionary
time, and yet they are mostly unique to humans. Thus, human
genetic instructions have evolved not to learn specific tasks,
but to synthesize recipes to learn a large variety of general
skills. We can conclude that the evolutionary search of learning
mechanisms in EPANNs tackles both the long-running nature
vs. nurture debate (Moore, 2003), and the fundamental AI
research that studies learning algorithms. This review focuses
on evolution and learning, and less on development, which
can be interpreted as a form of learning if affected by sensory-
motor signals. We refer to Stanley and Miikkulainen (2003)
for an overview of artificial developmental theories.
Whilst the range of inspiring ideas is large and heteroge-
neous, the analysis in this review proposes that such ideas can
be grouped under the following areas:
• natural and artificial evolutionary processes,
• plasticity in biological neural networks,
• plasticity in artificial neural networks, and
• natural and artificial learning environments.
Figure 1 graphically summarizes the topics described in
Sections II-A to II-D from which EPANNs take inspiration.
A. Natural and artificial evolutionary processes
A central idea in evolutionary computation (Goldberg and
Holland, 1988) is that evolutionary processes, similar to those
that occurred in nature during the course of billions of years
(Darwin, 1859; Dobzhansky, 1970), can be simulated with
computer software. This idea led to the belief that intelligent
computer programs could emerge with little human interven-
tion by means of evolution in-silico (Holland and Reitman,
1977; Koza, 1992; Fogel, 2006).
The emergence of evolved intelligent software, however, did
not occur as easily as initially hoped. The reasons for the slow
progress are not completely understood, but a number of prob-
lems have been identified, likely related to the simplicity of the
early implementations of evolutionary algorithms and the high
computational requirements. Current topics of investigation
focus on levels of abstraction, diversity in the population,
selection criteria, the concepts of evolvability and scalability
(Wagner and Altenberg, 1996; Pigliucci, 2008; Lehman and
Stanley, 2013), the encoding of genetic information through
the genotype-phenotype mapping processes (Wagner and Altenberg,
1996; Hornby et al., 2002), the deceptiveness of fitness
objectives, and how to avoid it (Lehman and Stanley, 2008;
Stanley and Lehman, 2015). It is also not clear yet which
stepping stones were most challenging for natural evolution
(Roff, 1993; Stanley and Lehman, 2015) in the evolutionary
path to intelligent and complex forms of life. This lack
of knowledge highlights that our understanding of natural
evolutionary processes is incomplete, and thus the potential
to exploit computational methods is not fully realized. In
particular, EPANN research is concerned with those evolution-
ary algorithms that allow the most creative, open-ended and
scalable design. Effective evolutionary algorithms and their
desirable features for EPANNs are detailed later in Section
III-C.
B. Plasticity in biological neural networks
Biological neural networks demonstrate lifelong learning,
from simple reflex adaptation to the acquisition of astonishing
skills such as social behavior and learning to speak one or
more languages. Those skills are acquired through experienc-
ing stimuli and actions and by means of learning mechanisms
not yet fully understood (Bear et al., 2007). The brief overview
here outlines that adaptation and learning strongly rely on
neural plasticity, understood as “the ability of neurons to
change in form and function in response to alterations in their
environment” (Kaas, 2001).
The fact that experiences guide lifelong learning was exten-
sively documented in the works of behaviorism by scientists
such as Thorndike (1911), Pavlov (1927), Skinner (1938,
1953), and Hull (1943) who started to test scientifically how
experiences cause a change in behavior, in particular as a result
of learning associations and observable behavioral patterns
(Staddon, 1983). This approach means linking behavior to
brain mechanisms and dynamics, an idea initially entertained
by Freud (Køppe, 1983) and later by other illustrious scientists
(Hebb, 1949; Kandel, 2007). A seminal contribution to link
psychology to physiology came from Hebb (1949), whose
principle that neurons that fire together, wire together is
relevant to understanding both low level neural wiring and
high level behaviors (Doidge, 2007). Much later, a Hebbian-
compatible rule that regulates synaptic changes according to
the firing times of the presynaptic and postsynaptic neurons
was observed by Markram et al. (1997) and named Spike-
Timing-Dependent Plasticity (STDP).
The seminal work of Kandel and Tauc (1965), and following
studies (Clark and Kandel, 1984), were the first to demonstrate
that changes in the strength of connectivity among neurons,
i.e. plasticity, relates to behavior learning. Walters and Byrne
(1983) showed that, by means of plasticity, a single neuron can
perform associative learning such as classical conditioning, a
class of learning that is observed in simple neural systems such
as that of the Aplysia (Carew et al., 1981). Plasticity driven
by local neural stimuli, i.e. compatible with the Hebb synapse
(Hebb, 1949; Brown et al., 1990), is responsible not only for
fine tuning, but also for building a working visual system in
the cat’s visual cortex (Rauschecker and Singer, 1981).
Biological plastic neural networks are also capable of struc-
tural plasticity, which creates new pathways among neurons
(Lamprecht and LeDoux, 2004; Chklovskii et al., 2004; Russo
et al., 2010): it occurs primarily during development, but there
is evidence that it continues well into adulthood (Pascual-
Leone et al., 2005). Axon growth, known to be regulated by
neurotrophic nerve growth factors (Tessier-Lavigne and Good-
man, 1996), was also modeled computationally in Roberts
et al. (2014). Developmental processes and neural plasticity
are often indistinguishable (Kolb, 1989; Pascual-Leone et al.,
2005) because the brain is highly plastic during develop-
ment. Neuroscientific advances reviewed in Damasio (1999);
LeDoux (2003); Pascual-Leone et al. (2005); Doidge (2007);
Draganski and May (2008) outline the importance of structural
plasticity in learning motor patterns, associations, and ways of
thinking. Both structural and functional plasticity in biology
are essential to acquiring long-lasting new skills, and for this
reason appear to be important inspirations for EPANNs.
Finally, an important mechanism for plasticity and behav-
ior is neuromodulation (Marder and Thirumalai, 2002; Gu,
2002; Bailey et al., 2000). Modulatory chemicals such as
acetylcholine (ACh), norepinephrine (NE), serotonin (5-HT)
and dopamine (DA) appear to regulate a large variety of
neural functions, from arousal and behavior (Harris-Warrick
and Marder, 1991; Hasselmo and Schnell, 1994; Marder, 1996;
Katz, 1995; Katz and Frost, 1996), to pattern generation (Katz
et al., 1994), to memory consolidation (Kupfermann, 1987;
Hasselmo, 1995; Marder, 1996; Hasselmo, 1999). Learning by
reward in monkeys was linked to dopaminergic activity during
the 1990s with studies by Schultz et al. (1993, 1997); Schultz
(1998). For these reasons, neuromodulation is considered an
essential element in cognitive and behavioral processes, and
has been the topic of a considerable amount of work in
EPANNs (Section IV-E).
[Figure 1 diagram: Evolved Plastic Artificial Neural Networks at the center, linked to four areas of inspiration: natural and artificial evolutionary processes (Sec. II-A), biological plastic neural networks (Sec. II-B), artificial plastic neural networks (Sec. II-C), and lifetime learning environments (Sec. II-D).]
Figure 1: EPANN inspiration principles described in Section II. The figure brings together the variety of inspirational topics
and areas with no pretense of taxonomic completeness.
This compact overview suggests that neural plasticity en-
compasses an important set of mechanisms, regulated by a
rich set of signals and dynamics currently mostly ignored in
ANNs. Thus, EPANNs can be used to explore, via evolutionary
search, the potential of plasticity and to answer questions such
as: (1) How does a brain-like structure form—driven both by
genetic instructions and neural activity—and acquire functions
and behaviors? (2) What are the key plasticity mechanisms
from biology that can be applied to artificial systems such
as EPANNs? (3) Can memories, skills, and behaviors be
stored in plastic synaptic connections, in patterns of activities,
or in a combination of both? Whilst neuroscience continues
to provide inspiration and insight into plasticity in biolog-
ical brains, EPANNs serve the complementary objective of
seeking, implementing, and verifying designs of bio-inspired
methods for adaptation, learning, and intelligent behavior.
C. Plasticity in artificial neural networks
In EPANN experiments, evolution can be seen as a meta-
learning process. Thus, established learning rules for ANNs
are often used as ingredients that evolution uses to search
for good parameter configurations, efficient combinations of
rules and network topologies, new functions representing novel
learning rules, etc. EPANN experiments are suited to include
the largest possible variety of rules because of (1) the variety of
possible tasks in a simulated behavioral experiment and (2) the
flexibility of evolution to combine rules with no assumptions
about their dynamics. The following gives a snapshot of the
extent and scope of various learning algorithms for ANN that
can be used as building blocks of EPANNs.
In supervised learning, backpropagation is the most popular
learning rule used to train both shallow and deep networks
(Rumelhart et al., 1988; Widrow and Lehr, 1990; LeCun et al.,
2015) for classification or regression. Unsupervised learning
is implemented in neural networks with self-organizing maps
(SOM) (Kohonen, 1982, 1990), auto-encoders (Bourlard and
Kamp, 1988), restricted Boltzmann machines (RBM) (Hinton
and Salakhutdinov, 2006), Hebbian plasticity (Hebb, 1949;
Gerstner and Kistler, 2002a; Cooper, 2005), generative adver-
sarial networks (Goodfellow et al., 2014), and various combi-
nations of the above. RBM learning is considered related to the
free-energy principle, proposed by Friston (2009) as a central
principle governing learning in the brain. Hebbian rules, in
particular, given their biological plausibility and unsupervised
learning, are a particularly important inspirational principle for
EPANNs. Variations (Willshaw and Dayan, 1990) have been
proposed to include, e.g., terms to achieve stability (Oja, 1982;
Bienenstock et al., 1982) and various constraints (Miller and
Mackay, 1994), or more advanced update dynamics such as
dual weights for fast and slow decay (Levy and Bairaktaris,
1995; Hinton and Plaut, 1987; Bullinaria, 2009a; Soltoggio,
2015). Hebbian rules have been recently proposed to minimize
defined cost functions (Pehlevan et al., 2015; Bahroun et al.,
2017), and more advanced systems have used backpropagation
as meta-learning to tune Hebbian rules (Miconi et al., 2018).
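The difference between a plain Hebbian update and a stabilized variant can be illustrated with a minimal sketch. The code below assumes a single linear neuron, a hypothetical set of three input patterns, and an illustrative learning rate; it compares the plain rule with the normalizing rule of Oja (1982) and is not taken from any of the cited implementations.

```python
import math

def hebb_update(w, x, y, eta=0.05):
    """Plain Hebbian rule: dw_i = eta * x_i * y (weights can grow without bound)."""
    return [wi + eta * xi * y for wi, xi in zip(w, x)]

def oja_update(w, x, y, eta=0.05):
    """Oja's rule: dw_i = eta * y * (x_i - y * w_i); the decay term bounds the norm."""
    return [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]

def norm(w):
    """Euclidean length of the weight vector."""
    return math.sqrt(sum(wi * wi for wi in w))

def run(update, steps=200):
    """Present the patterns cyclically to a linear neuron and apply the given rule."""
    w = [0.5, 0.1]                                        # illustrative initial weights
    patterns = [[1.0, 0.8], [0.9, 1.0], [1.0, 1.0]]       # hypothetical inputs
    for t in range(steps):
        x = patterns[t % len(patterns)]
        y = sum(wi * xi for wi, xi in zip(w, x))          # linear activation
        w = update(w, x, y)
    return w

hebb_w = run(hebb_update)   # norm grows by many orders of magnitude
oja_w = run(oja_update)     # norm settles close to 1
```

Under these assumptions the plain rule lets the weight norm grow exponentially, while the decay term in Oja's rule keeps it close to one, which is the kind of stability term the variations above refer to.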
Neuromodulated plasticity (Fellous and Linster, 1998) is
often used to implement reward-learning in neural networks.
Such a modulation of signals, or gated learning (Abbott, 1990),
allows for amplification or reduction of signals and has been
implemented in numerous models (Baxter et al., 1999; Suri
et al., 2001; Birmingham, 2001; Alexander and Sporns, 2002;
Doya, 2002; Fujii et al., 2002; Suri, 2002; Ziemke and Thieme,
2002; Sporns and Alexander, 2003; Krichmar, 2008).
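As an illustration of gated learning, the sketch below multiplies a Hebbian term by a scalar modulatory signal. The function name, the scalar signal m, and the learning rate are hypothetical choices made for this example; the cited models use a variety of more detailed formulations.

```python
def modulated_hebb(w, pre, post, m, eta=0.5):
    """Gated Hebbian update: dw = m * eta * pre * post.

    m is a hypothetical scalar modulatory signal (e.g. derived from reward):
    m = 0 leaves the weight unchanged, m > 0 strengthens the connection
    between co-active neurons, and m < 0 weakens it.
    """
    return w + m * eta * pre * post

w = 0.2
w_no_mod = modulated_hebb(w, pre=1.0, post=1.0, m=0.0)    # no modulation: unchanged
w_reward = modulated_hebb(w, pre=1.0, post=1.0, m=1.0)    # positive modulation
w_punish = modulated_hebb(w, pre=1.0, post=1.0, m=-1.0)   # negative modulation
```

The same pre- and postsynaptic activity therefore produces potentiation, depression, or no change depending only on the modulatory signal.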
Plastic neural models are also used to demonstrate how
behavior can emerge from a particular circuitry modeled
after biological brains. Computational models of, e.g., the
basal ganglia and modulatory systems may propose plastic-
ity mechanisms and aim to demonstrate the computational
relations among various nuclei, pathways, and learning pro-
cesses (Krichmar, 2008; Vitay and Hamker, 2010; Schroll and
Hamker, 2015).
Finally, plasticity rules for spiking neural networks (Maass
and Bishop, 2001) aim to demonstrate unique learning mech-
anisms that emerge from spiking dynamics (Markram et al.,
1997; Izhikevich, 2006, 2007), as well as model biological
synaptic plasticity (Gerstner and Kistler, 2002b).
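A timing-dependent rule of this kind is often modeled with a pair-based exponential window. The sketch below assumes that common modeling form; the amplitudes and time constants are illustrative values, not parameters reported in the cited studies.

```python
import math

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair, with dt = t_post - t_pre in ms.

    All parameter values are illustrative assumptions.
    """
    if dt > 0:    # presynaptic spike precedes postsynaptic spike: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:    # postsynaptic spike precedes presynaptic spike: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0    # simultaneous spikes: no change in this sketch
```

In this form the magnitude of the change decays as the two spikes move further apart in time, and its sign depends on their order.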
Plasticity in neural networks, when continuously active, was
also observed to cause catastrophic forgetting (Robins, 1995).
If learning occurs continuously, new information or skills have
the potential to overwrite previously acquired information or
skills, a problem also known as the plasticity-stability dilemma
(Abraham and Robins, 2005; Finnie and Nader, 2012).
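The overwriting effect can be reproduced in a deliberately small setting. The sketch below assumes a single plastic weight trained with a delta rule on two hypothetical tasks in sequence; it is a toy illustration of forgetting, not a model from the cited works.

```python
XS = [0.5, 1.0, 1.5]  # illustrative inputs shared by both tasks (hypothetical)

def train(w, slope, steps=100, eta=0.1):
    """Delta-rule training of y = w * x towards the target y = slope * x."""
    for t in range(steps):
        x = XS[t % len(XS)]
        w += eta * (slope * x - w * x) * x
    return w

def error(w, slope):
    """Mean squared error of the unit on the task with the given slope."""
    return sum((slope * x - w * x) ** 2 for x in XS) / len(XS)

w = train(0.0, slope=2.0)        # learn task A
err_a_before = error(w, 2.0)     # task A is solved
w = train(w, slope=-1.0)         # continue learning on task B with the same weight
err_a_after = error(w, 2.0)      # performance on task A has been overwritten
err_b_after = error(w, -1.0)     # task B is solved
```

Because the same weight stores both tasks and plasticity never switches off, learning task B erases what was learned for task A.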
In conclusion, a large range of plasticity rules for neural
networks have been proposed to solve different problems. In
the majority of cases, a careful matching and engineering
of rules, architectures and problems is necessary, requiring
considerable design effort. The variety of algorithms also
reflects the variety of problems and solutions. One key aspect
is that EPANN systems can effectively play with all possible
plasticity rules to offer a unique testing tool and assess
the effectiveness and suitability of different models, or their
combination, in a variety of different scenarios.
D. Lifelong learning environments
One aspect of EPANNs is that they can continuously
improve and adapt both at the evolutionary scale and at the
lifetime scale in a virtually unlimited range of problems.
Natural environments are an inspiration for EPANNs because
organisms have evolved to adapt to, and learn in, a variety
of conditions. Fundamental questions are: what makes an
environment conducive to the evolution of learning and intelli-
gence? What are the challenges faced by learning organisms in
the natural world, and how does biological learning cope with
those? How can those challenges be abstracted and ported to
a simulated environment for EPANNs? EPANNs employ life-
long learning environments in the attempt to provide answers
to such questions.
In the early phases of AI, logic and reasoning were thought
to be the essence of intelligence (Crevier, 1993), so symbolic
input-output mappings were employed as tests. Soon it became
evident that intelligence is not only symbol manipulation, but
resides also in subsymbolic problem solving abilities emerging
from the interaction of brain, body, and environment (Steels,
1993; Sims, 1994). More complex simulators of real-life en-
vironments and body-environment interaction were developed
to better represent the enactivist philosophy (Varela et al.,
2017) and cognitive theories on the emergence of cognition
(Butz and Kutter, 2016). Other environments focus on high-
level planning and strategies required, e.g., when applying AI
to games (Allis et al., 1994; Millington and Funge, 2016)
or articulated robotic tasks. Planning and decision making
with high bandwidth sensory-motor information flow such
as those required for humanoid robots or self-driving vehi-
cles are current benchmarks for lifelong learning systems.
Finally, environments in which affective dynamics and feelings
play a role are recognized as important for human
well-being (De Botton, 2016; Lee and Narayanan, 2005). Those
intelligence-testing environments are effectively the “worlds”
in which EPANNs may evolve and live in embodied forms,
and thus largely shape the EPANN design process.
Such different testing environments have very different
features, dynamics, and goals that fall into different machine
learning problems. For example, supervised learning can be
mapped to a fitness function when a precise target behavior
exists and is known. If it is useful to find relationships and
regularities in the environment, unsupervised learning, repre-
sentation learning, or modularity can be evolved (Bullinaria,
2007b). If the environment provides rewards, the objective
may be to search for behavioral policies that lead to collecting
rewards: algorithms specifically designed to do so are called
reinforcement learning (Sutton and Barto, 1998). While re-
inforcement learning maximizes a reward or fitness, recent
advances in evolutionary computation (Lehman and Stanley,
2011; Stanley and Lehman, 2015) suggest that it is not always
the fittest, but at times it is the novel individual or behavior
that can exploit environmental niches, thus leading to creative
evolutionary processes similar to those observed in nature.
Temporal dynamics, i.e. when a system is required to behave over
time according to complex dynamics, need different computational
structures from functions with no temporal dynamics.
This case is typical for EPANN experiments that may exhibit
a large variety of time scales in complex behavioral tasks.
With traditional approaches, all those different cases require
careful manual design to solve each problem. In contrast,
the evolution in EPANNs can be designed to address most
problems by mapping a measure of success to a fitness value,
thus searching for solutions in an increasingly large variety of
problems and environments. In conclusion, lifelong learning
environments of different types can be used with EPANNs to
explore innovative and creative solutions with limited human
intervention and design.
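The mapping from different problem types to a fitness value can be sketched as follows. The three functions are hypothetical examples: a supervised case with known targets, a reward-collecting case, and a novelty score in the spirit of novelty search, here assuming a one-dimensional behavior descriptor and a mean distance to the k nearest archived behaviors.

```python
def supervised_fitness(outputs, targets):
    """Known target behavior: fitness is the negative mean squared error."""
    return -sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

def reward_fitness(rewards):
    """Reward-providing environment: fitness is the total reward collected."""
    return sum(rewards)

def novelty_score(behavior, archive, k=2):
    """Novelty: mean distance to the k nearest behaviors in the archive.

    The behavior is assumed to be summarized by a single number.
    """
    if not archive:
        return float("inf")  # nothing seen yet: everything is novel
    dists = sorted(abs(behavior - b) for b in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)
```

For example, with an archive of behaviors [0.0, 1.0, 2.0], a behavior at 5.0 scores higher on novelty than one at 1.0, regardless of how well either performs on a task objective.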
Having introduced the inspirational principles of EPANNs,
we now propose: a list of primary properties that define
EPANNs (Section III-A); the principal aims of EPANN studies
(Section III-B); and a list of desired properties of EAs for
EPANNs (Section III-C).
A. EPANN properties
EPANNs, as formalized in this review, are defined as
artificial neural networks with the following properties:
Property 1 - Evolution: Parts of an EPANN are determined
by an evolutionary algorithm. Inspired by natural and artificial
evolution (Section II-A), such search dynamics in EPANNs
implement a design process.
Property 2 - Plasticity: Parts of the functions that process
signals within the network change in response to signals
propagated through the network, and those signals are at least
partially affected by stimuli. Inspired by biological findings
on neural plasticity (Section II-B) and empowered by the
effectiveness of plasticity in neural models (Section II-C),
EPANNs either include such mechanisms or are set up with
the conditions to evolve them.
Property 3 - Discovery of learning: Properties 1 and 2 are
implemented to discover, through evolution, learning dynamics
within an artificial neural network. Thus, an EPANN uses
both evolution and plasticity in synergy to achieve learning.
Such a property can be present in different degrees, from the
highest degree in which no learning occurs before evolution
and it is therefore discovered from scratch, to the lowest
degree in which learning is fine-tuned and optimized, e.g.,
when evolution is seeded with proven learning structures.
Given the very diverse interpretations of learning in different
domains, we refer to Michalski et al. (2013) for an overview,
or otherwise assume the general machine learning definition
by Michalski et al. (2013) (see footnote 1).
Property 4 - Generality: Properties 1 to 3 are independent
from the learning problem(s) and from the plasticity mecha-
nism(s) that are implemented or evolved in an EPANN. Ex-
ploiting the flexibility of evolution and learning, (1) EPANNs
can evolve to solve problems of different nature, complexity,
and time scales (Section II-D); (2) EPANNs are not limited
to specific learning dynamics because often it is the aim of
the experiment to discover the learning mechanism throughout
evolution and interaction with the environment.
In summary, the EPANNs’ properties indicate that within
simple assumptions, i.e., using plasticity and evolution,
EPANNs are set to investigate the design of learning in creative
ways for a large variety of learning problems.
B. Aims
Given the above properties, EPANN experiments can be set
up to achieve the following aims.
Aim 1: Autonomously design learning systems: in an
EPANN experiment, it is essential to delegate some design
choices of a learning system to the evolutionary process, so
that the design is not entirely determined by the human expert
and can be automated. The following sub-aims can then be identified.
Aim 1.1: Bootstrap of learning from scratch: in an EPANN
experiment, it may be desirable to initialize the system with
no learning capabilities before evolution takes place, so that
the best learning dynamics for a given environment is evolved
rather than human-designed.
Aim 1.2: Optimize performance: as opposed to Aim 1.1, it
may be desirable to initialize the system with well-known learning
capabilities, so that evolution can autonomously optimize
the system, e.g., for final performance after learning.
Aim 1.3: Recover performance in unseen conditions: in an
EPANN experiment, the desired outcome may be to enable the
learning system to autonomously evolve from solving a set of
problems to another set without human intervention.
Aim 2: Test the computational advantages of particular neural
components: the aim of an EPANN experiment might be to
test whether particular neural dynamics or components have an
evolutionary advantage when implementing particular learning
functions. The presence of a particular neural component may
be fostered by evolutionary selection.
Footnote 1: A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at tasks
in T, as measured by P, improves with experience E.
Aim 3: Derive hypotheses on the emergence of biological
learning: an aim may be to draw similarities or suggest
hypotheses on how learning evolved in biological systems,
particularly in combination with Aim 1.1 (bootstrap of learning
from scratch).
Aim 1 is always present in any EPANN because it derives
from the EPANN properties. The other aims may be present
in different EPANN studies and can be expanded into more
detailed and specific research hypotheses.
C. Evolutionary algorithms for EPANNs
In contrast to parameter optimization (Bäck and Schwefel,
1993) in which search spaces are often of fixed dimension
and static, EPANNs evolve in dynamic search spaces in which
learning further increases the complexity of the evolutionary
search and of the problem itself. Evolutionary algorithms
(Holland, 1975; Michalewicz, 1994) for EPANNs often require
additional advanced features to cope with the challenges of the
evolution of learning and open evolutionary design (Bentley,
1999). The analysis in this review suggests that evolutionary
algorithms (EAs) for EPANNs may implement the following
desirable properties.
1) Variable genotype length and growing complexity: for
some learning problems, the size and properties of a network
that can solve them are not known in advance. Therefore, a
desirable property of the EA for EPANNs is that of increasing
the length of the genotype, and thus the information contained
in it, as evolution may discover increasingly more complex
strategies and solutions that may require larger networks (see,
e.g., Stanley and Miikkulainen (2002)).
2) Indirect genotype to phenotype encoding: in nature,
phenotypes are expressions of a more compact representation:
the genetic code. Similarly, EAs may represent genetic infor-
mation in a compact form, which is then mapped to a larger
phenotype. Although EPANNs do not require such a property,
such an approach promises better scalability to large networks
(see, e.g., Risi and Stanley (2010)).
3) Expressing regularities, repetitions, and patterns: in-
direct encodings are beneficial when they can use one set
of instructions in the genotype to generate more parts in
the phenotype. This may involve expressing regularities like
symmetry (e.g. symmetrical neural architectures), repetition
(e.g. neural modules), repetition with variation (similar neural
modules), and patterns, e.g., motifs in the neural architecture
(see, e.g., Stanley (2007)).
4) Effective exploration via mutation and recombination:
genetic mutation and sexual reproduction in nature allow
for the expression of a variety of phenotypes and for the
exploration of new solutions, but seldom lead to highly unfit
individuals (Ay et al., 2007). Similarly, EAs for EPANNs
need to be able to effectively mutate and recombine genomes
without destroying the essential properties of the solutions.
EAs may use recombination to generate new solutions from
two parents: how to effectively recombine genetic information
from two EPANNs is still an open question (see, e.g., track-
ing genes through historical marking in NEAT (Stanley and
Miikkulainen, 2002)).
5) Genetic encoding of plasticity rules: just as neural net-
works need a genetic encoding to be evolved, so do plasticity
rules. EPANN algorithms require the integration of such a
rule in the genome. The encoding may be restricted to a
simple parameter search, or evolution may search a larger
space of arbitrary and general plasticity rules. Plasticity may
also be applied to all or parts of the network, thus effectively
implementing the evolution of learning architectures.
6) Diversity, survival criteria, and low selection pressure:
the variety of solutions in nature seems to suggest that diversity
is a key aspect of natural evolution. EAs for EPANNs are
likely to perform better when they can maintain diversity in
the population, both at the genotype and phenotype levels.
Local selection mechanisms were shown to perform well in
EPANN experiments (Soltoggio, 2008c). Niche exploration
and behavioral diversity (Lehman and Stanley, 2011) could
also play a key role for creative design processes. Low
selection pressure and survival criteria might be crucial to
evolve learning in deceptive environments (see Section IV-D).
7) Open-ended evolution: evolutionary optimization aims
to quickly optimize parameters and reach a fitness plateau.
On the contrary, EAs for EPANNs often seek a more open-
ended evolution that can evolve indefinitely more complex
solutions given sufficient computational power (Taylor et al.,
2016; Stanley et al., 2017).
8) Implementations: many EAs include one or more of
these desirable properties (Fogel, 2006). Due to the complexity
of neural network design, the field of neuroevolution was the
first to explore most of those extensions of standard evolu-
tionary algorithms. Popular algorithms include early work of
Angeline et al. (1994) and Yao and Liu (1997) to evolve
fixed weights (i.e. weights that do not change while the agent
interacts with its environment) and the topology of arbitrary
neural networks, e.g., recurrent (addressing III-C1 and III-C4).
Neuroevolution of Augmenting Topologies (NEAT) (Stanley
and Miikkulainen, 2002) leverages three main aspects: a
recombination operator intended to preserve network function
(addressing III-C4); speciation (addressing III-C6); and evolu-
tion from small to larger networks (addressing III-C1). Simi-
larly, EPANN-tailored EAs in Soltoggio (2008c) employ local
selection mechanisms to maintain diversity. Analog Genetic
Encoding (AGE) (Mattiussi and Floreano, 2007) is a method
for indirect genotype to phenotype mapping that can be used
in combination with evolution to design arbitrary network
topologies (addressing III-C1 and III-C2) and was used with
EPANNs. HyperNEAT (Stanley et al., 2009) is an indirect
representation method that combines the NEAT algorithm with
compositional pattern producing networks (CPPNs) (Stanley,
2007) (III-C1, III-C2 and III-C3). Novelty search (Lehman and
Stanley, 2008, 2011) was introduced as an alternative to the
survival of the fittest as a selection mechanism (III-C6). Ini-
tially, the majority of these neuroevolution algorithms were not
devised to evolve plastic networks, but rather fixed networks in
which the final synaptic weights were encoded in the genome.
To operate with EPANNs, these algorithms need to integrate
additional genotypical instructions to evolve plasticity rules.
By adding these EA features (III-C1 - III-C7) to standard
evolutionary algorithms, EPANNs aim to search extremely
large search spaces in fundamentally different and more cre-
ative ways than traditional heuristic searches of parameters
and hyper-parameters.
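As an illustration of property III-C5 in its simplest form (a parameter search), the following sketch shows one way a genome could carry plasticity-rule parameters alongside initial weights, together with a flag marking which connections are plastic. The class `Genome`, the function `mutate`, and all numeric settings are hypothetical names and values chosen for this example; they are not taken from any of the algorithms cited above.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Genome:
    """Hypothetical direct encoding of a small plastic network."""
    weights: List[float]      # initial connection weights
    rule_params: List[float]  # parameters (theta) of the plasticity rule
    plastic_mask: List[bool]  # True where a connection is plastic


def mutate(genome, sigma=0.1, flip_prob=0.05, rng=random):
    """Return a mutated copy of the genome; the parent is left unchanged."""
    # Gaussian perturbation of the initial weights and of the rule parameters.
    weights = [w + rng.gauss(0.0, sigma) for w in genome.weights]
    rule_params = [p + rng.gauss(0.0, sigma) for p in genome.rule_params]
    # Occasionally toggle whether a connection is plastic, so that evolution
    # can apply plasticity to all or only parts of the network.
    mask = [(not m) if rng.random() < flip_prob else m
            for m in genome.plastic_mask]
    return Genome(weights, rule_params, mask)


parent = Genome(weights=[0.0, 1.0], rule_params=[0.5], plastic_mask=[True, False])
child = mutate(parent, rng=random.Random(0))
```

A richer encoding, e.g., one that lets evolution compose arbitrary rule terms or grow the genotype, would replace the fixed-length lists used here.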
The process of evolving plastic neural networks is depicted
in Fig. 2.
This section reviews studies that have evolved plastic neural
networks (EPANNs). The survey is divided into six sections
that mirror the analysis of the field up to this point: the
evolution of plasticity rules, the evolution of neural archi-
tectures, EPANNs in evolutionary robotics, the evolutionary
discovery of learning, the evolution of neuromodulation, and
the evolution of indirectly encoded plasticity. Accordingly,
Figure 3 provides our perspective on the organization of the
field, reflected in the structure of this paper.
A. Evolving plasticity rules
Early EPANN experiments evolved the parameters of learn-
ing rules for fixed or hand-designed ANN architectures. Learn-
ing rules are functions that change the connection weight w
between two neurons, and are generally expressed as
$$\Delta w = f(\mathbf{x}, \theta), \qquad (1)$$
where $\mathbf{x}$ is a vector of neural signals and $\theta$ is a vector of fixed
parameters that can be searched by evolution. The incoming
connection weights $w$ to a neuron $i$ are used to determine the
activation value
$$x_i = \sigma\Big(\sum_j w_{ji} \cdot x_j\Big), \qquad (2)$$
where $x_j$ are the activation values of presynaptic neurons
that connect to neuron $i$ with the weights $w_{ji}$, and $\sigma$ is
a nonlinear function such as the sigmoid or the hyperbolic
tangent. The vector $\mathbf{x}$ may provide local signals such as pre
and postsynaptic activities, the value of the weight $w$, and
modulatory or error signals.
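A minimal sketch of Eqs. 1 and 2 follows, assuming that the output of the learning rule f is added to the current weight at each update. The function names, the example rule, and the numeric values are illustrative assumptions and do not correspond to any specific model cited in this review.

```python
import math


def activation(incoming_weights, presyn_activities, sigma=math.tanh):
    """Eq. 2: activation of neuron i from its weighted presynaptic inputs."""
    total = sum(w * x for w, x in zip(incoming_weights, presyn_activities))
    return sigma(total)


def update_weight(w, signals, theta, rule):
    """Eq. 1: apply a learning rule f(x, theta) as a change to the weight w."""
    return w + rule(signals, theta)


def example_rule(signals, theta):
    """Hypothetical local rule: theta[0] scales the pre/post activity product."""
    x_pre, x_post = signals
    return theta[0] * x_pre * x_post


x_pre = [1.0, 0.5]
w_in = [0.2, -0.4]
x_post = activation(w_in, x_pre)
# Update the first incoming weight using only signals local to that synapse.
w_new = update_weight(w_in[0], (x_pre[0], x_post), theta=[0.1], rule=example_rule)
```

In an evolutionary setting, the vector `theta` (and possibly the form of `rule` itself) is what the search would vary between individuals.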
Bengio et al. (1990, 1992) proposed the optimization of the
parameters θof generic learning rules with gradient descent,
simulated annealing, and evolutionary search for problems
such as conditioning, boolean function mapping, and classi-
fication. Those studies are also among the first to include a
modulatory term in the learning rules. The optimization was
shown to improve the performance in those different tasks with
respect to manual parameter settings. Chalmers (1990) evolved
a learning rule that applied to every connection and had a
teaching signal. He found that, in 20% of the evolutionary
runs, the algorithm rediscovered, through evolution, the well-
known delta rule, or Widrow-Hoff rule (Widrow et al., 1960),
used in backpropagation, thereby demonstrating the validity of
evolution as an autonomous tool to discover learning. Fontanari
and Meir (1991) used the same approach as Chalmers
(1990) but constrained the weights to binary values. Also
in this case, evolution autonomously rediscovered a hand-
designed rule, the directed drift rule by Venkatesh (1993).
They also observed that the performance on new tasks was
better when the network evolved on a larger set of tasks,
possibly encouraging the evolution of more general learning
Figure 2: Main elements of an EPANN setup in which simulated evolution (left) and an environment (right) allow for an
EPANN (center) to evolve through generations and learn within a lifetime in the environment. Possible features of the genotype,
evolutionary process and phenotype are illustrated as an example.
With backpropagation of errors (Widrow and Lehr, 1990),
the input vector $\mathbf{x}$ of Eq. 1 requires an error signal between
each input/output pair. In contrast, rules that use only local
signals have been a more popular choice for EPANNs, though
this is changing with the rise in effectiveness of deep learning
using back-propagation and related methods. In the simplest
form, the product of presynaptic ($x_j$) and postsynaptic ($x_i$)
activities, and a learning rate $\eta$
$$\Delta w = \eta \cdot x_j \cdot x_i \qquad (3)$$
is known as Hebbian plasticity (Hebb, 1949; Cooper, 2005).
More generally, any function as in Eq. 1 that uses only local
signals is considered a local plasticity rule for unsupervised
learning. Baxter (1992) evolved a network that applied the
basic Hebbian rule in Eq. 3 to a subset of weights (determined
by evolution) to learn four functions of one variable. The
network, called Local Binary Neural Net (LBNN), evolved
to change its weights to one of two possible values (±1),
or have fixed weights. The experiment proved that learning
can evolve when rules are optimized and applied to individual
Nolfi and Parisi (1993) evolved networks with “auto-
teaching” inputs, which could then provide an error signal for
the network to adjust weights during lifetime. The implication
is that error signals do not always need to be hand-designed but
can be discovered by evolution to fit a particular problem. A
set of eight different local rules was used in Rolls and Stringer
(2000) to investigate the evolution of rules in combination with
the number of synaptic connections for each neuron, different
neuron classes, and other network parameters. They found
that evolution was effective in selecting specific rules from
a large set to solve simple linear problems. In Maniadakis
and Trahanias (2006), co-evolution was used to evolve agents
(each being a network) that could use ten different types
of Hebbian-like learning rules for simple navigation tasks:
the authors reported that, despite the increase in the search
space, using many different learning rules results in better
performance but, understandably, a more difficult analysis of
the evolved systems. Meng et al. (2011) evolved a gene regu-
latory network that in turn determined the learning parameters
of the Bienenstock-Cooper-Munro (BCM) rule (Bienenstock
et al., 1982), showing promising performance in time series
classification and other supervised tasks.
One general finding from these studies is that evolution
operates well within large search spaces, particularly when
a large set of evolvable rules is used.
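The delta rule that Chalmers (1990) reports evolution rediscovering can be viewed as an instance of Eq. 1 whose signal vector contains the presynaptic activity, the unit's output, and a teaching signal. The sketch below uses the standard Widrow-Hoff form for a single linear unit; the learning rate, the number of epochs, and the toy data set are assumptions chosen only for illustration and are not taken from that study.

```python
def delta_rule(weights, x_pre, target, output, eta=0.1):
    """Widrow-Hoff update: each weight changes by eta * (target - output) * x_j."""
    return [w + eta * (target - output) * x for w, x in zip(weights, x_pre)]


def train(samples, epochs=200, eta=0.1):
    """Train one linear unit on (inputs, target) pairs with the delta rule."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x_pre, target in samples:
            output = sum(w * x for w, x in zip(weights, x_pre))
            weights = delta_rule(weights, x_pre, target, output, eta)
    return weights


# Toy data generated by the linear map y = 2*x1 - x2 (an assumed example).
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
learned = train(samples)
```

In an experiment of the kind described above, the coefficients of such a rule would be encoded in the genome and the fitness would measure how well the unit performs after training, rather than the rule being written by hand as it is here.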
B. Evolving learning architectures
The interdependency of learning rules and neural architec-
tures led to experiments in which evolution had more freedom
on the network’s design. The evolution of architectures in
ANNs may involve searching an optimal number of hid-
den neurons, the number of layers in a network, particular
topologies or modules, the type of connectivity, and other
properties of the network’s architecture. In EPANNs, evolving
learning architectures means, more specifically, discovering a
combination of architectures and learning rules whose synergetic
matching enables particular learning dynamics. As
opposed to biological networks, EPANNs do not have the
neurophysiological constraints, e.g., short neural connections,
sparsity, brain size limits, etc., that impose limitations on the
natural evolution of biological networks. Thus, biologically
implausible artificial systems may nevertheless be evolved in
computer simulations (Bullinaria, 2007b, 2009b).
Figure 3: Organization of the field of EPANNs, reflected in the structure of this paper.
One seminal early study by Happel and Murre (1994)
proposed the evolutionary design of modular neural networks,
called CALM (Murre, 1992), in which modules could perform
unsupervised learning, and the intermodule connectivity was
shaped by Hebbian rules. The network learned categorization
problems (simple patterns and handwritten digit recognition),
and showed that the use of evolution led to enhanced learning
and better generalization capabilities in comparison to hand-
designed networks. In Arifovic and Gencay (2001), the authors
used evolution to optimize the number of inputs and hidden
nodes, and allowed connections in a feedforward neural net-
work to be trained with backpropagation. Abraham (2004) pro-
posed a method called Meta-Learning Evolutionary Artificial
Neural Networks (MLEANN) in which evolution searches for
initial weights, neural architectures and transfer functions for a
range of supervised learning problems to be solved by evolved
networks. The evolved networks were tested in time series pre-
diction and compared with manually designed networks. The
analysis showed that evolution consistently found networks
with better performance than the hand-designed structures.
Khan et al. (2008) proposed an evolutionary developmental
system that created an architecture that adapted with learning:
the network had a dynamic morphology in which neurons
could be inserted or deleted, and synaptic connections formed
and changed in response to stimuli. The networks were evolved
with Cartesian genetic programming and appeared to improve
their performance while playing checkers over the generations.
Downing (2007) looked at different computational models of
neurogenesis to evolve learning architectures. The proposed
evolutionary developmental system focused in particular on
abstraction levels and principles such as Neural Darwinism
(Edelman and Tononi, 2000). A combination of evolution of
recurrent networks with a linear learner in the output was
proposed in Schmidhuber et al. (2007), showing that the
evolved RNNs were more compact and resulted in better
learning than randomly initialized echo state networks (Jaeger
and Haas, 2004). In Khan et al. (2011b,a); Khan and Miller
(2014), the authors introduced a large number of bio-inspired
mechanisms to evolve networks with rich learning dynamics.
The idea was to use evolution to design a network that was
capable of advanced plasticity such as dendrite branch and
axon growth and shrinkage, neuron insertion and destruction,
and many others. The system was tested on the Wumpus
World (Russell and Norvig, 2013), a fairly simple problem
with no learning required, but the purpose was to show that
evolution can design working control networks even within a
large search space.
In summary, learning mechanisms and neural architectures
are strongly interdependent, but a large set of available dynamics
seems to facilitate the evolution of learning. Thus, EPANNs
become more effective precisely when manual network design
becomes less practical because of complexity and rich dynamics.
C. EPANNs in Evolutionary Robotics
Evolutionary robotics (ER) (Cliff et al., 1993; Floreano and
Mondada, 1994, 1996; Urzelai and Floreano, 2000; Floreano
and Nolfi, 2004) contributed strongly to the development
of EPANNs, providing a testbed for applied controllers in
robotics. Although ER had no specific assumptions on neu-
ral systems or plasticity (Smith, 2002), robotics experiments
suggested that neural control structures evolved with fixed
weights perform less well than those evolved with plastic
weights (Nolfi and Parisi, 1996; Floreano and Urzelai, 2001b).
In a conditional phototaxis robotic experiment (see footnote 2), Floreano
and Urzelai (2001a) reported that networks evolved faster
when synaptic plasticity and neural architectures were evolved
simultaneously. In particular, plastic networks were shown
to adapt better in the transition from simulation to real
robots. The better simulation-to-hardware transition, and the
increased adaptability in changing ER environments, appeared
intuitive and supported by evidence (Nolfi and Parisi, 1996).
However, the precise nature and magnitude of the changes
from simulation to hardware are not always easy to quantify:
those studies do not clearly outline the precise principles, e.g.,
better or adaptive feedback control, minimization principles,
etc., that are discovered by evolution with plasticity to produce
those advantages. In fact, the behavioral changes required to
switch behaviors in simple ER experiments can also take place
with non-plastic recurrent neural networks because evolution
can discover recurrent units that act as switches. A study in
2003 observed similar performance in an associative learning
task (food foraging) when comparing plastic and non-plastic
recurrent networks (Stanley et al., 2003). Recurrent networks
with leaky integrators as neurons (Beer and Gallagher, 1992;
Funahashi and Nakamura, 1993; Yamauchi and Beer, 1994)
were also observed to achieve similar performance to plastic
networks (Blynel and Floreano, 2002, 2003). These early
studies indicate that the evolution of learning with plastic
networks was at that point still a proof-of-concept rather
than a superior learning tool: aided by evolutionary search,
networks with recurrent connections and fixed weights could
create recurrent nodes, retain information and achieve similar
learning performance to networks with plastic weights.
Nevertheless, ER maintained a focus on plasticity as demon-
strated, e.g., in The Cyber Rodent Project (Doya and Uchibe,
2005) that investigated the evolution of learning by seeking
to implement a number of features such as (1) evolution
of neural controllers, (2) learning of foraging and mating
behaviors, (3) evolution of learning architectures and meta-
parameters, (4) simultaneous learning of multiple agents in a
body, and (5) learning and evolution in a self-sustained colony.
Plasticity in the form of modulated neural activation was used
in Husbands et al. (1998) and Smith et al. (2002) with a
network that adapts its activation functions according to the
diffusion of a simulated gas spreading to the substrate of the
network. Although the robotic visual discrimination tasks did
not involve learning, the plastic networks appeared to evolve
faster than a network evolved with fixed activation functions.
Similar conclusions were reached in Di Paolo (2003) and
Federici (2005). Di Paolo (2002, 2003) evolved networks with
STDP for a wheeled robot to perform positive and negative
phototaxis, depending on a conditioned stimulus, and observed
that networks with fixed weights could learn but had inferior
performance with respect to plastic networks. Federici (2005)
evolved plastic networks with STDP and an indirect encoding,
showing that plasticity helped performance even if learning
was not required. Stability and evolvability of simple robotic
controllers were investigated in Hoinville et al. (2011) who
focused on EPANNs with homeostatic mechanisms.
Footnote 2: The fitness value was the time spent by a two-wheeled robot in one
particular area of the arena when a light was on, divided by the total experiment time.
Experiments in ER in the 1990s and early 2000s revealed
the extent, complexity, and multitude of ideas behind the
evolutionary design of learning neuro-robotics controllers.
They generally indicate that plasticity helps evolution under
a variety of conditions, even when learning is not required,
thereby promoting further interest in more specific topics.
Among those are the evolutionary discovery of learning, the
evolution of neuromodulation, and the evolution of indirectly
encoded plasticity, as described in the following.
D. Evolutionary discovery of learning
When evolution is used to search for learning mechanisms,
two main cases can be distinguished: (1) when learning is
used to acquire constant facts about the agent or environment,
and (2) when learning is used to acquire changeable facts.
The first case, that of static or stationary environments, is
known to be affected by the Baldwin effect (Baldwin, 1896)
that suggests an acceleration of evolution when learning occurs
during lifetime. A number of studies showed that the Baldwin
effect can be observed with computational simulations (Smith,
1986; Hinton and Nowlan, 1987; Boers et al., 1995; Mayley,
1996; Bullinaria, 2001). With static environments, learning
causes a faster transfer of knowledge into the genotype, which
can happen when facts are stationary (or constant) across
generations. Eventually, a system in those conditions can
perform well without learning because it can be born knowing
how to perform well. However, one limitation is that the genome
might grow very large to hold a large amount of information,
and might, as a result, become less easy to evolve further. A
second limitation is that such solutions might not perform well
in non-stationary environments.
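The interaction between lifetime learning and evolution in a stationary environment can be illustrated with a toy fitness function loosely inspired by the simulation of Hinton and Nowlan (1987). This sketch is a simplification under stated assumptions rather than a reproduction of that study: each gene is either fixed correctly, fixed incorrectly, or left to be guessed during life, and finding the target configuration earlier yields a higher fitness. The function name, the scaling constants, and the guessing procedure are illustrative choices.

```python
import random


def lifetime_fitness(genome, trials=1000, rng=random):
    """Score a genome whose entries are 1 (innate, correct), 0 (innate, wrong),
    or None (learnable, guessed at random on every trial)."""
    if 0 in genome:
        # A wrong innate setting cannot be repaired by learning.
        return 1.0
    unknown = sum(1 for gene in genome if gene is None)
    if unknown == 0:
        # Everything is innate and correct: no trials are needed.
        return 20.0
    for used in range(1, trials + 1):
        # Each learnable gene is guessed correctly with probability 0.5.
        if all(rng.random() < 0.5 for _ in range(unknown)):
            # The earlier the target is found, the more trials remain.
            return 1.0 + 19.0 * (trials - used) / trials
    return 1.0


rng = random.Random(1)
mostly_innate = lifetime_fitness([1] * 18 + [None] * 2, rng=rng)
mostly_learned = lifetime_fitness([1] * 10 + [None] * 10, rng=rng)
```

Under this scoring, genomes with more correct innate settings reach the target sooner on average, so selection can gradually replace learnable genes with innate ones, which corresponds to the transfer of knowledge into the genotype described above.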
In the second case, that of variable or non-stationary envi-
ronments, facts cannot be embedded in the genotype because
those are changeable as, e.g., the location of food in a foraging
problem. This case requires the evolution of learning for the
performance to be maximized. For this reason, non-stationary
reward-based environments, in which the behaviors to obtain
rewards may change, are more typically used to study the
evolution of learning in EPANNs.
EPANN experiments have been used to observe the advan-
tages of combining learning and evolution, and the complex
interaction dynamics that result (Nolfi and Floreano, 1999).
Stone (2007) showed that distributed neural representations
accelerate the evolution of adaptive behavior because learning
part of a skill induced the automatic acquisition of other skill
components. One study in a non-stationary environment (a
foraging problem with variable rewards) (Soltoggio et al.,
2007) suggested that evolution discovers, before optimizing,
learning in a process that is revealed by discrete fitness
stepping stones. At first, non-learning solutions are present
in the population. When evolution discovers by chance a weak
mechanism of learning, it is sufficient to create an evolutionary
advantage, so the neural mechanism is subsequently
optimized: Fig. 4 shows a sudden jump in the fitness when
one agent evolves a learning strategy. Such jumps
in fitness graphs are common in evolutionary experiments in
which learning is discovered from scratch (Aim 1.1), rather
than optimized (Aim 1.2), and were observed as early as in
Fontanari and Meir (1991). When an environment changes
over time, the frequency of those changes plays a role because
it determines the time scales that are required from the
learning agent. With time scales comparable to a lifetime,
evolution may lead to phenotypic plasticity, which is the
capacity for a genotype to express different phenotypes in
response to different environmental conditions (Lalejini and
Ofria, 2016). The frequency of environmental changes was
observed experimentally in plastic neural networks to affect
the evolution of learning (Ellefsen, 2014), revealing a complex
relationship between environmental variability and evolved
learning. One conclusion is that evolution copes with
non-stationary environments by evolving the specific learning that
better matches those changes.
Figure 4: Discovery of a learning strategy during evolution in a
non-stationary reward-based environment, i.e. where learning
is required to maximise the fitness because of the variability
of the environment. After approximately 1,000 generations of
100 individuals, a network evolves to use the reward signal
to modify network weights, and thus implement learning.
The graphic is adapted from Soltoggio et al. (2007). This
experiment shows that evolution stagnates until learning is
discovered. At that point, the evolutionary search hits on a
gradient in the search space that improves on the learning as
reflected by the fitness values. Plot annotations: "Simple navigation
is evolved without learning"; "Stagnation occurs before learning
evolves"; "Learning is discovered by one EPANN".
The use of reward to guide the discovery of neural learning
through evolution was shown to be inherently deceptive in
Risi et al. (2010) and Lehman and Miikkulainen (2014).
In Risi et al. (2009, 2010), EPANN-controlled simulated
robots, evolved in a discrete T-Maze domain, revealed that
the stepping stones towards discovering learning are often
not rewarded by objective-based performance measures. Those
stepping stones to learning receive a lower fitness score than
more brittle solutions with no learning but effective behaviors.
A solution to this problem was devised in Risi et al. (2010,
2009), in which novelty search (Lehman and Stanley, 2008,
2011) was adopted as a substitute for performance in the
fitness objective with the aim of finding novel behaviors.
Novelty search was observed to perform significantly better
in the T-Maze domain. Lehman and Miikkulainen (2014) later
showed that novelty search can encourage the evolution of
more adaptive behaviors across several variations
of the T-Maze learning tasks. As a consequence, novelty
search contributed to a philosophical change by questioning
the centrality of objective-driven search in current evolutionary
algorithms (Stanley and Lehman, 2015). By rewarding novel
behaviors, novelty search validates the importance of explo-
ration or curiosity, previously proposed in Schmidhuber (1991,
2006), also from an evolutionary viewpoint. With the aim of
validating the same hypothesis, Soltoggio and Jones (2009)
devised a simple EPANN experiment in which exploration
was more advantageous than exploitation in the absence of
reward learning; to do this, the reward at a particular location
depleted itself if continuously visited, so that changing location
at random in a T-maze became beneficial. Evolution discov-
ered exploratory behavior before discovering reward-learning,
which in turn, and surprisingly, led to an earlier evolution
of reward-based learning. Counterintuitively, this experiment
suggests that a stepping stone to evolve reward-based learning
is to encourage reward-independent exploration.
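To make the contrast with objective-based selection concrete, the sketch below computes a novelty score in the way it is commonly formulated for novelty search: the mean distance from an individual's behavior descriptor to its k nearest neighbors among the other individuals and an archive. The choice of behavior descriptor (here a hypothetical two-dimensional final position), the value of k, and the function name are assumptions made for this example.

```python
import math


def novelty(behavior, others, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors."""
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]
    return sum(nearest) / len(nearest) if nearest else 0.0


# Hypothetical behavior descriptors: final (x, y) positions of three agents.
population = [(0.0, 1.0), (1.0, 0.0), (3.0, 4.0)]
score_center = novelty((0.0, 0.0), population, k=2)
score_far = novelty((10.0, 10.0), population, k=2)
```

Selection then favors individuals with a high score regardless of the reward they collected, which is how stepping stones that an objective-based measure would penalize can survive in the population.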
The seminal work in Bullinaria (2003, 2007a, 2009c) pro-
poses the more general hypothesis that learning requires the
evolution of long periods of parental protection and late onset
of maturity. Similarly, Ellefsen (2013b,a) investigates sensitive
and critical periods of learning in evolved neural networks.
This fascinating hypothesis has wider implications for experi-
ments with EPANNs, and more generally for machine learning
and AI. It is therefore foreseeable that future EPANNs will
have a protected childhood during which parental guidance
may be provided (Clutton-Brock, 1991; Klug and Bonsall,
2010; Eskridge and Hougen, 2012).
E. Evolving neuromodulation
Growing neuroscientific evidence on the role of neuromodu-
lation (previously outlined in Section II-B) inspired the design
of experiments with neuromodulatory signals to evolve control
behavior and learning strategies (Section II-C). One particular
case is when neuromodulation gates plasticity. Eq. 1 can be
rewritten as
$$\Delta w = m \cdot f(\mathrm{pre}, \mathrm{post}, w, \theta) \qquad (4)$$
to emphasize the role of m, a modulatory signal used as
a multiplicative factor that can enhance or reduce plasticity
(Abbott, 1990). A network may produce many independent
modulatory signals m targeting different neurons or areas of
the network. Thus, modulation can vary in space and time.
Modulation may also affect other aspects of the network
dynamics, e.g., modulating activations rather than plasticity
(Krichmar, 2008). Graphically, modulation can be represented
as a different type of signal affecting various properties of the
synaptic connections of an afferent neuron i (Fig. 5).
Figure 5: A modulatory neuron gates plasticity of the synapses
that connect to the postsynaptic neuron. The learning is local,
but a learning signal can be created by one part of the network
and used to regulate learning elsewhere. (Diagram labels:
synthesis of learning signals; synthesis of control signals.)
Evolutionary search was used to find the parameters of
a neuromodulated Hebbian learning rule in a reward-based
armed-bandit problem in Niv et al. (2002). The same problem
was used later in Soltoggio et al. (2007) to evolve arbitrary
learning architectures with a bio-inspired gene representation
method called Analog Genetic Encoding (AGE) (Mattiussi and
Floreano, 2007). In that study, evolution was used to search
both modulatory topologies and parameters of a particular
form of Eq. 4:
$$\Delta w = m \cdot (A \cdot \mathrm{pre} \cdot \mathrm{post} + B \cdot \mathrm{pre} + C \cdot \mathrm{post} + D)$$
where the parameters A to D determined the influence of four
factors in the rule: a multiplicative Hebbian term A, a presynaptic
term B, a postsynaptic term C, and a pure modulatory,
or heterosynaptic, term D. Such a rule is not dissimilar from
those presented in previous studies (see Section 3.2). However,
when used in combination with modulation and a search
for network topologies, evolution seems to be particularly
effective at solving reward-based problems. Kondo (2007)
proposed an evolutionary design and behavior analysis of
neuromodulatory neural networks for mobile robot control,
validating the potential of the method.
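To make the gating role of the modulatory signal concrete, the following Python sketch applies a four-term rule to a single synapse. It assumes the rule has the form Δw = m · (A · pre · post + B · pre + C · post + D), as suggested by the description of the terms A to D; the weight clipping and the function name are additional assumptions for illustration and are not taken from the cited studies.

```python
def modulated_update(w, pre, post, m, A, B, C, D, w_max=1.0):
    """Return the new weight after one modulated plasticity step.

    m gates the whole update: with m == 0 the synapse does not change,
    and the sign of m decides whether the change is applied or reversed.
    Clipping to [-w_max, w_max] is an assumption of this sketch.
    """
    dw = m * (A * pre * post + B * pre + C * post + D)
    return max(-w_max, min(w_max, w + dw))


# Pure Hebbian term (A = 1), gated by three different modulation levels.
no_change = modulated_update(0.2, 1.0, 1.0, m=0.0, A=1.0, B=0.0, C=0.0, D=0.0)
potentiated = modulated_update(0.0, 1.0, 1.0, m=0.5, A=1.0, B=0.0, C=0.0, D=0.0)
depressed = modulated_update(0.0, 1.0, 1.0, m=-0.5, A=1.0, B=0.0, C=0.0, D=0.0)
```

In an evolutionary setting, A to D (and the wiring that produces m) would be part of the genome, so that selection acts on when and where learning occurs rather than on the weights themselves.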
Soltoggio et al. (2008) tested whether
modulatory dynamics held an evolutionary advantage in T-maze
environments with changing reward locations.³ In their
algorithm, modulatory neurons were freely inserted or deleted
by random mutations, effectively allowing the evolutionary
selection mechanism to autonomously pick those networks
with advantageous computational components (Aim 2). Af-
ter evolution, the best performing networks had modulatory
neurons regulating learning, and evolved faster than a control
evolutionary experiment that could not employ modulatory
neurons. Modulatory neurons were maintained in the networks
in a second phase of the experiment when genetic opera-
tors allowed for the deletion of such neurons but not for
their insertion, thus demonstrating their essential function in
maintaining learning in that particular experiment. In another
study, Soltoggio (2008b) suggested that evolved modulatory
topologies may be essential to separate the learning circuitry
from the input-output controller and to shorten the input-
output pathways, which sped up decision processes. Soltoggio
(2008a) showed that the learning dynamics are affected by
tight coupling between rules and architectures in a search
space with many equivalent but different control structures.
Fig. 5 also suggests that modulatory networks require evolution
to find two essential topological structures: what signals or
combination of signals trigger modulation, and what neurons
are to be targeted by modulatory signals. In other words, a
balance between fixed and plastic architectures, or selective
plasticity (DARPA-L2M, 2017), is an intrinsically emergent
property of evolved modulated networks.
³ In reward-based T-Maze environments, it is often assumed that the fitness
function is the sum of all rewards collected during a lifetime.
A number of further studies on the evolution of neu-
romodulatory dynamics confirmed the evolutionary advan-
tages in learning scenarios (Soltoggio, 2008c). Silva et al.
(2012a) used simulations of 2-wheel robots performing a
dynamic concurrent foraging task, in which scattered food
items periodically changed their nutritive value or became
poisonous, similarly to the setup in Soltoggio and Stanley
(2012). The results showed that when neuromodulation was
enabled, learning evolved faster than when neuromodulation
was not enabled, also with multi-robot distributed systems
(Silva et al., 2012b). Nogueira et al. (2013, 2016) also re-
ported evolutionary advantages in foraging behavior of an
autonomous virtual robot when equipped with neuromodu-
lated plasticity. Harrington et al. (2013) demonstrated how
evolved neuromodulation applied to a gene regulatory network
consistently generalized better than agents trained with fixed
parameter settings. Interestingly, Arnold et al. (2013b) showed
that neuromodulatory architectures provided an evolutionary
advantage also in reinforcement-free environments, validating
the hypothesis that plastic modulated networks have higher
evolvability in a large variety of tasks. The evolution of social
representations in neural networks was shown to be facilitated
by neuromodulatory dynamics in Arnold et al. (2013a). An
artificial life simulation environment called Polyworld (Yoder
and Yaeger, 2014) helped to assess the advantage of neuro-
modulated plasticity in various scenarios. The authors found
that neuromodulation may be able to enhance or diminish
foraging performance in a competitive, dynamic environment.
Neuromodulation was evolved in Ellefsen et al. (2015) in
combination with modularity to address the problem of catas-
trophic forgetting. In Gustafsson (2016), networks evolved
with AGE (Mattiussi and Floreano, 2007) for video game
playing were shown to perform better with the addition of
neuromodulation. Norouzzadeh and Clune (2016) showed that
neuromodulation produced forward models that could adapt
to changes significantly better than the controls. They verified
that evolution exploited variable learning rates to perform
adaptation when needed. In Velez and Clune (2017), diffusion-
based modulation, i.e., targeting entire parts of the network,
evolved to produce task-specific localized learning and functional
modularity, thus reducing the problem of catastrophic
forgetting.
The evidence in these studies suggests that neuromodulation
is a key ingredient to facilitate the evolution of learning in
EPANNs. They also indirectly suggest that neural systems
with more than one type of signal, e.g., activation and other
modulatory signals, might be beneficial in the neuroevolution
of learning.
F. Evolving indirectly encoded plasticity
Figure 6: Example of an indirect mapping of plasticity rules from a compact genotype to a larger phenotype. Panels: (a) HyperNEAT;
(b) Adaptive HyperNEAT. ANN nodes in HyperNEAT are situated in physical space by assigning them specific coordinates. The
connections between nodes are determined by an evolved Compositional Pattern Producing Network (CPPN; Stanley (2007)), which
takes as inputs the coordinates of two ANN neurons and returns the weight between them. In the normal HyperNEAT approach (a),
the CPPN is queried once for all potential ANN connections when the agent is born. On the other hand, in adaptive HyperNEAT (b),
the CPPN is continually queried during the lifetime of the agent to determine individual connection weight changes based on the
location of neurons and additionally the activity of the presynaptic and postsynaptic neuron, and current connection weight.
Adaptive HyperNEAT is able to indirectly encode a pattern of nonlinear learning rules for each connection in the ANN (right).
An indirect genotype to phenotype mapping means that
evolution operates on a compact genotypical representation
(analogous to the DNA) that is then mapped into a fully
fledged network (analogous to a biological brain). Learning
rules may undergo a similar indirect mapping, so that compact
instructions in the genome expand to fully fledged plasticity
rules in the phenotype. One early study (Gruau and Whitley,
1993) encoded plasticity and development with a grammar
tree, and compared different learning rules on a simple
static task (parity and symmetry), demonstrating that learning
provided an evolutionary advantage in a static scenario. In
non-static contexts, and using a T-Maze domain as learning
task, Risi and Stanley (2010) showed that HyperNEAT, which
usually implements a compact encoding of weight patterns for
large-scale ANNs (Fig. 6a), can also encode patterns of local
learning rules. The approach, called adaptive HyperNEAT,
can encode arbitrary learning rules for each connection in an
evolving ANN based on a function of the ANN’s geometry
(Fig. 6b). Further flexibility was added in Risi and Stanley
(2012) to simultaneously encode the density and placement
of nodes in substrate space. The approach, called adaptive
evolvable-substrate HyperNEAT, makes it possible to indi-
rectly encode plastic ANNs with thousands of connections
that exhibit regularities and repeating motifs. Adaptive ES-
HyperNEAT allows each individual synaptic connection, rather
than neuron, to be standard or modulatory, thus introducing
further design flexibility. Risi and Stanley (2014) showed how
adaptive HyperNEAT can be seeded to produce a specific
lateral connectivity pattern, thereby allowing the weights to
self-organize to form a topographic map of the input space.
The study shows that evolution can be seeded with specific
plasticity mechanisms that can facilitate the evolution of
specific types of learning.
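The difference between querying a CPPN once at birth and querying it throughout the lifetime can be sketched in Python as follows. The two functions below are hand-written stand-ins for an evolved CPPN, chosen only to show which inputs each query receives; their functional forms, the constants, and all names are hypothetical and do not reproduce the implementation of Risi and Stanley (2010).

```python
import math


def toy_cppn_weight(src, dst):
    """HyperNEAT-style query: node coordinates in, initial weight out."""
    dx, dy = src[0] - dst[0], src[1] - dst[1]
    return math.exp(-(dx * dx + dy * dy))


def toy_cppn_delta(src, dst, pre, post, w):
    """Adaptive-style query: coordinates, activities and weight in, dw out.

    The learning rate depends on geometry, so the same compact function
    yields a different local rule for each connection.
    """
    eta = 0.1 * math.exp(-abs(src[0] - dst[0]))
    return eta * pre * post - 0.01 * w


def init_weights(inputs, outputs):
    """Query the weight function once per connection (at 'birth')."""
    return {(s, d): toy_cppn_weight(s, d) for s in inputs for d in outputs}


def adapt(weights, activity):
    """Query the plasticity function for every connection (during 'life')."""
    return {
        (s, d): w + toy_cppn_delta(s, d, activity[s], activity[d], w)
        for (s, d), w in weights.items()
    }


inputs = [(-1.0, 0.0), (1.0, 0.0)]
outputs = [(1.0, 1.0)]
w0 = init_weights(inputs, outputs)
w1 = adapt(w0, {node: 1.0 for node in inputs + outputs})
```

Here the connection whose endpoints share an x coordinate changes faster than the other one, although both are produced by the same function: the geometry of the substrate determines the pattern of learning rules.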
The effect of indirectly encoded plasticity on the learning
and on the evolutionary process was investigated by Tonelli
and Mouret (2011, 2013). Using an operant conditioning task,
i.e., learning by reward, the authors showed that indirect
encodings that produced more regular neural structures also
improved the general EPANN learning abilities when com-
pared to direct encodings. In an approach similar to adaptive
HyperNEAT, Orchard and Wang (2016) encoded the learning
rule itself as an evolving network. They named the approach
neural weights and bias update (NWB), and observed that
increasing the search space of the possible plasticity rules
created more general solutions than those based on only
Hebbian learning.
The progress of EPANNs reviewed so far is based on
rapidly developing theories and technologies. In particular,
new advances in AI, machine learning, neural networks and
increased computational resources are currently creating a new
fertile research landscape, and are setting the groundwork for
new directions for EPANNs. This section presents promising
research themes that have the potential to extend and radically
change the field of EPANNs and AI as a whole.
A. Levels of abstraction and representations
Choosing the right level of abstraction and the right repre-
sentation (Bengio et al., 2013) are themes at the heart of many
problems in AI. In ANNs, low levels of abstraction are more
computationally expensive, but might be richer in dynamics.
High levels are faster to simulate, but require an intuition of the
essential dynamics that are necessary in the model. Research
in EPANNs is well placed to address the problem of levels of
abstraction because it can reveal evolutionary advantages for
different components, structures and representations.
Similarly to abstractions, representations play a critical role.
Compositional Pattern Producing Networks (CPPNs) (Stanley,
2007), and also the previous work of Sims (1991), demonstrated
that structured phenotypes can be generated through a
function without going through the dynamic developmental
process typical of multicellular organisms. Relatedly, Hornby
et al. (2002) showed that the different phenotypical represen-
tations led to considerably different results in the evolution of
regular structures with patterns and repetitions. Miller (2014)
discussed explicitly the effect of abstraction levels for evolved
developmental learning networks, in particular in relation to
two approaches that model development at the neuron level or
at the network level.
Finding appropriate abstractions and representations, just
as it was fundamental in the advances in deep learning to
represent input spaces and hierarchical features (Bengio et al.,
2013; Oquab et al., 2014), can also extend to representations of
internal models, learning mechanisms, and genetic encodings,
affecting the algorithms’ capabilities of evolving learning.
B. Evolving general learning
One challenge in the evolution of learning is that evolved
learning may simply result in a switch among a finite set of
evolved behaviors, e.g., turning left or right in a T-Maze in a
finite sequence, which is all that evolving solutions encounter
during their lifetime. A challenge for EPANNs is to acquire
general learning abilities in which the network is capable of
learning problems not encountered during evolution. Mouret
and Tonelli (2014) propose the distinction between the evo-
lution of behavioral switches and the evolution of synaptic
general learning abilities, and suggest conditions that favor
these types of learning. General learning can be intuitively
understood as the capability to learn any association among
input, internal, and output patterns, both in the spatial and tem-
poral dimensions, regardless of the complexity of the problem.
Such an objective clearly poses practical and philosophical
challenges. Although humans are considered better at general
learning than machines, human learning skills are also specific
and not unlimited (Ormrod and Davis, 2004). Nevertheless,
moving from behavior switches to more general learning is
a desirable feature for EPANNs. Encouraging the emergence
of general learners may likely involve (1) an increased com-
putational cost for testing in rich environments that include
a large variety of uncertain and stochastic scenarios with
problems of various complexity, and (2) an increased search
space to explore the evolution of complex strategies and avoid
C. Incremental and social learning
An important open challenge for machine learning in gen-
eral is the creation of neural systems that can continuously
integrate new knowledge and skills without forgetting what
they previously learned (Parisi et al., 2018), thus solving
the stability-plasticity dilemma. A promising approach is
progressive neural networks (Rusu et al., 2016), in which
a new network is created for each new task, and lateral
connections between networks allow the system to leverage
previously learned features. In the presence of time delays
among stimuli, actions and rewards, a rule called hypothesis
testing plasticity (HTP) (Soltoggio, 2015) implements fast
and slow decay to consolidate weights and suggests neural
dynamics to avoid catastrophic forgetting. A method to find
the best shared weights across multiple tasks, called elastic
weight consolidation (EWC) was proposed in Kirkpatrick et al.
(2017). Plasticity rules that implement weight consolidation,
given their promise to prevent catastrophic forgetting, are
likely to become standard components in EPANNs.
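The idea of weight consolidation can be sketched as a quadratic penalty that anchors each parameter to the value it had after an earlier task, weighted by an estimate of how important that parameter was, as in EWC (Kirkpatrick et al., 2017). In the Python sketch below, the importance values (the diagonal of the Fisher information in EWC) are supplied by hand, and the function and argument names are illustrative assumptions.

```python
def ewc_loss(task_loss, params, anchor, importance, lam):
    """New-task loss plus a consolidation penalty.

    penalty = (lam / 2) * sum_i importance_i * (params_i - anchor_i) ** 2
    `anchor` holds the parameters found for the earlier task.
    """
    penalty = sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor, importance)
    )
    return task_loss + 0.5 * lam * penalty


# The first weight mattered for the old task (importance 10.0) and has not
# moved; the second mattered little (importance 0.5) and has moved by 2.0.
total = ewc_loss(
    task_loss=1.0,
    params=[1.0, 2.0],
    anchor=[1.0, 0.0],
    importance=[10.0, 0.5],
    lam=2.0,
)
```

A plasticity rule with this property lets unimportant weights change freely while important ones remain close to their consolidated values.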
Encouraging modularity (Ellefsen et al., 2015; Durr et al.,
2010) or augmenting evolving networks with a dedicated
external memory component (Lüders et al., 2016) have been
proposed recently. An evolutionary advantage is likely to
emerge for networks that can elaborate on previously learned
sub-skills during their lifetime to learn more complex tasks.
One interesting case in which incremental learning may play
a role is social learning (Best, 1999). EPANNs may learn both
from the environment and from other individuals, from scratch
or incrementally (Offerman and Sonnemans, 1998). In an
early study, McQuesten and Miikkulainen (1997) showed that
neuroevolution can benefit from parent networks teaching their
offspring through backpropagation. When social, learning may
involve imitation, language or communication, or other social
behaviors. Bullinaria (2017) proposes an EPANN framework
to simulate the evolution of culture and social learning. It is
reasonable to assume that future AI learning systems, whether
based on EPANNs or not, will acquire knowledge through
different modalities. These will involve direct experience with
the environment, but also social interaction, and possibly
complex incremental learning phases.
D. Fast learning
Animal learning does not always require a myriad of trials.
Humans can very quickly generalize from only a few given
examples, possibly leveraging previous experiences and a
long learning process during infancy. This type of learning,
advocated in AI and robotics systems (Thrun and Mitchell,
1995), is currently still missing in EPANNs. Inspiration for
new approaches could come from complementary learning
systems (McClelland et al., 1995; Kumaran et al., 2016) that
humans seem to possess, which include fast and slow learning
components. Additionally, approaches such as probabilistic
program induction seem to be able to learn concepts in one-
shot at a human-level in some tasks (Lake et al., 2015). Fast
learning is likely to derive not just from trial-and-error, but also
from mental models that can be applied to diverse problems,
similarly to transfer learning (Thrun and Mitchell, 1995; Thrun
and O’Sullivan, 1996; Pan and Yang, 2010). Reusable mental
models, once learned, will allow agents to make predictions
and plan in new and uncertain scenarios with similarities
to previously learned ones. If EPANNs can discover neural
structures or learning rules that allow for generalization, the
evolutionary advantage of such a discovery will lead to its
full emergence and further optimization.
A rather different approach to accelerate learning was
proposed in Fernando et al. (2008); de Vladar and Szathmáry
(2015) and called Evolutionary Neurodynamics. According
to this theory, replication and selection might happen in a
neural system as it learns, thus mimicking evolutionary
dynamics at the much faster time scale of a lifetime. We refer
to Fernando et al. (2012); de Vladar and Szathmáry (2015)
for an overview of the field. The appeal of this method is that
evolutionary search can be accelerated by implementing its
dynamics at both the evolution’s and life’s time scales.
E. Evolving memory
The consequence of learning is memory, both explicit
and implicit (Anderson, 2013), and its consolidation (Dudai,
2012). For a review of computational models of memory see
Fusi (2017). EPANNs may reach solutions in which memory
evolved in different fashions, e.g., preserved as self-sustained
neural activity, encoded by connection weights modified by
plasticity rules, stored with an external memory (e.g. Neural
Turing Machine), or a combination of these approaches. Re-
current neural architectures based on long short-term memory
(LSTM) allow very complex tasks to be solved through
gradient descent training (Greff et al., 2015; Hochreiter and
Schmidhuber, 1997) and have recently shown promise when
combined with evolution (Rawal and Miikkulainen, 2016).
Neuromodulation and weight consolidation could also be used
to target areas of the network where information is stored.
Graves et al. (2014) introduced the Neural Turing Machine
(NTM), networks augmented with an external memory that
allows long-term memory storage. NTMs have shown promise
when trained through evolution (Greve et al., 2016; Lüders
et al., 2016, 2017) or gradient descent (Graves et al., 2014,
2016). The Evolvable Neural Turing Machine (ENTM) showed
good performance in solving the continuous version of the
double T-Maze navigation task (Greve et al., 2016), and
avoided catastrophic forgetting in a continual learning domain
(Lüders et al., 2016, 2017) because memory and control are
separated by design. Research in this area will reveal which
computational systems are more evolvable and how memories
will self organize and form in EPANNs.
F. EPANNs and deep learning
Deep learning has shown remarkable results in a variety
of different fields (Krizhevsky et al., 2012; Schmidhuber,
2015; LeCun et al., 2015). However, the model structures
of these networks are mostly hand-designed, include a large
number of parameters, and require extensive experiments to
discover optimal configurations. With increased computational
resources, it is now possible to search design aspects with
evolution, and set up EPANN experiments with the aim of
optimizing learning (Aim 1.2).
Koutník et al. (2014) used evolution to design a controller
that combined evolved recurrent neural networks, for the
control part, and a deep max-pooling convolutional neural
network to reduce the input dimensionality. The study does not
use evolution on the deep preprocessing network itself, but
demonstrates nevertheless the evolutionary design of a deep
neural controller. Young et al. (2015) used an evolutionary
algorithm to optimize two parameters of a deep network: the
size (range [1,8]) and the number (range [16,126]) of the
filters in a convolutional neural network, showing that the op-
timized parameters could vary considerably from the standard
best-practice values. An established evolutionary computation
technique, the Covariance Matrix Adaptation Evolution Strat-
egy (CMA-ES) (Hansen and Ostermeier, 2001), was used in
Loshchilov and Hutter (2016) to optimize the parameters of
a deep network to learn to classify the MNIST dataset. The
authors reported performance close to the state-of-the-art using
30 GPU devices.
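A search over two convolutional hyperparameters of the kind described for Young et al. (2015) can be sketched in Python with a small elitist evolutionary loop. Only the two ranges are taken from the text; the mutation scheme, the population settings, and in particular the fitness function are placeholders. In a real experiment the fitness would be the validation accuracy of a network trained with the given filter size and filter count.

```python
import random

SIZE_RANGE = (1, 8)      # filter size, range given in the text
COUNT_RANGE = (16, 126)  # number of filters, range given in the text


def clamp(value, low, high):
    return max(low, min(high, value))


def random_genome(rng):
    return (rng.randint(*SIZE_RANGE), rng.randint(*COUNT_RANGE))


def mutate(genome, rng):
    """Perturb both genes by a small integer step, staying within range."""
    size, count = genome
    size = clamp(size + rng.choice((-1, 0, 1)), *SIZE_RANGE)
    count = clamp(count + rng.randint(-8, 8), *COUNT_RANGE)
    return (size, count)


def evolve(fitness, generations=30, pop_size=10, seed=0):
    """Keep the better half each generation and refill with mutants."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


def placeholder_fitness(genome):
    """Hypothetical stand-in for 'train the network, return accuracy'."""
    size, count = genome
    return -((size - 5) ** 2) - ((count - 64) / 16) ** 2


best = evolve(placeholder_fitness)
```

Because each fitness evaluation in practice requires training a deep network, the number of generations and the population size are the main cost drivers, which is why such experiments depend on large computational resources.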
Real et al. (2017) and Miikkulainen et al. (2017) showed
that evolutionary search can be used to determine the topology,
hyperparameters and building blocks of deep networks trained
through gradient descent. The performance was shown to rival
that of hand-designed architectures in the CIFAR-10 classification
task and a language modeling task (Miikkulainen et al.,
2017), while Real et al. (2017) also tested the method on the
larger CIFAR-100 dataset. Desell (2017) proposes a method
called evolutionary exploration of augmenting convolutional
topologies, inspired by NEAT (Stanley and Miikkulainen,
2002), which evolves progressively more complex unstruc-
tured convolutional neural networks using genetically specified
feature maps and filters. This approach is also able to co-
evolve neural network training hyperparameters. Results were
obtained using 5,500 volunteered computers at the Citizen
Science Grid, which were able to evolve competitive results
on the MNIST dataset in under 12,500 trained networks in
a period of approximately ten days. Liu et al. (2017) used
evolution to search for hierarchical architecture representations
showing competitive performance on the CIFAR-10 and ImageNet
databases. The Evolutionary DEep Networks (EDEN)
framework (Dufourq and Bassett, 2017) aims to generalize
deep network optimization to a variety of problems and is
interfaced with TensorFlow (Abadi et al., 2016). A number of
similar software frameworks are currently being developed.
Fernando et al. (2017) used evolution to determine a sub-
set of pathways through a network that are trained through
backpropagation, allowing the same network to learn a variety
of different tasks. Fernando et al. (2016) were also able to
rediscover convolutional networks by means of evolution of
Differentiable Pattern Producing Networks (Stanley, 2007).
So far, EPANN experiments in deep learning have focused
primarily on the optimization of learning (Aim 1.2) in super-
vised classification tasks, e.g. optimizing final classification
accuracy. In the future, evolutionary search may be used with
deep networks to evolve learning from scratch, recover performance,
or combine different learning rules and dynamics
in an innovative and counter-intuitive fashion (Aims 1.1, 1.3
or 2 respectively).
G. GPU implementations and neuromorphic hardware
The progress of EPANNs will crucially depend on imple-
mentations that take advantage of the increased computational
power of parallel computation with GPUs and neuromorphic
hardware (Jo et al., 2010; Monroe, 2014). Deep learning
benefited greatly not only from GPU-accelerated machine learning but
also from standardized tools (e.g., Torch, TensorFlow, Theano)
that made it easy for anybody to download, experiment, and
extend promising deep learning models.
EPANNs have shown promise with hardware implementa-
tions. Howard et al. (2011, 2012, 2014) devised experiments
to evolve plastic spiking networks implemented as memristors
for simulated robotic navigation tasks. Memristive plasticity
was observed consistently to enable higher performance than
constant-weighted connections in both static and dynamic
reward scenarios. Carlson et al. (2014) used GPU imple-
mentations to evolve plastic spiking neural networks with an
evolution strategy, which resulted in an efficient and automated
parameter tuning framework.
In the context of newly emerging technologies, it is worth
noting that, just as GPUs were not developed initially for
deep learning, so novel neural computation tools and hardware
systems, not developed for EPANNs, can now be exploited to
enable more advanced EPANN setups.
H. Measuring progress
The number of platforms and environments for testing
the capabilities of intelligent systems is constantly growing,
e.g., the Atari or General Video Game Playing Benchmark
(GVGAI, 2017), the Project Malmo (Microsoft, 2017), or
the OpenAI Universe (OpenAI, 2017). Because EPANNs are
often evolved in reward-based, survival, or novelty-oriented
environments to discover new, unknown, or creative learning
strategies or behaviors, measuring progress is not straight-
forward. Desired behaviors or errors are not always defined.
Moreover, the goal for EPANNs is often not to be good at
solving one particular task, but rather to test the capability
to evolve the learning required for a range of problems, to
generalize to new problems, or to recover performance after
a change in the environment. Therefore, EPANNs will require
the community to devise and accept new metrics based on one
or more objectives such as the following:
• the time (in the evolutionary scale) to evolve the learning
mechanisms in one or more scenarios;
• the time (in the lifetime scale) for learning in one or more
scenarios;
• the number of different tasks that an EPANN evolves to
solve;
• a measure of the variety of skills acquired by one
• the complexity of the tasks and/or datasets, e.g., variations
in distributions, stochasticity, etc.;
• the robustness and generalization capabilities of the
• the recovery time in front of high-level variations or
changes, e.g., data distribution, type of problem, stochasticity
levels, etc.;
• computational resources used, e.g., number of lifetime
evaluations, length of a lifetime;
• size, complexity, and computational requirements of the
solution once deployed;
• novelty or richness of the behavior repertoire from multiple
solutions, e.g., the variety of different EPANNs and
their strategies that were designed during evolution.
Few of those metrics are currently used to benchmark
machine learning algorithms. Research in EPANNs will foster
the adoption of such criteria as wider performance metrics for
assessing lifelong learning capabilities (Thrun and Pratt, 2012;
DARPA-L2M, 2017) of evolved plastic networks.
The broad inspiration and aspirations of evolved plastic
artificial neural networks (EPANNs) strongly motivate this field,
drawing from large, diverse, and interdisciplinary areas. In par-
ticular, the aspirations reveal ambitious and long-term research
objectives related to the discovery of neural learning, with
important implications for artificial intelligence and biology.
EPANNs saw considerable progress in the last two decades,
primarily pointing to the potential of the autonomous evolution
and discovery of neural learning. We now have: (i) advanced
evolutionary algorithms to promote the evolution of learn-
ing, (ii) a better understanding of the interaction dynamics
between evolution and learning, (iii) assessed advantages of
multi-signal networks such as modulatory networks, and (iv)
explored evolutionary representations of learning mechanisms.
Recent scientific and technical progress has set the foun-
dation for a potential step change in EPANNs. Concurrently
with the increase of computational power and a resurgence
of neural computation, the need for more flexible algorithms
and the opportunity to explore new design principles could
make EPANNs the next AI tool capable of discovering new
principles and systems for general adaptation and intelligent
We thank John Bullinaria, Kris Carlson, Jeff Clune, Travis
Desell, Keith Downing, Dean Hougen, Joel Lehman, Jeff
Krichmar, Jay McClelland, Robert Merrison-Hort, Julian
Miller, Jean-Baptiste Mouret, James Stone, Eors Szathmary,
and Joanna Turner for insightful discussions and comments
on earlier versions of this paper.
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro,
C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al. TensorFlow:
Large-scale machine learning on heterogeneous distributed
systems. arXiv preprint arXiv:1603.04467, 2016.
Abbott, L. F. Modulation of Function and Gated Learning in
a Network Memory. Proceedings of the National Academy of
Science of the United States of America, 87(23):9241–9245, 1990.
Abraham, A. Meta learning evolutionary artificial neural networks.
Neurocomputing, 56:1–38, 2004.
Abraham, W. C. and Robins, A. Memory retention–the synaptic
stability versus plasticity dilemma. Trends in neurosciences, 28
(2):73–78, 2005.
Alexander, W. H. and Sporns, O. An Embodied Model of Learning,
Plasticity, and Reward. Adaptive Behavior, 10(3-4):143–159, 2002.
Allis, L. V. et al. Searching for solutions in games and artificial
intelligence. Ponsen & Looijen, 1994.
Alpaydin, E. Introduction to machine learning. MIT press, 2014.
Anderson, J. R. The architecture of cognition. Psychology Press,
2013.
Angeline, P. J., Saunders, G. M., and Pollack, J. B. An evolutionary
algorithm that constructs recurrent neural networks. IEEE trans-
actions on Neural Networks, 5(1):54–65, 1994.
Arifovic, J. and Gencay, R. Using genetic algorithms to select
architecture of a feedforward artificial neural network. Physica A:
Statistical mechanics and its applications, 289(3):574–594, 2001.
Arnold, S., Suzuki, R., and Arita, T. Evolution of social representation
in neural networks. In Advances in Artificial Life, ECAL 2013:
Proceedings of the Twelfth European Conference on the Synthesis
and Simulation of Living Systems. MIT Press, Cambridge, MA,
pages 425–430, 2013a.
Arnold, S., Suzuki, R., and Arita, T. Selection for reinforcement-
free learning ability as an organizing factor in the evolution of
cognition. Advances in Artificial Intelligence, 2013:8, 2013b.
Ay, N., Flack, J., and Krakauer, D. C. Robustness and complexity
co-constructed in multimodal signalling networks. Philosophical
Transactions of the Royal Society of London B: Biological Sci-
ences, 362(1479):441–447, 2007.
Bäck, T. and Schwefel, H.-P. An overview of evolutionary algorithms
for parameter optimization. Evolutionary computation, 1(1):1–23,
1993.
Bahroun, Y., Hunsicker, E., and Soltoggio, A. Building Efficient Deep
Hebbian Networks for Image Classification Tasks. In International
Conference on Artificial Neural Networks, 2017.
Bailey, C. H., Giustetto, M., Huang, Y.-Y., Hawkins, R. D., and
Kandel, E. R. Is heterosynaptic modulation essential for stabilizing
Hebbian plasticity and memory? Nature Reviews Neuroscience, 1
(1):11–20, October 2000.
Baldwin, J. M. A new factor in evolution (continued). American
naturalist, pages 536–553, 1896.
Baxter, D. A., Canavier, C. C., Clark, J. W., and Byrne, J. H.
Computational Model of the Serotonergic Modulation of Sensory
Neurons in Aplysia. Journal of Neurophysiology, 82:2914–2935,
Baxter, J. The Evolution of Learning Algorithms for Artificial Neural
Networks. Complex Systems, D. Green and T. Bossomaier, Eds.
Amsterdam, The Netherlands: IOS, pages 313–326, 1992.
Bear, M. F., Connors, B. W., and Paradiso, M. A. Neuroscience,
volume 2. Lippincott Williams & Wilkins, 2007.
Beer, R. D. and Gallagher, J. C. Evolving dynamical neural networks
for adaptive behavior. Adaptive Behavior, 1(1):91–122, 1992.
Bengio, S., Bengio, Y., Cloutier, J., and Gecsei, J. On the optimization
of a synaptic learning rule. In Preprints Conf. Optimality in
Artificial and Biological Neural Networks, Univ. of Texas, Dallas,
Feb 6-8, 1992, 1992.
Bengio, Y., Bengio, S., and Cloutier, J. Learning a synaptic learning
rule. Technical report, Université de Montréal, Département
d’informatique et de recherche opérationnelle, 1990.
Bengio, Y., Courville, A., and Vincent, P. Representation learning:
A review and new perspectives. IEEE transactions on pattern
analysis and machine intelligence, 35(8):1798–1828, 2013.
Bentley, P. Evolutionary design by computers. Morgan Kaufmann,
Best, M. L. How culture can guide evolution: An inquiry into
gene/meme enhancement and opposition. Adaptive Behavior, 7
(3-4):289–306, 1999.
Bienenstock, E. L., Cooper, L. N., and Munro, P. W. Theory for
the development of neuron selectivity: orientation specificity and
binocular interaction in visual cortex. The Journal of Neuroscience,
2(1):32–48, January 1982.
Birmingham, J. T. Increasing Sensor Flexibility Through Neuromodulation.
Biological Bulletin, 200:206–210, April 2001.
Blynel, J. and Floreano, D. Levels of Dynamics and Adaptive
Behavior in Evolutionary Neural Controllers. In Proceedings of
the seventh international conference on simulation of adaptive
behavior on From animals to animats, pages 272–281. MIT Press
Cambridge, MA, USA, 2002.
Blynel, J. and Floreano, D. Exploring the T-Maze: Evolving
Learning-Like Robot Behaviors Using CTRNNs. In EvoWorkshops,
pages 593–604, 2003.
Boers, E. J., Borst, M. V., and Sprinkhuizen-Kuyper, I. G. Evolving
neural networks using the “Baldwin effect”. In Artificial Neural
Nets and Genetic Algorithms, pages 333–336. Springer, 1995.
Bourlard, H. and Kamp, Y. Auto-association by multilayer perceptrons
and singular value decomposition. Biological cybernetics, 59
(4-5):291–294, 1988.
Brown, T. H., Kairiss, E. W., and Keenan, C. L. Hebbian Synapse:
Biophysical Mechanisms and Algorithms. Annual Review of
Neuroscience, 13:475–511, 1990.
Bullinaria, J. A. Exploring the Baldwin effect in evolving adaptable
control systems. In Connectionist models of learning, development
and evolution, pages 231–242. Springer, 2001.
Bullinaria, J. A. From biological models to the evolution of
robot control systems. Philosophical Transactions: Mathematical,
Physical and Engineering Sciences, pages 2145–2164, 2003.
Bullinaria, J. A. The effect of learning on life history evolution.
In Proceedings of the 9th annual conference on Genetic and
evolutionary computation, pages 222–229. ACM, 2007a.
Bullinaria, J. A. Understanding the emergence of modularity in neural
systems. Cognitive science, 31(4):673–695, 2007b.
Bullinaria, J. A. Evolved Dual Weight Neural Architectures to
Facilitate Incremental Learning. In IJCCI, pages 427–434, 2009a.
Bullinaria, J. A. The importance of neurophysiological constraints for
modelling the emergence of modularity. Computational modelling
in behavioural neuroscience: closing the gap between neurophysiology
and behaviour, 2:188–208, 2009b.
Bullinaria, J. A. Lifetime learning as a factor in life history evolution.
Artificial Life, 15(4):389–409, 2009c.
Bullinaria, J. A. Imitative and Direct Learning as Interacting Factors
in Life History Evolution. Artificial Life, 23, 2017.
Butz, M. V. Learning classifier systems. In Springer Handbook of
Computational Intelligence, pages 961–981. Springer, 2015.
Butz, M. V. and Kutter, E. F. How the mind comes into being:
Introducing cognitive science from a functional and computational
perspective. Oxford University Press, 2016.
Cabessa, J. and Siegelmann, H. T. The super-Turing computational
power of plastic recurrent neural networks. International journal
of neural systems, 24(08):1450029, 2014.
Carew, T. J., Walters, E. T., and Kandel, E. R. Classical conditioning
in a simple withdrawal reflex in Aplysia californica. The Journal
of Neuroscience, 1(12):1426–1437, December 1981.
Carlson, K. D., Nageswaran, J. M., Dutt, N., and Krichmar, J. L. An
efficient automated parameter tuning framework for spiking neural
networks. Neuromorphic Engineering Systems and Applications,
8, 2014.
Carver, C. S. and Scheier, M. F. Attention and self-regulation: A
control-theory approach to human behavior. Springer Science &
Business Media, 2012.
Cervier, D. AI: The Tumultuous Search for Artificial Intelligence,
Chalmers, D. J. The evolution of learning: An experiment in genetic
connectionism. In Proceedings of the 1990 connectionist models
summer school, pages 81–90. San Mateo, CA, 1990.
Chklovskii, D. B., Mel, B., and Svoboda, K. Cortical rewiring and
information storage. Nature, 431(7010):782–788, 2004.
Clark, G. A. and Kandel, E. R. Branch-specific heterosynaptic
facilitation in Aplysia siphon sensory cells. PNAS, 81(8):2577–
2581, 1984.
Cliff, D., Husbands, P., and Harvey, I. Explorations in Evolutionary
Robotics. Adaptive Behavior, 2(1):73–110, 1993.
Clutton-Brock, T. H. The evolution of parental care. Princeton
University Press, 1991.
Coleman, O. J. and Blair, A. D. Evolving plastic neural networks for
online learning: review and future directions. In AI 2012: Advances
in Artificial Intelligence, pages 326–337. Springer, 2012.
Cooper, S. J. Donald O. Hebb’s synapse and learning rule: a history
and commentary. Neuroscience and Biobehavioral Reviews, 28(8):
851–874, January 2005.
Damasio, A. R. The feeling of what happens: Body and emotion in
the making of consciousness. Houghton Mifflin Harcourt, 1999.
DARPA-L2M. DARPA, Lifelong Learning Machines.
https://www.fbo.gov/spg/ODA/DARPA/CMO/HR001117S0016/listing.html,
accessed July 2017, 2017.
Darwin, C. On the origin of species by means of natural selection, or
The preservation of favoured races in the struggle for life. Murray,
London, 1859.
Dawkins, R. The evolution of evolvability. On growth, form and
computers, pages 239–255, 2003.
De Botton, A. Six areas that artificial emotional intelligence will
alain-de-botton-artificial-emotional-intelligence, 2016. Accessed:
de Vladar, H. P. and Szathmáry, E. Neuronal boost to evolutionary
dynamics. Interface focus, 5(6):20150074, 2015.
Deary, I. J., Johnson, W., and Houlihan, L. M. Genetic foundations
of human intelligence. Human genetics, 126(1):215–232, 2009.
Deng, L., Hinton, G., and Kingsbury, B. New types of deep
neural network learning for speech recognition and related applications:
An overview. In Acoustics, Speech and Signal Processing
(ICASSP), 2013 IEEE International Conference on, pages 8599–
8603. IEEE, 2013.
Desell, T. Large Scale Evolution of Convolutional Neural Networks
Using Volunteer Computing. arXiv preprint arXiv:1703.05422,
Di Paolo, E. Spike-timing dependent plasticity for evolved robots.
Adaptive Behavior, 10(3-4):243–263, 2002.
Di Paolo, E. A. Evolving spike-timing-dependent plasticity for
single-trial learning in robots. Philosophical Transactions of the Royal
Society, 361(1811):2299–2319, October 2003.
Dobzhansky, T. Genetics of the evolutionary process, volume 139.
Columbia University Press, 1970.
Doidge, N. The brain that changes itself: Stories of personal triumph
from the frontiers of brain science. Penguin, 2007.
Downing, K. L. Supplementing Evolutionary Development Systems
with Abstract Models of Neurogenesis. In et al., D. T., editor, Proceeding
of the Genetic and Evolutionary Computation Conference