
The Evolution of Learning Under Environmental Variability

Authors:
Kai Olav Ellefsen
Department of Computer and Information Science,
Norwegian University of Science and Technology
email: kaiolae@idi.ntnu.no
Abstract
An important unanswered question within the evolution of intelligence is how evolved learning efforts relate to environmental characteristics. Through simulated evolution of simple learning agents, we study two different models of the relationship between environmental variability and evolved learning. We begin with a recently proposed model (the “pq-model”), which suggests that 1) unreliable reinforcing feedback selects against learning and 2) a fixed environment selects for innate strategies, whereas a changing environment selects for learned strategies. The other model we study proposes that, in contrast with point 2 above, intermediate values of environmental stability select for learning, whereas both too stable and too variable environments will select against it. We harmonize these seemingly conflicting models by evolving learning agents across a wide range of environments, varying in levels of stability and reliability of stimuli. Based on our findings, we propose a revised model for how learning evolves under different levels of environmental variability.
Introduction
In an evolutionary context, we would intuitively expect a
learning individual to outperform non-learners in a changing
environment. In a stable environment, on the other hand, we
would expect individuals with innate behaviors to do bet-
ter, in particular if learning is associated with a cost (De-
Witt et al. (1998); Mery and Kawecki (2002)). Dunlap and
Stephens (2009) argue that this view of learning, where a
changing environment promotes plasticity while a static en-
vironment promotes stability (the “learning folk theorem”),
is too simplified. Through a mathematical model (from here
on referred to as the pq-model) and an evolutionary experi-
ment on fruit flies, they identify two different factors of en-
vironmental change that affect the evolution of plasticity in
opposite directions. The first type of change, termed best-action fixity, describes the degree to which the best action to take in the environment is always the same. The second type of change, termed reliability of experience, represents the reliability of reinforcing feedback. According to the pq-model
(Figure 1), a higher reliability of experience selects more
strongly for learning, while a higher fixity of the best action
selects more strongly for non-learning strategies. Intuitively,
[Figure 1: axes Reliability of Experience (q) and Fixity of Best Action (p), both from 0.5 to 1.0; the plane is divided into a “Learning” and a “No Learning” region.]
Figure 1: The pq-model (Dunlap and Stephens (2009)) of evolved learning: Environments where the optimal strategy never changes (p = 1) select most strongly against learning. Environments where experiences cannot be trusted (q = 0.5) do the same. By similar reasoning, environments with (p = 0.5, q = 1) select most strongly for learning.
we can think of it like this: Stable associations (high fixity of best action) do not require learning – thus selecting against it, and noisy sensors (low reliability of experience) make learning difficult or impossible – also selecting against it. Details about the model, such as what the actual values of p and q mean, will be given in the Experimental Setup section.
Another view of how environmental change affects the
learning ability of individuals (Kerr and Feldman (2003);
Dukas (1998)) suggests that the relationship between envi-
ronmental variability and the utility of learning follows the
“Goldilocks principle” (Figure 2): For learning to be beneficial, environmental variability needs to be “just right” – too frequent changes, and learning cannot track them; too infrequent, and evolution can track them alone.
There seems to be a conflict between these two mod-
els for the relationship between environmental change and
plasticity. The type of environmental change studied in the
“Goldilocks principle”-model is what Dunlap and Stephens
termed “fixity of best action”: How often does the best
[Figure 2: Utility of Learning (Low to High) plotted against Environmental Variability (Minimal to Maximal); learning occupies an intermediate band of variability.]
Figure 2: The “Goldilocks principle”-model of evolved
learning: Learning evolves in an intermediate range of envi-
ronmental variability. Too slow or too rapid changes select
against learning.
behavioral strategy in the environment change? According to the pq-model, low levels of best-action fixity are associated with a high learning ability. The Goldilocks principle, on the other hand, states that both extremes should disfavor learning.
We believe the source of the conflict is the way Dunlap and Stephens defined their lowest level of best-action fixity: It is simply not low enough, and instead lies somewhere in the middle of the range of possible environmental change – which would make it a candidate for high levels of learning, according to the Goldilocks principle.
The goal of this paper is to study the pq-model through
an experiment with simulated organisms evolved under different degrees of environmental change, to get a better understanding of how the reliability of experience and fixity of best action influence their learning ability. Next, we will
attempt to unify the pq-model with the Goldilocks princi-
ple model in a single framework. We will also study how
the model changes when we apply the biologically realistic
constraint of a cost of learning (Mery and Kawecki (2002);
Mayley (1996)).
Related Work
The experimental setup of Dunlap and Stephens was based
on a preparation developed by Mery and Kawecki (2002).
Using this preparation, Mery and Kawecki tested and con-
firmed the “learning folk theorem” – that changing environ-
ments select for learning, while stable environments select
against it. Challenging this experimental confirmation was
one of the goals of Dunlap and Stephens, and therefore we
find it relevant to briefly review Mery and Kawecki’s exper-
imental setup and findings.
Mery and Kawecki (2002) wanted to study how learning
evolves under natural conditions, where the costs of learning have to be outweighed by its benefits for learning to evolve. For this reason, they developed an experiment
testing under which circumstances learning would evolve in
a population of fruit flies. The experiment had two differ-
ent phases: In the first phase (training), the flies were sub-
jected to two fruit-flavored media, one of which (alternating
each generation) additionally contained quinine hydrochlo-
ride. Flies showed a strong avoidance of the medium paired
with quinine. In the next phase (testing), the quinine cue was
not present, and flies were again subjected to the media, this
time laying eggs. Flies which had learned a strong associ-
ation between quinine and the fruit flavored medium would
lay most of their eggs on the opposite medium. When breed-
ing the next generation of flies, the eggs from this opposite
medium were selected, creating a selection pressure for flies to learn in order to succeed in reproducing.
The results showed that after several generations of evo-
lution, the presence of quinine in the training phase affected
the choice of oviposition substrate (where to lay eggs) in
the testing phase, an effect which grew stronger as genera-
tions passed – demonstrating that environmental character-
istics can affect evolved learning abilities.
Environmental change and plasticity
Computational experiments are well-suited to study the re-
lationship between environmental change and evolved plas-
ticity, as the environmental change can be tuned exactly as
needed, and as generations can be completed far more rapidly than in animal studies. These facts allow studies
spanning many types of environments and a large number
of generations. Sasaki and Tokoro (1999) studied how rates
of change in an environment affected populations of evolv-
ing individuals, finding that the benefit of learning agents (as
opposed to purely genetically specified agents) was greater
in the more dynamic environments than in more stable ones.
Our previous experiments (Ellefsen (2013)) have sug-
gested a similar trend: Allowing evolution to shape the de-
gree of learning vs genetic specification, we saw hard-coded
individuals evolve in more stable environments, and learn-
ing agents evolve in the less stable ones. Further, we discov-
ered that the most unstable environments had a tendency to
select against learning, a finding also reported in other ex-
periments on the evolution of plasticity (Watson and Wiles
(2002)). Together, these computational studies of the rela-
tionship between rates of environmental change and evolved
learning abilities lend support to the Goldilocks principle
model of learning: Too rapidly or too slowly changing envi-
ronments were both seen to select for hard-coded strategies,
while learning was evolved in environments with intermedi-
ate levels of change.
Learning Costs
The benefits of learning are frequently documented in stud-
ies of interactions between evolution and learning. Many
experiments (see for instance Floreano and Urzelai (2001);
[Figure 3: three timelines (Basic Experiment, Multiple Decisions, Periodic Environmental Changes), each of ten steps divided into experience phases and consequence phases.]
Figure 3: Illustrations of the different phases in the three
different experimental setups we use. See text for details.
Littman (1995); Nolfi and Parisi (1996); Nolfi et al. (1994))
have highlighted how learning, as a supplement to evolution,
gives agents the ability to respond to rapidly changing condi-
tions, to adjust to different environments and morphologies,
and to solve more general problems than evolution could
alone.
When studying the evolution of learning, it is important to
also remember that learning has a cost (DeWitt et al. (1998);
Mayley (1996)). It is the balance between the cost and ben-
efit of learning that decides the final learning strategies fol-
lowed by individuals resulting from an evolutionary process.
An implication of the cost of plasticity is that plasticity in
organisms needs to have adaptive value. When possible,
natural selection will reduce costs by replacing plastic re-
sponses with genetic mechanisms. This will have an influ-
ence on the relationship between environmental change and
evolved learning efforts. Therefore, we study our models
under three conditions: without an externally imposed plas-
ticity cost and with two different levels of this cost.
Experimental setup
All experiments were performed with a discrete-generation
evolutionary algorithm where all individuals had the same
lifespan and fitness was based on the decision(s) made
throughout life. Within this general setup we performed dif-
ferent experimental treatments, with increasing levels of so-
phistication and realism, as explained below.
The basic experiment
The first experiment simulates Dunlap and Stephens’ origi-
nal setup. Their experimental setup consisted of two phases,
the experience phase and the consequence phase. In the ex-
perience phase, fruit flies were exposed to two fruit flavors,
pineapple and orange, for three hours. One of the flavors
was associated with quinine, enabling flies to learn to avoid
one of the fruits in this phase. In the consequence phase,
both fruit flavored media were presented without quinine. In
this phase, flies laid eggs, leading to more eggs from the best
[Figure 4: inputs Pineapple, Orange and Quinine connected to the output neuron OutputEat.]
Figure 4: A neural network learning the association from
food items to edibleness via quinine association. Rounded
rectangles are neurons, the dashed lines are learning links
that are modified by Hebbian learning and the solid line is a
static, negative connection simulating the fruit fly’s negative
response to the quinine (Mery and Kawecki (2002)). Inputs
are binary, representing the presence or absence of the rel-
evant stimulus. The output at any time gives a preference
score for the current fruit, and the highest scoring of the two
is named the preferred (fitness-deciding) fruit.
learners landing on the medium that did not contain quinine
in the experience phase. To vary the environmental stability,
Dunlap and Stephens varied 1) the probability that the eggs laid on the orange were selected for reproduction (p) and 2) the probability that the eggs laid on the medium not associated with quinine were selected for reproduction (q). The first
allows regulation of environmental stability, and the second
allows variation of how predictive the quinine association is
of reproductive success.
In our replication of this experiment, we split the task into an experience and a consequence phase in a similar fashion. In
the experience phase, fruit flies observe both fruits 49 times,
one of the two giving a quinine-stimulus in addition to the
fruit flavor. In the consequence phase, both fruits are pre-
sented again without quinine, and the fly computes a prefer-
ence score for each fruit. The most preferred fruit is visited
(or, equivalently, eaten), and this single visit decides the fit-
ness score (100 for the “good” fruit, -100 for the “bad” fruit).
Note that there is a certain simplification here: The fly only
makes a single decision, which equals laying all its eggs on
one medium. However, in the experiment on real fruit flies,
each fly laid multiple eggs. We decided to make this sim-
plification, as the two extensions we developed for the task
specifically deal with the option of multiple decisions.
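As a concrete illustration (not code from the paper), the Python sketch below scores one lifetime under this basic setup. It assumes a hypothetical agent object exposing observe(fruit, quinine) for experience-phase presentations and preference(fruit) for consequence-phase scores, such as the Hebbian network sketched later in the Neural network learning section; treating p and q as independent per-lifetime probabilities is our reading of the design.

```python
import random

FRUITS = ("pineapple", "orange")

def evaluate_basic(agent, p, q, n_observations=49):
    """Score one lifetime in the basic setup (illustrative sketch)."""
    # p: chance that the orange is the high-fitness ("good") fruit.
    good = "orange" if random.random() < p else "pineapple"
    bad = "pineapple" if good == "orange" else "orange"

    # Experience phase: 49 paired presentations of both fruits.
    for _ in range(n_observations):
        # q: reliability of experience -- the quinine marks the bad
        # fruit with probability q, and the good fruit otherwise.
        marked = bad if random.random() < q else good
        for fruit in FRUITS:
            agent.observe(fruit, quinine=(fruit == marked))

    # Consequence phase: one decision without quinine decides fitness.
    chosen = max(FRUITS, key=agent.preference)
    return 100 if chosen == good else -100
```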
The top part of Figure 3 illustrates this experiment. Note
that this figure is not to scale: In the actual experiment, the
consequence phase was even shorter compared to the expe-
rience phase (1 decision vs. 49 observations). The same
neural network (Figure 4) was used in this and the next ex-
periment. The network can learn the association from food
item to edibleness when the quinine is present via Hebbian
learning (Hebb (1949)), and the learned association is used
when classifying food items without the quinine present.
Extending to multiple decisions
In Dunlap and Stephens’ setup, a single learning phase fol-
lowed by a single decision phase forms the basis for the
reproductive chances of each individual. In a more realis-
tic scenario, survival and reproductive success will depend
on learning experiences and decisions made throughout life.
This is simulated by letting each individual go through 25
minimal experience phases followed by consequence phases
in a lifetime. In this case, experience phases consist of a
single observation of each fruit (paired with quinine when
appropriate), and the consequence phase is identical, but
without the quinine. Individuals receive fitness points for
each consequence phase (+1 for the “good” fruit, −1 for
the “bad” fruit), and the total fitness equals the sum of fit-
ness points gained in all the consequence phases.
Notice that this setup (Figure 3, middle) also deviates
from natural learning processes in that it sharply separates
learning-periods and fitness-determining periods.
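A minimal sketch of how the multiple-decisions variant could be scored is given below, reusing the same hypothetical observe/preference interface as in the basic-setup sketch; sampling p and q independently in every mini-phase follows the later statement that both values are tested 25 times per individual.

```python
import random

FRUITS = ("pineapple", "orange")

def evaluate_multiple_decisions(agent, p, q, n_phases=25):
    """Score one lifetime of 25 minimal experience/consequence pairs."""
    fitness = 0
    for _ in range(n_phases):
        good = "orange" if random.random() < p else "pineapple"
        bad = "pineapple" if good == "orange" else "orange"

        # Experience: a single observation of each fruit, quinine on
        # the bad fruit with probability q (reliability of experience).
        marked = bad if random.random() < q else good
        for fruit in FRUITS:
            agent.observe(fruit, quinine=(fruit == marked))

        # Consequence: one decision worth +1 or -1 fitness points.
        chosen = max(FRUITS, key=agent.preference)
        fitness += 1 if chosen == good else -1
    return fitness
```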
Extending to periodic environmental changes
The final and most realistic model has been used in previous studies of the evolution of associative learning (Todd and
Miller (1991); Ellefsen (2013)). This model views learning
and fitness-evaluation as overlapping, continuously occur-
ring processes throughout individuals’ lifetimes. The model
contains enough detail to allow us to harmonize the two the-
ories we focus on in this paper.
To move away from looking at learning and fitness mea-
surement as two processes happening at different times, we
need to slightly alter the decision task. We cannot rely on the
quinine as the cue for associative learning, because the qui-
nine naturally divides the learning task into separate training
and testing phases. Instead, we look to a similar experiment,
where associations are continuously tested and trained by
reinforcing stimuli: Agents are presented with one of the
two edible substances, and decide if they want to eat it or
not. If they decide to eat it, they get a reinforcing feedback
in the next time step indicating whether the food was poi-
sonous or edible. This way, there are no separate training and test phases: In each time step, agents receive a new food
item and any reinforcing feedback from their previous de-
cision. Agents receive a fitness point whenever eating the
edible substance, and lose a point whenever eating the poi-
sonous one.
In the original experiment, the best action fixity (p) was
regulated to make the best action change at most once per
generation. In our experiment, this is not enough: That
change rate will not allow us to see the full range of evolved
plasticity levels necessary to observe the Goldilocks princi-
ple. Instead, we make the environment change periodically
– meaning we can regulate the rate of change freely, by ad-
justing the average length of stable periods.
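To make this concrete, the sketch below runs one lifetime of the continuous task: at each time step the agent sees one substance, decides whether to eat it, and receives possibly unreliable feedback one step later, while the identity of the edible substance flips periodically. The decide/reinforce method names and a stability period expressed in time steps within a single lifetime are illustrative assumptions, not details from the paper.

```python
import random

def evaluate_continuous(agent, steps, period, q):
    """One lifetime of the eat/avoid task with periodic changes (sketch)."""
    fitness = 0
    edible = "orange"              # which substance is currently edible
    pending = None                 # feedback arrives one time step late

    for t in range(steps):
        if period and t > 0 and t % period == 0:
            # Periodic environmental change: the roles of the two
            # substances are swapped.
            edible = "pineapple" if edible == "orange" else "orange"

        if pending is not None:    # deliver last step's reinforcement
            agent.reinforce(*pending)
            pending = None

        food = random.choice(("orange", "pineapple"))
        if agent.decide(food):
            good = (food == edible)
            fitness += 1 if good else -1
            # Reliability of experience: the signal is correct with
            # probability q, and inverted otherwise.
            correct = random.random() < q
            signal = "reward" if (good == correct) else "punishment"
            pending = (food, signal)

    return fitness
```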
This setup is illustrated at the bottom of Figure 3. Com-
pared to Dunlap and Stephens’ original setup, it differs in
[Figure 5: inputs Pineapple and Orange plus modulatory inputs Reward and Punishment connected to the output neuron OutputEat.]
Figure 5: A neural network learning the association from
food items to edibleness with continuous reinforcement.
Rounded rectangles represent neurons. The dashed lines are
learning links that are modified by neuromodulated Hebbian
learning. The solid lines are neuromodulatory connections.
Inputs are binary, representing the presence or absence of
the relevant stimulus. The output at any time gives a prefer-
ence score for the current fruit. Any fruit preferred above a
threshold of 0.5 is “eaten”.
two important ways: 1) There are no separate experience and consequence phases. Instead, individuals constantly adapt to environmental changes and are evaluated for fitness throughout life. 2) Environmental changes are periodic, allowing us to
regulate their rate between the two extremes of no stability
and full environmental stability.
The network used to learn this task (Figure 5) uses neu-
romodulated reinforcement (Soltoggio et al. (2007); Ellef-
sen (2013)) to form preferences. The neuromodulatory input
changes the learning rate temporarily for the links it targets,
allowing efficient reinforcement learning.
Parameters
The experimental parameters from Dunlap and Stephens’
experiment, fixity of best action (p) and reliability of expe-
rience (q), are also the most important parameters here. In
the different experimental setups we employ, p has slightly different meanings as a natural consequence of the differences between the setups. These differences are explained here. q in all cases has the same meaning as in the original experiment by Dunlap and Stephens: “The chance that
the reinforcing signal (quinine in Figure 4, reward and pun-
ishment in Figure 5) is correct.” In other words, this indicates the degree to which the individual can successfully learn the task (and thus get a high fitness score) by following the
reinforcing feedback. This parameter ranges from 0.5 to 1,
the lowest value meaning the signal is correct 50% of the
time (making it useless for this binary decision task), and 1
meaning it is correct 100% of the time.
In the basic setup, p equals the chance that the fruit yielding the high fitness will be the orange. It varies from 0.5 to 1, as the values from 0 to 0.5 would be symmetrical. This is identical to its role in the original experiment by Dunlap and Stephens. High values of p indicate a stable environment,
where genetically encoded strategies may be most benefi-
cial.
In the multiple-decision setup, p still signals the chance
that the fruit yielding the high fitness will be the orange.
However, since we have multiple experience and conse-
quence phases here, this value (as well as q) is tested 25
times per individual instead of once.
In the periodically changing setup, p signals the stability period of the environment, measured in generations. A value of 1 means the environment changes once each generation. This is similar to a p-value of 0.5 in the basic setup, but not identical, as we are now dealing with periodic instead of random changes. A value above 1 means the environment changes less often than once per generation. This is similar to a p-value between 0.5 and 1 in the basic setup, with values closer to 1 corresponding to a longer period of environmental stability. Stability periods below 1 indicate several environmental changes per generation. For instance, p = 0.25
means we have 4 changes each generation. The situation
with changes within a generation was not considered in the
original setup by Dunlap and Stephens.
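As a small illustration of this mapping (a hypothetical helper, not from the paper), the function below converts a stability period p, expressed in generations, into the time steps at which the environment flips, assuming a fixed number of time steps per generation.

```python
def change_points(p, n_generations=50, steps_per_generation=50):
    """Time steps at which the environment changes, for stability period p."""
    period = max(1, round(p * steps_per_generation))  # period in time steps
    total = n_generations * steps_per_generation
    return list(range(period, total, period))

# Example: p = 0.25 with 50 steps per generation gives a change
# roughly every 12-13 steps, i.e. four changes per generation.
```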
Neural network learning
As discussed above, we utilize two neural network models
in this work, one employing purely Hebbian learning, and
the other neuromodulated learning. The former (Figure 4)
utilizes a simple rule that increases the connection strength
between co-active neurons:
Δw_ij = η · x_i · x_j        (1)

where η is the evolved learning rate and x_i · x_j is the product of pre-synaptic and post-synaptic activity. This weakens the relevant connection if the quinine is active (it supplies an inhibitory input, making x_j negative), and strengthens it otherwise. This produces associative learning dynamics similar to those in the experiment by Dunlap and Stephens.
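As a minimal sketch of how the Figure 4 network and the rule in Eq. (1) could be realized, the class below uses a linear output and a fixed quinine weight of −1; those details, and the method names, are assumptions rather than specifics from the paper. It matches the observe/preference interface assumed in the earlier lifetime-evaluation sketches.

```python
class HebbianFlyNet:
    """Sketch of the Figure 4 network with plain Hebbian learning."""

    QUININE_WEIGHT = -1.0                    # static inhibitory connection

    def __init__(self, w_pineapple, w_orange, eta, decay=0.0):
        self.w = {"pineapple": w_pineapple, "orange": w_orange}
        self.eta = max(0.0, eta)             # negative evolved rates -> no learning
        self.decay = decay                   # per-step weight decay (lambda)

    def _output(self, fruit, quinine):
        # Linear output: learned fruit weight plus the quinine inhibition.
        return self.w[fruit] + (self.QUININE_WEIGHT if quinine else 0.0)

    def observe(self, fruit, quinine):
        """Experience-phase presentation: Hebbian update of Eq. (1)."""
        x_i = 1.0                            # the presented fruit's input is active
        x_j = self._output(fruit, quinine)   # post-synaptic activity
        self.w[fruit] += self.eta * x_i * x_j   # weakened if quinine is present
        self.w[fruit] *= 1.0 - self.decay

    def preference(self, fruit):
        """Consequence-phase preference score, computed without quinine."""
        return self._output(fruit, quinine=False)
```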
The evolved neuromodulated links (Figure 5) update their
weights by the following equation:
Δw_ij = η · mod · |x_i · x_j|        (2)

This weight is updated by the absolute value of a Hebbian term multiplied by the modulatory signal, mod. Thereby, the modulation can regulate the direction of the weight change, leading active links to weaken when eating punishers and strengthen when eating rewarding food items.
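A corresponding sketch for the Figure 5 network with the neuromodulated rule of Eq. (2) is given below; the 0.5 eating threshold follows the figure caption, while the unit-valued modulation and the method names are our assumptions. It matches the decide/reinforce interface assumed in the continuous-task sketch above.

```python
class NeuromodulatedFlyNet:
    """Sketch of the Figure 5 network with neuromodulated Hebbian learning."""

    def __init__(self, w_pineapple, w_orange, eta, decay=0.0):
        self.w = {"pineapple": w_pineapple, "orange": w_orange}
        self.eta = max(0.0, eta)             # negative evolved rates -> no learning
        self.decay = decay                   # per-step weight decay (lambda)

    def decide(self, fruit):
        # Eat any fruit whose preference score exceeds the 0.5 threshold.
        return self.w[fruit] > 0.5

    def reinforce(self, fruit, signal):
        """Delayed feedback for a previous decision: update of Eq. (2)."""
        mod = 1.0 if signal == "reward" else -1.0   # neuromodulatory signal
        x_i, x_j = 1.0, self.w[fruit]               # pre-/post-synaptic activity
        self.w[fruit] += self.eta * mod * abs(x_i * x_j)
        self.w[fruit] *= 1.0 - self.decay
```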
Evolved variables
In all three experimental setups, three parameters of the neural network were evolved: the innate weight vector (w), the learning rate (η) and the weight decay per time step (λ). w codes for the two weights from Pineapple and Orange to OutputEat. Evolving w allows evolution to regulate innate behaviors, thus removing the need to learn in static or very slowly changing environments.

Parameter                 Value
Generations               50
Adults                    50
Children                  50
Crossover probability     0.01
Mutation probability      0.005
Genes per individual      4
Bits per gene             8
Elite fraction            0.1
Culling fraction          0.1

Table 1: Parameters of the Evolutionary Algorithm

η is the parameter we
are mainly interested in measuring – it tells us which learn-
ing efforts are evolved for each environmental setup. Finally,
λ is present for the experiments where the optimal behavior changes within individual lifetimes, to allow regulation of how rapidly behaviors are forgotten. Regulating λ and η together lets evolution tune the forgetting and re-learning of behaviors to find an optimal level of plasticity. All parameters were encoded and evolved as 8-bit genes translated into floating-point values for fitness evaluation. The ranges for the evolved variables were: weights [0, 1], η [−1, 1] (negative learning rates were rounded up to zero, allowing evolution to easily remove learning completely) and λ [0, 0.5].
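A possible decoding of such a genotype is sketched below; the gene order and the linear scaling of 8-bit integers onto the stated ranges are assumptions on our part.

```python
def decode_genotype(bits):
    """Decode 4 genes x 8 bits into (w_pineapple, w_orange, eta, lambda)."""
    assert len(bits) == 32

    def gene(i, lo, hi):
        # Interpret 8 bits as an integer in [0, 255], then scale linearly.
        raw = sum(b << k for k, b in enumerate(reversed(bits[8 * i:8 * i + 8])))
        return lo + (hi - lo) * raw / 255.0

    w_pineapple = gene(0, 0.0, 1.0)
    w_orange    = gene(1, 0.0, 1.0)
    eta         = max(0.0, gene(2, -1.0, 1.0))   # negative rates -> no learning
    lam         = gene(3, 0.0, 0.5)
    return w_pineapple, w_orange, eta, lam
```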
Evolutionary setup
Table 1 gives the parameters of the applied evolutionary
algorithm. Crossover probability gives the probability of
crossover per individual, and mutation probability gives the
probability of mutation per bit in the individuals’ genotype.
The values coded for by the four genes are w (2 genes), η and λ. See the previous section for their meaning.
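The sketch below shows a minimal generational EA consistent with Table 1; the parent-selection scheme (uniform choice among the non-culled individuals) and the one-point crossover operator are assumptions, since the paper does not specify them.

```python
import random

def evolve(fitness_fn, generations=50, pop_size=50, genome_len=32,
           p_crossover=0.01, p_mutation=0.005,
           elite_frac=0.1, cull_frac=0.1):
    """Minimal generational EA matching the Table 1 parameters (sketch)."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]

    for _ in range(generations):
        ranked = sorted(pop, key=fitness_fn, reverse=True)
        n_elite = int(elite_frac * pop_size)
        n_cull = int(cull_frac * pop_size)
        parents = ranked[:pop_size - n_cull]            # worst fraction never reproduces

        children = [list(g) for g in ranked[:n_elite]]  # elites copied unchanged
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = list(a)
            if random.random() < p_crossover:           # per-individual crossover
                cut = random.randrange(1, genome_len)
                child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < p_mutation)   # per-bit mutation
                     for bit in child]
            children.append(child)
        pop = children

    return max(pop, key=fitness_fn)
```

In the experiments described above, fitness_fn would decode a genotype with decode_genotype, build one of the networks sketched earlier and run the appropriate lifetime evaluation.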
Results
The basic experiment
The first thing we wanted to investigate was whether Dunlap and Stephens’ pq-model for learning was correct for their suggested fruit fly experiment. Dunlap and Stephens confirmed their model through experiments on real fruit flies for two extremes of the model: (p = 0.5, q = 1.0) and (p = 1, q = 0.5). We wanted to investigate evolved learning efforts also for the rest of the possible p- and q-values.
To do this, we systematically varied p- and q-values, and
evolved agents for 50 generations. 20 runs were done for
each parameter combination, and the average learning rate
of all agents in the final generation was recorded (Figure 6).
Dunlap and Stephens’ model proposed that the border be-
tween the learning region and the non-learning region would
be at p = q, and the evolved individuals lend support to their
hypothesis (Figure 6a). Their model makes no prediction for
[Figure 6: three heat maps, (a) No Cost, (b) Cost 0.05 and (c) Cost 0.1, over Q: Reliability of Experience (0.5–1) and P: Fixity of Best Action (0.5–1); color indicates the evolved learning rate.]
Figure 6: The evolved plasticity levels under the basic experimental setup. Colors correspond to evolved learning rates (η). The
colors are scaled differently for different figures, to emphasize their internal variations. – Results averaged over 20 runs
the case where learning is associated with a cost, but our re-
sults (Figure 6b and 6c) indicate that in this case, the border
will shift, making the non-learning region larger.
The results for the individuals evolved with a cost of
learning are quite noisy, due in part to the very low levels
of learning effort that were evolved. The reason the levels
can be so low is that, in this experiment, 49 presentations
of stimuli are available to learn from before being tested. In
the rest of our experiments, this simplifying assumption (of
a long training period before testing) is removed.
Multiple decisions
Extending the experience to multiple decisions has these
beneficial effects:
1. It makes the experiment more realistic – natural organ-
isms are likely to have their survival- and reproduction
ability depend on several decisions rather than a single
one.
2. It makes the evolutionary search proceed more efficiently,
as it is now possible to identify intermediate fitness lev-
els. In other words, there are more “stepping stones” for
evolution to visit on the way to a good solution.
The results (Figure 7) are similar to those seen before, but cleaner and more systematic. In particular, the results for situations where plasticity is associated with a cost are different.
In the basic setup, these plasticity levels were very low, but
here, we see they have a magnitude comparable to the situa-
tion without costs. We now see clearly that the learning/no-
learning border is shifted as a result of the cost.
Periodic changes
Changing the environment to contain periodic changes allows us to smoothly alter the best-action fixity across all imaginable values: from a completely stable environment
to one that changes constantly. The periodic changes also
imply that the environment has a state which the agent can
[Figure 7: three heat maps, (a) No Cost, (b) Cost 0.05 and (c) Cost 0.1, over Q: Reliability of Experience (0.5–1) and P: Fixity of Best Action (0.5–1); color indicates the evolved learning rate.]
Figure 7: The evolved plasticity levels under the setup with multiple decisions. Colors correspond to evolved learning rates (η).
The colors are scaled differently for different figures, to emphasize their internal variations. – Results averaged over 20 runs
[Figure 8: three heat maps, (a) No Cost, (b) Cost 0.05 and (c) Cost 0.1, over Q: Reliability of Experience (0.5–1) and P: Generations Between Changes (0.05–40); color indicates the evolved learning rate.]
Figure 8: The evolved plasticity levels under the setup with periodic environmental changes. Colors correspond to evolved
learning rates (η). The colors are scaled differently for different figures, to emphasize their internal variations. The value p
discussed here is defined differently than the one in other plots: It now signals the average number of generations between
environmental changes. – Results averaged over 20 runs
learn. The previous environment also had a state, but it lasted only for a single experience-consequence phase pair; after one set of phases the environment was changed randomly. In other words, there was no reason for the agent to learn lasting associations – rather, it was encouraged to learn only by the somewhat artificial variation we imposed between experience and consequence phases.
Evolved learning rates in this environment (Figure 8) are
not directly comparable to the ones in the two previous ex-
periments, because the separation between the experience-
and consequence phase in those experiments made it impos-
sible to realize an environment that was changing too fast
to learn. The association observed in the experience phase
would always last until the consequence phase, and could
therefore always be learned.
Although the experiments are not directly comparable, we
can observe some relationships between the basic experi-
ment and the one with periodic changes. In the basic exper-
iment, a p-value of 0.5 means the environment will have a 50% chance of changing each generation. This corresponds roughly to a stability period of 1-2 generations. Increasing the stability period from 1 corresponds to increasing p in the basic experiment, making the best action more stable. As seen in Figures 6 and 7, this has the effect of lowering the evolved learning rates. The same is observed in Figure 8 for p-values increasing above 1. In fact, the top half of Figure 8a (the part above p = 1) is quite similar to Figures 6a and
7a.
In light of this comparison, we can see what the original
experiment by Dunlap and Stephens was missing: environ-
ments with a stability period shorter than a single generation.
For such environments (Figure 8), decreasing the best-action
fixity substantially will select against learning, as a result
of learned associations more rapidly losing their utility. In
other words, the benefits of learning decrease as the amount
of time learned associations can be exploited decreases.
Summarizing, we can see that the revised pq-model (Fig-
ure 9) of learning has the following properties:
1. An intermediate range of best-action fixity levels selects for learning.
For environments where the best action changes too fre-
quently or too infrequently, learning is selected against.
2. An increased reliability of experience selects for learning.
3. Plasticity levels increase the further away from the non-
learning zone we are.
Conclusion
Through simulated evolution of learning individuals under
different levels of environmental change, we have studied
and unified two seemingly conflicting models of the evo-
lution of learning ability.
The pq-model captures two important properties of the
evolution of learning: Learning requires 1) environmen-
tal changes and 2) reliable experiences to evolve. The
Goldilocks principle model suggests that a range of environ-
mental variability selects for learning, while both too stable
and too variable environments select against it.
Initially, we saw results lending support to the pq-model
for the original experiment it was designed for. We later
[Figure 9: the p-q plane with Reliability of Experience (q) from 0.5 to 1.0 and Fixity of Best Action (p) from Min to Max; “No Learning” regions at both extremes of p surround a central “Learning” region.]
Figure 9: The revised pq-model (see Figure 1 for original)
of how learning relates to environmental change. Darker
shades of gray indicate higher levels of plasticity.
showed that the model also holds for more realistic scenar-
ios and for scenarios where learning has a cost. Applying
an explicit cost to learning ability moved the “neutral re-
gion” of learning, making more areas in the model select
for non-learning strategies, as one might expect. To study
the relationship between the pq-model and Goldilocks prin-
ciple, we needed to generalize the experiment Dunlap and
Stephens originally proposed and study the full range of en-
vironmental change, from full stability to constant changes.
The results led us to suggest a revised pq-model (Figure 9).
In conclusion, we want to emphasize that this model by
no means tells the full truth about the relationship between
environmental variability and evolved learning efforts. The
model shows one part of the truth, but it leaves out sev-
eral issues known to be important to this relationship, such
as age-varying learning efforts (Knudsen (2004)) and the
complex interaction between agents and their environments
(Beer (1995)). Further studies of these and other related top-
ics will increase our understanding of how learning evolves,
and how the evolved learning is influenced by environmental
properties.
References
Beer, R. D. (1995). A dynamical systems perspective on agent-
environment interaction. Artificial Intelligence, 72(1-2):173–
215.
DeWitt, T. J., Sih, A., and Wilson, D. S. (1998). Costs and lim-
its of phenotypic plasticity. Trends in Ecology & Evolution,
13(2):77–81.
Dukas, R. (1998). Evolutionary ecology of learning. In Cognitive
ecology: The evolutionary ecology of information processing
and decision making, pages 129–174. University of Chicago
Press, Chicago.
Dunlap, A. S. and Stephens, D. W. (2009). Components of
change in the evolution of learning and unlearned prefer-
ence. Proceedings of the Royal Society B: Biological Sci-
ences, 276(1670):3201–3208.
Ellefsen, K. O. (2013). Balancing the Costs and Benefits of Learn-
ing Ability. In Advances in Artificial Life, ECAL 2013: Pro-
ceedings of the Twelfth European Conference on the Synthesis
and Simulation of Living Systems, pages 292–299.
Floreano, D. and Urzelai, J. (2001). Evolution of Plastic Control
Networks. Autonomous Robots, 11(3):311–317.
Hebb, D. O. (1949). The Organization of Behavior. Wiley and
Sons, New York.
Kerr, B. and Feldman, M. (2003). Carving the Cognitive Niche:
Optimal Learning Strategies in Homogeneous and Hetero-
geneous Environments. Journal of Theoretical Biology,
220(2):169–188.
Knudsen, E. I. (2004). Sensitive Periods in the Development of
the Brain and Behavior. Journal of Cognitive Neuroscience,
16(8):1412–1425.
Littman, M. (1995). Simulations Combining Evolution and Learn-
ing. In Adaptive Individuals in Evolving Populations: Models
and Algorithms: Santa Fe Institute Studies in the Sciences of
Complexity, pages 465–477. Addison-Wesley.
Mayley, G. (1996). Landscapes, Learning Costs, and Genetic As-
similation. Evolutionary Computation, 4(3):213–234.
Mery, F. and Kawecki, T. (2002). Experimental Evolution of
Learning Ability in Fruit Flies. Proceedings of the National
Academy of Sciences, 99(22):14274–14279.
Nolfi, S. and Parisi, D. (1996). Learning to Adapt to Changing
Environments in Evolving Neural Networks. Adaptive Be-
havior, 5(1):75–98.
Nolfi, S., Parisi, D., and Elman, J. L. (1994). Learning and Evolu-
tion in Neural Networks. Adaptive Behavior, 3(1):5–28.
Sasaki, T. and Tokoro, M. (1999). Evolving Learnable Neu-
ral Networks under Changing Environments with Various
Rates of Inheritance of Acquired Characters: Comparison be-
tween Darwinian and Lamarckian Evolution. Artificial Life,
5(3):203–223.
Soltoggio, A., Dürr, P., Mattiussi, C., and Floreano, D. (2007).
Evolving neuromodulatory topologies for reinforcement
learning-like problems. In Proceedings of the IEEE Congress
on Evolutionary Computation 2007, pages 2471–2478. IEEE.
Todd, P. M. and Miller, G. F. (1991). Exploring adaptive agency
II: Simulating the evolution of associative learning. In From
Animals to Animats: Proceedings of the First International
Conference on Simulation of Adaptive Behavior, pages 306–
315, Cambridge, MA. MIT Press/Bradford Books.
Watson, J. and Wiles, J. (2002). The rise and fall of learning: a
neural network model of the genetic assimilation of acquired
traits. In Proceedings of the 2002 Congress on Evolutionary
Computation, pages 600–605. IEEE.