Learning and Evolution: Factors Influencing
an Effective Combination
Paolo Pagliuca
Institute of Cognitive Sciences and Technologies (ISTC), National Research Council (CNR), Via Giandomenico
Romagnosi 18A, 00196 Rome, Italy; paolo.pagliuca@istc.cnr.it
Abstract: (1) Background: The mutual relationship between evolution and learning is a controversial
argument among the artificial intelligence and neuro-evolution communities. After more than three
decades, there is still no common agreement on the matter. (2) Methods: In this paper, the author
investigates whether combining learning and evolution permits finding better solutions than those
discovered by evolution alone. In further detail, the author presents a series of empirical studies that
highlight some specific conditions determining the success of such a combination. Results are obtained
in five qualitatively different domains: (i) the 5-bit parity task, (ii) the double-pole balancing problem,
(iii) the Rastrigin, Rosenbrock and Sphere optimization functions, (iv) a robot foraging task and (v) a
social foraging problem. Moreover, the first three tasks represent benchmark problems in the field
of evolutionary computation. (3) Results and Discussion: The outcomes indicate that the effect of
learning on evolution depends on the nature of the problem. Specifically, when the problem implies
limited or absent agent–environment conditions, learning is beneficial for evolution, especially with
the introduction of noise during the learning and selection processes. Conversely, when agents are
embodied and actively interact with the environment, learning does not provide advantages, and the
addition of noise is detrimental. Finally, the absence of stochasticity in the experienced conditions is
paramount for the effectiveness of the combination. Furthermore, the length of the learning process
must be fine-tuned based on the considered task.
Keywords: evolution; learning; memetic algorithms; stochastic hill-climbing; evolutionary strategies
1. Introduction
The interplay between learning and evolution has been studied for decades, but it
is still a very controversial topic. Despite the huge amount of work, to what extent the
interaction between learning and evolution actually fosters the development of successful
behaviors is still a matter of debate in the scientific community. Indeed, as reported in [
1
,
2
],
there exist some controversial arguments about the effect of learning on evolution. Some
studies revealed how learning accelerates evolution [
3
18
], while other works demonstrated
that learning does not provide any advantage on the course of evolution [1925].
As explained in [26], “Evolution and learning (or phylogenetic and ontogenetic adap-
tation) are two forms of biological adaptation that differ in space and time. Evolution is a
process of selective reproduction and substitution based on the existence of a population
of individuals displaying variability at the genetic level. Learning, instead, is a set of
modifications taking place within each single individual during its own lifetime. Evolution
and learning operate on different time scales. Evolution is a form of adaptation capable of
capturing relatively slow environmental changes that might encompass several generations
(e.g., the perceptual characteristics of food sources for a given species). Learning, instead,
allows an individual to adapt to environmental modifications that are unpredictable at
the generational level. Learning might include a variety of mechanisms that produce
adaptive changes in an individual during its lifetime, such as physical development, neural
maturation, variation of the connectivity between neurons, and synaptic plasticity. Finally,
whereas evolution operates on the genotype, learning affects only the phenotype and
phenotypic modifications cannot directly modify the genotype”. Therefore, learning and
evolution provide two alternative frameworks [27] to understand the adaptive changes
allowing evolving agents to behave more effectively based on the particular environment
they are situated in. However, learning and evolution might concur in the development
of complex behaviors. The first work trying to highlight the positive effect of learning on
evolution is presented in [8]. The authors considered a simple experimental setting, where
genotype and phenotype representations are trivial and their relationship is immediate.
The genotype is a string of bits, while the phenotype is a neural network. Given a genotype,
a 1-bit corresponds to the presence of a particular connection in the network. Conversely,
a 0-bit means that the corresponding connection is absent. In the abstract task used by
the authors, only a specific combination of genes (i.e., a genotype with all 1-bits) obtains a
fitness score of 1, whereas all other genotypes receive a fitness score of 0. To study the effect
of the combination of evolution and learning, the authors considered a control situation in
which the alleles could also assume a “*” value and learning operated by simply assigning
random values to these alleles. The collected results demonstrated how the combination
of learning and evolution permits finding the solution of the problem, while the use of
learning or evolution alone fails. To date, however, this method has not been successfully
applied to realistic problems, such as, for example, the evolution of robots selected based
on their capability to solve a problem that cannot be addressed by using evolution alone.
The seminal work [8] has been a source of inspiration for many other works. Some studies stated that the combination of evolution and learning is advantageous compared to the application of single approaches [3,6,7,9,16,17,28–30], while others showed that learning actually decelerates evolution [19–22,24,25,31]. Moreover, some works focused on the analysis of the benefits and limitations of plasticity/learning or the conditions under which learning accelerates/decelerates evolution [32–40]. In spite of the huge amount of
publications arguing that learning has a beneficial/detrimental effect on evolution, there
are aspects that have not yet been considered and require a deeper analysis. In particular,
two weak points can be found in most of the cited studies. First, the majority of these
works do not consider the computational costs, a crucial factor in determining the supe-
riority/inferiority of a method over another. Evaluating different algorithms under the
same evolutionary conditions and for the same duration is pivotal to shed light on and
untangle this topic. Second, most of the conducted studies have been carried out by using
trivial and often abstract problems specifically designed to test the proposed algorithms.
Instead, the main goal of this work is to verify whether the combination of learning and
evolution might be advantageous over traditional approaches in widespread and more
challenging domains.
In this work, the author introduces a new technique, called Stochastic Steady State
with Hill-Climbing (SSSHC), which combines an Evolutionary Algorithm (EA), the Stochas-
tic Steady State (SSS) [41], with an optimization method performing solution modifications. The SSSHC algorithm belongs to the family of Memetic Algorithms (MAs) [42,43]. MAs
operate at a population level like traditional EAs, but they allow the refinement of solu-
tions through a local search technique retaining adaptive modifications. This makes it possible to improve the quality of the solutions found and helps to avoid the premature convergence
issue. In [44], the authors demonstrate the superiority of MAs over EAs in the hurdle problems [45]. A memetic algorithm combining a classic genetic algorithm (GA) [46] and hill-climbing (HC) [47] as a refinement strategy has been proposed in [48]. It is worth high-
lighting that the term “learning” here denotes a refinement process consisting of repeatedly
generating modifications of current solutions, verifying whether such changes produce
an advantage and retaining positive modifications. Specifically, the learning process used
in the presented experiments is obtained through a stochastic hill-climbing process [49], exploring the search space and looking for adaptive solutions. Variations concurring with such adaptive traits are inherited in the population. The reader can imagine this process as a form of Lamarckian learning [50,51]. Lamarckian learning has been successfully used
for solving function optimization problems [17], evolving solutions in multi-agent environments [51], training recurrent neural networks [52,53], training convolutional neural networks for image classification [54] and evolving robot body and brain [55–59]. A comparison of the performance of GA and MA using both Baldwinian and Lamarckian learning is reported in [60]. In addition, in [61] the authors show how the combination of evolution and Lamarckian learning enables the discovery of more effective solutions than the single approaches in two robotic scenarios. The presented examples indicate that, under particular circumstances, learning promotes the identification of behaviors superior to those provided through evolution.
The aim of this paper is to investigate whether the combination of learning and evolution permits finding better solutions than evolution alone and what factors foster the success or failure of such a combination. This has been examined in five different domains: (i) the well-known 5-bit parity task, (ii) the popular double-pole balancing problem [62], a benchmark task largely used to compare different evolutionary algorithms [41,63–66], (iii) the Rastrigin [67], Rosenbrock [68] and Sphere optimization functions, which constitute benchmark problems in evolutionary computation and optimization [69,70], (iv) a robot foraging task and (v) a social foraging task already used to evaluate dynamics, adaptation and emergent behaviors in two-agent systems [71–73]. The collected results indicate that the
combination of evolution and learning allows discovering better solutions than evolution
alone when (i) the task implies absent or limited agent–environment interactions and (ii) the
problem is deterministic, i.e., the experienced conditions are kept constant throughout
the whole evolutionary process. Under these circumstances, the addition of noise to the
learning process is advantageous and the computational cost required by the combination
of learning and evolution to find out a suitable solution is lower. Conversely, learning does
not improve the quality of solutions discovered by evolution when the considered problem
involves embodied agents interacting with the environment, and the addition of noise has
a detrimental effect. Finally, the addition of learning to evolution does not provide advan-
tages in the function optimization scenario. A noteworthy aspect concerns the fact that the
comparison has been performed by using the same parameter settings in order to avoid
biases in the results. In particular, different combinations of parameters have been system-
atically analyzed and the outcomes are provided in Appendix A. The author did not use automatic algorithm configuration methods [74,75], since the full list of param-
eters to be configured requires significantly more powerful computational resources and
goes beyond the scope of this research. In fact, not only the algorithm’s parameters but also
the controller’s parameters have to be optimized. Moreover, the considered problems have
already been addressed in the literature [41,63–66,70,72,76,77], and some of the parameters are derived from these works.
The main contributions of this work can be summarized as follows:
• an analysis of the factors promoting a successful combination of evolution and learning is provided;
• the conditions under which learning is not beneficial to evolution are investigated and discussed;
• a novel evolutionary algorithm, called Stochastic Steady State with Hill-Climbing (SSSHC), is proposed and compared with the two state-of-the-art methods constituting it in five different experimental scenarios;
• the combination of learning and evolution is beneficial when there are limited or absent agent–environment interactions and the experienced conditions do not change during evolution. Under these circumstances, the addition of noise to both the selection and the learning processes provides advantages to evolution;
• learning does not improve the solutions found by evolution when the considered problem involves embodied agents actively interacting with the environment. Moreover, in these scenarios, the addition of learning has a detrimental effect;
• learning is not effective at increasing the performance of solutions discovered by evolution in the function optimization setting.
In the next section, the author presents the methodology used to investigate the factors
fostering the emergence of a beneficial effect of learning on evolution. Specifically, in
Section 2.1, the evolutionary tasks used in this work are illustrated, while Section 2.2
contains a description of the different evolutionary algorithms used. In Section 3, the
results of the performed analysis are presented. Section 4 provides a discussion of the
experimental outcomes and identifies the main findings of the work. Finally, in Section 5,
the author draws his conclusions.
2. Materials and Methods
This section provides a thorough explanation of the methodology used to validate the
research hypothesis. In particular, the evolutionary tasks are described in Section 2.1, while
the algorithms are presented in Section 2.2.
2.1. Tasks
Three of the five considered tasks represent well-known benchmark problems in
evolutionary computation, while the other two problems involve the use of robotic agents
interacting with and modifying the environment in which they are situated. As stated in
Section 1, the main purpose is to verify to what extent learning is beneficial to evolution in
widely recognized and challenging domains.
The experiments were run using the Framework for Autonomous Robotics Simulation
and Analysis (FARSA) simulator [78,79], an open software tool widely used to implement simulations for autonomous robot controllers [61,71–73,80–83].
2.1.1. 5-Bit Parity
Digital circuits (Figure 1) are systems computing logic functions, like the sum and/or
the multiplication of digital numbers. The input of these systems consists of two or more
binary (Boolean) values, while the output includes one or more binary values. A digital
circuit is made of several logic gates receiving two binary values in input and producing
one binary value in output, which is the result of an elementary logic function (e.g., AND,
OR, NAND, NOR) of the input. The input of each gate can come from either the input
pattern or the output of other gates. The types of wiring between gates and the logic
functions computed by each gate determine the logic function calculated by the circuit [76].
Figure 1. An example of a digital circuit: it consists of four logic gates, receives two inputs and
produces two outputs. On the right side, there are four symbols corresponding to the four types of
usable logic gates. The numbers 1–2 indicate the binary states constituting the inputs of the circuit.
The numbers 3–6 represent the outputs calculated by the associated gates. The circuit output is given
by the outputs of the two logic gates wired to the output units (in this example, the output is given
by gates 4 and 5). The lines indicate how gates are wired. Figure adapted from [76].
Digital circuits can be either hardwired or simulated. In this work, the second scenario
was considered, in which circuits evolve: they are represented through genotypes encoding
the way gates are wired and the logic function calculated by each gate. The ability of the
evolving circuits to approximate a certain function is evaluated by computing a fitness
score measuring the error between the circuit outputs and the corresponding target values.
In the experiments reported in this paper, the task is to evolve simulated digital circuits
for the ability to calculate a 5-bit even parity function (i.e., the capability of producing an
output equal to 1 when the number of 1-bits is even). The architecture of the evolving
circuits was derived from [76]: they are made of 400 logic gates divided into 20 layers of
20 gates. The evolution of circuits implementing a 5-bit parity function by using only OR,
AND, NAND and NOR logic gates represents a far from trivial problem [76,77,84].
The evaluation of circuits consists of measuring their ability to produce the desired outputs for all the 2^n (32 in this case, with n = 5) possible input patterns. The fitness is computed according to Equation (1):

$$F = 1 - \frac{1}{2^n} \sum_{j=1}^{2^n} \left| O_j - E_j \right|$$ (1)

In Equation (1), n denotes the size of the input pattern of the circuit, j indicates the generic input pattern (with j ∈ [1, 2^n]), O_j represents the output produced by the circuit when receiving the pattern j, and E_j is the expected output for pattern j.

A circuit evaluation requires one evolutionary step. The evolutionary process is continued until the total number of performed steps reaches 10^8, i.e., when around 10^8 / 32 = 3.125 × 10^6 circuits have been evaluated.
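To make Equation (1) concrete, the following minimal Python sketch (an illustration only; the paper's experiments rely on the FARSA-based code mentioned above) evaluates a candidate circuit on all 32 input patterns of the 5-bit even parity function. The circuit is abstracted as a callable returning a single output bit, so the CGP decoding described in Section 2.2 is omitted here.

from itertools import product

def even_parity_target(bits):
    # Target output: 1 when the number of 1-bits is even, 0 otherwise.
    return 1 if sum(bits) % 2 == 0 else 0

def parity_fitness(circuit, n=5):
    # Equation (1): F = 1 - (1/2^n) * sum_j |O_j - E_j|
    patterns = list(product([0, 1], repeat=n))
    error = sum(abs(circuit(p) - even_parity_target(p)) for p in patterns)
    return 1.0 - error / len(patterns)

# A perfect circuit reaches fitness 1.0; a circuit always answering 0
# scores 0.5, since exactly half of the 32 patterns have even parity.
print(parity_fitness(even_parity_target))   # 1.0
print(parity_fitness(lambda p: 0))          # 0.5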
2.1.2. Double-Pole Balancing
The double-pole balancing problem, introduced in [62], is a benchmark task in which
two poles are attached to a mobile cart through passive hinge joints. The experimental
settings are provided in Table 1. The goal is to move the cart within the track in order to
prevent the poles from falling. An illustration of the problem is shown in Figure 2.
Table 1. Double-pole balancing parameters. All the values are expressed with their units of measure-
ment. The ratio between the size of the long pole and that of the short one is 10 to 1. The cart moves
only along one dimension x (see Figure 2).
Parameter Value
cart mass 1 kg
long pole length 1.0 m
long pole mass 0.5 kg
short pole length 0.1 m
short pole mass 0.05 kg
track length 4.8 m
In this work, the author analyzed only the non-Markovian version of the task, i.e., the
case in which no velocity input is provided and the system has to derive this information
internally. To this end, the agent’s controller is a recurrent neural network receiving four
sensory inputs (see Table 2).
Figure 2. The double-pole balancing problem. Two poles (blue rectangles) are placed on a wheeled
cart (red rectangle, the wheels are represented by two dark gray circles), which can move only on the
x-axis in a limited track (light gray rectangle over which the cart is placed). The goal is to avoid both
poles falling and the cart exiting from the track.
Table 2. List of inputs of the non-Markovian double-pole balancing problem with their associated
ranges. The “-” symbol denotes the absence of the corresponding entry.
Input      Description                          Range
x          Position of the cart on the track    [−2.4, 2.4] m
θ1         Angle of the long pole               [−36, 36]°
θ2         Angle of the short pole              [−36, 36]°
b = 0.5    Constant bias                        -
The network output determines the force applied to the cart and is normalized in the range [−10.0, 10.0] N. The states of the x, θ1 and θ2 sensors are normalized in the [−0.5, 0.5], [−(5/13)π, (5/13)π] and [−(5/13)π, (5/13)π] ranges, respectively [41]. Equations (2)–(9) [62] are used to compute the effective mass of the poles (Equation (2)), the acceleration of the poles (Equation (3)), the acceleration of the cart (Equation (4)), the effective force on each pole (Equation (5)), the position of the cart (Equation (6)), the velocity of the cart (Equation (7)), the angle of the poles (Equation (8)), and the angular velocity of the poles (Equation (9)), respectively.
$$\hat{m}_i = m_i \times \left(1 - \frac{3}{4}\cos^2(\theta_i)\right)$$ (2)

$$\ddot{\theta}_i = -\frac{3}{4\,l_i}\left(\ddot{x}\cos(\theta_i) + g\sin(\theta_i) + \frac{\mu_{pi}\,\dot{\theta}_i}{m_i\,l_i}\right)$$ (3)

$$\ddot{x} = \frac{F + \sum_{i=1}^{N}\hat{F}_i}{M + \sum_{i=1}^{N}\hat{m}_i}$$ (4)

$$\hat{F}_i = m_i\,l_i\,\dot{\theta}_i^{\,2}\sin(\theta_i) + \frac{3}{4}\,m_i\cos(\theta_i)\left(\frac{\mu_{pi}\,\dot{\theta}_i}{m_i\,l_i} + g\sin(\theta_i)\right)$$ (5)

$$x[t+1] = x[t] + \tau \times \dot{x}[t]$$ (6)

$$\dot{x}[t+1] = \dot{x}[t] + \tau \times \ddot{x}[t]$$ (7)

$$\theta[t+1] = \theta[t] + \tau \times \dot{\theta}[t]$$ (8)

$$\dot{\theta}[t+1] = \dot{\theta}[t] + \tau \times \ddot{\theta}[t]$$ (9)
In Equations (2)–(9), x represents the position of the cart, ẋ denotes the velocity of the cart, θ_i is the angular position of the i-th pole, and θ̇_i indicates the angular velocity of the i-th pole. Moreover, F is the force applied to the cart, M is the mass of the cart, m_i and l_i are the mass and the length parameter of the i-th pole, N is the number of poles, g is the gravitational acceleration, μ_pi is the friction coefficient at the hinge of the i-th pole, and τ is the integration time step.
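As an illustration, the following Python sketch performs a single Euler integration step of Equations (2)–(9). It is a reimplementation for clarity, not the simulator used in the paper: the gravitational acceleration, the time step τ and the hinge friction coefficients are not specified in the text above and are set here to values commonly adopted for this benchmark.

import math

# Masses and lengths from Table 1; the remaining constants are assumptions
# (standard choices for this benchmark, not given in the text above).
M = 1.0                    # cart mass (kg)
m = [0.5, 0.05]            # pole masses (kg): long, short
l = [1.0, 0.1]             # pole lengths (m) from Table 1 (some formulations use half-lengths)
g = -9.8                   # gravitational acceleration (m/s^2), assumed
mu_p = [0.0, 0.0]          # hinge friction coefficients, assumed zero
tau = 0.01                 # integration time step (s), assumed

def step(state, force):
    """One Euler step of Equations (2)-(9); state = (x, dx, theta, dtheta)."""
    x, dx, theta, dtheta = state
    # Equation (2): effective mass of each pole.
    m_eff = [m[i] * (1.0 - 0.75 * math.cos(theta[i]) ** 2) for i in range(2)]
    # Equation (5): effective force exerted by each pole on the cart.
    f_eff = [m[i] * l[i] * dtheta[i] ** 2 * math.sin(theta[i])
             + 0.75 * m[i] * math.cos(theta[i])
             * (mu_p[i] * dtheta[i] / (m[i] * l[i]) + g * math.sin(theta[i]))
             for i in range(2)]
    # Equation (4): acceleration of the cart.
    ddx = (force + sum(f_eff)) / (M + sum(m_eff))
    # Equation (3): angular acceleration of each pole.
    ddtheta = [-0.75 / l[i] * (ddx * math.cos(theta[i]) + g * math.sin(theta[i])
                               + mu_p[i] * dtheta[i] / (m[i] * l[i]))
               for i in range(2)]
    # Equations (6)-(9): Euler integration of positions and velocities.
    return (x + tau * dx,
            dx + tau * ddx,
            [theta[i] + tau * dtheta[i] for i in range(2)],
            [dtheta[i] + tau * ddtheta[i] for i in range(2)])

# Usage: one step with the long pole tilted by 0.10472 rad and no applied force.
state = (0.0, 0.0, [0.10472, 0.0], [0.0, 0.0])
state = step(state, force=0.0)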
Analogously to [41,85], with the aim of fostering the discovery of robust solutions,
each controller has been evaluated for eight episodes differing with respect to the initial
state of the system. In particular, two experimental conditions were considered:
1. “Fixed Initial States” condition, in which controllers are evaluated by using the initial states provided in Table 3;
2. “Randomly Varying Initial States” condition: in this scenario, at each trial the initial states were randomized according to the ranges reported in Table 4.
Table 3. Values of the state variables (x, ẋ, θ1, θ̇1, θ2, θ̇2) used to initialize the different episodes (referred to as Ep_i, with i denoting the index of the episode) in the Fixed Initial States condition.

Variable   Ep1       Ep2      Ep3      Ep4     Ep5        Ep6       Ep7         Ep8
x          −1.944    1.944    0.0      0.0     0.0        0.0       0.0         0.0
ẋ          0.0       0.0      −1.215   1.215   0.0        0.0       0.0         0.0
θ1         0.0       0.0      0.0      0.0     −0.10472   0.10472   0.0         0.0
θ̇1         0.0       0.0      0.0      0.0     0.0        0.0       −0.135088   0.135088
θ2         0.0       0.0      0.0      0.0     0.0        0.0       0.0         0.0
θ̇2         0.0       0.0      0.0      0.0     0.0        0.0       0.0         0.0
Table 4. Ranges used to randomize the initial values of the state variables in the Randomly Varying Initial States condition.

Variable   Min          Max
x          −1.944       1.944
ẋ          −1.215       1.215
θ1         −0.10472     0.10472
θ̇1         −0.135088    0.135088
θ2         −0.10472     0.10472
θ̇2         −0.135088    0.135088
In both experimental settings, an evaluation episode ends when one of the following
conditions is met:
• the episode lasts 1000 steps;
• the angle of the long pole goes out of the viable range (see Table 2);
• the angle of the short pole exceeds its range (see Table 2);
• the cart position is out of the track (see Table 2).
The performance (i.e., fitness) of the agent F is computed according to Equation (10):

$$F = \frac{1}{8} \sum_{i=1}^{8} f_i$$ (10)

where f_i, the fitness achieved by the agent in the i-th episode, is defined in Equation (11):

$$f_i = \frac{t}{1000}$$ (11)
In Equation (11), the variable t denotes the number of steps the cart successfully keeps both poles balanced. The evolutionary process is continued until the total number of performed steps exceeds 5 × 10^7.
As far as the Randomly Varying Initial States condition is concerned, differently
from [41], this setup is used to verify the ability of the evolved controllers to deal with the
Fixed Initial States condition. In other words, the random initial states are only used to
make agents experience a large number of different conditions. This, in turn, should foster
their robustness [85–87] and, therefore, their effectiveness at dealing with the initial states
defined in Table 3. The reason behind such a design choice is that the performance in the Randomly Varying Initial States condition is strongly biased by chance; some controllers might experience very easy situations to cope with and achieve high fitness scores, while others might suffer from hard initial conditions leading to poor final performance. Evaluating the evolving agents in the Fixed Initial States condition allows discriminating between truly effective and merely lucky controllers.
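The difference between the two conditions can be summarized in a few lines of Python (a sketch; the bounds follow Table 4 and are assumed symmetric around zero):

import random

# Symmetric bounds of Table 4 for (x, dx, theta1, dtheta1, theta2, dtheta2).
BOUNDS = [1.944, 1.215, 0.10472, 0.135088, 0.10472, 0.135088]

def random_initial_state():
    # Randomly Varying Initial States condition: each state variable is drawn
    # uniformly within its range at the beginning of every episode.
    return [random.uniform(-b, b) for b in BOUNDS]

In the Fixed Initial States condition, the eight rows of Table 3 are used instead, so every controller experiences exactly the same eight starting configurations.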
2.1.3. Optimization Functions
In this work, the Rastrigin [67], the Rosenbrock [68] and the Sphere optimization functions were considered, which represent widely employed benchmark functions in evolutionary computation and optimization [66,70,88,89]. Differently from the other con-
sidered problems, in this scenario, the goal is to minimize the value of the objective function.
In particular, given a vector x = [x_1, ..., x_n] of size n, the problem can be generically formulated as follows:

$$F = \min_{\mathbf{x} \in \mathbb{R}^n} F_{fname}(\mathbf{x})$$ (12)

where fname indicates the name of the function to be optimized (i.e., Rastrigin, Rosenbrock and Sphere).

The Rastrigin function is defined according to Equation (13):

$$F_{rastrigin} = \sum_{i=1}^{n} \left[ x_i^2 - 10 \cos(2\pi x_i) + 10 \right]$$ (13)

The Rosenbrock function can be formulated according to Equation (14):

$$F_{rosenbrock} = \sum_{i=1}^{n-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + \left( x_i - 1 \right)^2 \right]$$ (14)

Finally, the Sphere function is defined according to Equation (15):

$$F_{sphere} = \sum_{i=1}^{n} x_i^2$$ (15)
A visual representation of the optimization functions is provided in Figure 3. As can
be seen, these functions are characterized by the existence of a single global optimum and
multiple local minima (see Figure 3). Consequently, discovering an effective solution is far
from trivial.
In the experiments reported in this work, individuals are evaluated and rewarded for
the ability to optimize the above-described functions on vectors x of size n = 20. Furthermore, the length of the evolutionary process was set to 10^6 evaluation steps.
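For reference, the three objective functions of Equations (13)–(15) can be written compactly as follows (a minimal NumPy sketch, independent of the experimental code used in the paper):

import numpy as np

def rastrigin(x):
    # Equation (13): global minimum 0 at x = (0, ..., 0).
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)

def rosenbrock(x):
    # Equation (14): global minimum 0 at x = (1, ..., 1).
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)

def sphere(x):
    # Equation (15): global minimum 0 at x = (0, ..., 0).
    return np.sum(x ** 2)

x = np.zeros(20)   # n = 20, as in the experiments
print(rastrigin(x), rosenbrock(np.ones(20)), sphere(x))   # 0.0 0.0 0.0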
Figure 3. Two-dimensional visualization of the Rastrigin (top), Rosenbrock (middle) and Sphere
(bottom) functions. To facilitate understanding, the case of a two-dimensional input x = [x1, x2] is considered. The red circle within each colormap denotes the optimum of the corresponding function.
2.1.4. Robot Foraging
The robot foraging task is a modified version of the swarm-foraging problem [85,90], in which an e-puck robot [91] is situated in a squared arena of 2 m × 2 m surrounded by
walls (see Figure 4). The environment contains five food elements. The goal for the agent is
to forage the highest number of food items. The fitness function F is defined as shown in Equation (16):

$$F = \frac{1}{5} \sum_{i=1}^{5} nf_i$$ (16)
where nf_i represents the number of food items eaten in the i-th episode. When the robot manages to eat one food element, this immediately reappears in a different random location. The robot can perceive objects (i.e., food elements and walls) through a linear camera with a field of view (FOV) of 90° split into six sectors (see Figure 4). Each sector has three channels for the red, green and blue colors.
Figure 4. The robot foraging task. The robot (blue circle) lives in an environment surrounded by
walls (in green) and filled with five food elements (red cylinders). Both the robot and food items are
randomly placed in the environment.
The controller of the robot is a feed-forward neural network with 18 sensory inputs
(provided by the camera), eight internal neurons and two output motors determining the
speeds of the robot wheels. The network architecture was taken from similar experimental
settings [71,72]. The evolutionary process lasts 5 × 10^6 evaluation steps. Individuals are evaluated in five episodes, each lasting 1000 steps.
2.1.5. Social Foraging
The social foraging problem is commonly used to analyze the complexity of dynamics
and the emergent behaviors in systems where two agents might interact with each other [72,73].
In this work, the formulation reported in [72] was used, in which two e-puck robots are situated in a 2 m × 2 m arena surrounded by walls. The environment contains one food
element (see Figure 5). The agents have to forage “socially”, that is, they can eat the food
item only if both manage to reach it. The goal for the robots is to forage the highest number
of food items. The fitness function Frewards agents at a group level and is defined as
for the robot foraging (see Equation (16)). It is worth pointing out that the definition of
the fitness function Fdoes not provide any information about how agents should behave.
In particular, there is no constraint pushing the robots to arrive simultaneously at the
food item. Effective solutions could entail that one agent moves toward the food element,
reaching it and waiting for the mate. An alternative strategy for the robots might be to look
for each other and reach the food item together.
Differently from the work reported in [72,73], here, the two agents are homogeneous;
the neural network controller is the same for both robots. This is due to the fact that the
focus of this work is neither on the analysis of the group dynamics, nor on the ability to
adapt to the partner skills. Instead, the aim is to understand the factors fostering a positive
influence of learning on evolution. The network structure is the same as defined for the
robot foraging task. Similarly, the length of the evolutionary process and the evaluation of
each individual are the same as used in the robot foraging problem.
Figure 5. The social foraging task. The robots (blue circles) live in an environment surrounded by
walls (in green) and filled with one food element (red cylinder). Both the robots and food item are
randomly placed in the environment.
2.2. Evolutionary Algorithms
Before focusing on the evolutionary algorithms used in this work, it is important to
underline the different ways of initializing the population depending on the nature of
the evolutionary problem. Concerning the double-pole balancing task, the optimization
of the Rastrigin, Rosenbrock and Sphere functions, the robot foraging problem and the
social foraging task, the initial population is encoded in a matrix of µ × θ integer numbers randomly initialized with a uniform distribution in the range [0, 255] and then converted into values in the range [−8.0, 8.0] with respect to the double-pole balancing problem, and in the range [−5.0, 5.0] for both the function optimization scenario and the robot foraging and the social foraging tasks. In particular, µ corresponds to the number of parents and θ corresponds to the number of the neural network's parameters (i.e., connection weights and biases). Offspring are obtained from their corresponding parents by mutating each gene with a MutRate probability. In particular, when a mutation is performed, the original gene is replaced with a new integer number randomly generated within the range [0, 255] with a uniform distribution. On the other hand, in the case of the 5-bit parity task, the
population contains individuals whose genotype is constituted by a vector of integer
numbers that encode the logic function calculated by each gate and the way gates are wired.
This method is called Cartesian Genetic Programming (CGP) [77]. More specifically, each genotype includes 400 × 3 = 1200 genes specifying the characteristics of the nodes and
one additional gene representing the identification number of the node that is the output of
the circuit. The indices of the inputs and the indices of the nodes belong to the ranges [1–5]
and [6–406], respectively. Each node is made of three genes:
1. inp1: index of the first input of the node. It is bounded in the range [1, 5 + (L − 1) × 20], where L denotes the layer of the gate;
2. inp2: index of the second input of the node. It belongs to the same range as inp1;
3. func: integer representing the logic function of the node (1 = OR, 2 = AND, 3 = NAND, 4 = NOR).
The value of the last gene encoding the index of the output belongs to the range
[6, 406]. Mutations (with probability MutRate) are made by substituting each gene with a value randomly extracted from a uniform distribution in the appropriate range.
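The following Python sketch illustrates the weight-based encoding described above for the non-parity tasks (an illustration with hypothetical helper names, not the original implementation; the gene-to-weight mapping is assumed to be a simple linear rescaling):

import random

GENE_MIN, GENE_MAX = 0, 255

def init_genotype(n_params):
    # Integer genes drawn uniformly in [0, 255].
    return [random.randint(GENE_MIN, GENE_MAX) for _ in range(n_params)]

def to_weights(genotype, w_range=5.0):
    # Linear rescaling of each gene into [-w_range, w_range]
    # (w_range = 8.0 for the double-pole task, 5.0 for the other tasks).
    return [g / GENE_MAX * 2.0 * w_range - w_range for g in genotype]

def mutate(genotype, mut_rate):
    # Each gene is replaced, with probability MutRate, by a new random integer.
    return [random.randint(GENE_MIN, GENE_MAX) if random.random() < mut_rate else g
            for g in genotype]

parent = init_genotype(100)
offspring = mutate(parent, mut_rate=0.01)
weights = to_weights(offspring)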
2.2.1. Stochastic Steady State
The first evolutionary algorithm considered in this work is a variant of the standard
steady-state evolutionary algorithm [92], called Stochastic Steady State [41].
The Stochastic Steady State (SSS) is a (µ + µ) evolutionary strategy [93,94] operating
on the basis of a population formed by µ parents. The pseudo-code of the algorithm is
provided in Algorithm 1. At each generation, parents are evaluated (Algorithm 1, line 7)
and generate one offspring each (Algorithm 1, lines 9–10), which, in turn, are evaluated
(Algorithm 1, line 12). The individuals are then ranked based on their fitness and the best
µ ones are retained in the population (see Algorithm 1, line 14). Differently from previous related methods, like the Steady State [92], in [41], the authors introduced the possibility of adding noise to the fitness, thus making the selective process stochastic. The noise is a value randomly chosen in the range [−NoiseRange, NoiseRange] with a uniform distribution and is applied to the individual's fitness according to Equation (17):

$$Noise \sim \mathrm{rand}(-NoiseRange, NoiseRange), \qquad \hat{F} = F \times (1.0 + Noise)$$ (17)
Therefore, the application of noise enables shaping the selection process [41]. In fact, the higher the noise, the higher the probability that less fit individuals reproduce. Differently from the original algorithm, the version used in this work (see Algorithm 1) was modified by removing the solution refinement during the last 1/20 of the evolutionary process (see [41] (pp. 10–11)).
Algorithm 1 SSS algorithm
Functions: init(), eval(), mut()
1:  Initialize: NEvals ← 0, PopSize
2:  for p ← 0 to PopSize do
3:      init(genome[p])                              ▷ Population is randomly initialized
4:  end for
5:  while NEvals < MaxEvals do
6:      for p ← 0 to PopSize do
7:          Fitness[p] ← eval(genome[p])
8:          NEvals ← NEvals + NSteps
9:          genome[p + NParents] ← genome[p]         ▷ Create a copy of the parent's genotype, i.e., the offspring
10:         mut(genome[p + NParents])                ▷ Mutate the genotype of the offspring
11:         Fitness[p + NParents] ← eval(genome[p + NParents])
12:         NEvals ← NEvals + NSteps
13:     end for
14:     Rank genotypes by Fitness
15: end while
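A compact Python rendering of Algorithm 1, including the fitness perturbation of Equation (17), could look as follows. It is a sketch assuming that eval_fitness, init_genotype and mutate are available (e.g., as in the encoding example of Section 2.2), and the step bookkeeping of the pseudo-code is simplified to a generation counter.

import random

def sss(eval_fitness, init_genotype, mutate, n_params,
        pop_size=20, mut_rate=0.01, noise_range=0.0, generations=1000):
    population = [init_genotype(n_params) for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate the parents and their mutated offspring.
        candidates = population + [mutate(p, mut_rate) for p in population]
        scored = []
        for genome in candidates:
            fitness = eval_fitness(genome)
            # Equation (17): stochastic perturbation of the fitness used for
            # selection only (noise_range = 0 reproduces the deterministic case).
            noise = random.uniform(-noise_range, noise_range)
            scored.append((fitness * (1.0 + noise), genome))
        # Rank by (noisy) fitness and retain the best pop_size individuals.
        scored.sort(key=lambda s: s[0], reverse=True)
        population = [genome for _, genome in scored[:pop_size]]
    return population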
2.2.2. Hill-Climbing
The Hill-Climbing (HC) algorithm is a method introduced in [49], which evolves solu-
tions through a local search process. A description of the method is given in Algorithm 2.
As for the SSS algorithm, the population is formed by µ parents. At each generation, each
parent generates an offspring, i.e., a mutated copy (Algorithm 2, lines 9–10), and both
individuals are evaluated (Algorithm 2, lines 7 and 11). If the offspring is not worse
than its parent (Algorithm 2, line 13), the former replaces the latter in the population
(Algorithm 2, line 14). Therefore, differently from the SSS, each individual of the population
evolves independently.
Algorithm 2 HC algorithm
Functions: init(), eval(), mut()
1:  Initialize: NEvals ← 0, PopSize
2:  for p ← 0 to PopSize do
3:      init(genome[p])                              ▷ Individuals are randomly initialized
4:  end for
5:  while NEvals < MaxEvals do
6:      for p ← 0 to PopSize do
7:          Fitness[p] ← eval(genome[p])
8:          NEvals ← NEvals + NSteps
9:          genome[p + NParents] ← genome[p]         ▷ Create a copy of the parent's genotype, i.e., the offspring
10:         mut(genome[p + NParents])                ▷ Mutate the genotype of the offspring
11:         Fitness[p + NParents] ← eval(genome[p + NParents])
12:         NEvals ← NEvals + NSteps
13:         if Fitness[p + NParents] > Fitness[p] then
14:             genome[p] ← genome[p + NParents]
15:             Fitness[p] ← Fitness[p + NParents]
16:         end if
17:     end for
18: end while
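The local search at the core of Algorithm 2, which is also reused as the learning phase of the SSSHC described next, can be condensed into a single refinement routine (a sketch with the same hypothetical helpers as above):

def hill_climb(genome, fitness, eval_fitness, mutate, mut_rate, n_iters):
    # Repeatedly propose a mutated candidate and keep it only if it strictly
    # improves the current fitness (Algorithm 2, lines 9-16).
    for _ in range(n_iters):
        candidate = mutate(genome, mut_rate)
        candidate_fitness = eval_fitness(candidate)
        if candidate_fitness > fitness:
            genome, fitness = candidate, candidate_fitness
    return genome, fitness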
2.2.3. Stochastic Steady State with Hill-Climbing
The Stochastic Steady State with Hill-Climbing (SSSHC) algorithm is a novel method
developed by the author (similar works can be found in [48,60]), which combines the SSS algorithm with a particular Lamarckian learning process implemented through a stochastic Hill Climber [49]. The former method performs the evolutionary process by
selecting the fittest individuals in the population. On the other hand, the latter technique
refines the solution found by applying a local search process, which retains adaptive
mutations producing an improvement. These modifications are inherited in the population,
thus HC acts as Lamarckian learning. As stated in Section 1, the SSSHC belongs to the
family of memetic algorithms. Algorithm 3 provides the pseudo-code of the SSSHC
algorithm; it works as the SSS method until the selection process (Algorithm 3, lines 1–14),
and learning is applied to selected individuals only. During the latter phase (Algorithm 3,
lines 15–28), the individuals undergo a refinement process for a fixed number of iterations NumLearnIters (Algorithm 3, lines 18–27). At each learning iteration, the currently selected individual is mutated (as it happens during offspring generation, see Algorithm 3, lines 10 and 20) and the novel individual, referred to as the candidate, is evaluated (Algorithm 3, line 21). The currently selected individual is then compared with the candidate (Algorithm 3, line 23). The best-performing individual is retained as the currently selected individual. The process is repeated until the given number of learning iterations has been run.
2.2.4. Parameters
As already pointed out in Section 1, the algorithms used in this work were compared
by taking into account the same number of evolutionary steps and with the same parameter
settings. More specifically, in the 5-bit parity task and the double-pole balancing problem,
the best combination of parameters MutRate and PopSize for the SSS method is determined,
which is referred to as “BestSetup” and represents the baseline. The HC and SSSHC
algorithms were evaluated only by using parameters of the “BestSetup”. The rationale
behind this choice lies in the need for a fair comparison so as to exclude any possible bias
due to the use of different parameters. It is worth pointing out that this approach might
clearly penalize the HC and SSSHC strategies, since there is no guarantee that the optimal
parameters found for a method are the same for all the considered algorithms. On the other
hand, with regard to both the function optimization scenario and the robot foraging and
social foraging problems, the parameters used for the SSS, HC and SSSHC algorithms were
not varied and are reported in Tables 5 and 6, respectively.
Algorithm 3 SSSHC algorithm
Functions: init(), eval(), mut()
1:  Initialize: NEvals ← 0, PopSize, NLearnIters
2:  for p ← 0 to PopSize do
3:      init(genome[p])                              ▷ Population is randomly initialized
4:  end for
5:  while NEvals < MaxEvals do
6:      for p ← 0 to PopSize do
7:          Fitness[p] ← eval(genome[p])
8:          NEvals ← NEvals + NSteps
9:          genome[p + NParents] ← genome[p]         ▷ Create a copy of the parent's genotype, i.e., the offspring
10:         mut(genome[p + NParents])                ▷ Mutate the genotype of the offspring
11:         Fitness[p + NParents] ← eval(genome[p + NParents])
12:         NEvals ← NEvals + NSteps
13:     end for
14:     Rank genotypes by Fitness
15:     for p ← 0 to PopSize do
16:         selectedPop[p] ← genome[p]
17:         selectedFit[p] ← Fitness[p]
18:         for iter ← 0 to NLearnIters do
19:             candidate ← selectedPop[p]
20:             mut(candidate)
21:             candidateFit ← eval(candidate)
22:             NEvals ← NEvals + NSteps
23:             if candidateFit > selectedFit[p] then
24:                 selectedPop[p] ← candidate
25:                 selectedFit[p] ← candidateFit
26:             end if
27:         end for
28:     end for
29: end while
Table 5. List of parameters used in the function optimization scenario. As already stated in the text,
the size of the input vector x was set to n = 20.

Parameter           Value
NumReplications     20
MutRate             0.05
PopSize             20
Table 6. List of parameters used in the robot foraging and the social foraging problems. They were
derived from similar experimental settings (see [72,73]).
Parameter           Value
NumReplications     20
MutRate             0.01
PopSize             20
Four of the five considered problems were analyzed in two experimental conditions:
in the “No noise” condition, the agent evaluation is not affected by the addition of noise.
Instead, a certain amount of noise is applied to the performance measure in the “Noisy”
condition, thus influencing the evolutionary process. With respect to the 5-bit parity task
and the double-pole balancing problem, the best value of the NoiseRange parameter for the SSS method was also used to evaluate the HC and SSSHC algorithms, similarly to the MutRate and PopSize parameters. Conversely, the parameter NoiseRange was set to 0.05
in the “Noisy” condition of the robot foraging and the social foraging problems. Finally,
the optimization of the Rastrigin, Rosenbrock and Sphere functions was performed only in
the “No noise” condition.
3. Results
In this section, the results obtained in the five different scenarios are presented and
discussed (an overview of the outcomes is shown in Figure 6). The metrics used to evaluate
the different algorithms are the performance/fitness, which measures the ability of the
discovered solutions to cope with the given problem, and the convergence speed, i.e., the
number of evaluation episodes required to find the optimal solution to the task. The latter
was calculated only for the 5-bit parity task and the Fixed Initial States condition of the
double-pole balancing problem. Indeed, the optimal value of the Rastrigin, Rosenbrock
and Sphere functions was not discovered by the considered algorithms (see Section 3.3).
Moreover, the maximum performance that can be achieved in the robot foraging and social
foraging problems is not known in advance. With respect to the statistical analyses, the
Mann–Whitney U test with application of the Bonferroni correction was used. A value of p
below 0.05 indicates statistical significance, whereas a value of p above 0.05 indicates that the compared conditions are not significantly different. Moreover, Spearman's rank correlation coefficient (denoted with ρ) was employed to analyze the relationships between variables, and the associated statistical significance (p below 0.01) is provided where applicable.
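For readers wishing to reproduce the statistical analysis, the tests can be run with SciPy as in the following sketch (the fitness arrays are synthetic placeholders; with three algorithms, three pairwise comparisons are performed, so the Bonferroni correction multiplies each p-value by three):

import numpy as np
from scipy.stats import mannwhitneyu, spearmanr

rng = np.random.default_rng(0)
scores_a = rng.uniform(0.9, 1.0, 50)   # hypothetical fitness of 50 replications
scores_b = rng.uniform(0.8, 1.0, 50)

n_comparisons = 3   # e.g., SSS vs. HC, SSS vs. SSSHC, HC vs. SSSHC
_, p = mannwhitneyu(scores_a, scores_b, alternative="two-sided")
p_corrected = min(1.0, p * n_comparisons)   # Bonferroni correction
print("significant:", p_corrected < 0.05)

# Spearman's rank correlation, e.g., between learning iterations and fitness.
rho, p_rho = spearmanr(rng.integers(1, 1000, 50), rng.uniform(0.0, 1.0, 50))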
Figure 6. Performance of the SSS, the HC and the SSSHC algorithms with respect to the different
considered evolutionary problems. Task labels in the x-axis are defined as follows: 5-BitParity1: 5-bit
parity, “No noise” condition; 5-BitParity2: 5-bit parity, “Noisy” condition; DoublePole1: double-pole
balancing, Fixed Initial States condition, “No noise” case; DoublePole2: double-pole balancing, Fixed
Initial States condition, “Noisy” case; DoublePole3: double-pole balancing, Randomly Varying Initial
States condition; RobotForaging1: robot foraging, “No noise” condition; RobotForaging2: robot
foraging, “Noisy” condition; SocialForaging1: social foraging, “No noise” condition; SocialForaging2:
social foraging, “Noisy” condition. With respect to the optimization function scenario (top right
figure), performance in the y-axis is displayed through a logarithmic scale and a lower fitness
corresponds to a better result.
3.1. 5-Bit Parity
As indicated by the results reported in Table 7, the SSSHC and HC algorithms outperform the SSS method in the deterministic case (p < 0.05). Furthermore, they require a considerably lower number of evaluations (p < 0.05, see Table 8). Noticeably, the SSSHC algorithm is significantly faster than the HC method (p < 0.05, see Table 8). The same result holds with the addition of noise with respect to both performance (p < 0.05, see Table 7) and convergence speed (p < 0.05, see Table 8). In the “Noisy” condition, the HC algorithm is slightly better than the SSSHC algorithm, although the difference is not statistically significant (p > 0.05). Interestingly, in the deterministic case, the SSSHC algorithm manages to solve this relatively complex task with NumLearnIters ∈ [100, 10,000] (see Table A3), although it achieves a remarkable performance of 0.99 even with 20–50 learning iterations (see Table A3). The addition of noise makes the task harder to solve, requiring a number of learning iterations higher than 200 to achieve a performance of 0.99 (see Table A3).
Table 7. Average fitness of the controllers evolved with the SSS, the HC and the SSSHC algorithms
in the 5-bit parity task. Data were obtained by running 50 replications of the experiment. Data in
square brackets indicate the standard deviation. All the algorithms were evaluated by using the best
combination of parameters for the SSS in the “No noise” condition. The following parameter values
were used: MutRate = 1%; PopSize = 10 (see Table A1). The NoiseRange parameter (i.e., “Noisy” condition) was set to 0.03 (see Table A2). Regarding the SSSHC, the number of learning iterations NumLearnIters was set to 2000 in the “No noise” condition and to 5000 in the “Noisy” condition, respectively (see Table A3). Bold values denote the best results.
SSS HC SSSHC
No noise 0.974 [0.053] 1.0 [0.0] 1.0 [0.0]
Noisy 0.983 [0.037] 0.999 [0.009] 0.998 [0.018]
Table 8. Average number of evaluations required to find a solution to the 5-bit parity problem by
the controllers evolved with the SSS, the HC and the SSSHC algorithms. Data represent the result of
50 replications of the experiment. The parameter settings are the same as reported in the caption of
Table 7. Bold values indicate the best results.
SSS HC SSSHC
No noise    5.766 × 10^7    8.802 × 10^6    5.433 × 10^6
Noisy       5.003 × 10^7    2.599 × 10^7    1.731 × 10^7
Figure 7 displays the performance obtained by the different algorithms during evolution. As can be seen, SSSHC has a convergence speed noticeably faster than SSS and faster than HC. Concerning the “No noise” case (Figure 7, top), SSSHC manages to find a quasi-optimal solution (i.e., a performance F ≥ 0.95) after 5.225 × 10^6 evaluation steps, HC succeeds in 7.324 × 10^6 evaluation steps, while SSS requires 5.295 × 10^7 evaluation steps, i.e., more than 10 times the number of steps needed by SSSHC and more than seven times that needed by HC. With respect to the “Noisy” case (Figure 7, bottom), SSSHC discovers a quasi-optimal solution after 1.73 × 10^7 evaluation steps, HC achieves a fitness score over 0.95 in 2.091 × 10^7 evaluation steps, while SSS requires 4.565 × 10^7 evaluation steps, corresponding to more than 2.5 times the number of steps needed by SSSHC and more than two times that needed by HC. The difference is statistically significant in both conditions (p < 0.05).
Figure 7. Performance of the SSS, the HC and the SSSHC algorithms with the best combination of
parameters (MutRate = 1%, PopSize = 10) in the 5-bit parity task. Top picture refers to experiments without the addition of noise (NumLearnIters = 2000). Bottom picture shows results obtained with the addition of noise (NoiseRange = 0.03; NumLearnIters = 5000). The shaded areas indicate the mean and 85% bootstrapped confidence intervals of the mean. The vertical dashed line marks the number of steps required by the algorithms to achieve a quasi-optimal solution (i.e., a performance score greater than or equal to 0.95). These data were achieved by averaging outcomes from 50 replications
of the experiment.
Figure 8 shows the performance of the SSSHC algorithm depending on the number of learning iterations. The data indicate that the SSSHC algorithm is effective at finding a solution independently of the length of the learning process, at least in this context. Nonetheless, there exists a positive correlation between fitness and number of learning iterations (ρ = 0.330699, significant at p < 0.01 in the “No noise” condition; ρ = 0.243837, significant at p < 0.01 in the “Noisy” condition). Furthermore, the analysis of the correlation between the number of evaluation steps and the number of learning iterations displays a positive correlation (ρ = 0.549272, significant at p < 0.01 in the “No noise” condition; ρ = 0.380872, significant at p < 0.01 in the “Noisy” condition). This implies that, at least in this domain, the longer the learning process, the faster the algorithm.
Figure 8. Analysis of the performance of the SSSHC algorithm in the 5-bit parity task depending
on the different number of learning iterations. Circles indicate the fitness values obtained with
the corresponding number of iterations. The top curve shows results without the application of
noise, while the bottom one displays the performance with the addition of noise (NoiseRange = 0.03).
Data in the x-axis are shown using a logarithmic scale and indicate the average performance over
50 replications of the experiment.
3.2. Double-Pole Balancing
With respect to the double-pole balancing task, the Markovian version of the problem was
not considered, since it is quite easy to solve. The results obtained in this domain indicate
that the SSS, HC and SSSHC algorithms successfully manage to find the optimal solution
to the problem in both the deterministic (“No noise”) and the stochastic case (“Noisy”).
3.2.1. Fixed Initial States Condition
As indicated by the results reported in Table 9, the SSS and SSSHC algorithms outperform the HC algorithm (p < 0.05) in both conditions. Moreover, the former methods are noticeably faster than the latter in the “No noise” condition (p < 0.05, see Table 10). The performance and convergence speed of the SSSHC and SSS algorithms are similar (p > 0.05). Considering the addition of noise (“Noisy” condition), the SSSHC remarkably outperforms the SSS algorithm (p < 0.05, see Table 9), while there is no significant difference concerning the convergence speed (p > 0.05). The outcomes are further illustrated in Figure 9.
Table 9. Average fitness of the controllers evolved with the SSS, the HC and the SSSHC algorithms in
the Fixed Initial States condition of the double-pole balancing problem. These data were obtained
by repeating the experiment 30 times. Data in square brackets indicate the standard deviation. All
the algorithms were evaluated by using the best combination of parameters for the SSS in the “No
noise” condition. The mutation rate MutRate was set to 5% and the population size PopSize to 200 (see Table A4); the NoiseRange parameter (i.e., “Noisy” condition) was set to 0.06 (see Table A5). The
number of learning iterations of the SSSHC algorithm was set to 1 in the “No noise” condition and to
50 in the “Noisy” condition (see Table A6). Bold values indicate the best results.
SSS HC SSSHC
No noise 0.996 [0.015] 0.891 [0.081] 0.991 [0.028]
Noisy 0.840 [0.100] 0.584 [0.143] 0.950 [0.052]
Table 10. Average number of evaluations required to find a solution to the Fixed Initial States
condition of the double-pole balancing problem by the controllers evolved with the SSS, the HC
and the SSSHC algorithms. Data correspond to the outcome of 30 replications of the experiment.
The parameter settings are the same as reported in the caption of Table 9. Bold values indicate the
best results.
SSS HC SSSHC
No noise    2.81 × 10^7     4.795 × 10^7    2.458 × 10^7
Noisy       4.702 × 10^7    5 × 10^7        4.386 × 10^7
The results obtained in the non-Markovian version of the double-pole suggest a
possible relationship between the learning performance and the population size. Differently
from the 5-bit parity and the Markovian version of the task (results not shown), in this case,
the optimal population size is 200, while in the former tasks, the optimal value is 10. The
hypothesis is that using a large population size does not allow exploring the search space
effectively and finding of optimal solutions. Aiming to verify this assumption, a control
experiment was run, where only the population size was varied and all other parameters
were kept fixed (see Table 11). As a reference task, the non-Markovian double-pole with
Fixed Initial States condition was used. The mutation rate MutRate was set to 0.05. The number of learning iterations NumLearnIters was set to 5. No noise (i.e., NoiseRange = 0.0) was added to the learning process. The population size was varied by using the following values: PopSize ∈ [10, 20, 50, 100, 200, 500]. Overall, 30 replications of the experiment were performed, thus evaluating 26,400 individuals. The analysis revealed a negative correlation between the fitness of the individuals and the population size (ρ = −0.788579, significant at p < 0.01). In other words, the larger the population size, the lower the performance
of the individuals being part of it. This result confirms the hypothesis about the negative
effect of large population sizes on the learning process, at least in this domain.
Figure 9. Fitness of the SSS, the HC and the SSSHC algorithms in the Fixed Initial States condition
of the double-pole balancing problem. The black rectangles inside violins contain the inter-quartile
range of the data, while the median value of the data is indicated by the white circle inside each
violin. The extreme points of the vertical black line denote the minimum and maximum values of
the data. Results achieved in 30 replications of the experiment by using the best combination of
parameters (mutation rate: 5%; population size: 200). Left figure refers to the “No noise” condition
(NumLearnIters = 1), while the right plot corresponds to the “Noisy” condition (NoiseRange = 0.06; NumLearnIters = 50).
Table 11. List of parameters used to verify the existence of a correlation between learning and
population size. The latter parameter was set as PopSize ∈ [10, 20, 50, 100, 200, 500].
Parameter           Value
NumReplications     30
MutRate             0.05
NoiseRange          0.0
NumLearnIters       5
Analogously to the 5-bit parity problem, the correlation between performance and
number of learning iterations was investigated. The outcome of this analysis indicates that
there exists a negative correlation in the “No noise” condition (ρ = −0.285502, significant at p < 0.01), while fitness and duration of learning positively correlate in the “Noisy” condition (ρ = 0.290081, significant at p < 0.01). This discrepancy (see also Figure 10)
can be ascribed to both the nature of the evolutionary problem and the role played by
noise; differently from the 5-bit parity task, the double-pole balancing problem is not
characterized by neutrality (see Section 4). Instead, it is highly sensitive to small changes.
Therefore, exploring the search space for a long period may be disadvantageous, since learning might not be able to improve the solution while reducing the number of remaining evaluation steps. This property is stronger in a deterministic scenario, like the
“No noise” condition. Conversely, in the “Noisy” condition, learning exploits the addition
of noise in order to discover adaptive mutations and escape from local optima. In
this case, a longer learning process is crucial to substantially improve the fitness. The latter
outcome is in line with the analysis of the 5-bit parity task, where a positive correlation
between performance and duration of the learning process was found.
Figure 10. Fitness obtained by the SSSHC algorithm in the Fixed Initial States condition of the
double-pole balancing problem by systematically varying the number of learning iterations. Circles
indicate the performances obtained with the corresponding number of iterations. As can be observed,
performance decreases as the number of learning iterations increases in the “No noise” condition
(top figure). Conversely, there exists a positive correlation between the number of learning iterations
and fitness in the “Noisy” condition (bottom figure), with NoiseRange = 0.06. Data in the x-axis are
shown using a logarithmic scale and were achieved by running 30 replications of the experiment.
Concerning the relationship between the number of evaluation steps and the length
of the learning process, there is a negative correlation both in the “No noise” condition
(ρ = −0.450261, significant at p < 0.01) and in the “Noisy” condition (ρ = −0.082322),
although not significant in the latter case. Put in other words, reducing the length of the
learning process leads to better and faster results in this domain.
3.2.2. Randomly Varying Initial States Condition
Due to the intrinsic stochasticity of the considered scenario, only the “No noise” case
was investigated with regard to the Randomly Varying Initial States condition. In particular,
as has been previously explained, the ability of the evolved controllers to perform well
in the Fixed Initial States condition was analyzed. In this respect, the results reported in
Table 12 and Figure 11 indicate that the SSS and SSSHC algorithms remarkably outperform
the HC algorithm (p < 0.05). Furthermore, the SSSHC is better than the SSS, although the
difference is not statistically significant (p > 0.05). Therefore, learning provides a positive
effect on the search process, driving it towards higher fitness areas of the search space.
It is worth noting that the capability to “generalize” (i.e., to display good performance
in the Fixed Initial States condition) is achieved in spite of the intrinsic variability of the
task. Further analyses should demonstrate whether or not this property can be extended to
other domains.
Table 12. Average fitness of the controllers evolved with the SSS, the HC and the SSSHC algorithms
in the Randomly Varying Initial States condition of the double-pole balancing problem. Data indicate
the performance obtained by evaluating evolved solutions in the Fixed Initial States condition and
are the result of 30 replications of the experiment. Data in square brackets indicate the standard
deviation. Data refer to the best combination of parameters (MutRate = 5%; PopSize = 50) (see
Table A7). The SSSHC performance refers to the best case (Num Learn Iters = 2) (see Table A8). Bold
values indicate the best results.
SSS HC SSSHC
0.903 [0.076] 0.614 [0.159] 0.939 [0.066]
Figure 11. Performance of the SSS, the HC and the SSSHC algorithms in the Randomly Varying
Initial States condition of the double-pole balancing problem. Data represent the fitness achieved by
evaluating evolved controllers in the Fixed Initial States condition. Violins show the distribution of
data. The black rectangles inside violins are bounded in the range [Q1, Q3], where Q1 denotes the
first quartile and Q3 indicates the third quartile. The white circle inside each violin represents the
median value of the data. The vertical black line is limited between the minimum and maximum
values of the data. These outcomes were achieved through 30 replications of the experiment by using
the best combination of parameters (mutation rate: 5%; population size: 50).
Finally, in this case, the correlation between the performance and number of learning
iterations was also analyzed. The results show the existence of a negative correlation
(ρ = −0.16011, significant at p < 0.01), i.e., the longer the learning process, the worse the
performance. This outcome is due to the low neutrality of the considered case, which is
characterized by high variance (the task is stochastic). Consequently, the learning process
mainly generates maladaptive perturbations while reducing the number of remaining
evaluation steps.
3.3. Optimization Functions
The analysis of the performance in the function optimization scenario demonstrates the
superiority of the SSS and SSSHC algorithms over the HC method in two of the considered
problems. Specifically, SSS and SSSHC significantly outperform HC in optimizing the
Rastrigin and Sphere functions (p < 0.05). Interestingly, the former methods achieve the
same performance (see Table 13). Instead, with respect to the Rosenbrock function, SSSHC
is remarkably superior to HC (p < 0.05) and is also better than SSS (see Table 13), although
the latter outcome is not statistically significant (p > 0.05).
Table 13. Average fitness of the controllers evolved with the SSS, the HC and the SSSHC algorithms
with respect to the Rastrigin, Rosenbrock and Sphere optimization functions. Data were collected
by replicating the experiments 20 times. Data in square brackets indicate the standard deviation.
The values of the mutation rate MutRate and population size PopSize used in both conditions
are reported in Table 6. The SSSHC performance refers to the best case found for all functions
(Num Learn Iters = 1 concerning the Rastrigin and Sphere functions, and Num Learn Iters = 100
with respect to the Rosenbrock function, see Tables A9–A11). Bold values indicate the best results
(lower is better).
SSS HC SSSHC
Rastrigin 1.524 [0.0] 2.891 [0.668] 1.524 [0.0]
Rosenbrock 31.496 [27.841] 18.473 [3.612] 16.119 [2.100]
Sphere 0.008 [0.0] 0.017 [0.004] 0.008 [0.0]
The obtained results indicate that, in this context, the addition of learning to evolution
does not help to improve the quality of the discovered solutions in the optimization of the
Rastrigin and Sphere functions (see Table 13). Conversely, learning is slightly beneficial
with regard to the optimization of the Rosenbrock function. This discrepancy can be partly
ascribed to the difference in the global optimum of the considered functions: both the
Rastrigin and the Sphere functions are minimized when x = [0, . . . , 0], with x denoting the
input vector (see Equations (13) and (15)). Instead, the Rosenbrock function is optimized
when x = [1, . . . , 1] (see Equation (14)). Overall, finding such types of solutions when
inputs are not binary and are converted into floating-point values is not trivial, since it
is easy to become stuck in local optima (see Figure 3). Moreover, the order of magnitude
of the Rosenbrock function values is notably higher than those of the Rastrigin and Sphere
functions (see Table 13).
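To make the comparison easier to reproduce, the snippet below sketches the three benchmark functions in their standard textbook form; the constants are assumed here and may differ in minor details from Equations (13)–(15), but the global optima coincide with those discussed above.

```python
# Standard textbook forms of the three benchmark functions (assumed here;
# the exact constants used in the paper are those of Equations (13)-(15)).
import numpy as np

def rastrigin(x: np.ndarray) -> float:
    # Global minimum 0 at x = [0, ..., 0]; highly multimodal.
    return float(10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)))

def rosenbrock(x: np.ndarray) -> float:
    # Global minimum 0 at x = [1, ..., 1]; narrow curved valley.
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2))

def sphere(x: np.ndarray) -> float:
    # Global minimum 0 at x = [0, ..., 0]; unimodal and convex.
    return float(np.sum(x**2))

# Quick check of the global optima.
print(rastrigin(np.zeros(10)), rosenbrock(np.ones(10)), sphere(np.zeros(10)))
```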
Figure 12 displays the performance variation during the evolutionary process. As can
be seen, the SSS converges to lower values before the HC and SSSHC with respect to both
the Rastrigin and Sphere functions (see Figure 12, top and bottom figures). On the other
hand, the three algorithms exhibit similar trends when optimizing the Rosenbrock function
(Figure 12, middle figure).
By looking at the correlation between the value of the function and the number of
learning iterations, a positive correlation was found with regard to both the Rastrigin
(ρ = 0.887625, significant at p < 0.01) and Sphere (ρ = 0.937437, significant at p < 0.01)
functions. This implies that increasing the length of the learning process is detrimental,
since performance strongly decreases (recall that, in this domain, the goal is to minimize the
target function). Conversely, there exists a negative correlation in the case of the Rosenbrock
function (ρ = −0.818182, significant at p < 0.01). Differently from the other two functions,
here, making the learning process longer is beneficial for the discovery of more effective
solutions (see Figure 13).
As already stated in Section 2.2.4, the experimental parameters were kept fixed (see
Table 5) and not systematically varied. Therefore, it is possible that using different settings
might lead to improved performances for all the considered algorithms, especially with
regard to the Rosenbrock function. The results (not shown) indicate that the performance
of the SSSHC method can increase when diverse values of the MutRate and PopSize
parameters are used. Moreover, the addition of noise was not investigated in this scenario.
Consequently, introducing noise in the selection process could help escape from the local
minima solutions discovered by the different algorithms.
Figure 12. Performance of the SSS, the HC and the SSSHC algorithms in the optimization of the
Rastrigin (top figure), Rosenbrock (middle figure) and Sphere (bottom figure) functions. The shaded
areas indicate the mean and 85% bootstrapped confidence intervals of the mean. The performance on
the y-axis is displayed by means of the logarithmic scale. Collected data are the average result of
20 replications of the experiments. Parameters that were used are reported in Table 5.
Figure 13. Values of the Rastrigin (Frastrigin), Rosenbrock (Frosenbrock) and Sphere (Fsphere) functions
achieved by the SSSHC algorithm by systematically varying the number of learning iterations. Circles
denote the fitness obtained with the corresponding number of iterations. With respect to the Rastrigin
and Sphere functions, the value of the function increases as the number of learning iterations increases
(top and bottom figures). Conversely, as far as the Rosenbrock function is concerned, the value of
the function decreases when the number of learning iterations increases (middle figure). Data on the
x-axis are shown using a logarithmic scale and were collected in 20 replications of the experiment.
3.4. Robot Foraging
The performance comparison of the considered methods in the robot foraging task
generates results in line with those found in the function optimization scenario. In fact,
as indicated in Table 14 and Figure 14, the SSS and SSSHC algorithms remarkably outperform
the HC method (p < 0.05). Moreover, the SSS algorithm achieves a slightly higher
performance than the SSSHC method in the “No noise” condition, while the opposite is
true in the “Noisy” condition. However, the difference is not statistically significant in either
case (p > 0.05). Therefore, the combination of learning and evolution does not provide
an advantage to evolution in this scenario. Moreover, the addition of noise does not help
to improve the quality of solutions (see Table 14). Instead, it has a detrimental effect on
performance, which is significantly lower (p < 0.05). Some insights explaining why the
addition of learning to evolution is not beneficial are provided in Section 4.
Table 14. Performance of the controllers evolved with the SSS, the HC and the SSSHC algorithms in
the robot foraging task. These data were obtained by averaging the results of 20 replications of the
experiment. Standard deviations are reported within square brackets. The parameters MutRate and
PopSize were set according to Table 6. Regarding the “Noisy” condition, the parameter NoiseRange
was set to 0.05, as stated in Section 2.2.4. The SSSHC performance refers to the best case found
for both conditions (Num Learn Iters = 1 in the “No noise” condition and Num Learn Iters = 2 in the
“Noisy” condition, see Table A12). Bold values indicate the best results.
SSS HC SSSHC
No noise 8.950 [0.315] 8.270 [0.386] 8.870 [0.305]
Noisy 8.290 [0.392] 6.990 [0.816] 8.350 [0.460]
Figure 14. Performance of the SSS, the HC and the SSSHC algorithms in the robot foraging task. The
black rectangles contained in the violins are bounded in the range [Q1, Q3]. Median values are
represented by white circles. The vertical black line is limited between the minimum and maximum
values of the data. Data were obtained by averaging 20 replications of the experiments. Parameters
that were used are reported in Table 6. Left figure refers to the “No noise” condition (Num Learn Iters = 1),
while the right picture refers to the “Noisy” condition (NoiseRange = 0.05; Num Learn Iters = 2).
It is important to note that, differently from the 5-bit parity and the double-pole balanc-
ing problems, here, the parameter settings were not systematically varied, analogously to
the function optimization scenario. Instead, they were set according to the values reported
in Table 6, which are derived from works focusing on the analysis of dynamics and emergent
behaviors in groups [71,72]. Similarly, the value of the parameter NoiseRange was set
to 0.05 in the “Noisy” condition, without any additional analysis. Therefore, there could be
room for further performance improvements of the SSS, HC and SSSHC algorithms. In the
future, the investigation of different combinations of parameters will be explored.
Figure 15 shows how the performance of the considered methods varies during the
evolutionary process. Since the curves are increasing (see Figure 15), continuing the process
is highly likely to further improve the fitness of the different algorithms.
Figure 15. Performance of the SSS, the HC and the SSSHC algorithms in the robot foraging task.
Top picture refers to the “No noise” condition, while the bottom figure contains data achieved
in the “Noisy” condition (NoiseRange = 0.05). With respect to the SSSHC algorithm, data refer
to the best value of the parameter Num Learn Iters found (Num Learn Iters = 1 in the “No noise”
condition, Num Learn Iters = 2 in the “Noisy” condition). The shaded areas indicate the mean and
85% bootstrapped confidence intervals of the mean. These data were obtained by averaging the
outcomes from 20 replications of the experiment.
3.5. Social Foraging
The SSSHC algorithm achieves better performance than the HC method in both
conditions of the social foraging problem (see Table 15 and Figure 16). The result is
statistically significant (p < 0.05). Moreover, the SSS is considerably superior to HC in
the “Noisy” condition (p < 0.05, see Table 15). The SSS and SSSHC perform similarly
in this domain, with no significant differences among them (p > 0.05). These outcomes
are coherent with those found for the robot foraging task and confirm that learning is
not beneficial for evolution when the considered problem entails agent–environment
interactions. Indeed, the complexity of these domains makes it difficult for learning to
locally refine solutions, since variations mainly have a disruptive effect. Consequently,
the SSSHC is likely to undergo phases in which learning performs additional evaluation
episodes aiming to improve the performance of current solutions, but without success.
Table 15. Fitness of the controllers evolved with the SSS, the HC and the SSSHC algorithms in the
social foraging problem. Data represent the average performance of 20 replications of the experiment.
Square brackets contain the standard deviations. As for the robot foraging task, the parameters
MutRate and PopSize that were used are reported in Table 6. Moreover, these results were achieved
using the best value of the parameter Num Learn Iters found for both conditions (Num Learn Iters = 1
in the “No noise” condition and Num Learn Iters = 5 in the “Noisy” condition, see Table A13). Bold
values indicate the best results.
SSS HC SSSHC
No noise 1.360 [0.250] 1.260 [0.329] 1.450 [0.218]
Noisy 0.800 [0.410] 0.350 [0.252] 0.620 [0.166]
Figure 16. Performance of the SSS, the HC and the SSSHC algorithms in the social foraging prob-
lem. The black rectangles inside violins contain the inter-quartile range of the data. White circles
inside each violin denote the median. The extreme points of the vertical black line indicate the
minimum and maximum values of the data. These data are the result of 20 replications of the
experiments. Parameters that were used are reported in Table 6. Left figure refers to the “No noise”
condition (Num Learn Iters = 1), while the right picture refers to the “Noisy” condition (NoiseRange = 0.05;
Num Learn Iters = 5).
Notably, in this scenario, the addition of noise is highly detrimental, with a remarkable
drop in performance (see Table 15). In fact, the fitness achieved in the “Noisy” condition
is considerably lower than the one obtained in the “No noise” condition (
p<
0.05, see
also Figure 16). This outcome is in line with the results found in the robot foraging task,
although the drop in performance is less marked in the latter case (see Table 14). However,
the social foraging problem is considerably more challenging than the robot foraging task,
because the performance strongly depends on the ability of the agents to forage “socially”,
i.e., the solution to the problem cannot be achieved by a single robot. This entails that both
agents should have exploratory skills, find the food element and eat the item. Despite
the use of homogeneous agents, which makes the task easier, the need to explore the
environment in searching for the food element could prevent the robots from behaving
effectively. This also explains why the performance in the robot foraging task is about
one order of magnitude higher than the fitness achieved in the social foraging problem
(see Tables 14 and 15). Given the complexity of the considered problem, the addition of
noise has a disruptive effect on the performance of solutions discovered by the considered
algorithms. Indeed, only a few mutations have a positive effect on fitness, but noise hinders
such adaptation mechanisms and affects both the selection and the learning processes.
In this respect, future studies should clarify whether a limited amount of noise could be
beneficial in the considered scenario.
Analogously to the robot foraging task, the analysis of fitness during the whole
evolutionary process indicates that the different algorithms can further improve their
results (see Figure 17). This is particularly evident for the SSS algorithm, especially in the
“Noisy” condition (see Figure 17, bottom). Consequently, it is possible that increasing
the length of the evolutionary process could demonstrate that learning is detrimental in
this scenario.
Figure 17. Average fitness obtained by the SSS, the HC and the SSSHC algorithms in the social
foraging problem. Top picture refers to the “No noise” condition, while bottom figure contains
data achieved in the “Noisy” condition (NoiseRange = 0.05). The outcomes of the SSSHC algorithm
refer to the best value of the parameter Num Learn Iters found (Num Learn Iters = 1 in the “No noise”
condition, Num Learn Iters = 5 in the “Noisy” condition). The shaded areas indicate the mean and
85% bootstrapped confidence intervals of the mean. These data are the results of 20 replications of
the experiment.
4. Discussion
In this section, the outcomes of the experiments reported in Section 3 and the main
findings are discussed.
With respect to the 5-bit parity task, the better performance of HC over SSS is not
surprising. As pointed out in [76], (1+λ)-ES algorithms like HC (where λ = 1) display
higher performance than (µ+µ)-ES techniques, such as SSS, in this domain. Indeed, the
family of parity problems is characterized by high neutrality, i.e., large areas of the search
space that can be reached through mutations not affecting the probability of an individual
surviving and reproducing [95,96]. Therefore, the former methods drive evolution towards
such neutral regions of the search space that can ultimately lead to areas with higher
fitness. Conversely, the latter algorithm tends to explore regions of the search space
characterized by high robustness but far from high-fitness areas [76]. The competition
between population members observed in (µ+µ)-ES algorithms results in the tendency
to perform a local exploration of the search space, thus preventing them from discovering
optimal solutions to the problem. Moreover, solutions discovered by (1+λ)-ES methods
typically contain a higher number of genes playing a functional role than those found
by (µ+µ)-ES techniques [76]. The SSSHC algorithm exploits the combination of the
two different techniques in order to achieve very good performance (see Table 7) with
considerably better convergence speed (see Table 8).
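As an illustration of how the two components interact, the following Python sketch combines a (µ+µ)-style steady-state step with a short mutation-and-retain refinement loop playing the role of learning. It is a simplified illustration, not the exact SSS/HC/SSSHC implementation used in the experiments (the actual code is available in the repository referenced in the Data Availability Statement); the genotype representation, mutation operator and selection details shown here are assumptions.

```python
# Simplified illustration of evolution plus mutation-and-retain "learning";
# not the exact SSSHC implementation (see the linked repository for that).
import numpy as np

rng = np.random.default_rng(0)

def mutate(genotype, mut_rate=0.05, sigma=0.1):
    # Perturb a random subset of genes with Gaussian noise (assumed operator).
    mask = rng.random(genotype.size) < mut_rate
    return genotype + mask * rng.normal(0.0, sigma, genotype.size)

def refine(genotype, fitness_fn, num_learn_iters):
    # "Learning" as defined in the paper: propose mutations and retain only
    # those that improve the fitness of the current candidate.
    best, best_fit = genotype, fitness_fn(genotype)
    for _ in range(num_learn_iters):
        cand = mutate(best)
        cand_fit = fitness_fn(cand)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

def generation(population, fitness_fn, num_learn_iters=5):
    # (mu+mu)-style step: every parent produces one refined offspring and the
    # best half of the combined pool survives to the next generation.
    pool = [(p, fitness_fn(p)) for p in population]
    for parent, _ in list(pool):
        child, child_fit = refine(mutate(parent), fitness_fn, num_learn_iters)
        pool.append((child, child_fit))
    pool.sort(key=lambda item: item[1], reverse=True)
    return [genotype for genotype, _ in pool[: len(population)]]

# Toy usage on a maximization problem (higher fitness is better).
population = [rng.normal(0.0, 1.0, 20) for _ in range(10)]
for _ in range(100):
    population = generation(population, lambda g: -float(np.sum(g**2)))
```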
Concerning the double-pole balancing problem, the HC is significantly worse than
SSS. This implies that strategies in which competition among population members does
not play a role are not effective at finding a solution to the task. The possibility of competing
against other individuals helps avoid becoming stuck in sub-optimal solutions. The
outcome is even more evident in the “Noisy” condition. Indeed, the addition of noise has a
disruptive effect on the HC algorithm, since the possibility of retaining maladaptive traits
increases. Because most of the mutations typically cause a drop in performance, and given
that evolving individuals do not compete for survival, the combination of these two factors
makes it impossible to access areas of the search space corresponding to higher fitness.
Conversely, the SSS method is less sensitive to the issue, due to the competition among
population members triggered by the selection process. The SSSHC benefits from the latter
property to avoid being trapped in local minima, while preserving the capability to explore
and, possibly, improve the quality of the discovered solutions. Overall, the obtained results
confirm the hypothesis about the positive influence of noise on learning. Indeed, making
the fitness stochastic allows learning to explore more of the search space. The retention of
maladaptive mutations gives learning the possibility to access areas of the search space that
cannot be reached through a deterministic process.
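A minimal sketch of this mechanism is given below, assuming that noise is drawn uniformly from [−NoiseRange, +NoiseRange] and added to the fitness values before comparison; the exact procedure adopted in the experiments is the one defined in Section 2.1.5.

```python
# Sketch of a noisy acceptance test: uniform noise is added to both fitness
# values before comparison, so a slightly worse candidate is sometimes kept.
# The exact mechanism used in the experiments is defined in Section 2.1.5.
import numpy as np

rng = np.random.default_rng(0)

def noisy_accept(parent_fit, child_fit, noise_range):
    noisy_parent = parent_fit + rng.uniform(-noise_range, noise_range)
    noisy_child = child_fit + rng.uniform(-noise_range, noise_range)
    return noisy_child >= noisy_parent

# With NoiseRange = 0.06, a child marginally worse than its parent is accepted
# in a non-negligible fraction of comparisons, which is how maladaptive
# mutations can occasionally be retained and later exploited.
acceptance_rate = np.mean([noisy_accept(0.50, 0.48, 0.06) for _ in range(10000)])
print(f"acceptance rate of a slightly worse child: {acceptance_rate:.2f}")
```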
The performance of SSS and SSSHC is superior to that of HC with respect to the
optimization of the Rastrigin and Sphere functions. Furthermore, the SSSHC considerably
outperforms the HC and is better than the SSS when optimizing the Rosenbrock function.
Overall, these outcomes show how the combination of learning and evolution is not
advantageous in this scenario. One possible explanation is related to the goal of these
types of problem, i.e., finding the best set of values that minimize the considered function.
As illustrated in Section 3.3, this implies the discovery of global optima solutions (i.e.,
x = [0, . . . , 0] for the Rastrigin and Sphere functions and x = [1, . . . , 1] for the Rosenbrock
function, respectively), which is far from trivial. In addition, the presence of multiple
local optima solutions makes it difficult for the algorithms to improve performance. In
this respect, mutations are mainly maladaptive and improve performance only rarely.
Consequently, learning spends time trying to discover more effective solutions and fails
to provide a beneficial effect on evolution. As illustrated in Section 3.3, the experiments
on function optimization were performed without considering the addition of noise, which
could help to escape from the local optima solutions characterizing the considered functions,
similarly to what has been observed in the double-pole balancing problem. Therefore,
further analysis is needed before reaching a final conclusion.
The SSS and SSSHC algorithms discover more effective solutions than the HC method
in both the robot foraging and the social foraging tasks. Moreover, the performance of
the SSS and the SSSHC is similar, without a clear winner. Consequently, learning does
not provide an advantage to evolution in this domain. The main reason lies in the diverse
nature of the considered problems. In fact, differently from the 5-bit parity and double-pole
balancing tasks, these problems involve robotic agents exploring the arena in search of food
elements to eat. Consequently, the agents interact with the environment. The SSS exploits
the competition between individuals in the population in order to avoid being stuck in local
minima solutions and obtains a relatively good fitness on average. On the other hand, the
HC algorithm maintains different individuals evolving in parallel, but it fails in achieving
adequate performance. The SSSHC benefits from the combination of the two approaches,
but the time spent to refine solutions does not provide an advantage to evolution in this
context. As pointed out with respect to the function optimization scenario, it is worth
considering that there is no guarantee that learning improves the quality of the solutions
discovered during the evolutionary process, since most of the mutations are maladaptive.
Consequently, the SSSHC might be characterized by phases in which learning does not
play a role. Because the local search is costly in terms of number of evaluation episodes,
SSSHC achieves performances comparable to those obtained by SSS in this domain. The
properties of the SSSHC algorithm also clarify why the addition of noise is detrimental in
this scenario (see Table 15), as already explained in Section 2.1.5.
One might argue that the double-pole balancing problem and the two considered
robotic tasks share similarities, since both have agents interacting with the environment
(e.g., the cart and the robots). Nevertheless, the types of interaction are completely different:
in the double-pole balancing problem, the cart can only move on the horizontal axis and
the interaction with the environment implies the contact of the wheels with the floor (see
Figure 2), without friction. On the other hand, in both the robot foraging and the social
foraging tasks, agents are embodied and can freely move in the environment. Moreover,
they perceive objects in the nearby areas (e.g., food elements and/or walls), which affect
their behaviors. This implies that dealing with these stimuli is remarkably more complex.
Finally, the actions performed by the robots cause sudden changes in the inputs; for
example, when the agent succeeds in eating a food item, this reappears immediately in a
different location, and the sensory inputs of the robot are significantly different between
one step and the next (see Table 16).
Table 16. Example of sensory inputs perceived by the robot in two consecutive steps of the robot
foraging problem. Specifically, step t contains the values of the sensors just before eating a food
element, while step t + 1 refers to the inputs after the food item has been eaten. Sensors are grouped
by color channel (red: R1–R6, green: G1–G6, blue: B1–B6), as described in Section 2.1.4. As can be
seen, the stimuli received by the agent undergo a sudden change due to the fact that the eaten food
element reappears in a different location.
Step R1 R2 R3 R4 R5 R6 G1 G2 G3 G4 G5 G6 B1 B2 B3 B4 B5 B6
t 0.0 0.0 0.933257 1.0 1.0 0.798138 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
t+1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
As shown in Figures 15 and 17, all the considered algorithms might improve their
performances if the evolutionary process continues. This is more evident when evaluating
the SSS in the “Noisy” condition of the social foraging experiment. Therefore, future
work will investigate the performances of the different methods when the length of the
evolutionary process is set to 5 × 10^8 evaluation steps as in [72] and whether the effect of
learning on evolution becomes detrimental.
5. Conclusions
In this paper, the benefits and drawbacks of combining evolution and learning, a
well-known topic in the research community, were investigated. Prior works in this area
have led to contradictory results, without providing clear hints about the actual effect of
learning on evolution. Differently from previous approaches, results were collected in five
different scenarios, including three benchmark tasks (5-bit parity, double-pole balacing
and function optimization) and two robotic problems (robot foraging and social foraging).
The aim is to investigate this interplay on widely used and challenging domains. The
hypothesis is that learning provides advantages to evolution under specific conditions,
like the use of deterministic episodes and the absence of agent–environment interactions.
When these criteria are met, the addition of noise to both selection and learning process is
beneficial. Here, the term “learning” denotes a refinement process attempting to modify
parameters through mutations and retaining adaptive modifications. A novel algorithm
combining learning and evolution, called SSSHC, was proposed and tested on the above
mentioned tasks. Specifically, the author compared the SSSHC algorithm with the methods
composing it, namely, the SSS algorithm (i.e., a pure evolutionary method) and the HC
algorithm, which represents an optimization algorithm performing mutation-based solution
refinements. The three techniques were analyzed by keeping the same parameter settings to avoid
biases: the best parameters for the SSS algorithm were first determined and then used to evaluate
both the HC and the SSSHC algorithms. The results in the presented domains indicate that
the combination of evolution and learning may or may not be beneficial depending on the
nature of the problem. In particular, the following outcomes can be highlighted:
• the SSSHC achieves significantly better performance than the SSS with respect to the
5-bit parity task (both “No noise” and “Noisy” conditions);
• the SSSHC algorithm is notably faster than the other methods with regard to the 5-bit
parity task (both “No noise” and “Noisy” conditions);
• the SSSHC considerably outperforms the SSS in the Fixed Initial States condition,
“Noisy” case of the double-pole balancing problem;
• the SSSHC is better than SSS in the Randomly Varying Initial States condition of the
double-pole balancing problem;
• the SSSHC is remarkably superior to HC with respect to both conditions (“No noise”
and “Noisy”) of the double-pole balancing problem;
• the SSSHC has a significantly superior convergence speed compared to the HC in the
Fixed Initial States condition of the double-pole balancing problem, “No noise” case;
• the performance of SSS and SSSHC is the same with respect to the optimization of the
Rastrigin and Sphere functions;
• the SSSHC significantly outperforms the HC in the optimization of the Rastrigin and
Sphere functions;
• the SSSHC is notably superior to HC in the optimization of the Rosenbrock function;
• the SSSHC is better than SSS in the optimization of the Rosenbrock function;
• the SSSHC performs similarly to SSS in both the robot foraging task and the social
foraging problem;
• the SSSHC is remarkably superior to HC in both the robot foraging task and the social
foraging problem;
• the SSSHC has a considerable performance drop when noise is added in both the robot
foraging task and the social foraging problem.
Moreover, the achieved results reveal that the advantage is higher when noise is added
to the learning and the selection processes if the considered problem is characterized by
determinism, i.e., when evolving agents are evaluated in the same episodes throughout the
whole evolutionary process. Indeed, the possibility to retain maladaptive traits in order
to explore more of the search space allows the discovery of areas of higher fitness that
cannot be reached through a standard evolutionary process. However, some differences
can be highlighted: in the 5-bit parity task, the length of the learning process positively
correlates with both the final performance and the convergence speed. The result was
obtained in both the “No noise” condition and the “Noisy” condition. This implies that the
effect of learning on evolution is higher when the number of learning iterations increases.
An explanation lies in the nature of the parity problem, which is characterized by high
neutrality. This, in turn, allows the learning process to efficiently explore the search space
and ultimately discover the regions associated with high performance. With regard to the
double-pole balancing problem, instead, the duration of the learning process is negatively
correlated with both the fitness and the convergence speed in the “No noise” case of
the Fixed Initial States condition. Differently from the 5-bit parity task, the double-pole
balancing problem is characterized by low neutrality. Consequently, small changes might
have disruptive effects. This explains why reducing the length of the learning process
produces better results. Conversely, in the “Noisy” case of the Fixed Initial States condition,
the effect of learning on evolution is more beneficial when the number of learning iterations
is higher. Indeed, the addition of noise could allow learning to escape from local optima
and generate more effective solutions. Overall, the collected results clearly indicate the
need to fine-tune the duration of the learning process based on the specific evolutionary
problem. In the future, the possibility of adjusting the number of learning iterations during
the course of the evolutionary process should be investigated.
On the other hand, learning does not positively affect evolution in the function
optimization scenario, the robot foraging and social foraging problems. Specifically, with
regard to the former case, the Rastrigin, Rosenbrock and Sphere functions are characterized
by one global optimum and multiple local minima, which make the task challenging.
However, the analysis reveals differences between the examined functions: with respect
to the Rastrigin and Sphere functions, the performance is negatively correlated with the
length of the learning process, as for the “No noise” case of the Fixed Initial States condition
of the double-pole balancing task. Conversely, the fitness is positively correlated with the
duration of learning in the optimization of the Rosenbrock function, analogously to the
5-bit parity task and the “Noisy” case of the Fixed Initial States condition of the double-pole
balancing problem. In addition, it is worth highlighting that the function optimization
scenario was investigated only in the absence of noise. As already discussed in Section 4,
future studies should clarify whether and how the addition of noise might influence the
interplay between learning and evolution in this domain.
As far as the robot foraging and social foraging problems are concerned, the embodi-
ment and the presence of agent–environment interactions increase the task complexity. The
addition of noise in the robot foraging and social foraging problems has a detrimental effect
on the performance of the considered algorithms, due to the nature and the complexity
of these tasks. Nonetheless, further exploration of the effect of noise and other parameter
settings is required, since these problems were studied by using the values reported in
Tables 5 and 6, without performing any systematic analysis. Similarly, a future research
direction could explore the possibility of automatically adjusting the parameters (i.e., muta-
tion rate and population size) on the fly, depending on the performance level during the
evolutionary process.
It is worth underlining that the learning definition used in this work does not correspond
to the classic concept reported in [26]. Consequently, in the future, the author
plans to study the relationship between evolution and learning more thoroughly, by
investigating learning techniques like back-propagation [97] or Spike Timing-Dependent
Plasticity (STDP) [98,99]. Furthermore, future works will be devoted to verifying whether the
outcomes reported here remain valid by using automatic algorithm configuration tools to
optimize parameters, a topic beyond the scope of this paper as pointed out in Section 1. In
addition, the results reported in this work show that, when agents actively interact with
the environment, learning does not provide an advantage to evolution. Future studies
will investigate possible modifications of the SSSHC (for example, the utilization of a
crossover operator or the adaptation of the length of learning on the fly) in order to achieve
better performance than the SSS in the considered experimental settings, as well as in
other scenarios involving autonomous robots performing individual [100–102] or collective
behaviors [103–106]. Finally, future research will be devoted to the comparison of the
SSSHC with other state-of-the-art EAs like the Covariance Matrix Adaptation Evolution
Strategy (CMA-ES, see [64,107]), Separable Natural Evolution Strategies (sNES, see [66,108]),
Exponential Natural Evolution Strategies (xNES, see [66,109]) and the OpenAI Evolutionary
Strategy (OpenAI-ES, see [110–112]), and/or with Reinforcement Learning (RL) methods
like Proximal Policy Optimization (PPO) [113] and Deep Deterministic Policy Gradient
(DDPG) [114].
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The code to replicate the experiments reported in the paper is freely
available at https://github.com/PaoloP84/LearningAndEvolution (accessed on 28 October 2024).
Acknowledgments: The author would like to express his gratitude to his friend and colleague Alessan-
dra Vitanza for her suggestions that contributed to improving the quality of this work. The author
would also like to thank Stefano Nolfi for his advice about the structure and organization of the article.
Conflicts of Interest: The author declares no conflicts of interest.
Appendix A
Appendix A.1. 5-Bit Parity
Table A1. Average fitness of the controllers evolved with the SSS algorithm in 50 replications of
the experiment. Data in square brackets indicate the standard deviation. Data in round brackets
represent the average number of evaluation episodes performed. Bold values indicate the best results.
MutRate/PopSize 10 20 50 100
0.01 0.974 [0.053] 0.939 [0.077] 0.915 [0.075] 0.880 [0.106]
(5.766 × 10^7) (7.838 × 10^7) (8.424 × 10^7) (8.191 × 10^7)
0.02 0.973 [0.059] 0.933 [0.077] 0.922 [0.071] 0.906 [0.073]
(5.46 × 10^7) (7.463 × 10^7) (7.973 × 10^7) (8.584 × 10^7)
0.05 0.955 [0.052] 0.917 [0.077] 0.896 [0.081] 0.898 [0.086]
(7.572 × 10^7) (8.334 × 10^7) (8.831 × 10^7) (8.67 × 10^7)
0.1 0.856 [0.083] 0.832 [0.076] 0.843 [0.090] 0.824 [0.074]
(9.448 × 10^7) (9.808 × 10^7) (9.432 × 10^7) (9.896 × 10^7)
0.2 0.747 [0.059] 0.729 [0.047] 0.723 [0.039] 0.716 [0.038]
(10^8) (10^8) (10^8) (10^8)
Table A2. Average fitness of the controllers evolved with the SSS algorithm with the best combination
of parameters (MutRate = 0.01; PopSize = 10, see Table A1). Different levels of noise were applied.
Data were obtained by running 50 replications of the experiment. Data in square brackets indicate
the standard deviation. Data in round brackets represent the average number of evaluation episodes
performed. Bold values indicate the best results.
NoiseRange Fitness
0.0 0.974 [0.053]
(5.766 × 10^7)
0.01 0.969 [0.063]
(5.277 × 10^7)
0.02 0.969 [0.062]
(5.497 × 10^7)
0.03 0.983 [0.037]
(5.003 × 10^7)
0.04 0.976 [0.054]
(5.335 × 10^7)
0.05 0.960 [0.070]
(5.663 × 10^7)
0.06 0.969 [0.061]
(5.612 × 10^7)
0.07 0.971 [0.052]
(5.334 × 10^7)
0.08 0.951 [0.072]
(5.98 × 10^7)
0.09 0.965 [0.063]
(5.785 × 10^7)
0.1 0.964 [0.070]
(5.338