
Decision-making without a brain: How an amoeboid organism solves the two-armed bandit

Several recent studies hint at shared patterns in decision-making between taxonomically distant organisms, yet few studies demonstrate and dissect mechanisms of decision-making in simpler organisms. We examine decision-making in the unicellular slime mould Physarum polycephalum using a classical decision problem adapted from human and animal decision-making studies: the two-armed bandit problem. This problem has previously only been used to study organisms with brains, yet here we demonstrate that a brainless unicellular organism compares the relative qualities of multiple options, integrates over repeated samplings to perform well in random environments, and combines information on reward frequency and magnitude in order to make correct and adaptive decisions. We extend our inquiry by using Bayesian model selection to determine the most likely algorithm used by the cell when making decisions. We deduce that this algorithm centres around a tendency to exploit environments in proportion to their reward experienced through past sampling. The algorithm is intermediate in computational complexity between simple, reactionary heuristics and calculation-intensive optimal performance algorithms, yet it has very good relative performance. Our study provides insight into ancestral mechanisms of decision-making and suggests that fundamental principles of decision-making, information processing and even cognition are shared among diverse biological systems.
rsif.royalsocietypublishing.org
Research
Cite this article: Reid CR, MacDonald H,
Mann RP, Marshall JAR, Latty T, Garnier S.
2016 Decision-making without a brain:
how an amoeboid organism solves the
two-armed bandit. J. R. Soc. Interface 13:
20160030.
http://dx.doi.org/10.1098/rsif.2016.0030
Received: 12 January 2016
Accepted: 13 May 2016
Subject Category:
Life Sciences–Mathematics interface
Subject Areas:
biomathematics, computational biology
Keywords:
slime mould, Physarum polycephalum,
decision-making, exploration–exploitation
trade-off, Bayesian model selection,
two-armed bandit
Author for correspondence:
Chris R. Reid
e-mail: chrisreidresearch@gmail.com
Present address: Department of Biological
Sciences, Macquarie University, Sydney,
New South Wales 2109, Australia.
Electronic supplementary material is available
at http://dx.doi.org/10.1098/rsif.2016.0030 or
via http://rsif.royalsocietypublishing.org.
Decision-making without a brain:
how an amoeboid organism solves the
two-armed bandit
Chris R. Reid1,†, Hannelore MacDonald1, Richard P. Mann2, James A. R. Marshall3,4, Tanya Latty5 and Simon Garnier1
1 Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ 07102, USA
2 School of Mathematics, University of Leeds, Leeds LS2 9JT, UK
3 Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
4 Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
5 School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales 2006, Australia
SG, 0000-0002-3886-3974
1. Introduction
While less recognized than their animal counterparts, many non-neuronal organ-
isms, such as plants, bacteria, fungi and protists, also have the ability to make
complex decisions in difficult environments (for a full review, see [1]). The most
incredible feats of problem-solving among non-neuronal organisms, many
previously reported only in the so-called cognitive organisms, have been demon-
strated by the unicellular slime mould Physarum polycephalum. This unicellular
protist lacks a central nervous system and possesses no neurons, yet it has been
demonstrated to solve convoluted labyrinth mazes [2], find shortest length net-
works and solve challenging optimization problems [3], anticipate periodic
events [4], use its slime trail as an externalized spatial memory system to avoid
revisiting areas it has already explored [5] and even construct transport networks
that have similar efficiency to those designed by human engineers [6]. Slime
mould cells also display similar decision-making constraints to the cognitive
constraints observed in brains. Latty & Beekman [7] provide evidence that
P. polycephalum is vulnerable to making the same economically irrational
decisions that can afflict humans [8], starlings [9], honeybees [10] and grey jays
[10]. The same authors also demonstrate that, like humans,
slime moulds are subject to speed-accuracy trade-offs when
confronted with a difficult choice set [11]. Studies such as
these support the growing notion that certain problem-solving
processes, as well as their associated trade-offs and paradoxes,
are spread wide on the phylogenetic tree [12,13]. To compare
the information processing abilities of different organisms,
we require a common testing platform based on challenges
likely shared by organisms from vastly different taxa.
© 2016 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution
License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original
author and source are credited.
Downloaded from http://rsif.royalsocietypublishing.org/ on June 8, 2016
Many of the decisions faced by humans and most other
organisms necessitate exploration of a number of options
before a commitment is made to exploit a particular choice.
These decisions are often made more complex when their vari-
ables are continually changing, resulting in a need for constant
re-evaluation of alternatives. The question boils down to a
fundamental conundrum in decision-making: to exploit fam-
iliar but potentially sub-optimal options, or to risk further
exploration for potentially more rewarding ones? This is
known as the exploration–exploitation trade-off. Several
studies in both humans and other animals have examined
the exploration–exploitation trade-off using what has
become a classic behavioural experiment in the understanding
of decision-making: the multi-armed bandit problem. The
multi-armed bandit problem derives its name from casino
slot machines—deciding which machine to play to maximize
the net payoff proves to be nearly impossible for the average
gambler to consistently solve [14]. Only one provably optimal
schedule has been derived for the special case of stationary
bandit problems where there are no costs for switching
between arms—the Gittins index [15]. Empirical studies of
bandit problem-solving have thus far only been carried out
in organisms with brains (such as humans [16], great tits [17],
pigeons [18], sticklebacks [19] and bumblebees [20]). However,
the exploration–exploitation trade-off is a problem faced by
unicellular organisms as well, which must tackle the problem
without the aid of complex nervous systems. Given the
sophisticated problem-solving abilities of the slime mould
P. polycephalum, we chose to examine this protist’s decision-
making capabilities by challenging slime mould cells with
two-armed bandit problems of increasing difficulty. Beginning
with a simple, static choice between two arms of different
quality, we advanced through levels of difficulty to our most
challenging trials in which slime moulds must make decisions
in noisy, unpredictable environments. We next uncovered the
proximate decision rules used by slime moulds to make
‘good’ decisions. Finally, we used Bayesian model selection
to select the most likely behavioural algorithm employed by
the slime mould cells, and compared this strategy to the per-
formance expected by the more complex, and in some cases
provably optimal algorithms such as the Gittins index.
2. Material and methods
2.1. Biological material
The vegetative state of P. polycephalum, called a plasmodium, is a
large, multinucleate cell. The general morphology of a plasmo-
dium includes an extending ‘search front’ at the leading edge of
the migrating cell, typically forming a dense fan-shape. This is fol-
lowed by a system of intersecting tubules towards the trailing edge
of the organism. Protoplasm is constantly and rhythmically
streamed back and forth through the network of tubules, circulat-
ing chemical signals and nutrients throughout the cell (see videos
at https://chrisrreid.wordpress.com/labwork/).
We maintained P. polycephalum plasmodia on plates of 1%
w/v agar with 5% w/v dissolved oat powder (Muscle Feast™
Whole Oat Powder) in the dark at 25 °C. We obtained original
cultures from Carolina Biological Supply Company®, and
recultured laboratory stocks on new 5% oat-agar plates weekly.
2.2. Shared experimental procedures
To challenge the slime mould with the two-armed bandit problem,
we provided P. polycephalum plasmodia (the mobile, actively fora-
ging stage of the cell’s life cycle) with a choice between two
differentially rewarding environments. These two choices consti-
tuted ‘arms’ of the two-armed bandit, and differed in their
amount and distribution of rewarding food sites (examples pro-
vided in figure 1). By expanding pseudopodia equally into both
environments, the cell could initially explore both arms. If capable
of choosing the better environment, the cell should eventually
switch from exploration to exploitation, and continue moving
only on the more rewarding arm. We considered the point at
which this happens to be where the cell made its decision.
Figure 1. Two-armed bandit experimental set-up for Physarum polycephalum. Cell biomass was placed in the centre (yellow box). White boxes indicate blank agar
sites (non-rewarding), brown boxes indicate oat-agar food sites (rewarding). Agar sites were 1 mm in diameter. The first site on either arm was always a 5% oat-agar
food site, to ensure the cell initialized exploration on both arms. Pictured here are the (a) 4e versus 8e treatment, where the LQ arm has 4 evenly distributed
reward sites, and the HQ arm has 8 evenly distributed reward sites, and (b) 4r versus 8r treatment, where the reward sites were distributed randomly. For graphic
representations of other tested distributions, see the electronic supplementary material, figure S1. (Online version in colour.)
The arms were 31 mm in length, each containing a varying sequence of 1 mm
blocks of either 1% w/v blank agar or 1% w/v agar with 5%
w/v dissolved oat powder (for all set-ups, see the electronic
supplementary material, figure S1). A 6 mm blank agar block was
placed between the two arms and acted as the start position for
the cells. Experiments were set up in lidded Petri dishes. The
first block along each arm was always a 5% oat-agar block to
ensure exploration of at least one site along each arm. The arm
with the greater number of oat-agar blocks was designated the
high-quality (HQ) arm, and the other designated the low-quality
(LQ) arm. Whether the left arm was the HQ or LQ was randomized
between experiments. We placed 0.025 (±0.005) g of plasmodial
biomass from our culture on the start block to begin the exper-
iment. This was enough biomass to potentially cover the entire
experimental surface area during exploration, ensuring that any
differentiation we observed in biomass distribution was due to
cell choice, and not constrained by cell size. These cell fragments
begin to act as new individual plasmodia within minutes [21].
The time-course of our experiments (up to 48 h) was sufficiently
short that any redistribution of biomass was due to active cell
movement, not cell growth [22]. The first arm on which the cell
reached the last agar block was the arm we considered to be
‘chosen’. The experiment was stopped at this point. Each Petri
dish contained two replicates that were isolated from each other
by a lack of shared agar substrate.
2.3. Data collection and analysis
Each experiment lasted approximately 48 h, with up to 36 replicates
at a time. We took photographs every 10 min using a
GoPro Hero 3™ camera inside a darkbox. The temperature was
maintained at 25 °C. An LED panel beneath the Petri dishes provided
illumination for photography for a duration of 10 s every
10 min. At all other times, the experiments were kept in darkness.
The images were analysed using custom-designed computer
vision software (run on Matlab™ version R2014a) that determined
the leading edges of the slime mould on each arm. We stopped
image analysis after each cell had reached the end of an arm. We
excluded any replicates where the cell left the arm and explored
the plate before reaching the end of an arm, or invaded or fused
with the adjacent replicate on a plate.
For all treatments where one arm was higher in quality than
the other, we graphed the proportion of replicates where the cell
reached the end of the HQ arm first (‘chose’ that arm). We also pro-
duced graphs depicting the dynamics of the decision-making
process, by graphing the difference in site discovery between HQ
and LQ arms for the first time each site was discovered on
either arm.
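The position-difference measure described above can be sketched in a few lines. This is only an illustration and not the authors' Matlab pipeline: the function name `position_difference`, the input format (the furthest site reached on each arm in every photograph) and the example values are all assumptions.

```python
def position_difference(hq_front, lq_front):
    """Return (site, HQ - LQ) pairs, one per site index, recorded at the
    frame in which that site index was first reached on either arm.

    hq_front, lq_front: furthest site reached on each arm in each frame
    (hypothetical input format; non-decreasing integers).
    """
    result = []
    next_site = 1
    for hq, lq in zip(hq_front, lq_front):
        furthest = max(hq, lq)
        # A single frame may advance the leading edge by several sites.
        while next_site <= furthest:
            result.append((next_site, hq - lq))
            next_site += 1
    return result


# Hypothetical leading-edge positions over six frames (not real data).
hq = [1, 2, 2, 3, 5, 6]
lq = [1, 1, 2, 2, 2, 2]
diffs = position_difference(hq, lq)
```

Positive differences correspond to the leading edge being further along the HQ arm, matching the sign convention used for the plotted dynamics.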
2.4. Choice scenarios
2.4.1. Baseline decision behaviour
As a baseline, we first examined how the cell behaved when given
a choice between two environments that were identical in quality.
Our treatments thus contained arms that were completely reward-
ing (31 versus 31), relatively devoid of reward (1 versus 1), or
intermittently rewarding in an evenly distributed (8e versus 8e)
or randomly distributed (8r versus 8r) pattern (electronic sup-
plementary material figure S1 and see figure S2 for the mean
reward site distributions). We compared the cell’s behaviour in
these treatments to a treatment in which one arm was considerably
more rewarding than the other (1 versus 8e).
2.4.2. Consistent reward with dissimilar, regular environments
As a standard bandit scenario, we chose a static, well-structured
and predictable exploration environment, where the HQ arm was
twice as rewarding as the LQ arm. We therefore set up treatments
with an LQ arm of 4 reward sites and a HQ arm of 8 reward
sites, distributed evenly along the arms (4e versus 8e, figure 1).
2.4.3. Consistent reward with dissimilar, irregular environments
We next examined how the predictability of the environment
affects the decision-making process. Our experimental set-up
allowed us to control the pattern of information received by the
cell as it explored both environments, simply by controlling the
distribution of reward sites along each arm. We were able to
ensure that as the cell explored both arms, the information relating
to the quality of each alternative could be received in a random
manner, representing a more naturalistic and unpredictable
environment. We repeated the 4 versus 8 treatment above, but in
this case the position of each reward site along the arm was deter-
mined randomly. The treatment was thus 4r versus 8r. The mean
distributions of reward sites and the randomization method are
available in the electronic supplementary material, figure S2.
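As a rough illustration of the even and random treatments, the sketch below builds arms of 31 sites in which the first site is always a reward. The spacing rule in `even_arm` and the sampling scheme in `random_arm` are assumptions made for this example; the layouts and randomization method actually used are those in the electronic supplementary material.

```python
import random

ARM_LENGTH = 31  # number of 1 mm sites per arm, as stated in the text


def even_arm(n_rewards, length=ARM_LENGTH):
    """Return 0/1 flags with n_rewards sites spread evenly along the arm.
    Site 0 is always a reward. The spacing rule is an assumption."""
    arm = [0] * length
    if n_rewards == 1:
        arm[0] = 1
        return arm
    step = (length - 1) / (n_rewards - 1)
    for k in range(n_rewards):
        arm[round(k * step)] = 1
    return arm


def random_arm(n_rewards, length=ARM_LENGTH, rng=random):
    """Return 0/1 flags with the first site rewarded and the remaining
    n_rewards - 1 reward sites placed uniformly at random (assumed scheme)."""
    arm = [0] * length
    arm[0] = 1
    for i in rng.sample(range(1, length), n_rewards - 1):
        arm[i] = 1
    return arm


# Example: a 4e versus 8r pairing built with a seeded generator.
lq_even = even_arm(4)
hq_random = random_arm(8, rng=random.Random(0))
```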
2.4.4. Sensitivity to reward differences
In many models of decision-making, the level of similarity
between options can have a large impact on the decision-making
process [23,24]. Our previous treatments used a 1 : 2 ratio of
choice quality. Keeping this constant, we first doubled the absolute
number of food sites on each arm, in both evenly distributed (8e
versus 16e) and randomly distributed (8r versus 16r) scenarios
(electronic supplementary material, figure S1). Comparing these
treatments with their 4 versus 8 counterparts informs us of the sen-
sitivity of the cell to changes in the absolute quality of the opposing
options, while keeping the relative difference in quality constant.
We next lowered this relative difference in quality between the
arms, such that the LQ arm contained 11 reward sites and the
HQ arm contained 16 reward sites (11e versus 16e; electronic
supplementary material, figure S1), thereby making the discrimination
problem harder. We also repeated the treatment with a random
distribution of reward sites (11r versus 16r; electronic supplementary
material, figure S1; see the electronic supplementary material,
figure S2 for the distribution of reward sites).
2.4.5. Random, non-binary reward with dissimilar, irregular
environments
In the treatments above, all reward sites contained an identical
concentration of oats as food (5%). Thus, in evaluating the environ-
ments on the LQ and HQ arms, the cell need only compare the
number of times a reward has been discovered on each arm. To
solve the non-binary two-armed bandit problem, the cell must
be able to make the more sophisticated comparison of the magni-
tude of the rewards returned from each environment sampled. In
our next experiments, both arms had an equal number of reward
sites (eight) distributed randomly along their lengths. The magni-
tude of each reward site was chosen randomly between 1% and 8%
oat-agar, and the HQ arm contained twice the overall percentage
of oat-agar as the LQ arm (reward sites totalling 2.5% oat-agar
on the LQ arm and 5% on the HQ arm; electronic supplementary
material, figure S1). We refer to this treatment as the ‘non-binary’
bandit. The mean distribution of reward sites is available in the
electronic supplementary material, figure S2.
2.5. Bayesian model selection
To reveal the specific behavioural algorithm that the cell used in
each step of exploring/exploiting the environment, we considered
10 rules of varying complexity which the cell could use to accu-
rately detect the availability of food in the experimental arena
and exploit this information to maximize total food intake. We
encoded these possible mechanisms as mathematical models for
the cell’s progression and used Bayesian model selection methods
[25,26] to identify which of these models best predicted the
observed movements, over all different treatments and the entire
duration of the experiments. These models ranged from very
simple rules through more complex heuristics, to rules that
approximate optimal two-arm bandit algorithms (Thompson
Sampling [27]). Our final model was the optimal Gittins process,
which provides a benchmark performance level that cannot be
exceeded in the bandit problem, assuming a decision-maker
with a correct Bayesian prior over alternative environmental
states, and extensive computational abilities [15]. The performance
of each model was evaluated by comparison to our experimental
data (electronic supplementary material, figure S3).
2.5.1. Models
In each case, the model specifies the probability that the cell will
move to the right in the next move, $m_t$, conditioned on its past
experiences encoded as six variables: (i) its last previous movement
direction; (ii) whether the last movement led to a reward site;
(iii) $A_R$, the number of reward sites it has encountered on the
right arm (plus one pseudo-observation); (iv) $A_L$, the number of
reward sites it has encountered on the left arm (plus one
pseudo-observation); (v) $B_R$, the number of non-reward sites on the right
arm (plus one pseudo-observation); and (vi) $B_L$, the number of
non-reward sites on the left arm (plus one pseudo-observation).
The pseudo-observations account for the effect of a uniform prior
distribution on the density of food (between zero and one, see the
electronic supplementary material for more information). The
following is a summary of the models considered. Where used,
$I(\text{condition})$ is an indicator variable that takes the value one if the
condition is met and zero otherwise. For models 7 and 8, $Q(x \mid A, B)$
is a function that represents the rational belief of an agent that
food density is $x$, given previous observations $A$ and $B$ (see the
electronic supplementary material for details). For all models, since there
are only two arms to sample and cells were observed always to
explore, $P(m_t = L) = 1 - P(m_t = R)$.
(1) Autocorrelation: move in the same direction as the previous
time step, $P(m_t = R) = I(m_{t-1} = R)$.
(2) Anti-autocorrelation: move in the opposite direction to the
previous time step, $P(m_t = R) = I(m_{t-1} = L)$.
(3) Most successes: move in the direction where the most
reward has been found, $P(m_t = R) = I(A_R > A_L) + 0.5\, I(A_R = A_L)$.
(4) Highest mean: move in the direction with the highest mean
number of encountered reward sites,
\[ P(m_t = R) = I\!\left(\frac{A_R}{A_R + B_R} > \frac{A_L}{A_L + B_L}\right) + 0.5\, I\!\left(\frac{A_R}{A_R + B_R} = \frac{A_L}{A_L + B_L}\right). \]
(5) Relative Successes: move with a probability in proportion to
the number of reward sites discovered on each arm,
$P(m_t = R) = A_R/(A_L + A_R)$.
(6) Relative means (Thompson sampling): move with a probability
in proportion to the mean number of reward sites
encountered on each arm,
\[ P(m_t = R) = \frac{A_R/(A_R + B_R)}{A_L/(A_L + B_L) + A_R/(A_R + B_R)}. \]
(7) Most likely: move to the arm most likely to have the higher
reward density (as estimated from previous reward encounters,
see below),
\[ P(m_t = R) = I\!\left(\int_0^1 \int_x^1 Q(y \mid A_R, B_R)\, Q(x \mid A_L, B_L)\, dy\, dx > 0.5\right) + 0.5\, I\!\left(\int_0^1 \int_x^1 Q(y \mid A_R, B_R)\, Q(x \mid A_L, B_L)\, dy\, dx = 0.5\right). \]
(8) Probability matching: move with a probability that matches
the chance of either arm containing the higher reward density,
\[ P(m_t = R) = \int_0^1 \int_x^1 Q(y \mid A_R, B_R)\, Q(x \mid A_L, B_L)\, dy\, dx. \]
(9) Chemotaxis: our experiments were designed to minimize diffusion
of food cues through the agar substrate. Nevertheless,
we included a model that accounts for chemotaxis of the cell
towards nearby food sites. If $I_R$ and $I_L$ are indicator functions
for the presence (one) or the absence (zero) of food at the next
available position on the right- and left-hand side,
respectively, then,
\[ P(m_t = R) = \begin{cases} 1, & I_R > I_L \\ 0, & I_R < I_L \\ 0.5, & \text{otherwise.} \end{cases} \]
(10) Gittins Index: select the arm with the highest index, which
takes account of future expected rewards from both exploration
and exploitation of an arm, based on a Beta prior over its
expected Bernoulli reward probability, and a discount parameter
applied to future rewards. To calculate these we
adapted the Matlab™ code from [28], which implements
the calibration method for calculating Gittins indices of
single-armed bandits with Bernoulli rewards, generalizing
this to work for arbitrary hyperparameters of the Beta
distribution.
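The closed-form rules in the list above, models (1) to (6) and (9), translate directly into code. The sketch below is an illustration and not the authors' implementation: the function name `p_right`, the string labels for the models and the argument names are assumptions. Models (7), (8) and (10) are omitted because they depend on the belief function $Q$ and the Gittins calibration described in the electronic supplementary material.

```python
def indicator(condition):
    """I(condition): 1.0 if the condition holds, else 0.0."""
    return 1.0 if condition else 0.0


def p_right(model, prev_move, A_R, A_L, B_R, B_L, food_R=0, food_L=0):
    """Probability that the next move is to the right, P(m_t = R).

    A_R, A_L, B_R, B_L are counts that already include the
    pseudo-observation, so every count is at least one.
    food_R, food_L flag food at the next position (chemotaxis only).
    """
    mean_R = A_R / (A_R + B_R)
    mean_L = A_L / (A_L + B_L)
    if model == "autocorrelation":
        return indicator(prev_move == "R")
    if model == "anti_autocorrelation":
        return indicator(prev_move == "L")
    if model == "most_successes":
        return indicator(A_R > A_L) + 0.5 * indicator(A_R == A_L)
    if model == "highest_mean":
        return indicator(mean_R > mean_L) + 0.5 * indicator(mean_R == mean_L)
    if model == "relative_successes":
        return A_R / (A_L + A_R)
    if model == "relative_means":
        return mean_R / (mean_L + mean_R)
    if model == "chemotaxis":
        return indicator(food_R > food_L) + 0.5 * indicator(food_R == food_L)
    raise ValueError(f"unknown model: {model}")


# Example: three rewards seen on the right, one on the left.
p = p_right("relative_successes", "R", A_R=3, A_L=1, B_R=1, B_L=1)
```

In every case the probability of moving left is one minus the returned value, as stated in the text.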
We further incorporated a noise parameter u (detailed in the
electronic supplementary material), which represents the proportion
of occasions when the cell does not follow the
dominant heuristic. We then used the standard procedure of
Bayesian performance evaluation via marginal-likelihood, and
examined the relative performance of the ‘Relative Successes’
heuristic, both described in further detail in the electronic
supplementary material.
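One way to picture marginal-likelihood evaluation with a noise parameter is sketched below for the Relative Successes rule. Several choices here are assumptions and not taken from the paper: that a cell not following the heuristic picks an arm at random, that the noise parameter has a uniform prior on (0, 1), and that the integral is approximated on a midpoint grid. The exact formulation is given in the electronic supplementary material.

```python
import math


def noisy(p, u):
    """Mix a heuristic probability with random choice (assumed noise form)."""
    return (1.0 - u) * p + u * 0.5


def log_likelihood(moves, rewards, u):
    """Log-probability of a move sequence under noisy Relative Successes.

    moves: list of "R"/"L"; rewards: whether each move found a reward site.
    """
    A = {"R": 1, "L": 1}  # one pseudo-observation per arm
    total = 0.0
    for move, rewarded in zip(moves, rewards):
        p_r = noisy(A["R"] / (A["R"] + A["L"]), u)
        total += math.log(p_r if move == "R" else 1.0 - p_r)
        if rewarded:
            A[move] += 1
    return total


def log_marginal_likelihood(moves, rewards, grid=200):
    """Average the likelihood over a uniform prior on u (midpoint grid)."""
    logs = [log_likelihood(moves, rewards, (k + 0.5) / grid)
            for k in range(grid)]
    peak = max(logs)  # subtract the peak for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in logs) / grid)


# Hypothetical move sequence, for illustration only (not real data).
example = log_marginal_likelihood(["R", "R", "L", "R"],
                                  [True, False, False, True])
```

Comparing such values across candidate models, computed on the same observed sequences, is the general idea behind selecting the model with the highest marginal likelihood.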
3. Results
3.1. Choice scenarios
3.1.1. Baseline decision behaviour
Regardless of treatment (31 versus 31, 1 versus 1, 8e versus 8e or
8r versus 8r), the cell explored both arms equally, making no
decision to exploit one over the other. When one arm was con-
siderably more rewarding (1 versus 8e), the cell chose the more
rewarding arm after a short exploration period (figure 2). These
results provide the important information that (i) the cell does
not make a decision to exploit one environment over another
without information suggesting they differ in quality and
(ii) the amount of biomass we used per cell was sufficient for
the organism to fully exploit both environments simul-
taneously. Hence, in later experiments when the slime moulds
do prefer one environment over the other, they are making a
choice to do so. Although the cell is capable of exploring sites
on both arms simultaneously, doing so would ignore the
valuable information it has acquired, which should be
useful for optimally conditioning the investment of biomass.
Indeed, we only observed simultaneous exploration of sites
on both arms in 5% of all timesteps over all of our experiments.
3.1.2. Consistent reward with dissimilar, regular environments
As shown in figure 3, the vast majority of replicates in the
4e versus 8e treatment completed exploitation of the
HQ environment (reached the end of the arm) first, demon-
strating that the cell can choose the better of the two
environments. This was the case for all of our subsequent treat-
ments (figure 3), and all relationships were statistically
significant (binomial test, electronic supplementary material,
table S1). The cells on average displayed a short exploration
phase for seven sites on both arms (figure 2), followed by a
rapid and exclusive exploitation of the HQ arm. In regular
environments, therefore, slime mould appears to undertake a
brief period of exploration, followed by exploitation of the
most profitable environment discovered.
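For readers unfamiliar with the binomial test mentioned above, the sketch below computes an exact two-sided p-value against a null of no preference (each arm chosen with probability 0.5). The null value, the two-sided form and the example counts are assumptions for illustration; the tests and counts actually used are reported in the electronic supplementary material, table S1.

```python
from math import comb


def binomial_two_sided_p(successes, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed one under the null."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical counts, not from the paper: 18 of 20 replicates chose HQ.
p_value = binomial_two_sided_p(18, 20)
```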
3.1.3. Consistent reward with dissimilar, irregular environments
When the reward sites were distributed randomly along each
arm, the same overall pattern of HQ arm exploitation was
observed as for when the reward sites were distributed
evenly (figure 2). These results demonstrate that the cell is
capable of exploiting the most rewarding arm when infor-
mation is noisy and obtained randomly. The cell appears to
explore for a slightly greater distance (around 11 sites on
average) before switching to exploitation than in the above
experiments where information was arranged in a regular
distribution along the arms (figure 2). Similarly, the differ-
ence in exploitation between the arms (figure 2), and the
proportion of replicates completing exploitation of the HQ
arm first (figure 3), are often slightly higher in the evenly
distributed treatments than the randomly distributed treat-
ments. In many randomly distributed treatments, the first
few sites of exploration actually presented more reward
sites on the LQ arm than the HQ arm. Yet the decision to
exploit the HQ arm was made after a similar number of
explored sites as when the rewards were distributed evenly
above. These results suggest that the cell integrates the qual-
ity of each site discovered in the opposing environments over
several explored sites, rather than simply responding to the
first rewarding site discovered.
[Figure 2: twelve panels (a)–(l), one per treatment: 1 versus 1, 31 versus 31, 1 versus 8e, 8e versus 8e, 8r versus 8r, 4e versus 8e, 4r versus 8r, 8e versus 16e, 8r versus 16r, 11e versus 16e, 11r versus 16r and non-binary. x-axis: furthest site reached; y-axis: position difference (HQ – LQ arm); colour key: food distribution (as specified, even, random, non-binary).]
Figure 2. Difference in site discovery between HQ and LQ arms for the first time each site was discovered on either arm. For all treatments, we graphed the
difference between the positions of the leading edges of the cell on each arm as each sequential site was first discovered on either arm. That is, whenever
the ith site was first reached on either arm, the number of sites discovered on the LQ arm was subtracted from the number of sites discovered on the HQ
arm. Hence, for treatments where one arm contained a higher reward than the other, positive values indicate more biomass on the HQ arm than the LQ
arm, and negative values indicate the opposite. A position difference of zero indicates that both arms were exploited equally. Where both arms were equally
rewarding, a positive difference indicates choice of the left arm. Filled circles are experimental data means, error bars are the 95% CIs. Solid lines are the pattern
of site discovery predicted by the Relative Successes model, shaded regions are 1.96 s.e. Points where the shaded regions and error bars do not overlap with a
position difference of zero indicate a statistically significant preference for the HQ arm (where the difference is positive) or LQ arm (where the difference is negative),
at the 95% confidence level. (Online version in colour.)
3.1.4. Sensitivity to reward differences
The results of the two treatments (4 versus 8 and 8 versus 16)
followed identical patterns (figure 2), with a short exploration
phase for seven sites on each arm, followed by a rapid and
exclusive exploitation of the HQ arm. There was no obvious
difference between the evenly and randomly distributed
treatments. The choice pattern of the cell indicates that absolute
differences between two options are not as important as their
relative difference in quality—the cell takes into account
the unique qualities of the alternatives available to it, and
chooses the better of the two. Previous choice experiments in
P. polycephalum decision-making have shown that the cell is
capable of making a relative comparison of the quality of two
or three food sources provided simultaneously [7,11,29]. How-
ever in our experiments, the cell was required to integrate the
amount of food present over multiple distant sites and discov-
ered at different times in the exploration process, in order to
determine the better of two foraging environments. The
reduction in relative difference in quality (11 versus 16) did
not result in an extended exploration period; however, the
overall difference in exploitation was slightly lower than in
the other treatments (figure 2).
3.1.5. Random, non-binary reward with dissimilar, irregular
environments
Even for our most complicated choice scenario, the pattern of
exploration/exploitation was similar to those reported above;
a period of exploration of both arms extending to around 12
sites, followed by rapid exploitation of the HQ arm (figure 2).
This simple pattern belies a sophisticated and complex pro-
blem-solving capability for this protist. Taken together, our
results demonstrate that P. polycephalum is able to integrate
the total food quantity and quality in two randomly provi-
sioned environments, in order to swiftly and accurately
predict which environment will provide the most resources
for future growth.
3.2. Bayesian model selection
The model selected with the highest marginal-likelihood was
‘Relative Successes’; $P(m_t = R) = A_R/(A_L + A_R)$, in which the
probability of exploring each arm (e.g. $P(m_t = R)$ for the right
arm) is proportional to the number of successes (rewards)
previously encountered on that arm (e.g. $A_R$ for the right
arm; electronic supplementary material, figure S3). Figure 2
compares the performance of Relative Successes to the per-
formance of the slime mould for each choice scenario (the
relative performance of the other models is provided in the
electronic supplementary material, figures S5–S13). Importantly,
the decision-making heuristic ‘Chemotaxis’ performed
quite poorly in comparison, providing strong evidence that
the recent experience of the cell is the information driving
decision-making, and not solely chemotaxis towards the arm
with the highest reward.
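The Relative Successes rule can be sketched in a few lines of Python. This is a minimal illustration rather than the model code used in the study: the function names, the Bernoulli reward probabilities, and the choice to start each arm with one pseudo-success (so that the ratio is defined before any reward has been found) are assumptions made for the example.

```python
import random


def p_right(a_left: float, a_right: float) -> float:
    """Relative Successes: P(m_t = R) = A_R / (A_L + A_R)."""
    return a_right / (a_left + a_right)


def simulate(p_reward_left: float, p_reward_right: float,
             n_steps: int, seed: int = 0) -> list:
    """Simulate one run on a two-armed Bernoulli bandit.

    Assumption: each arm starts with one pseudo-success so that the
    ratio is defined at t = 0 (the study's initial condition may differ).
    """
    rng = random.Random(seed)
    successes = {"L": 1.0, "R": 1.0}
    reward_prob = {"L": p_reward_left, "R": p_reward_right}
    choices = []
    for _ in range(n_steps):
        go_right = rng.random() < p_right(successes["L"], successes["R"])
        arm = "R" if go_right else "L"
        if rng.random() < reward_prob[arm]:
            successes[arm] += 1.0  # a reward reinforces the arm that produced it
        choices.append(arm)
    return choices


if __name__ == "__main__":
    run = simulate(p_reward_left=0.25, p_reward_right=0.75, n_steps=200)
    print("fraction of moves on the right arm:", run.count("R") / len(run))
```

Because only the per-arm success counts enter the rule, the probability of moving onto an arm rises with every reward found there, which is the sense in which exploitation is proportional to past reward.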
The Relative Successes strategy invokes a level of sophisti-
cation far greater than many of our proposed strategies, yet is
computationally simpler than the ‘optimal’ strategies such as
Thompson Sampling and the Gittins Index. As shown in the
electronic supplementary material, figure S4, this strategy still
performs well relative to the best achievable performance.
Furthermore, the Relative Successes heuristic can be employed
in a fully decentralized manner at the local level in the cell by
reinforcing exploitation in HQ areas (as in [6]), so it does not
require complex global processing based on calculations of
either arm being the best. Nonetheless, this strategy performs
well in identifying and exploiting the arm with the highest
reward, as shown in our experiments and simulations.
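To make the difference in computational demand concrete, the sketch below places a standard Beta-Bernoulli form of Thompson Sampling next to a Relative Successes choice. It is an illustration under stated assumptions, not the implementation used in the study: the uniform Beta(1, 1) prior, the function names, and the uniform fallback when no reward has yet been found are all hypothetical choices.

```python
import random


def thompson_choice(successes, failures, rng):
    """Thompson Sampling: draw from each arm's Beta posterior, pick the largest draw.

    Assumes Bernoulli rewards and a uniform Beta(1, 1) prior on each arm.
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def relative_successes_choice(successes, rng):
    """Relative Successes: pick an arm with probability proportional to its successes."""
    if sum(successes) == 0:
        # Assumption: choose uniformly before any reward has been encountered.
        return rng.randrange(len(successes))
    return rng.choices(range(len(successes)), weights=successes)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print("Thompson picks arm", thompson_choice([3, 9], [7, 1], rng))
    print("Relative Successes picks arm", relative_successes_choice([3, 9], rng))
```

In this sketch Thompson Sampling must track failures as well as successes and draw from a posterior distribution for every arm, whereas Relative Successes needs only a running success count per arm, which is consistent with the local, decentralized reinforcement described above.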
4. Discussion
The capacity to solve the two-armed bandit problem has pre-
viously only been demonstrated in animals with brains.
Human subjects have repeatedly been tested with the
multi-armed bandit problem and are usually deemed to oper-
ate sub-optimally. It is commonly thought that human
[Figure 3 plot. y-axis: proportion reaching end of HQ arm first (0 to 1.0). x-axis (treatments): 1 versus 8e, 4e versus 8e, 4r versus 8r, 11e versus 16e, 11r versus 16r, 8e versus 16e, 8r versus 16r, non-binary.]
Figure 3. Proportion of replicates in each treatment that reached the end of the HQ arm before they reached the end of the LQ arm. All treatments showed a
proportion reaching the end of the HQ arm significantly greater than what would be expected by chance or if the cell could not choose between the two environments
(dashed line at 0.5; binomial test; for p-values, see the electronic supplementary material, table S1). Only treatments containing two different quality
environments are shown. (Online version in colour.)
subjects tend to naturally allow for the possibility that
reward rates on the different arms change over time
(termed a ‘restless bandit’). Hence, humans tend to switch
between exploration and exploitation, and rarely maximize
their reward by exclusively exploiting the HQ option [30].
Pigeons [31], great tits [17] and stickleback fish [19] have
been shown to learn to exclusively exploit the HQ option in
a two-armed bandit scenario. Besides using biological sub-
jects with sophisticated nervous systems, these previous
studies all recorded an increase in efficiency gained through
repeated testing, and hence learning on behalf of the subject.
Our experiments were not specifically designed to test for the
effects of learning, in contrast to the previous animal studies;
slime mould cells in our experiment were each tested a single
time, and so could not learn from past testing. Therefore, the
efficiency of the slime mould’s strategy described in our
results is a product of evolution, rather than individual
experience. In the future, it could be interesting to investigate
whether repeated testing with P. polycephalum leads to an
increase in efficiency through learning, given their documented
abilities to predict occurrences of events [4].
The non-human animals tested with the multi-armed
bandit problem in previous studies [17,19,31] performed
close to the optimal rate predicted by the models proposed
by the authors. These previous studies only compared their
empirical results to models based on economic optimality,
whereas in our study we also chose models that tempered
pure optimality measures with reasonable biological con-
straints within which the tested system should operate. The
ultimate reason why such a problem-solving capacity might
be necessary for a unicellular organism seems clear. The natural
foraging environment of P. polycephalum is the forest floor,
where its prey resources of fungi, bacteria and decaying veg-
etable matter are distributed patchily [32]. The amoeboid
form and large size (potentially exceeding 930 cm² [33]) of
the slime mould results in a large area of the environment
which can be sensed and explored simultaneously. The ability
to quickly compute which areas of the foraging environment
will lead to the highest nutritional payoff, and to abandon all
areas less profitable, should result in increased fitness and
hence be favoured by natural selection.
Without a brain or even neurons, what physical or bio-
chemical mechanisms could be responsible for slime mould
decision-making? The slime mould possesses a unique,
coupled-oscillator based sensorimotor system that may be the
key to its highly developed problem-solving abilities. The cell
is composed of many small units, each oscillating at a fre-
quency dependent upon both the local environment and
interactions with neighbouring oscillators [34]. When one of
these units senses attractants such as food, it oscillates faster,
stimulating neighbouring units to do the same, and causing
cytoplasm to flow towards the attractant [7]. The reverse pro-
cess is initiated when repellents such as light are perceived.
The collective behaviour of these coupled oscillators, each pas-
sing on information to entrain its neighbours, is the most likely
platform of decision-making.
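The entrainment described above can be illustrated with a generic phase-oscillator model. The sketch below uses a Kuramoto-type ring with nearest-neighbour coupling; this is a textbook stand-in chosen for illustration, not a model of P. polycephalum taken from the study or from [34], and every parameter value (number of units, coupling strength, frequency boost) is hypothetical.

```python
import math


def effective_frequencies(stimulated, n_units=10, coupling=1.0,
                          dt=0.01, n_steps=5000):
    """Ring of phase oscillators with nearest-neighbour coupling (Kuramoto-type).

    Returns each unit's average frequency over the whole run.
    """
    base, boost = 1.0, 0.5  # hypothetical natural frequency and food-induced speed-up
    omega = [base] * n_units
    if stimulated:
        omega[0] += boost  # unit 0 senses an attractant and oscillates faster
    theta = [0.0] * n_units
    for _ in range(n_steps):
        new_theta = []
        for i in range(n_units):
            left = theta[i - 1]               # index -1 wraps around the ring
            right = theta[(i + 1) % n_units]
            pull = math.sin(left - theta[i]) + math.sin(right - theta[i])
            new_theta.append(theta[i] + dt * (omega[i] + coupling * pull))
        theta = new_theta
    total_time = dt * n_steps
    return [th / total_time for th in theta]


if __name__ == "__main__":
    rest = effective_frequencies(stimulated=False)
    fed = effective_frequencies(stimulated=True)
    print("neighbour frequency without stimulus:", round(rest[1], 3))
    print("neighbour frequency with stimulus:   ", round(fed[1], 3))
```

In this toy system, speeding up a single unit raises the average frequency of its neighbours and of distant units alike, which is the qualitative behaviour attributed to the coupled oscillators in the cell.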
The majority of models of decision-making have focused
on how neurons in the vertebrate brain interact to reach a
decision [35–37]. The central mechanism behind most of
these models is the notion that ‘evidence’ in favour of each
alternative, in the form of firing rate, builds in competing
neurons until a decision threshold is reached [35]. The inter-
action of competing oscillators in distant regions of the cell
may perform an analogous function in the slime mould. Evidence
in favour of each environment is sensed through the
cell membrane and influences the local oscillation pattern.
The local oscillation pattern influences the width of transport
tubules, and hence controls the flow of protoplasm [34].
Distant oscillators entrain to each other’s frequencies, leading
to interactions that may influence the final decision and the
rate at which it is reached, providing a potential analogy to
models of human brains [11]. Similarities in the fundamental
principles of such vastly different decision-making systems as
human brains, slime mould, and social insect colonies have
recently come to the attention of researchers [1,7,38–40].
These similarities raise the compelling notion that deep prin-
ciples of decision-making, problem-solving and information
processing are shared by most, if not all, biological systems.
Our framework is a tool for the comparative study of infor-
mation processing between species and indeed across
nearly all taxa.
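The threshold idea described above, in which evidence for each alternative accumulates until one reaches a decision bound, can be sketched as a simple race between two noisy accumulators. This is a generic illustration and not a model fitted in the study; the drift rates, noise level, threshold and function name are hypothetical.

```python
import math
import random


def race_to_threshold(drift_a, drift_b, threshold=10.0, noise_sd=0.5,
                      dt=0.1, max_steps=100_000, seed=0):
    """Two independent noisy accumulators race to a shared threshold.

    Returns (winner, decision_time); winner is None if neither accumulator
    reaches the threshold within max_steps.
    """
    rng = random.Random(seed)
    x_a = x_b = 0.0
    step_sd = noise_sd * math.sqrt(dt)  # noise scales with sqrt of the time step
    for step in range(1, max_steps + 1):
        x_a += drift_a * dt + rng.gauss(0.0, step_sd)
        x_b += drift_b * dt + rng.gauss(0.0, step_sd)
        if x_a >= threshold or x_b >= threshold:
            return ("A" if x_a >= x_b else "B"), step * dt
    return None, max_steps * dt


if __name__ == "__main__":
    winner, t = race_to_threshold(drift_a=2.0, drift_b=0.1)
    print("winner:", winner, "decision time:", round(t, 1))
```

Raising the threshold in such a model slows the decision but makes it less sensitive to noise, which is one way to express a speed-accuracy trade-off of the kind reported for the slime mould [11].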
The advanced problem-solving capacity of the slime
mould, at a level previously demonstrated only in brained
organisms, provides support for the view that many ‘lower’
organisms can perform cognition-like feats in the absence of
a nervous system (often termed ‘minimal cognition’ [41–44]).
Intelligence, perception and traditionally higher order
cognitive processes are understood to be derived from
sensory-motor coupling [41,45]. Classic models separate the
‘lower’ and ‘higher’ organisms by the flow of sensorimotor
information processing between the organism and its environ-
ment [46,47]; non-cognitive organisms are defined by their
reaction to external stimuli without internal feedback between
the stimulus receptor and the site of action. By contrast, cogni-
tive organisms modulate the receptor by internal neural
feedback from the site of action [46,47]. More recent models
of cognition argue that there are many alternative sensorimotor
systems that may replace the function of the nervous system in
cognition. For instance, van Duijn et al. [41] argue that the two-
component signal transduction system of the bacterium Escher-
ichia coli is a functional sensorimotor equivalent of a nervous
system. According to classic cognition models, the oscillation
system of P. polycephalum may also be a sensorimotor analogue
of a nervous system; as information is transferred throughout
the cell along the oscillating membrane, oscillators provide
internal feedback to each other, and modulate each other’s
actions. Our results show that taking a wider, more inclusive
view of cognition allows a greater appreciation for the broad
diversity of information processing, problem-solving and
decision-making strategies spread across all taxa.
Data accessibility. The raw data are deposited in an Open Science
Framework online repository and can be downloaded via this
link: https://osf.io/c4mbk/?view_only=36ef8f82e0104a83be1625e9762fa4d5.
Authors’ contributions. C.R.R. designed experiments; C.R.R. and
H.M. performed experiments; C.R.R., R.P.M., J.A.R.M. and S.G. performed
analysis; R.P.M., J.A.R.M. and S.G. provided tools and
reagents; C.R.R., T.L., R.P.M., J.A.R.M. and S.G. wrote the paper. All
authors gave final approval for publication.
Competing interests. We have no competing interests.
Funding. This work was funded by the Branco Weiss Society in Science
Fellowship to T.L., and by the Australian Research Council
(DP110102998) to T.L. and (DP140103643) to T.L. and S.G.
Acknowledgements. The authors thank Amyjaelle Belot and Sima Kalam
for help with performing experiments, and Warren Powell for
discussion.
References
1. Reid CR, Garnier S, Beekman M, Latty T. 2015 Information integration and multiattribute decision making in non-neuronal organisms. Anim. Behav. 100, 44–50. (doi:10.1016/j.anbehav.2014.11.010)
2. Nakagaki T, Yamada H, Tóth A. 2000 Maze-solving by an amoeboid organism. Nature 407, 470. (doi:10.1038/35035159)
3. Reid CR, Beekman M. 2013 Solving the Towers of Hanoi – how an amoeboid organism efficiently constructs transport networks. J. Exp. Biol. 216, 1546–1551. (doi:10.1242/jeb.081158)
4. Saigusa T, Tero A, Nakagaki T, Kuramoto Y. 2008 Amoebae anticipate periodic events. Phys. Rev. Lett. 100, 018101. (doi:10.1103/PhysRevLett.100.018101)
5. Reid CR, Latty T, Dussutour A, Beekman M. 2012 Slime mold uses an externalized spatial ‘memory’ to navigate in complex environments. Proc. Natl Acad. Sci. USA 109, 17 490–17 494. (doi:10.1073/pnas.1215037109)
6. Tero A, Takagi S, Saigusa T, Ito K, Bebber DP, Fricker MD, Yumiki K, Kobayashi R, Nakagaki T. 2010 Rules for biologically inspired adaptive network design. Science 327, 439–442. (doi:10.1126/science.1177894)
7. Latty T, Beekman M. 2011 Irrational decision-making in an amoeboid organism: transitivity and context-dependent preferences. Proc. R. Soc. B 278, 307–312. (doi:10.1098/rspb.2010.1045)
8. Tversky A, Simonson I. 2014 Context-dependent preferences. Man. Sci. 39, 1179–1189. (doi:10.1287/mnsc.39.10.1179)
9. Schuck-Paim C, Kacelnik A. 2007 Choice processes in multialternative decision making. Behav. Ecol. 18, 541–550. (doi:10.1093/beheco/arm005)
10. Shafir S, Waite T, Smith B. 2002 Context-dependent violations of rational choice in honeybees (Apis mellifera) and gray jays (Perisoreus canadensis). Behav. Ecol. Sociobiol. 51, 180–187. (doi:10.1007/s00265-001-0420-8)
11. Latty T, Beekman M. 2011 Speed-accuracy trade-offs during foraging decisions in the acellular slime mould Physarum polycephalum. Proc. R. Soc. B 278, 539–545. (doi:10.1098/rspb.2010.1624)
12. Couzin I. 2007 Collective minds. Nature 445, 715. (doi:10.1038/445715a)
13. Marshall JAR, Franks NR. 2009 Colony-level cognition. Curr. Biol. 19, R395–R396. (doi:10.1016/j.cub.2009.03.011)
14. Gittins J. 1979 Bandit processes and dynamic allocation indices. J. R. Stat. Soc. Ser. B 41, 148–177.
15. Jones DM, Gittins JC. 1972 A dynamic allocation index for the sequential design of experiments. In Progress in statistics (ed. J Gani), pp. 241–266. Amsterdam, The Netherlands: North-Holland.
16. Toyokawa W, Kim H-R, Kameda T. 2014 Human collective intelligence under dual exploration-exploitation dilemmas. PLoS ONE 9, e95789. (doi:10.1371/journal.pone.0095789)
17. Krebs JR, Kacelnik A, Taylor P. 1978 Test of optimal sampling by foraging great tits. Nature 275, 27–31. (doi:10.1038/275027a0)
18. Shettleworth SJ, Plowright CMS. 1989 Time horizons of pigeons on a two-armed bandit. Anim. Behav. 37, 610–623. (doi:10.1016/0003-3472(89)90040-7)
19. Thomas G, Kacelnik A, Van Der Meulen J. 1985 The three-spined stickleback and the two-armed bandit. Behaviour 93, 227–240. (doi:10.1163/156853986X00900)
20. Keasar T, Rashkovich E, Cohen D, Shmida A. 2002 Bees in two-armed bandit situations: foraging choices and possible decision mechanisms. Behav. Ecol. 13, 757–765. (doi:10.1093/beheco/13.6.757)
21. Kobayashi R, Tero A, Nakagaki T. 2006 Mathematical model for rhythmic protoplasmic movement in the true slime mold. J. Math. Biol. 53, 273–286. (doi:10.1007/s00285-006-0007-0)
22. Dove WF, Rusch HP. 1980 Growth and differentiation in Physarum polycephalum. Princeton, NJ: Princeton University Press.
23. Deco G, Scarano L, Soto-Faraco S. 2007 Weber’s law in decision making: integrating behavioral data in humans with a neurophysiological model. J. Neurosci. 27, 11 192–11 200. (doi:10.1523/JNEUROSCI.1072-07.2007)
24. Kacelnik A, Brito e Abreu F. 1998 Risky choice and Weber’s Law. J. Theor. Biol. 194, 289–298. (doi:10.1006/jtbi.1998.0763)
25. Jeffreys H. 1939 Theory of probability. Oxford, UK: Oxford University Press.
26. Mann RP, Perna A, Strömbom D, Garnett R, Herbert-Read JE, Sumpter DJT, Ward AJW. 2013 Multi-scale inference of interaction rules in animal groups using Bayesian model selection. PLoS Comput. Biol. 9, 1–13. (doi:10.1371/journal.pcbi.1002961)
27. Thompson W. 1933 On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294. (doi:10.1093/biomet/25.3-4.285)
28. Gittins J, Glazebrook K, Weber R. 2011 Multi-armed bandit allocation processes. New York, NY: Wiley-Blackwell.
29. Latty T, Beekman M. 2010 Food quality and the risk of light exposure affect patch-choice decisions in the slime mold Physarum polycephalum. Ecology 91, 22–27. (doi:10.1890/09-0358.1)
30. Zhang S, Lee MD, Munro M. 2009 Human and optimal exploration and exploitation in bandit problems. Ratio 13, 15.
31. Herrnstein R, Loveland D. 1975 Maximizing and matching on concurrent ratio schedules. J. Exp. Anal. Behav. 24, 107–116. (doi:10.1901/jeab.1975.24-107)
32. Kuserk F. 1980 The relationship between cellular slime molds and bacteria in forest soil. Ecology 61, 1474–1485. (doi:10.2307/1939055)
33. Kessler D. 1982 Cell biology of Physarum and Didymium. Sydney, Australia: Academic Press.
34. Ueda T, Hirose T, Kobatake Y. 1980 Membrane biophysics of chemoreception and taxis in the plasmodium of Physarum polycephalum. Biophys. Chem. 11, 461–473. (doi:10.1016/0301-4622(80)87023-2)
35. Bogacz R, Brown E, Moehlis J, Holmes P, Cohen JD. 2006 The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol. Rev. 113, 700–765. (doi:10.1037/0033-295X.113.4.700)
36. Livnat A, Pippenger N. 2006 An optimal brain can be composed of conflicting agents. Proc. Natl Acad. Sci. USA 103, 3198–3202. (doi:10.1073/pnas.0510932103)
37. Chittka L, Skorupski P, Raine NE. 2009 Speed-accuracy tradeoffs in animal decision making. Trends Ecol. Evol. 24, 400–407. (doi:10.1016/j.tree.2009.02.010)
38. Marshall JAR, Bogacz R, Dornhaus A, Planqué R, Kovacs T, Franks NR. 2009 On optimal decision-making in brains and social insect colonies. J. R. Soc. Interface 6, 1065–1074. (doi:10.1098/rsif.2008.0511)
39. Nicolis SC, Zabzina N, Latty T, Sumpter DJT. 2011 Collective irrationality and positive feedback. PLoS ONE 6, e18901. (doi:10.1371/journal.pone.0018901)
40. Sasaki T, Pratt SC. 2011 Emergence of group rationality from irrational individuals. Behav. Ecol. 22, 276–281. (doi:10.1093/beheco/arq198)
41. Van Duijn M. 2006 Principles of minimal cognition: casting cognition as sensorimotor coordination. Adapt. Behav. 14, 157–170. (doi:10.1177/105971230601400207)
42. Müller BS, di Primio F, Lengeler JW. 2001 Contributions of minimal cognition to flexibility. In SCI 2001 Proc. on the 5th World Multi-Conf. on Systemics, Cybernetics and Informatics, vol. XIII, pp. 229–234. Orlando, FL: SCI.
43. Di Primio F, Müller BS, Lengeler JW. 2000 Minimal cognition in unicellular organisms. In SAB2000 Proc. Supplement, Int. Soc. Adapt. Behav., Honolulu, HI, pp. 3–12.
44. Calvo Garzon P, Keijzer F. 2011 Plants: Adaptive behavior, root-brains, and minimal cognition. Adapt. Behav. 19, 155–171. (doi:10.1177/1059712311409446)
45. Thompson E. 2006 Sensorimotor subjectivity and the enactive approach to experience. Phenomenol. Cogn. Sci. 4, 407–427. (doi:10.1007/s11097-005-9003-x)
46. Von Uexküll J. 1926 Theoretical biology. K. Paul, Trench, Trubner & Company Limited.
47. Fuster JM, Bressler SL. 2012 Cognit activation: a mechanism enabling temporal integration in working memory. Trends Cogn. Sci. 16, 207–218. (doi:10.1016/j.tics.2012.03.005)
rsif.royalsocietypublishing.org J. R. Soc. Interface 13: 20160030
8
on June 8, 2016http://rsif.royalsocietypublishing.org/Downloaded from
... As we were testing theory developed to explain decision-making by animals with brains, we conducted psychophysical experiments with human subjects. However, we also conducted foraging experiments with a unicellular slime mould; testing theory across multiple species and behavioural tasks increases confidence when multiple agreements with theory are observed [11], and slime moulds have become a model system, with multiple experiments seeking to reproduce behavioural predictions from neuroscience and psychology [24][25][26]. ...
... Unicellular organisms have also been suggested to implement optimal decision rules [27], have been used to test evidence accumulation theory [24,25], and are describable with dynamical models closely related to neural network models [28]. Here, using slime moulds of the species Physarum polycephalum, we also observed strong empirical evidence for magnitude sensitivity with three alternative foraging sources. ...
Article
Full-text available
Optimality analysis of value-based decisions in binary and multi-alternative choice settings predicts that reaction times should be sensitive only to differences in stimulus magnitudes, but not to overall absolute stimulus magnitude. Yet experimental work in the binary case has shown magnitude sensitive reaction times, and theory shows that this can be explained by switching from linear to multiplicative time costs, but also by nonlinear subjective utility. Thus disentangling explanations for observed magnitude sensitive reaction times is difficult. Here for the first time we extend the theoretical analysis of geometric time-discounting to ternary choices, and present novel experimental evidence for magnitude-sensitivity in such decisions, in both humans and slime moulds. We consider the optimal policies for all possible combinations of linear and geometric time costs, and linear and nonlinear utility; interestingly, geometric discounting emerges as the predominant explanation for magnitude sensitivity.
... A non-neural organism with an exceptionally versatile behavioural repertoire is the slime mould Physarum polycephalum -a unicellular, network-shaped organism (Sauer, 1982) of macroscopic dimensions, typically ranging from a millimeter to tens of centimeters. P. polycephalum's complex behaviour is most impressively demonstrated by its ability to solve spatial optimisation and decisionmaking problems (Nakagaki et al., 2000;Tero et al., 2010;Nakagaki and Guy, 2007;Dussutour et al., 2010;Reid et al., 2016), exhibit habituation to temporal stimuli (Boisseau et al., 2016), and use exploration versus exploitation strategy (Aono et al., 2014). Recently, P. polycephalum was found capable of encoding memory about food source locations in the hierarchy of its body plan (Kramar and Alim, 2021) in a process much reminding of synaptic facilitation-the brain's way of creating memories (Jackman and Regehr, 2017). ...
... P. polycephalum is renowned for its ability to make informed decisions and navigate a complex environment (Nakagaki et al., 2000;Tero et al., 2010;Nakagaki and Guy, 2007;Dussutour et al., 2010;Reid et al., 2016;Boisseau et al., 2016;Aono et al., 2014;Ueda et al., 1976;Miyake et al., 1991). It would be fascinating to next follow the variability of contraction dynamics during more complex decision-making processes. ...
Article
Full-text available
What is the origin of behaviour? Although typically associated with a nervous system, simple organisms also show complex behaviours. Among them, the slime mold Physarum polycephalum , a giant single cell, is ideally suited to study emergence of behaviour. Here, we show how locomotion and morphological adaptation behaviour emerge from self-organized patterns of rhythmic contractions of the actomyosin lining of the tubes making up the network-shaped organism. We quantify the spatio-temporal contraction dynamics by decomposing experimentally recorded contraction patterns into spatial contraction modes. Notably, we find a continuous spectrum of modes, as opposed to a few dominant modes. Our data suggests that the continuous spectrum of modes allows for dynamic transitions between a plethora of specific behaviours with transitions marked by highly irregular contraction states. By mapping specific behaviours to states of active contractions, we provide the basis to understand behaviour’s complexity as a function of biomechanical dynamics.
... Tracking of gradients entails changes in local oscillators and the entrainment of other oscillators [85]. Adjustments to oscillations are implicated in learning [16], memory encoded in the tubule sizes within the body [82] and decision making [86]. A formal model of information transfer within the body of Physarum relies on adjusting oscillations [87]. ...
Article
Navigational mechanisms have been characterized as servomechanisms. A navigational servomechanism specifies a goal state to strive for. Discrepancies between the perceived current state and the goal state specify error. Servomechanisms adjust the course of travel to reduce the error. I now add that navigational servomechanisms work with oscillators, periodic movements of effectors that drive locomotion. I illustrate this concept selectively over a vast range of scales of travel from micrometres in bacteria to thousands of kilometres in sea turtles. The servomechanisms differ in sophistication, with some interrupting forward motion occasionally or changing travel speed in kineses and others adjusting the direction of travel in taxes. I suggest that in other realms of life as well, especially in cognition, servomechanisms work with oscillators.
... But continued search comes at the cost of losing additional money. What results is a trade-off between continuing to play the current best choice arm based on existing knowledge of the reward distributions (exploitation) or testing other unknown arms to potentially improve the long-term payout (exploration) (Audibert et al., 2009;Reid et al., 2016). In primates and rodents, recording neural activity and manipulating neurotransmitter concentration during bandit tasks has implicated large subcortical structures in regulating the balance between exploration and exploitation (Cinotti et al., 2019;Costa et al., 2019). ...
Thesis
An animal’s survival depends on timely decisions informed by sensory information. Studies in humans and large model organisms have elucidated auxiliary roles of large brain regions in the evolution of such perceptual decisions. What remains challenging is acquiring a detailed understanding of the underlying neural mechanisms at a synaptic level and across entire brain circuits. The Drosophila melanogaster larva is an apt model system for probing the mechanisms of decision-making given its rich behavioural repertoire, small nervous system, genetic tractability, and available neuronal wiring diagrams. Taking inspiration from the application of two-alternative forced choice (TAFC) tasks to study perceptual decision-making in other model systems, I employed a closed-loop system to optogenetically activate larval nociceptive neurons based on the direction of precisely detected lateral head sweeps (i.e. casts). I sought to uncover the behavioural computations driving the stereotyped larval navigation sequence comprising repeated head casts followed by crawling in a new direction. I found that in control conditions where stimulus intensity is identical between left and right casts, the percentage of larvae that stop exploration and crawl in the direction favourable for survival (i.e. toward the first stimulated direction) significantly increases with number of casts. However, in experimental conditions where the aversive stimulus differs between sides, the percentage that accept the correct side (i.e. lower intensity) increases more significantly with cast number. When controlling for integrated intensity across casts, I observe a higher fraction of larvae accepting the lower intensity stimulus in experimental conditions compared to controls. These results suggest a mechanism of side-to-side comparison and possible sensory evidence accumulation that facilitates improved decision-making. 
In this thesis, I introduce the construction and implementation of two computational models for comparison to the larval behaviour trajectories. Both models reflect features of the experiment paradigm, though they differ in their assumptions about how the larva uses information from its environment to guide the acceptance or rejection of a given cast. The resulting predictions I generated about larval behaviour capture some, but not all, qualitative signatures within both the experimental and control datasets. I explore avenues for future model investigation and collection of additional behavioural data in order to draw more definitive mechanistic conclusions. While powerful, the closed-loop system I employed tracks only a single larva at a time. Transitioning my sensory discrimination task to a high-throughput system would be advantageous not only to expand the investigation of other stimulus levels but also to screen stimuli of different valences or from other sensory modalities. In this thesis, I detail my contributions to the development, validation, testing, and experimental application of a new tracking system that is capable of behaviour detection and closed-loop optogenetic and thermogenetic stimulation of 16 larvae simultaneously. This facilitated the first observations of operant conditioning in the Drosophila larva in which the animal successfully adapted its casting behaviour following repeated coupling with reward presentation. Although operant learning occurs over a longer time scale than perhaps what is required for perceptual decision-making, the two tasks are related in creating an association between the animal’s body posture and available sensory information. 
Together, my work on the sensory discrimination task, behavioural modeling, tool development, and analysis of the operant learning results lays a foundation for future investigation of decision-making behaviour in Drosophila larvae, with implications for further understanding the circuit mechanisms underlying larval taxis, learning, and memory.
... As demonstrated in [58,59,60,61], at least n = 3 electromagnets are required to achieve 3 DOFs pointing control. Besides, the position control can be effectively realized with at least n = 4 magnets in 3D workspace, but up to 5 coils are com-300 monly used to improve the system stability [62]. ...
Thesis
Full-text available
This research work mainly focuses on the study of the modeling and control of microrobotic systems in a biomedical context. So far, the use of magnetic actuation has been regarded as the most convenient approach for such achievements. Besides, the cardiovascular system allows reaching most parts of the human body and is then chosen as the main navigation route. This original topic is a rapidly expanding field whose ambition is to modernize current therapies by trying to improve therapeutic targeting while improving patient comfort. To achieve this goal, a good understanding of how microrobots evolve in the human body is an important step. The theoretical foundations and the physical laws that make it possible to describe the various phenomena which act on magnetic microrobots in vascular-like environments have thus been deeply studied. Methodologies for dealing with multiphysics approaches combining different sources of hypotheses and uncertainties have been developed. Great care has been taken in their validations by experimentation when possible, otherwise by numerical analysis. This helps to better understand the dominant dynamics, as well as the predominant parameters in the description of magnetic microrobots in a vascular-like environment. This makes it possible to efficiently characterize and predict their behaviors in a viscous flow and their responses to magnetic fields. On this basis, advanced navigation strategies have been developed. The navigation process can be divided into two stages. First, safe and efficient navigation paths are planned (off-line) based on the fast marching method (FMM). With the proposed navigation planning framework, different constraints and objectives can then be taken into account to obtain a truly feasible reference path. Second, control schemes that drive the magnetic microrobots along the planned reference path to the targeted location are synthesized. To do so, predictive and optimal control laws have been implemented. 
All the proposed models and navigation strategies have been evaluated through various experiments under different conditions with the platforms developed at the PRISME Laboratory.
Article
Signal detection theory (SDT) has been widely used to identify the optimal response of a receiver to a stimulus when it could be generated by more than one signaler type. While SDT assumes that the receiver adopts the optimal response at the outset, in reality, receivers often have to learn how to respond. We, therefore, recast a simple signal detection problem as a multi-armed bandit (MAB) in which inexperienced receivers chose between accepting a signaler (gaining information and an uncertain payoff) and rejecting it (gaining no information but a certain payoff). An exact solution to this exploration–exploitation dilemma can be identified by solving the relevant dynamic programming equation (DPE). However, to evaluate how the problem is solved in practice, we conducted an experiment. Here humans (n = 135) were repeatedly presented with a four readily discriminable signaler types, some of which were on average profitable, and others unprofitable to accept in the long term. We then compared the performance of SDT, DPE, and three candidate exploration–exploitation models (Softmax, Thompson, and Greedy) in explaining the observed sequences of acceptance and rejection. All of the models predicted volunteer behavior well when signalers were clearly profitable or clearly unprofitable to accept. Overall however, the Softmax and Thompson sampling models, which predict the optimal (SDT) response towards signalers with borderline profitability only after extensive learning, explained the responses of volunteers significantly better. By highlighting the relationship between the MAB and SDT models, we encourage others to evaluate how receivers strategically learn about their environments.
Article
Cells are complex biochemical systems whose behaviors emerge from interactions among myriad molecular components. Computation is often invoked as a general framework for navigating this cellular complexity. However, it is unclear how cells might embody computational processes such that the theories of computation, including finite-state machine models, could be productively applied. Here, we demonstrate finite-state-machine-like processing embodied in cells using the walking behavior of Euplotes eurystomus, a ciliate that walks across surfaces using fourteen motile appendages (cirri). We found that cellular walking entails regulated transitions among a discrete set of gait states. The set of observed transitions decomposes into a small group of high-probability, temporally irreversible transitions and a large group of low-probability, time-symmetric transitions, thus revealing stereotypy in the sequential patterns of state transitions. Simulations and experiments suggest that the sequential logic of the gait is functionally important. Taken together, these findings implicate a finite-state-machine-like process. Cirri are connected by microtubule bundles (fibers), and we found that the dynamics of cirri involved in different state transitions are associated with the structure of the fiber system. Perturbative experiments revealed that the fibers mediate gait coordination, suggesting a mechanical basis of gait control.
Article
Optimality analysis of value-based decisions in binary and multi-alternative choice settings predicts that reaction times should be sensitive only to differences in stimulus magnitudes, but not to overall absolute stimulus magnitude. Yet experimental work in the binary case has shown magnitude-sensitive reaction times, and theory shows that this can be explained by switching from linear to geometric time costs, but also by nonlinear subjective utility. Thus disentangling explanations for observed magnitude-sensitive reaction times is difficult. Here for the first time we extend the theoretical analysis of geometric time-discounting to ternary choices, and present novel experimental evidence for magnitude-sensitivity in such decisions, in both humans and slime moulds. We consider the optimal policies for all possible combinations of linear and geometric time costs, and linear and nonlinear utility; interestingly, geometric discounting emerges as the predominant explanation for magnitude sensitivity.
Article
The hypothesis of extended cognition (HEC) claims that the cognitive processes that materially realise thinking are sometimes partially constituted by entities that are located external to an agent’s body in its local environment. We show how proponents of HEC need not claim that an agent must have a central nervous system, or physically instantiate processes organised in such a way as to play a causal role equivalent to that of the brain if that agent is to be capable of cognition. Focusing on the case of spatial memory, we make our argument by taking a close look at the striking example of Physarum polycephalum plasmodium (i.e., slime mould) which uses self-produced non-living extracellular slime trails to navigate its environment. We will argue that the use of externalized spatial memory by basal organisms like Physarum is an example of extended cognition. Moreover, it is a possible evolutionary precursor to the use of internal spatial memory and recall in animals thus demonstrating how extended cognition may have emerged early in evolutionary history.
Article
Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical 'phase transition', whereby an increase in density leads to the onset of collective motion in one direction. We fit models to this data, which range from: a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have 'memory' of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture the observed locality of interactions. Traditional self-propelled particle models fail to capture the fine scale dynamics of the system. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics, while maintaining a biologically plausible perceptual range. We conclude that prawns' movements are influenced by not just the current direction of nearby conspecifics, but also those encountered in the recent past. Given the simplicity of prawns as a study system our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects.
Article
Decision making is a necessary process for most organisms, even for the majority of known life forms: those without a brain or neurons. The goal of this review is to highlight research dedicated to understanding complex decision making in non-neuronal organisms, and to suggest avenues for furthering this work. We review research demonstrating key aspects of complex decision making, in particular information integration and multiattribute decision making, in non-neuronal organisms when (1) utilizing adaptive search strategies when foraging, (2) choosing between resources and environmental conditions that have several contradictory attributes and necessitate a trade-off, and (3) incorporating social cues and environmental factors when living in a group or colony. We discuss potential similarities between decision making in non-neuronal organisms and other systems, such as insect colonies and the mammalian brain, and we suggest future avenues of research that use appropriate experimental design and that take advantage of emerging imaging technologies.
Article
The exploration-exploitation dilemma is a recurrent adaptive problem for humans as well as non-human animals. Given a fixed time/energy budget, every individual faces a fundamental trade-off between exploring for better resources and exploiting known resources to optimize overall performance under uncertainty. Colonies of eusocial insects are known to solve this dilemma successfully via evolved coordination mechanisms that function at the collective level. For humans and other non-eusocial species, however, this dilemma operates within individuals as well as between individuals, because group members may be motivated to take excessive advantage of others' exploratory findings through social learning. Thus, even though social learning can reduce collective exploration costs, the emergence of disproportionate "information scroungers" may severely undermine its potential benefits. We investigated experimentally whether social learning opportunities might improve the performance of human participants working on a "multi-armed bandit" problem in groups, where they could learn about each other's past choice behaviors. Results showed that, even though information scroungers emerged frequently in groups, social learning opportunities reduced total group exploration time while increasing harvesting from better options, and consequently improved collective performance. Surprisingly, enriching social information by allowing participants to observe others' evaluations of chosen options (e.g., Amazon's 5-star rating system) in addition to choice-frequency information had a detrimental impact on performance compared to the simpler situation with only the choice-frequency information. These results indicate that human groups can handle the fundamental "dual exploration-exploitation dilemmas" successfully, and that social learning about simple choice-frequencies can help produce collective intelligence.
Article
Author Summary: Group B Streptococcus (GBS) is the leading cause of neonatal invasive diseases, and pili, long filamentous fibers protruding from the bacterial surface, have been discovered as important virulence factors and potential vaccine candidates. The bacterial surface is the main interface between host and pathogen, and the ability of the host to identify molecular determinants that are unique to pathogens has a crucial role for microbial clearance. Here, we describe a strategy to investigate the immunological and structural properties of a protective pilus protein, by elucidating the molecular mechanisms, in terms of single residue contributions, by which functional epitopes guide bacterial clearance. We generated neutralizing monoclonal antibodies raised against the protein and identified the epitope region in the antigen. Then, we performed computational docking analysis of the antibodies in complex with the target antigen and identified specific residues on the target protein that mediate hydrophobic interactions at the binding interface. Our results suggest that a perfect balance of shape and charges at the binding interface in antibody/antigen interactions is crucial for the antibody/antigen complex in driving a successful neutralizing response. Knowing the native molecular architecture of protective determinants might be useful to selectively engineer the antigens for effective vaccine formulations.
Article
Observations of natural and manipulated populations were used to investigate the regulation of population sizes and community structure in a group of soil amoebae, the cellular slime molds (Dictyosteliida). Correlations of slime mold abundance and distributional patterns with soil bacteria in the field suggest that food supply is a potential factor in the regulation of these species' numbers. One species, Dictyostelium mucoroides, responds most visibly to seasonal changes, spring and fall being peak seasons. When total slime mold numbers are partitioned into active and encysted forms, amoebae account for as much as 51% and 24% of the population in the fall and spring, respectively, vs. only 10-12% of the population during the summer and winter months. Large additions of various bacteria to field plots caused significant increases in D. mucoroides numbers. Moreover, the ability of this species to respond to a second addition of bacteria, made several days later, depended on its density. High-density populations failed to respond to additional food, whereas those which had already returned to base levels showed increases. These findings support the hypotheses that cellular slime molds are food limited in nature, and that community diversity is due, at least in part, to differential resource utilization by the species in nature.
Article
Pigeons on concurrent variable-ratio variable-ratio schedules usually, though not always, maximize reinforcements per response. When the ratios are equal, maximization implies no particular distribution of responses to the two alternatives. When the ratios are unequal, maximization calls for exclusive preference for the smaller ratio. Responding conformed to these requirements for maximizing, which are further shown to be consistent with the conception of reinforcement implicit in the matching law governing relative responding in concurrent interval schedules.
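The maximization argument in this abstract can be checked with a short worked sketch. It assumes the standard idealization that a variable-ratio schedule with mean ratio r reinforces each response with probability 1/r; the function names and the example ratios are hypothetical and do not come from the study.

```python
import math


def reinforcements_per_response(p, ratio_1, ratio_2):
    # p is the fraction of responses allocated to alternative 1.
    # Assumption: each response on a VR-r schedule pays off with probability 1/r.
    return p / ratio_1 + (1 - p) / ratio_2


def best_allocations(ratio_1, ratio_2, steps=10):
    # Evaluate a grid of allocations and return every p that attains the maximum.
    grid = [i / steps for i in range(steps + 1)]
    rates = [reinforcements_per_response(p, ratio_1, ratio_2) for p in grid]
    best = max(rates)
    return [p for p, r in zip(grid, rates) if math.isclose(r, best)]


if __name__ == "__main__":
    print(best_allocations(20, 60))  # unequal ratios: one extreme allocation
    print(best_allocations(40, 40))  # equal ratios: every allocation ties
```

Because the payoff rate is linear in p, unequal ratios push the maximum to exclusive preference for the smaller ratio, while equal ratios make every allocation equally good, matching the two cases described in the abstract.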