Original Paper
Adaptive Behavior
1–25
© The Author(s) 2015
Reprints and permissions:
sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/1059712315609412
adb.sagepub.com
Learning programs is better than learning dynamics: A programmable neural network hierarchical architecture in a multi-task scenario
Francesco Donnarumma¹, Roberto Prevete², Andrea de Giorgio³, Guglielmo Montone⁴ and Giovanni Pezzulo¹
Abstract
Distributed and hierarchical models of control are nowadays popular in computational modeling and robotics. In the artificial neural network literature, complex behaviors can be produced by composing elementary building blocks or motor primitives, possibly organized in a layered structure. However, it is still unknown how the brain learns and encodes multiple motor primitives, and how it rapidly reassembles, sequences and switches them by exerting cognitive control. In this paper we advance a novel proposal, a hierarchical programmable neural network architecture, based on the notion of programmability and an interpreter-programmer computational scheme. In this approach, complex (and novel) behaviors can be acquired by embedding multiple modules (motor primitives) in a single, multi-purpose neural network. This is supported by recent theories of brain functioning in which skilled behaviors can be generated by combining functionally different primitives embedded in “reusable” areas of “recycled” neurons. Such a neuronal substrate supports flexible cognitive control, too. Modules are seen as interpreters of behaviors having controlling input parameters, or programs that encode structures of networks to be interpreted. Flexible cognitive control can be exerted by a programmer module feeding the interpreters with appropriate input parameters, without modifying connectivity. Our results in a multiple T-maze robotic scenario show how this computational framework provides a robust, scalable and flexible scheme that can be iterated at different hierarchical layers, making it possible to learn, encode and control multiple qualitatively different behaviors.
Keywords
Programming neural networks, hierarchical organization, distributed representation, neuronal reuse, cognitive control
1 Introduction
Converging evidence in the literature indicates that the capabilities of human and artificial agents to learn and control complex skilled behaviors are grounded in some mechanism of compositionality of primitives. According to this view, almost all behaviors (including complex actions such as playing table tennis) can essentially be generated by combining simpler motor acts (primitives), which are picked out from a predetermined set of primitives by following rules. One can isolate three principles involved in the definition and comprehension of this mechanism, as discussed in the following.
Reusable and adaptable primitives. Primitives can flexibly be used and re-used in order to construct different sequences of actions (d’Avella, Portone, Fernandez, & Lacquaniti, 2006; Thoroughman & Shadmehr, 2000). For example, the action of eating an apple can be broken down into a combination of multiple motor primitives. Some motor primitives would be responsible for reaching for the apple, some for grasping it and some for moving the apple toward one’s own mouth. In other words, each primitive can be used (and re-used) in different tasks with the ability to adapt to the specific task “on the fly” (Pezzulo, Donnarumma, Iodice, Prevete, & Dindo, 2015), and this mechanism is thought to be at the basis of a shared representation enabling social interaction with conspecifics (Candidi, Curioni, Donnarumma, Sacheli, & Pezzulo, 2015; Pezzulo, Donnarumma, & Dindo, 2013). Consequently, each primitive should not be interpreted as a specific sequence of motor commands, but rather as a behavior, i.e. it should be “fluid” and context dependent (Tani, Nishimoto, & Paine, 2008).
¹ Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Italy
² Dipartimento di Ingegneria Elettrica e Tecnologie dell’Informazione, Università degli Studi di Napoli Federico, Napoli
³ KTH Royal Institute of Technology, Sweden
⁴ Institut Neuroscience Cognition, Université Paris Descartes, France
Corresponding author:
Francesco Donnarumma, Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Via San Martino della Battaglia, 44 Rome 00185, Italy.
Email: francesco.donnarumma@istc.cnr.it
Distributed representation. One can distinguish two kinds of behavior representation: local and distributed representation of behavioral modules (Paine & Tani, 2005). In the first case, the acquisition of novel behaviors consists of adding novel, single-purpose modules, each implementing one specific behavior (Haruno, Wolpert, & Kawato, 2003; Igari & Tani, 2009). This strong modularization of the behaviors (i.e. the fact that each behavior is essentially represented in a different module) provides flexible control and avoids problems of interference. By contrast, the distributed representation consists of generalizing the function of (at least some of) the modules. This idea is developed, for example, in a number of Artificial Neural Network (ANN) models (see e.g. Agmon & Beer, 2013; Araújo, Diniz, Passos, & Davids, 2014; Paine & Tani, 2005; Tani, Ito, & Sugita, 2004; Woodman, Perdikis, Pillai, Dodel, Huys, Bressler, & Jirsa, 2011), which can embed multiple behaviors in a single, multi-purpose neural network module. Importantly, recent theories of brain functioning provide support for the existence of a neural substrate which could implement such functionalities; they suggest that skilled behaviors can be generated by combining functionally different primitives embedded in “reusable” (see Anderson, 2010) areas of “recycled” (Dehaene, 2005) neurons. In addition, a body of neurophysiological evidence suggests the presence of neural modules which exhibit drastic, rapid and reversible changes of behavior (Bargmann, 2012; Park & Friston, 2013).
Hierarchical organization. Converging evidence in neuroscience indicates that the behavioral repertoire of living organisms is hierarchically organized and includes multiple levels of control, spanning from simple spinal reflexes, spinal cord and brainstem, up to sensory motor cortex and prefrontal cortex (Donnarumma, Prevete, Chersi, & Pezzulo, 2015b; Graziano, 2006; Hamilton & Grafton, 2007; Kelly, 1991). This hierarchical organization is widely believed to support the flexible selection, sequencing and recombination of the primitives (Flash & Hochner, 2005). For example, the premotor areas of monkey (and human) brains are (partially) organized in hierarchies, with multiple levels of representation (e.g. effector-dependent and effector-independent actions) (Fogassi, Ferrari, Chersi, Gesierich, Rozzi, & Rizzolatti, 2005). From a theoretical perspective, hierarchical control has been described in terms of (hierarchical) Bayesian systems and predictive coding (Friston, 2003; Haruno et al., 2003). In the Artificial Neural Network (ANN) literature, a somewhat simpler control scheme is usually adopted, in which hierarchies are implemented as two-level ANNs. Numerous studies have addressed the learning of action primitives (Hioki, Miyazaki, & Nishii, 2013; Paine & Tani, 2004; Tani, Nishimoto, & Paine, 2008; Yamauchi & Beer, 1994), the acquisition of multi-level control hierarchies for robot navigation (Chersi, Donnarumma, & Pezzulo, 2013; Tani, 2003; Tani, Nishimoto, & Paine, 2008; Tani & Nolfi, 1999) and the acquisition of sub-goals (Bakker & Schmidhuber, 2004; Dindo, Donnarumma, Chersi, & Pezzulo, 2015; Maisto, Donnarumma, & Pezzulo, 2015; McGovern & Barto, 2001; Mussa-Ivaldi & Bizzi, 2000; Thoroughman & Shadmehr, 2000).
Despite progress in understanding and defining the capabilities of human and artificial agents to learn and control complex skilled behaviors, many aspects still remain unclear: for instance, how multiple motor primitives are encoded in the (same areas of the) brain, what neural substrate permits their learning without catastrophic forgetting, what the organizing principle of control hierarchies is, how cognitive control is exerted, and how parts of the brain can control other parts of the brain so as to change behavior rapidly (i.e. without re-learning) and follow rule-like regularities. A novel hypothesis gaining ground on the neural realization of motor primitives envisages neural circuits capable of changing their behaviors rapidly and reversibly, that is, without modifying their structure or modifying (re-learning) synaptic connectivity (Bargmann, 2012). Building on this, a control theory of ANN modules was developed in which modules can express different dynamical behaviors, switching among them by means of a set of controlling input parameters (Donnarumma, Prevete, Chersi, & Pezzulo, 2015b; Donnarumma, Prevete, & Trautteur, 2010, 2012; Eliasmith, 2005; Eliasmith & Anderson, 2004; Montone, Donnarumma, & Prevete, 2011). In particular, in Donnarumma et al. (2015b, 2012) this type of control is interpreted in terms of the concept of programming as it is defined in the context of computer science.
In this paper, we take a computational perspective and propose a novel view on hierarchical organization and control in the brain that includes all three properties discussed previously. Our starting point is the approach proposed in Donnarumma et al. (2015b, 2012), on the basis of which we propose a Hierarchical Programmable Neural Network Architecture (HPNNA). In particular, we expected that learning and switching among behavior codes would be a “simpler” task than learning behavior dynamics as a whole. To this aim, we extensively test the learning ability of this architecture against standard neural network approaches, and investigate in depth the possibility of obtaining multiple programmable levels in a hierarchical fashion. HPNNA is based on fixed-weight Continuous Time Recurrent Neural Networks (CTRNNs), which are plausible (though highly simplified) computational models of biological neuronal networks. In keeping with the neural evidence reviewed so far, we assume that multiple primitives could be encoded in the same neural structures, with higher neural levels exerting control over behavior by biasing the selection among these primitives.
Compared to existing theoretical and computational proposals, our work elaborates on the concept of programmability of neural networks, which entails two novel proposals: a novel way to encode multiple motor primitives in multi-purpose and reusable neural networks, and a novel control scheme for exerting cognitive control. We test our approach in a robotic scenario: a multiple T-maze with eight possible different goals. Our experiments start with a comparison of the learning capability of a standard non-programmable approach versus the HPNNA on an idealization of eight different sub-tasks. Then we test the overall HPNNA, implementing on the lower layer an interpreter of motor primitives, which receives commands from a higher-level interpreter layer capable of sequencing the low-level primitives in order to achieve the proper task. The proposed computational scheme is compared with a non-organized architecture (NOA), i.e. a neural network without a structure that explicitly subsumes either programs or a hierarchical organization, and turns out to be a robust, scalable and flexible scheme able to successfully decompose the desired task. Because of these features, ours is an appealing proposal to explain brain function and hierarchical control organization. Furthermore, this computational scheme provides many advantages from a learning perspective, including the possibility to learn novel primitives incrementally without disrupting the existing functionalities, to flexibly reassemble and learn off-line novel behavioral sequences using feedback signals generated by the existing motor primitives, and to build modular networks (interpreters) that split the task space into more manageable (learnable) parts.
2 Hierarchical programmable neural network architecture
Our proposed architecture takes as its starting point the programmable neural network (PNN) architecture introduced by Donnarumma et al. (2012). This neural model is endowed with a programming capability. The term programmability is not intended in a metaphorical sense but in a precise computational sense, as a generalization of the concept of programming to dynamical systems (Trautteur & Tamburrini, 2007). Following this work, a system can be considered endowed with programmability if three conditions are satisfied:
(a) there exists an effective encoding of the structure of the single systems into patterns of input, output, and internal variables;
(b) the codes provided by such encoding can be applied to specific systems of the class, interpreters, realizing the behavior of the coded system;
(c) the codes can be processed by the systems of the class on a par with the input, output, and internal variables.
By satisfying these requirements, a PNN realizes a virtual machine that acts as an interpreter of a finite set of neural networks; that is, it can simulate a well-defined (finite) set of neural networks. In other words, a PNN realizes a neural sub-system with fully controllable (programmable) behavior, without changing the connectivity or the efficacies associated with the synaptic connections. Moreover, a distributed representation scheme is also ensured, as multiple motor primitives can be embedded in the same (fixed-structure) neural population.
At a computational level, this can be achieved by the presence of multiplicative sub-networks that enable a first (programmer) network to provide input values to a second (interpreter) network through auxiliary input lines. In more detail, the dynamic behavior of an artificial neural network can be defined as an output $y_i$ based on the sums of the products between connection weights $w_{ij}$ and neuron output signals $x_j$:
$$y_i = f\left(\sum_j w_{ij}\, x_j\right)$$
From a mathematical point of view, one can “pull out” the multiplication operation $w_{ij} x_j$ by means of a multiplication (mul) sub-network that computes the product of the output and the weight, both of which are now inputs to the mul sub-network:
$$y_i = f\left(\sum_j \mathrm{mul}\left(w_{ij}, x_j\right)\right)$$
This procedure (called w-substitution in Donnarumma et al. (2012)) is at the basis of the construction of a PNN with a line of auxiliary inputs capable of modulating its behavior “as if” the synaptic efficacies were varied (see Figure 1). As a consequence, a PNN receives two kinds of input lines: auxiliary (or programming) input lines and standard data input lines. The newly introduced programming inputs are meant to be fed with a code, or program, describing the network to be “simulated”.
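As a minimal illustration of the w-substitution, the following Python sketch compares a unit whose weights are stored internally with a unit that receives the same numbers on auxiliary programming lines and multiplies them through an idealized mul. The function names, the logistic choice for $f$ and the numeric values are assumptions made for illustration only; the mul used here is an exact product, not a CTRNN sub-network.

```python
import math


def logistic(z):
    """Logistic squashing function, used here as an assumed choice for f."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinary_neuron(weights, inputs):
    """y_i = f(sum_j w_ij * x_j), with the weights stored inside the unit."""
    return logistic(sum(w * x for w, x in zip(weights, inputs)))


def ideal_mul(a, b):
    """Idealized mul sub-network (assumption): returns exactly a * b."""
    return a * b


def programmable_neuron(program, inputs):
    """y_i = f(sum_j mul(p_j, x_j)); the 'program' arrives on auxiliary lines."""
    return logistic(sum(ideal_mul(p, x) for p, x in zip(program, inputs)))


# Hypothetical data inputs and two hypothetical programs.
x = [0.2, 0.9, 0.5]
program_a = [0.1, 0.8, 0.3]
program_b = [0.9, 0.1, 0.6]

y_a = programmable_neuron(program_a, x)  # behaves like a unit with weights program_a
y_b = programmable_neuron(program_b, x)  # same unit, different behavior
```

With an exact mul the programmable unit reproduces the ordinary unit for any weight vector supplied as a program, while its own structure never changes; only the values on the auxiliary lines differ between `y_a` and `y_b`.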
In principle, a PNN architecture can be implemented using several kinds of recurrent neural networks. Here we introduce an implementation of PNN using Continuous Time Recurrent Neural Networks (CTRNNs), which are generally considered to be biologically plausible networks of neurons and are described by equation (1) (Beer, 1995; Hopfield & Tank, 1986)
$$\tau_i \frac{dy_i}{dt} = -y_i + \sigma\left(\sum_{j=1}^{N} w_{ij}\, y_j + \theta_i + I^e_i(t)\right) \qquad (1)$$
where $i \in \{1, \ldots, N\}$ and $N$ is the total number of neurons in the network. Thus, for each neuron $i$:
$\tau_i$ is the membrane time constant;
$y_i$ is the mean firing rate;
$\theta_i$ is the threshold (or bias);
$\sigma(x)$ is the standard logistic activation function, i.e. $\sigma(x) = \frac{1}{1 + e^{-x}}$;
$I^e_i = \sum_{j=1}^{Q} w_{i,j+N}\, x_j$ is a weighted external input current coming from $Q$ external sources $x_j$;
$w_{ij}$ is the synaptic efficacy (weight) of the connection coming from the neuron $j$ or external source $x_j$ to the neuron $i$.
Equation (1) has a solution $y(t) = (y_1(t), \ldots, y_N(t))$ describing the dynamics of the neurons of the network. In general such a solution cannot be found exactly, and an approximation of it can be computed by numerical integration.
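One simple way to approximate the solution of equation (1) is forward-Euler integration. The sketch below is a hypothetical pure-Python illustration: the integration scheme, step size, weights, biases and time constants are arbitrary choices made for the example, not those used in the experiments.

```python
import math


def sigma(z):
    """Standard logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def ctrnn_step(y, x, W, theta, tau, dt):
    """One forward-Euler step of equation (1).

    W has N rows and N + Q columns: columns 0..N-1 weight the neurons,
    columns N..N+Q-1 weight the external sources x.
    """
    n = len(y)
    new_y = []
    for i in range(n):
        recurrent = sum(W[i][j] * y[j] for j in range(n))
        external = sum(W[i][n + j] * x[j] for j in range(len(x)))
        dydt = (-y[i] + sigma(recurrent + theta[i] + external)) / tau[i]
        new_y.append(y[i] + dt * dydt)
    return new_y


def simulate(y0, x, W, theta, tau, dt=0.01, steps=5000):
    """Integrate from y0 with a constant external input x."""
    y = list(y0)
    for _ in range(steps):
        y = ctrnn_step(y, x, W, theta, tau, dt)
    return y


# Hypothetical network: N = 2 neurons, Q = 1 external source.
W = [[0.5, -0.3, 1.0],
     [0.2, 0.4, -1.0]]
theta = [0.0, 0.1]
tau = [1.0, 2.0]
x = [1.0]

y_final = simulate([0.0, 0.0], x, W, theta, tau)
```

For these small weights the network settles on a fixed point, so `y_final` approximately satisfies $y_i = \sigma(\cdot)$ for both neurons. Smaller `dt` values trade computation time for accuracy.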
Following Donnarumma et al. (2015b, 2012) it is
possible to build a PNN, in the CTRNN framework,
that is able to simulate (behaving like an interpreter)
the behavior of the encoded CTRNN networks on the
data coming from the standard input lines when vary-
ing the programming input line.
To this aim, the first step is to build a mul network in the CTRNN framework. A mul can be written as
$$\mathrm{mul} \equiv \left\{\ \theta_m \dot{\mu}_m = \mathrm{mul}(a, b) = -\mu_m + \sigma\left(C_m a + \sum_{j=1}^{M} C'_{mj}\, \mu_j + C''_m b\right)\right. \qquad (2)$$
with $m \in \{1, \ldots, M\}$. Equation (2) describes a network of $M$ neurons receiving two inputs, $a$ and $b$; the $m$-th neuron has time constant $\theta_m$ and mean firing rate $\mu_m$, and $C'_{mj}$ is the weight of the connection coming from the $j$-th neuron. $C_m$ and $C''_m$ weight the inputs $a$ and $b$, respectively. The connections of the mul network, $C_m$, $C'_{mj}$ and $C''_m$, are tuned so that the solutions of equation (2), $\mu(t) = (\mu_1(t), \ldots, \mu_M(t))$, are constrained to satisfy
$$\lim_{t \to \infty} \mu_M(t) \approx a \cdot b \qquad (3)$$
$\forall a, b \in (0, 1)$. In Donnarumma et al. (2012) an approximated mul solution is found for $M = 3$ (the same that we used in Section 3) by means of an evolutionary approach. Note that larger $M$ values improve the behavior of mul networks but increase the computational cost of the simulation. By means of mul networks it is possible to construct a hierarchy of PNN layers, with the higher level programming the lower in an increasing complexity of programs. To a first approximation, each layer is composed of interacting slower neurons (with higher time constants), whose activity is denoted by $y_n$, and faster neurons (with smaller time constants), with activity $\mu_m$, belonging to the mul networks. To show how this is achieved, we first construct a layer $L_l(1, 1)$, i.e. with one slow neuron and one input source. It is described by the following system
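$$L_l(1, 1) \equiv \begin{cases} \tau_1 \dot{y}^{(l)}_1 = -y^{(l)}_1 + \sigma\left(\hat{w}\left(\mu^1_{M^{(l)}} + \mu^2_{M^{(l)}}\right) + \tilde{w}\left(y^{(l)}_1 + x^{(l)}_1\right)\right) \\ \theta^1_m \dot{\mu}^1_m = \mathrm{mul}\left(y^{(l)}_1, y^{(l+1)}_1\right), \quad m \in \{1, \ldots, M^{(l)}\} \\ \theta^2_m \dot{\mu}^2_m = \mathrm{mul}\left(x^{(l)}_1, y^{(l+1)}_2\right), \quad m \in \{1, \ldots, M^{(l)}\} \end{cases} \qquad (4)$$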
A depiction of the system of equations (4) is given in Figure 2. The layer is a PNN composed of a slow neuron of activity $y^{(l)}_1$, receiving an input $x^{(l)}_1$ and two programming inputs $y^{(l+1)}_1$ and $y^{(l+1)}_2$ from the higher level (the superscript indicates membership of the $l$-th level). The first mul network, $\mathrm{mul}^{(l)}_1 = \mathrm{mul}(y^{(l)}_1, y^{(l+1)}_1)$, connects $y^{(l)}_1$ with the programming input $y^{(l+1)}_1$ coming from the layer $l+1$. In the same way, the second mul network, $\mathrm{mul}^{(l)}_2 = \mathrm{mul}(x^{(l)}_1, y^{(l+1)}_2)$, modulates, by means of the programming input $y^{(l+1)}_2$, the contribution of the input source $x^{(l)}_1$ to the activation of $y^{(l)}_1$. $\hat{w}$ weights the connections from the neurons $\mu^k_{M^{(l)}}$ of the $\mathrm{mul}^{(l)}_k$ networks to the slow neuron $y^{(l)}_1$, and $\tilde{w}$ weights the self-connection of $y^{(l)}_1$.
Figure 1. The “pulling out” of the multiplication (on top) performed by means of the w-substitution procedure. Distinct mul networks “break” weights in order to effectively implement a PNN that acts as an interpreter of neural programs.
Equation (4) can be straightforwardly generalized to accommodate $N^{(l)}$ slow neurons and $Q^{(l)}$ input sources
$$L_l(N^{(l)}, Q^{(l)}) \equiv \begin{cases} \tau_n \dot{y}^{(l)}_n = -y^{(l)}_n + \sigma\left(\hat{w} \sum_{h=T^{(l)}(n-1)+1}^{T^{(l)} n} \mu^h_{M^{(l)}} + \tilde{w} \sum_{j=1}^{T^{(l)}} y^{(l)}_j\right), & n \in \{1, \ldots, N^{(l)}\} \\ \theta^k_m \dot{\mu}^k_m = \mathrm{mul}\left(y^{(l)}_k, y^{(l+1)}_k\right), & m \in \{1, \ldots, M^{(l)}\},\ k \in \{1, \ldots, N^{(l)} T^{(l)}\} \end{cases} \qquad (5)$$
where
we set $y_{N^{(l)}+j} = x_j$ and $T^{(l)} = N^{(l)} + Q^{(l)}$;
$\mu^k_m$ is the activation of the $m$-th (fast) neuron of the $k$-th mul network;
$y^{(l)}_n$ is the activation of the $n$-th slow neuron;
the condition on the time constants $\theta^k_m \ll \min\{\tau_n\}$ is imposed so that the mul networks have faster dynamics than the slow neurons $y^{(l)}_n$;
$\hat{w}$ weights the connections from the neurons $\mu^k_{M^{(l)}}$ of the mul networks to the slow neurons $y^{(l)}_n$;
$\tilde{w}$ weights the connections among slow neurons $y^{(l)}_n$.
The output of a PNN can be redirected as an input
of a new PNN, i.e. this computational scheme can eas-
ily be iterated at multiple hierarchical layers, with the
result that a network playing the role of programmer
relative to a lower-level interpreter can also play the
role of an interpreter relative to a higher-level program-
mer, providing a homogeneous hierarchical organizing
principle that extends over an indefinite number of
layers (see Figure 3).
Notice that for an ideal mul, the solution of $L_l(N^{(l)}, Q^{(l)})$ restricted to the slow neurons
$$y^{(l)}(t) = \left(y^{(l)}_1(t; x^{(l)}; y^{(l+1)}), \ldots, y^{(l)}_{N^{(l)}}(t; x^{(l)}; y^{(l+1)})\right) \qquad (6)$$
when varying the programming inputs $y^{(l+1)}_j$, can approximate solutions $y(t) = \left(\{y_n(t; W, \{x_i\}_{i=1}^{Q})\}_{n=1}^{N}\right)$ of an “ordinary” CTRNN of equation (1) when $N^{(l)} = N$, $x_i = x^{(l+1)}$. Or more formally, $\forall \varepsilon > 0$, $\forall t$
$$y(t) = \left(y_1(t; x; W), \ldots, y_N(t; x; W)\right) \qquad (7)$$
In other words, a PNN system performs a substitution of variables which lets the programming inputs vary the system in the same way that changing the weights does in an “ordinary” CTRNN. When an approximated mul is given, however, it is difficult to formally establish an $\varepsilon$ bound for large networks (Donnarumma, Murano, & Prevete, 2015a); thus the fine-tuning of the system relies on experimental considerations and can be improved so as to satisfy Condition (3): (a) by increasing the “speed” of the mul networks, tuning the time constants to better satisfy $\theta^k_m \ll \min\{\tau_n\}$, and (b) by refining the output response, increasing the size $M$ of the mul networks.
Finally, we stress that in our model all the connections of the layers are fixed, and thus the dynamic behaviors they exhibit are due entirely to changes in their inputs, i.e. the data inputs $x^{(l)}_i$ and the programs from the upper layer determined by the activations $y^{(l+1)}_j$. In other words, the layer $L_{l+1}$ may eventually fall into attractor states “readable” on its neurons $y^{(l+1)}_i$. These values form the programming inputs sent to the layer $L_l$, which consequently rapidly changes its behavior dynamics. This means that the changes of behavior we model are qualitatively different from learning, because they do not involve synaptic weight changes. Moreover they are reversible, because previously enacted dynamic behaviors can be elicited again whenever suitable programming inputs are sent to the layer.
Figure 2. Depiction of a layer $L_l(1, 1)$ described in the system of equations (4). The layer is a PNN composed of one slow neuron $y^{(l)}_1$ and two mul networks. It receives an input $x^{(l)}_1$ and two programming inputs $y^{(l+1)}_1$ and $y^{(l+1)}_2$ from the higher level. In particular, the network $\mathrm{mul}^{(l)}_1 = \mathrm{mul}(y^{(l)}_1, y^{(l+1)}_1)$ connects $y^{(l)}_1$ with the programming input $y^{(l+1)}_1$ and the network $\mathrm{mul}^{(l)}_2 = \mathrm{mul}(x^{(l)}_1, y^{(l+1)}_2)$ connects $y^{(l)}_1$ with the input source $x^{(l)}_1$.
Figure 3. Hierarchical multi-layer architecture of the programmable neural network (PNN) architecture. In the proposed scheme, the upper layers send programs to the lower ones that act, in their turn, as a neural interpreter.
3 Experiments
We present a Hierarchical Programmable Neural Network Architecture (HPNNA) built for a robotic scenario. We considered an agent learning eight different tasks corresponding to eight different goals in a T-maze environment (see Figure 6), starting from a fixed start location. The HPNNA is composed of two interpreters of networks (PNNs): a (higher) level $L_2$ and a (lower) level $L_1$ (see Figure 4).
$L_2$ receives reach-goal codes on the programming input lines $T$ and trigger inputs on the data input lines from $L_1$. $L_1$ receives motor-primitive codes, from $L_2$, on the programming input lines and sensor data on the data input lines, and outputs the control signals that govern the agent in the environment. $L_1$ can be programmed to implement different motor primitives when its programming input lines are fed with suitable codes. The trigger signal encodes the completion of a motor primitive. $L_2$ can be programmed to implement different sequences of motor-primitive codes when its programming input lines are fed with suitable codes.
The programming inputs are learned by a two-step learning strategy, which can be described as follows:
1. In the first step we sought $2^3$ reach-goal codes for the programming input lines of $L_2$. The learning ensures that when $L_2$ is fed with one of these codes, the agent is able to perform a sequence of three consecutive motor primitive programs constituting the reach-goal program. The switch between the motor primitive programs occurs when $L_2$ detects a T-intersection by means of the trigger signal coming from $L_1$.
2. In the second step we sought two different lower level programs, Right-Wall Follower ($P_r$) and Left-Wall Follower ($P_l$), which encode the basic motor primitives of our control architecture. The agent exhibits two different behaviors in the environment according to the two different codes $P_r$ and $P_l$. When $L_1$ is fed with the programming input $P_r$ or $P_l$, the robot follows the wall to its right or its left, respectively.
Figure 4. The hierarchical programmable neural network architecture built for the robotic scenario. Two layers of interpreters (PNNs) are present: a (higher) level $L_2$ and a (lower) level $L_1$. $L_2$ receives reach-goal codes on the programming input lines and trigger signals on the data input lines. $L_1$ receives motor-primitive codes, from $L_2$, on the programming input lines and sensor data on the data input lines, and outputs the control signals which govern the robot in the environment.
Figure 5. General presentation of the sought $L_2$ module. Its output controls the activation of the proper motor primitive $P_l$ or $P_r$. It has two inputs: a reach-goal code, on which the coded task is presented, and a Trigger input, carrying the information on when the proper primitive should be enacted.
Figure 6. The multiple T-maze scenario used in the experiments. The starting position of the agent is indicated by a gray circle at the bottom of the maze. Each corridor of the maze has the same length $d$. $G_1, \ldots, G_8$ are the eight possible goal positions corresponding to the given tasks $T_1, \ldots, T_8$.
In Subsection 3.1 we show the first learning step,
preparing a synthetic dataset, in which the stimuli and
the programs are simplified in order to study the differ-
ent learning properties of the proposed architecture
versus a non-programmable one. The second step is
presented in Subsection 3.2 where the primitives are
actually learned in a simulated robotic environment
and then the overall architecture is tested in the multi-
ple T-maze simulated robotic scenario.
3.1 Learning motor primitives composition - HPNNA versus NOA
The task of this section is the learning phase of the $L_2$ PNN module. The aim of the learning is to endow $L_2$ with the capability of driving the agent in a multiple T-maze environment by sequencing specific motor primitives (see Figure 5).
Following Yamauchi and Beer (1994), we sought a network capable of changing its state when an external trigger is given. Let us suppose we have a network that selects the two programs $P_r$ and $P_l$ by means of the output of one of its neurons. A high value of this neuron selects the program $P_r$ while a low value selects the program $P_l$.
In order to test how well the proposed architecture can learn different programs, in this section we prepare a synthetic dataset that captures, in stylized form, the different tasks of a multiple T-maze. We compare our HPNNA with a traditional non-organized architecture (NOA, see below), showing that learning multiple behaviors is computationally more difficult than learning multiple behavior codes. In this experimental scenario we imagined an agent exploring a multiple T-maze (see Figure 6). In the considered mazes each corridor has the same length $d$. This parameter $d$ is an environment variable that we varied during the experiments.
In accordance with our strategy, the control module of HPNNA that sequences primitives, $L_2$, has two kinds of input lines:
the data input line is fed with the external trigger;
the programming input line encodes the different sequences that constitute our high level program.
Thus, given the fixed structure interpreter $L_2$, we learn the structure of a neural network memorizing the input codes to be sent to $L_2$, testing our HPNNA approach. As a comparison, a similar learning is performed on a CTRNN layer that is not structured as in equation (5) but follows the ordinary CTRNN equation (1); we refer to this module as a non-organized architecture (NOA).
By means of layer $L_2$, the agent is supposed to control two different low-level primitives:
Left-Wall Follower, denoted by $P_l$, i.e. the behavior “follow the wall on the left”;
Right-Wall Follower, denoted by $P_r$, i.e. the behavior “follow the wall on the right”.
We aim to learn a control module of an agent with two inputs, a task-input $T$ and a trigger-input $I_D$, plus a motor-output $U$ calling one of the two primitives, $P_l$ or $P_r$. The agent is supposed to move inside the maze, perceiving it with sensors able to detect walls. When it reaches a T-cross, the trigger-input is activated and the agent consequently moves in order to regain the wall, performing a left-turn or a right-turn depending on the input task the agent receives.
Each of the eight tasks $T_1, \ldots, T_8$ corresponds to the successful reaching of one of the goals $G_1, \ldots, G_8$ in the maze (see Figure 7). Each task of the agent can be decomposed into a sequence of three low-level primitives $[P_1 P_2 P_3]$ with $P_i = P_r$ or $P_l$. Each sequence is recalled by the corresponding task-input $T$ sent to $L_2$, i.e.
$[P_l P_l P_l] \to T_1 = [000]$
$[P_l P_l P_r] \to T_2 = [001]$
$[P_l P_r P_l] \to T_3 = [010]$
$[P_l P_r P_r] \to T_4 = [011]$
$[P_r P_l P_l] \to T_5 = [100]$
$[P_r P_l P_r] \to T_6 = [101]$
$[P_r P_r P_l] \to T_7 = [110]$
$[P_r P_r P_r] \to T_8 = [111]$
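The correspondence between task-inputs and primitive sequences listed above amounts to a three-bit binary encoding, with 0 standing for $P_l$ and 1 for $P_r$. A minimal Python sketch follows; the names `decode_task` and `TASKS` and the string labels are illustrative assumptions, not part of the architecture.

```python
def decode_task(code):
    """Map a 3-bit task-input such as '010' to its sequence of primitives."""
    if len(code) != 3 or any(bit not in "01" for bit in code):
        raise ValueError("task-input must be a string of three bits")
    return ["P_r" if bit == "1" else "P_l" for bit in code]


# Task names T1..T8 mapped to their codes 000..111 (hypothetical labels).
TASKS = {f"T{i + 1}": format(i, "03b") for i in range(8)}

sequence_t3 = decode_task(TASKS["T3"])
```

Here `sequence_t3` holds the sequence recalled by $T_3 = [010]$, i.e. left-wall, right-wall, left-wall following.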
Therefore each program corresponds to a high level representation of the possible agent’s behaviors. Ideally, at the end of the learning phase, by selecting a task-input $T_i$, the agent is asked to assume the behavior that allows it to reach the corresponding goal $G_i$ in the maze. It is important to stress that this program formalization does not point at any specific trajectory, but at a sequencing of low-level primitives.
In this test, the trigger-input $I_D(t) \in \{0, 1\}$ is the idealization of a time varying input signal: it is high ($I_D = 1$) when the agent is turning (i.e. the agent is at the end of the corridor) and low ($I_D = 0$) when the agent moves forward along the corridors of the maze. In other words, the trigger tells the controlling unit when the robot turns left or right and, therefore, when it is necessary to select the next primitive from the program sequence “stored” in $T_i$.
In this first experiment we assume the agent moves at constant velocity $v_A$. Consequently, the duration $\Delta T_{low}$ of the low trigger-input can be considered proportional to the length $d$ of any of the corridors, while the duration $\Delta T_{high}$ of the high trigger-input is considered proportional to the time spent turning at each T-cross and is assumed constant across maze dimensions.
Given these conditions, it is possible to define a parameter $\lambda$

$$\lambda = \frac{\Delta T_{low}}{\Delta T_{high}} = \frac{d}{v_A \, \Delta T_{high}} \;\Rightarrow\; \lambda \propto d \qquad (8)$$

which corresponds to the size of a chosen labyrinth, with respect to the trigger inputs to the controlling unit.
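To make the role of $\lambda$ concrete, the idealized trigger-input can be sketched as a square wave whose low phase lasts $\lambda$ times as long as its high phase. This is a minimal illustration under stated assumptions: the function name `trigger_signal`, the parameter names, and the default of four high steps per turn are hypothetical and not taken from the experiments.

```python
def trigger_signal(lam: float, high_steps: int = 4, n_turns: int = 3) -> list[int]:
    """Build an idealized trigger I_D(t) for a maze of relative size lam.

    Each corridor contributes a low phase (I_D = 0) of about lam * high_steps
    samples, followed by a high phase (I_D = 1) of high_steps samples at the turn.
    high_steps is an arbitrary illustrative value, not one reported in the text.
    """
    if lam <= 0 or high_steps <= 0 or n_turns <= 0:
        raise ValueError("lam, high_steps and n_turns must be positive")
    low_steps = round(lam * high_steps)  # Delta T_low = lam * Delta T_high
    signal: list[int] = []
    for _ in range(n_turns):
        signal += [0] * low_steps + [1] * high_steps
    return signal
```

With these assumptions, changing the corridor length only stretches the low phases, while the high phases keep a constant duration, as stated above.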
We build two different synthetic datasets, D1 and D2:
• D1 represents a single multiple T-maze with $\lambda = 2$, i.e. a maze with a single corridor length $d$;
• D2 represents three different multiple T-mazes, with $\lambda = 2$, $\lambda = 3$ and $\lambda = 4$, respectively, i.e. three mazes with three different corridor lengths $d$.

Figure 7. A depiction of the eight goal-reach tasks $T_1, \ldots, T_8$ defined for the experimental scenario. They are composed of a series of three motor primitives and correspond to the reaching of the corresponding goals $G_1, \ldots, G_8$ in the maze.
By means of the parameter $\lambda$ this information is implicitly stored in the trigger-input signal. The corresponding target output $O(t)$ is then generated in order to build datasets of input-output pairs. The aim of learning is to replicate this target on the network output $U(t)$ at each time step. Ten target sequences per program have been created for each maze. A reference sequence is about 40 time steps of size $(1/5)\tau$ long, where $\tau$ is the time constant unit used for the neural network modules. With eight programs, this means that dataset D1 has 80 sample input-target sequences, while D2 has 240 sequences.
3.1.1 Learning a control module by differential evolution. We adopt a learning algorithm based on an evolutionary approach, the Differential Evolution (DE) algorithm (De Falco, Della Cioppa, Donnarumma, Maisto, Prevete, & Tarantino, 2008; Price, Storn, & Lampinen, 2005). DE is a population-based evolutionary algorithm that has proved very efficient in continuous domains, which fits the case of learning the parameters of neural networks (De Falco et al., 2008). DE addresses a generic optimization problem with $m$ real parameters by starting with a randomly initialized population of $n$ individuals, each made up of $m$ real values. The population is updated from one generation to the next by means of one of many transformation schemes, commonly called strategies (Price et al., 2005). In all of these strategies DE generates new individuals by adding to an individual a number of weighted difference vectors between pairs of population individuals.
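A single generation of the DE/rand/1/bin strategy (the strategy listed in Table 1) can be sketched as follows. This is the generic textbook form of the scheme written for minimization, not the code used in the experiments; the function name `de_rand_1_bin` and its signature are hypothetical. The defaults `f = 0.7` and `cr = 0.8` are the step size and crossover probability reported in Table 1.

```python
import random


def de_rand_1_bin(population, fitness, f=0.7, cr=0.8, rng=random):
    """Return the next generation under DE/rand/1/bin (lower fitness is better).

    population: list of n individuals, each a list of m real values (n >= 4).
    fitness:    callable mapping an individual to a real cost.
    """
    n, m = len(population), len(population[0])
    if n < 4:
        raise ValueError("DE/rand/1 needs at least four individuals")
    next_generation = []
    for i, target in enumerate(population):
        # Three distinct individuals, all different from the current target.
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        xa, xb, xc = population[a], population[b], population[c]
        # Mutation: base vector plus one weighted difference vector.
        mutant = [xa[d] + f * (xb[d] - xc[d]) for d in range(m)]
        # Binomial crossover; j_rand guarantees at least one mutant component.
        j_rand = rng.randrange(m)
        trial = [
            mutant[d] if (rng.random() < cr or d == j_rand) else target[d]
            for d in range(m)
        ]
        # Greedy selection: keep the trial only if it is not worse than the target.
        next_generation.append(trial if fitness(trial) <= fitness(target) else target)
    return next_generation
```

Because selection is greedy, the cost of each slot in the population can never increase from one generation to the next.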
In this experiment HPNNA learning operates on an architecture based on equation (5), which is a fixed-structure neural network (no synaptic connections are learned for this module), together with an input module, i.e. a neural network that has to “memorize” the different codes enabling the different tasks. The aim of learning is to find the programming inputs that let HPNNA solve the task. The NOA architecture is a fully connected CTRNN following equation (1), without any particular internal structure. In this case, the aim of learning is to find suitable CTRNN weights that let NOA solve the presented task. To keep the comparison fair, we keep a similar number of parameters in both cases. Thus, for both architectures, DE searches for solutions in a parameter space $S \subseteq \mathbb{R}^{24}$ (see Table 1).
The learning procedure is described in detail in Algorithm 1. An outer loop iterates the procedure for at most $I_{max}$ iterations. Each program is evaluated separately with a fitness function proportional to the distance between the target output $O_k$ of the program selected by the task input $T_k$ and the motor output of the control module $U_k(t) = U(T_k(t), I_D(t))$. Thus the fitness value is computed on the set of sequences relative to the program $T_k$

$$F_k = \sum_{T_k} \sum_{t} \left( U_k(t) - O_k(t) \right)^2 \qquad (9)$$
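Equation (9) is a sum of squared errors over all sequences of a program and over all time steps. A minimal sketch follows, assuming each sequence is given as a plain list of output samples; the name `program_fitness` is hypothetical.

```python
def program_fitness(outputs, targets):
    """Sum of squared errors of equation (9) for one program T_k.

    outputs: list of sequences U_k(t) produced by the control module.
    targets: list of target sequences O_k(t), aligned with outputs.
    """
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must contain the same number of sequences")
    total = 0.0
    for u_seq, o_seq in zip(outputs, targets):
        if len(u_seq) != len(o_seq):
            raise ValueError("each output sequence must match its target in length")
        total += sum((u - o) ** 2 for u, o in zip(u_seq, o_seq))
    return total
```

A value of zero means the module reproduces every target sample exactly; larger values indicate a larger deviation from the target.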
The function selectSamples selects the sequences on which a program is going to be evaluated, allowing DE to move the population towards a better solution in the parameter space. Samples related to programs that have already been learned correctly can be excluded from learning. However, this option can be used only for the HPNNA architecture: for the NOA architecture, the resulting change in the synaptic weights would cause the well-known effect of catastrophic forgetting of previously learned behaviors, so that the learned module would eventually collapse to the last program selected.
The results of the tests are evaluated by comparing 20 learning-runs for each architecture, HPNNA and NOA. HPNNA is able to achieve solutions that correctly perform eight out of eight programs. We show:
• results in learning the D1 dataset, with samples from a maze with a single length $d$ (see Paragraph 3.1.2);
• results in learning the D2 dataset, with samples from three mazes with three different lengths $d$ (see Paragraph 3.1.3);
• testing in mazes of lengths $d$ different from the ones seen during the learning phase (see Paragraph 3.1.4).

Algorithm 1 Control Module Learning $(D, M, opt, \beta, I_{max})$.
Require: Dataset $D$, Architecture Model $M$, optimum fitness threshold $opt$, DE parameters $\beta$, maximum number of total iterations $I_{max}$.
Ensure: Model Parameters $\theta$
1: initialize Model Parameters $\theta(M)$
2: set Fitness Values $F_k = +\infty$ for each Task $T_k \in$ TaskSet
3: set iteration $i = 0$
4: while $F_k > opt$ for all Tasks $T_k \in$ TaskSet and $i < I_{max}$ do
5:   select samples to learn $D_T = \mathrm{selectSamples}(D)$
6:   execute Differential Evolution step $DE(D_T, \beta)$
7:   update Architecture Model best parameters $\theta$
8:   update Fitness Values $F_k$
9:   update iteration number $i$
10: end while
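The control flow of Algorithm 1 can be sketched in Python as follows. This is an illustration under explicit assumptions, not the authors' implementation: `evaluate` and `de_step` are hypothetical callables standing in for the fitness computation and for one DE step, the loop is assumed to continue until every task falls below the threshold `opt`, and the `freeze_learned` flag models the selectSamples option of excluding programs that are already learned.

```python
import math


def learn_control_module(task_ids, evaluate, de_step, theta, opt, i_max,
                         freeze_learned=True):
    """Sketch of the outer loop of Algorithm 1.

    evaluate(theta, task) -> fitness F_k of the current parameters on one task.
    de_step(theta, active_tasks) -> parameters after one DE step on those tasks.
    Returns the final parameters, the per-task fitness values and the iteration count.
    """
    fitness = {k: math.inf for k in task_ids}  # F_k = +infinity for each task
    i = 0
    while any(f > opt for f in fitness.values()) and i < i_max:
        # selectSamples: optionally skip tasks whose fitness is already below opt.
        active = [k for k in task_ids if not freeze_learned or fitness[k] > opt]
        theta = de_step(theta, active)         # one Differential Evolution step
        for k in active:                       # update fitness of the tasks just trained
            fitness[k] = evaluate(theta, k)
        i += 1
    return theta, fitness, i
```

With `freeze_learned=True`, the codes of already learned programs are left untouched in later iterations, which is the behavior described above as safe for HPNNA but not for NOA.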
3.1.2 Learning D1 dataset – single maze size. Table 2 and Table 3 detail the final fitness values for NOA and HPNNA. For dataset D1, HPNNA is able to learn all programs in 40% of the learning-runs. In the remaining 60% of learning-runs, HPNNA learns at least seven out of eight programs. In contrast, the NOA architecture learns at most five out of eight programs (see Table 5). Notice that a NOA trained with the selectSamples option catastrophically forgets previous programs and is able to learn only the last seen program (see Table 5). Moreover, learning HPNNA is more computationally efficient in terms of the total number of steps required, as it learns all programs with a smaller number of iterations (see Table 4). Figures 8 and 9 show sample executions for NOA and HPNNA.
3.1.3 D2 dataset – three different maze sizes. The dataset D2 is built with sequences of different lengths $d$, varied by means of the parameter $\lambda$. The parameter values are $\lambda = 2$, $\lambda = 3$ and $\lambda = 4$. HPNNA was able to learn all eight programs (see Table 9), while NOA was not able to learn more than five programs. Table 8 also shows that a smaller number of iterations is needed to learn HPNNA. Figures 13 to 15 show sample HPNNA executions, while Figures 10 to 12 show sample NOA executions. Overall, the results indicate better performance for HPNNA, which is capable of learning programs in different maze sizes.

Figure 8. NOA sample outputs for Dataset D1.
3.1.4 Testing on unknown maze sizes. We test all the learned instances of the previous subsections on sequences corresponding to mazes of sizes never seen during learning, in order to verify the generalization capabilities of the architectures. We choose test mazes with $\lambda = 2.4$ and $\lambda = 3.6$ (see equation (8)). For both architectures we test the best and worst cases learned on datasets D1 and D2. As expected, modules learned on dataset D1 perform worse (HPNNA could not execute all programs; see Table 10), whereas modules learned on dataset D2 generalize very well to new unseen dimensions (HPNNA can successfully execute all eight programs; see Table 11).
3.2 HPNNA in a simulated robotic environment
In the previous section we made the hypothesis of having ideal motor primitives, in order to build a control module $L_2$ with suitable inputs to guide the agent towards the desired goal. In this section we actually implement a lower-level interpreter $L_1$ of motor primitives (see Figure 16) in order to complete the HPNNA architecture and show its performance in a simulated robotic environment. Robot simulations were carried out using the open source software project Player-Stage (Gerkey, Vaughan, & Howard, 2003) to simulate a Pioneer 3DX robot (see Figure 17). The robot is equipped with ten sonars placed on the frontal and the lateral parts of the robot ($s_1, \ldots, s_{10}$ in Figure 17b). Note that $L_1$ governs the robot by setting its angular and linear velocity, which correspond to the outputs of two neurons belonging to $L_1$. During this learning phase, in which basic motor primitives are learned, the environment is a single T-maze consisting of corridors of fixed length and three times as wide as the robot.

Figure 9. HPNNA sample outputs for Dataset D1.

Table 1. Experimental parameters for the Primitive Sequencing learning task.
Network parameters β
    Time constants                        5τ
    Minimum weight value w_min            -5
    Maximum weight value w_max            +5
    Integration step Δt                   0.2
Fitness optimum value opt                 0.99
Maximum number of iterations I_max        20000
DE parameters θ
    Population number                     100
    Parameter space size                  24
    Step size                             0.7
    Crossover probability                 0.8
    Strategy                              DE/RAND/1/BIN

Table 2. D1 Dataset results for NOA on 20 learning-runs. Fitness value $F_k$ (mean, standard deviation, maximum and minimum) and success rate are shown for each Task $T_k$. The best NOA was not able to learn all programs. The low standard deviation suggests that similar results are expected if further runs were made.
                    T1      T2      T3      T4      T5      T6      T7      T8
Mean F_k            0.004   8.212   13.945  2.956   1.280   15.205  11.125  0.005
Standard deviation  0.001   5.919   3.739   2.447   0.971   0.225   5.375   0.003
Maximum             0.009   14.149  15.168  10.119  3.043   16.160  14.148  0.016
Minimum             0.003   1.028   2.050   0.028   0.022   15.153  1.110   0.003
Success rate        100%    55%     10%     95%     100%    0%      25%     100%

Table 3. D1 Dataset results for HPNNA on 20 learning-runs. Fitness value $F_k$ (mean, standard deviation, maximum and minimum) and success rate are shown for each Task $T_k$. The best HPNNA was able to execute eight out of eight programs. A high standard deviation in some cases indicates that the respective programs ($T_3$ and $T_7$) are more difficult to learn than others; this can also be seen from the corresponding success rate values.
                    T1      T2      T3      T4      T5      T6      T7      T8
Mean F_k            0.005   0.080   8.210   0.100   0.038   0.058   0.773   0.005
Standard deviation  <10^-3  0.014   7.379   0.217   0.019   0.013   3.147   <10^-3
Maximum             0.005   0.100   15.155  1.020   0.092   0.088   14.145  0.005
Minimum             0.005   0.051   0.059   0.029   0.018   0.040   0.025   0.005
Success rate        100%    100%    45%     100%    100%    100%    95%     100%

Table 4. Iterations for HPNNA learning-runs in D1. The table shows the mean, standard deviation, maximum and minimum of the number of iterations over the 20 runs. Avoiding the catastrophic forgetting effect makes it possible to skip the learning of programs for which $F_k$ is less than the opt value. We stress that for NOA it is always necessary to execute a number of iterations equal to $I_{max}$ = 20000.
HPNNA learning iterations per task, and total iterations
                    T1      T2      T3      T4      T5      T6      T7      T8      Total
Mean                50      192     3760    390     72      780     860     50      6155
Standard deviation  3       78      1946    1087    25      199     1037    3       2269
Maximum             52      400     5000    5000    100     1100    5000    52      12500
Minimum             48      100     600     50      50      450     250     49      2450
Table 5. Dataset D1 success percentage. The comparison summarizes successful results for the architectures HPNNA and NOA. The table also shows the NOA learning results when samples are excluded from the dataset by the selectSamples procedure. In this case NOA meets the well-known catastrophic forgetting effect.
Learned programs   HPNNA             NOA               NOA, catastrophic forgetting
                   (% on 20 runs)    (% on 20 runs)    (% on 20 runs)
1/8                100               100               100
2/8                100               100               0
3/8                100               100               0
4/8                100               100               0
5/8                100               85                0
6/8                100               0                 0
7/8                100               0                 0
8/8                40                0                 0
Table 6. D2 Dataset results for NOA on 20 learning-runs. Fitness value $F_k$ (mean, standard deviation, maximum and minimum) and success rate are shown for each Task $T_k$. The best NOA was not able to learn all programs. The low standard deviation suggests that similar results are expected if further runs were made.
                    T1      T2      T3      T4      T5      T6      T7      T8
Mean F_k            0.016   36.360  45.518  9.205   2.661   60.918  57.583  0.012
Standard deviation  0.013   23.197  21.472  5.500   2.975   0.934   0.004   0.002
Maximum             0.056   57.585  60.675  17.422  9.110   63.657  57.591  0.017
Minimum             0.010   2.484   6.158   0.087   0.076   60.610  57.580  0.010
Success rate        100%    35%     30%     95%     100%    0%      0%      100%
Table 7. D2 Dataset results for HPNNA on 20 learning-runs. Fitness value $F_k$ (mean, standard deviation, maximum and minimum) and success rate are shown for each Task $T_k$. The best HPNNA was able to execute eight out of eight programs. A high standard deviation in some cases indicates that the respective programs ($T_2$, $T_3$ and $T_7$) are more difficult to learn than others; this can also be seen from the corresponding success rate values.
                    T1      T2      T3      T4      T5      T6      T7      T8
Mean F_k            0.014   3.493   39.818  0.195   0.178   0.203   11.945  0.014
Standard deviation  <10^-3  12.759  29.087  0.057   0.097   0.076   23.423  <10^-3
Maximum             0.014   57.584  60.614  0.329   0.351   0.412   57.584  0.014
Minimum             0.014   0.177   0.277   0.093   0.055   0.125   0.102   0.014
Success rate        100%    95%     35%     100%    100%    100%    80%     100%
Table 8. Iterations for HPNNA learning-runs in D2. The table shows the mean, standard deviation, maximum and minimum of the number of iterations over the 20 runs. Avoiding the catastrophic forgetting effect makes it possible to skip the learning of programs for which $F_k$ is less than the opt value. We stress that for NOA it is always necessary to execute a number of iterations equal to $I_{max}$ = 20000.
HPNNA learning iterations per task, and total iterations
                    T1      T2      T3      T4      T5      T6      T7      T8      Total
Mean                50      693     2070    148     65      953     1218    50      5245
Standard deviation  4       825     769     73      24      352     864     3       1472
Maximum             52      2500    2500    400     100     1600    2500    53      7700
Minimum             47      150     550     50      50      450     150     48      2800
The $L_1$ module should realize an interpreter on which two motor-primitive programs are learned: Right-Wall Follower ($P_r$) and Left-Wall Follower ($P_l$). According to Section 2, this module has two kinds of input lines: a data input line and a programming input line.
The data input line consists of three inputs $\{I_1, I_2, I_3\}$ that are the weighted sums of the three sonars facing right, the three facing left and the two frontal sonars, respectively, as in the following equations

$$\begin{aligned}
I_1 &= 0.2\,S_2 + 0.4\,S_3 + 0.4\,S_4 \\
I_2 &= 0.2\,S_9 + 0.4\,S_8 + 0.4\,S_7 \\
I_3 &= 0.5\,S_5 + 0.5\,S_6
\end{aligned} \qquad (10)$$
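The weighted sums of equation (10) translate directly into code. The following sketch assumes the sonar readings are available as a mapping from sonar index (1 to 10) to a numeric value; the function name `data_inputs` is hypothetical.

```python
def data_inputs(sonar):
    """Compute the data inputs (I_1, I_2, I_3) of equation (10).

    sonar: mapping from sonar index (1..10) to its reading.
    I_1 combines the right-facing sonars, I_2 the left-facing ones,
    and I_3 the two frontal sonars.
    """
    i1 = 0.2 * sonar[2] + 0.4 * sonar[3] + 0.4 * sonar[4]
    i2 = 0.2 * sonar[9] + 0.4 * sonar[8] + 0.4 * sonar[7]
    i3 = 0.5 * sonar[5] + 0.5 * sonar[6]
    return i1, i2, i3
```

Since the weights of each input sum to one, each $I_j$ stays within the range of the sonar readings it combines.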
Table 9. Dataset D2 success percentage. The comparison summarizes successful results for the architectures HPNNA and NOA.
Learned programs   HPNNA             NOA
                   (% on 20 runs)    (% on 20 runs)
1/8                100               100
2/8                100               100
3/8                100               100
4/8                100               100
5/8                100               70
6/8                100               0
7/8                85                0
8/8                25                0

Figure 10. NOA sample outputs for Dataset D2 and $\lambda = 2$.
Thus, two neuron outputs of the $L_1$ module control the robot. In particular, the activation of one neuron controls the linear speed of the robot, while another neuron controls the robot’s angular velocity.
The neurons of the module share the same value of the characteristic time $\tau$, which is an order of magnitude bigger than the characteristic time of the multiplicative networks u.
Then, by means of the w-substitution we construct the fixed-structure interpreter $L_1$, made of 39 neurons with two kinds of input lines:
• a data input line that consists of the three inputs from the sonars;
• a programming input line that consists of inputs that codify the different structures simulated by the interpreter.
Consequently, we evolved a vector of 12 parameters in order to find suitable programs that let the network control the robot and perform the correct motor primitives. In our approach, given the fixed-structure interpreter $L_1$, we used Algorithm 1 to learn the suitable motor-primitive codes. This is done by building suitable fitness functions, one for the Right-Wall Follower primitive $P_r$ and a second one for the Left-Wall Follower primitive $P_l$. Note that, in contrast with other approaches, this is possible because the network structure is fixed and we do not evolve weights. Thus, we can divide the learning into two epochs without erasing previously learned capabilities.

Figure 11. NOA sample outputs for Dataset D2 and $\lambda = 3$.
Figure 12. NOA sample outputs for Dataset D2 and $\lambda = 4$.

For each epoch we initialized a population of 20 elements controlled by networks with codes randomly chosen in the range [−5, 5]. Each controller obtained is evaluated with a fitness function specific to each program, i.e. $F_R$ and $F_L$, while performing the task of behaving as a right or as a left follower, respectively. A new population is obtained using the best element of the previous population. In our training we used a crossover coefficient ($CR \in [0, 1]$) of 0.8 and a step-size coefficient ($F \in [0, 1]$) of 0.85; this means that our algorithm builds the next generation preserving the architecture of the best element of the previous generation (the value of the crossover coefficient is low), while also preserving the variance of the previous generation (the value of the step-size coefficient is high). The task used to evaluate the robot is structured as follows. We used a T-maze as the learning environment (see Figure 17b). Each robot is placed at the beginning of each crossroads and is free to run for about 30 seconds. The final evaluation of the “life” of a robot is the product of the evaluations obtained in each of the distinct simulations. The fitness function that evaluates the robot at every crossroads is made of two components. The first component $F_M$ is derived from the one proposed by Floreano and Mondada (1994) and consists of a reward for straight, fast movements and obstacle avoidance. This component is the same in the left-follower and the right-follower tasks. The second component changes between the two epochs: in the right-follower training it rewards the robot that turns right at a crossroads ($F_R$), while in the left-follower training it rewards the robot that turns left ($F_L$). In equation (11), $\bar{V}$ is the average speed of the robot, $V_A$ is the average angular speed, and $S_{min}$ is the value of the shortest distance measured from an obstacle during the task period

$$F_M = \bar{V} \left(1 - \sqrt{V_A}\right) S_{min}, \qquad V_A \in [0, 1], \; S_{min} \in [0, 1]; \qquad (11)$$

$$F_R = \bar{S}_1 + \bar{S}_2 - \bar{S}_9 - \bar{S}_{10}, \qquad F_L = \bar{S}_9 + \bar{S}_{10} - \bar{S}_1 - \bar{S}_2. \qquad (12)$$

In $F_R$ the average measure of the left sonars over the task period is subtracted from the average measure of the right ones; the opposite happens in $F_L$.
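The two fitness components can be sketched as below. This is a hedged illustration: it reads equation (11) as $F_M = \bar{V}(1 - \sqrt{V_A})\,S_{min}$ and equation (12) as differences of time-averaged sonar readings, with sonars 1 and 2 on the right and sonars 9 and 10 on the left, as the accompanying description indicates. The function names are hypothetical, and the way the two components are combined into a single score is not specified here, so they are computed separately.

```python
import math


def movement_fitness(v_mean, v_angular_mean, s_min):
    """F_M: reward for fast, straight movement that stays away from obstacles.

    v_angular_mean and s_min are assumed to be normalized to [0, 1].
    """
    if not (0.0 <= v_angular_mean <= 1.0 and 0.0 <= s_min <= 1.0):
        raise ValueError("v_angular_mean and s_min must lie in [0, 1]")
    return v_mean * (1.0 - math.sqrt(v_angular_mean)) * s_min


def turn_fitness(mean_sonar, side):
    """F_R (side='right') or F_L (side='left') from time-averaged sonar readings.

    mean_sonar: mapping from sonar index to its average reading over the task period.
    """
    right = mean_sonar[1] + mean_sonar[2]
    left = mean_sonar[9] + mean_sonar[10]
    if side == "right":
        return right - left
    if side == "left":
        return left - right
    raise ValueError("side must be 'right' or 'left'")
```

By construction the two turn components are opposite in sign for the same run, so a controller cannot score well on both with a single code.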
In Figure 18 we show the mean fitness evolution per step of the motor primitives. The interpreter $L_1$, fed with the best evolved code programs, was tested by placing the robot in ten different positions in the maze and observing its behavior while it drove through the crossroads three times. The positions were chosen in such a way that the robot starts its test in the middle of a corridor, oriented with its lateral part parallel to the wall. We tested one code at a time for each execution, without dynamically changing the values. In these conditions the interpreter $L_1$ fed with $P_r$ and $P_l$ was successful in all the trials, showing the appropriate behavior in each of the corridors: $L_1$ was able to control the robot without crashing while preserving the correct motor primitive.
Finally, we show the results of the whole HPNNA control framework in mazes of the kind of Figure 6, for all possible programs learned in the exploring behavior experiments. The trigger signal for the higher-level interpreter is, for simplicity, derived from the output of the angular velocity neuron of the first interpreter (however, cleverer triggering signals from the interpreter could be imagined). We tested it on mazes with different sizes ($\lambda \in \{2, 3, 4\}$). This is to stress that what we learned is not a particular trajectory in an environment but a high-level goal encoded by the program and not influenced by moderate changes in the environment. A test is considered successful if the distance from the goal is under a certain threshold and the robot does not crash. Thus, if the robot reaches a place different from the one “programmed”, the test fails. Moreover, we tested the robustness of the learned programs by applying a relative error $\varepsilon$ on each learned parameter $p$ during the execution in the maze environment. These noisy parameter values were drawn from a Gaussian distribution centred on the parameter value $p$ and with a standard deviation of $\varepsilon = kw/100$ where $k \in \{10, 15, 20\}$.
Table 13 shows the results. The small decrease in performance for shorter corridor lengths is mainly due to the shorter duration of the trigger on the higher-level network. High values of relative error increase the probability of failure in maze exploration. However, even a relative error of 15% does not erase the behavior of the HPNNA, which still succeeds in 77% of the tests (see Table 13).

Figure 13. HPNNA sample outputs for Dataset D2 and $\lambda = 2$.
4 Conclusions
We have proposed a hierarchical programmable neural network architecture, HPNNA, composed of a hierarchy of modules where each module can be viewed both as an interpreter network capable of running different programs without modifying its synaptic connections and as a programmer network capable of controlling the behavior of the lower modules. This implies that the same neuronal substrate can encode multiple motor primitives. Furthermore, the motor primitives can be learned incrementally by progressively adding more programs to the interpreter network. The learning of primitives in a lower level is transferred to the higher level; new primitives can be added to a fixed lower level by searching for the corresponding programming inputs that the higher level should send. We explored the parameter space resulting from this modeling by means of an evolutionary learning approach. The programming inputs of a higher level are fixed with respect to the dynamics of its corresponding lower level; thus learning multiple behavior codes (programs) turned out to be computationally simpler than learning the dynamics of multiple behaviors. We successfully tested the performance of the HPNNA architecture in tasks of increasing complexity. Our proposal has implications from both neuroscientific and computational perspectives, as we discuss below.

Figure 14. HPNNA sample outputs for Dataset D2 and $\lambda = 3$.
4.1 Neuroscientific perspectives
From a neuroscientific perspective, we present a novel proposal on (hierarchical) action organization and control by the brain, which can be summarized as an interpreter-programmer computational scheme. The interpreter network is able to store multiple action primitives within a common neural substrate. Not only is this encoding scheme parsimonious, avoiding the shortcomings of strong modularity, but it also affords flexible and plausible cognitive control by the programmer network. The programmer network can enforce rule-like behaviors by instantaneously instructing the interpreter network, without the necessity of re-learning. Such fast switches of behavior are the hallmark of cognitive control.

Figure 15. HPNNA sample outputs for Dataset D2 and $\lambda = 4$.
The system learns to represent goals (encoded in the
programming input), not trajectories in the environ-
ment; this affords the flexible adaptation to changing
environmental conditions (e.g. moderate changes of
dimensions and sensory cues in a maze). Furthermore,
the proposed computational scheme can be iterated to
realize hierarchies having increasing levels of complex-
ity (as a network playing the role of programmer rela-
tive to a lower-level interpreter can also play the role of
an interpreter relative to a higher-level programmer).
This provides a novel organizing principle for cortical
hierarchies and their role in supporting goal-directed
actions.
Overall, the proposed interpreter-programmer scheme is consistent, on the one hand, with the idea of multiple motor primitives in (pre)motor areas (Rizzolatti, Camarda, Fogassi, Gentilucci, Luppino, & Matelli, 1988), and on the other hand with control- and information-theoretic approaches to prefrontal cortex (Koechlin & Summerfield, 2007), and with its role in biasing (instantaneously) behavior (Miller & Cohen, 2001). At the same time, it goes beyond theoretical proposals on executive functions and suggests a plausible neural mechanism (programmability) for exerting cognitive control, which is based on the idea of “reusable” or “recycled” neuronal networks (Anderson, 2010; Dehaene, 2005) (and is therefore an alternative to the idea of “gates” and of strongly modular networks). Further studies are of course necessary to evaluate the merits of this proposal, but it should be noted that its computational parsimony, robustness and scalability (compared to alternative proposals) could offer advantages from an evolutionary viewpoint.

Table 10. Generalization capabilities of HPNNA and NOA when trained with dataset D1 ($\lambda = 2$). The architectures were tested for two unknown new maze sizes corresponding to $\lambda = 2.4$ and $\lambda = 3.6$. Columns T1 to T8 report the fitness value (number of correct turns) per task.
λ    Architecture  Case   T1            T2            T3            T4            T5            T6            T7            T8            Task success
2.4  NOA           Best   0.003 (3/3)   16.165 (2/3)  3.055 (3/3)   1.034 (3/3)   1.033 (3/3)   17.177 (2/3)  16.163 (2/3)  0.003 (3/3)   5/8
2.4  NOA           Worst  0.003 (3/3)   16.164 (2/3)  17.173 (2/3)  0.048 (3/3)   0.024 (3/3)   17.174 (2/3)  16.164 (2/3)  0.003 (3/3)   4/8
2.4  HPNNA         Best   0.005 (3/3)   7.137 (3/3)   0.065 (3/3)   0.045 (3/3)   0.028 (3/3)   1.064 (3/3)   2.088 (3/3)   0.005 (3/3)   8/8
2.4  HPNNA         Worst  0.005 (3/3)   3.109 (3/3)   17.175 (2/3)  0.085 (3/3)   0.099 (3/3)   0.054 (3/3)   0.098 (3/3)   0.005 (3/3)   7/8
3.6  NOA           Best   0.003 (3/3)   22.226 (2/3)  3.062 (3/3)   1.037 (3/3)   1.036 (3/3)   23.237 (2/3)  22.223 (2/3)  0.003 (3/3)   5/8
3.6  NOA           Worst  0.003 (3/3)   22.224 (2/3)  23.233 (2/3)  1.060 (3/3)   0.027 (3/3)   23.234 (2/3)  22.224 (2/3)  0.004 (3/3)   4/8
3.6  HPNNA         Best   0.005 (3/3)   16.215 (2/3)  0.071 (3/3)   0.048 (3/3)   0.031 (3/3)   23.254 (2/3)  7.142 (3/3)   0.005 (3/3)   6/8
3.6  HPNNA         Worst  0.005 (3/3)   20.258 (2/3)  23.235 (2/3)  0.103 (3/3)   0.122 (3/3)   0.065 (3/3)   6.149 (3/3)   0.005 (3/3)   6/8

Figure 16. General presentation of the sought $L_1$ module. Its output controls the Pioneer 3DX angular and linear velocity. It has two inputs, a motor-primitive code and the sonars.
4.2 Computational perspectives
From a computational perspective, this architecture has
numerous advantages. Concerning learning, the frame-
work permits incremental learning and the separation
of learning phases in different epochs. In fact, it is cer-
tainly possible to learn the same behavior in a ‘‘classi-
cal’’ way, by learning the weights of a network that at
the same time receives the trigger and a program. In
that case, one can apply two different strategies: (a) to
train a single network able to exhibit all the behaviors;
(b) to train one specific network for each behavior so as
to obtain eight networks performing the desired beha-
viors. However, in both cases it is not trivial to
accomplish this kind of training. In the first case,
because one should be forced to learn all behaviors at
the same time, which results in increasing difficulty as
soon as the number of behaviors increases. In the sec-
ond case, the drawback is the necessity of constructing
a single network that combines different special pur-
pose networks and switches among their output when-
ever it is needed. Moreover, in both cases it is difficult
to add new behaviors to the learned system. Thus, our
architecture suggests a promising neural network
approach for these kinds of issues (Umedachi, Ito, &
Ishiguro, 2015).
Furthermore, the possibility to steer goal sequences
entails flexible behavioral control in the face of uncer-
tain and (moderately) changing environments.
Robustness of control is also advantageous to scale up
the architecture hierarchically. As moderate errors in
program values do not change the overall architecture
behavior, programs can be used as outputs of other
network modules, realizing hierarchies of control.
When a hierarchical organization is built, higher-level
modules are necessarily slower than lower-level ones, as
they need to guide the realization of sequences of
actions (Paine & Tani, 2004).
4.3 Open issues
Finally, the proposed architecture can be further improved in a number of directions. Firstly, in our approach the discovery of new input programs lets a level exhibit novel primitive patterns without having to relearn already acquired behaviors (i.e. incrementally). However, this incremental learning may eventually suffer limitations, for two main reasons:
1. Each hierarchical level has a fixed complexity, i.e. it can simulate networks of a finite size. If the primitive to be learned has a larger complexity, it cannot be added without increasing the number of “slow” neurons of the level.
2. Each hierarchical level is affected by an intrinsic “precision error” because of the presence of the mul approximation, which crucially relies on the settings of the time constants and on a finite number of neurons $M$. In other words, output noise is present on the mul networks that could prevent the learning from adding the desired novel primitive behavior.
In both cases, changing the structure of a hierarchical level carries the cost of potentially disrupting all previously learned primitives. A future modeling improvement would be to add a mechanism capable of augmenting the structure without disrupting the existing programs.

Figure 17. Pioneer 3DX simulation in Player-Stage environment.
Figure 18. Mean cost evolution per step in Motor-Primitive code learning in the simulated robotic environment.
Moreover, in our hierarchical scheme implementation, each level receives a “standard” data input that can be an external sensory input (as in our tests with sonars in $L_1$). In principle, the $L_2$ data input could rely on some other sensor output; however, in our tests we showed that the trigger information on T-intersection detection is already contained in the outputs of $L_1$. As a general consideration, in our scheme it is a good idea to include, on the data input line, a feedback input coming from the lower level that can carry information on the timing of the task.
Table 11. Generalization capabilities of HPNNA and NOA when trained with dataset D2 (l ∈ {2, 3, 4}). The architectures are tested for two new maze sizes corresponding to l = 2.4 and l = 3.6. Columns T1 to T8 report the fitness value per task, with the number of correct turns in parentheses.

| l | Architecture | Case | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | Task success |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.4 | NOA | Best | 0.013 (3/3) | 2.134 (3/3) | 17.178 (2/3) | 0.102 (3/3) | 1.036 (3/3) | 17.174 (2/3) | 16.164 (2/3) | 0.004 (3/3) | 5/8 |
| 2.4 | NOA | Worst | 0.003 (3/3) | 16.164 (2/3) | 17.179 (2/3) | 1.066 (3/3) | 0.036 (3/3) | 17.175 (2/3) | 16.163 (2/3) | 0.003 (3/3) | 4/8 |
| 2.4 | HPNNA | Best | 0.005 (3/3) | 0.052 (3/3) | 0.083 (3/3) | 0.061 (3/3) | 0.059 (3/3) | 0.062 (3/3) | 0.164 (3/3) | 0.005 (3/3) | 8/8 |
| 2.4 | HPNNA | Worst | 0.005 (3/3) | 0.107 (3/3) | 17.175 (2/3) | 0.059 (3/3) | 0.106 (3/3) | 0.045 (3/3) | 16.165 (2/3) | 0.005 (3/3) | 6/8 |
| 3.6 | NOA | Best | 0.017 (3/3) | 3.176 (3/3) | 23.240 (2/3) | 1.127 (3/3) | 1.040 (3/3) | 23.234 (2/3) | 22.225 (2/3) | 0.004 (3/3) | 5/8 |
| 3.6 | NOA | Worst | 0.003 (3/3) | 22.224 (2/3) | 23.240 (2/3) | 0.082 (3/3) | 0.041 (3/3) | 23.235 (2/3) | 22.223 (2/3) | 0.003 (3/3) | 4/8 |
| 3.6 | HPNNA | Best | 0.005 (3/3) | 0.066 (3/3) | 0.102 (3/3) | 0.071 (3/3) | 0.078 (3/3) | 0.068 (3/3) | 0.221 (3/3) | 0.005 (3/3) | 8/8 |
| 3.6 | HPNNA | Worst | 0.005 (3/3) | 0.141 (3/3) | 23.235 (2/3) | 0.069 (3/3) | 0.128 (3/3) | 0.048 (3/3) | 22.225 (2/3) | 0.005 (3/3) | 6/8 |
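The "Task success" column of Table 11 can be recomputed from the per-task turn counts. The short sketch below does this for the l = 2.4 rows, assuming that a task counts as solved only when all three turns are correct; this rule is inferred from the table (it reproduces every reported tally) rather than stated in it, and the names `correct_turns` and `task_success` are illustrative.

```python
# Correct turns (out of 3) for tasks T1..T8, copied from Table 11, l = 2.4.
correct_turns = {
    ("NOA", "Best"):    [3, 3, 2, 3, 3, 2, 2, 3],
    ("NOA", "Worst"):   [3, 2, 2, 3, 3, 2, 2, 3],
    ("HPNNA", "Best"):  [3, 3, 3, 3, 3, 3, 3, 3],
    ("HPNNA", "Worst"): [3, 3, 2, 3, 3, 3, 2, 3],
}

TURNS_PER_TASK = 3  # each task is a sequence of three turns in the T-maze


def task_success(turns, required=TURNS_PER_TASK):
    """Count tasks in which every turn was correct (assumed success rule)."""
    return sum(1 for t in turns if t == required)


for (architecture, case), turns in correct_turns.items():
    print(f"{architecture:6s} {case:6s} {task_success(turns)}/{len(turns)}")
```

The l = 3.6 rows show the same pattern of correct turns, so the same tallies follow for them.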
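Table 12. Experimental parameters for the Motor Primitives learning task.

| Group | Parameter | Value |
|---|---|---|
| Network parameters b | Time constants | 2t |
| Network parameters b | Minimum weight value wmin | 5 |
| Network parameters b | Maximum weight value wmax | 5 |
| Network parameters b | Integration step Dt | 0.2 |
| | Fitness optimum value opt | 2 |
| | Maximum number of iterations Imax | 1000 |
| DE parameters u | Population number | 20 |
| DE parameters u | Parameter space size | 12 |
| DE parameters u | Step size | 0.85 |
| DE parameters u | Crossover probability | 0.8 |
| DE parameters u | Strategy | DE/BEST/2/BIN |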
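To make the DE settings of Table 12 concrete, the following is a minimal sketch of the standard DE/best/2/bin scheme: the mutant is built around the current best individual plus two scaled difference vectors, followed by binomial crossover and greedy selection. It is not the authors' implementation. The population size, parameter space size, step size, crossover probability and iteration limit are taken from the table; reading "step size" as the differential weight, the symmetric bounds [-5, 5], the toy `sphere` objective (standing in for the network fitness), the `target` threshold and all function names are assumptions made for illustration.

```python
import random


def de_best_2_bin(fitness, dim=12, pop_size=20, step=0.85, cr=0.8,
                  w_min=-5.0, w_max=5.0, max_iter=1000, target=1e-8, seed=0):
    """Minimise `fitness` with a plain DE/best/2/bin loop (illustrative)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(w_min, w_max) for _ in range(dim)]
           for _ in range(pop_size)]
    cost = [fitness(x) for x in pop]
    history = [min(cost)]  # best cost at start and after each generation
    for _ in range(max_iter):
        if history[-1] <= target:
            break
        best = pop[cost.index(min(cost))]
        new_pop, new_cost = [], []
        for i in range(pop_size):
            # Four distinct individuals, all different from the target i.
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3, r4 = rng.sample(others, 4)
            forced = rng.randrange(dim)  # at least one gene from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    # "best/2": best individual plus two difference vectors.
                    v = best[d] + step * (pop[r1][d] - pop[r2][d]
                                          + pop[r3][d] - pop[r4][d])
                    trial.append(min(max(v, w_min), w_max))
                else:
                    trial.append(pop[i][d])  # "bin": keep the parent gene
            trial_cost = fitness(trial)
            # Greedy selection: the trial replaces the parent if not worse.
            if trial_cost <= cost[i]:
                new_pop.append(trial)
                new_cost.append(trial_cost)
            else:
                new_pop.append(pop[i])
                new_cost.append(cost[i])
        pop, cost = new_pop, new_cost
        history.append(min(cost))
    k = cost.index(min(cost))
    return pop[k], cost[k], history


def sphere(x):
    """Toy objective used only for this sketch."""
    return sum(v * v for v in x)


best_x, best_cost, history = de_best_2_bin(sphere)
print(f"generations run: {len(history) - 1}, best cost: {best_cost:.3e}")
```

Because selection is greedy, the best cost recorded in `history` never increases from one generation to the next, which is a convenient sanity check when swapping in a different fitness function.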
Table 13. Results of the HPNNA control in T-mazes in tests with three different sizes and with different relative errors on the programs. For each test the success rate is reported.

| Maze type | Success rate |
|---|---|
| l = 3 | 100% |
| l = 4 | 99% |
| l = 2 | 94% |
| l = 3, 10% error | 98% |
| l = 3, 15% error | 77% |
| l = 3, 20% error | 68% |
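One way to read the "relative error on the programs" in Table 13 is as a bounded multiplicative perturbation of each program value. The sketch below illustrates that reading only: the uniform noise model, the example program vector and the helper names `perturb_program` and `success_rate` are assumptions, and the table does not specify how the errors were actually generated.

```python
import random


def perturb_program(program, rel_error, rng):
    """Scale each program value by a factor in [1 - rel_error, 1 + rel_error]."""
    return [p * (1.0 + rng.uniform(-rel_error, rel_error)) for p in program]


def success_rate(outcomes):
    """Percentage of successful trials in a list of booleans."""
    return 100.0 * sum(outcomes) / len(outcomes)


rng = random.Random(0)
program = [0.8, -1.5, 2.0, 0.3]  # hypothetical program values
noisy = perturb_program(program, rel_error=0.10, rng=rng)

# Each perturbed value stays within 10% of the original one.
for original, perturbed in zip(program, noisy):
    print(f"{original:+.3f} -> {perturbed:+.3f}")

# Success rate over a hypothetical batch of trial outcomes.
print(success_rate([True, True, False, True]))
```

Under this reading, a robustness test would repeat the maze task with a freshly perturbed program on each trial and report `success_rate` over the outcomes.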
However, other (probably lower) levels could also carry information that is essential for the “current level”, so this choice could be somewhat limiting. On the other hand, including all the levels as possible input connections would increase the computational cost, especially for “deep” hierarchies. One line of research would be to make this input choice part of learning, letting the system decide which input (among those coming from the available levels) maximizes primitive learning and adapt its connections accordingly.
Another open issue is how and where to store the program values in a neural system so that they are available when needed. This might be addressed with some reverberant scheme, which in the end will probably require appealing again to synaptic plasticity in ancillary networks. Finally, two lines for future research are assessing the biological plausibility of the proposed model and advancing more detailed proposals (at the neuronal level) of its mechanisms for learning and recalling the programs.
Acknowledgment
The authors would like to thank Giuseppe Trautteur for the inspiring discussions and comments that greatly contributed to improving the paper.
Funding
The present research is funded by the Human Frontier Science Program (HFSP), award number RGY0088/2014, and by the EU’s FP7 under grant agreement no. FP7-ICT-270108 (Goal-Leaders). The GEFORCE Titan used for this research was donated by the NVIDIA Corporation.
References
Agmon, E., & Beer, R. D. (2013). The evolution and analysis
of action switching in embodied agents. Adaptive Behavior,
22(1), 3–20.
Anderson, M. L. (2010). Neural re-use as a fundamental orga-
nizational principle of the brain. Behavioral and Brain
Sciences,33(04), 245–266.
Arau´ jo, D., Diniz, A., Passos, P., & Davids, K. (2014). Deci-
sion making in social neurobiological systems modeled as
transitions in dynamic pattern formation. Adaptive Beha-
vior,22(1), 21–30.
Bakker, B., & Schmidhuber, J. (2004). Hierarchical reinforce-
ment learning based on subgoal discovery and subpolicy
specialization. In: F. Groen, N. Amato, A. Bonarini,
E. Yoshida, & B. Krose (Eds.), Proceedings of the 8th Con-
ference on Intelligent Autonomous Systems, IAS-8 (pp.
438–445). Amsterdam, The Netherlands.
Bargmann, C. I. (2012). Beyond the connectome: How neuro-
modulators shape neural circuits. Bioessays,34, 485–65.
Beer, R. D. (1995). On the dynamics of small continuous-time
recurrent neural networks. Adaptive Behavior,3(4),
469–509.
Candidi, M., Curioni, A., Donnarumma, F., Sacheli, L. M.,
& Pezzulo, G. (2015). Interactional leader–follower
sensorimotor communication strategies during repetitive
joint actions. Journal of The Royal Society Interface,
12(110), 453–467.
Chersi, F., Donnarumma, F., & Pezzulo, G. (2013). Mental
imagery in the navigation domain: A computational model
of sensory-motor simulation mechanisms. Adaptive Beha-
vior,21(4), 251–262.
d’Avella, A., Portone, A., Fernandez, L., & Lacquaniti, F.
(2006). Control of fast-reaching movements by muscle
synergy combinations. The Journal of Neuroscience,
26(30), 7791–7810.
De Falco, I., Della Cioppa, A., Donnarumma, F., Maisto, D.,
Prevete, R., & Tarantino, E. (2008). CTRNN parameter
learning using differential evolution. In: M. Ghallab, C.
D. Spyropoulos, N. Fakotakis, & N. Avouris (Eds.), ECAI
2008, 18th European Conference on Artificial Intelligence
(pp. 783–784). Patras, Greece: IOS Press.
Dehaene, S. (2005). Evolution of human cortical circuits for
reading an arithmetic: The ‘‘Neuronal Recycling’’ hypoth-
esis. From Monkey Brain to Human Brain: A Fyssen Foun-
dation Symposium (pp. 133–157). Bradford, USA: MIT
Press.
Dindo, H., Donnarumma, F., Chersi, F., & Pezzulo, G.
(2015). The intentional stance as structure learning: A
computational perspective on mindreading. Biological
Cybernetics,109(4), 453–467.
Donnarumma, F., Murano, A., & Prevete, R. (2015a).
Dynamic network functional comparison via approxi-
mate-bisimulation. Control & Cybernetics,44(1), 99–127.
Donnarumma, F., Prevete, R., Chersi, F., & Pezzulo, G.
(2015b). A programmer–interpreter neural network archi-
tecture for prefrontal cognitive control. International Jour-
nal of Neural Systems,25(6), 1550017 (16 pages).
Donnarumma, F., Prevete, R., & Trautteur, G. (2010). How
and over what timescales does neural reuse actually occur?
Commentary on ‘‘Neural re-use as a fundamental organi-
zational principle of the brain’’, by Michael L Anderson.
Behavioral and Brain Sciences,33(04), 272–273.
Donnarumma, F., Prevete, R., & Trautteur, G. (2012). Pro-
gramming in the brain: A neural network theoretical
framework. Connection Science,24(2–3), 71–90.
Eliasmith, C. (2005). A unified approach to building and con-
trolling spiking attractor networks. Neural Computation,
17(6), 1276–1314.
Eliasmith, C., & Anderson, C. H. (2004). Neural engineering:
Computation, representation, and dynamics in neurobiologi-
cal systems. Cambridge, MA: The MIT Press.
Flash, T., & Hochner, B. (2005). Motor primitives in verte-
brates and invertebrates. Current Opinion in Neurobiology,
15(6), 660–666.
Floreano, D., & Mondada, F. (1994). Automatic creation of
an autonomous agent: Genetic evolution of a neural-
network driven robot. In: Proceedings of the Conference on
Simulation of Adaptive Behavior (pp.421–430). Cambridge,
MA: MIT Press.
Fogassi, L., Ferrari, P., Chersi, F., Gesierich, B., Rozzi, S., &
Rizzolatti, G. (2005). Parietal lobe: From action organiza-
tion to intention understanding. Science,308, 662–667.
Friston, K. (2003). Learning and inference in the brain.
Neural Networks,16(9), 1325–1352.
Gerkey, B., Vaughan, R., & Howard, A. (2003). The player/
stage project: Tools for multi-robot and distributed sensor
Donnarumma et al. 23
systems. In: International Conference on Advanced Robotics
(ICAR) (pp. 317–323). Coimbra, Portugal: IEEE Press.
Graziano, M. (2006). The organization of behavioral reper-
toire in motor cortex. Annual Review of Neuroscience,29,
105–134.
Hamilton, A. F. d. C., & Grafton, S. T.2007. The motor hier-
archy: From kinematics to goals and intentions. In:
P. Haggard, Y. Rossetti, & M. Kawato (Eds.), Sensorimo-
tor foundations of higher cognition (pp. 381–408 ). Oxford:
Oxford University Press.
Haruno, M., Wolpert, D., & Kawato, M. (2003). Hierarchical
MOSAIC for movement generation. In: T. Ono,
G. Matsumoto, R. Llinas, A. Berthoz, H. Norgren, &
R. Tamura (Eds.), Excepta Medica International Coun-
gress Series (pp. 575–590). Amsterdam, The Netherlands:
Elsevier Science.
Hioki, T., Miyazaki, Y., & Nishii, J. (2013). Hierarchical con-
trol by a higher center and the rhythm generator contri-
butes to realize adaptive locomotion. Adaptive Behavior,
21(2), 86–95.
Hopfield, J. J., & Tank, D. W. (1986). Computing with neural
circuits: A model. Science,233, 625–633.
Igari, I., & Tani, J. (2009). Incremental learning of sequence
patterns with a modular network model. Neurocomputing,
72(7–9), 1910–1919.
Kelly, J. P. (1991). The neural basis of perception and move-
ment. Principles of Neural Science (3rd ed., pp. 283–295).
New York, NY: Elsevier.
Koechlin, E., & Summerfield, C. (2007). An information theo-
retical approach to prefrontal executive function. Trends in
Cognitive Science,11(6), 229–235.
Maisto, D., Donnarumma, F., & Pezzulo, G. (2015). Divide
et impera: Subgoaling reduces the complexity of probabil-
istic inference and problem solving. Journal of The Royal
Society Interface,12(104), 20141335.
Mcgovern, A., & Barto, A. G. (2001, June 18–22). Accelerat-
ing reinforcement learning through the discovery of useful
subgoals. In: Proceedings of the 6th International Sympo-
sium on Artificial Intelligence, Robotics, and Automation in
Space: i-SAIRAS, Canadian Space Agency (pp. 13–18).
Montreal, Canada: Electronically Published.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of
prefrontal cortex function. Annual Review on Neuroscience,
24, 167–202.
Montone, G., Donnarumma, F., & Prevete, R. (2011). A
robotic scenario for programmable fixed-weight neural
networks exhibiting multiple behaviors. In: Adaptive and
Natural Computing Algorithms (pp. 250–259). Heidelberg,
Germany: Springer Berlin.
Mussa-Ivaldi, F. A., & Bizzi, E. (2000). Motor learning
through the combination of primitives. Philosophical
Transactions of the Royal Society of London. Series B: Bio-
logical Sciences,355(1404), 1755–1769.
Paine, R. W., & Tani, J. (2004). Motor primitive and sequence
self-organization in a hierarchical recurrent neural net-
work. Neural Networks,17(8–9), 1291–1309.
Paine, R. W., & Tani, J. (2005). How hierarchical control self-
organizes in artificial adaptive systems. Adaptive Behavior,
13(3), 211–225.
Park, H.-J., & Friston, K. (2013). Structural and functional
brain networks: From connections to cognition. Science,
342(6158), 1238411.
Pezzulo, G., Donnarumma, F., & Dindo, H. (2013). Human
sensorimotor communication: A theory of signaling in
online social interactions. PLoS ONE,8(11), e79876.
Pezzulo, G., Donnarumma, F., Iodice, P., Prevete, R., &
Dindo, H. (2015). The role of synergies within generative
models of action execution and recognition: A computa-
tional perspective: Comment on grasping synergies: A
motor-control approach to the mirror neuron mechanism
by A D’Ausilio et al. Physics of Life Reviews,12, 114–117.
Price, K. V., Storn, R. M., & Lampinen, J. A. (2005). Differ-
ential evolution: A practical approach to global optimization.
Natural Computing Series. Springer-Verlag.
Rizzolatti, G., Camarda, R., Fogassi, L., Gentilucci, M.,
Luppino, G., & Matelli, M. (1988). Functional organiza-
tion of inferior area 6 in the macaque monkey. II. Area F5
and the control of distal movements. Experimental brain
research,71(3), 491–507.
Tani, J. (2003). Learning to generate articulated behavior
through the bottom-up and the top-down interaction pro-
cesses. Neural Networks,16(1), 11–23.
Tani, J., Ito, M., & Sugita, Y. (2004). Self-organization of dis-
tributedly represented multiple behavior schemata in a
mirror system: Reviews of robot experiments using
RNNPB. Neural Networks,18(1), 103–104.
Tani, J., Nishimoto, R., & Paine, R. W. (2008). Achieving
‘‘organic compositionality’’ through self-organization:
Reviews on brain-inspired robotics experiments. Neural
Networks,21(4), 584–603.
Tani, J., & Nolfi, S. (1999). Learning to perceive the world as
articulated: An approach for hierarchical learning in
sensory-motor systems. Neural Networks,12(7), 1131–1141.
Thoroughman, K. A., & Shadmehr, R. (2000). Learning of
action through adaptive combination of motor primitives.
Nature,407(6805), 742–747.
Trautteur, G., & Tamburrini, G. (2007). A note on discrete-
ness and virtuality in analog computing. Theoretical Com-
puter Science,371, 106–114.
Umedachi, T., Ito, K., & Ishiguro, A. (2015). Soft-bodied
amoeba-inspired robot that switches between qualitatively
different behaviors with decentralized stiffness control.
Adaptive Behavior,3(2), 97–108.
Woodman, M., Perdikis, D., Pillai, A. S., Dodel, S., Huys,
R., Bressler, S., & Jirsa, V. (2011). Building neurocognitive
networks with a distributed functional architecture.
Advances in Experimental Medicine and Biology, 718, 101
109. doi:10.1007/978-1-4614-0164-3_9
Yamauchi, B. M., & Beer, R. D. (1994). Sequential behavior
and learning in evolved dynamical neural networks. Adap-
tive Behavior,2(3), 219–246.
24 Adaptive Behavior
About the Authors
Francesco Donnarumma (MSc in physics, PhD in computer and information science) has been a research fellow at ISTC-CNR since 2011. His research focuses on computational modelling of cognitive brain functions through the development of biologically inspired models, investigating social interactions and studying multi-purpose interpreter architectures based on dynamical neural networks.
Roberto Prevete (MSc in physics, PhD in information science) is an Assistant Professor of Computer Science at the Dept. of Electrical Engineering and Information Technologies (DIETI), University of Naples Federico II, Italy. He is Director of the laboratory for Computational Vision and Neural Networks (ViNe) at DIETI. His current research interests include computational models of brain mechanisms, machine learning and artificial neural networks and their applications.
Andrea de Giorgio is currently working on his final thesis in machine learning as a master’s student at KTH, Royal Institute of Technology in Stockholm, Sweden. In 2013 he received his bachelor’s degree in electronic engineering from the University of Naples Federico II, Italy. His research interests focus on deep learning and modeling of brain functions.
Guglielmo Montone (MSc in physics, PhD in computer and information science) is a postdoctoral researcher at LPP, Université Paris Descartes. His research focuses on artificial neural networks and their applications in artificial intelligence. His current research is about the development of the concept of space in simple and biologically plausible agents.
Giovanni Pezzulo (MSc and PhD in cognitive psychology) is a researcher at the Institute of
Cognitive Sciences and Technologies, National Research Council, Rome, Italy. His main research
interests are prediction, goal-directed behaviour, internally generated neuronal activity and joint
action in living organisms and robots. His current research interests are focused on the realization
of biologically realistic cognitive models for decision making and planning.
Donnarumma et al. 25
... Further, when dealing with time series, successful dynamic approaches unfolding the depth of the network through time were proposed, like Long-Short Memory Networks (LTSM) and variants. [51][52][53][54][55] In this paper, the proposed modeling choice of a Multi-layered NARX network lies at the edge between a static and a dynamic deep network approach. In general, successful NARX based approaches have been extensively studied (see, e.g. ...
Article
Full-text available
A full-fledged neural network modeling, based on a Multi-layered Nonlinear Autoregressive Exogenous Neural Network (NARX) architecture, is proposed for quasi-static and dynamic hysteresis loops, one of the most challenging topics for computational magnetism. This modeling approach overcomes drawbacks in attaining better than percent-level accuracy of classical and recent approaches for accelerator magnets, that combine hybridization of standard hysteretic models and neural network architectures. By means of an incremental procedure, different Deep Neural Network Architectures are selected, fine-tuned and tested in order to predict magnetic hysteresis in the context of electromagnets. Tests and results show that the proposed NARX architecture best fits the measured magnetic field behavior of a reference quadrupole at CERN. In particular, the proposed modeling framework leads to a percent error below 0.02% for the magnetic field prediction, thus outperforming state of the art approaches and paving a very promising way for future real time applications.
... Thus, the degree of neuromorphism present in photonic systems is only useful insofar as it helps with computing tasks. The hardware's adherence to neural models unlocks a wealth of metrics [71], algorithms [72,73], tools [74,75], and benchmarks [76] developed specifically for neural networks. A diversity of research in neuromorphic photonics are covered in the textbook "Neuromorphic Photonics" [77], which includes results and discussion on: applications of neuromorphic photonics (Chp. ...
Article
Full-text available
Microelectronic computers have encountered challenges in meeting all of today’s demands for information processing. Meeting these demands will require the development of unconventional computers employing alternative processing models and new device physics. Neural network models have come to dominate modern machine learning algorithms, and specialized electronic hardware has been developed to implement them more efficiently. A silicon photonic integration industry promises to bring manufacturing ecosystems normally reserved for microelectronics to photonics. Photonic devices have already found simple analog signal processing niches where electronics cannot provide sufficient bandwidth and reconfigurability. In order to solve more complex information processing problems, they will have to adopt a processing model that generalizes and scales. Neuromorphic photonics aims to map physical models of optoelectronic systems to abstract models of neural networks. It represents a new opportunity for machine information processing on sub-nanosecond timescales, with application to mathematical programming, intelligent radio frequency signal processing, and real-time control. The strategy of neuromorphic engineering is to externalize the risk of developing computational theory alongside hardware. The strategy of remaining compatible with silicon photonics externalizes the risk of platform development. In this perspective article, we provide a rationale for a neuromorphic photonics processor, envisioning its architecture and a compiler. We also discuss how it can be interfaced with a general purpose computer, i.e. a CPU, as a coprocessor to target specific applications. This paper is intended for a wide audience and provides a roadmap for expanding research in the direction of transforming neuromorphic photonics into a viable and useful candidate for accelerating neuromorphic computing.
... Other examples include Eliasmith, [37], who shows how attractor networks can be used as subsystems in larger neural systems that reiterates concepts that use modularity, and Donnarumma et. al, who presented a neural network hierarchical architecture where multiple modules used the notion of multi-task learning to model complex behavior [38]. The method was motivated by theories of brain functioning in which skilled behaviors can be generated by combining functional different motor primitives [39]. ...
Article
Full-text available
Due to modular knowledge representation in biological neural systems, the absence of certain sensory inputs does not hinder decision-making processes. For instance, damage to an eye does not result in loss of one's entire vision. In our earlier work, we presented coevolutionary multi-task learning that featured a synergy between multi-task learning and coevolutionary algorithms. In this paper, we extend this method for robust decision making in pattern classification problems given incomplete information. The method trains a cascaded neural network architecture to autonomously address the absence of certain input features and disruptions to neural connections. The results show that the method is comparable to conventional learning methods whilst having the advantage decision making given incomplete information. Moreover, the method provides a way for developmental learning and simultaneously quantifies feature contribution.
Chapter
An understanding of the functional repertoire of neural circuits and their plasticity requires knowledge of neural connectivity diagrams and their dynamical evolution. However, one must additionally take into account the fast and reversible functional effects induced by neuromodulatory mechanisms which do not alter neural circuit diagrams. Neuromodulators contribute crucially to determine the performativity of a neural circuit, that is, its ability to change behavior, and especially behavioral changes occurring under temporal constraints that are incompatible with the longer time scales of Hebbian learning and other forms of neural learning. This paper focuses on two properties of neuromodulatory action that have been relatively neglected so far. These properties are the functional soundness of neuromodulated circuits and the robustness of neuromodulatory action. Both properties are analyzed here as sources of functional specifications for the computational modeling of neural circuit performativity. In particular, taking dynamical systems that are based on CTRNNs (Continuous Time Recurrent Neural Networks) as an exemplary class of computational models, it is argued that robustness is suitably modeled there by means of a hysteresis process, and functional soundness by means of a multiplicity of stable fixed points.
Article
Full-text available
We present a framework for robotic cognitive control endowed with adaptive mechanisms for attentional regulation and task execution. In cognitive psychology, cognitive control is the process that orchestrates executive and cognitive processes supporting adaptive responses and complex goal-directed behaviors. Similar mechanisms can be deployed in robotic systems in order to flexibly execute complex structured tasks. In this work, following a supervisory attentional system paradigm, we propose an approach that permits to learn how to exploit top-down and bottom-up attentional regulations to guide the execution of hierarchically structured tasks. We present the overall framework discussing its functioning in a mobile robot case study considering pick-carry-place tasks. In this setting, we show that the proposed system can be on-line trained by a user in order to execute incrementally complex activities.
Article
Full-text available
Understanding and defining the meaning of “action” is substantial for robotics research. This becomes utterly evident when aiming at equipping autonomous robots with robust manipulation skills for action execution. Unfortunately, to this day we still lack both a clear understanding of the concept of an action and a set of established criteria that ultimately characterize an action. In this survey, we thus first review existing ideas and theories on the notion and meaning of action. Subsequently, we discuss the role of action in robotics and attempt to give a seminal definition of action in accordance with its use in robotics research. Given this definition we then introduce a taxonomy for categorizing action representations in robotics along various dimensions. Finally, we provide a meticulous literature survey on action representations in robotics where we categorize relevant literature along our taxonomy. After discussing the current state of the art we conclude with an outlook towards promising research directions.
Article
Full-text available
E' sempre più evidente che una comprensione sia del repertorio funzionale sia della plasticità dei comportamenti di un circuito neurale-qui indicate complessivamente come capacità performative del circuito-non possono basarsi solo su una conoscenza del diagramma di connettività neurale e delle sue variazioni. Le capacità performative di un circuito sono infatti alterate qualitativamente, in maniera rapida e reversibile da sostanze che agiscono nel fluido extracellulare, dette neuromodulatori. Questo articolo si concentra su due proprietà dell'azione neuromodulatoria che sono state poco tematizzate nell'ambito dei modelli computazionali della neuromodulazione. Si tratta dell'idoneità funzionale dei circuiti soggetti a neuromodulazione e della robustezza dell'azione neuromodulatoria. Entrambe le proprietà sono qui esaminate come sorgenti di specifiche funzionali per modelli computazionali della neuromodulazione. In particolare, si evidenzia come in un modello computazionale basato sulle CTRNN (Continuous Time Recurrent Neural Networks) la robustezza si possa modellare mediante un processo di isteresi e l'idoneità funzionale mediante una molteplicità di punti fissi stabili del sistema dinamico.
Preprint
Full-text available
Understanding and defining the meaning of "action" is substantial for robotics research. This becomes utterly evident when aiming at equipping autonomous robots with robust manipulation skills for action execution. Unfortunately, to this day we still lack both a clear understanding of the concept of an action and a set of established criteria that ultimately characterize an action. In this survey we thus first review existing ideas and theories on the notion and meaning of action. Subsequently we discuss the role of action in robotics and attempt to give a seminal definition of action in accordance with its use in robotics research. Given this definition we then introduce a taxonomy for categorizing action representations in robotics along various dimensions. Finally, we provide a systematic literature survey on action representations in robotics where we categorize relevant literature along our taxonomy. After discussing the current state of the art we conclude with an outlook towards promising research directions.
Article
Full-text available
Converging evidence shows that hand-actions are controlled at the level of synergies and not single muscles. One intriguing aspect of synergy-based action-representation is that it may be intrinsically sparse and the same synergies can be shared across several distinct types of hand-actions. Here, adopting a normative angle, we consider three hypotheses for hand-action optimal-control: sparse-combination hypothesis (SC) - sparsity in the mapping between synergies and actions - i.e., actions implemented using a sparse combination of synergies; sparse-elements hypothesis (SE) - sparsity in synergy representation - i.e., the mapping between degrees-of-freedom (DoF) and synergies is sparse; double-sparsity hypothesis (DS) - a novel view combining both SC and SE - i.e., both the mapping between DoF and synergies and between synergies and actions are sparse, each action implementing a sparse combination of synergies (as in SC), each using a limited set of DoFs (as in SE). We evaluate these hypotheses using hand kinematic data from six human subjects performing nine different types of reach-to-grasp actions. Our results support DS, suggesting that the best action representation is based on a relatively large set of synergies, each involving a reduced number of degrees-of-freedom, and that distinct sets of synergies may be involved in distinct tasks.
Article
Full-text available
Non-verbal communication is the basis of animal interactions. In dyadic leader-follower interactions, leaders master the ability to carve their motor behaviour in order to 'signal' their future actions and internal plans while these signals influence the behaviour of follower partners, who automatically tend to imitate the leader even in complementary interactions. Despite their usefulness, signalling and imitation have a biomechanical cost, and it is unclear how this cost-benefits trade-off is managed during repetitive dyadic interactions that present learnable regularities. We studied signalling and imitation dynamics (indexed by movement kinematics) in pairs of leaders and followers during a repetitive, rule-based, joint action. Trial-by-trial Bayesian model comparison was used to evaluate the relation between signalling, imitation and pair performance. The different models incorporate different hypotheses concerning the factors (past interactions versus online movements) influencing the leader's signalling (or follower's imitation) kinematics. This approach showed that (i) leaders' signalling strategy improves future couple performance, (ii) leaders used the history of past interactions to shape their signalling, (iii) followers' imitative behaviour is more strongly affected by the online movement of the leader. This study elucidates the ways online sensorimotor communication help individuals align their task representations and ultimately improves joint action performance. © 2015 The Author(s).
Article
Full-text available
Recent theories of mindreading explain the recognition of action, intention, and belief of other agents in terms of generative architectures that model the causal relations between observables (e.g., observed movements) and their hidden causes (e.g., action goals and beliefs). Two kinds of probabilistic generative schemes have been proposed in cognitive science and robotics that link to a "theory theory" and "simulation theory" of mindreading, respectively. The former compares perceived actions to optimal plans derived from rationality principles and conceptual theories of others' minds. The latter reuses one's own internal (inverse and forward) models for action execution to perform a look-ahead mental simulation of perceived actions. Both theories, however, leave one question unanswered: how are the generative models - including task structure and parameters - learned in the first place? We start from Dennett's "intentional stance" proposal and characterize it within generative theories of action and intention recognition. We propose that humans use an intentional stance as a learning bias that sidesteps the (hard) structure learning problem and bootstraps the acquisition of generative models for others' actions. The intentional stance corresponds to a candidate structure in the generative scheme, which encodes a simplified belief-desire folk psychology and a hierarchical intention-to-action organization of behavior. This simple structure can be used as a proxy for the "true" generative structure of others' actions and intentions and is continuously grown and refined - via state and parameter learning - during interactions. In turn - as our computational simulations show - this can help solve mindreading problems and bootstrap the acquisition of useful causal models of both one's own and others' goal-directed actions.
Article
Full-text available
It is generally unknown how to formally determine whether different neural networks have a similar behaviour. This question intimately relates to the problem of finding a suitable similarity measure to identify bounds on the input-output response distances of neural networks, which has several interesting theoretical and computational implications. For example, it can allow one to speed up the learning processes by restricting the network parameter space, or to test the robustness of a network with respect to parameter variation. In this paper we develop a procedure that allows to compare neural structures among them. In particular, we consider dynamic networks composed of neural units characterised by non-linear differential equations, described in terms of autonomous continuous dynamic systems. The comparison is established by importing and adapting from the formal verification setting the concept of δ−approximate bisimulations techniques for non-linear systems. We have positively tested the proposed approach over continuous time recurrent neural networks (CTRNNs).
Article
Full-text available
There is wide consensus that the prefrontal cortex (PFC) is able to exert cognitive control on behavior by biasing processing toward task-relevant information and by modulating response selection. This idea is typically framed in terms of top-down influences within a cortical control hierarchy, where prefrontal-basal ganglia loops gate multiple input–output channels, which in turn can activate or sequence motor primitives expressed in (pre-)motor cortices. Here we advance a new hypothesis, based on the notion of programmability and an interpreter–programmer computational scheme, on how the PFC can flexibly bias the selection of sensorimotor patterns depending on internal goal and task contexts. In this approach, multiple elementary behaviors representing motor primitives are expressed by a single multi-purpose neural network, which is seen as a reusable area of "recycled" neurons (interpreter). The PFC thus acts as a "programmer" that, without modifying the network connectivity, feeds the interpreter networks with specific input parameters encoding the programs (corresponding to network structures) to be interpreted by the (pre-)motor areas. Our architecture is validated in a standard test for executive function: the 1-2-AX task. Our results show that this computational framework provides a robust, scalable and flexible scheme that can be iterated at different hierarchical layers, supporting the realization of multiple goals. We discuss the plausibility of the "programmer–interpreter" scheme to explain the functioning of prefrontal-(pre)motor cortical hierarchies.
The goal of this research is to understand the underlying mechanism of the behavioral diversity of animals and then use the findings to build truly adaptive robots. Behavioral diversity is an inherent feature of all animals, and it is also important for robots to perform adaptively in unknown and dynamically changing environments. This feature enables animals to select adaptive behavior from among versatile behaviors. However, most designers have avoided or ignored behavioral diversity while constructing artificial systems, with the aim of achieving highly optimized performance in specific environments for given tasks; this leads to vulnerability of these systems to environmental changes. To understand how behavioral diversity can be embedded into artificial systems, we focus on a large amoeba-like unicellular organism, i.e., the plasmodium of true slime mold (Physarum polycephalum), in this study. Despite the absence of a central nervous system, the plasmodium exhibits various types of locomotion (i.e., exploratory, taxis, and escape behaviors) and switches its behavior depending on the environment. Inspired by this primitive yet intelligent living organism, we build a modular robot that exhibits exploratory and taxis locomotions, and spontaneously switches between them in a fully decentralized manner according to the situation encountered. The results are expected to shed new light on a design scheme for lifelike robots that exhibit amazingly versatile and adaptive behaviors.
It has long been recognized that humans (and possibly other animals) usually break problems down into smaller and more manageable problems using subgoals. Despite a general consensus that subgoaling helps problem solving, it is still unclear what mechanisms guide online subgoal selection during the solution of novel problems for which predefined solutions are not available. Under which conditions does subgoaling lead to optimal behaviour? When is subgoaling better than solving a problem from start to finish? What is the best number and sequence of subgoals to solve a given problem? How are these subgoals selected during online inference? Here, we present a computational account of subgoaling in problem solving. Following Occam's razor, we propose that good subgoals are those that permit planning solutions and controlling behaviour using fewer informational resources, thus yielding parsimony in inference and control. We implement this principle using approximate probabilistic inference: subgoals are selected using a sampling method that considers the descriptive complexity of the resulting sub-problems. We validate the proposed method using a standard reinforcement learning benchmark (four-rooms scenario) and show that it requires fewer inferential steps and permits selecting more compact control programs compared to an equivalent procedure without subgoaling. Furthermore, we show that the proposed method offers a mechanistic explanation of the neuronal dynamics found in the prefrontal cortex of monkeys that solve planning problems. Our computational framework provides a novel integrative perspective on subgoaling and its adaptive advantages for planning, control and learning, such as lowering cognitive effort and working memory load.
The dynamical systems approach in the cognitive and behavioral sciences studies how systems made of many coupled components across brain, body, and environment self-organize to generate behavior. This approach has mostly focused on models of single actions and has not addressed how a dynamical system can engage in multiple different directed actions. In this paper, we introduce a family of artificial life models that demonstrate how dynamical agents can engage in multiple different actions and autonomously switch between them. These agents engage in a food foraging task and are driven by both internal, metabolic variables and external, sensory variables. The analysis of one of these agents demonstrates how different actions can arise through transient modes of sensorimotor coordination, in which a subset of the available sensors and effectors become engaged while others are ignored. Transitions between actions are analyzed and shown to correspond to rapid movements through the agent's state space. In these transitions, some of the previously controlling sensors and effectors disengage, and new sets of sensors and effectors are engaged.