Social Action in Socially Situated Agents
Chloe M. Barnes, Anikó Ekárt and Peter R. Lewis
Aston Lab for Intelligent Collectives Engineering (ALICE)
Aston University, Aston Triangle, Birmingham B4 7ET, UK
Email: {barnecm1 | a.ekart | p.lewis}@aston.ac.uk
Abstract—Two systems pursuing their own goals in a shared
world can interact in ways that are not so explicit, such that
the presence of another system alone can interfere with how
one is able to achieve its own goals. Drawing inspiration from
human psychology and the theory of social action, we propose
the notion of employing social action in socially situated agents
as a means of alleviating interference in interacting systems.
Here we demonstrate that these specific issues of behavioural and
evolutionary instability caused by the unintended consequences
of interactions can be addressed with agents capable of a fusion
of goal-rationality and traditional action, resulting in a stable
society capable of achieving goals during the course of evolution.
I. INTRODUCTION
Socio-technical systems are growing increasingly complex,
comprising many interacting components that pursue their
own, often conflicting, goals [1]. Vehicular networks [2],
smart energy grids [3], and trading agents [4] are examples
of complex socio-technical systems, which face uncertainties
caused by the interaction and co-evolution of many different
components. These systems operate in dynamic and uncertain
environments where interactions are potentially unanticipated,
requiring learning and reasoning on the fly; one approach to
tackling uncertainty is designing systems with self-awareness,
to continuously react to dynamicity in both their environment
and interactions. Socio-technical systems are currently unable
to perceive and reason about the self-awareness capabilities of
other systems [5], or the impact other systems have on their
own ability for learning, evolution and expression [6]; they
are also unaware of their own impact on the world around
them. This can have catastrophic and unpredictable effects. For
example, a US$1 trillion stock market crash occurred in just
36 minutes in 2010, caused at least in part by the unforeseen
interactions of several automated trading agents [7]. These
unintended consequences of interaction and the subsequent
learning that takes place can have a large and diverse impact
on the world. Therefore, socio-technical systems that are aware
of their place in their own society and are able to judge
their impact on - and the impact they receive from - their
world, would require social self-awareness [8] or networked
interaction-awareness [5]. Social action [6] is expressing and
acting upon this knowledge.
We explore the impact that simply introducing two systems
into a shared world has on their own abilities to learn and
achieve goals, compared to existing in an individual world. It
is logical to act in ways that maximise the chance of meeting
goals, thus resulting in self-interested goal-rational behaviour.
When pursuing individual goals in a shared environment, for
example in a social dilemma or socio-technical system, goal-
rationality may not be optimal in the long-term. In other words,
self-interested agents may succeed individually but will be
worse off if all agents are self-interested, such that ‘individ-
ual rationality leads to collective irrationality’ [9]. Explicit
interactions can be designed for when integrating systems,
however, not understanding the unintended interactions with
others in a shared space can lead to worse performance [10].
Humans express more complex and diverse behaviours than
pure goal-rational action, as a result of social intelligence [11];
possessing the abilities to mimic or coordinate [12] (a form
of social learning [13]), and perceiving one’s similarities and
differences to others have enabled humans to maintain and
extend the capacity for self-awareness during evolution [11].
Perceiving and reacting appropriately to social situations is
therefore beneficial for both advancing oneself and acting
cohesively when interacting with others. In human societies, it
is not possible to know of all people or how to act best in any
given situation; there is no global knowledge or central point of
control to manage society. How then, is it possible for humans
to interact effectively, and how can we take inspiration from
this? Organic Computing approaches this by observing and
controlling a group of interacting, self-organising entities [14],
but employing a microsociological approach may create agent
societies capable of socially intelligent action at the individual
level. The theory of social action describes the motivations
for social actions in humans, which hold meaning [15], with
goal-rationality being one example. We take inspiration from
social action theory to mitigate the consequences of unintended
interactions between agents in a shared space. We developed the
River Crossing Dilemma (RCD) to explore arbitrarily complex
problems, and the impact that the unintended interactions
arising from coexistence have on learning.
II. RELATED WORK
Social behaviour is also observed in non-human species,
where interactions enable individuals to achieve goals in a
shared space; interacting with others means the group can
achieve more than individuals alone. Self-organisation and
cooperation in ants and bees has inspired the development of
optimisation algorithms [16], [17]. Swarm robotics is inspired
by the cooperation in groups of social animals to achieve
goals or complete tasks [18]. In swarm robotics systems,
individual robots interact and cooperate with others, where
the swarm can solve more complex tasks than the single
robots [19]. Pursuing self-interested action in a social setting,
however, can lead to collective irrationality [9]. Nevertheless,
sociality through self-organising institutions can enable groups
of self-interested individuals to govern themselves, supporting
sustainable management of common pool resources [20].
Sociality is described as the cooperation and organisation of
two or more agents in a shared world [21]. These agents are
goal-oriented, and are social because their actions positively
or negatively interfere with one another in terms of achieving
goals. Systems that intentionally cooperate, coordinate, or act
socially, require social awareness [6], where agents have an
understanding of others. The evolution of cooperation has been
extensively explored in the literature [22], [23], with social
dilemmas such as the Prisoner’s Dilemma or the Snowdrift
Game used to explore social dynamics and strategies in social
situations [9], [24]. Although this study explores agents that
evolve, and may evolve to cooperate, in this paper, promoting
cooperation is not the focus. Rather, our questions concern the
impact of coexistence on whether agents continue to achieve
their own goals in a shared environment, and the stability of
this outcome. We also explore the notion of traditional social
action and its effect on behaviour in socially situated agents.
III. INSPIRATION FROM THE THEORY OF SOCIAL ACTION
Unintended interactions can occur when existing in a shared
environment; consequently, achieving shared and individual
goals can be more difficult due to interference. Humans
overcome these issues by acting socially and not purely
individualistically, when situated in a social environment.
When discussing interactions in systems, it therefore seems
logical to draw parallels with the exploration of human social
phenomena in sociology. Many computer science researchers
have been inspired by theories of psychology, sociology and
cognitive science, such as in Organic Computing [14], self-
awareness [25], [26], and social dilemmas, altruism and agent
societies [27]–[29]. To explore social concepts in agents,
one must first adopt and define the relevant terminology.
Weber defines social actions as those oriented towards and
considering the behaviour of others [15], where an action
holds a ‘subjectively understandable’ meaning. Actions with
inanimate objects are thus not considered social as no other
actor is involved. Additionally, Sztompka clarifies common
terms in a hierarchy of social action [30]:
Social Behaviour holds no meaning: it is reactive.
Social Action holds meaning to the actor through a
rational decision to act in a certain way.
Social Interaction requires responses from both actors.
We explore social action in agents as defined by these terms,
as a prerequisite to social interaction.
A. Ideal Types of Social Action
Weber’s social action theory defines four ‘idealtypus’ [15] of
social action; these ideal types describe the motivations behind
social actions in a simplified model, to aid analysis of complex
human actions.
Instrumental-Rational Action: chosen for its effectiveness in
achieving a goal; related aspects such as other goals, possible
actions, and consequences are considered. Often called goal-
rational action, the means of satisfying the goal are chosen
specifically, and are justifiable. An example is planning actions
to reach long-term goals. Most agents, especially in machine
learning, are instrumental-rational by this definition.
Value-Rational Action: determined by values or beliefs one
holds, such that the action itself carries meaning rather than the
result of the action. Actions are rationalised by one’s ethical
or religious beliefs, or to any cause one may value. The more
highly-regarded the value, the less rational the action; the
consequences are considered less when choosing the action.
Rationality is a justifiable, conscious decision of how one acts.
Pure value-rational action disregards any personal cost, such
as a soldier sacrificing himself for another - the value held
outweighs the consequence of the action.
Affective Action: a reaction to an emotional state, such as
acting out of anger or passion. An ‘exceptional stimulus’ [15]
prompts an impulsive action. Affective action is inherently
irrational, as the consequences of the action may not be
considered and thus may be difficult to justify. An example
is striking someone out of rage.
Traditional Action: a habitual action or acting according to
a cultural custom. These actions can be described as mindless,
automatic or ritualistic, as ‘it has always been done this way.’
An example is eating with a knife and fork; it is not questioned
and is second nature, with no obligation to act this way.
B. Social Action in Computational Systems
Computational systems with actions determined by objective-
function-based search or error-function-based learning are
therefore goal-rational; they are engineered to maximise their
ability to achieve a particular goal. However, this paper shows
that there are unintended consequences associated with inter-
action between goal-rational agents that coexist in a shared
world, that can manifest as instability and a loss of ability
to achieve one’s own goals. These are due to interference on
each other through the environment, and a lack of knowledge
of the impact this has. In humans, evolution has favoured
social behaviour to deal with issues arising from living in the
presence of others [31]. Inspired by this, we explore if and
how computational systems may also begin to overcome these
issues by operationalising other ideal types of social action.
Social action can be operationalised in different ways.
Traditional action could simply be choosing the action that
most others are doing; this would form traditions over time.
Value-rational action will vary depending on the system and
the values held; values would be especially critical when
systems make decisions on behalf of humans. This poses
the question of how one can trust that the decisions made
align with the values held by the human. Affective action
could be taken when one does not know how to proceed
in an unencountered situation; as such, all actions may be
irrational unless abstracting previous knowledge. Affective
action may occasionally be beneficial if the system is able to
learn something new; however, as in humans, affective action
would be expected to not be the most effective. This work
explores the notion of traditional social action and its effect
on behaviour in socially situated agents.
IV. THE RIVER CROSSING DILEMMA:
A SHARED-ENVIRONMENT TESTBED FOR SOCIAL AGENTS
An appropriate testbed is needed to explore the unintended
consequence of interaction in a shared environment. Socio-
technical systems comprise many components that occupy a
shared space and pursue individual goals; the testbed should
represent this in a minimal way, that can be extended to
explore arbitrarily complex interactions. We have developed
a testbed that extends the River Crossing Task [32] - a
simple grid-world for problem-solving. We call this testbed
for exploring interaction the River Crossing Dilemma.
A. The River Crossing Task
The River Crossing Task (RCT) is a 2D grid-world problem
designed to explore how agents learn to solve complex tasks
with no a priori knowledge of the task or their environment. It
is therefore impossible for agents to predetermine a sequence
of actions - they must be capable of reacting to dynamic
elements and previously unseen environments on the fly [32].
An RCT instance comprises an n×n grid, with a river
formed from Water objects. An agent’s goal is to retrieve the
Resource from the opposite side of the river, which provides a
large positive fitness. Conversely, falling in the river provides
a large negative fitness as the agent drowns. This presents a
challenge: the river is impassable yet agents can only achieve
their goal by crossing it; they must therefore learn to achieve
sub-goals such as building a bridge with Stones to cross safely.
Existing approaches [32]–[34] to RCT agent design use a
two-layered neural network architecture capable of reacting
to dynamic environments (such as a change in environment
size or configuration) without requiring a priori knowledge,
using neuroevolution as a learning algorithm. The deliberative
layer is a fully-connected neural network that generates sub-
goals based on the agent’s current state. The reactive layer
uses these goals to create an activation landscape through
a topologically-organised n×n lattice of neurons; this is
used to hill-climb towards sub-goals. The method of activation
propagation is inspired by the shunting model proposed by
Yang and Meng [35], [36], characterised by a biologically-
inspired equation [37]. This calculates the activity of each
neuron based on its own activation and the activity surrounding
it (Equation 1). $A$ represents the passive decay rate; $x_i$ is the
current neuron; $I_i$ is the Iota value of the neuron, corresponding
to the sub-goals from the deliberative layer (for a sub-goal of 1,
$I = 15$; for a sub-goal of $-1$, $I = -15$; and $I = 0$ otherwise);
$w_{ij}$ is the weight between neurons $x_i$ and $x_j$, where $x_j$ is one
of the surrounding cells in $x_i$'s Moore neighbourhood (where
$k = 8$). This is calculated for each neuron at each time-step
to create a dynamic activation landscape.

$$\frac{dx_i}{dt} = -A x_i + I_i + \sum_{j=1}^{k} w_{ij}\,[x_j]^{+} \qquad (1)$$
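To make the propagation concrete, the sketch below iterates this shunting update over a lattice with a simple forward-Euler step. This is a minimal reading of Equation 1 rather than the authors' implementation; the decay rate, uniform weights, step size and iteration count are illustrative assumptions.

```python
# Minimal sketch of the reactive layer's activation propagation (Equation 1).
# Assumptions: forward-Euler integration with uniform weights w_ij = 1,
# decay rate A = 10 and step size dt = 0.05 are illustrative choices.
import numpy as np

def step_activation(x, iota, A=10.0, w=1.0, dt=0.05):
    """One Euler step of dx_i/dt = -A*x_i + I_i + sum_j w_ij * [x_j]^+ ."""
    n, m = x.shape
    dx = -A * x + iota
    pos = np.maximum(x, 0.0)  # [x_j]^+ : only positive activity propagates
    for i in range(n):
        for j in range(m):
            # Sum positive activity over the Moore neighbourhood (k = 8).
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) != (0, 0) and 0 <= i + di < n and 0 <= j + dj < m:
                        dx[i, j] += w * pos[i + di, j + dj]
    return x + dt * dx

# Iota values follow the sub-goals: +15 for attraction, -15 for avoidance.
iota = np.zeros((19, 19))
iota[3, 10] = 15.0    # e.g. a Resource the agent is attracted to
x = np.zeros((19, 19))
for _ in range(200):  # iterate until the landscape settles; agents hill-climb on x
    x = step_activation(x, iota)
```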
B. The River Crossing Dilemma
Here we propose the River Crossing Dilemma (RCD), which
extends the RCT to explore social situations with multiple
agents. These may be tractable social dilemmas, such as in
this paper, but in general are not constrained in their com-
plexity, since RCD instances may be designed to be arbitrarily
complex. We designed the RCD to explore environmental
interference caused by agents that pursue individual goals in
a shared environment, and how this affects their learning and
ability to achieve goals. The RCD is a 19×19 grid with a
two-cell-deep river, formed of Water objects (Figure 1); a bridge is
successfully built if two Stones are placed in the same space
in the river. An agent’s individual goal is to collect both of
its allocated Resources - one from either side of the river. The
RCD uses a reduced set of RCT objects [32] (which included
things like Traps), as extra objects increase the task complexity
without contributing to the exploration of interference; agents
only encounter objects sufficient to achieve their goals.
Fig. 1: The River Crossing Dilemma:
A Gamified Shared Environment for Studying Social Agents
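For illustration, a minimal encoding of such an environment might look as follows. Only the 19×19 grid, the two-cell-deep river and the two-Stones-make-a-bridge rule come from the description above; the cell encoding, river position and Stone bookkeeping are our own assumptions.

```python
# Minimal sketch of an RCD environment under the stated assumptions.
GRASS, WATER = 0, 1

def make_rcd_grid(n=19, river_row=9):
    grid = [[GRASS] * n for _ in range(n)]
    for row in (river_row, river_row + 1):  # river two cells deep
        for col in range(n):
            grid[row][col] = WATER
    return grid

def place_stone(grid, stones_in_cell, row, col):
    """Drop a Stone into the river; two Stones in the same cell form a bridge."""
    if grid[row][col] == WATER:
        stones_in_cell[(row, col)] = stones_in_cell.get((row, col), 0) + 1
        if stones_in_cell[(row, col)] >= 2:
            grid[row][col] = GRASS  # bridged: the cell is now safe to cross
```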
The RCD is gamified through introducing a cost for placing
Stones in the river, inspired by the Snowdrift Game [38] (also
known as the Chicken Game [9], [39], [40] or the Hawk-Dove
Game [41]); this is a two-person social dilemma with a cost
for cooperation. Gamification adds a subtle complexity to the
task as there is less incentive to cooperate, but a worse result
for defection, through not being able to achieve one’s goal.
An agent’s fitness, or payoff, is calculated with Equation 2.
$$p_i = \frac{r_i}{n_i} - c \times \frac{s_i}{2}(1 + s_i) - f \qquad (2)$$
Cooperation in social dilemmas is influenced by knowing
the dilemma exists, and being able to reason about morals
and norms [42]; willingness to cooperate can be negatively
influenced if the dilemma’s characteristics are unknown or
dynamic [40] - an inherent feature of a shared environment
where another’s actions change one’s view of the world. To
calculate the payoff pfor agent iwith Equation 2, riis how
many Resources pihas collected, niis the total number of
Resources to collect per agent to achieve their individual goal,
cis the cost of placing a Stone in the river and siis the number
of Stones pihas placed in the river. cand niare constants, with
c= 0.1and n= 2.f= 1 if the agent falls in the river, and is
otherwise 0. The cost of placing Stones increases as more are
placed, encouraging agents to exert the least effort to achieve
their goal. This equation evaluates agent payoff individually,
meaning the payoff received is independent of others; this also
allows agents to learn alone in the environment.
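A direct transcription of Equation 2 makes the payoff structure easy to verify. The sketch below uses the paper's constants ($c = 0.1$, $n = 2$); the function and argument names are our own.

```python
# Sketch of the RCD payoff, Equation 2, with the paper's constants.
def payoff(resources, stones_placed, fell_in_river, c=0.1, n=2):
    cost = c * stones_placed / 2 * (1 + stones_placed)  # escalating Stone cost
    return resources / n - cost - (1 if fell_in_river else 0)

# Reproduces the values discussed below: 0.7 when building the bridge alone
# (two Stones), 0.9 when sharing the cost, 1.0 when exploiting the other agent.
assert abs(payoff(2, 2, False) - 0.7) < 1e-9
assert abs(payoff(2, 1, False) - 0.9) < 1e-9
assert abs(payoff(2, 0, False) - 1.0) < 1e-9
```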
Table I shows a simplified payoff matrix using Equation 2.
The highest payoff when alone is $p_i = 0.7$, as two Stones
must be placed. When two agents share an environment, one
can receive a payoff of $p_i = 1.0$ by exploiting the other, who
receives $p_i = 0.7$. The overall optimal payoff is $p_i = 0.9$,
when agents cooperate by sharing the cost.
TABLE I: Payoff Matrix (Agent 1's payoff listed first in each cell),
where S_x is the number of Stones placed by Agent x

           |   S2 = 0   |   S2 = 1   |   S2 = 2
  S1 = 0   |  0.0, 0.0  | 0.0, -0.1  |  1.0, 0.7
  S1 = 1   | -0.1, 0.0  |  0.9, 0.9  |  0.9, 0.7
  S1 = 2   |  0.7, 1.0  |  0.7, 0.9  |  0.7, 0.7
C. Agent Design
Agents use a two-layered neural network architecture, inspired
by Robinson et al. [32]. The deliberative layer generates high-
level sub-goals with a neural network; the input layer has six
neurons, the hidden layer has four neurons, and the output
layer has three neurons which represent sub-goals (Figure 2).
Inputs correlate to the agent’s current state: Grass,Resource,
Water or Stone, if it is Carrying aStone and if a partial Bridge
exists (i.e. one Stone in the river). The reactive layer uses
the shunting equation (Equation 1) to create dynamic activity
landscapes based on current sub-goals, enabling agents to hill-
climb; output values are 1 for attraction, 0 for neutral or -1
for avoidance, and correlate to Resources, Stones and Water.
Fig. 2: Deliberative Neural Network for Generating Sub-Goals
Inputs: Grass, Resource, Water, Stone, Carrying Status, Partial Bridge Built
Outputs: Resource, Stone, Water
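A minimal sketch of this 6-4-3 deliberative layer is given below. The paper specifies only the layer sizes and the meaning of the inputs and outputs; the tanh activations and the thresholds that discretise outputs to {-1, 0, 1} are our assumptions.

```python
# Sketch of the 6-4-3 deliberative layer (Figure 2), under assumed
# activations and output discretisation.
import numpy as np

class DeliberativeLayer:
    def __init__(self, rng):
        self.w1 = rng.normal(0.0, 1.0, (4, 6))  # input -> hidden weights
        self.w2 = rng.normal(0.0, 1.0, (3, 4))  # hidden -> output weights

    def sub_goals(self, state):
        """state: [grass, resource, water, stone, carrying, partial_bridge]."""
        h = np.tanh(self.w1 @ np.asarray(state, dtype=float))
        out = np.tanh(self.w2 @ h)
        # Discretise to attraction (1), neutral (0) or avoidance (-1) for
        # Resources, Stones and Water respectively.
        return np.where(out > 0.33, 1, np.where(out < -0.33, -1, 0))

agent = DeliberativeLayer(np.random.default_rng(0))
print(agent.sub_goals([1, 0, 0, 0, 0, 0]))  # e.g. standing on Grass
```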
D. Experimental Setup
The experiments in this paper use the following common
parameters, inspired by previous work [32], [33]. A population
of 25 randomly initialised agents were evolved with a Steady
State Genetic Algorithm, where agents were given 500 time-
steps to solve the task; the number of generations is specified
per experiment. At each generation, three agents from the
population were randomly selected to compete in a tourna-
ment, where the worst-performing agent was replaced by the
offspring of the winners. For each chromosome (the weights of
each neuron), the offspring had a probability of P_one = 0.95
of inheriting that chromosome from a random parent; single-point
crossover was otherwise used. The weights of the resulting offspring
were then mutated by a random value from a Gaussian distribution
with mean µ = w_current and variance σ = 0.01. Fitnesses
were calculated with Equation 2. For experiments with two
agents, agents learn alone for a period of 500,000 generations
before being introduced into a shared environment.
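The evolutionary loop can be sketched as follows, under stated assumptions: the tournament of three, P_one = 0.95 inheritance, single-point crossover and Gaussian mutation follow the description above, while the toy genome layout and the fitness stub are placeholders for the RCD-specific evaluation.

```python
# Sketch of the Steady State GA described above.
import random

P_ONE, SIGMA = 0.95, 0.01

def breed(parent_a, parent_b, rng):
    child = []
    for chrom_a, chrom_b in zip(parent_a, parent_b):  # one chromosome per neuron
        if rng.random() < P_ONE:
            chrom = list(rng.choice([chrom_a, chrom_b]))  # inherit whole chromosome
        else:
            point = rng.randrange(1, len(chrom_a))        # single-point crossover
            chrom = list(chrom_a[:point]) + list(chrom_b[point:])
        # Gaussian mutation centred on the current weight (sigma = 0.01).
        child.append([w + rng.gauss(0.0, SIGMA) for w in chrom])
    return child

def tournament(population, evaluate, rng):
    """Three random agents compete; the worst is replaced by the winners' child."""
    contenders = rng.sample(range(len(population)), 3)
    worst, runner_up, best = sorted(contenders, key=lambda i: evaluate(population[i]))
    population[worst] = breed(population[best], population[runner_up], rng)

rng = random.Random(42)
population = [[[rng.uniform(-1, 1) for _ in range(6)] for _ in range(7)]
              for _ in range(25)]                  # 25 agents, toy genomes
tournament(population, lambda g: sum(sum(c) for c in g), rng)  # fitness stub
```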
V. LEARNING ALONE WITH GOAL-RATIONALITY
In order to explore whether environmental interference exists
in an environment shared by two agents, one must first explore
how agents learn to achieve their individual goals when they
exist alone. Ten agents were evolved for 500,000 generations
in their own individual environments. This section shows that
an agent in an environment alone can achieve individual goals,
by learning independently through pursuing goal-rational ac-
tion. Agents engage in independent asocial learning based
on the received fitness, directly derived from their ability to
achieve the goal with minimal effort (Equation 2). An example
of this evolution is depicted in Figure 3.
Fig. 3: Average Population Fitness of an Individual Agent Evolving Alone:
Agents initially learn to collect one Resource to get a fitness of 0.5; this
quickly increases around generation 50,000 to 0.7, indicating the agents
learn how to place two Stones in the river in order to collect both Resources
A. Results
Agents encounter a very large, largely neutral fitness landscape
during evolution; the time taken to evolve goal-achieving
behaviours can vary dramatically. Whilst not impossible to
achieve, this task initially appears difficult to solve simply
because the fitness function does not ‘lead’ agents towards
their goals with incremental rewards. In each experiment, the
behaviours required to solve the task were stable by, and
maintained from, generation 50,000. Figure 4 shows the average
performance of all ten agent populations during evolution.
Random mutations during breeding periodically create agents
with lower fitnesses, which may fall in the river for example,
thus reducing the overall fitness average. These solutions are
replaced quickly, leaving the beneficial behaviours to remain.
These ten, individually evolved agents are henceforth labelled
Agents A through J.
Fig. 4: Average Population Fitness of 10 Individual Agents Evolving Alone:
All ten agents that evolved alone were able to sustain the behaviours
necessary to achieve their goal by generation 50,000
B. Implications
These results show that goal-rational action will stabilise
when agents exist alone; the impact of interference between
agents in a shared environment can henceforth be observed by
comparing the performance to when they exist when alone.
VI. COEXISTENCE AND INSTABILITY
Two agents existing and pursuing goal-rational action in a
shared environment is enough to change their behaviour,
compared to existing alone. This is due to interference from
the presence of another agent - even though both agents can
achieve goals independently. The first set of experiments did
not evolve agents further, whilst the second set evolved agents
together. For comparison, agent fitnesses were calculated in the
same way as when agents were alone.
A. Results: Coexistence Without Ongoing Learning
Ten random pairs of agents were selected from Agents A
to J; these were put into a shared environment to study the
effect that co-existence can have on learnt behaviours and the
ability to achieve goals. The ten experiments ran for 500,000
Fig. 5: Average Fitnesses with No Evolution:
Agent B exploits Agent H, receiving a higher average fitness from not
exerting as much effort; Agent H largely performs the same as when alone
iterations without evolution. The observed fitnesses differ from
those when existing alone, as stable, learnt behaviour is affected by
the actions of the other agent. As the results are dependent on
the interactions with a specific partner, graphing an average
of all ten experiments would mask the specific interactions. A
representative sample of three experiments showing average
population fitnesses are therefore shown in Figures 5 to 7.
Fig. 6: Average Fitnesses with No Evolution:
Agent B is more unpredictable than when alone, and is often unable to
achieve its goal; Agent G largely performs the same as when alone
Agents are affected in one of three general ways in these
non-evolutionary experiments: one is able to exploit the other
to receive a higher payoff (Figure 5), they mostly co-exist and
reach their own goals just as they were able to alone (Figure 6),
or one or both are unable to reach their goals because of
the behavioural interference that now exists (Figure 7). In the
latter, agents may continue putting more Stones in the river,
which accrues a larger cost; alarmingly, it may even make
them walk into the river - simply due to the behavioural
view of the world. The fitnesses fluctuate based on the actions
of both individuals at every iteration, and as such appear very
unstable. Emergent exploitative behaviour has no long-term
implication as no further learning is involved; if an agent was
placed back into an environment alone, it would continue to
Fig. 7: Average Fitnesses with No Evolution:
Agent B cannot achieve its goal; Agent F performs the same as when alone
achieve its goals just as before. The critical observation here
is that merely sharing an environment is enough to change
one’s behaviour and ability to achieve goals, due to the world
changing in unexpected ways. To explore this interference
further, agents were then coevolved.
B. Results: Coexistence with Ongoing Learning
Thirty random pairs of agents were selected from Agents A to J to
explore how environmental interference affects coevolution in
a shared environment. The 30 experiments evolved the agents
in a shared environment for 1,500,000 generations. Table II
shows common fitnesses and their associated behaviours.
Fig. 8: Best in Population with Continued Evolution:
A very unstable evolution. Cooperation is seen with a 0.9 fitness. Agents
are periodically unable to achieve their goals when the fitness is below 0.7.
Exploitative behaviour stabilises, where Agent B exploits Agent F
Learning with another can potentially interfere with learnt
knowledge, and one’s ability to solve tasks independently.
However, this also presents the opportunity to exploit others to
receive a higher payoff at the expense of the other agent; this
was the most common observation. In all experiments, at least
one agent was observed to evolve different behaviours when
compared to evolving alone; no instance was observed where
both agents sustained their individually learnt behaviours. This
shows that interference can cause stable behaviours to be
Fig. 9: Best in Population with Continued Evolution:
Both agents initially lose their ability to achieve goals, with fitness 0.5; this
then stabilises into Agent F exploiting Agent D
altered, even when not explicitly interacting with another.
Figures 8 to 12 depict the best fitness in each population,
to show that learnt knowledge can be lost with co-evolution.
Figure 8 shows the most extreme case of interference
observed; the agents evolved to be codependent. Both endure
periods of being unable to achieve their goals (a fitness
below 0.7), with Agent B being more negatively affected than
Agent F. Knowledge loss, exploitation and cooperation are all
observed; one can thus postulate that the effect of interference
can be great, complex and uncertain at the same time. What
each agent learns depends on the actions and learning of the
other. This codependency and instability shows how much of
an impact the presence of another can have on learning.
Fig. 10: Best in Population with Continued Evolution:
Cooperation is initially observed with a fitness of 0.9, however Agent G is
frequently exploitative, resulting in Agent D sustaining its independent
behaviour and being exploited by Agent G
Other dynamics that can be observed to a lesser extent
are periodic dips in fitness, which then lead to exploitative
behaviour (Figure 9). Initial exploitation in early evolution
leads to a mutual loss of fitness and inability to achieve goals.
This then evolves into an exploitative relationship. The spikes
observed in Agent D’s fitnesses indicate that occasionally
Agent F will cooperate, but is unreliable and predictably
self-interested. As a result, Agent D sustains its independent
behaviour, whilst Agent F evolves to capitalise on this.
Figure 10 shows an initial period of cooperation (also
seen in Figure 8). This devolves into a stable, exploitative
relationship. This is again due to Agent D learning it is more
beneficial to be independent, as Agent G is not predictable
enough to depend on for cooperation.
Figures 11 and 12 each show a coevolved exploitative
relationship between the two agent populations. As with
Figure 9, peaks in the exploited agent’s fitness indicate that
TABLE II: Commonly Observed Agent Fitnesses and Behaviours
(R: Resources collected; S: Stones placed)

          |  S = 0  |  S = 1  |  S = 2
  R = 0   |   0.0   |  -0.1   |  -0.3
  R = 1   |   0.5   |   0.4   |   0.2
  R = 2   |   1.0   |   0.9   |   0.7
Fig. 11: Best in Population with Continued Evolution:
Agent J sustains independence with a fitness of 0.7; Agent C exploits Agent
J to receive a fitness of 1.0
the exploitative agent occasionally helps to share the cost
of building a bridge; in this case, the exploited agent can
periodically access a higher fitness than when acting alone.
This cooperative behaviour is lost fairly early during evolution,
as agents find and maintain a better fitness (i.e. an exploitative
payoff of 1.0). Exploitative behaviour therefore prevails; this,
again, evidences that environmental interference affects how
and what one learns, and thus the behaviours maintained
during evolution. The remaining experiments with continued
evolution (not depicted) fall under this category of stabilised
exploitation, which is the most commonly observed outcome.
Fig. 12: Best in Population with Continued Evolution:
Agent E sustains independence with a fitness of 0.7; Agent F exploits
Agent E to receive a fitness of 1.0
C. Implications
This section shows that stable behaviour is not maintained
when agents share an environment in which they pursue
individual goals. The unintended interactions change each
agent’s perception of the world; if agents cannot perceive that
the cause of the change in environment is due to another agent,
they will adapt their stable goal-achieving behaviour, and thus
lose this knowledge. Exploitative agents lose the ability to
interact with Stones over time when learning, as they depend
on the other agent to achieve their goals without explicitly
knowing this. This would be detrimental if exploitative agents
suddenly found themselves alone in an environment, as they have
evolved to be codependent rather than independent.
Performance in a shared environment depends on what is
learnt when alone; performance also depends on who each
agent is paired with, and what knowledge they have. If one
agent acts in such a way that the other agent encounters an
unexpected state, it may not know how to act appropriately;
for example, it may continue to put Stones in the river when a
bridge has already been built, in which case it will achieve a
lower fitness. An inability to overcome unexpected situations
and unintended interactions means agents are susceptible to
knowledge loss and a change in behaviour; this produces
instability and unpredictability in evolution. We operationalise
traditional social action to mitigate this instability.
VII. TRADITIONAL ACTION CAN PROMOTE STABILITY
This section explores the impact that traditional and goal-
rational action have on coexistence and learning. Motivated
by the observed change in agent behaviour due to interference
in a shared environment, traditional social action [15] was
operationalised to reduce the impact of said interference on
agents; this also led to an increase in predictability and stability
in socially situated agents when pursuing individual goals.
Traditional action was introduced into the breeding process,
in addition to the existing goal-rational action. The worst-
performer of each tournament is henceforth replaced with a 90%
chance by the goal-rational offspring of the two best parents,
and the remaining 10% of the time by an offspring representing
the state of the population at that time-step.
This representative state is captured as the median of each
deliberative-layer weight across all agents in the population;
this allows traditions to be established over time that fluctuate
with the fluidity of society as it evolves.
Higher rates of traditional social action saturate the population,
such that it is increasingly difficult for evolution to explore the
fitness landscape, whereas lower rates hold less emphasis on
forming and stabilising traditions. The effect of traditional
social action was explored with the same 30 pairs of agents
as in Section VI-B, evolved for 1,500,000 generations.
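This mechanism might be sketched as below, reusing the breed function from the earlier GA sketch. The per-weight median as the 'representative state' follows the text, while the data layout is again a toy assumption in which the whole genome is the deliberative layer.

```python
# Sketch of traditional action in breeding: 10% of tournament losers are
# replaced by a 'traditional' offspring built from the per-weight median of
# the deliberative layer across the population, rather than by the
# goal-rational offspring of the two winners.
import random
import statistics

P_TRADITION = 0.10

def traditional_offspring(population):
    """Median of each deliberative-layer weight across all agents."""
    return [
        [statistics.median(agent[c][w] for agent in population)
         for w in range(len(population[0][c]))]
        for c in range(len(population[0]))
    ]

def replace_loser(population, best, runner_up, loser, breed, rng):
    if rng.random() < P_TRADITION:  # traditional action: follow the society
        population[loser] = traditional_offspring(population)
    else:                           # goal-rational action: offspring of winners
        population[loser] = breed(population[best], population[runner_up], rng)
```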
A. Results
Figures 8 and 13 depict the same pair of agents when subjected
to continued evolution with pure goal-rational action, and a
blend of goal-rational and traditional social action respectively.
The once chaotic evolution of agents in Figure 8 has been
drastically improved with traditional action (Figure 13). Agent
B still endures a period of knowledge loss as it learns about
its new environment; however, this period is much shorter
than seen previously. Social traditions are established based on
the common behaviours captured in the population, and lead
the population towards stability. Additionally, from roughly
generation 200,000 in Figure 13, both agents are able to
receive a mutual benefit from the presence of the other, such
that they coevolve to learn cooperative behaviours that stabilise
Fig. 13: Best in Population with Continued Evolution and Traditional
Action: Agents B and F endure a lesser period of lower fitness, and stabilise
into cooperative behaviour at a mutual fitness of 0.9
throughout the rest of evolution. This is compared to Figure 8,
where both agents lose the ability to achieve their goals.
Cooperation can be seen briefly in Figure 8, however this
eventually devolves into an exploitative relationship; this brief
cooperation is not stable as the highest fitness for Agent B
continuously fluctuates between 0.9(cooperation, placing one
Stone and collecting two Resources) and 0.4(placing one
Stone but only collecting one Resource - thus not achieving
the goal). This happens because Agent B has learned it can
sometimes put one Stone in the river to achieve its goal, but
sometimes cannot; the reason behind this is that whilst Agent
B is performing cooperative behaviour, Agent F sometimes
does not build a bridge, so Agent B cannot achieve its
goal. This is the reason it falls back to solving the task
independently, to achieve a fitness of 0.7. With traditional
action, cooperative behaviour is maintained, therefore enabling
both agents to achieve their goals with the best overall payoff.
Figures 9 and 14 also present the same pair of agents;
a notable difference is that traditional action means agents
endure a smaller period of being unable to achieve their
goals at the beginning of evolution (Figure 14). Figure 9
shows agents cannot initially achieve goals for around 100,000
generations, due to the environmental interference of acting
for the first time with another. This is reduced significantly
with traditional action. Instead of a dip in fitness, agents
instead encounter a brief period of cooperation mixed with
exploitation, which devolves into exploitation of Agent D, just
as seen without traditional action. Agents are observed
overall to benefit from traditional action in this scenario.
Figures 11 and 15 are also directly comparable. Both figures
show exploitative behaviour; however, employing traditional
action causes a period of roughly 100,000 generations where
both agents drop in fitness (Figure 15). The fitnesses here
correlate with agents losing the ability to build bridges; as a
result, they only collect one Resource each, with a payoff of
0.5. Agents simultaneously learn about the actions of the
other, which are constantly changing, too. This anomalous
evolutionary event causes a temporary dip in fitness that is
Fig. 14: Best in Population with Continued Evolution and Traditional
Action: Agents endure a reduction in loss of knowledge and as such
stabilise their exploitative relationship in less time
not observed without traditional action - however this still
evidences that interference exists. Furthermore, whilst this dip
in fitness may be endured, it is the traditional action of the
population that also enables the agents to overcome this. This
indicates that evolving traditions can potentially aid agents
with learning how to cope with unexpected events, such that
they can re-stabilise their behaviour when this occurs.
Fig. 15: Best in Population with Continued Evolution and Traditional
Action: A temporary period of low fitness is endured, but importantly is
rectified with the traditions held by the populations
Figures 10 and 16 demonstrate that cooperation can exist in
part with exploitation. A fluctuation between a fitness of 0.9
and 1.0 in Agent G indicates that it is not always possible for
it to sustain a purely exploitative behaviour, in which case it
must still know how to cooperate by building a bridge at least
in part. Agent D on the other hand is more independent, and
fluctuates between a fitness of 0.7 and 0.9; this shows that
it is less affected by coevolution as it can achieve the goal
independently. Introducing traditional action means that both
of these agents have an increased opportunity for cooperation,
which has become stabilised for a longer period during the
evolution; agents therefore are generally much better off in
this scenario, as they have evolved to enjoy the benefits of
cooperation frequently, supported by their evolved behaviours.
Fig. 16: Best in Population with Continued Evolution and Traditional
Action: Agent G fluctuates between a fitness of 1.0 (exploitation) and
0.9 (cooperation); Agent D fluctuates between a fitness of 0.9 (cooperation)
and 0.7 (independence). This is a stable equilibrium
Figures 12 and 17 show that the effect of traditional action
on evolution can also be minimal: in all remaining experiments,
if pure goal-rational agents evolve exploitative behaviours, this
behaviour is observed to
stabilise. In this sense, agents deal with the impact of interfer-
ence by evolving codependent relationships; one becomes fully
dependent on the other to achieve its goals, whilst the other
remains independent. This could potentially be detrimental if
agents then have to act alone, after becoming codependent. The
effect of environmental interference here is that exploitative
agents have evolved to unlearn all of the knowledge from
acting alone. This is a stark change in behaviour, simply
caused by sharing an environment with another.
Fig. 17: Best in Population with Continued Evolution and Traditional
Action: Agents E and F stabilise into an exploitative relationship
B. Implications
This section uses traditional action to mitigate the impact that
unintended interactions, arising from coexistence, can have on
agent learning and performance. Whilst an explicit awareness
of others is required for intentional cooperation [21], unin-
tentional cooperation can be observed due to agents receiving
a mutual benefit from each other, without understanding it.
Traditional action in this case enables agents to both pursue
their own goals, and achieve a better fitness overall, as a result
of emergent sociality [21]. In addition to stabilised cooperative
behaviour, coexisting agents are observed to stabilise their
behaviours faster as a result of establishing traditions; this
allows agents to tackle unforeseen events or unexpected states,
such that they can recover their goal-achieving behaviour.
VIII. CONCLUSIONS AND DISCUSSION
In this work we have explored the unintended interactions that
arise from coexisting in a shared environment, and the conse-
quences these have on learning and achieving goals; we devel-
oped a simple yet sufficient testbed called the River Crossing
Dilemma to explore this. Emergent exploitation, knowledge
loss, cooperation, or a combination of these characteristics can
be observed in goal-rational agents in a shared environment.
Evolving exploitative behaviour, and thus adapting how one
has already learnt to achieve goals alone, may be costly in
the long-term due to evolved dependence on another’s actions
to achieve individual goals. This would be detrimental if the
presence of another is unpredictable; codependence here may
result in a loss of learned behaviour, preventing agents from
achieving their goals independently. Additionally, the impact
of another in the environment may be so great that it derails
one’s own knowledge completely, thus preventing both agents
from achieving their goals. The occurrence of both agents
maintaining their own abilities to achieve their goals exactly
as when learning alone has not been observed. Whilst it
cannot be said that this is impossible, the failure of sustained
individuality to emerge in a shared environment provides
evidence for the hypothesis that co-existence has a greater
impact on learning and performance than expected.
Traditional social action enabled agents that shared
an environment to stabilise expected behaviour in fewer
generations than pure goal-rational action alone. Agents also
recovered from unexpected events or long periods of low
fitness with traditional action; there is potential for agents to
adopt a cooperative equilibrium in which both agents receive
mutual benefit from the other, without explicit knowledge
of the cause. This could be used to explore the emergence
of computational superstition [43]. In comparison, pure
goal-rational action was observed to consistently devolve into
exploitative relationships. Traditional action has proven to be
a beneficial step towards realising socially self-aware systems
that can manage the impact of others on themselves in a
socially acceptable way.
The results provide an essential building block to develop fur-
ther understanding about interactions in systems. It is intended
that this will eventually realise the aspiration for socially
self-aware systems; these will possess an awareness of their
surroundings, and an ability to judge the impact of their actions
on the society they are integrated into - be that purely technical
or socio-technical. Vehicular networks and smart energy grids
involve a number of components with often conflicting goals,
that exist in a shared space; these are undoubtedly subject
to interference, due to their socially situated nature. This
work has highlighted the impact that unintended interactions
between systems can have when systems coexist; this is not
as explicit as direct interactions, but can have a large effect
on whether systems are able to achieve individual goals in
a shared environment. Operationalising social action theory
and adopting a microsociological approach to social systems
has the potential to alleviate interference, and the unintended
consequence of interaction in these systems; next steps will
aim to ascertain the extent to which these results generalise.
REFERENCES
[1] P. R. Lewis, “Self-aware computing systems: From psychology to
engineering,” in Proceedings of the 2017 Design, Automation and Test
in Europe, DATE 2017, 2017, pp. 1044–1049.
[2] X. Fei, H. S. Mahmassani, and P. Murray-Tuite, “Vehicular network sen-
sor placement optimization under uncertainty,” Transportation Research
Part C, vol. 29, pp. 14–31, 2013.
[3] D. Ngar-yin Mah, J. M. van der Vleuten, J. Chi-man Ip, and
P. Ronald Hills, “Governing the transition of socio-technical systems: A
case study of the development of smart grids in Korea,” Energy Policy,
vol. 45, pp. 133–141, 6 2012.
[4] D. Cliff and L. Northrop, “The Global Financial Markets: An Ultra-
Large-Scale Systems Perspective,” Springer, 2012, pp. 29–70.
[5] L. Esterle and J. N. Brown, “Levels of Networked Self-Awareness,” in
2018 IEEE 3rd International Workshops on Foundations and Applica-
tions of Self* Systems (FAS*W). IEEE, 9 2018, pp. 237–238.
[6] K. Bellman, J. Botev, H. Hildmann, P. R. Lewis, S. Marsh, J. Pitt,
I. Scholtes, and S. Tomforde, “Socially-Sensitive Systems Design:
Exploring Social Potential,” IEEE Technology and Society Magazine,
vol. 36, no. 3, pp. 72–80, 2017.
[7] U.S. CFTC and U.S. SEC, “Findings regarding the market
events of May 6, 2010,” Tech. Rep., 2010. [Online]. Available:
https://www.sec.gov/news/studies/2010/marketevents-report.pdf
[8] E. Diener and T. K. Srull, “Self-awareness, psychological perspective,
and self-reinforcement in relation to personal and social standards.”
Journal of Personality and Social Psychology, vol. 37, no. 3, pp. 413–
423, 1979.
[9] P. Kollock, “Social Dilemmas: The Anatomy of Cooperation,” Annual
Review of Sociology, 1998.
[10] G. H. Walker, N. A. Stanton, D. Jenkins, P. Salmon, M. Young,
and A. Aujla, “Sociotechnical Theory and NEC System Design,” in
Engineering Psychology and Cognitive Ergonomics. Springer, 2007,
pp. 619–628.
[11] G. G. Gallup, “Self-awareness and the evolution of social intelligence,”
Behavioural Processes, 1998.
[12] D. J. Shaw, K. Czekóová, J. Chromec, R. Mareček, and M. Brázdil,
“Copying you copying me: Interpersonal motor co-ordination influences
automatic imitation,” PLoS ONE, 2013.
[13] B. P. Jolley, J. M. Borg, and A. Channon, “Analysis of social learning
strategies when discovering and maintaining behaviours inaccessible to
incremental genetic evolution,” in Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), vol. 9825 LNCS, 2016, pp. 293–304.
[14] C. Müller-Schloer, “Organic Computing – On the Feasibility of Con-
trolled Emergence,” Proceedings of the 2nd IEEE/ACM/IFIP interna-
tional conference on Hardware/software codesign and system synthesis,
2004.
[15] M. Weber, Economy and Society: An Outline of Interpretive Sociology,
G. Roth and C. Wittich, Eds. University of California Press, 1978.
[16] M. Dorigo, V. Maniezzo, and A. Colorni, “Ant system: optimization by
a colony of cooperating agents,” IEEE Transactions on Systems, Man
and Cybernetics, Part B (Cybernetics), vol. 26, no. 1, pp. 29–41, 1996.
[17] D. Karaboga and B. Basturk, “A powerful and efficient algorithm for
numerical function optimization: artificial bee colony (ABC) algorithm,”
vol. 39, pp. 459–471, 2007.
[18] M. Brambilla, E. Ferrante, M. Birattari, and M. Dorigo, “Swarm
robotics: a review from the swarm engineering perspective,” Swarm
Intelligence, vol. 7, no. 1, pp. 1–41, 3 2013.
[19] F. Mondada, L. Maria Gambardella, D. Floreano, and M. Dorigo,
“SWARM-BOTS: Physical Interactions in Collective Robotics,” Tech.
Rep. [Online]. Available: www.swarm-bots.org
[20] J. Pitt, J. Schaumeier, and A. Artikis, “The Axiomatisation of Socio-
Economic Principles for Self-Organising Systems,” in 2011 IEEE Fifth
International Conference on Self-Adaptive and Self-Organizing Systems.
IEEE, 10 2011, pp. 138–147.
[21] C. Castelfranchi, “Modelling social action for AI agents,” Artificial
Intelligence, vol. 103, no. 1-2, pp. 157–182, 8 1998.
[22] R. Boyd and P. J. Richerson, “Culture and the evolution of human
cooperation,” 2009.
[23] R. Axelrod and W. D. Hamilton, “The evolution of cooperation,” Science
(New York, N.Y.), vol. 211, no. 4489, pp. 1390–6, 3 1981.
[24] M. Doebeli and C. Hauert, “Models of cooperation based on the
Prisoner’s Dilemma and the Snowdrift game,” pp. 748–766, 2005.
[25] P. R. Lewis, A. Chandra, and K. Glette, “Self-awareness and Self-
expression: Inspiration from Psychology,” in Self-aware Computing
Systems, 1st ed., P. R. Lewis, M. Platzner, B. Rinner, J. Torresen, and
X. Yao, Eds. Springer International Publishing, 2016, ch. 2, pp. 9–21.
[26] S. Kounev, P. Lewis, K. L. Bellman, N. Bencomo, J. Camara, A. Di-
aconescu, L. Esterle, K. Geihs, H. Giese, S. Götz, P. Inverardi, J. O.
Kephart, and A. Zisman, “The Notion of Self-aware Computing,” in
Self-Aware Computing Systems, 2017, pp. 3–16.
[27] C. Y. Huang, S. W. Wang, and C. T. Sun, “Self-aware intelligent
agents in the prisoner’s dilemma,” in Proceedings - 2011 International
Conference on Future Computer Sciences and Application, ICFCSA
2011, 2011, pp. 127–131.
[28] J. X. Wang, E. Hughes, C. Fernando, W. M. Czarnecki,
E. A. Duéñez-Guzmán, and J. Z. Leibo, “Evolving intrinsic
motivations for altruistic behavior,” Tech. Rep. [Online]. Available:
https://arxiv.org/pdf/1811.05931.pdf
[29] S. Ossowski and A. Garcia-Serrano, “Social structure in artificial agent
societies: Implications for autonomous problem-solving agents,” Intelli-
gent Agents V: Agent Theories, Architectures, and Languages, 1999.
[30] P. Sztompka, Socjologia. Kraków: Znak, 2002.
[31] J. A. Simpson and D. T. Kenrick, Evolutionary social psychology.
Lawrence Erlbaum Associates, 1997.
[32] E. Robinson, T. Ellis, and A. Channon, “Neuroevolution of agents
capable of reactive and deliberative behaviours in novel and dynamic
environments,” in Advances in Artificial Life. Springer, 2007, pp. 1–
10.
[33] J. Borg, A. Channon, and C. Day, “Discovering and Maintaining
Behaviours Inaccessible to Incremental Genetic Evolution Through
Transcription Errors and Cultural Transmission,” . . . on the Synthesis
and Simulation of . . . , 2011.
[34] A. Stanton and A. Channon, “Incremental Neuroevolution of Reactive
and Deliberative 3D Agents,” European Conference on Artificial Life
(ECAL), pp. 341–348, 2015.
[35] S. X. Yang and M. Meng, “An efficient neural network approach to
dynamic robot motion planning,” Neural Networks, vol. 13, no. 2, pp.
143–148, 2000.
[36] ——, “An efficient neural network method for real-time motion planning
with safety consideration,” Robotics and Autonomous Systems, 2000.
[37] S. Grossberg, “Nonlinear neural networks: Principles, mechanisms, and
architectures,” 1988.
[38] K. Mogielski and T. Płatkowski, “A mechanism of dynamical interac-
tions for two-person social dilemmas,” Journal of Theoretical Biology,
2009.
[39] J. Z. Leibo, V. Zambaldi, M. Lanctot, J. Marecki, and T. Graepel, “Multi-
agent Reinforcement Learning in Sequential Social Dilemmas,” Pro-
ceedings of the 16th Conference on Autonomous Agents and MultiAgent
Systems, 2017.
[40] P. A. Van Lange, J. Joireman, C. D. Parks, and E. Van Dijk, “The
psychology of social dilemmas: A review,” Organizational Behavior and
Human Decision Processes, 2013.
[41] K. Sigmund and M. A. Nowak, “Evolutionary game theory,” Current
Biology, 1999.
[42] R. M. Dawes, “Social Dilemmas,” Annual Review of Psychology, 1980.
[43] C. M. Barnes, L. Esterle, and J. N. A. Brown, ““when you believe
in things that you don’t understand”: the effect of cross-generational
habits on self-improving system integration,” in Proceedings of the
IEEE International Workshops on Foundations and Applications of Self*
Systems (FAS*W), 2019.
... Neuroevolution is the process of evolving ANNs with an evolutionary algorithm in accordance to a fitness function [7]; a population of individuals is evolved with mutations and/or crossover over many generations [8]. Many applications of neuroevolution focus on evolving weights of ANNs [8]- [11], however more complex approaches that evolve both the weights and topologies of ANNs exist [12], [13]. This process of evolving ANNs by adjusting connection weights over time to encode new information can result in a degradation of performance and catastrophic forgetting when learning new tasks or experiencing novel environmental contexts [8], [14]- [16]; learnt knowledge must be changed -and is often lost -in order to learn new things and express new behaviours [8]. ...
... This process of evolving ANNs by adjusting connection weights over time to encode new information can result in a degradation of performance and catastrophic forgetting when learning new tasks or experiencing novel environmental contexts [8], [14]- [16]; learnt knowledge must be changed -and is often lost -in order to learn new things and express new behaviours [8]. Learning complex, sequential or multi-stage tasks is also made difficult as complete information about the environment -including the available actions, their cues and their consequences -is not usually accessible [1], [17]; this is also evident when environments are shared, as the actions of individuals change the context of the environment for others [11]. ...
... We explore how activity-gating neuromodulation may help neural controllers to overcome the challenges associated with learning multi-stage and gamified tasks, without a priori knowledge of the task or environment. ANNs are just one example of an agent controller in which behaviour can be learnt; we use ANNs in this context in line with previous River Crossing testbeds [9]- [11], to explore how ANNs make decisions in social environments. Here, a multi-stage task is defined as one that an agent must learn, and pass, through multiple states, and perform different behaviours in different contexts in order to achieve their goal; this definition is inspired by [17]. ...
Conference Paper
Full-text available
Neural networks have been widely used in agent learning architectures; however, learning multiple context-dependent tasks simultaneously or sequentially is problematic when using them. Behavioural plasticity enables humans and animals alike to respond to changes in context and environmental stimuli, without degrading learnt knowledge; this can be achieved by regulating behaviour with neuromodulation – a biological process found in the brain. We demonstrate that modulating activity-propagating signals when evolving neural networks enables agents to learn context-dependent and multi-stage tasks more easily. Further, we show that this benefit is preserved when agents occupy an environment shared with other neuromodulated agents. Additionally we show that neuromodulation helps agents that have evolved alone to adapt to changes in environmental stimuli when they continue to evolve in a shared environment.
... The experiments use the River Crossing Dilemma as a testbed [4], which was designed to explore arbitrarily complex problems in shared environments. Firstly we explore the impact that interference has on agents that are able to achieve individual goals alone, to assess how learnt knowledge is maintained. ...
... This work extends that of [4], and explores the volatility and maintenance of goal-achieving behaviour in agents that experience interference when they have no models of others in the environment, as a prerequisite to social awareness in systems. Specifically, we conduct further experiments to explore and generalise the effects that both goal-rational and traditional action, as well as random action, have on expected fitness and volatility in learning on a broader scale. ...
... An important distinction between this work and that of the work it is extended from is a change in terminology. [4] explored the instability in evolution caused by interference, whereas in this work we investigate the volatility in evolution as a result of interference. This change in terminology was influenced by the additional statistical analysis that has been conducted in this work; three newly defined metrics have been proposed and used to capture different elements of volatility in evolution, which were inspired and adapted from the established historical volatility metric used in financial modelling and volatility forecasting [67]. ...
Article
Full-text available
Systems that pursue their own goals in shared environments can indirectly affect one another in unanticipated ways, such that the actions of other systems can interfere with goal-achievement. As humans have evolved to achieve goals despite interference from others in society, we thus endow socially situated agents with the capacity for social action as a means of mitigating interference in co-existing systems. We demonstrate that behavioural and evolutionary volatility caused by indirect interactions of goal-rational agents can be reduced by designing agents in a more socially-sensitive manner. We therefore challenge the assumption that designers of intelligent systems typically make, that goal-rationality is sufficient for achieving goals in shared environments.
... The River Crossing Task (RCT) was originally developed to explore how agents learn to solve increasingly complex tasks (Robinson et al., 2007); initially this was done using a novel neural network architecture that separated reactive from deliberative processes. Since then, the testbed has been extended to explore the effect of social learning strategies (Jolley et al., 2016), learning by imitation (Borg et al., 2011), and social action (Barnes et al., 2019), as well as how agents learn in a 3D world (Stanton and Channon, 2015). One of the difficulties of the RCT and its extensions is that agents must learn sub-tasks in order to achieve their goal; specifically, they must learn to build a bridge to cross a river safely to access their reward object, without any prior knowledge of the task or environment. ...
... Current strategies to solve the RC+ task include learning by imitation, through transcription errors and cultural transmission in the RC+ environment (Borg et al., 2011), and employing teacher-learner social learning strategies (Jolley et al., 2016). Other extensions include the 3D River Crossing (3D RC) Task, used to evolve 3D virtual agents in a 3D world (Stanton and Channon, 2015), and the River Crossing Dilemma (RCD), used to explore how agents coevolve to pursue individual goals in a shared environment (Barnes et al., 2019). ...
... The offspring is a mutation of the parent, which it always replaces.
Figure 4: The base RCT environment inspired by Robinson et al. (2007), adapted from Barnes et al. (2019). The goal of the agent is to collect the Resource object on the opposite side of the river; to do this, it must learn to build a bridge with a Stone by placing it in the Water. ...
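Read literally, the parent-offspring scheme in this excerpt is a minimal mutation-and-replace loop; a sketch under that reading follows, with the genome encoding and Gaussian mutation operator as illustrative assumptions rather than the cited paper's exact scheme.

import random

def mutate(genome, rate=0.05, sigma=0.1):
    # Perturb each gene with probability `rate`; the operator details
    # are an assumption, not the cited paper's exact design.
    return [g + random.gauss(0.0, sigma) if random.random() < rate else g
            for g in genome]

# As the excerpt states, the offspring always replaces the parent,
# so any selection pressure must come from elsewhere in the algorithm.
parent = [random.uniform(-1.0, 1.0) for _ in range(10)]
for generation in range(100):
    parent = mutate(parent)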
Conference Paper
Full-text available
Evolving agents to learn how to solve complex, multi-stage tasks to achieve a goal is a challenging problem. Problems such as the River Crossing Task are used to explore how these agents evolve and what they learn, but it is still often difficult to explain why agents behave in the way they do. We present the Minimal River Crossing (RC-) Task testbed, designed to reduce the complexity of the original River Crossing Task while keeping its essential components, such that the fundamental learning challenges it presents can be understood in more detail. Specifically to illustrate this, we demonstrate that the RC- environment can be used to investigate the effect that a cost to movement has on agent evolution and learning, and more importantly that the findings obtained as a result can be generalised back to the original River Crossing Task.
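As a toy illustration of how a cost to movement enters such an evaluation, one common formulation (an assumption for illustration, not necessarily the RC- testbed's own) subtracts a per-step penalty from the task reward:

def fitness(reached_goal, steps_taken, goal_reward=1.0, move_cost=0.01):
    # Reward for completing the task, minus a small cost per move;
    # the constants here are illustrative assumptions.
    return (goal_reward if reached_goal else 0.0) - move_cost * steps_taken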
Chapter
Autonomous agents are able to work toward their given goals and interact with their environment and other systems without immediate help from humans. To accomplish this, they need to understand their environment, their goals, and the task at hand, as well as other agents in their vicinity. This makes deep learning an essential part of the capabilities required for autonomous agents. In this chapter, we dive into the specific characteristics and concomitant challenges of deep learning for autonomous agents and multiagent systems. In real-world situations specifically, where autonomous agents operate and control physical systems, several challenges come together. We discuss various aspects and approaches of deep learning that can be utilized by autonomous agents. Finally, we present different open challenges to be tackled to improve the abilities of autonomous agents.
Article
The size of sensor networks supporting smart cities is ever increasing. Sensor network resiliency becomes vital for critical networks such as emergency response and waste water treatment. One approach is to engineer “self-aware” sensors that can proactively change their component composition in response to changes in workload when critical devices fail. By extension, these devices could anticipate their own termination, such as battery depletion, and offload current tasks onto connected devices. These neighboring devices can then reconfigure themselves to process these tasks, thus avoiding catastrophic network failure. In this article, we compare and contrast two types of self-aware sensors. One set uses Q-learning to develop a policy that guides device reaction to various environmental stimuli, whereas the other uses a set of shallow neural networks to select an appropriate reaction. The novelty lies in the use of field programmable gate arrays embedded on the sensors that take into account internal system state, configuration, and learned state-action pairs, which guide device decisions to meet system demands. Experiments show that even relatively simple reward functions develop both Q-learning policies and shallow neural networks that yield positive device behaviors in dynamic environments.
Conference Paper
The size of sensor networks supporting smart cities is ever increasing. Sensor network resiliency becomes vital for critical networks such as emergency response and waste water treatment. One approach is to engineer 'self-aware' sensors that can proactively change their component composition in response to changes in workload when critical devices fail. By extension, these devices could anticipate their own termination, such as battery depletion, and offload current tasks onto connected devices. These neighboring devices can then reconfigure themselves to process these tasks, thus avoiding catastrophic network failure. In this article, we present an array of self-aware sensors that use Q-learning to develop a policy that guides device reaction to various environmental stimuli. The novelty lies in the use of field programmable gate arrays embedded on the sensors that take into account internal system state, configuration, and learned state-action pairs, which guide device decisions to meet system demands. Experiments show that even relatively simple reward functions develop Q-learning policies that yield positive device behaviors in dynamic environments.
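Both versions of this sensor-network work rest on tabular Q-learning; a generic sketch of the kind of update they describe is given below. The state and action names, reward handling and hyperparameters are placeholders, not the papers' actual design.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # assumed hyperparameters
ACTIONS = ["process_locally", "offload_task", "reconfigure"]  # placeholder actions
Q = defaultdict(float)                          # (state, action) -> estimated value

def choose_action(state):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)                     # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])      # exploit

def update(state, action, reward, next_state):
    # Standard one-step Q-learning backup.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])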
Conference Paper
Full-text available
It has been demonstrated that social learning can enable agents to discover and maintain behaviours that are inaccessible to incremental genetic evolution alone. However, previous models investigating the ability of social learning to provide access to these inaccessible behaviours are often limited. Here we investigate teacher-learner social learning strategies. Teachers in teacher-learner social learning models are often restricted to one type of agent, be it a parent or some fit individual; here we broaden this exploration to include a variety of teachers, to investigate whether these social learning strategies can also provide access to, and maintenance of, behaviours inaccessible to incremental genetic evolution. In this work, new agents learn from either a parent, the fittest individual, the oldest individual, a random individual or another young agent. Agents are tasked with solving a river crossing task, with new agents learning from a teacher in mock evaluations. The behaviour necessary to successfully complete the most difficult version of the task has been shown to be inaccessible to incremental genetic evolution alone, but achievable using a combination of social learning and noise in the Genotype-Phenotype map. We show that this result is robust across all of the teacher-learner social learning strategies explored.
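The teacher-selection strategies listed in this abstract map naturally onto a small dispatch function; the sketch below is a loose reading of them, with the agent interface (.parent, .fitness, .age) assumed purely for illustration.

import random

def select_teacher(strategy, population, learner):
    # `population` is assumed to be a list of agents exposing
    # .fitness, .age and .parent attributes (hypothetical interface).
    if strategy == "parent":
        return learner.parent
    if strategy == "fittest":
        return max(population, key=lambda a: a.fitness)
    if strategy == "oldest":
        return max(population, key=lambda a: a.age)
    if strategy == "young":            # another young agent
        return min(population, key=lambda a: a.age)
    if strategy == "random":
        return random.choice(population)
    raise ValueError(f"unknown strategy: {strategy}")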
Chapter
Full-text available
Self-awareness concepts from psychology are inspiring new approaches for engineering computing systems that operate in complex dynamic environments. There has been broad and long-standing interest in self-awareness for computing, but a systematic understanding of self-awareness, and of how it can be used and evaluated, has only recently been developed. In this chapter, we take inspiration from human self-awareness to develop new notions of computational self-awareness and self-expression. We translate concepts from psychology to the domain of computing, introducing key ideas in self-aware computing. In doing so, this chapter paves the way for subsequent work in this book.
Article
In human society, individuals have long voluntarily organized themselves in groups, which embody, provide and/or facilitate a range of different social concepts, such as governance, justice, or mutual aid. These social groups vary in form, size, and permanence, but in different ways provide benefits to their members. In turn, members of these groups use their understanding and awareness of group expectations to help determine their own actions, to the benefit of themselves, each other, and the health of the group.
Conference Paper
Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: 1. a fruit Gathering game and 2. a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real world social dilemmas affects cooperation.
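The analysis described here runs multiple self-interested learners independently in one shared Markov game; a minimal control loop of that kind is sketched below, with the environment API (reset/step) and agent interface (act/observe) assumed for illustration rather than taken from the paper.

def run_episode(env, agents, max_steps=1000):
    # Each agent acts from its own observation only; there are no
    # shared parameters or rewards, so learning is fully independent.
    observations = env.reset()
    for _ in range(max_steps):
        actions = [agent.act(obs) for agent, obs in zip(agents, observations)]
        observations, rewards, done = env.step(actions)
        for agent, obs, reward in zip(agents, observations, rewards):
            agent.observe(obs, reward)    # e.g. store transition, update own Q-network
        if done:
            return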
Chapter
We define the notion of “self-aware computing” and the relationship of this term to related terms such as autonomic computing and self-management. We motivate the need for a new definition, driven by trends that are only partially addressed by existing areas of research. The semantics of the provided definition are discussed in detail, examining the selected wording and explaining its meaning to avoid misleading interpretations. This chapter also provides an overview of the existing usage of the terms self-aware computing and self-awareness in related past projects and initiatives.