Understanding Emergent Behaviours in Multi-Agent Systems
with Evolutionary Game Theory
The Anh Han1,?
1School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK TS1 3BA
?Corresponding: The Anh Han (t.han@tees.ac.uk)
arXiv:2205.07369v1 [cs.AI] 15 May 2022
Abstract
The mechanisms of emergence and evolution of collective behaviours in dynamical Multi-
Agent Systems (MAS) of multiple interacting agents, with diverse behavioral strategies in co-
presence, have been undergoing mathematical study via Evolutionary Game Theory (EGT).
Their systematic study also resorts to agent-based modelling and simulation (ABM) tech-
niques, thus enabling the study of aforesaid mechanisms under a variety of conditions, param-
eters, and alternative virtual games. This paper summarises some main research directions
and challenges tackled in our group, using methods from EGT and ABM. These range from
the introduction of cognitive and emotional mechanisms into agents’ implementation in an
evolving MAS, to the cost-efficient interference for promoting prosocial behaviours in com-
plex networks, to the regulation and governance of AI safety development ecology, and to the
equilibrium analysis of random evolutionary multi-player games. This brief aims to sensitize
the reader to EGT-based issues, results and prospects, which are accruing in importance
for the modeling of minds with machines and the engineering of prosocial behaviours in dy-
namical MAS, with impact on our understanding of the emergence and stability of collective
behaviours. In all cases, important open problems in MAS research as viewed or prioritised
by the group are described.
Keywords: Evolutionary Game Theory, Emergent Behaviours, Collective Behaviours, Co-
operation, AI Regulation, Agent-based Modelling
1 Introduction
The problem of promoting the emergence and stability of diverse collective behaviours in populations of abstract individuals has been undergoing mathematical study via Evolutionary Game Theory (EGT) (Axelrod, 1984, Hardin, 1968, Nowak, 2006a, Sigmund, 2010). These questions have attracted sustained attention from fields as diverse as Multi-Agent Systems (MAS), Economics, Biology and Social Sciences.
Since its inception, EGT has become a powerful mathematical framework for the modelling
and analysis of complex, dynamical MAS (Han et al.,2017,Hofbauer and Sigmund,1998,Paiva
et al.,2018), in biological, social contexts as well as computerised systems (Han,2013,Perc
et al.,2017,Pereira et al.,2021,Santos et al.,2020,Tuyls and Parsons,2007). It has been
used widely and successfully to study numerous important and challenging questions faced by
many disciplines and societies, such as: what are the mechanisms underlying the evolution of
cooperative behaviour at various levels of organisation (from genes to human society) (Nowak,
2006b,Perc et al.,2017)? How to mitigate existential risks such as those posed by climate
change (Santos and Pacheco,2011,Santos et al.,2020) or advanced Artificial Intelligence (AI)
technologies (Han et al.,2020)? What are the roles of cognition and emotions in behavioural
evolution (Han,2013,Pereira et al.,2017)?
In EGT, the payoff that agents obtain from interacting with other agents in the system is interpreted as individual fitness. An important property in EGT is so-called frequency-dependent selection, where the fitness of an individual depends not only on its strategy, but also on the composition of the population in relation to (multiple) other strategies, naturally leading to a dynamical approach. EGT originated back in 1973 with John Maynard Smith and George R. Price's formalisation of animal contests, extending classical game theory (Von Neumann and Morgenstern, 1944) to provide mathematical criteria that can be used to predict the evolutionary outcomes of interactions among competing strategies (Maynard Smith and Price, 1973).
Diverse mathematical approaches and methods for analysing EGT models have been devel-
oped over the years. These include continuous approaches such as replicator equations, which assume large population limits and usually require analysis of systems of differential equations (Hofbauer and Sigmund, 1998). In finite-size systems, stochastic approaches are required and usually approximation methods are needed (e.g. using mean-field analysis). Here computer sim-
ulations and methods from statistical physics, such as Monte Carlo simulations, are very useful
to analyse highly complex systems where analytical results are hard to achieve, for example,
when populations are distributed on complex networks (Perc et al.,2017).
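As a concrete illustration of the continuous approach, the following minimal sketch integrates the two-strategy replicator equation for a donation game; the benefit and cost values used are illustrative assumptions rather than parameters taken from any of the models cited above.

```python
def replicator_trajectory(b=3.0, c=1.0, x0=0.6, dt=0.01, steps=5000):
    """Euler integration of the two-strategy replicator equation
    x' = x (1 - x) (f_C - f_D) for a donation game, in which
    cooperators pay a cost c to give a benefit b to the co-player."""
    x = x0                       # fraction of cooperators
    traj = [x]
    for _ in range(steps):
        f_c = b * x - c          # expected payoff of a cooperator
        f_d = b * x              # expected payoff of a defector
        x += dt * x * (1.0 - x) * (f_c - f_d)
        traj.append(x)
    return traj

if __name__ == "__main__":
    traj = replicator_trajectory()
    # In the unstructured donation game defection dominates, so x -> 0.
    print(f"initial x = {traj[0]:.2f}, final x = {traj[-1]:.4f}")
```

Because defectors always out-earn cooperators in this unstructured setting, the cooperator fraction decays to zero, which is precisely why the mechanisms discussed in the following sections are of interest.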
Resorting to these complementary approaches, we have been tackling a number of fundamental research challenges in MAS and collective behaviour studies. First, we initiated the introduction of cognitive and emotional capabilities into agents in an evolving MAS, inspired by techniques and theories from AI, namely those pertaining to intention recognition, commitment, trust and guilt (Section 2). Second, we initiated baseline EGT models that capture the complex ecology of choices associated with a competition for domain supremacy using AI technology, allowing one to formally explore various regulatory proposals for AI development (Section 3). Third, we introduced cost-efficient optimisation problems in dynamical MAS on complex networks, in order to capture various stochastic aspects of MAS interactions and evolution over time (Section 4). Finally, we described our extensive analysis of the statistical properties of the number of (stable) equilibria in random evolutionary multi-player games, providing insights into generic equilibrium properties of large, dynamical MAS (Section 5).
2 Cognitive and Emotional Mechanisms for Promoting Proso-
cial Behaviours in dynamical MAS
In its simplest form, a cooperative act is metaphorically described as the act of paying a cost
to convey a benefit to someone else. If two players simultaneously decide to cooperate or not,
the best possible response will be to try to receive the benefit without paying the cost. In an
evolutionary setting, we may also wonder why natural selection would equip selfish individuals with altruistic tendencies while it incites competition between individuals and thus apparently rewards only selfish behavior. Several mechanisms responsible for promoting cooperative behav-
ior have been recently identified, ranging from kin and group ties, to different forms of reciprocity
and networked populations (see surveys in (Nowak,2006b,Perc et al.,2017,Sigmund,2010)).
Moreover, more complex strategies based on the evaluation of interactions between third
parties allow the emergence of kinds of cooperation that are immune to exploitation because
then interactions are channelled to just those who cooperate. Questions of justice and trust, with
their negative (punishment) and positive (help) incentives, are fundamental in games with large
diversified groups of individuals gifted with intention recognition capabilities. In allowing them
to choose amongst distinct behaviours based on suggestive information about the intentions of
their interaction partners—these in turn influenced by the behaviour of the individual himself—
individuals are also influenced by their tolerance to error or noise in the communication. One
hopes that, to start with, understanding these capabilities can be transformed into mechanisms
for spontaneous organization and control of swarms of autonomous robotic agents (Bonabeau
et al.,1999), these being envisaged as large populations of agents where cooperation can emerge,
but not necessarily to solve a priori given goals, as in distributed MAS.
Intention recognition The ability to recognize (or read) the intentions of others has been
observed and shown to play an important role in many cooperative interactions, both in humans
and primates (Meltzoff,2005,Tomasello,2008). Intentions and their recognition play a central
role in morality, as shown for example in studies on the doctrines of double and triple effect
(Hauser, 2007). However, most studies on the evolution of cooperation, grounded in evolutionary dynamics and game theory, have neglected the important role played by a basic form of intention recognition in behavioral evolution. In our work (Han et al., 2011a,b, 2012a, 2015a), we have explicitly addressed this issue, characterizing the dynamics emerging from a population
of intention recognizers.
Intention recognition was implemented using Bayesian Network probabilistic methods, tak-
ing into account the information of current signals of intent, as well as the mutual trust and
tolerance accumulated from previous one-on-one play experience – including how an agent’s
previous defections may influence another agent’s intent – but without resorting to informa-
tion gathered regarding agents’ overall reputation in the population. A player’s present intent
can be understood here as how he is going to behave in the next round, whether by cooperat-
ing or defecting. Intention recognition can also be learned from a corpus of prior interactions
among game strategies, where each strategy can be envisaged and detected as players’ (possibly
changing) intent to behave in a certain way (Han, 2013). In both cases, we experimented with populations with different proportions of diverse strategies in order to calculate, in particular, the minimum fraction of individuals capable of intention recognition that is required for cooperation to emerge, invade, prevail, and persist.
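To give a flavour of the approach, the sketch below applies a plain Bayes-rule update to estimate the probability that a co-player intends to cooperate in the next round, based only on its past moves. It is a deliberately simplified stand-in for the Bayesian Network models used in our work (Han et al., 2011a; Han, 2013); the prior and the `consistency` likelihood are illustrative assumptions.

```python
# Minimal Bayes-rule sketch of intent prediction: not the Bayesian Network
# model of (Han et al., 2011a), only the basic flavour of the idea.

def predict_cooperation(history, prior_coop=0.5, consistency=0.8):
    """Posterior probability that the co-player intends to cooperate next,
    given its past moves (True = cooperated) and an assumed probability
    `consistency` that a cooperative (resp. defecting) intent produces a
    cooperative (resp. defecting) move."""
    p_coop, p_defect = prior_coop, 1.0 - prior_coop
    for cooperated in history:
        like_c = consistency if cooperated else 1.0 - consistency
        like_d = 1.0 - consistency if cooperated else consistency
        p_coop, p_defect = p_coop * like_c, p_defect * like_d
        total = p_coop + p_defect          # renormalise after each observation
        p_coop, p_defect = p_coop / total, p_defect / total
    return p_coop

if __name__ == "__main__":
    # Three cooperative moves and one defection yield a posterior of ~0.94.
    print(predict_cooperation([True, True, False, True]))
```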
It is noteworthy that intention recognition techniques have been studied actively in AI for
several decades (Charniak and Goldman,1993,Han and Pereira,2013c,Sadri,2011,Sukthankar
et al., 2014), with several applications such as improving human-computer interaction, assisted living, moral reasoning, and teamwork (Han and Pereira, 2013b, Roy et al., 2007).
In most of these applications the agents engage in repeated interactions with each other. Our
results suggest that equipping the agents with an ability to recognize intentions of others can
improve their cooperation and reduce misunderstanding that can result from noise and mistakes.
The notions of intentions used in our two models have been specialized for the concrete
game-theoretical contexts in place. A more general definition, e.g. as described in Bratman’s
seminal work (Bratman, 1987), can accommodate intention changes or even abandonment.
For instance, a player can change his intention or strategy before the next interaction or round
of game takes place. That aspect was not considered in our EGT models, as players can change
their strategy only at the end of a generation (Sigmund et al., 2010). We envisage a convenient extension in that direction since our intention recognition methods—performed through Bayesian Network inference techniques, as described in (Han and Pereira, 2013a)—can cope with intention changes and abandonment.
Commitments Agents make commitments towards others when they give up options in order
to influence others. Most commitments depend on some incentive that is necessary to ensure
that an action (or even an intention) is in the agent’s interest and thus will be carried out in the
future (Gintis,2000,Nesse,2001). Asking for prior commitments can just be used as a strategy
to clarify the intentions of others, whilst at the same time manifesting our own (Han et al.,
2015a). All parties then clearly know to what they commit and can refuse such a commitment
whenever the offer is made (Han et al.,2017).
In our work (Han et al.,2017,Han,2022,Han et al.,2012b,2013a), we investigate analytically
and numerically whether costly commitment strategies, in which players propose, initiate and
honor a deal, are viable strategies for the evolution of cooperative behavior. Resorting to EGT mathematical analysis and simulations, we have shown that when the cost of arranging a com-
mitment is justified with respect to the benefit of cooperation, substantial levels of cooperation
can be achieved, especially when one insists on sharing the arrangement cost (Han et al.,2013a).
On the one hand, such commitment proposers can get rid of fake committers by proposing a
strong enough compensation cost. On the other hand, they can maintain a sufficient advantage
over the commitment free-riders, because a commitment proposer will cooperate with players
alike herself, while the latter defect among themselves. We have also compared the commitment
strategy with the simple costly punishment strategy—an important one for the evolution of
cooperation (Fehr and Gachter,2002)—where no prior agreements are made. The results show
that the former strategy leads to a higher level of cooperation than the latter. Furthermore,
we have shown that these observations regarding evolutionary viability of commitment-based
strategies are also applicable to other collective behaviours including coordination (Ogbo et al.,
2021) and AI safety (Han et al.,2022).
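The core of the commitment mechanism can be illustrated with a simple pairwise payoff sketch: a proposer pays an arrangement cost to set up a deal before a one-shot donation game, and a partner who accepts the deal but then defects owes the proposer an agreed compensation. This is a simplified illustration of the mechanism, not the exact payoff structure or strategy space of (Han et al., 2013a); all parameter values are illustrative assumptions.

```python
# Simplified pairwise sketch of the commitment mechanism discussed above.
B, C = 4.0, 1.0          # benefit and cost of cooperation (donation game)
EPS, DELTA = 0.25, 4.0   # arrangement cost and agreed compensation

def proposer_payoff(partner):
    """Payoff of a commitment proposer against one partner type."""
    if partner == "non_acceptor":          # deal refused: no game is played
        return -EPS
    if partner == "fake_committer":        # accepts, then defects, pays DELTA
        return -EPS - C + DELTA
    if partner == "committed_cooperator":  # accepts and cooperates
        return -EPS + B - C
    raise ValueError(partner)

if __name__ == "__main__":
    for p in ("committed_cooperator", "fake_committer", "non_acceptor"):
        print(f"{p:22s} -> proposer gets {proposer_payoff(p):+.2f}")
    fake_gain = B - DELTA    # what a fake committer earns against a proposer
    honest_gain = B - C      # what a committed cooperator earns
    print(f"fake committer gains {fake_gain:+.2f} vs honest {honest_gain:+.2f}")
```

With a large enough compensation DELTA, exploiting a proposer no longer pays off for fake committers, which is the sense in which proposers can "get rid of" them in the models described above.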
It is noteworthy that there has been an extensive literature of AI and MAS research on
commitment, e.g., (Castelfranchi and Falcone,2010,Chopra and Singh,2009,Singh,2013,
Wooldridge and Jennings,1999), but the main concern therein is how to formalize different
aspects of commitment and how a commitment mechanism can be implemented in multi-agent
interactions to enhance these (e.g. for improved collaborative problem solving (Wooldridge and
Jennings,1999)), especially in the context of classical game theory. Our work would provide
insights into the design of MAS that rely on commitments or punishment in order to promote
collective behaviour among agents (Hasan and Raja,2013).
Trust in autonomous systems (hybrid MAS) The actions of intelligent agents, such as
chatbots, recommender systems, and virtual assistants, are typically not fully transparent to the
user (Beldad et al.,2016). Consequently, users take the risk that such agents act in ways opposed
to the users’ preferences or goals (Dhakal et al.,2022,Luhmann,1979). It is often argued that
people use trust as a cognitive shortcut to reduce the complexity of such interactions.
In our recent work (Han et al.,2021b), we study this by using EGT modelling to examine the
viability of trust-based strategies in the context of an iterated prisoner’s dilemma game (Sigmund
et al.,2010). We show that these strategies can reduce the opportunity cost of verifying whether
the action of their co-player was actually cooperative, and out-compete strategies that are always
conditional, such as Tit-for-Tat. We argue that the opportunity cost of checking the action of the
co-player is likely to be greater when the interaction is between people and intelligent artificial
agents, because of the reduced transparency of the agent.
In our work, trust-based strategies are reciprocal strategies that cooperate as long as the
other player is observed to be cooperating. Unlike classic reciprocal strategies, once mutual
cooperation has been observed for a threshold number of rounds they stop checking their co-
player’s behaviour every round, and instead only check it with some probability. By doing so,
they reduce the opportunity cost of verifying whether the action of their co-player was actually
cooperative. We demonstrate that these trust-based strategies can out-compete strategies that
are always conditional, such as Tit-for-Tat, when the opportunity cost is non-negligible.
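A minimal sketch of such a trust-based strategy is given below; the exact parameterisation and opportunity-cost accounting in (Han et al., 2021b) differ, so the threshold, checking probability and cost used here are illustrative assumptions.

```python
import random

class TrustBasedPlayer:
    """Cooperate conditionally, but once mutual cooperation has lasted for
    `threshold` consecutive rounds, verify the co-player's action only with
    probability `p_check`, saving the per-round verification cost."""

    def __init__(self, threshold=3, p_check=0.2, check_cost=0.1):
        self.threshold = threshold
        self.p_check = p_check
        self.check_cost = check_cost
        self.mutual_coop_streak = 0
        self.believes_partner_cooperates = True
        self.total_check_cost = 0.0

    def move(self):
        # Cooperate as long as the partner is believed to be cooperating.
        return self.believes_partner_cooperates

    def observe(self, my_move, partner_move):
        trusting = self.mutual_coop_streak >= self.threshold
        if (not trusting) or (random.random() < self.p_check):
            # Pay the opportunity cost of verifying the co-player's action.
            self.total_check_cost += self.check_cost
            self.believes_partner_cooperates = partner_move
        # Track the (believed) run of mutual cooperation.
        if my_move and self.believes_partner_cooperates:
            self.mutual_coop_streak += 1
        else:
            self.mutual_coop_streak = 0

if __name__ == "__main__":
    player = TrustBasedPlayer()
    for _ in range(50):
        my_move = player.move()
        player.observe(my_move, partner_move=True)   # partner always cooperates
    print(f"verification cost over 50 rounds: {player.total_check_cost:.2f}")
```

Against a cooperative partner, the strategy pays the verification cost in only a fraction of rounds after the trust threshold is reached, whereas a purely conditional strategy such as Tit-for-Tat would pay it every round.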
We argue that this cost is likely to be greater when the interaction is between people and
intelligent agents within a hybrid MAS, since the interaction becomes less transparent to the user
(e.g. when it is done over the internet), and artificial agents have limited capacity to explain their
actions compared to humans (Pu and Chen,2007). Consequently, we expect people to use trust-
based strategies more frequently in interactions with intelligent agents. Our results provide new,
important insights into the design of mechanisms for facilitating interactions between humans
and intelligent agents, where trust is an essential factor.
Trust is a commonly observed mechanism in human interactions, and discussions on the role
of trust are being extended to social interactions between humans and intelligent machines (An-
dras et al.,2018). It is therefore important to understand how people behave when interacting
with those machines; in particular, whether and when they might exhibit trust behaviour towards them. Answering this is crucial for designing mechanisms to facilitate human-intelligent ma-
chine interactions, e.g. in engineering pro-sociality in a hybrid society of humans and machines
(Paiva et al.,2018).
Guilt Machine ethics, involving the capacity for artificial intelligence to act morally, is an open
project for scientists and engineers (Pereira and Lopes,2020). One important concern is how
to represent emotions that are thought to modulate human moral behaviour, such as guilt, in
computational models. Upon introspection, guilt is present as a feeling of being worthy of blame
for a moral offence. Burdened with guilt, an agent may then act to restore a blameless internal
state in which this painful emotion is no longer present.
Inspired by psychological and evolutionary studies, we have constructed an EGT model
representing guilt in order to study its role in promoting pro-social behaviour (Pereira et al.,
2017). We modelled guilt in terms of two characteristics. First, guilt involves a record of
transgressions which is formalised by the number of offences. Second, guilt involves a threshold
over which the guilty agent must alleviate the associated emotional pain, in the case of our
models, through apology (Han et al.,2013b), and also by involving self-punishment (Pereira
et al.,2017), as required by the guilty feelings, both of which affect the payoff for the guilty
agent. With this work, we were able to show that cooperation does not emerge when agents
alleviate guilt without considering their co-players’ attitudes about the alleviation of guilt too.
In that case, guilt-prone agents are easily dominated by agents who do not express guilt or who
are without motivation to alleviate their own guilt. When, on the other hand, the tendency
to alleviate guilt is mutual, and the guilt-burdened agent alleviates guilt in interactions with
co-players who also act to alleviate guilt when similarly burdened, then cooperation thrives.
From a MAS perspective, including mixed social-technological communities encompassing
potentially autonomous artificial agents, and invoking the so-called “value alignment” prob-
lem (Gabriel,2020), our models confirm that conflicts can be avoided when morally salient
emotions, like guilt, help to guide participants toward acceptable behaviours. In this context,
systems involving possible future artificial moral agents may be designed to include guilt, so
as to align agent-level behaviour with human expectations, thereby resulting in overall social
benefits through improved cooperation, as evinced by our prospective work on modelling guilt.
3 Governance of Artificial Intelligence development: A Game-
Theoretical Approach
Rapid technological advancements in Artificial Intelligence (AI), together with the growing de-
ployment of AI in new application domains such as robotics, face recognition, self-driving cars, and genetics, are generating an anxiety which makes companies, nations and regions think they should respond competitively (Cave and Ó hÉigeartaigh, 2018, Lee, 2018). AI appears for in-
stance to have instigated a race among chip builders, simply because of the requirements it
imposes on the technology. Governments are furthermore stimulating economic investments in
AI research and development as they fear missing out, resulting in a racing narrative that further increases the anxiety among stakeholders (Cave and Ó hÉigeartaigh, 2018).
Races for supremacy in a domain through AI may however have detrimental consequences
since participants in the race may well ignore ethical and safety checks in order to speed up the
Figure 1. Frequency of unsafe behaviour as a function of development speed (s) and the
disaster risk (pr). Panel A: in the absence of incentives (Han et al., 2020), the parameter space can be split into three regions. In regions (I) and (III), safe and unsafe/innovation, respectively, are the preferred collective outcomes, also selected by natural selection, so no regulation is required. Region (II) requires regulation, as safe behaviour is preferred but not the one selected. Panel B: when unsafe behaviour is sanctioned unconditionally (Han et al., 2021a), unsafe behaviour is reduced in region II, but over-regulation occurs in region III, reducing beneficial innovation. Panel C: when unsafe behaviour is sanctioned only in the presence of a voluntary commitment (Han et al., 2022), unsafe behaviour is significantly reduced in region II while over-regulation is avoided.
development and reach the market first. AI researchers and governance bodies, such as the EU, are urging that both the normative and the social impact of the major technological advancements concerned be considered together (European Commission, 2020). However, given the breadth and depth
of AI and its advances, it is not an easy task to assess when and which AI technology in a
concrete domain needs to be regulated. This issue was, among others, highlighted in the recent
EU White Paper on AI (European Commission,2020) and the UK National AI strategy.
Several proposals have been made for mechanisms on how to avoid, mediate, or regulate the development and deployment of AI (Cave and Ó hÉigeartaigh, 2018, Cimpeanu et al., 2022, O'Keefe et al., 2020). Essentially, regulatory measures such as restrictions and incentives are proposed to limit
harmful and risky practices in order to promote beneficial designs (Baum,2017). Examples
include financially supporting the research into beneficial AI and making AI companies pay
fines when found liable for the consequences of harmful AI.
Although such regulatory measures may provide solutions for particular scenarios, one needs
to ensure that they do not overshoot their targets, leading to a stifling of novel innovations and hindering investments in development along novel directions, as these may be perceived to be too risky (Hadfield, 2017, Lee, 2018, Wooldridge, 2020). Worries have been expressed by different
organisations and academic societies that too strict policies may unnecessarily affect the benefits
and societal advances that novel AI technologies may have to offer. Moreover, regulations affect big and small tech companies differently: a highly regulated domain makes it more difficult for small new start-ups to compete, introducing inequality and dominance of the market by a few big players
(Lee,2018). It has been emphasised that neither over-regulation nor a laissez-faire approach
suffices when aiming to regulate AI technologies. In order to find a balanced answer, one clearly
needs to have first an understanding of how a competitive development dynamic actually could
work and how governance choices impact this dynamic, a task well-suited for dynamic systems
or agent-based models.
In our recent work (Cimpeanu et al.,2022,Han et al.,2022,2021a,2020), we examine this
problem using EGT modelling, see Figure 1. We first developed a baseline model describing a de-
velopment competition where technologists can choose a safe (SAFE) vs risk-taking (UNSAFE)
course of development (Han et al.,2020). Namely, it considers that to reach domain supremacy
through AI in a certain domain, a number of development steps or technological advancement
rounds are required (Han et al.,2020). In each round the technologists (or players) need to
choose between one of two strategic options: to follow safety precautions (the SAFE action) or
ignore safety precautions (the UNSAFE action). Because it takes more time and more effort to
comply with precautionary requirements, playing SAFE is not just costlier, but implies slower
development speed too, compared to playing UNSAFE. Moreover, there is a probability that a
disaster occurs if UNSAFE developments take place during this competition (see (Han et al.,
2020) for a full description).
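The basic trade-off can be illustrated with a back-of-the-envelope sketch for a single developer: safe development is slower and costly but never triggers a disaster, whereas unsafe development is faster but risks forfeiting the prize. The actual model in (Han et al., 2020) is richer (pairwise races, intermediate benefits, and prize sharing); all numerical values below, and the per-round treatment of the disaster risk, are illustrative assumptions.

```python
# Illustrative back-of-the-envelope sketch of the SAFE vs UNSAFE trade-off.
# Not the payoff structure of (Han et al., 2020); all values are assumed.

def expected_value(unsafe, W=10, c=1.0, s=2.0, B=50.0, p_r=0.1):
    """Expected net gain of completing W development steps.
    SAFE: 1 step per round at cost c. UNSAFE: s steps per round, free, but in
    each round the whole venture fails (payoff 0) with probability p_r."""
    if not unsafe:
        rounds = W                  # one step per round
        return B - c * rounds
    rounds = W / s                  # faster development
    p_no_disaster = (1.0 - p_r) ** rounds
    return p_no_disaster * B        # disaster forfeits the prize

if __name__ == "__main__":
    for p_r in (0.02, 0.2, 0.5):
        safe = expected_value(False, p_r=p_r)
        unsafe = expected_value(True, p_r=p_r)
        print(f"p_r={p_r:4.2f}: SAFE={safe:6.2f}  UNSAFE={unsafe:6.2f}")
```

In this toy calculation the unsafe course pays off only when the per-round disaster risk is small, echoing the qualitative split into regions shown in Figure 1.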
We demonstrate that unconditional sanctioning will negatively influence social welfare in
certain conditions of a short-term race towards domain supremacy through AI technology (Han
et al.,2021a), leading to over-regulation of beneficial innovation (see Figure 1B). Since data to
estimate the risk of a technology is usually limited (especially at an early stage of its development
or deployment), simple sanctioning of unsafe behaviour (or reward of safe behaviour) could not
fully address the issue.
To solve this critical over-regulation dilemma in AI development, we propose an alternative
approach, which is to allow technologists or race participants to voluntarily commit themselves
to safe innovation procedures, signaling to others their intentions (Han et al.,2015a,Nesse,
2001). Specifically, this bottom-up, binding agreement (or commitment) is established for those
who want to take a safe choice, with sanctioning applied to violators of such an agreement. It
is shown that, by allowing race participants to freely pledge their intentions and enter (or not) into bilateral commitments to act safely and avoid risks, thereby accepting to be sanctioned in case of misbehavior, high levels of the behaviour most beneficial for the whole are achieved in all regions of the parameter space, see Figure 1C.
In (Cimpeanu et al.,2022), we explore how different interaction structures among race par-
ticipants can alter collective choices and requirements for regulatory actions. We show that when
participants portray a strong diversity in terms of connections and peer-influence (as modelled
through scale free networks), the conflicts that exist in homogeneous settings (Han et al.,2020)
are significantly reduced, thereby reducing the need for taking regulatory actions.
Overall, our results are directly relevant for the design of self-organized AI governance mech-
anisms and regulatory policies that aim to ensure an ethical and responsible AI technology
development process (such as the one proposed in the EU White Paper).
It is noteworthy that all our above-mentioned works focused on the binary extremes of race
participants’ behaviour, safe or unsafe development, in an effort to focus an already expansive
problem into a manageable discussion. The addition of conditional, mixed, or random strategies
could provide the basis for a novel piece of work. Also, as observed with conditionally safe players
in the well-mixed scenario, we envisage that these additions would show little to no effect in
the early regime, with the opposite being true for the late regime, at least in homogeneous
network settings. Finally, it would be important to clarify the combined effects of incentive
and/or commitment mechanisms within a complex network setting.
4 Cost-efficient Interference for Promoting Collective Behaviours
in Complex Networks
As discussed, the problem of promoting the evolution of collective behaviour within populations
of self-regarding individuals has been intensively investigated across diverse fields of behavioural,
social and computational sciences (Nowak, 2006a, Perc et al., 2017). In most studies, a desired collective behaviour, such as cooperation, coordination, trust or fairness, is assumed to emerge
from the combined actions of participating individuals within the populations, without taking
into account the possibility of external interference and how it can be performed in a cost-
efficient way. However, in many scenarios, such behaviours are advocated and promoted by an
exogenous decision maker, who is not part of the system (e.g. the United Nations interferes in political systems for conflict resolution, or the World Wildlife Fund organisation interferes in
ecosystems to maintain biodiversity). Thus, a new set of heuristics capable of engineering a
desired collective behaviour in a self-organised MAS is required (Paiva et al.,2018).
In our work, we bridged this gap by employing EGT analysis and ABM simulations, to
study cost-efficient interference strategies for enhancing cooperation, fairness and AI safety in
the context of social games, for both well-mixed (Duong and Han,2021,Han and Tran-Thanh,
2018,Han et al.,2015b) and networked populations (including square lattice and scale free
networks) (Cimpeanu et al.,2019,2021,Han et al.,2018).
In a well-mixed population, each player interacts with all others in the population while in
a structured population the player interacts with its immediate neighbors. A player’s fitness
is its averaged payoff over all its interactions, which is then used for strategy update (through
social learning) (Sigmund et al.,2010). Namely, a player A with fitness fAchooses to copy
the strategy of a randomly selected player in the population (well-mixed) or randomly selected
neighbor (structured) with a probability given by the Fermi function, (1+ eβ(fA−fB))−1, where β
represents the intensity of selection (Traulsen et al.,2006). When β= 0 corresponds to neutral
drift while β→ ∞ leads to increasingly deterministic selection. Weak or even close to neutral
selections (small β) are abundant in nature, while the strong selection regime has been reported
as predominant in social settings. As an alternative to this stochastic update rule, one can also
consider a deterministic update in which agents copy, if advantageous, the most successful player
in their neighbourhood.
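The stochastic update rule can be written compactly as follows; this is a generic sketch of the pairwise-comparison (Fermi) rule described above, with the players' payoffs assumed to be given.

```python
import math, random

def fermi_imitation_prob(f_a, f_b, beta):
    """Probability that A imitates B: (1 + exp(beta * (f_a - f_b)))**-1."""
    return 1.0 / (1.0 + math.exp(beta * (f_a - f_b)))

def update_strategy(strategies, fitness, beta, rng=random):
    """One asynchronous social-learning step in a well-mixed population."""
    a, b = rng.sample(range(len(strategies)), 2)   # focal player and model
    if rng.random() < fermi_imitation_prob(fitness[a], fitness[b], beta):
        strategies[a] = strategies[b]

if __name__ == "__main__":
    # beta = 0 gives probability 1/2 (neutral drift); large beta makes
    # imitation of fitter models nearly deterministic.
    print(fermi_imitation_prob(1.0, 2.0, beta=0.0))    # 0.5
    print(fermi_imitation_prob(1.0, 2.0, beta=10.0))   # ~1.0
```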
An interference strategy or scheme can be generally defined as a sequence of decisions about
which players in the population to invest in (i.e. reward the player an amount, denoted by θ),
in order to achieve the highest level of cooperation while minimising the total cost of invest-
ment. These decisions can be made by considering different aspects of the population such as its
global statistics and/or its structural properties. In the context of a well-mixed population, an
interference scheme solely depends on its composition (e.g., how many cooperator and defector players there are at the time of decision making). In this case, we have derived analytical con-
ditions for which a general interference scheme can guarantee a given level of desired behaviour
while at the same time minimising the total cost of investment (e.g., for rewarding cooperative
behaviours), and show that the results are highly sensitive to the intensity of selection by in-
terference (Duong and Han,2021,Han and Tran-Thanh,2018). Moreover, we have studied a
specific class of interference strategies that make investments whenever the number of players with a desired behaviour (e.g., cooperative or fair players) does not exceed a certain threshold, denoted by t (with t ∈ {1, . . . , N − 1}), showing that there is a wide range of t for which it outperforms standard institutional incentive strategies—which unconditionally interfere in the system regardless of its composition, corresponding to t = N − 1 (Chen et al., 2015).
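The sketch below puts the pieces together for the well-mixed case: a donation game, Fermi social learning, and an institution that rewards every cooperator with an amount θ whenever the number of cooperators is at most a threshold t. The game, θ, t and β values are illustrative assumptions; the precise schemes and cost accounting analysed in (Duong and Han, 2021; Han and Tran-Thanh, 2018) differ.

```python
import math, random

N, B_, C_ = 50, 2.0, 1.0                  # population size, benefit, cost
THETA, T_THRESH, BETA = 1.2, 40, 1.0      # reward, threshold, selection intensity

def payoffs(pop):
    """Average donation-game payoff of each player against all co-players."""
    n_c = sum(pop)
    f = []
    for coop in pop:
        others_c = n_c - (1 if coop else 0)
        pay = B_ * others_c / (N - 1)      # benefits received
        if coop:
            pay -= C_                      # cost of cooperating with everyone
        f.append(pay)
    return f

def step(pop, total_cost):
    """One interference-plus-imitation step; returns the updated total cost."""
    f = payoffs(pop)
    n_c = sum(pop)
    if 0 < n_c <= T_THRESH:                # interfere only at or below the threshold
        f = [fi + THETA if coop else fi for fi, coop in zip(f, pop)]
        total_cost += THETA * n_c
    a, b = random.sample(range(N), 2)
    if random.random() < 1.0 / (1.0 + math.exp(BETA * (f[a] - f[b]))):
        pop[a] = pop[b]
    return total_cost

if __name__ == "__main__":
    pop = [i < N // 2 for i in range(N)]   # start with 50% cooperators
    cost = 0.0
    for _ in range(20000):
        cost = step(pop, cost)
    print(f"final cooperators: {sum(pop)}/{N}, total cost: {cost:.1f}")
```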
With a structured population, individuals (even of the same strategy) might reside in differ-
ent kinds of neighborhood (with different cooperativeness levels), and therein local information
might be useful to enhance cost-efficiency and cooperation. To this end, we test several interfer-
ence paradigms (Cimpeanu et al.,2019,2021,Han et al.,2018) that make investment decisions
based on a player’s current level of desirable behaviour (e.g., the number of cooperators in the
neighborhood), and compare their efficiency with the population-based strategies (as in the well-mixed case). Furthermore, in this context, we study the impact of social diversity, modelled using heterogeneous networks of interaction (e.g. scale-free networks). We show that
social diversity indeed substantially influences the choice of investment approaches available to
institutions. Investment is not trivial in these settings, contrary to the findings in well-mixed
and lattice populations (Cimpeanu et al.,2021). Counterintuitively, incentivising positive be-
haviour can lead to the exploitation of cooperators, harming pro-sociality in lieu of fostering it.
We observe that highly clustered scale-free networks make it easy to select the most effective
candidates for receiving endowments.
The interference mechanisms we have proposed thus far have advanced the literature on
external interference, but our contribution in this regard remains incipient. Future works include
analysis of more complex interference strategies, such as those that vary the cost of investment over time or combine different forms of incentives (Chen et al., 2015, Han, 2016). We might also resort to more complex optimisation algorithms and control theory techniques to solve the bi-objective optimisation problem posed by cost-effective interference, thereby identifying solutions that are
robust across a wider range of MAS interaction settings.
5 Equilibrium Statistics in Random Evolutionary Games
Random evolutionary games, in which the payoff entries are random variables, have been em-
ployed extensively to model multi-agent strategic interactions in which very limited information
about these interactions is available, or where the environment changes so rapidly and frequently that one cannot accurately describe the payoffs of the agents' interactions (Duong and Han, 2016, Duong
et al.,2019,Gokhale and Traulsen,2010,Han et al.,2012c). Equilibrium points of such a
dynamical system are the compositions of strategy frequencies where all the strategies have
the same average accumulated payoff (or fitness). They thus predict potential co-existence of
different strategic behaviours in a dynamical MAS.
In random games, due to the randomness of the payoff entries, it is essential to study statis-
tical properties of equilibria. How to determine the distribution of internal equilibria in random
evolutionary games is an intensely investigated subject with numerous practical ramifications
in diverse fields including MAS, evolutionary biology, social sciences and economics, providing
essential understanding of complexity in a dynamical system, such as its behavioural, cultural
or biological diversity and the maintenance of polymorphism. Properties of equilibrium points,
particularly the probability of observing the maximal number of equilibrium points, the attain-
ability and stability of the patterns of evolutionarily stable strategies, have been studied (Gokhale
and Traulsen,2010,Han et al.,2012c). However, as these prior works used a direct approach
that consists of solving a system of polynomial equations, the mathematical analysis was mostly
restricted to evolutionary games with a small number of players, due to the impossibility of
solving general polynomial equations of a high degree (according to Abel–Ruffini’s theorem).
In our work, we analyse random evolutionary games with an arbitrary number of players
(Duong and Han,2016,Duong et al.,2019). The key technique that we develop is to connect
the number of equilibria in an evolutionary game to the number of real roots of a system of
multi-variate random polynomials (Edelman and Kostlan,1995). Assuming that we consider
d-player n-strategy evolutionary games, then the system consists of n−1 polynomial equations
of degree d−1:
\[
\sum_{\substack{0 \le k_1,\dots,k_{n-1} \le d-1,\\ \sum_{i=1}^{n-1} k_i \le d-1}} \beta^{i}_{k_1,\dots,k_{n-1}} \binom{d-1}{k_1,\dots,k_n} \prod_{i=1}^{n-1} y_i^{k_i} = 0,
\]
for $i = 1, \dots, n-1$, where $y_i = x_i/x_n$ denotes the ratio between the frequencies of strategy $i$ and strategy $n$. Here $\beta^{i}_{k_1,\dots,k_{n-1}} := \alpha^{i}_{k_1,\dots,k_n} - \alpha^{n}_{k_1,\dots,k_n}$, where $\alpha^{i_0}_{k_1,\dots,k_n} := \alpha^{i_0}_{i_1,\dots,i_{d-1}}$ is the payoff of the focal player and $k_i$, $1 \le i \le n$, with $\sum_{i=1}^{n} k_i = d-1$, is the number of players using strategy $i$ in $\{i_1, \dots, i_{d-1}\}$. In (Duong and Han, 2016), we analyze the mean number
E(n, d) and the expected density f(n, d) of internal equilibria in a general d-player n-strategy
evolutionary game when the individuals’ payoffs are independent, normally distributed. We
provide computationally implementable formulas of these quantities for the general case and
characterize their asymptotic behaviour for the two-strategy games (i.e. E(2, d) and f(2, d)),
estimating their lower and upper bounds as d increases. For instance, under certain assumptions on the payoffs, we obtain:
• Asymptotic behaviour of E(2, d): $\sqrt{d-1} \lesssim E(2, d) \lesssim \sqrt{d-1}\,\ln(d-1)$. As a consequence, $\lim_{d\to\infty} \frac{\ln E(2, d)}{\ln(d-1)} = \frac{1}{2}$.
• Explicit formula of E(n, 2): $E(n, 2) = \frac{1}{2^{n-1}}$.
For a general d-player n-strategy game, as supported by extensive numerical results, we describe
a conjecture regarding the asymptotic behaviours of E(n, d) and f(n, d). We also show that the
probability of seeing the maximal possible number of equilibria tends to zero when dor n
respectively goes to infinity and that the expected number of stable equilibria is bounded within
a certain interval.
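The growth of E(2, d) can also be checked numerically. Under the assumption that the payoff differences β_k are i.i.d. standard normal variables, internal equilibria of a d-player two-strategy game correspond to the positive real roots of the random polynomial $\sum_k \beta_k \binom{d-1}{k} y^k$, which the following Monte Carlo sketch counts directly; the sample sizes and tolerances are illustrative choices.

```python
import numpy as np
from math import comb

def count_internal_equilibria(d, rng):
    """Count positive real roots of P(y) = sum_k beta_k * C(d-1, k) * y^k,
    with beta_k drawn i.i.d. from a standard normal distribution."""
    beta = rng.standard_normal(d)                     # beta_0, ..., beta_{d-1}
    coeffs = [beta[k] * comb(d - 1, k) for k in range(d)]
    roots = np.roots(coeffs[::-1])                    # np.roots wants highest degree first
    return sum(1 for r in roots if abs(r.imag) < 1e-7 and r.real > 0)

def estimate_E2d(d, samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    return np.mean([count_internal_equilibria(d, rng) for _ in range(samples)])

if __name__ == "__main__":
    for d in (2, 3, 5, 10, 20):
        print(f"d = {d:2d}: estimated E(2, d) ~ {estimate_E2d(d):.3f}")
```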
In (Duong et al.,2018), we generalize our analysis for random evolutionary games where the
payoff matrix entries are correlated random variables. In social and biological contexts, correla-
tions may arise in various scenarios, particularly when there is environmental randomness and interaction uncertainty, such as when individual contributions are correlated with the surrounding
contexts (e.g. due to limited resource). We establish a closed formula for the mean numbers of
internal (stable) equilibria and characterize the asymptotic behaviour of this important quantity
for large group sizes and study the effect of the correlation. The results show that decreasing
the correlation among payoffs (namely, of a strategist for different group compositions) leads to
larger mean numbers of (stable) equilibrium points, suggesting that the system or population
behavioral diversity can be promoted by increasing independence of the payoff entries.
As a further development, in (Duong et al.,2019) we derive a closed formula for the distribu-
tion of internal equilibria, for both normal and uniform distributions of the game payoff entries.
We also provide several universal upper and lower bound estimates, which are independent of
the underlying payoff distribution, for the probability of obtaining a certain number of internal
equilibria. The distribution of equilibria provides more elaborate information about the level
of complexity or the number of different states of biodiversity that will occur in a dynamical
system, compared to what is obtained with the expected number of internal equilibria.
In short, by connecting EGT to random polynomial theory, we have achieved new results on
the expected number and distribution of internal equilibria in multi-player multi-strategy games.
Our studies provide new insights into the overall complexity of dynamical MAS, as the numbers
of players and strategies in a MAS interaction increase. As the theory of random polynomials
is rich, we expect that our novel approach can be extended to obtain results for other more
complex models in population and learning dynamics such as the replicator-mutator equation
and evolutionary games with environmental feedback.
6 Concluding Remarks
Since its inception, research in EGT has been both exciting and challenging, as well as highly
rewarding and inspiring. Analysing large-scale dynamical MAS of multiple interacting agents,
with diverse strategic behaviours in co-presence, is highly complex and requires innovative com-
binations of diverse mathematical methods and simulation techniques (Perc et al.,2017). On
the other hand, EGT research has proven extremely powerful and found its applications in so
many fields, leading to important findings often being reported in prestigious venues.
Despite the many years of active research, there are still a number of important open prob-
lems in EGT research. The problem of explaining the mechanisms of collective behaviour such as
cooperation and altruism, is still far from settled (Pennisi,2005). Also, extending and generalis-
ing the existing set of mathematical techniques and modelling tools to capture and understand
realistic and ever more complex systems, is crucial. For example, modern societies become in-
creasingly more convoluted with the advancement of technologies, changing the way humans live
and interact with others. How EGT can be applied to model this hybrid society and understand
16
its dynamics is very challenging (Paiva et al.,2018). But solving it can prove very rewarding as
it can provide insights to design appropriate mechanisms to ensure the greatest benefit for our
societies, or at least to avoid existential risks that we might otherwise have to face.
7 Acknowledgement
I would like to thank my group members (both current and former), Theodor Cimpeanu, Marcus Krellner, Cedric Perret, Bianca Ogbo, Zainab Alalawi, and Aiman Elragig, and my collabora-
tors, Luis Moniz Pereira, Tom Lenaerts, Francisco C. Santos, Hong Duong, Long Tran-Thanh,
Alessandro Di Stefano, and Simon Powers, who have made all the works reported here possible.
T.A.H. is supported by Future of Life Institute AI safety grant (RFP2-154) and Leverhulme
Research Fellowship (RF-2020-603/9).
References
Andras, P., Esterle, L., Guckert, M., Han, T. A., Lewis, P. R., Milanovic, K., Payne, T., Perret,
C., Pitt, J., Powers, S. T., Urquhart, N., and Wells, S. (2018). Trusting Intelligent Machines:
Deepening Trust Within Socio-Technical Systems. IEEE Technology and Society Magazine,
37(4):76–83.
Axelrod, R. (1984). The Evolution of Cooperation. Basic Books, New York.
Baum, S. D. (2017). On the promotion of safe and socially beneficial artificial intelligence. AI
& Society, 32(4):543–551.
Beldad, A., Hegner, S., and Hoppen, J. (2016). The effect of virtual sales agent (VSA) gender
– product gender congruence on product advice credibility, trust in VSA and online vendor,
and purchase intention. Computers in Human Behavior, 60:62–72.
Bonabeau, E., Dorigo, M., and Theraulaz, G. (1999). Swarm Intelligence: From Natural to
Artificial Systems. Oxford University Press, USA.
Bratman, M. E. (1987). Intention, Plans, and Practical Reason. The David Hume Series, CSLI.
Castelfranchi, C. and Falcone, R. (2010). Trust Theory: A Socio-Cognitive and Computational
Model (Wiley Series in Agent Technology). Wiley.
Cave, S. and Ó hÉigeartaigh, S. (2018). An AI Race for Strategic Advantage: Rhetoric and
Risks. In AAAI/ACM Conference on Artificial Intelligence, Ethics and Society, pages 36–40.
Charniak, E. and Goldman, R. P. (1993). A Bayesian model of plan recognition. Artificial
Intelligence, 64(1):53–79.
Chen, X., Sasaki, T., Brännström, Å., and Dieckmann, U. (2015). First carrot, then stick: how
the adaptive hybridization of incentives promotes cooperation. Journal of The Royal Society
Interface, 12(102):20140935.
Chopra, A. K. and Singh, M. P. (2009). Multiagent commitment alignment. In AAMAS’2009,
pages 937–944.
Cimpeanu, T., Han, T. A., and Santos, F. C. (2019). Exogenous Rewards for Promoting Coop-
eration in Scale-Free Networks. In ALIFE 2019, pages 316–323. MIT Press.
Cimpeanu, T., Perret, C., and Han, T. A. (2021). Cost-efficient interventions for promoting
fairness in the ultimatum game. Knowledge-Based Systems, 233:107545.
Cimpeanu, T., Santos, F. C., Pereira, L. M., Lenaerts, T., and Han, T. A. (2022). Artificial
intelligence development races in heterogeneous settings. Scientific Reports, 12(1):1–12.
Dhakal, S., Chiong, R., Chica, M., and Han, T. A. (2022). Evolution of cooperation and trust
in an n-player social dilemma game with tags for migration decisions. Royal Society Open
Science, 9(5):212000.
Duong, M. H. and Han, T. A. (2016). Analysis of the expected density of internal equilibria
in random evolutionary multi-player multi-strategy games. Journal of Mathematical Biology,
73(6):1727–1760.
Duong, M. H. and Han, T. A. (2021). Cost efficiency of institutional incentives for promoting
cooperation in finite populations. Proceedings of the Royal Society A, 477(2254):20210568.
Duong, M. H., Tran, H. M., and Han, T. A. (2018). On the expected number of internal
equilibria in random evolutionary games with correlated payoff matrix. Dynamic Games and
Applications.
Duong, M. H., Tran, H. M., and Han, T. A. (2019). On the distribution of the number of internal
equilibria in random evolutionary games. Journal of Mathematical Biology, 78(1):331–371.
Edelman, A. and Kostlan, E. (1995). How many zeros of a random polynomial are real? Bull.
Amer. Math. Soc. (N.S.), 32(1):1–37.
European Commission (2020). White paper on Artificial Intelligence – A European approach
to excellence and trust. Technical report, European Commission.
Fehr, E. and Gachter, S. (2002). Altruistic punishment in humans. Nature, 415:137–140.
Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and machines, 30(3):411–
437.
Gintis, H. (2000). Game Theory Evolving. Princeton University Press, Princeton.
Gokhale, C. S. and Traulsen, A. (2010). Evolutionary games in the multiverse. Proc. Natl. Acad.
Sci. U.S.A., 107(12):5500–5504.
Hadfield, G. K. (2017). Rules for a flat world: why humans invented law and how to reinvent it
for a complex global economy. Oxford University Press.
Han, T., Pereira, L. M., and Lenaerts, T. (2017). Evolution of commitment and level of partici-
pation in public goods games. Autonomous Agents and Multi-Agent Systems, 31(3):561–583.
Han, T. A. (2013). Intention Recognition, Commitments and Their Roles in the Evolution of
Cooperation: From Artificial Intelligence Techniques to Evolutionary Game Theory Models.
Springer.
Han, T. A. (2016). Emergence of social punishment and cooperation through prior commitments.
In AAAI, pages 2494–2500.
Han, T. A. (2022). Institutional incentives for the evolution of committed cooperation: ensuring
participation is as important as enhancing compliance. Journal of The Royal Society Interface,
19(188):20220036.
Han, T. A., Lenaerts, T., Santos, F. C., and Pereira, L. M. (2022). Voluntary safety commitments
provide an escape from over-regulation in ai development. Technology in Society, 68:101843.
Han, T. A., Lynch, S., Tran-Thanh, L., and Santos, F. C. (2018). Fostering cooperation in
structured populations through local and global interference strategies. In IJCAI-ECAI, pages
289–295. AAAI Press.
Han, T. A. and Pereira, L. M. (2013a). Context-dependent incremental decision making scruti-
nizing the intentions of others via bayesian network model construction. Intelligent Decision
Technologies, 7(4):293–317.
Han, T. A. and Pereira, L. M. (2013b). Intention-based decision making and its applications. In
Guesgen, H. and Marsland, S., editors, Human Behavior Recognition Technologies: Intelligent
Applications for Monitoring and Security, chapter Chapter 9, pages 174–211. IGI Global.
Han, T. A. and Pereira, L. M. (2013c). State-of-the-art of intention recognition and its use in
decision making – a research summary. AI Communication Journal, 26(2):237–246.
Han, T. A., Pereira, L. M., Lenaerts, T., and Santos, F. C. (2021a). Mediating Artificial Intelli-
gence Developments through Negative and Positive Incentives. PLOS ONE, 16(1):e0244592.
Han, T. A., Pereira, L. M., and Santos, F. C. (2011a). Intention recognition promotes the
emergence of cooperation. Adaptive Behavior, 19(3):264–279.
Han, T. A., Pereira, L. M., and Santos, F. C. (2011b). The role of intention recognition in the
evolution of cooperative behavior. In Walsh, T., editor, Proceedings of the 22nd international
joint conference on Artificial intelligence (IJCAI’2011), pages 1684–1689. AAAI.
Han, T. A., Pereira, L. M., and Santos, F. C. (2012a). Corpus-based intention recognition in
cooperation dilemmas. Artificial Life journal, 18(4):365–383.
Han, T. A., Pereira, L. M., and Santos, F. C. (2012b). The emergence of commitments and
cooperation. In AAMAS’2012, pages 559–566.
Han, T. A., Pereira, L. M., Santos, F. C., and Lenaerts, T. (2013a). Good agreements make
good friends. Scientific reports, 3(2695).
Han, T. A., Pereira, L. M., Santos, F. C., and Lenaerts, T. (2013b). Why Is It So Hard to Say
Sorry: The Evolution of Apology with Commitments in the Iterated Prisoner’s Dilemma. In
IJCAI’2013, pages 177–183. AAAI Press.
Han, T. A., Pereira, L. M., Santos, F. C., and Lenaerts, T. (2020). To Regulate or Not: A
Social Dynamics Analysis of an Idealised AI Race. Journal of Artificial Intelligence Research,
69:881–921.
Han, T. A., Perret, C., and Powers, S. T. (2021b). When to (or not to) trust intelligent
machines: Insights from an evolutionary game theory analysis of trust in repeated games.
Cognitive Systems Research, 68:111–124.
Han, T. A., Santos, F. C., Lenaerts, T., and Pereira, L. M. (2015a). Synergy between intention
recognition and commitments in cooperation dilemmas. Scientific reports, 5(9312).
Han, T. A. and Tran-Thanh, L. (2018). Cost-effective external interference for promoting the
evolution of cooperation. Scientific reports, 8.
Han, T. A., Tran-Thanh, L., and Jennings, N. R. (2015b). The cost of interference in evolving
multiagent systems. In 14th International Conference on Autonomous Agents and Multiagent
Systems, pages 1719–1720.
Han, T. A., Traulsen, A., and Gokhale, C. S. (2012c). On equilibrium properties of evolutionary
multi-player games with random payoff matrices. Theoretical Population Biology, 81(4):264 –
272.
Hardin, G. (1968). The tragedy of the commons. Science, 162:1243–1248.
Hasan, M. R. and Raja, A. (2013). Emergence of cooperation using commitments and complex
network dynamics. In IEEE/WIC/ACM Intl Joint Conferences on Web Intelligence and
Intelligent Agent Technologies, pages 345–352.
Hauser, M. D. (2007). Moral Minds, How Nature Designed Our Universal Sense of Right and
Wrong. Little Brown.
Hofbauer, J. and Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cam-
bridge University Press, Cambridge.
Lee, K.-F. (2018). AI superpowers: China, Silicon Valley, and the new world order. Houghton
Mifflin Harcourt.
Luhmann, N. (1979). Trust and Power. John Wiley & Sons, Chichester.
Maynard Smith, J. and Price, G. R. (1973). The logic of animal conflict. Nature, 246:15–18.
Meltzoff, A. N. (2005). Imitation and other minds: the “like me” hypothesis. In Perspectives on
imitation: From neuroscience to social science. Imitation, human development, and culture,
pages 55–77. Cambridge, MA: MIT Press.
Nesse, R. M. (2001). Evolution and the capacity for commitment. Foundation series on trust.
Russell Sage.
Nowak, M. A. (2006a). Evolutionary Dynamics. Harvard University Press, Cambridge, MA.
Nowak, M. A. (2006b). Five rules for the evolution of cooperation. Science, 314:1560–1563.
Ogbo, N. B., Elragig, A., and Han, T. A. (2021). Evolution of coordination in pairwise and
multi-player interactions via prior commitments. Adaptive Behavior, page 1059712321993166.
O’Keefe, C., Cihon, P., Garfinkel, B., Flynn, C., Leung, J., and Dafoe, A. (2020). The windfall
clause: Distributing the benefits of ai for the common good. In Proceedings of the AAAI/ACM
Conference on AI, Ethics, and Society, pages 327–331.
Paiva, A., Santos, F. P., and Santos, F. C. (2018). Engineering pro-sociality with autonomous
agents. In Thirty-second AAAI conference on artificial intelligence.
Pennisi, E. (2005). How did cooperative behavior evolve? Science, 309(5731):93–93.
Perc, M., Jordan, J. J., Rand, D. G., Wang, Z., Boccaletti, S., and Szolnoki, A. (2017). Statistical
physics of human cooperation. Physics Reports, 687:1–51.
Pereira, L. M., Han, T. A., and Lopes, A. B. (2021). Employing ai to better understand our
morals. Entropy, 24(1):10.
Pereira, L. M., Lenaerts, T., Martinez-Vaquero, L. A., and Han, T. A. (2017). Social manifesta-
tion of guilt leads to stable cooperation in multi-agent systems. In AAMAS, pages 1422–1430.
Pereira, L. M. and Lopes, A. B. (2020). Machine ethics: from machine morals to the machinery
of morality. Springer.
Pu, P. and Chen, L. (2007). Trust-inspiring explanation interfaces for recommender systems.
Knowledge-Based Systems, 20(6):542–556.
Roy, P., Bouchard, B., Bouzouane, A., and Giroux, S. (2007). A hybrid plan recognition model
for alzheimer’s patients: interleaved-erroneous dilemma. In Proceedings of IEEE/WIC/ACM
International Conference on Intelligent Agent Technology, pages 131–137.
Sadri, F. (2011). Logic-based approaches to intention recognition. In Handbook of Research on
Ambient Intelligence: Trends and Perspectives, pages 346–375.
Santos, F. C. and Pacheco, J. M. (2011). Risk of collective failure provides an escape from the
tragedy of the commons. Proc Natl Acad Sci U S A.
Santos, F. P., Mascarenhas, S., Santos, F. C., Correia, F., Gomes, S., and Paiva, A. (2020).
Picky losers and carefree winners prevail in collective risk dilemmas with partner selection.
Autonomous Agents and Multi-Agent Systems, 34(2):1–29.
Sigmund, K. (2010). The calculus of selfishness. Princeton Univ. Press.
Sigmund, K., De Silva, H., Traulsen, A., and Hauert, C. (2010). Social learning promotes
institutions for governing the commons. Nature, 466:861–863.
Singh, M. P. (2013). Norms as a basis for governing sociotechnical systems. ACM Transactions
on Intelligent Systems and Technology (TIST), 5(1):21.
Sukthankar, G., Geib, C., Bui, H., Pynadath, D., and Goldman, R. P. (2014). Plan, activity,
and intent recognition: Theory and practice. Newnes.
Tomasello, M. (2008). Origins of Human Communication. MIT Press.
Traulsen, A., Nowak, M. A., and Pacheco, J. M. (2006). Stochastic dynamics of invasion and
fixation. Phys. Rev. E, 74:11909.
Tuyls, K. and Parsons, S. (2007). What evolutionary game theory tells us about multiagent
learning. Artificial Intelligence, 171(7):406–416.
Von Neumann, J. and Morgenstern, O. (1944). Theory of games and economic behavior. Prince-
ton university press.
Wooldridge, M. (2020). The road to conscious machines: The story of AI. Penguin UK.
Wooldridge, M. and Jennings, N. R. (1999). The cooperative problem-solving process. In Journal
of Logic and Computation, pages 403–417.