Research
Cite this article: Duong MH, Han TA. 2021
Cost efficiency of institutional incentives for
promoting cooperation in finite populations.
Proc. R. Soc. A 477: 20210568.
https://doi.org/10.1098/rspa.2021.0568
Received: 13 July 2021
Accepted: 16 September 2021
Subject Areas:
applied mathematics, mathematical
modelling
Keywords:
institutional incentives, evolutionary game
theory, evolution of cooperation
Author for correspondence:
Manh Hong Duong
e-mail: h.duong@bham.ac.uk
Electronic supplementary material is available
online at https://doi.org/10.6084/m9.figshare.c.5647030.
Cost eciency of institutional
incentives for promoting
cooperationinnite
populations
Manh Hong Duong1 and The Anh Han2
1School of Mathematics, University of Birmingham, Birmingham
B15 2TT, UK
2School of Computing, Engineering and Digital Technologies,
Teesside University, Middlesbrough TS1 3BX, UK
MHD, 0000-0002-4361-0795; TAH, 0000-0002-3095-7714
Institutions can provide incentives to enhance
cooperation in a population where this behaviour
is infrequent. This process is costly, and it is thus
important to optimize the overall spending. This
problem can be mathematically formulated as a
multi-objective optimization problem where one
wishes to minimize the cost of providing incentives
while ensuring a minimum level of cooperation,
sustained over time. Prior works that consider this
question usually omit the stochastic effects that drive
population dynamics. In this paper, we provide a
rigorous analysis of this optimization problem, in a
finite population and stochastic setting, studying both
pairwise and multi-player cooperation dilemmas.
We prove the regularity of the cost functions for
providing incentives over time, characterize their
asymptotic limits (infinite population size, weak
selection and large selection) and show exactly
when reward or punishment is more cost efficient.
We show that these cost functions exhibit a phase
transition phenomenon when the intensity of selection
varies. By determining the critical threshold of this
phase transition, we provide exact calculations for
the optimal cost of the incentive, for any given
intensity of selection. Numerical simulations are also
provided to demonstrate analytical observations.
Overall, our analysis provides for the first time a
selection-dependent calculation of the optimal cost
of institutional incentives (for both reward and
punishment) that guarantees a minimum level of
cooperation over time. It is of crucial importance for real-world applications of institutional
incentives since the intensity of selection is often found to be non-extreme and specific for a
given population.
1. Introduction
The problem of promoting the evolution of cooperative behaviour within populations of self-
regarding individuals has been intensively investigated across diverse fields of behavioural, social
and computational sciences [15]. Various mechanisms responsible for promoting the emergence
and stability of cooperative behaviours among such individuals have been proposed. They
include kin and group selection [6,7], direct and indirect reciprocities [812], spatial networks
[1316], reward and punishment [1722] and pre-commitments [2327]. Institutional incentives,
namely rewards for cooperation and punishment for wrongdoing, are among the most important
ones [22,2836]. Different from other mechanisms, in order to carry out institutional incentives,
it is assumed that there exists an external decision maker (e.g. institutions such as the United
Nations and the European Union) that has a budget to interfere in the population to achieve
a desirable outcome. Institutional enforcement mechanisms are crucial for enabling large-scale
cooperation. Most modern societies implement certain forms of institutions for governing and
promoting collective behaviours, including cooperation, coordination and technology innovation
[3742].
Providing incentives is costly and it is therefore important to minimize the cost while ensuring
a sustained level of cooperation over time [28,31,41]. Despite its paramount importance, so far
there have been only a few works exploring this question. In particular, Wang et al. [35] use
optimal control theory to provide an analytical solution for cost optimization of institutional
incentives assuming deterministic evolution and infinite population sizes (modelled using
replicator dynamics). This work therefore does not take into account various stochastic effects
of evolutionary dynamics such as mutation and non-deterministic behavioural update [4,43,44].
In a deterministic system consisting of cooperators and defectors, once the latter disappear (for
instance through strong institutional punishment), there is no further change to the system
and thus no further interference in it is required. When mutation is present, this behaviour
can however recur and become abundant over time, requiring institutions to spend more of
their budget on providing further incentives. Moreover, a key factor of behavioural update, the
intensity of selection [4]—which determines how strongly an individual bases their decision
to copy another individual’s strategy on their fitness difference—might strongly impact an
institutional incentive strategy and its cost efficiency. Its value is usually found to be specific for a given population [45–48] and thus should be taken into account when designing suitable cost-efficient incentives. For instance, when selection is weak, such that behavioural update is close to a random process (i.e. an imitation decision is largely independent of how large the fitness difference is), providing incentives, however strong, would make little difference to behavioural change. When selection is strong, incentives that ensure a minimum fitness advantage for cooperators would lead to a positive behavioural change.
In a stochastic, finite-population context, so far this problem has been investigated
primarily using agent-based and numerical simulations [28,31,49–52]. Results demonstrate
several interesting phenomena, such as the significant influence of the intensity of selection
on incentive strategies and optimal costs. However, there is no satisfactory rigorous analysis
available at present that allows one to determine the optimal way of providing incentives. This
is a challenging problem because of the large but finite population size and the complexity of
stochastic processes governing the population dynamics.
In this paper, we provide exactly such a rigorous analysis. We study cooperation dilemmas
in both pairwise (the Donation game (DG)) and multi-player (the Public Goods game (PGG))
settings [4]. They are among the most well-studied models for investigating the evolution of
cooperative behaviour where individual defection is always preferred over cooperation while
mutual cooperation is the preferred collective outcome for the population as a whole. Adopting
a popular stochastic evolutionary game approach for analysing well-mixed finite populations
[53–55], we derive the total expected costs of providing institutional reward or punishment,
characterize their asymptotic limits (namely, for an infinite population, weak selection and strong
selection) and show the existence of a phase transition phenomenon in the optimization problem
when the intensity of selection varies. We calculate the critical threshold of phase transitions and
study the minimization problem when the selection intensity is below and above this critical value.
We furthermore provide numerical simulations to demonstrate the analytical results.
The rest of the paper is organized as follows. In §2, we introduce the models and methods,
deriving mathematical optimization problems that will be studied. The main results of the paper
are presented in §3. In §4, we discuss possible extensions for future work. Finally, detailed
computations, technical lemmas and proofs of the main results are provided in the electronic
supplementary material.
2. Models and methods
(a) Cooperation dilemmas
We consider a well-mixed, finite population of N self-regarding individuals or players, who interact with each other using one of the following one-shot (i.e. non-repeated) cooperation dilemmas: the DG or its multi-player version, the PGG. In these games, a player can choose either to cooperate (i.e. a cooperator or C player) or to defect (i.e. a defector or D player).

Let Π_C(i) and Π_D(i) be the average pay-offs of a C player and a D player, respectively, in a population with i C players and N − i D players (see also §2c for more details). We show below that the difference δ = Π_C(i) − Π_D(i) does not depend on i. For cooperation dilemmas, it is always the case that δ < 0.
(i) Donation game
The pay-off matrix of the DG (for the row player) is given as follows:

$$\begin{array}{c|cc} & C & D\\ \hline C & b-c & -c\\ D & b & 0, \end{array}$$

where c and b represent the cost and benefit of cooperation, with b > c. The DG is a special version of the Prisoner's Dilemma (PD) game.

Denoting by π_{X,Y} the pay-off of a strategist X when playing with strategist Y in the pay-off matrix above, we obtain

$$\Pi_C(i)=\frac{(i-1)\pi_{C,C}+(N-i)\pi_{C,D}}{N-1}=\frac{(i-1)(b-c)+(N-i)(-c)}{N-1}$$

and

$$\Pi_D(i)=\frac{i\,\pi_{D,C}+(N-i-1)\pi_{D,D}}{N-1}=\frac{i\,b}{N-1}.$$

Thus,

$$\delta=\Pi_C(i)-\Pi_D(i)=-\left(c+\frac{b}{N-1}\right).$$
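As a quick numerical check (our own illustration, not part of the paper's analysis), the following Python snippet evaluates these average pay-offs and confirms that the gap Π_C(i) − Π_D(i) equals −(c + b/(N − 1)) for every i:

```python
# Minimal sketch (our own notation): average DG pay-offs in a population
# with i cooperators out of N, and the constant gap delta = Pi_C - Pi_D.

def dg_payoffs(i, N, b, c):
    pi_C = ((i - 1) * (b - c) + (N - i) * (-c)) / (N - 1)  # cooperator average pay-off
    pi_D = (i * b) / (N - 1)                               # defector average pay-off
    return pi_C, pi_D

N, b, c = 50, 2.0, 1.0
delta = -(c + b / (N - 1))
for i in range(1, N):
    pi_C, pi_D = dg_payoffs(i, N, b, c)
    assert abs((pi_C - pi_D) - delta) < 1e-12  # gap is independent of i
print(delta)  # -(c + b/(N-1)) ~ -1.0408
```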
(ii) Public Goods game
In a PGG, players interact in a group of size n, where they decide to cooperate, contributing
an amount c > 0 to a common pool, or to defect, contributing nothing to the pool. The total
contribution in a group will be multiplied by a factor r, where 1 < r < n (for the PGG to be a
social dilemma), which is then shared equally among all members of the group, regardless of
their strategy.
We obtain [56]

$$\Pi_C(i)=\sum_{j=0}^{n-1}\frac{\binom{i-1}{j}\binom{N-i}{n-1-j}}{\binom{N-1}{n-1}}\left(\frac{(j+1)rc}{n}-c\right)=\frac{rc}{n}\left(1+(i-1)\,\frac{n-1}{N-1}\right)-c$$

and

$$\Pi_D(i)=\sum_{j=0}^{n-1}\frac{\binom{i}{j}\binom{N-1-i}{n-1-j}}{\binom{N-1}{n-1}}\,\frac{j\,rc}{n}=\frac{rc(n-1)}{n(N-1)}\,i.$$

Thus,

$$\delta=\Pi_C(i)-\Pi_D(i)=-c\left(1-\frac{r(N-n)}{n(N-1)}\right).$$
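Similarly, a minimal sketch (our own notation, using math.comb for the hypergeometric weights) verifies both the closed forms above and the constant gap δ for the PGG:

```python
# Minimal sketch (our notation): closed-form PGG average pay-offs and the
# constant gap delta = Pi_C - Pi_D = -c * (1 - r(N-n)/(n(N-1))).
from math import comb

def pgg_payoffs(i, N, n, r, c):
    pi_C = (r * c / n) * (1 + (i - 1) * (n - 1) / (N - 1)) - c
    pi_D = r * c * (n - 1) * i / (n * (N - 1))
    return pi_C, pi_D

def pgg_payoff_C_sum(i, N, n, r, c):
    # Hypergeometric sampling of the other n-1 group members (as in the text).
    total = 0.0
    for j in range(0, n):
        p = comb(i - 1, j) * comb(N - i, n - 1 - j) / comb(N - 1, n - 1)
        total += p * ((j + 1) * r * c / n - c)
    return total

N, n, r, c = 50, 5, 3.0, 1.0
delta = -c * (1 - r * (N - n) / (n * (N - 1)))
for i in range(1, N):
    pi_C, pi_D = pgg_payoffs(i, N, n, r, c)
    assert abs((pi_C - pi_D) - delta) < 1e-12       # gap independent of i
    assert abs(pgg_payoff_C_sum(i, N, n, r, c) - pi_C) < 1e-12  # sum matches closed form
print(delta)  # about -0.449 for these parameters
```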
(b) Cost of institutional reward and punishment
To reward a cooperator (respectively, punish a defector), the institution has to pay an amount θ/a (respectively, θ/b) so that the cooperator's (defector's) pay-off increases (decreases) by θ, where a, b > 0 are constants representing the efficiency ratios of providing the corresponding incentive. As we study reward and punishment separately, without loss of generality we set a = b = 1 [22,28]. Thus, the key question here is: what is the optimal value of the individual incentive cost θ that ensures a sufficient desired level of cooperation in the population (in the long run) while minimizing the total cost spent by the institution?
(i) Deriving the expected cost of providing institutional incentives
We adopt here the finite population dynamics with the Fermi strategy update rule [44], stating that a player A with fitness f_A adopts the strategy of another player B with fitness f_B with a probability given by P_{A,B} = (1 + e^{−β(f_B−f_A)})^{−1}, where β represents the intensity of selection (see details in §2c). We compute the expected number of times the population contains i C players, 1 ≤ i ≤ N − 1. For that, we consider an absorbing Markov chain of (N + 1) states, {S_0, ..., S_N}, where S_i represents a population with i C players. S_0 and S_N are absorbing states. Let U = {u_{ij}}_{i,j=1}^{N−1} denote the transition matrix between the N − 1 transient states, {S_1, ..., S_{N−1}}. The transition probabilities can be defined as follows, for 1 ≤ i ≤ N − 1:

$$u_{i,i\pm j}=0 \ \ \text{for all } j\ge 2,\qquad u_{i,i\pm 1}=\frac{N-i}{N}\,\frac{i}{N}\left(1+e^{\mp\beta[\Pi_C(i)-\Pi_D(i)+\theta]}\right)^{-1}\qquad\text{and}\qquad u_{i,i}=1-u_{i,i+1}-u_{i,i-1}. \tag{2.1}$$

The entries n_{ij} of the so-called fundamental matrix N = (n_{ij})_{i,j=1}^{N−1} = (I − U)^{−1} of the absorbing Markov chain give the expected number of times the population is in the state S_j if it starts in the transient state S_i [57]. As a mutant can randomly occur either at S_0 or at S_N, the expected number of visits at state S_i is, thus, (1/2)(n_{1i} + n_{N−1,i}).
The total cost per generation is

$$\theta_i=\begin{cases} i\,\theta & \text{in the case of institutional reward},\\ (N-i)\,\theta & \text{in the case of institutional punishment}.\end{cases}$$
Hence, the expected total costs of interference for institutional reward and institutional
punishment are, respectively,
$$E_r(\theta)=\frac{\theta}{2}\sum_{i=1}^{N-1}(n_{1i}+n_{N-1,i})\,i \qquad\text{and}\qquad E_p(\theta)=\frac{\theta}{2}\sum_{i=1}^{N-1}(n_{1i}+n_{N-1,i})\,(N-i). \tag{2.2}$$
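A minimal numerical sketch of this construction (our own implementation, assuming numpy is available): it builds the tridiagonal matrix (2.1), inverts I − U and evaluates the two sums in (2.2).

```python
# Minimal numerical sketch (our notation): expected total incentive costs
# E_r(theta), E_p(theta) from the fundamental matrix of the absorbing chain.
import numpy as np

def expected_costs(N, beta, theta, delta):
    # delta = Pi_C(i) - Pi_D(i), constant in i for DG/PGG; incentives shift it by +theta.
    x = delta + theta
    idx = np.arange(1, N)                      # transient states i = 1..N-1
    base = (N - idx) / N * idx / N             # probability of picking a C-D pair
    t_plus = base / (1.0 + np.exp(-beta * x))  # i -> i+1
    t_minus = base / (1.0 + np.exp(beta * x))  # i -> i-1
    U = np.diag(1.0 - t_plus - t_minus)        # tridiagonal transition matrix (transient states)
    U += np.diag(t_plus[:-1], 1) + np.diag(t_minus[1:], -1)
    F = np.linalg.inv(np.eye(N - 1) - U)       # fundamental matrix (I - U)^{-1}
    visits = 0.5 * (F[0, :] + F[N - 2, :])     # mutants appear at either end with equal chance
    E_r = theta * np.sum(visits * idx)         # reward: pay theta per cooperator per generation
    E_p = theta * np.sum(visits * (N - idx))   # punishment: pay theta per defector per generation
    return E_r, E_p

N, b, c, beta, theta = 50, 2.0, 1.0, 1.0, 1.5
delta = -(c + b / (N - 1))                     # DG pay-off gap
print(expected_costs(N, beta, theta, delta))
```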
(ii) Cooperation frequency
Since the population consists of only two strategies, the fixation probabilities of a C (D) player in a homogeneous population of D (C) players when the interference scheme is carried out are, respectively,

$$\rho_{D,C}=\left(1+\sum_{i=1}^{N-1}\prod_{k=1}^{i}\frac{1+e^{-\beta(\Pi_C(k)-\Pi_D(k)+\theta)}}{1+e^{\beta(\Pi_C(k)-\Pi_D(k)+\theta)}}\right)^{-1}$$

and

$$\rho_{C,D}=\left(1+\sum_{i=1}^{N-1}\prod_{k=1}^{i}\frac{1+e^{-\beta(\Pi_D(k)-\Pi_C(k)-\theta)}}{1+e^{\beta(\Pi_D(k)-\Pi_C(k)-\theta)}}\right)^{-1}.$$

Computing the stationary distribution using these fixation probabilities, we obtain the frequency of cooperation (see §2c),

$$\frac{\rho_{D,C}}{\rho_{D,C}+\rho_{C,D}}.$$
Hence, this frequency of cooperation can be maximized by maximizing

$$\max_{\theta}\ \frac{\rho_{D,C}}{\rho_{C,D}}. \tag{2.3}$$

The fraction in equation (2.3) can be simplified as follows [54]:

$$\frac{\rho_{D,C}}{\rho_{C,D}}=\prod_{k=1}^{N-1}\frac{T^+(k)}{T^-(k)}=\prod_{k=1}^{N-1}\frac{1+e^{\beta[\Pi_C(k)-\Pi_D(k)+\theta]}}{1+e^{-\beta[\Pi_C(k)-\Pi_D(k)+\theta]}}=e^{\beta\sum_{k=1}^{N-1}(\Pi_C(k)-\Pi_D(k)+\theta)}=e^{(N-1)\beta(\delta+\theta)}. \tag{2.4}$$

In the above transformation, T^-(k) and T^+(k) are the probabilities of decreasing or increasing the number of C players (i.e. k) by one in each time step, respectively.
We consider non-neutral selection, i.e. β > 0 (under neutral selection, there is no need to use incentives). Assuming that we desire to obtain at least an ω ∈ [0, 1] fraction of cooperation, i.e. ρ_{D,C}/(ρ_{D,C} + ρ_{C,D}) ≥ ω, it follows from equation (2.4) that

$$\theta\ \ge\ \theta_0(\omega)=\frac{1}{(N-1)\beta}\log\!\left(\frac{\omega}{1-\omega}\right)-\delta. \tag{2.5}$$

Therefore, it is guaranteed that, if θ ≥ θ_0(ω), at least an ω fraction of cooperation can be expected. This condition implies that the lower bound of θ depends monotonically on β: it decreases with β when ω > 0.5 and increases with β when ω < 0.5.
(iii) Optimization problems
Bringing all these factors together, we obtain the following cost-optimization problem for institutional incentives in stochastic finite populations:

$$\min_{\theta\ \ge\ \theta_0(\omega)} E(\theta), \tag{2.6}$$

where E is either E_r or E_p, defined in (2.2), corresponding respectively to institutional reward or punishment. We show in the electronic supplementary material that θ ↦ E(θ) is a smooth function on ℝ.
(c) Methods: evolutionary dynamics in finite populations
We adopt in our analysis the evolutionary game theory (EGT) methods for finite populations [53–55]. Herein, individuals' pay-offs represent their fitness or social success, and evolutionary dynamics is shaped by social learning [4,43], whereby the most successful players tend to be imitated more often by the other players. Here, social learning is modelled using the pairwise comparison rule [44]; that is, a player A with fitness f_A adopts the strategy of another player B with fitness f_B with probability given by the Fermi function,

$$P_{A,B}=\left(1+e^{-\beta(f_B-f_A)}\right)^{-1},$$

where β conveniently describes the selection intensity (β = 0 represents neutral drift, while β → ∞ represents increasingly deterministic selection).
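In code, the pairwise comparison rule is a single logistic function of the pay-off difference (a minimal sketch, our notation):

```python
# Minimal sketch: the Fermi (pairwise comparison) imitation probability.
import math

def fermi(f_A, f_B, beta):
    """Probability that player A adopts the strategy of player B."""
    return 1.0 / (1.0 + math.exp(-beta * (f_B - f_A)))

print(fermi(1.0, 2.0, 0.0))    # 0.5: neutral drift, pay-offs are irrelevant
print(fermi(1.0, 2.0, 10.0))   # ~1.0: strong selection, the better strategy is almost surely copied
```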
In the absence of mutations or exploration, the end states of evolution are inevitably
monomorphic: once such a state is reached, it cannot be escaped through social learning. We
assume that, with a certain mutation probability, an individual switches randomly to a different
strategy without imitating another individual. In addition, we assume here the small mutation
limit [53,55,58]. Thus, at most two strategies are present in the population at a time. The
evolutionary dynamics can be described by a Markov chain, where each state represents a
homogeneous population and the transition probabilities between any two states are given by
the fixation probability of a single mutant [53,55,58]. The resulting Markov chain has a stationary
distribution, which describes the average time the population spends in an end state. The small
mutation limit allows us to obtain an analytical form of the frequency of cooperation (see below).
It is noteworthy that, although we focus here on the small mutation limit, this approach has been
shown to be widely applicable to scenarios which go well beyond the strict limit of very small
mutation rates [45,46,48,59].
The fixation probability of a single mutant A taking over a whole population with (N − 1) B players is as follows (see [44,55,60] for details):

$$\rho_{B,A}=\left(1+\sum_{i=1}^{N-1}\prod_{j=1}^{i}\frac{T^-(j)}{T^+(j)}\right)^{-1},$$

where $T^{\pm}(k)=\frac{N-k}{N}\,\frac{k}{N}\left(1+e^{\mp\beta[\Pi_A(k)-\Pi_B(k)]}\right)^{-1}$ describes the probability that the number of A players changes by ±1 in a time step. Specifically, when β = 0, ρ_{B,A} = 1/N, representing the transition probability at the neutral limit.
Consider the set of two strategies, C and D (see [53,58] for the calculation for any number of strategies). Their stationary distribution is given by the normalized eigenvector associated with the eigenvalue 1 of the transpose of the matrix [53,58]

$$M=\begin{pmatrix}1-\rho_{C,D} & \rho_{C,D}\\ \rho_{D,C} & 1-\rho_{D,C}\end{pmatrix},$$

which is $\left(\frac{\rho_{D,C}}{\rho_{D,C}+\rho_{C,D}},\ \frac{\rho_{C,D}}{\rho_{D,C}+\rho_{C,D}}\right)$. The first term is the frequency of cooperation and the second is that of defection.
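A minimal sketch (our own notation) assembling these pieces: the fixation probabilities of C and D under an incentive θ and the resulting stationary frequency of cooperation, cross-checked against the closed form (2.4):

```python
# Minimal sketch (our notation): fixation probabilities under the pairwise
# comparison rule and the stationary cooperation frequency in the small-mutation
# limit. Incentives shift the cooperator-defector pay-off gap by +theta.
import math

def fixation_probability(N, beta, x):
    # x = Pi_mutant - Pi_resident (constant in the state for DG/PGG)
    gamma = (1 + math.exp(-beta * x)) / (1 + math.exp(beta * x))  # T^-(k)/T^+(k)
    total, prod = 1.0, 1.0
    for _ in range(1, N):
        prod *= gamma
        total += prod
    return 1.0 / total

def cooperation_frequency(N, beta, delta, theta):
    rho_DC = fixation_probability(N, beta, delta + theta)      # C invading D
    rho_CD = fixation_probability(N, beta, -(delta + theta))   # D invading C
    # cross-check with the closed form (2.4): rho_DC/rho_CD = exp(beta*(N-1)*(delta+theta))
    assert math.isclose(rho_DC / rho_CD,
                        math.exp(beta * (N - 1) * (delta + theta)), rel_tol=1e-6)
    return rho_DC / (rho_DC + rho_CD)

N, b, c, beta = 50, 2.0, 1.0, 1.0
delta = -(c + b / (N - 1))
for theta in (0.5, -delta, 1.5):
    print(theta, cooperation_frequency(N, beta, delta, theta))
```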
3. Main results
The present paper provides a rigorous analysis of the expected total cost of providing an
institutional incentive (2.2) and the associated optimization problem (2.6). In this section, we state
our main analytical results, theorems 3.1–3.4, and provide numerical simulations to illustrate
the analytical results. The proofs of these results, which require a delicate analysis of the cost
functions, are presented in the electronic supplementary material.
In the following theorems, E denotes the cost function for either institutional reward, E_r, or institutional punishment, E_p, as obtained in (2.2). Also, H_N denotes the well-known harmonic number

$$H_N:=\sum_{j=1}^{N-1}\frac{1}{j}. \tag{3.1}$$
Our first main result provides qualitative properties and asymptotic limits of E.
Theorem 3.1 (qualitative properties and asymptotic limits of total cost functions).

(I) (finite population estimates) The expected total cost of providing an incentive satisfies the following estimates for all finite populations of size N:

$$\frac{N^2\theta}{2}\left(H_N+\frac{1}{N-1}\right)\ \le\ E(\theta)\ \le\ N(N-1)\,\theta\,(H_N+1). \tag{3.2}$$
(II) (infinite population limit) The expected total cost of providing an incentive satisfies the following asymptotic behaviour as the population size N tends to +∞:

$$\lim_{N\to+\infty}\frac{E(\theta)}{\frac{N^2\theta}{2}\,(\ln N+\gamma)}=\begin{cases}1+e^{-\beta|\theta-c|} & \text{for the DG},\\[4pt] 1+e^{-\beta|\theta-c|}\,e^{\beta c r/n} & \text{for the PGG},\end{cases} \tag{3.3}$$

where γ = 0.5772… is the Euler–Mascheroni constant.
(III) (weak selection limit) The expected total cost of providing an incentive satisfies the following asymptotic limit as the selection strength β tends to 0:

$$\lim_{\beta\to 0}E(\theta)=N^2\theta H_N. \tag{3.4}$$

(IV) (strong selection limit) The expected total cost of providing an incentive satisfies the following asymptotic limits as the selection strength β tends to +∞:

$$\lim_{\beta\to+\infty}E_r(\theta)=\begin{cases}\dfrac{N^2\theta}{2}\left(\dfrac{1}{N-1}+H_N\right) & \text{for } \theta<-\delta,\\[4pt] N^2\theta H_N & \text{for } \theta=-\delta,\\[4pt] \dfrac{N^2\theta}{2}\,(1+H_N) & \text{for } \theta>-\delta\end{cases} \tag{3.5}$$

and

$$\lim_{\beta\to+\infty}E_p(\theta)=\begin{cases}\dfrac{N^2\theta}{2}\,(1+H_N) & \text{for } \theta<-\delta,\\[4pt] N^2\theta H_N & \text{for } \theta=-\delta,\\[4pt] \dfrac{N^2\theta}{2}\left(H_N+\dfrac{1}{N-1}\right) & \text{for } \theta>-\delta.\end{cases} \tag{3.6}$$
The lower and upper bounds obtained in part (I) of the theorem suggest that the total expected cost function E, for both reward and punishment, behaves asymptotically in the order of (N²H_N)θ for sufficiently large N. This is confirmed in part (II), noting that H_N ∼ ln N. We also show that the leading asymptotic coefficient of E depends on the game (i.e. DG or PGG) and its parameters. Hence, it is important to adopt a precise optimal value of θ (e.g. obtained by solving the optimization problem (2.6)), as a small increase in this individual incentive cost can lead to a significant increase in E, especially when the population size is large. Figure 1 numerically demonstrates this asymptotic limit.

Parts (III) and (IV) of the theorem provide theoretical estimates of E under the weak (β → 0) and strong (β → +∞) selection limits. In the weak selection limit, the expected total costs are the same for reward and punishment, i.e. E_r(θ) = E_p(θ). In the strong selection limit, E_r is smaller than, equal to or greater than E_p, depending on whether θ is smaller than, equal to or greater than −δ.
Figure 1. Large population size limit. We calculate numerically the expected total cost of incentive E for reward and punishment, varying the population size N, for different values of θ and β. The dashed lines represent the corresponding theoretical limiting values obtained in theorem 3.1 for the large population size limit, N → +∞. We observe that numerical results are in close accordance with those obtained theoretically. Results are obtained for the DG with b = 2, c = 1. (Online version in colour.)
Figure 2. Weak and strong selection limits. We calculate numerically the total expected cost of incentive E for reward and punishment, varying the intensity of selection β, for different values of N and θ. The dashed lines represent the corresponding theoretical limiting values obtained in theorem 3.1 for the weak and strong selection limits. We observe that numerical results are in close accordance with those obtained theoretically. Results are obtained for the DG with b = 2, c = 1. (Online version in colour.)
Figure 2 provides numerical validation of the theoretical weak and strong selection asymptotic behaviours of E, for different population sizes N. We can observe that, for a given individual incentive cost θ, the range of E increases significantly for larger N.
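As an illustration, the weak-selection formula (3.4) can be cross-checked against a direct fundamental-matrix computation at a very small β (a minimal numpy sketch, our own notation):

```python
# Minimal sketch: check the weak-selection limit E(theta) -> N^2 * theta * H_N
# against a direct fundamental-matrix computation at a very small beta.
import numpy as np

def expected_reward_cost(N, beta, theta, delta):
    x = delta + theta
    idx = np.arange(1, N)
    base = (N - idx) / N * idx / N
    t_plus = base / (1.0 + np.exp(-beta * x))
    t_minus = base / (1.0 + np.exp(beta * x))
    U = np.diag(1.0 - t_plus - t_minus) + np.diag(t_plus[:-1], 1) + np.diag(t_minus[1:], -1)
    F = np.linalg.inv(np.eye(N - 1) - U)
    visits = 0.5 * (F[0, :] + F[N - 2, :])
    return theta * np.sum(visits * idx)

N, b, c, theta = 50, 2.0, 1.0, 1.5
delta = -(c + b / (N - 1))
H_N = sum(1.0 / j for j in range(1, N))
print(expected_reward_cost(N, 1e-6, theta, delta))  # ~ N^2 * theta * H_N
print(N**2 * theta * H_N)
```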
Our second main result concerns the optimization problem (2.6). We show that the cost function E exhibits a phase transition when the selection intensity β varies.
Theorem 3.2 (optimization problems and phase transition phenomenon).

(I) (phase transition phenomena and behaviour under the threshold) Define

$$F^*=\begin{cases}\min\{F(u):P(u)>0\} & \text{in the reward case},\\ \min\{\hat F(u):\hat P(u)>0\} & \text{in the punishment case},\end{cases}$$

where P(u) and F(u), as well as P̂ and F̂, are defined in the electronic supplementary material (see §§1 and 2 there, respectively). There exists a threshold value β* given by

$$\beta^*=-\frac{F^*}{\delta}>0,$$

such that θ ↦ E(θ) is non-decreasing for all β ≤ β* and is non-monotonic when β > β*. As a consequence, for β ≤ β*,

$$\min_{\theta\ \ge\ \theta_0}E(\theta)=E(\theta_0). \tag{3.7}$$

(II) (behaviour above the threshold value) For β > β*, the number of sign changes of E′(θ) is at least two for all N, and there exists an N_0 such that the number of sign changes is exactly two for N ≥ N_0. As a consequence, for N ≥ N_0, there exist θ_1 < θ_2 such that, for β > β*, E(θ) is increasing when θ < θ_1, decreasing when θ_1 < θ < θ_2 and increasing when θ > θ_2. Thus, for N ≥ N_0,

$$\min_{\theta\ \ge\ \theta_0}E(\theta)=\min\{E(\theta_0),\,E(\theta_2)\}.$$
The proofs of theorems 3.1 and 3.2 for the cases of reward and punishment are given in §§1 and 2 of the electronic supplementary material, respectively. We also provide explicit computations for N = 3 and N = 4 to illustrate these theorems in §3 of the electronic supplementary material. Based on numerical simulations, we conjecture that the requirement N ≥ N_0 could be removed and that theorem 3.2 holds for all finite N. In electronic supplementary material, figure S2, using numerical calculation we have shown that N_0 = 100 satisfies the conjecture, ensuring the validity of the numerical examples below. Theorem 3.2 gives rise to the following algorithm to determine the optimal value θ* for N ≥ N_0.
Algorithm 3.3 (finding the optimal cost of incentive θ*). Inputs: (i) N ≥ N_0: population size; (ii) β: intensity of selection; (iii) game and parameters: DG (c and b) or PGG (c, r and n); (iv) ω: minimum desired cooperation level.

(1) Compute δ (in the DG: δ = −(c + b/(N − 1)); in the PGG: δ = −c(1 − r(N − n)/(n(N − 1)))).
(2) Compute θ_0 = (1/((N − 1)β)) log(ω/(1 − ω)) − δ.
(3) Compute F* = min{F(u): P(u) > 0} in the reward case, or F* = min{F̂(u): P̂(u) > 0} in the punishment case, where P(u) and F(u), as well as P̂ and F̂, are defined in the electronic supplementary material.
(4) Compute β* = −F*/δ.
(5) If β ≤ β*: θ* = θ_0 and min E(θ) = E(θ_0).
(6) Otherwise (i.e. if β > β*):
(a) Compute u_2, the largest root of the equation F(u) + βδ = 0 in the reward case, or of F̂(u) + βδ = 0 in the punishment case.
(b) Compute θ_2 = (log u_2)/β − δ:
— if θ_2 ≤ θ_0: θ* = θ_0 and min E(θ) = E(θ_0);
— otherwise (if θ_2 > θ_0):
— if E(θ_0) ≤ E(θ_2): θ* = θ_0 and min E(θ) = E(θ_0);
— if E(θ_2) < E(θ_0): θ* = θ_2 and min E(θ) = E(θ_2).
Output: θ* and E(θ*).
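Steps (3), (4) and (6a) rely on the functions F, P (reward) and F̂, P̂ (punishment), which are defined only in the electronic supplementary material. As a self-contained numerical substitute (our own sketch, not the paper's procedure), one can simply search a grid of θ ≥ θ_0, since by theorem 3.2 the minimizer is either θ_0 or the interior local minimum θ_2, both of which the grid captures:

```python
# Minimal numerical sketch: grid search over theta >= theta_0 as a stand-in for
# steps (3)-(6) of algorithm 3.3 (which use functions from the supplementary material).
import math
import numpy as np

def expected_cost(N, beta, theta, delta, reward=True):
    x = delta + theta
    idx = np.arange(1, N)
    base = (N - idx) / N * idx / N
    t_plus = base / (1.0 + np.exp(-beta * x))
    t_minus = base / (1.0 + np.exp(beta * x))
    U = np.diag(1.0 - t_plus - t_minus) + np.diag(t_plus[:-1], 1) + np.diag(t_minus[1:], -1)
    visits = 0.5 * np.linalg.inv(np.eye(N - 1) - U)[[0, N - 2], :].sum(axis=0)
    weights = idx if reward else (N - idx)
    return theta * float(np.sum(visits * weights))

def optimal_incentive(N, beta, delta, omega, reward=True, theta_max=5.0, steps=2000):
    theta0 = math.log(omega / (1.0 - omega)) / ((N - 1) * beta) - delta  # bound (2.5)
    grid = np.linspace(theta0, max(theta_max, theta0 + 1.0), steps)
    costs = [expected_cost(N, beta, t, delta, reward) for t in grid]
    k = int(np.argmin(costs))
    return grid[k], costs[k]

N, b, c = 50, 2.0, 1.0
delta = -(c + b / (N - 1))
print(optimal_incentive(N, beta=5.0, delta=delta, omega=0.7, reward=True))
```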
Figure 3. We use algorithm 3.3 to find the optimal θ that minimizes E(θ) (for institutional reward) while ensuring a minimum level of cooperation ω. We use as examples a small population size (N = 3, panels (a)) and a larger one (N = 50, panels (b)), for the DG (b = 1.8, c = 1). (Online version in colour.)
To illustrate theorem 3.2 and algorithm 3.3, we focus on the case of reward. Figure 3 shows the cost function E_r as a function of θ, for different values of N, β and ω, to illustrate the phase transition when varying β, in a DG. We can see that, in all cases, the numerical observations are in close accordance with the theoretical results. For example, with N = 3 (figure 3a), we found β* = −F*/δ = 10.9291/1.9 = 5.752. For β < β*, E(θ) is an increasing function of θ. Thus, the optimal cost of incentive is θ* = θ_0, for a given required minimum level of cooperation ω. For example, with N = 3 and β = 1, to ensure at least 70% cooperation (ω = 0.7), we need θ* = θ_0 = 2.32. When β > β*, one needs to compare E(θ_0) and E(θ_2). For example, with N = 3 and β = 10: for ω = 0.25 (black dashed line), E(θ_0) = 23.602 < 25.6124 = E(θ_2), so θ* = θ_0 = 1.845; for ω = 0.7 (green dashed line), E(θ_0) = 26.446 > 25.6124 = E(θ_2), so θ* = θ_2 = 2.16 (red solid line); for ω = 0.999999 (blue dashed line), since θ_2 < θ_0, θ* = θ_0 = 2.59078.

Similarly, with a larger population size (N = 50; see figure S1 in the electronic supplementary material, bottom row), we obtained β* = 3.15/1.03673 = 3.039. In general, similar observations are obtained as in the case of the small population size N = 3, except that, when N is large, the values of θ_0 for different non-extreme values of the minimum required cooperation ω (say, ω ∈ (0.01, 0.99)) differ very little (given the log scale of ω/(1 − ω) in the formula for θ_0). These values are also smaller than θ_2, with a cost E(θ_0) > E(θ_2), making θ_2 the optimal cost of incentive. Similar results are obtained for the PGG (figure 4). When ω is extremely high (i.e. greater than 1 − 10^{−k} for a large k) (we do not consider extremely low values since we would like to ensure at least a sufficient level of cooperation), we can also see other scenarios where the optimal cost is θ_0 (see figure S1 in the electronic supplementary material, bottom row). We thus observe that, for ω ∈ (0.01, 0.99), for a sufficiently large population size N and large enough β (somewhat above β*), the optimal value of θ is always θ_2; otherwise, θ_0 is the optimal cost.
Our last result provides a comparison of the expected total costs for providing institutional
reward and punishment, for different individual incentive costs θ.
Figure 4. We use algorithm 3.3 to find the optimal θ that minimizes E(θ) while ensuring a minimum level of cooperation ω, for the PGG (r = 3, n = 5, c = 1) with N = 50. Similar observations to those for the DG are obtained. (Online version in colour.)
Figure 5. Comparison of the total costs E for reward and punishment as a function of θ, for different values of N and β. Reward is less costly than punishment (E_r < E_p) for small θ, and vice versa. The threshold of θ for this change was obtained analytically (see theorem 3.4); it is exactly equal to −δ. Results are obtained for the DG with b = 2, c = 1. (Online version in colour.)
Theorem 3.4 (reward versus punishment costs). The difference between the expected total costs of reward and punishment satisfies

$$(E_r-E_p)(\theta)\ \begin{cases} <0 & \text{for } \theta<-\delta,\\ =0 & \text{for } \theta=-\delta,\\ >0 & \text{for } \theta>-\delta. \end{cases} \tag{3.8}$$

As a consequence, when β ≤ min{β*_r, β*_p} we have

$$E^*_r=E_r(\theta_0)\quad\text{and}\quad E^*_p=E_p(\theta_0).$$

In this case,

$$E^*_r-E^*_p=E_r(\theta_0)-E_p(\theta_0)\ \begin{cases} <0 & \text{for } \omega<0.5,\\ =0 & \text{for } \omega=0.5,\\ >0 & \text{for } \omega>0.5. \end{cases} \tag{3.9}$$
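The sign pattern in (3.8) is easy to reproduce numerically with the same fundamental-matrix construction as in §2b (a minimal sketch, our own notation):

```python
# Minimal sketch: the sign of E_r - E_p flips exactly at theta = -delta (theorem 3.4).
import numpy as np

def expected_costs(N, beta, theta, delta):
    x = delta + theta
    idx = np.arange(1, N)
    base = (N - idx) / N * idx / N
    t_plus = base / (1.0 + np.exp(-beta * x))
    t_minus = base / (1.0 + np.exp(beta * x))
    U = np.diag(1.0 - t_plus - t_minus) + np.diag(t_plus[:-1], 1) + np.diag(t_minus[1:], -1)
    visits = 0.5 * np.linalg.inv(np.eye(N - 1) - U)[[0, N - 2], :].sum(axis=0)
    return theta * np.sum(visits * idx), theta * np.sum(visits * (N - idx))

N, b, c, beta = 50, 2.0, 1.0, 1.0
delta = -(c + b / (N - 1))
for theta in (-delta - 0.3, -delta, -delta + 0.3):
    E_r, E_p = expected_costs(N, beta, theta, delta)
    print(round(theta, 3), round(E_r - E_p, 6))   # negative, ~zero, positive
```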
The proof of theorem 3.4 is given in §3 of the electronic supplementary material. Numerical calculations in figure 5 show the expected total costs for reward and punishment (DG), for varying θ.
Figure 6. Comparison of the total costs E for reward and punishment at the optimal value θ* (obtained using algorithm 3.3), varying the minimum required level of cooperation, ω. Reward is more cost efficient for small ω, while punishment is more cost efficient when ω is larger. In both cases, the threshold is around ω = 0.5. Other parameters: β = 1, DG with b = 2, c = 1. (Online version in colour.)
We observe that reward is less costly than punishment (E_r < E_p) for θ < −δ and vice versa when θ > −δ, exactly as shown analytically in theorem 3.4. This analytical result is confirmed here for different population sizes N and intensities of selection β. Figure 6 also confirms the second part of the theorem: for small β, if one can choose which type of incentive to use, either reward or punishment, then the former provides a lower cost when requiring less than 50% cooperation at a minimum, and the latter otherwise. This is in line with previous work showing that reward mechanisms work very well to promote cooperation in environments where it is rare, while punishment mechanisms are better at maintaining high levels of cooperation (e.g. [28,35,52]).
4. Discussion
Institutional incentives such as punishment and reward provide an effective tool for promoting
the evolution of cooperation in social dilemmas. Both theoretical and experimental analysis
has been carried out [29,36,37,52,6163]. However, past research usually ignores the question
of how institutions’ overall spending, i.e. the total cost of providing these incentives, can be
minimized, while at the same time guaranteeing a minimum desired level of cooperation over
time. Answering this question allows one to estimate exactly how incentives should be provided,
that is, how much to reward a cooperator and how severely to punish a wrongdoer. Existing
works that consider this question usually omit the stochastic effects that drive population
dynamics, namely when the intensity of selection varies.
Resorting to a stochastic evolutionary game approach for finite, well-mixed populations, we
have provided theoretical results for the optimal cost of incentives that ensure a desired level of
cooperation while minimizing the total budget, for a given intensity of selection, β. We show that
this cost strongly depends on the value of β, owing to the existence of a phase transition in the
cost functions when βvaries. This behaviour is missing in works that consider a deterministic
evolutionary approach [35]. The intensity of selection plays an important role in evolutionary
processes. Its value differs depending on the pay-off structure (i.e. scaling the game pay-off matrix by a factor is equivalent to dividing β by that factor) and is usually found to be specific for a given population, which can be estimated through behavioural experiments [45–48]. Thus, our analysis
provides a way to calculate the optimal incentive cost for a given population and game pay-off
matrix at hand.
With regard to theoretical importance, we characterized asymptotic behaviours of the total
cost functions for both reward and punishment (namely, in the limits of a large population, weak
selection and strong selection) and compared these functions for the two types of incentive. We
showed that punishment is always more costly for a small (individual) incentive cost (θ) but less
so when this cost is above a certain threshold. We provided an exact formula for this threshold.
This result provides insights into the choice of which type of incentives to use.
In the context of institutional incentives modelling, a crucial issue is the question of how to
maintain the budget for providing incentives [59,64]. The problem of who pays or contributes
to the budget is a social dilemma in itself, and how to escape this dilemma is a critical research
question. In this work, we focus on the question of how to optimize the budget used for the
provided incentives.
There are several simplifications made for the theoretical analysis to be possible. First, in order
to derive the analytical formula for the frequency of cooperation, we assumed the small mutation
limit. Despite this simplifying assumption, the small mutation limit approach has been shown to be widely applicable to scenarios which go well beyond the strict limit of very small mutation rates [46,48,59]. Relaxing this assumption would make the derivation of a closed form for the frequency of cooperation intractable.
Second, we focused in this paper on two important cooperation dilemmas, the DG and the
PGG. They have in common a useful property: the difference in (average) pay-off between a cooperator and a defector, δ = Π_C(i) − Π_D(i), does not depend on i, the number of cooperators in the population. This property allows us to simplify the fundamental matrix to a tridiagonal form and apply techniques of matrix analysis to obtain a closed form of its inverse (see electronic supplementary material). In games with more complex pay-off matrices, such as the PD in its general form and the collective risk game [65], the difference δ depends on i and the technique in this paper cannot be directly applied. We might consider other approaches to approximate the inverse matrix, exploiting its block structure.
Data accessibility. This article has no additional data.
Authors’ contributions. M.H.D. and T.A.H. designed the research, performed the research and wrote the paper.
Both authors gave final approval for publication and agree to be held accountable for the work performed
herein.
Competing interests. We declare we have no competing interests.
Funding. T.A.H. was supported by a Leverhulme Research Fellowship (RF-2020-603/9).
References
1. Han TA. 2013 Intention recognition, commitments and their roles in the evolution of cooperation:
from artificial intelligence techniques to evolutionary game theory models, vol. 9. Springer SAPERE
Series. Berlin, Germany: Springer Verlag.
2. Nowak MA. 2006 Five rules for the evolution of cooperation. Science 314, 1560.
(doi:10.1126/science.1133755)
3. Perc M, Jordan JJ, Rand DG, Boccaletti S, Szolnoki A. 2017 Statistical physics of human
cooperation. Phys. Rep. 687, 1–51. (doi:10.1016/j.physrep.2017.05.004)
4. Sigmund K. 2010 The calculus of selfishness. Princeton, NJ: Princeton University Press.
5. West SA, Griffin AA, Gardner A. 2007 Evolutionary explanations for cooperation. Curr. Biol.
17, R661–R672. (doi:10.1016/j.cub.2007.06.004)
6. Hamilton WD. 1964 The genetical evolution of social behaviour. I. J. Theor. Biol. 7, 1–16.
(doi:10.1016/0022-5193(64)90038-4)
7. Traulsen A, Nowak MA. 2006 Evolution of cooperation by multilevel selection. Proc. Natl Acad.
Sci. USA 103, 10952. (doi:10.1073/pnas.0602530103)
8. Han TA, Pereira LM, Santos FC. 2012 Corpus-based intention recognition in cooperation
dilemmas. Artif. Life 18, 365–383. (doi:10.1162/ARTL_a_00072)
9. Krellner M, Han TA. 2020 Putting oneself in everybody’s shoes—pleasing enables indirect
reciprocity under private assessments. In Proc. ALIFE 2020: The 2020 Conf. on Artificial Life,
Virtual, 13–18 July 2020, pp. 402–410. Cambridge, MA: MIT Press.
10. Nowak MA, Sigmund K. 2005 Evolution of indirect reciprocity. Nature 437, 1291–1298.
(doi:10.1038/nature04131)
11. Ohtsuki H, Iwasa Y. 2006 The leading eight: social norms that can maintain cooperation by
indirect reciprocity. J. Theor. Biol. 239, 435–444. (doi:10.1016/j.jtbi.2005.08.008)
12. Okada I. 2020 A review of theoretical studies on indirect reciprocity. Games 11, 27.
(doi:10.3390/g11030027)
13. Antonioni A, Cardillo A. 2017 Coevolution of synchronization and cooperation in costly
networked interactions. Phys. Rev. Lett. 118, 238301. (doi:10.1103/PhysRevLett.118.238301)
14. Peña J, Wu B, Arranz J, Traulsen A. 2016 Evolutionary games of multiplayer cooperation on
graphs. PLoS Comput. Biol. 12, e1005059. (doi:10.1371/journal.pcbi.1005059)
15. Perc M, Gómez-Gardeñes J, Szolnoki A, Floría LM, Moreno Y. 2013 Evolutionary dynamics
of group interactions on structured populations: a review. J. R. Soc. Interface 10, 20120997.
(doi:10.1098/rsif.2012.0997)
16. Santos FC, Pacheco JM, Lenaerts T. 2006 Evolutionary dynamics of social dilemmas
in structured heterogeneous populations. Proc. Natl Acad. Sci. USA 103, 3490–3494.
(doi:10.1073/pnas.0508201103)
17. Boyd R, Gintis H, Bowles S, Richerson PJ. 2003 The evolution of altruistic punishment. Proc.
Natl Acad. Sci. USA 100, 3531–3535. (doi:10.1073/pnas.0630443100)
18. Boyd R, Gintis H, Bowles S. 2010 Coordinated punishment of defectors sustains cooperation
and can proliferate when rare. Science 328, 617–620. (doi:10.1126/science.1183665)
19. Fehr E, Gachter S. 2000 Cooperation and punishment in public goods experiments. Am. Econ.
Rev. 90, 980–994. (doi:10.1257/aer.90.4.980)
20. Hauert C, Traulsen A, Brandt H, Nowak MA, Sigmund K. 2007 Via freedom to coercion: the
emergence of costly punishment. Science 316, 1905–1907. (doi:10.1126/science.1141588)
21. Herrmann B, Thöni C, Gächter S. 2008 Antisocial punishment across societies. Science 319,
1362–1367. (doi:10.1126/science.1153808)
22. Sigmund K, Hauert C, Nowak M. 2001 Reward and punishment. Proc. Natl Acad. Sci. USA 98,
10 757–10 762. (doi:10.1073/pnas.161155698)
23. Han TA, Pereira LM, Santos FC, Lenaerts T. 2013 Good agreements make good friends. Sci.
Rep. 3, 2695. (doi:10.1038/srep02695)
24. Han TA, Pereira LM, Lenaerts T. 2016 Evolution of commitment and level of
participation in public goods games. Auton. Agents Multi-Agent Syst. 31, 561–583.
(doi:10.1007/s10458-016-9338-4)
25. Martinez-Vaquero LA, Han TA, Pereira LM, Lenaerts T. 2017 When agreement-accepting
free-riders are a necessary evil for the evolution of cooperation. Sci. Rep. 7, 1–9.
(doi:10.1038/s41598-016-0028-x)
26. Nesse RM. 2001 Evolution and the capacity for commitment. Foundation Series on Trust.
New York, NY: Russell Sage.
27. Sasaki T, Okada I, Uchida S, Chen X. 2015 Commitment to cooperation and peer punishment:
its evolution. Games 6, 574–587. (doi:10.3390/g6040574)
28. Chen X, Sasaki T, Brännström Å, Dieckmann U. 2015 First carrot, then stick: how the
adaptive hybridization of incentives promotes cooperation. J. R. Soc. Interface 12, 20140935.
(doi:10.1098/rsif.2014.0935)
29. García J, Traulsen A. 2019 Evolution of coordinated punishment to enforce cooperation from
an unbiased strategy space. J. R. Soc. Interface 16, 20190127. (doi:10.1098/rsif.2019.0127)
30. Góis AR, Santos FP, Pacheco JM, Santos FC. 2019 Reward and punishment in climate change
dilemmas. Sci. Rep. 9, 1–9. (doi:10.1038/s41598-019-52524-8)
31. Han TA, Tran-Thanh L. 2018 Cost-effective external interference for promoting the evolution
of cooperation. Sci. Rep. 8, 1–9. (doi:10.1038/s41598-018-34435-2)
32. Powers ST, Ekárt A, Lewis PR. 2018 Modelling enduring institutions: the
complementarity of evolutionary and agent-based approaches. Cogn. Syst. Res. 52, 67–81.
(doi:10.1016/j.cogsys.2018.04.012)
33. Sigmund K, De Silva H, Traulsen A, Hauert C. 2010 Social learning promotes institutions for
governing the commons. Nature 466, 7308. (doi:10.1038/nature09203)
34. Vasconcelos VV, Santos FC, Pacheco JM. 2013 A bottom-up institutional approach
to cooperative governance of risky commons. Nat. Clim. Change 3, 797. (doi:10.1038/
nclimate1927)
35. Wang S, Chen X, Szolnoki A. 2019 Exploring optimal institutional incentives for
public cooperation. Commun. Nonlinear Sci. Numer. Simul. 79, 104914. (doi:10.1016/
j.cnsns.2019.104914)
36. Wu J-J, Li C, Zhang B-Y, Cressman R, Tao Yi. 2014 The role of institutional incentives and the
exemplar in promoting cooperation. Sci. Rep. 4, 6421. (doi:10.1038/srep06421)
37. Bardhan P. 2005 Institutions matter, but which ones? Econ. Transit. 13, 499–532.
(doi:10.1111/ecot.2005.13.issue-3)
38. Bowles S. 2009 Microeconomics: behavior, institutions, and evolution. Princeton, NJ: Princeton
University Press.
39. Bowles S, Gintis H. 2002 Social capital and community governance. Econ. J. 112, F419–F436.
(doi:10.1111/1468-0297.00077)
40. Han TA, Pereira LM, Lenaerts T, Santos FC. 2021 Mediating artificial intelligence
developments through negative and positive incentives. PLoS ONE 16, e0244592.
(doi:10.1371/journal.pone.0244592)
41. Ostrom E. 1990 Governing the commons: the evolution of institutions for collective action.
Cambridge, UK: Cambridge University Press.
42. Scotchmer S. 2004 Innovation and incentives. Cambridge, MA: MIT Press.
43. Hofbauer J, Sigmund K. 1998 Evolutionary games and population dynamics. Cambridge, UK:
Cambridge University Press.
44. Traulsen A, Nowak MA, Pacheco JM. 2006 Stochastic dynamics of invasion and fixation. Phys.
Rev. E 74, 11909. (doi:10.1103/PhysRevE.74.011909)
45. Domingos EF, Grujić J, Burguillo JC, Kirchsteiger G, Santos FC, Lenaerts T. 2020 Timing
uncertainty in collective risk dilemmas encourages group reciprocation and polarization.
Iscience 23, 101752. (doi:10.1016/j.isci.2020.101752)
46. Rand DG, Tarnita CE, Ohtsuki H, Nowak MA. 2013 Evolution of fairness in the one-
shot anonymous ultimatum game. Proc. Natl Acad. Sci. USA 110, 2581–2586. (doi:10.1073/
pnas.1214167110)
47. Traulsen A, Semmann D, Sommerfeld RD, Krambeck H-J, Milinski M. 2010 Human
strategy updating in evolutionary games. Proc. Natl Acad. Sci. USA 107, 2962–2966.
(doi:10.1073/pnas.0912515107)
48. Zisis I, Di Guida S, Han TA, Kirchsteiger G, Lenaerts T. 2015 Generosity motivated
by acceptance—evolutionary analysis of an anticipation games. Sci. Rep. 5, 18076.
(doi:10.1038/srep18076)
49. Cimpeanu T, Han TA, Santos FC. 2019 Exogenous rewards for promoting cooperation in scale-
free networks. In Proc. of the 2018 Conf. on Artificial Life: A Hybrid of the European Conf. on
Artificial Life (ECAL) and the Int. Conf. on the Synthesis and Simulation of Living Systems (ALIFE),
pp. 316–323. Cambridge, MA: MIT Press.
50. Cimpeanu T, Perret C, Han TA. 2021 Promoting fair proposers, fair responders or both?
Cost-efficient interference in the spatial ultimatum game. In Proc. of the 20th Int. Conf.
on Autonomous Agents and MultiAgent Systems, London (Virtual Event), 3–7 May 2021,
pp. 1480–1482. International Foundation for Autonomous Agents and Multiagent Systems.
51. Han TA, Lynch S, Tran-Thanh L, Santos FC. 2018 Fostering cooperation in structured
populations through local and global interference strategies. In Proc. IJCAI-ECAI’2018,
Stockholm, Sweden, 13–19 July 2018, pp. 289–295. International Joint Conferences on Artificial
Intelligence.
52. Sasaki T, Brännström Å, Dieckmann U, Sigmund K. 2012 The take-it-or-leave-it option allows
small penalties to overcome social dilemmas. Proc. Natl Acad. Sci. USA 109, 1165–1169.
(doi:10.1073/pnas.1115219109)
53. Imhof LA, Fudenberg D, Nowak MA. 2005 Evolutionary cycles of cooperation and defection.
Proc. Natl Acad. Sci. USA 102, 10 797–10 800. (doi:10.1073/pnas.0502589102)
54. Nowak MA. 2006 Evolutionary dynamics: exploring the equations of life. Cambridge, MA:
Harvard University Press.
55. Nowak MA, Sasaki A, Taylor C, Fudenberg D. 2004 Emergence of cooperation and
evolutionary stability in finite populations. Nature 428, 646–650. (doi:10.1038/nature02414)
56. Hauert C, Traulsen A, Brandt H, Nowak MA, Sigmund K. 2007 Via freedom to coercion: the
emergence of costly punishment. Science 316, 1905–1907. (doi:10.1126/science.1141588)
57. Kemeny J, Snell J. 1976 Finite Markov chains. Undergraduate Texts in Mathematics. Berlin,
Germany: Springer.
58. Fudenberg D, Imhof LA. 2005 Imitation processes with small mutations. J. Econ. Theory 131,
251–262. (doi:10.1016/j.jet.2005.04.006)
59. Sigmund K, De Silva H, Traulsen A, Hauert C. 2010 Social learning promotes institutions for
governing the commons. Nature 466, 861–863. (doi:10.1038/nature09203)
60. Karlin S, Taylor HE. 1975 A first course in stochastic processes. New York, NY: Academic Press.
61. Baldassarri D, Grossman G. 2011 Centralized sanctioning and legitimate authority promote
cooperation in humans. Proc. Natl Acad. Sci. USA 108, 11 023–11 027. (doi:10.1073/
pnas.1105456108)
62. Dong Y, Sasaki T, Zhang B. 2019 The competitive advantage of institutional reward. Proc. R.
Soc. B 286, 20190001. (doi:10.1098/rspb.2019.0001)
63. Gürerk Ö, Irlenbusch B, Rockenbach B. 2006 The competitive advantage of sanctioning
institutions. Science 312, 108–111. (doi:10.1126/science.1123633)
64. Hilbe C, Traulsen A, Röhl T, Milinski M. 2014 Democratic decisions establish stable authorities
that overcome the paradox of second-order punishment. Proc. Natl Acad. Sci. USA 111,
752–756. (doi:10.1073/pnas.1315273111)
65. Santos FC, Pacheco JM. 2011 Risk of collective failure provides an escape from the tragedy of
the commons. Proc. Natl Acad. Sci. USA 108, 10421–10 425. (doi:10.1073/pnas.1015648108)
... However, cooperation is pervasive in nature, from bacteria to insects to human societies. Scholars have studied multiple mechanisms for promoting it, among which kin selection, direct and indirect reciprocity, network reciprocity [23,29,24,26,34,40,15,1], and, the subject of the work at hand, institutional incentives [28,30,36,7,3,31,34,11,10,31,19,9,35,16,12,20]. ...
... Evolutionary dynamics follow Fermi's rule, allowing us to capture the effect of the intensity of selection on strategy updates. Although many simulation-based studies have explored this question, rigorous analytical works remain limited [36,14,7,5,37,38]. This paper contributes to closing that gap by analysing the behaviour of reward, punishment, and hybrid incentive cost functions in both neutral drift and strong selection limits. ...
... Overview of contribution of this paper. Building on the discrete approach established in [14,7,5,6], we further investigate the problem of optimising the cost of institutional incentives -specifically reward, punishment, and hybrid schemes -to maximise cooperative behaviour (or ensure a minimum level of cooperation) in well-mixed, finite populations. We focus on a fullincentive scheme, where every player receives incentives in each generation. ...
Preprint
Prosocial behaviours, which appear to contradict Darwinian principles of individual payoff maximisation, have been extensively studied across multiple disciplines. Cooperation, requiring a personal cost for collective benefits, is widespread in nature and has been explained through mechanisms such as kin selection, direct and indirect reciprocity, and network reciprocity. Institutional incentives, which reward cooperation and punish anti-social behaviour, offer a promising approach to fostering cooperation in groups of self-interested individuals. This study investigates the behaviour of the cost functions associated with these types of interventions, using both analytical and numerical methods. Focusing on reward, punishment, and hybrid schemes, we analyse their associated cost functions under evolutionary dynamics governed by Fermi's rule, exploring their asymptotic behaviour in the limits of neutral drift and strong selection. Our analysis focuses on two game types: General 2×22\times2 Games (with particular attention paid to the cooperative and defective Prisoner's Dilemma) and the Collective Risk Game. In addition to deriving key analytical results, we use numerical simulations to study how parameters such as the intensity of selection affect the behaviour of the aforementioned incentive cost functions.
... In parallel, several other mechanisms have been considered to explain the evolution of cooperation 14,[29][30][31][32][33][34][35][36][37][38][39][40] . For example, the usage of incentives, like punishing defectors have been proved as an effective tool to promote cooperation 35,[41][42][43][44][45][46][47][48][49] . ...
Preprint
Public goods game serves as a valuable paradigm for studying the challenges of collective cooperation in human and natural societies. Peer punishment is often considered as an effective incentive for promoting cooperation in such contexts. However, previous related studies have mostly ignored the positive feedback effect of collective contributions on individual payoffs. In this work, we explore global and local state-feedback, where the multiplication factor is positively correlated with the frequency of contributors in the entire population or within the game group, respectively. By using replicator dynamics in an infinite well-mixed population we reveal that state-based feedback plays a crucial role in alleviating the cooperative dilemma by enhancing and sustaining cooperation compared to the feedback-free case. Moreover, when the feedback strength is sufficiently strong or the baseline multiplication factor is sufficiently high, the system with local state-feedback provides full cooperation, hence supporting the ``think globally, act locally'' principle. Besides, we show that the second-order free-rider problem can be partially mitigated under certain conditions when the state-feedback is employed. Importantly, these results remain robust with respect to variations in punishment cost and fine.
... 10,[24][25][26][27][28] In parallel, several other mechanisms have been considered to explain the evolution of cooperation. 14,[29][30][31][32][33][34][35][36][37][38][39][40] For example, the usage of incentives like punishing defectors have been proved as an effective tool to promote cooperation. 35,[41][42][43][44][45][46][47][48][49] In the public goods game with peer punishment, individuals can choose to act as prosocial punishers, who contribute not only to the common pool but also to penalize defectors. ...
Article
Full-text available
Public goods game serves as a valuable paradigm for studying the challenges of collective cooperation in human and natural societies. Peer punishment is often considered an effective incentive for promoting cooperation in such contexts. However, previous related studies have mostly ignored the positive feedback effect of collective contributions on individual payoffs. In this work, we explore global and local state-feedback, where the multiplication factor is positively correlated with the frequency of contributors in the entire population or within the game group, respectively. By using replicator dynamics in an infinite well-mixed population, we reveal that state-based feedback plays a crucial role in alleviating the cooperative dilemma by enhancing and sustaining cooperation compared to the feedback-free case. Moreover, when the feedback strength is sufficiently strong or the baseline multiplication factor is sufficiently high, the system with local state-feedback provides full cooperation, hence supporting the “think globally, act locally” principle. Besides, we show that the second-order free-rider problem can be partially mitigated under certain conditions when the state-feedback is employed. Importantly, these results remain robust with respect to variations in punishment cost and fine.
... PGGs are essential for understanding the mechanisms that sustain cooperation in human societies and biological populations. Several mechanisms supporting cooperation have been recognized such as direct and indirect reciprocity (cooperation can evolve when individuals interact repeatedly [HST + 18, XWPW23] or when reputation plays a role [NS98,XWPW23]), network and group structures (the spatial arrangement of individuals can promote cooperation by enabling clusters of cooperators to form and persist [SP05,SF07]) and institutional incentives (the introduction of costly punishment for defectors or rewards for contributors can sustain cooperation) [SHN01,HDP24,CSBD15,DH21b]. The latter mechanism is the focus of the present paper. ...
Preprint
Full-text available
Understanding the emergence and stability of cooperation in public goods games is important due to its applications in fields such as biology, economics, and social science. However, a gap remains in comprehending how mutations, both additive and multiplicative, as well as institutional incentives, influence these dynamics. In this paper, we study the replicator-mutator dynamics, with combined additive and multiplicative mutations, for public goods games both in the absence or presence of institutional incentives. For each model, we identify the possible number of (stable) equilibria, demonstrate their attainability, as well as analyse their stability properties. We also characterise the dependence of these equilibria on the model's parameters via bifurcation analysis and asymptotic behaviour. Our results offer rigorous and quantitative insights into the role of institutional incentives and the effect of combined additive and multiplicative mutations on the evolution of cooperation in the context of public goods games.
... As another example, [18] suggested improving organizational knowledge-sharing culture by changing the performance climate. The studies in [30,31] investigated the cost-effectiveness of institutional incentives for promoting cooperation and proposed mathematical optimization models for reward and punishment mechanisms in the evolution of group cooperation. Building on this foundation, this paper further extends the application scenarios of incentive mechanisms by focusing on the design of dynamic incentive strategies for organizational online knowledge sharing. ...
Article
Full-text available
Knowledge sharing is critical for an organization to acquire sustained competitive advantage. Bestowing monetary rewards is possibly the most direct method of stimulating online knowledge sharing. Under a monetary reward mechanism for promoting knowledge sharing, we intend to find a satisfactory knowledge-sharing promotion policy. First, based on a state evolutionary model for the knowledge-sharing community, we reduce the original problem to an optimal control model. Second, applying optimal control theory to the model, we give an algorithm for solving it. Next, we validate the feasibility of the algorithm. Finally, we inspect the applicability of the algorithm. To our knowledge, this is the first time the optimal control modeling technique has been applied to research on knowledge sharing.
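Schematically, such a reduction leads to a problem of the standard optimal control form (illustrative symbols only, not the specific model of the cited article):
\[
\min_{u(\cdot)} \int_0^T L\big(x(t), u(t)\big)\,\mathrm{d}t + \Phi\big(x(T)\big)
\quad \text{subject to} \quad
\dot{x}(t) = f\big(x(t), u(t)\big), \quad x(0) = x_0, \quad u(t) \in [0, u_{\max}],
\]
where x(t) would be the fraction of knowledge sharers, u(t) the monetary reward rate (the control), L the running cost of providing rewards and of insufficient sharing, and \Phi an optional terminal cost. Problems of this form can then be solved numerically, for example via Pontryagin's maximum principle or dynamic programming.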
... However, the application of these incentive mechanisms often implies a high cost, especially as reward is typically more expensive than punishment, potentially jeopardizing the sustainability of such mechanisms. Furthermore, Zhou et al. have emphasized the emergence of wealth disparities resulting from incentive mechanisms, indicating that these mechanisms only maximize total wealth when the enforcer's wealth closely aligns with that of others in the network [36][37][38]. It remains an open question to identify the factors that influence the effectiveness of incentives. ...
Article
Full-text available
The lack of cooperation can easily result in inequality among the members of a society, widening the gap between individual incomes. To tackle this issue, we introduce an incentive mechanism based on individual strategies and incomes, wherein a portion of the income from defectors is allocated to reward low-income cooperators, aiming to enhance cooperation by improving the equitable distribution of wealth across the entire population. Moreover, previous research has typically employed network structures or game mechanisms characterized by homogeneity. In this study, we present a network framework that more accurately reflects real-world conditions, where agents are engaged in multiple games, including prisoner's dilemma games in the top-layer network and public goods games in the bottom-layer network. Within this framework, we introduce the concept of 'external coupling', which connects agents across different networks as acquaintances, thereby facilitating access to shared datasets. Our results indicate that the combined positive effects of external coupling and the incentive mechanism lead to optimal cooperation rates and lower Gini coefficients, demonstrating a negative correlation between cooperation and inequality. From a micro-level perspective, this phenomenon primarily arises in the regular network, whereas suboptimal outcomes are observed within the scale-free network. These observations give a deeper insight into the interplay between cooperation and wealth disparity in evolutionary games in large populations.
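Since the Gini coefficient is the inequality measure referred to above, a small self-contained helper for computing it from a vector of individual payoffs may be useful; this uses the standard sorted cumulative-share formula and is not code from the cited study.

    import numpy as np

    def gini(incomes):
        # Gini coefficient of a 1-D array of non-negative incomes
        # (0 = perfect equality, values near 1 = extreme inequality).
        x = np.sort(np.asarray(incomes, dtype=float))
        n = x.size
        total = x.sum()
        if n == 0 or total == 0:
            return 0.0
        cum = np.cumsum(x)
        return (n + 1 - 2 * (cum / total).sum()) / n

    print(gini([1, 1, 1, 1]))   # 0.0  : perfect equality
    print(gini([0, 0, 0, 1]))   # 0.75 : one agent holds all the payoff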
... Over the years, several cooperation-supporting mechanisms have been identified, including the "main five" summarized by Nowak in his seminal essay [17]: kin selection, direct reciprocity, indirect reciprocity, group selection and network reciprocity. In addition, as a result of the continuous development of evolutionary game theory, other supporting factors have also been recognized, such as memory effects [18][19][20], multi-gaming [21,22], reputation [23][24][25], and the usage of incentives, like rewards [26][27][28] or punishment [29][30][31][32]. Among all of these mechanisms, punishment is one of the most widely studied [33][34][35][36][37]. ...
Preprint
It is a challenging task to reach global cooperation among self-interested agents, which often requires sophisticated design or usage of incentives. For example, we may apply supervisors or referees who are able to detect and punish selfishness. In response, defectors may offer bribes to corrupt referees to remain hidden, hence generating a new conflict among supervisors. Using the interdependent network approach, we model the key element of the coevolution between strategy and judgment. In the game layer, agents play a public goods game using one of the two major strategies of a social dilemma. In the monitoring layer, supervisors follow the strategy changes and may alter the income of competitors. Fair referees punish defectors while corrupt referees remain silent in exchange for a bribe. Importantly, there is a learning process not only among players but also among referees. Our results suggest that large fines and bribes boost the emergence of cooperation by significantly reducing the phase transition threshold between the pure defection state and the mixed solution where competing strategies coexist. Interestingly, the presence of bribes can be as harmful for defectors as the usage of harsh fines. The explanation of this system behavior is based on a strong correlation between cooperators and fair referees, which is cemented via overlapping clusters in both layers.
... In human society, climate change [10,11], corruption [12][13][14], and the spread of diseases [15] are huge challenges that remind us that global cooperation is necessary. Although cooperation is socially desirable and widespread in nature and human society, it often incurs individual costs to bring benefits to other individuals, with the result that cooperation is not favored by natural selection [16][17][18][19][20][21][22][23][24]. It has therefore always been a great challenge to explain how cooperative behaviour evolves [25]. ...
Preprint
Social exclusion has been regarded as one of the most effective measures to promote the evolution of cooperation. In real society, the way in which social exclusion works can be direct or indirect. However, thus far no related work has explored how indirect exclusion influences the evolution of cooperation from a theoretical perspective. Here, we introduce indirect exclusion into the repeated public goods game, where the game organizer probabilistically selects cooperators after the first game round to participate in the following possible game interactions. We then investigate the evolutionary dynamics of cooperation both in infinite and finite well-mixed populations. Through theoretical analysis and numerical calculations, we find that the introduction of indirect exclusion can induce the stable coexistence of cooperators and defectors or the dominance of cooperators, thus effectively promoting the evolution of cooperation. Besides, we show that the identifying probability of the organizer has a nonlinear effect on public cooperation when its value is lower than an intermediate value, while a higher identifying probability can maintain a high level of cooperation. Furthermore, our results show that increasing the average number of rounds of game interactions can effectively promote the evolution of cooperation.
Article
Two mechanisms that have been used to study the evolution of cooperative behavior are altruistic punishment, in which cooperative individuals pay additional costs to punish defection, and multilevel selection, in which competition between groups can help to counteract individual-level incentives to cheat. Boyd, Gintis, Bowles, and Richerson have used simulation models of cultural evolution to suggest that altruistic punishment and pairwise group-level competition can work in concert to promote cooperation, even when neither mechanism can do so on its own. In this paper, we formulate a PDE model for multilevel selection motivated by the approach of Boyd and coauthors, modeling individual-level birth-death competition with a replicator equation based on individual payoffs and describing group-level competition with pairwise conflicts based on differences in the average payoffs of the competing groups. Building off of existing PDE models for multilevel selection with frequency-independent group-level competition, we use analytical and numerical techniques to understand how the forms of individual and average payoffs can impact the long-time ability to sustain altruistic punishment in group-structured populations. We find several interesting differences between the behavior of our new PDE model with pairwise group-level competition and existing multilevel PDE models, including the observation that our new model can feature a non-monotonic dependence of the long-time collective payoff on the strength of altruistic punishment. Going forward, our PDE framework can serve as a way to connect and compare disparate approaches for understanding multilevel selection across the literature in evolutionary biology and anthropology.
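Schematically, PDE models of this kind track the density f(t, x) of groups whose internal cooperator (or punisher) fraction is x, and couple within-group replicator advection to a between-group competition term. One generic form with pairwise group conflicts is (illustrative only; the precise payoff functions and conflict rule of the article may differ):
\[
\partial_t f(t,x) = -\partial_x\Big[ x(1-x)\big(\pi_C(x) - \pi_D(x)\big) f(t,x) \Big]
+ \lambda\, f(t,x) \int_0^1 \rho\big( G(x) - G(y) \big) f(t,y)\,\mathrm{d}y,
\]
where \pi_C and \pi_D are individual-level payoffs, G(x) is the average payoff of a group with composition x, \lambda scales the strength of group-level competition, and \rho is an odd function converting the payoff difference between two competing groups into a net replacement rate.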
Article
Full-text available
The field of Artificial Intelligence (AI) is going through a period of great expectations, introducing a certain level of anxiety in research, business and also policy. This anxiety is further energised by an AI race narrative that makes people believe they might be missing out. Whether real or not, a belief in this narrative may be detrimental as some stakeholders will feel obliged to cut corners on safety precautions, or ignore societal consequences just to “win”. Starting from a baseline model that describes a broad class of technology races where winners draw a significant benefit compared to others (such as AI advances, patent race, pharmaceutical technologies), we investigate here how positive (rewards) and negative (punishments) incentives may beneficially influence the outcomes. We uncover conditions in which punishment is either capable of reducing the development speed of unsafe participants or has the capacity to reduce innovation through over-regulation. Alternatively, we show that, in several scenarios, rewarding those that follow safety measures may increase the development speed while ensuring safe choices. Moreover, in the latter regimes, rewards do not suffer from the issue of over-regulation as is the case for punishment. Overall, our findings provide valuable insights into the nature and kinds of regulatory actions most suitable to improve safety compliance in the contexts of both smooth and sudden technological shifts.
Article
Full-text available
Social dilemmas are often shaped by actions involving uncertain returns only achievable in the future, such as climate action or voluntary vaccination. In this context, uncertainty may produce non-trivial effects. Here, we assess experimentally — through a collective risk dilemma — the effect of timing uncertainty, i.e. how uncertainty about when a target needs to be reached affects the participants' behaviors. We show that timing uncertainty prompts not only early generosity but also polarized outcomes, where participants' total contributions are distributed unevenly. Furthermore, analyzing participants' behavior under timing uncertainty reveals an increase in reciprocal strategies. A data-driven game-theoretical model captures the self-organizing dynamics underpinning these behavioral patterns. Timing uncertainty thus casts a shadow on the future that leads participants to respond early, whereas reciprocal strategies appear to be important for group success. Yet, the same uncertainty also leads to inequity and polarization, requiring the inclusion of new incentives handling these societal issues.
Article
Full-text available
Despite the accumulation of research on indirect reciprocity over the past 30 years and the publication of over 100,000 related papers, there are still many issues to be addressed. Here, we look back on the research that has been done on indirect reciprocity and identify the issues that have been resolved and the ones that remain to be resolved. This manuscript introduces indirect reciprocity in the context of the evolution of cooperation, basic models of social dilemma situations, the path taken in the elaboration of mathematical analysis using evolutionary game theory, the discovery of image scoring norms, and the breakthroughs brought about by the analysis of the evolutionary instability of the norms. Moreover, it presents key results obtained by refining the assessment function, resolving the punishment dilemma, and presenting a complete solution to the social dilemma problem. Finally, it discusses the application of indirect reciprocity in various disciplines.
Conference Paper
Full-text available
Indirect reciprocity is an important mechanism for promoting cooperation among self-interested agents. Simplified, it means: you help me, therefore somebody else will help you (in contrast to direct reciprocity: you help me, therefore I will help you). Indirect reciprocity can be achieved via reputation and norms. Strategies relying on these principles can maintain high levels of cooperation and remain stable against invasion, even in the presence of errors. However, this is only the case if the reputation of an agent is modeled as a shared public opinion. If agents have private opinions and hence can disagree about whether somebody is good or bad, even rare errors can cause cooperation to break apart. This paper examines a novel approach to overcoming this private information problem, where agents act in accordance with others' expectations of their behavior (i.e. pleasing them) instead of being guided by their own, private assessment. As such, a pleasing agent can achieve better reputations than previously considered strategies when there is disagreement in the population. Our analysis shows that pleasing significantly improves stability as well as cooperativeness. It is effective even if only the opinions of a few other individuals are considered and when it bears additional costs.
Article
Full-text available
Mitigating climate change effects involves strategic decisions by individuals who may choose to limit their emissions at a cost. Everyone shares the ensuing benefits and thereby individuals can free ride on the effort of others, which may lead to the tragedy of the commons. For this reason, climate action can be conveniently formulated in terms of Public Goods Dilemmas, often assuming that a minimum collective effort is required to ensure any benefit, and that decision-making may be contingent on the risk associated with future losses. Here we investigate the impact of reward and punishment in this type of collective endeavor, coined collective-risk dilemmas, by means of a dynamic, evolutionary approach. We show that rewards (positive incentives) are essential to initiate cooperation, mostly when the perception of risk is low. On the other hand, we find that sanctions (negative incentives) are instrumental to maintain cooperation. Altogether, our results are gratifying, given the a priori limitations of effectively implementing sanctions in international agreements. Finally, we show that whenever collective action is most challenging to succeed, the best results are obtained when both rewards and sanctions are synergistically combined into a single policy.
Article
Full-text available
The emergence and maintenance of punishment to protect the commons remains an open puzzle in social and biological sciences. Even in societies where pro-social punishing is common, some individuals seek to cheat the system if they see a chance to do so, and public goods are often maintained in spite of cheaters who do not contribute. We present a model accounting for all possible strategies in a public goods game with punishment. While most models of punishment restrict the set of possible behaviours, excluding seemingly paradoxical anti-social strategies from the start, we show that these strategies can play an important role in explaining large-scale cooperation as observed in human societies. We find that coordinated punishment can emerge from individual interactions, but the stability of the associated institutions is limited owing to anti-social and opportunistic behaviour. In particular, coordinated anti-social punishment can undermine cooperation if individuals cannot condition their behaviour on the existence of institutions that punish. Only when we allow for observability and conditional behaviours do anti-social strategies no longer threaten cooperation. This is due to a stable coexistence of a minority supporting pro-social institutions and those who only cooperate if such institutions are in place. This minority of supporters is enough to guarantee substantial cooperation under a wide range of conditions. Our findings resonate with the empirical observation that public goods are resilient to opportunistic cheaters in large groups of unrelated individuals. They also highlight the importance of letting evolution, and not modellers, decide which strategies matter.
Article
Prosocial incentives can promote cooperation, but providing incentives is costly. Institutions in human society may prefer to use an incentive strategy which is able to promote cooperation at a reasonable cost. However, thus far few works have explored optimal institutional incentives that minimize the related cost of supporting public cooperation. In this work, we thus use optimal control theory to formulate two optimal control problems that explore the optimal incentive strategies for institutional reward and punishment, respectively. By using the Hamilton–Jacobi–Bellman equation approach for well-mixed populations, we theoretically obtain the optimal positive and negative incentive strategies with minimal cumulative cost. Additionally, we provide numerical examples to verify that the obtained optimal incentives allow the dynamical system to reach the desired target state at the lowest cumulative cost in comparison with other given incentive strategies. Furthermore, we find that the optimal punishment strategy is a cheaper way to obtain an expected cooperation level than the optimal reward strategy.
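As a rough illustration of the dynamic-programming idea behind a Hamilton–Jacobi–Bellman approach, the Python sketch below discretizes a finite-horizon control problem on a grid of the cooperator fraction and recovers an incentive policy by backward induction. The drift f, the running cost g, the horizon and all parameter values are placeholders chosen for readability; they are not the model or the optimal incentives derived in the cited work.

    import numpy as np

    # State x: fraction of cooperators; control u: per-capita incentive.
    T, dt = 10.0, 0.05
    nt = int(T / dt)
    xs = np.linspace(0.0, 1.0, 201)      # state grid
    us = np.linspace(0.0, 2.0, 41)       # control grid

    def f(x, u):                          # placeholder drift: incentive tilts selection
        return x * (1 - x) * (u - 0.5)

    def g(x, u):                          # placeholder running cost: incentive + shortfall
        return u**2 + 4.0 * (1 - x)

    V = np.zeros_like(xs)                 # terminal value V(T, x) = 0
    policy = np.zeros((nt, xs.size))
    for t in reversed(range(nt)):         # backward induction on the discretized HJB equation
        Vnew = np.empty_like(V)
        for i, x in enumerate(xs):
            costs = [g(x, u) * dt
                     + np.interp(np.clip(x + dt * f(x, u), 0.0, 1.0), xs, V)
                     for u in us]
            j = int(np.argmin(costs))
            Vnew[i] = costs[j]
            policy[t, i] = us[j]
        V = Vnew

    i0 = np.searchsorted(xs, 0.2)
    print('optimal incentive at t=0, x=0.2:', policy[0, i0])

In such a scheme the stored policy plays the role of the optimal incentive strategy, and comparing its cumulative cost against fixed-incentive baselines mirrors the kind of numerical verification described in the abstract.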