SCIENTIFIC REPORTS | (2019) 9:16193 | https://doi.org/10.1038/s41598-019-52524-8
Reward and punishment in climate change dilemmas
António R. Góis1,2,3, Fernando P. Santos4,1,2, Jorge M. Pacheco5,6,2 & Francisco C. Santos1,2,7*
Mitigating the effects of climate change involves strategic decisions by individuals who may choose to limit their emissions at a cost. Everyone shares the ensuing benefits, and thus individuals can free ride on the efforts of others, which may lead to the tragedy of the commons. For this reason, climate action can be conveniently formulated in terms of Public Goods Dilemmas, often assuming that a minimum collective effort is required to ensure any benefit, and that decision-making may be contingent on the risk associated with future losses. Here we investigate the impact of reward and punishment in this type of collective endeavor (coined collective-risk dilemmas) by means of a dynamic, evolutionary approach. We show that rewards (positive incentives) are essential to initiate cooperation, mostly when the perception of risk is low. On the other hand, we find that sanctions (negative incentives) are instrumental in maintaining cooperation. Altogether, our results are gratifying, given the a priori limitations of effectively implementing sanctions in international agreements. Finally, we show that whenever collective action is most difficult to achieve, the best results are obtained when both rewards and sanctions are synergistically combined into a single policy.
Climate change stands as one of our biggest challenges in what concerns the emergence and sustainability of cooperation1,2. Indeed, world citizens build up high expectations every time a new International Environmental Summit is convened, unfortunately with few resulting solutions implemented so far. This calls for the development of more effective incentives, agreements and binding mechanisms. The problem can be conveniently framed resorting to the mathematics of game theory, being a paradigmatic example of a Public Goods Game3: at stake there is a global good from which every single individual can profit, irrespective of contributing to maintain it. Parties may free ride on the efforts of others, avoiding any effort themselves, while driving the population into the tragedy of the commons4. Moreover, since here cooperation aims at averting collective losses, this type of dilemma is often referred to as a public bad game, in which achieving collective goals often depends on reaching a threshold number of cooperative group members5–8.
One of the multiple obstacles attributed to such agreements is misperceiving the actual risk of future losses, which significantly affects the ensuing dynamics of cooperation5,9. Another problem relates to the incapacity to sanction those who do not contribute to the welfare of the planet and/or to reward those who subscribe to green policies10. Previous cooperation studies show that reward (positive incentives), punishment (negative incentives) and the combination of both11–23 have a different impact depending on the dilemma in place. Assessing the impact of reward and punishment (isolated or combined) in the context of N-person threshold games, and in the particular case of climate change dilemmas, remains, however, an open problem.
Here we study, theoretically, the role of both institutional reward and punishment in the context of climate change agreements. Previous works consider the public good as a linear function of the number of contributors12,17,21,22 and conclude that punishment is more effective than reward (for an optimal combination of punishment and reward, see ref. 12). We depart from this linear regime by modeling the returns of the public good as a threshold problem, combined with an uncertain outcome, represented by a risk of failure. As a result, and as detailed below, the dynamical portrait of our model reveals new internal equilibria9, allowing us to identify the dynamics of coordination and coexistence typifying collective action problems. As discussed below, the reward and punishment mechanisms impact those equilibria in a non-trivial way.
1INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, IST-Taguspark, 2744-016, Porto Salvo, Portugal. 2ATP-group, P-2744-016, Porto Salvo, Portugal. 3Unbabel, R. Visc. de Santarém 67B, 1000-286, Lisboa, Portugal. 4Department of Ecology and Evolutionary Biology, Princeton University, Princeton, USA. 5Centro de Biologia Molecular e Ambiental, Universidade do Minho, 4710-057, Braga, Portugal. 6Departamento de Matemática e Aplicações, Universidade do Minho, 4710-057, Braga, Portugal. 7Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe CP212, 1050, Bruxelles, Belgium. *email: franciscocsantos@tecnico.ulisboa.pt
We consider a population of size Z, where each individual can be either a Cooperator (C) or a Defector (D) when participating in an N-player Collective-Risk Dilemma (CRD)5,9,10,24–30. In this game, each participant starts with an initial endowment B (viewed as the asset value at stake) that may be used to contribute to the mitigation of the effects of climate change. A cooperator incurs a cost corresponding to a fraction c of her initial endowment B, in order to help prevent a collective failure. A defector, on the other hand, refuses to incur any cost, hoping to free ride on the contributions of others. We require a minimum number 0 < M ≤ N of cooperators in a group of size N before collective action is realized; if a group of size N does not contain at least M Cs, all members lose their remaining endowments with a probability r, where r (0 ≤ r ≤ 1) stands for the risk of collective failure. Otherwise, everyone keeps whatever she has. This CRD formulation has been shown to capture some of the key features discovered in recent experiments5,24,31–33, while highlighting the importance of risk. In addition, it allows one to test model parameters in a systematic way that is not possible in human experiments. Moreover, the adoption of non-linear returns mimics situations common to many human and non-human endeavors6,34–41, where a minimum joint effort is required to achieve a collective goal. Thus, the applicability of this framework extends well beyond environmental governance, given the ubiquity of this type of social dilemma in nature and in societies.
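To make the game concrete, the following minimal sketch (our own illustration, not the authors' code; default parameter values are taken from the figures below) encodes the single-round CRD payoffs just described: a group succeeds if it contains at least M cooperators, and otherwise every member loses her remaining endowment with probability r (cf. Eqs (3)-(4) in Methods).

```python
# Minimal sketch of the single-round CRD payoffs described above
# (Eqs (3)-(4) in Methods); parameter defaults follow the figures.
def payoff_defector(j, B=1.0, M=5, r=0.5):
    """Expected payoff of a defector in a group with j cooperators."""
    success = 1.0 if j >= M else 0.0  # Heaviside threshold theta(j - M)
    return B * (success + (1.0 - r) * (1.0 - success))

def payoff_cooperator(j, B=1.0, c=0.1, M=5, r=0.5):
    """A cooperator earns the defector's payoff minus her contribution cB."""
    return payoff_defector(j, B, M, r) - c * B
```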
Following Chen et al.12, we include both reward and punishment mechanisms in this model. A fixed group budget Nδ (where δ ≥ 0 stands for a per-capita incentive) is assumed to be available, of which a fraction w is applied to a reward policy and the remaining 1 − w to a punishment policy. We assume the effective impact of both policies to be equivalent, meaning that each unit spent will directly increase/decrease the payoff of a cooperator/defector by the same amount. For details on policies with different efficiencies, see Methods.
Instead of considering a collection of rational agents engaging in one-shot Public Goods Games32,42, here we adopt an evolutionary description of the behavioral dynamics9, in which individuals tend to copy those appearing to be more successful. Success (or fitness) of individuals is here associated with their average payoff. All individuals are equally likely to interact with each other, causing all cooperators and defectors to be equivalent, on average, and only distinguishable by the strategy they adopt. Therefore, and considering that only two strategies are available, the number of cooperators is sufficient to describe any configuration of the population. The number of individuals adopting a given strategy (either C or D) evolves in time according to a stochastic birth-death process43,44, which describes the time evolution of the social learning dynamics (with exploration): at each time step, each individual (X, with fitness fX) is given the opportunity to change strategy; with probability μ, X randomly explores the strategy space45 (a process similar to mutations in a biological context, which precludes the existence of absorbing states). With probability 1 − μ, X may adopt the strategy of a randomly selected individual (Y, with fitness fY), with a probability that increases with the fitness difference fY − fX44. This renders the stationary distribution (see Methods) an extremely useful tool to rank the most visited states given the ensuing evolutionary dynamics of the population. Indeed, the stationary distribution provides the prevalence of each of the population's possible configurations, in terms of the number of Cs (k) and Ds (Z − k). Combined with the probability of success characterizing each configuration, the stationary distribution can be used to compute the overall success probability of a given population, the average group achievement ηG. This value represents the average fraction of groups that will overcome the CRD, successfully preserving the public good.
Results
In Fig. 1 we compare the average group achievement ηG (as a function of risk) in four scenarios: (i) a reference scenario without any policy (i.e., no reward or punishment, in black); and three scenarios where a budget is applied to (ii) rewards, (iii) punishment and (iv) a combination of rewards and sanctions (see below). Our results are shown for the two most paradigmatic regimes: low (Fig. 1A) and high (Fig. 1B) coordination requirements. Naturally, ηG improves whenever a policy is applied. Less obvious is the difference between the various policies. Applying only rewards (blue curves in Fig. 1) is more effective than applying only punishment (red curve) for low values of risk. The opposite happens when risk is high. In scenarios with a low relative threshold (Fig. 1A), rewards play the key role, with sanctions only marginally outperforming them for very high values of risk. For high coordination thresholds (Fig. 1B), reward and punishment display comparable efficiency in the promotion of cooperation, with pure-Punishment (w = 0) performing slightly better than pure-Reward (w = 1).
Justifying these differences is difficult from the analysis of ηG alone. To better understand the behavioral dynamics under Reward and Punishment, we show in Fig. 2 the gradients of selection (top panels) and stationary distributions (lower panels) for each case and different budget values. Each gradient of selection represents, for each discrete state k/Z (i.e., fraction of Cs), the difference G(k) = T+(k) − T−(k) between the probabilities of increasing (T+(k)) and decreasing (T−(k)) the number of cooperators by one (see Methods). Whenever G(k) > 0 the fraction of Cs is likely to increase; whenever G(k) < 0 the opposite is expected to happen. The stationary distributions show how likely it is to find the population in each (discrete) configuration of our system. The panels on the left-hand side show the results obtained for the CRD under pure-Reward; on the right-hand side, we show the results obtained for pure-Punishment.
Naturally, both mechanisms are inoperative whenever the per-capita incentives are absent (δ = 0), creating a natural reference scenario in which to study the impact of Reward and Punishment on the CRD. In this case, above a certain value of risk (r), decision-making is characterized by two internal equilibria (i.e., adjacent finite-population states with opposite gradient sign, representing the analogue of fixed points in a dynamical system characterizing evolution in infinite populations). Above a certain fraction of cooperators, the population overcomes the coordination barrier and naturally self-organizes towards a stable co-existence of cooperators and defectors. Otherwise, the population is condemned to evolve towards a monomorphic population of defectors, leading to the tragedy of the commons9. As the budget for incentives increases, using either Reward or Punishment leads to very different outcomes, as depicted in Fig. 2.
Contrary to the case of linear Public Goods Games12, in the CRD coordination and co-existence dynamics already exist in the absence of any reward/punishment incentive. Reward is particularly effective when
cooperation is low (small k/Z), showing a significant impact on the location of the finite-population analogue of an unstable fixed point. Indeed, increasing δ lowers the minimum number of cooperators required to reach the cooperative basin of attraction (while also increasing the prevalence of cooperators in the co-existence point on the right), a barrier which ultimately disappears for high δ (Fig. 2A). This means that a smaller coordination effort is required before the population dynamics start to naturally favor the increase of cooperators. Once this initial barrier is surpassed, the population will naturally tend towards an equilibrium state, which does not improve appreciably under Reward. The opposite happens under Punishment. The location of the coordination point is little affected, yet once this barrier is overcome, the population will evolve towards a more favorable equilibrium (Fig. 2B). Thus, while Reward seems particularly effective at bootstrapping cooperation towards a more cooperative basin of attraction, Punishment seems effective in sustaining high levels of cooperation.
As a consequence, the most frequently observed configurations are very different under each of the policies. As shown by the stationary distributions (Fig. 2C,D), under Reward the population more often visits states with intermediate values of cooperation (i.e., where Cs and Ds co-exist). Intuitively, this happens because the coordination effort is eased by the rewards, causing the population to effectively overcome it and reach the coexistence point (the equilibrium state with an intermediate number of cooperators), thus spending most of the time near it. On the other hand, Punishment does not ease the coordination effort, and thus the population will spend most of the time in states of low cooperation, failing to overcome this barrier. Notwithstanding, once the barrier is surpassed, the population will stabilize in higher states of cooperation. This is especially evident for high budgets, as shown for δ = 0.02 (blue line). Moreover, since Nδ corresponds to a fixed total amount which is distributed among the existing cooperators/defectors, the per-cooperator/defector incentive varies with the number of existing cooperators/defectors (i.e., each of the j cooperators receives wδN/j and each defector loses (1 − w)δN/(N − j)). In other words, positive (negative) incentives become very profitable (severe) if defection (cooperation) prevails within a group. In particular, whenever the budget is significant (see, e.g., δ = 0.02 in Fig. 2), the punishment becomes so high when there are few defectors within a group that a new equilibrium emerges close to full cooperation.
The results in Fig. 2 show that Reward can be instrumental in fostering pro-social behavior, while Punishment can be used for its maintenance. This suggests that, to combine both policies synergistically, pure-Reward (w = 1) should be applied at first, when there are few cooperators (low k/Z); above a certain critical point (k/Z = s), one should switch to pure-Punishment (w = 0). In the Methods section, we demonstrate that, as in linear Public Goods Games12, in CRDs this is indeed the policy which minimizes the advantage of defectors, even if we consider the alternative possibility of applying both policies simultaneously. In Methods, we also compute a general expression for the optimal switching point s*, that is, the value of k above which Punishment should be applied instead of Reward to maximize cooperation and group achievement. By using such a policy, denoted by s*, we obtain the best results, shown as an orange line in Fig. 1. We propose, however, to explore what happens in the context of a CRD when s* is not used. How much cooperation is lost when we deviate from s* to either of the pure policies, or to a policy which uses a switching point different from the optimal one?
Figure 1. Average group achievement ηG as a function of risk. Left: group relative threshold M/N = 3/10. Right: group relative threshold M/N = 7/10. In both panels, the black line corresponds to a reference scenario where no policy is applied. The red line shows ηG in the case where all available budget is applied to pure-Punishment (w = 0), whereas the blue line shows results for pure-Reward (w = 1). Pure-Reward is most effective at low risk values, while pure-Punishment is marginally the most effective policy at high risk. These features are more pronounced for low relative thresholds (left panel), and only at high thresholds does pure-Punishment lead to a sizeable improvement with respect to pure-Reward. Finally, the orange line shows the results using the combination of Reward and Punishment, leading (naturally) to the best results. In this case, we adopt pure-Reward (w = 1) when there are few cooperators and, above a certain critical point k/Z = s = 0.5, we switch to pure-Punishment (w = 0). As detailed in the main text (see Fig. 3 and Methods), s = 0.5 provides the optimal switching point s* for cooperation to thrive. Other parameters: population size Z = 50, group size N = 10, cost of cooperation c = 0.1, initial endowment B = 1, budget δ = 0.025, reward efficiency a = 1, punishment efficiency b = 1, intensity of selection β = 5, mutation rate µ = 0.01.
Figure 3 illustrates how the choice of the switching point s impacts overall cooperation, as evaluated by ηG, for different values of risk. For a switching point of s = k/Z = 1.0 (0.0), a static policy of always pure-Reward (pure-Punishment) is used. This can be seen on the far right (left) of Fig. 3. Figure 3 suggests that, for low thresholds, an optimal policy switch (which, for the parameters shown, occurs at s = 50%, see Methods) is only marginally better than a policy solely based on rewards (s = 1). Figure 3 also allows for a comparison of what happens when the switch occurs too late (excessive rewards) or too early (excessive sanctions) in a low-threshold scenario. A late switch is significantly less harmful than an early one. In other words, our results suggest that when the population configuration cannot be precisely observed, it is preferable to keep rewarding for longer. This said, whenever the perception of risk is high (an unlikely situation these days), an early switch is slightly less harmful than a late one. In the most difficult scenarios, where stringent coordination requirements (large M) are combined with a low perception of risk (low r), the adoption of a combined policy becomes necessary (see right panel of Fig. 1).
Discussion
One might expect the impact of Reward and Punishment to lead to symmetric outcomes, with Punishment being effective at high cooperation in the same way that Reward is effective at low cooperation. In low-cooperation scenarios (under low risk, threshold or budget), Reward alone plays the most important role. However, in the opposite scenario, Punishment alone does not have the same impact. Either a favourable scenario occurs, where any policy yields a satisfying result, or Punishment cannot improve outcomes on its own. In the latter case, the synergy between both policies becomes essential to achieve cooperation. Such an optimal policy involves a combination of the single policies, Reward and Punishment, which is dynamic, in the sense that the combination does not remain the same for all configurations of the population. It corresponds to employing pure Reward at first, when cooperation is low, and switching subsequently to Punishment whenever a pre-determined level of cooperation is reached.
Figure 2. Gradient of selection (top panels, A and B) and stationary distribution (bottom panels, C and D) for the different values of per-capita budget δ indicated, using either pure-Reward (w = 1, left panels) or pure-Punishment (w = 0, right panels). The black curve is identical in the left and right panels, since in this case δ = 0. As δ increases, the behaviour under Reward and Punishment is qualitatively similar, displacing the (unstable) coordination equilibrium towards lower values of k/Z, while displacing the (stable) coexistence equilibrium towards higher values of k/Z. This happens, however, only for low values of δ. Indeed, by further increasing δ one observes very different behaviours under Reward and Punishment: whereas under Punishment the equilibria are moved further apart (in accordance with what happens for low δ), under Reward the coordination equilibrium disappears, and the overall dynamics become characterized by a single coexistence equilibrium which consistently shifts towards higher values of k/Z with increasing δ. This difference in behaviour, in turn, has a dramatic impact on the overall prevalence of configurations achieved by the population dynamics, as shown by the stationary distributions: in panel C (pure-Reward) the population spends most of the time in intermediate states of cooperation; in panel D (pure-Punishment) the population spends most of the time at both extremes (high and low cooperation), but especially in low-cooperation states. Other parameters: Z = 50, N = 10, M = 5, c = 0.1, B = 1, r = 0.5, a = b = 1, β = 5 and µ = 0.01 (see Methods for details).
The optimal procedure, however, is unlikely to be realistic in the context of Climate Change agreements. Indeed, and unlike other Public Goods Dilemmas, where Reward and Punishment constitute the main policies available for institutions to foster cooperative collective action, in International Agreements it is widely recognized that Punishment is very difficult to implement2,42. This has been, in fact, one of the main criticisms put forward in connection with Global Agreements on Climate Mitigation: they suffer from the lack of sanctioning mechanisms, as it is practically impossible to enforce any type of sanctioning at a global level. In this sense, the results obtained here by means of our dynamical, evolutionary approach are gratifying, given these a priori limitations of sanctioning in CRDs. Not only do we show that Reward is essential to foster cooperation, mostly when both the perception of risk is low and the overall number of engaged parties is small (low k/Z), but we also show that Punishment mostly acts to sustain cooperation after it has been established. Given that low-risk scenarios are more common, and more harmful to cooperation, than high-risk ones, our results in connection with rewards provide a viable avenue to explore in the quest for establishing global cooperative collective action. Reward policies may also be very relevant in scenarios where Climate Agreements are coupled with other international agreements from which parties do not wish to deviate2,42. Finally, the fact that rewards ease coordination towards cooperative states suggests that positive incentives should also be used within intervention mechanisms aimed at fostering pro-sociality in artificial systems and hybrid populations comprising humans and machines46–49.
The model used takes for granted the existence of an institution with a budget available to implement either Reward or Punishment. New behaviours may emerge once individuals are called to decide whether or not to contribute to such an institution, allowing for a scenario in which this institution fails to exist10,28,50,51. At present, and under the Paris Agreement, we are witnessing the potential birth of an informal funding institution whose goal is to finance developing countries to help them increase their mitigation capacity. Clearly, this is just one example pointing to the fact that the prevalence of local and global institutional incentives may depend on, and be influenced by, the distribution of wealth among parties, in the same way that it influences the actual contributions to the public good10,29,33. Finally, several other effects may further influence the present results. Among others, if intermediate tasks are considered33, or if individuals have the opportunity to pledge their contribution before their actual action7,40,52, it is likely that pro-social behavior may be enhanced. Work along these lines is in progress.
Methods
Public goods and collective risks. Let us consider a population with Z individuals, where each individual can be a cooperator (C) or a defector (D). For each round of this game, a group of N players is sampled from the original finite population of size Z, which corresponds to a process of sampling without replacement. The probability of a group comprising any possible combination of Cs and Ds is given by the hypergeometric distribution. In the context of a given group, a strategy is associated with a payoff value corresponding to an individual's earnings in that round, which depends on the actions of the rest of the group. Fitness is the expected payoff of an individual in the population, before knowing to which group she was assigned. This way, for a population with k Cs out of Z, and each group containing j Cs out of N, the fitness of a D and a C can be written as:
Figure 3. Average group achievement ηG as a function of the location of the switching point s. The switching point s corresponds to the configuration (fraction of Cs in the population, k/Z) above which w suddenly switches from pure-Reward (w = 1) to pure-Punishment (w = 0). Assuming both policies are equally efficient, the optimal switching point occurs at 50% of cooperators (k/Z = 0.5). The far-left values of s correspond to a static policy of always pure-Punishment: the switch from pure-Reward to pure-Punishment occurs immediately, at 0% of cooperators. On the far right (switching point = 100%), a pure-Reward policy is depicted. We can also see what happens when the switch occurs too late or too early, for different values of risk. For low values of risk, it is significantly less harmful to have a late switch from Reward to Punishment than an early one, meaning that when the population configuration cannot be precisely observed, it is preferable to keep rewarding for longer. See Methods for the calculation of the optimal switching point (s*) that maximizes cooperation fitness relative to defection and, consequently, group achievement. Other parameters: Z = 50, B = 1, µ = 0.01, β = 5, N = 10, M = 3, c = 0.1, and δ = 0.025.
$$f_D = \binom{Z-1}{N-1}^{-1}\,\sum_{j=0}^{N-1}\binom{k}{j}\binom{Z-1-k}{N-1-j}\,\Pi_D(j) \qquad (1)$$

$$f_C = \binom{Z-1}{N-1}^{-1}\,\sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-1-j}\,\Pi_C(j+1) \qquad (2)$$
where $\Pi_C(j)$ and $\Pi_D(j)$ stand for the payoff of a C and a D in a single round, in a group with N players and j Cs. To define the payoff functions, let θ(x) be a Heaviside step function, with θ(x) = 0 if x < 0 and θ(x) = 1 if x ≥ 0. Each player can contribute a fraction c of her endowment B (with 0 ≤ c ≤ 1), and in case a group contains fewer than M cooperators (0 < M ≤ N) there is a risk r of failure (0 ≤ r ≤ 1), in which case no player keeps her remaining endowment. The payoff of a defector ($\Pi_D(j)$) and the payoff of a cooperator ($\Pi_C(j)$), before incorporating any policy, can be written as9:
$$\Pi_D(j) = B\,\{\theta(j-M) + (1-r)\,[1-\theta(j-M)]\} \qquad (3)$$

$$\Pi_C(j) = \Pi_D(j) - cB \qquad (4)$$
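As an illustration, the sketch below (again our own, not part of the original paper) evaluates Eqs (1)-(2) numerically, reusing the payoff functions sketched in the main text and scipy's hypergeometric distribution for the sampling without replacement.

```python
# Sketch of Eqs (1)-(2): expected fitness of cooperators and defectors in a
# population with k Cs out of Z, with groups of size N sampled without
# replacement. Reuses payoff_cooperator/payoff_defector sketched earlier.
from scipy.stats import hypergeom

def fitness(k, Z=50, N=10, B=1.0, c=0.1, M=5, r=0.5):
    """Return (f_C, f_D) for a population state with k cooperators."""
    f_C = f_D = 0.0
    for j in range(N):  # j = cooperators among the N - 1 co-players
        if k >= 1:  # a focal C draws co-players from k - 1 Cs and Z - k Ds
            f_C += hypergeom.pmf(j, Z - 1, k - 1, N - 1) * \
                   payoff_cooperator(j + 1, B, c, M, r)
        if k <= Z - 1:  # a focal D draws from k Cs and Z - 1 - k Ds
            f_D += hypergeom.pmf(j, Z - 1, k, N - 1) * \
                   payoff_defector(j, B, M, r)
    return f_C, f_D
```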
Reward and punishment. To include a Reward or a Punishment policy, let us follow ref. 12 and consider a group budget Nδ which can be used to implement any type of policy. The fraction of Nδ applied to Reward is represented by the weight w, with 0 ≤ w ≤ 1. Parameters a and b correspond to the efficiencies of Reward and Punishment, respectively (for all figures above it was assumed that a = b = 1).
$$\Pi_D^{P}(j) = \Pi_D(j) - \frac{b\,(1-w)\,N\delta}{N-j} \qquad (5)$$

$$\Pi_C^{R}(j) = \Pi_C(j) + \frac{a\,w\,N\delta}{j} \qquad (6)$$
Naturally, these new payoff functions can be included in the previous fitness functions ($\Pi_D^{P}$ replaces $\Pi_D$ and $\Pi_C^{R}$ replaces $\Pi_C$), letting fitness values account for the different policies.
Evolutionary dynamics in finite populations. The fitness functions written above allow us to set up the (discrete-time) evolutionary dynamics. Indeed, the configurations of the entire population may be used to define a Markov chain, where each state is characterized by the number of cooperators9,44. To decide in which direction the system will evolve, at each step a player i and a neighbour j of hers are drawn at random from the population. Player i decides whether to imitate her neighbour j with a probability depending on the difference between their fitnesses43,44. This way, a system with k cooperators may stay in the same state, or switch to k − 1 or k + 1. The probability of player i imitating player j is given by the Fermi function:
$$p_{j,i}(k) \equiv \left[1 + e^{-\beta\,(f_j - f_i)}\right]^{-1} \qquad (7)$$
where β is the intensity of selection. Using this probability distribution, we can fully characterize the Markov process. Let k be the total number of cooperators in the population and Z the total population size. T+(k) and T−(k) are the probabilities of increasing and decreasing k by one, respectively44:
$$T^{\pm}(k) = \frac{k}{Z}\,\frac{Z-k}{Z}\,\left[1 + e^{\mp\beta\,[f_C(k) - f_D(k)]}\right]^{-1} \qquad (8)$$
The most likely direction of evolution can be computed using the difference $G(k) \equiv T^{+}(k) - T^{-}(k)$. A mutation rate μ can be introduced by using the transition probabilities $T_{\mu}^{+}(k) = (1-\mu)\,T^{+}(k) + \mu\,\frac{Z-k}{Z}$ and $T_{\mu}^{-}(k) = (1-\mu)\,T^{-}(k) + \mu\,\frac{k}{Z}$.
In all cases we used a mutation rate μ = 0.01, thereby preventing the population from fixating in a monomorphic configuration. In this context, the stationary distribution becomes a very useful tool to analyse the overall population dynamics, providing the probability $p_k = P(k/Z)$ of each of the Z + 1 states of this Markov chain being occupied53,54. For each given population state k, the hypergeometric distribution can be used to compute the average fraction of groups that obtain success, $a_G(k)$. Using the stationary distribution and the average group success, the average group achievement (ηG) can then be computed, providing the overall probability of achieving success: $\eta_G = \sum_{k=0}^{Z} p_k\, a_G(k)$.
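The sketch below (ours, building on the fitness() function sketched above) assembles these pieces: the transition probabilities with the Fermi rule and exploration, the gradient of selection plotted in Fig. 2, the stationary distribution obtained as the eigenvector of the tridiagonal transition matrix, and ηG. One plausible reading of a_G(k), assumed here, is the probability that a randomly sampled group contains at least M cooperators; to study the incentive policies, fitness() would be evaluated with the policy payoffs of Eqs (5)-(6) instead of the baseline ones.

```python
# Sketch of the finite-population dynamics (Eqs 7-8 plus mutation), the
# stationary distribution, and the average group achievement eta_G.
import numpy as np
from scipy.stats import hypergeom

def transitions(k, Z=50, beta=5.0, mu=0.01, **kw):
    """T+ and T- with exploration: probabilities that k -> k+1 or k -> k-1."""
    f_C, f_D = fitness(k, Z=Z, **kw)
    imitate_C = 1.0 / (1.0 + np.exp(-beta * (f_C - f_D)))  # a D copies a C
    imitate_D = 1.0 / (1.0 + np.exp(+beta * (f_C - f_D)))  # a C copies a D
    pair = (k / Z) * ((Z - k) / Z)  # probability of drawing a C-D pair
    T_plus = (1 - mu) * pair * imitate_C + mu * (Z - k) / Z
    T_minus = (1 - mu) * pair * imitate_D + mu * k / Z
    return T_plus, T_minus

def gradient_of_selection(k, **kw):
    """G(k) = T+(k) - T-(k), shown in the top panels of Fig. 2."""
    tp, tm = transitions(k, **kw)
    return tp - tm

def stationary_distribution(Z=50, **kw):
    """Stationary distribution of the birth-death Markov chain over k."""
    W = np.zeros((Z + 1, Z + 1))
    for k in range(Z + 1):
        tp, tm = transitions(k, Z=Z, **kw)
        W[k, k + 1 if k < Z else k] += tp
        W[k, k - 1 if k > 0 else k] += tm
        W[k, k] += 1.0 - tp - tm
    vals, vecs = np.linalg.eig(W.T)  # left eigenvector for eigenvalue 1
    p = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return p / p.sum()

def group_achievement(Z=50, N=10, M=5, **kw):
    """eta_G: stationary average of the fraction of successful groups."""
    p = stationary_distribution(Z=Z, N=N, M=M, **kw)
    # a_G(k): chance a random group of size N holds at least M cooperators
    a_G = np.array([1.0 - hypergeom.cdf(M - 1, Z, k, N) for k in range(Z + 1)])
    return float(np.dot(p, a_G))
```

For instance, under this reading of a_G(k), sweeping group_achievement(r=...) over r from 0 to 1 should trace a curve analogous to the black (no-policy) line of Fig. 1.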
Combined policies. By allowing the weight w to depend on the frequency of cooperators, we can derive the optimal switching point s* between positive and negative incentives by minimizing the defector's advantage (fD − fC). This is done similarly to ref. 12, but using finite populations and therefore a hypergeometric distribution (see Eqs (1), (2), (5) and (6)) to account for sampling without replacement. From Eqs (1) and (2), we get
$$f_D = \binom{Z-1}{N-1}^{-1}\,\sum_{j=0}^{N-1}\binom{k}{j}\binom{Z-1-k}{N-1-j}\left[\Pi_D(j) - \frac{b\,(1-w)\,N\delta}{N-j}\right]$$

$$f_C = \binom{Z-1}{N-1}^{-1}\,\sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-1-j}\left[\Pi_C(j+1) + \frac{a\,w\,N\delta}{j+1}\right]$$
from which we aim at finding the value of w (as a function of k) that minimizes F′ = fD − fC. Since $\Pi_D(j)$, $\Pi_C(j+1)$ and c do not depend on w, these quantities do not affect the choice of the optimal w, leaving us with the problem of minimizing the following expression:
$$F' = -N\delta\,\binom{Z-1}{N-1}^{-1}\left[\sum_{j=0}^{N-1}\binom{k}{j}\binom{Z-1-k}{N-1-j}\,\frac{b\,(1-w)}{N-j} + \sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-1-j}\,\frac{a\,w}{j+1}\right]$$
Since $\binom{k}{j} = \binom{k-1}{j}\,\frac{k}{k-j}$ and $\binom{Z-1-k}{N-1-j} = \binom{Z-k}{N-1-j}\,\frac{Z-k-N+j+1}{Z-k}$,
$$F' = -N\delta\,\binom{Z-1}{N-1}^{-1}\sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-1-j}\left[\frac{a\,w}{j+1} + \frac{b\,(1-w)}{N-j}\,\frac{k\,(Z-k-N+j+1)}{(k-j)\,(Z-k)}\right]$$

$$= -N\delta\,\binom{Z-1}{N-1}^{-1}\sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-1-j}\,w\left[\frac{a}{j+1} - \frac{b}{N-j}\,\frac{k\,(Z-k-N+j+1)}{(k-j)\,(Z-k)}\right] - N\delta\,\binom{Z-1}{N-1}^{-1}\sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-1-j}\,\frac{b}{N-j}\,\frac{k\,(Z-k-N+j+1)}{(k-j)\,(Z-k)}$$
The second summation does not depend on w; thus, the optimal policy is given by the minimization of:
$$F'' = -N\delta\,\binom{Z-1}{N-1}^{-1}\sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-1-j}\,w\left[\frac{a}{j+1} - \frac{b}{N-j}\,\frac{k\,(Z-k-N+j+1)}{(k-j)\,(Z-k)}\right]$$

Since N and δ are always positive, the whole expression can be divided by Nδ without changing the optimization problem. Moreover, by multiplying the expression by (−1), it can finally be shown that minimizing fD − fC is equivalent to maximizing the following expression:
Figure 4. Optimal switching point s* as a function of the ratio a/b, for different values of N (see Methods).
$$w\,\binom{Z-1}{N-1}^{-1}\sum_{j=0}^{N-1}\binom{k-1}{j}\binom{Z-k}{N-1-j}\left[\frac{a}{j+1} - \frac{b}{N-j}\,\frac{k\,(Z-k-N+j+1)}{(k-j)\,(Z-k)}\right]$$
where j represents the number of Cs in a group of size N, sampled without replacement from a population of size Z containing k Cs. Now, let us consider how the optimal switching point s* depends on k. Since this sum decreases as k increases and contains only one root, the solution to this optimization problem corresponds to having w set to 1 (pure Reward) for positive values of the sum, suddenly switching to w = 0 (pure Punishment) once the sum becomes negative. The optimal switching point s* depends on the ratio a/b, the group size N and the population size Z. The effect of population size (Z) and group size (N) on s* is limited, while the impact of the efficiencies of reward (a) and punishment (b) is illustrated in Fig. 4. For a/b = 1 the switching point is s* = 0.5 (see Fig. 4). Interestingly, we note that, also in the CRD, s* is not impacted by the group success threshold (M) or the risk associated with losing the retained endowment when collective success is not attained (r). This is the case because we assume that the decision to punish or reward is independent of M and r. Notwithstanding, the model that we present can, in the future, be tuned to test more sophisticated incentive tools, such as rewarding or punishing depending on (i) how far group contributions remained from (or surpassed) the minimum required for group success, or (ii) how soft/strict the dilemma at stake is, given the likelihood of losing everything when collective success is not accomplished.
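A direct numerical reading of this condition (our sketch, not from the paper) scans the population states and returns the first k/Z at which the bracketed sum turns negative, i.e. where the optimal w drops from 1 to 0; for a = b it should recover s* = 0.5, as reported in Fig. 4.

```python
# Sketch: locate the optimal switching point s* by finding where the
# w-dependent sum above changes sign (w = 1 while positive, w = 0 after).
from scipy.stats import hypergeom

def switching_point(Z=50, N=10, a=1.0, b=1.0):
    """Return s* = k/Z at which pure-Punishment becomes optimal."""
    for k in range(1, Z):  # mixed states only (some Cs, some Ds)
        S = 0.0
        for j in range(min(N, k)):  # the pmf vanishes for j >= k
            pmf = hypergeom.pmf(j, Z - 1, k - 1, N - 1)
            reward_term = a / (j + 1)
            punish_term = (b / (N - j)) * k * (Z - k - N + j + 1) \
                          / ((k - j) * (Z - k))
            S += pmf * (reward_term - punish_term)
        if S < 0:
            return k / Z  # first state where sanctions pay off more
    return 1.0  # reward remains optimal for every mixed state
```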
Received: 20 May 2019; Accepted: 15 September 2019;
Published: xx xx xxxx
References
1. Barrett, S. Self-enforcing international environmental agreements. Oxford Economic Papers 46, 878–894 (1994).
2. Barrett, S. Why cooperate?: the incentive to supply global public goods. (Oxford UP, 2007).
3. Dreber, A. & Nowak, M. A. Gambling for Global Goods. Proc Natl Acad Sci USA 105, 2261–2262 (2008).
4. Hardin, G. The Tragedy of the Commons. Science 162, 1243 (1968).
5. Milinski, M., Sommerfeld, R. D., Krambeck, H. J., Reed, F. A. & Marotzke, J. The collective-risk social dilemma and the prevention of simulated dangerous climate change. Proc Natl Acad Sci USA 105, 2291–2294 (2008).
6. Pacheco, J. M., Santos, F. C., Souza, M. O. & Skyrms, B. Evolutionary dynamics of collective action in N-person stag hunt dilemmas. Proc R Soc Lond B 276, 315–321 (2009).
7. Tavoni, A., Dannenberg, A., Kallis, G. & Löschel, A. Inequality, communication and the avoidance of disastrous climate change in a public goods game. Proc Natl Acad Sci USA 108, 11825–11829 (2011).
8. Bosetti, V., Heugues, M. & Tavoni, A. Luring others into climate action: coalition formation games with threshold and spillover effects. Oxford Economic Papers 69, 410–431 (2017).
9. Santos, F. C. & Pacheco, J. M. Risk of collective failure provides an escape from the tragedy of the commons. Proc Natl Acad Sci USA 108, 10421–10425 (2011).
10. Vasconcelos, V. V., Santos, F. C. & Pacheco, J. M. A bottom-up institutional approach to cooperative governance of risky commons. Nat. Clim. Change 3, 797–801 (2013).
11. Sigmund, K., Hauert, C. & Nowak, M. A. Reward and punishment. Proc. Natl. Acad. Sci. USA 98, 10757–10762 (2001).
12. Chen, X., Sasaki, T., Brännström, Å. & Dieckmann, U. First carrot, then stick: how the adaptive hybridization of incentives promotes cooperation. Journal of The Royal Society Interface 12, 20140935 (2015).
13. Hilbe, C. & Sigmund, K. Incentives and opportunism: from the carrot to the stick. Proceedings of the Royal Society of London B: Biological Sciences 277, 2427–2433 (2010).
14. Gneezy, A. & Fessler, D. M. Conflict, sticks and carrots: war increases prosocial punishments and rewards. Proceedings of the Royal Society of London B: Biological Sciences, rspb20110805 (2011).
15. Sasaki, T. & Uchida, S. Rewards and the evolution of cooperation in public good games. Biology Letters 10, 20130903 (2014).
16. Fehr, E. & Gächter, S. Altruistic punishment in humans. Nature 415, 137–140 (2002).
17. Sigmund, K. Punish or perish? Retaliation and collaboration among humans. Trends in Ecology & Evolution 22, 593–600 (2007).
18. Masclet, D., Noussair, C., Tucker, S. & Villeval, M.-C. Monetary and nonmonetary punishment in the voluntary contributions mechanism. Am. Econ. Rev. 93, 366–380 (2003).
19. Charness, G. & Haruvy, E. Altruism, equity, and reciprocity in a gift-exchange experiment: an encompassing approach. Games and Economic Behavior 40, 203–231 (2002).
20. Andreoni, J., Harbaugh, W. & Vesterlund, L. The carrot or the stick: Rewards, punishments, and cooperation. The American Economic Review 93, 893–902 (2003).
21. Szolnoki, A. & Perc, M. Reward and cooperation in the spatial public goods game. EPL (Europhysics Letters) 92, 38003 (2010).
22. Perc, M. et al. Statistical physics of human cooperation. Physics Reports 687, 1–51 (2017).
23. Fang, Y., Benko, T. P., Perc, M., Xu, H. & Tan, Q. Synergistic third-party rewarding and punishment in the public goods game. Proc. Roy. Soc. A 475, 20190349 (2019).
24. Milinski, M., Semmann, D., Krambeck, H. J. & Marotzke, J. Stabilizing the Earth's climate is not a losing game: Supporting evidence from public goods experiments. Proc Natl Acad Sci USA 103, 3994–3998 (2006).
25. Chen, X., Szolnoki, A. & Perc, M. Averting group failures in collective-risk social dilemmas. EPL (Europhysics Letters) 99, 68003 (2012).
26. Chakra, M. A. & Traulsen, A. Evolutionary dynamics of strategic behavior in a collective-risk dilemma. PLoS Comput Biol 8, e1002652 (2012).
27. Chen, X., Szolnoki, A. & Perc, M. Risk-driven migration and the collective-risk social dilemma. Physical Review E 86, 036101 (2012).
28. Pacheco, J. M., Vasconcelos, V. V. & Santos, F. C. Climate change governance, cooperation and self-organization. Phys Life Rev 11, 595–597 (2014).
29. Vasconcelos, V. V., Santos, F. C., Pacheco, J. M. & Levin, S. A. Climate policies under wealth inequality. Proc Natl Acad Sci USA 111, 2212–2216 (2014).
30. Hilbe, C., Chakra, M. A., Altrock, P. M. & Traulsen, A. The evolution of strategic timing in collective-risk dilemmas. PloS ONE 8, e66490 (2013).
31. Barrett, S. Avoiding disastrous climate change is possible but not inevitable. Proc Natl Acad Sci USA 108, 11733 (2011).
32. Barrett, S. & Dannenberg, A. Climate negotiations under scientific uncertainty. Proc Natl Acad Sci USA 109, 17372–17376 (2012).
33. Milinski, M., Röhl, T. & Marotzke, J. Cooperative interaction of rich and poor can be catalyzed by intermediate climate targets. Climatic Change 1–8 (2011).
34. Boesch, C. Cooperative hunting roles among Taï chimpanzees. Human Nature 13, 27–46 (2002).
35. Creel, S. & Creel, N. M. Communal hunting and pack size in African wild dogs, Lycaon pictus. Animal Behaviour 50, 1325–1339 (1995).
36. Black, J., Levi, M. D. & De Meza, D. Creating a good atmosphere: minimum participation for tackling the 'greenhouse effect'. Economica 281–293 (1993).
37. Stander, P. E. Cooperative hunting in lions: the role of the individual. Behavioral Ecology and Sociobiology 29, 445–454 (1992).
38. Alvard, M. S. et al. Rousseau's whale hunt? Coordination among big-game hunters. Current Anthropology 43, 533–559 (2002).
39. Souza, M. O., Pacheco, J. M. & Santos, F. C. Evolution of cooperation under N-person snowdrift games. J Theor Biol 260, 581–588 (2009).
40. Pacheco, J. M., Vasconcelos, V. V., Santos, F. C. & Skyrms, B. Co-evolutionary dynamics of collective action with signaling for a quorum. PLoS Comput Biol 11, e1004101 (2015).
41. Skyrms, B. The Stag Hunt and the Evolution of Social Structure. (Cambridge Univ Press, 2004).
42. Barrett, S. Environment and statecraft: the strategy of environmental treaty-making. (Oxford UP, 2005).
43. Sigmund, K. The Calculus of Selfishness. (Princeton Univ Press, 2010).
44. Traulsen, A., Nowak, M. A. & Pacheco, J. M. Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, 011909 (2006).
45. Traulsen, A., Hauert, C., De Silva, H., Nowak, M. A. & Sigmund, K. Exploration dynamics in evolutionary games. PNAS 106, 709–712 (2009).
46. Paiva, A., Santos, F. P. & Santos, F. C. Engineering pro-sociality with autonomous agents in Thirty-Second AAAI Conference on Artificial Intelligence, pp. 7994–7999 (2018).
47. Shirado, H. & Christakis, N. A. Locally noisy autonomous agents improve global human coordination in network experiments. Nature 545, 370 (2017).
48. Santos, F. P., Pacheco, J. M., Paiva, A. & Santos, F. C. Evolution of collective fairness in hybrid populations of humans and agents in Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Vol. 33, pp. 6146–6153 (2019).
49. Rahwan, I. et al. Machine behaviour. Nature 568, 477 (2019).
50. Powers, S. T., van Schaik, C. P. & Lehmann, L. How institutions shaped the last major evolutionary transition to large-scale human societies. Philosophical Transactions of the Royal Society B: Biological Sciences 371, 20150098 (2016).
51. Sigmund, K., De Silva, H., Traulsen, A. & Hauert, C. Social learning promotes institutions for governing the commons. Nature 466, 861 (2010).
52. Santos, F. C., Pacheco, J. M. & Skyrms, B. Co-evolution of pre-play signaling and cooperation. J Theor Biol 274, 30–35 (2011).
53. Kulkarni, V. G. Modeling and analysis of stochastic systems. (Chapman and Hall/CRC, 2016).
54. Hindersin, L., Wu, B., Traulsen, A. & García, J. Computation and simulation of evolutionary game dynamics in finite populations. Sci. Rep. 9, 6946 (2019).
Acknowledgements
This research was supported by Fundação para a Ciência e Tecnologia (FCT) through grants PTDC/EEI-SII/5081/2014 and PTDC/MAT/STA/3358/2014 and by multiannual funding of INESC-ID and CBMA (under the projects UID/CEC/50021/2019 and UID/BIA/04050/2013). F.P.S. acknowledges support from the James S. McDonnell Foundation 21st Century Science Initiative in Understanding Dynamic and Multi-scale Systems - Postdoctoral Fellowship Award. All authors declare no competing financial or non-financial interests in relation to the work described.
Author contributions
A.R.G., F.P.S., J.M.P. and F.C.S. designed and implemented the research; A.R.G., F.P.S., J.M.P. and F.C.S. prepared all the Figures; A.R.G., F.P.S., J.M.P. and F.C.S. wrote the manuscript; A.R.G., F.P.S., J.M.P. and F.C.S. reviewed the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to F.C.S.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2019