SCIENTIFIC REPORTS | (2019) 9:16193 | https://doi.org/10.1038/s41598-019-52524-8
www.nature.com/scientificreports
Reward and punishment in climate change dilemmas
António R. Góis1,2,3, Fernando P. Santos4,1,2, Jorge M. Pacheco5,6,2 & Francisco C. Santos1,2,7*
Mitigating climate change eects involves strategic decisions by individuals that may choose to limit
their emissions at a cost. Everyone shares the ensuing benets and thereby individuals can free ride
on the eort of others, which may lead to the tragedy of the commons. For this reason, climate action
can be conveniently formulated in terms of Public Goods Dilemmas often assuming that a minimum
collective eort is required to ensure any benet, and that decision-making may be contingent on the
risk associated with future losses. Here we investigate the impact of reward and punishment in this type
of collective endeavors — coined as collective-risk dilemmas — by means of a dynamic, evolutionary
approach. We show that rewards (positive incentives) are essential to initiate cooperation, mostly
when the perception of risk is low. On the other hand, we nd that sanctions (negative incentives)
are instrumental to maintain cooperation. Altogether, our results are gratifying, given the a-priori
limitations of eectively implementing sanctions in international agreements. Finally, we show that
whenever collective action is most challenging to succeed, the best results are obtained when both
rewards and sanctions are synergistically combined into a single policy.
Climate change stands as one of our biggest challenges in what concerns the emergence and sustainability of cooperation1,2. Indeed, world citizens build up high expectations every time a new International Environmental Summit is settled, unfortunately with few resulting solutions implemented so far. This calls for the development of more effective incentives, agreements and binding mechanisms. The problem can be conveniently framed resorting to the mathematics of game theory, being a paradigmatic example of a Public Goods Game3: at stake there is a global good from which every single individual can profit, irrespectively of contributing to maintain it. Parties may free ride on the efforts of others, avoiding any effort themselves, while driving the population into the tragedy of the commons4. Moreover, since here cooperation aims at averting collective losses, this type of dilemma is often referred to as a public bad game, in which achieving collective goals often depends on reaching a threshold number of cooperative group members5–8.
One of the multiple obstacles attributed to such agreements is misperceiving the actual risk of future losses, which significantly affects the ensuing dynamics of cooperation5,9. Another problem relates to both the incapacity to sanction those who do not contribute to the welfare of the planet, and/or to reward those who subscribe to green policies10. Previous cooperation studies show that reward (positive incentives), punishment (negative incentives) and the combination of both11–23 have a different impact depending on the dilemma in place. Assessing the impact of reward and punishment (isolated or combined) in the context of N-person threshold games — and in the particular case of climate change dilemmas — remains, however, an open problem.
Here we study, theoretically, the role of both institutional reward and punishment in the context of climate change agreements. Previous works consider the public good as a linear function of the number of contributors12,17,21,22 and conclude that punishment is more effective than reward (for an optimal combination of punishment and reward see ref.12). We depart from this linear regime by modeling the returns on the public good as a threshold problem, combined with an uncertain outcome, represented by a risk of failure. As a result – and as detailed below – the dynamical portrait of our model reveals new internal equilibria9, allowing us to identify the dynamics of coordination and coexistence typifying collective action problems. As discussed below, the reward and punishment mechanisms will impact, in a non-trivial way, those equilibria.
1INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, IST-Taguspark, 2744-016, Porto Salvo, Portugal. 2ATP-group, P-2744-016, Porto Salvo, Portugal. 3Unbabel, R. Visc. de Santarém 67B, 1000-286, Lisboa, Portugal. 4Department of Ecology and Evolutionary Biology, Princeton University, Princeton, USA. 5Centro de Biologia Molecular e Ambiental, Universidade do Minho, 4710-057, Braga, Portugal. 6Departamento de Matemática e Aplicações, Universidade do Minho, 4710-057, Braga, Portugal. 7Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe CP212, 1050, Bruxelles, Belgium. *email: franciscocsantos@tecnico.ulisboa.pt
Content courtesy of Springer Nature, terms of use apply. Rights reserved
We consider a population of size Z, where each individual can be either a Cooperator (C) or a Defector (D), when participating in a N-player Collective-Risk dilemma (CRD)5,9,10,24–30. In this game, each participant starts with an initial endowment B (viewed as the asset value at stake) that may be used to contribute to the mitigation of the effects of climate change. A cooperator incurs a cost corresponding to a fraction c of her initial endowment B, in order to help prevent a collective failure. On the other hand, a defector refuses to bear any cost, hoping to free ride on the contributions of others. We require a minimum number of 0 < M ≤ N cooperators in a group of size N before collective action is realized; if a group of size N does not contain at least M Cs, all members lose their remaining endowments with a probability r, where r (0 ≤ r ≤ 1) stands as the risk of collective failure. Otherwise, everyone will keep whatever she has. This CRD formulation has been shown to capture some of the key features discovered in recent experiments5,24,31–33, while highlighting the importance of risk. In addition, it allows one to test model parameters in a systematic way that is not possible in human experiments. Moreover, the adoption of non-linear returns mimics situations common to many human and non-human endeavors6,34–41, where a minimum joint effort is required to achieve a collective goal. Thus, the applicability of this framework extends well beyond environmental governance, given the ubiquity of such type of social dilemmas in nature and societies.
Following Chen et al.12, we include both reward and punishment mechanisms in this model. A fixed group budget Nδ (where δ ≥ 0 stands for a per-capita incentive) is assumed to be available, of which a fraction w is applied to a reward policy and the remaining 1 − w to a punishment policy. We assume the effective impact of both policies to be equivalent, meaning that each unit spent will directly increase/decrease the payoff of a cooperator/defector by the same amount. For details on policies with different efficiencies, see Methods.
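To make the budget split concrete, here is a minimal sketch in our own notation (the function name and default values are illustrative, not from the paper) of the per-member incentive in a group with j cooperators:

```python
# Sketch of the incentive scheme described above (names and defaults are
# ours): a fixed group budget N*delta, of which a fraction w is shared as
# rewards among the j cooperators, while the remaining (1 - w) is levied
# as sanctions on the N - j defectors.
def incentives(j, N=10, delta=0.025, w=0.5):
    """Return (reward per cooperator, sanction per defector)."""
    reward = w * delta * N / j if j > 0 else 0.0
    sanction = (1 - w) * delta * N / (N - j) if j < N else 0.0
    return reward, sanction
```

With a pure-Reward policy (w = 1) the whole budget flows to cooperators; with pure-Punishment (w = 0) it is entirely spent fining defectors.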
Instead of considering a collection of rational agents engaging in one-shot Public Goods Games32,42, here we adopt an evolutionary description of the behavioral dynamics9, in which individuals tend to copy those appearing to be more successful. Success (or fitness) of individuals is here associated with their average payoff. All individuals are equally likely to interact with each other, causing all cooperators and defectors to be equivalent, on average, and only distinguishable by the strategy they adopt. Therefore, and considering that only two strategies are available, the number of cooperators is sufficient to describe any configuration of the population. The number of individuals adopting a given strategy (either C or D) evolves in time according to a stochastic birth–death process43,44, which describes the time evolution of the social learning dynamics (with exploration): At each time-step each individual (X, with fitness fX) is given the opportunity to change strategy; with probability μ, X randomly explores the strategy space45 (a process similar to mutations in a biological context that precludes the existence of absorbing states). With probability 1 − μ, X may adopt the strategy of a randomly selected individual (Y, with fitness fY), with a probability that increases with the fitness difference fY − fX44. This renders the stationary distribution (see Methods) an extremely useful tool to rank the most visited states given the ensuing evolutionary dynamics of the population. Indeed, the stationary distribution provides the prevalence of each of the population's possible configurations, in terms of the number of Cs (k) and Ds (Z − k). Combined with the probability of success characterizing each configuration, the stationary distribution can be used to compute the overall success probability of a given population – the average group achievement, ηG. This value represents the average fraction of groups that will overcome the CRD, successfully preserving the public good.
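As a rough sketch of how ηG is assembled (our own notation; as an illustration we assume a group "succeeds" when it contains at least M cooperators, with p the stationary distribution over the Z + 1 population states):

```python
from math import comb

# a_G(k): probability that a group of N members, sampled without
# replacement from a population with k Cs out of Z individuals,
# contains at least M cooperators.
def group_success(k, Z, N, M):
    return sum(comb(k, j) * comb(Z - k, N - j)
               for j in range(M, N + 1)) / comb(Z, N)

# eta_G: average group achievement, weighting the success measure of
# each population state k by its stationary probability p[k].
def eta_G(p, Z, N, M):
    return sum(p[k] * group_success(k, Z, N, M) for k in range(Z + 1))
```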
Results
In Fig.1 we compare the average group achievement ηG (as a function of risk) in four scenarios: (i) a reference
scenario without any policy (i.e., no reward or punishment, in black); and three scenarios where a budget is
applied to (ii) rewards, (iii) punishment and (iv) a combination of rewards and sanctions (see below). Our results
are shown for the two most paradigmatic regimes: low (Fig.1A) and high (Fig.1B) coordination requirements.
Naturally ηG improves whenever a policy is applied. Less obvious is the dierence between the various policies.
Applying only rewards (blue curves in Fig.1) is more eective than only punishment (red curve) for low values of
risk. e opposite happens when risk is high. On scenarios with a low relative threshold (Fig.1A), rewards play
the key role, with sanctions only marginally outperforming them for very high values of risk. For high coordina-
tion thresholds (Fig.1B) reward and punishment portray comparable eciency in the promotion of cooperation,
with pure-Punishment (w = 0) performing slightly better than pure-Reward (w = 1).
Justifying these differences is difficult from the analysis of ηG alone. To better understand the behavioral dynamics under Reward and Punishment, we show in Fig. 2 the gradients of selection (top panels) and stationary distributions (lower panels) for each case and different budget values. Each gradient of selection represents, for each discrete state k/Z (i.e., fraction of Cs), the difference G(k) = T^+(k) − T^−(k) between the probability to increase (T^+(k)) and decrease (T^−(k)) the number of cooperators (see Methods) by one. Whenever G(k) > 0 the fraction of Cs is likely to increase; whenever G(k) < 0 the opposite is expected to happen. The stationary distributions show how likely it is to find the population in each (discrete) configuration of our system. The panels on the left-hand side show the results obtained for the CRD under pure-Reward; on the right-hand side, we show the results obtained for pure-Punishment.
Naturally, both mechanisms are inoperative whenever the per-capita incentives are nonexistent (δ = 0), creating a natural reference scenario in which to study the impact of Reward and Punishment on the CRD. In this case, above a certain value of risk (r), decision-making is characterized by two internal equilibria (i.e., adjacent finite population states with opposite gradient sign, representing the analogue of fixed points in a dynamical system characterizing evolution in infinite populations). Above a certain fraction of cooperators the population overcomes the coordination barrier and naturally self-organizes towards a stable co-existence of cooperators and defectors. Otherwise, the population is condemned to evolve towards a monomorphic population of defectors, leading to the tragedy of the commons9. As the budget for incentives increases, using either Reward or Punishment leads to very different outcomes, as depicted in Fig. 2.
Contrary to the case of linear Public Goods Games12, in the CRD coordination and co-existence dynamics already exist in the absence of any reward/punishment incentive. Reward is particularly effective when
cooperation is low (small k/Z), showing a significant impact on the location of the finite population analogue of an unstable fixed point. Indeed, increasing δ lowers the minimum number of cooperators required to reach the cooperative basin of attraction (as well as increasing the prevalence of cooperators in the co-existence point on the right), which ultimately disappears for high δ (Fig. 2A). This means that a smaller coordination effort is required before the population dynamics start to naturally favor the increase of cooperators. Once this initial barrier is surpassed, the population will naturally tend towards an equilibrium state, which does not improve appreciably under Reward. The opposite happens under Punishment. The location of the coordination point is little affected, yet once this barrier is overcome, the population will evolve towards a more favorable equilibrium (Fig. 2B). Thus, while Reward seems to be particularly effective to bootstrap cooperation towards a more cooperative basin of attraction, Punishment seems effective in sustaining high levels of cooperation.
As a consequence, the most frequently observed configurations are very different when using each of the policies. As shown by the stationary distributions (Fig. 2C,D), under Reward the population visits more often states with intermediate values of cooperation (i.e., where Cs and Ds co-exist). Intuitively, this happens because the coordination effort is eased by the rewards, causing the population to effectively overcome it and reach the coexistence point (the equilibrium state with an intermediate amount of cooperators), thus spending most of the time near it. On the other hand, Punishment will not ease the coordination effort, and thus the population will spend most of the time in states of low cooperation, failing to overcome this barrier. Notwithstanding, once surpassed, the population will stabilize on higher states of cooperation. This is especially evident for high budgets, as shown with δ = 0.02 (blue line). Moreover, since Nδ corresponds to a fixed total amount which is distributed among the existing cooperators/defectors, this causes the per-cooperator/defector budget to vary depending on the number of existing cooperators/defectors (i.e., each of the j cooperators receives wδN/j and each defector loses (1 − w)δN/(N − j)). In other words, positive (negative) incentives become very profitable (or severe) if defection (cooperation) prevails within a group. In particular, whenever the budget is significant (see, e.g., δ = 0.02 in Fig. 2) the punishment becomes so high when there are few defectors within a group, that a new equilibrium emerges close to full cooperation.
e results in Fig.2 show that Reward can be instrumental in fostering pro-social behavior, while Punishment
can be used for its maintenance. is suggests that, to combine both policies synergistically, pure-Reward (w = 1)
should be applied at rst, when there are few cooperators (low k/Z); above a certain critical point (k/Z = s) one
should switch to pure-Punishment (w = 0). In the Methods section, we demonstrate that, similar to linear Public
Goods Games12, in CRDs this is indeed the policy which minimizes the advantage of the defector, even if we con-
sider the alternative possibility of applying both policies simultaneously. In Methods, we also compute a general
expression for the optimal switching point s*, that is, the value of k above which Punishment should be applied
instead ofReward to maximize cooperation and group achievement. By using such policy — that we denote by s*
— we obtain the best results shown with an orange line in Fig.1. We propose, however, to explore what happens
in the context of a CRD when s* is not used. How much cooperation is lost when we deviate from s* to either of
the pure policies, or to a policy which uses a switching point dierent from the optimal one?
Figure 1. Average group achievement ηG as a function of risk. Left: Group relative threshold M/N = 3/10. Right: Group relative threshold M/N = 7/10. In both panels, the black line corresponds to a reference scenario where no policy is applied. The red line shows ηG in the case where all available budget is applied to pure-Punishment (w = 0), whereas the blue line shows results for pure-Reward (w = 1). Pure-Reward is most effective at low risk values, while pure-Punishment is marginally the most effective policy at high risk. These features are more pronounced for low relative thresholds (left panel), and only at high thresholds does pure-Punishment lead to a sizeable improvement with respect to pure-Reward. Finally, the orange line shows the results using the combination of Reward and Punishment, leading (naturally) to the best results. In this case, we adopt pure-Reward (w = 1) when there are few cooperators and, above a certain critical point k/Z = s = 0.5, we switch to pure-Punishment (w = 0). As detailed in the main text (see Fig. 3 and Methods), s = 0.5 provides the optimal switching point s* for cooperation to thrive. Other parameters: Population size Z = 50, group size N = 10, cost of cooperation c = 0.1, initial endowment B = 1, budget δ = 0.025, reward efficiency a = 1, punishment efficiency b = 1, intensity of selection β = 5, mutation rate µ = 0.01.
Figure3 illustrates how the choice of the switching point s impacts the overall cooperation, as evaluated by
ηG, for dierent values of risk. For a switching point of s = k/Z = 1.0 (0.0) a static policy of always pure-Reward
(pure-Punishment) is used. is can be seen on the far right (le) of Fig.3. Figure3 suggests that, for low thresh-
olds, an optimal policy switching (which, for the parameters shown, occurs for s = 50%, see Methods) is only mar-
ginally better than a policy solely based on rewards (s = 1). Figure3 also allows for a comparison of what happens
when the switching point occurs too late (excessive rewards) or too early (excessive sanctions) in a low-threshold
scenario. A late switch is signicantly less harmful than an early one. In other words, our results suggest that
when the population conguration cannot be precisely observed, it is preferable to keep rewarding for longer.
is said, whenever the perception of risk is high (an unlikely situation these days) an early switch is slightly less
harmful than a late one. In the most dicult scenarios, where stringent coordination requirements (large M) are
combined with a low perception of risk (low r), the adoption of a combined policy becomes necessary (see right
panel of Fig.1).
Discussion
One might expect the impact of Reward and Punishment to lead to symmetric outcomes – Punishment would be effective for high cooperation the same way that Reward is effective for low cooperation. In low-cooperation scenarios (under low risk, threshold or budget) Reward alone plays the most important role. However, in the opposite scenario, Punishment alone does not have the same impact. Either a favourable scenario occurs, where any policy yields a satisfying result, or Punishment cannot improve outcomes on its own. In the latter case, the synergy between both policies becomes essential to achieve cooperation. Such an optimal policy involves a combination of the single policies, Reward and Punishment, which is dynamic, in the sense that the combination does not remain the same for all configurations of the population. It corresponds to employing pure Reward at first, when cooperation is low, switching subsequently to Punishment whenever a pre-determined level of cooperation is reached.
Figure 2. Gradient of selection (top panels, A and B) and stationary distribution (bottom panels, C and D) for the different values of per-capita budget δ indicated, using either pure-Reward (w = 1, left panels) or pure-Punishment (w = 0, right panels). The black curve is equal on the left and right panels, since in this case δ = 0. As δ increases, the behaviour under Reward and Punishment is qualitatively similar, by displacing the (unstable) coordination equilibrium towards lower values of k/Z, while displacing the (stable) coexistence equilibrium towards higher values of k/Z. This happens, however, only for low values of δ. Indeed, by further increasing δ one observes very different behaviours under Reward and Punishment: Whereas under Punishment the equilibria are moved further apart (in accord with what happened for low δ), under Reward the coordination equilibrium disappears, and the overall dynamics becomes characterized by a single coexistence equilibrium which consistently shifts towards higher values of k/Z with increasing δ. This difference in behaviour, in turn, has a dramatic impact on the overall prevalence of configurations achieved by the population dynamics, as shown by the stationary distributions: On panel C (pure-Reward) the population spends most of the time on intermediate states of cooperation. On panel D (pure-Punishment) the population spends most of the time on both extremes (high and low cooperation) but especially on low cooperation states. Other parameters: Z = 50, N = 10, M = 5, c = 0.1, B = 1, r = 0.5, a = b = 1, β = 5 and µ = 0.01 (see Methods for details).
The optimal procedure, however, is unlikely to be realistic in the context of Climate Change agreements. Indeed, and unlike other Public Goods Dilemmas, where Reward and Punishment constitute the main policies available for Institutions to foster cooperative collective action, in International Agreements it is widely recognized that Punishment is very difficult to implement2,42. This has been, in fact, one of the main criticisms put forward in connection with Global Agreements on Climate Mitigation: They suffer from the lack of sanctioning mechanisms, as it is practically impossible to enforce any type of sanctioning at a Global level. In this sense, the results obtained here by means of our dynamical, evolutionary approach are gratifying, given these a-priori limitations of sanctioning in CRDs. Not only do we show that Reward is essential to foster cooperation, mostly when both the perception of risk is low and the overall number of engaged parties is small (low k/Z), but also we show that Punishment mostly acts to sustain cooperation, after it has been installed. Given that low-risk scenarios are more common and harmful to cooperation than high-risk ones, our results in connection with rewards provide a viable way to explore in the quest for establishing Global cooperative collective action. Reward policies may also be very relevant in scenarios where Climate Agreements are coupled with other International agreements from which parties have no interest in deviating2,42. Finally, the fact that rewards ease coordination towards cooperative states suggests that positive incentives should also be used within intervention mechanisms aiming at fostering pro-sociality in artificial systems and hybrid populations comprising humans and machines46–49.
The model used takes for granted the existence of an institution with a budget available to implement either Reward or Punishment. New behaviours may emerge once individuals are called to decide whether or not to contribute to such an institution, allowing for a scenario where this institution fails to exist10,28,50,51. At present, and under the Paris agreement, we are witnessing the potential birth of an informal funding institution, whose goal is to finance developing countries to help them increase their mitigation capacity. Clearly, this is just an example pointing to the fact that the prevalence of local and global institutional incentives may depend on, and be influenced by, the distribution of wealth available among parties, in the same way that it influences the actual contributions to the public good10,29,33. Finally, several other effects may further influence the present results. Among others, if intermediate tasks are considered33, or if individuals have the opportunity to pledge their contribution before their actual action7,40,52, it is likely that pro-social behavior may be enhanced. Work along these lines is in progress.
Methods
Public goods and collective risks. Let us consider a population with Z individuals, where each individual can be a cooperator (C) or a defector (D). For each round of this game, a group of N players is sampled from the original finite population of size Z, which corresponds to a process of sampling without replacement. The probability of a group comprising any possible combination of Cs and Ds is given by the hypergeometric distribution. In the context of a given group, a strategy is associated with a payoff value corresponding to an individual's earnings in that round, which depend on the actions of the rest of the group. Fitness is the expected payoff of an individual in a population, before knowing to which group she is assigned. This way, for a population with k out of Z Cs and each group containing j out of N Cs, the fitness of a D and a C can be written as:
Figure 3. Average group achievement ηG as a function of the location of the switching point s. The switching point s corresponds to the configuration (fraction of Cs in the population, k/Z) above which w suddenly switches from pure-Reward (w = 1) to pure-Punishment (w = 0). Assuming both policies are equally efficient, the optimal switching point occurs at 50% of cooperators (k/Z = 0.5). The far-left values of s correspond to a static policy of always pure-Punishment – the switch from pure-Reward to pure-Punishment occurs immediately at 0% of cooperators. On the far-right (switching point = 100%) a pure-Reward policy is depicted. We can also see what happens when the switch occurs too late or too early, for different values of risk. For low values of risk, it is significantly less harmful to have a late switch from Reward to Punishment than an early one, meaning that when the population configuration cannot be precisely observed, it is preferable to keep rewarding for longer. See Methods for the calculation of the optimal switching point (s*) that maximizes cooperation fitness relative to defection – and consequent group achievement. Other parameters: Z = 50, B = 1, µ = 0.01, β = 5, N = 10, M = 3, c = 0.1, and δ = 0.025.
f_D = \binom{Z-1}{N-1}^{-1} \sum_{j=0}^{N-1} \binom{k}{j} \binom{Z-k-1}{N-1-j} \Pi_D(j)    (1)

f_C = \binom{Z-1}{N-1}^{-1} \sum_{j=0}^{N-1} \binom{k-1}{j} \binom{Z-k}{N-1-j} \Pi_C(j+1)    (2)

where \Pi_C(j) and \Pi_D(j) stand for the payoff of a C and a D in a single round, in a group with N players and j Cs. To
dene the payo functions, let
θx()
be a Heaviside step-function distribution, where θ(x) = 0 if x < 0 and θ(x) = 1
if x 0. Each player can contribute with a fraction c of her endowment B (with 0 c 1), and in case a group
contains less than M cooperators (0 < M N) there is a risk r of failure (0 r 1), in which case no player obtains
her remaining endowment. e payo of a defector (
Πj()
D
) and the payo of a cooperator (
Πj()
C
), before incor-
porating any policy, can be written as9:
θθΠ= −+−−jBjM rjM() {( )(1)[1 ()]} (3)
D
Π=Π−jjcB() () (4)
CD
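A minimal sketch of Eqs. (1)–(4) (function names and the way payoffs are passed around are ours): the one-round payoffs, and the hypergeometric averaging that turns them into fitness:

```python
from math import comb

def theta(x):
    # Heaviside step: 1 if x >= 0, else 0.
    return 1.0 if x >= 0 else 0.0

def pi_D(j, B=1.0, M=5, r=0.5):
    # Eq. (3): a defector keeps B if the group meets the threshold M,
    # and otherwise keeps it only with probability 1 - r (in expectation).
    return B * (theta(j - M) + (1 - r) * (1 - theta(j - M)))

def pi_C(j, B=1.0, M=5, r=0.5, c=0.1):
    # Eq. (4): a cooperator additionally pays the contribution c*B.
    return pi_D(j, B, M, r) - c * B

def fitness_D(k, Z, N, payoff_D):
    # Eq. (1): average payoff of a D over all group compositions drawn
    # without replacement from a population with k Cs among Z players.
    return sum(comb(k, j) * comb(Z - k - 1, N - 1 - j) * payoff_D(j)
               for j in range(N)) / comb(Z - 1, N - 1)

def fitness_C(k, Z, N, payoff_C):
    # Eq. (2): as above for a C, who sees j other Cs among her N - 1
    # co-players and hence plays in a group with j + 1 cooperators.
    return sum(comb(k - 1, j) * comb(Z - k, N - 1 - j) * payoff_C(j + 1)
               for j in range(N)) / comb(Z - 1, N - 1)
```

Note that `math.comb` returns 0 whenever the group composition is impossible (e.g., more Cs in the group than in the population), so the hypergeometric weights need no special-casing.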
Reward and punishment. To include a Reward or a Punishment policy, let us follow ref.12 and consider a group budget Nδ which can be used to implement any type of policy. The fraction of Nδ applied to Reward is represented by the weight w, with 0 ≤ w ≤ 1. Parameters a and b correspond to the efficiency of Reward and Punishment (for all Figures above it was assumed that a = b = 1).

\Pi_D^P(j) = \Pi_D(j) - \frac{b(1-w)N\delta}{N-j}    (5)

\Pi_C^R(j) = \Pi_C(j) + \frac{awN\delta}{j}    (6)
Naturally, these new payoff functions can be included into the previous fitness functions (\Pi_D^P replaces \Pi_D and \Pi_C^R replaces \Pi_C), letting fitness values account for the different policies.
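Eqs. (5) and (6) can be sketched as follows (names are ours; the baseline payoffs enter as plain functions, and the boundary cases j = 0 and j = N are guarded, since there the corresponding pool has no recipients):

```python
# Eq. (5): each of the N - j defectors is fined an equal share of the
# punishment pool (1 - w)*N*delta, scaled by the efficiency b.
def pi_D_pun(j, payoff_D, N=10, delta=0.025, w=0.5, b=1.0):
    fine = b * (1 - w) * N * delta / (N - j) if j < N else 0.0
    return payoff_D(j) - fine

# Eq. (6): each of the j cooperators receives an equal share of the
# reward pool w*N*delta, scaled by the efficiency a.
def pi_C_rew(j, payoff_C, N=10, delta=0.025, w=0.5, a=1.0):
    bonus = a * w * N * delta / j if j > 0 else 0.0
    return payoff_C(j) + bonus
```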
Evolutionary dynamics in finite populations. The fitness functions written above allow us to set up the (discrete time) evolutionary dynamics. Indeed, the configurations of the entire population may be used to define a Markov Chain, where each state is characterized by the number of cooperators⁹,⁴⁴. To decide in which direction the system will evolve, at each step a player i and a neighbour j of hers are drawn at random from the population. Player i decides whether to imitate her neighbour j with a probability depending on the difference between their fitness⁴³,⁴⁴. This way, a system with k cooperators may stay in the same state, switch to k − 1 or to k + 1. The probability of player i imitating player j can be given by the Fermi function:
$$p_{i,j} \equiv \left[1 + e^{-\beta(f_j - f_i)}\right]^{-1} \qquad (7)$$
where β is the intensity of selection. Using this probability distribution, we can fully characterize this Markov process. Let k be the total number of cooperators in the population and Z the total size of the population. $T^+(k)$ and $T^-(k)$ are the probabilities to increase and decrease k by one, respectively⁴⁴:
$$T^{\pm}(k) = \frac{k}{Z}\,\frac{Z-k}{Z}\left[1 + e^{\mp\beta[f_C(k) - f_D(k)]}\right]^{-1} \qquad (8)$$
The most likely direction can be computed using the difference $G(k) \equiv T^+(k) - T^-(k)$. A mutation rate can be introduced by using transition probabilities $T^{+\mu}(k) = (1-\mu)T^+(k) + \mu\frac{Z-k}{Z}$ and $T^{-\mu}(k) = (1-\mu)T^-(k) + \mu\frac{k}{Z}$. In all cases we used a mutation rate μ = 0.01, this way avoiding the fixation of the population in a monomorphic configuration. In this context, the stationary distribution becomes a very useful tool to analyse the overall population dynamics, providing the probability $p_k = P(k/Z)$ for each of the Z + 1 states of this Markov Chain to be occupied⁵³,⁵⁴. For each given population state k, the hypergeometric distribution can be used to compute the average fraction of groups that achieve success, $a_G(k)$. Using the stationary distribution and the average group success, the average group achievement ($\eta_G$) can then be computed, providing the overall probability of achieving success: $\eta_G = \sum_{k=0}^{Z} p_k\, a_G(k)$.
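Because the chain only moves between neighbouring states k → k ± 1 (a birth-death chain), its stationary distribution can be sketched with the standard detailed-balance product formula rather than by solving the full eigenvector problem. The helper below is our own illustration: it takes the fitness difference f_C(k) − f_D(k) as a user-supplied function:

```python
import math

def stationary_distribution(fdiff, Z=50, beta=5.0, mu=0.01):
    """Stationary distribution p_0..p_Z of the birth-death chain defined by
    Eq. (8) with the mutation-corrected rates T^{+mu} and T^{-mu};
    fdiff(k) must return f_C(k) - f_D(k)."""
    def T(k, sign):  # Eq. (8): T^+ for sign = +1, T^- for sign = -1
        fermi = 1.0 / (1.0 + math.exp(-sign * beta * fdiff(k)))
        return (k / Z) * ((Z - k) / Z) * fermi

    def T_plus(k):   # T^{+mu}
        return (1 - mu) * T(k, +1) + mu * (Z - k) / Z

    def T_minus(k):  # T^{-mu}
        return (1 - mu) * T(k, -1) + mu * k / Z

    # Detailed balance (valid for tridiagonal chains):
    # p_k / p_0 = prod_{i=1..k} T^{+mu}(i-1) / T^{-mu}(i)
    p = [1.0]
    for k in range(1, Z + 1):
        p.append(p[-1] * T_plus(k - 1) / T_minus(k))
    total = sum(p)
    return [x / total for x in p]
```

With a neutral fitness difference the distribution is symmetric around k = Z/2; a positive difference shifts the weight towards full cooperation.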
Combined policies. By allowing the weight w to depend on the frequency of cooperators, we can derive the optimal switching point s* between positive and negative incentives by minimizing the defector’s advantage ($f_D - f_C$). This is done similarly to ref. 12, but using finite populations and therefore a hypergeometric distribution (see Eqs (1), (2), (5) and (6)), to account for sampling without replacement. From Eqs (1) and (2), we get
$$f_D = \binom{Z-1}{N-1}^{-1} \sum_{j=0}^{N-1} \binom{k}{j}\binom{Z-k-1}{N-j-1}\left[\Pi_D(j) - \frac{b(1-w)N\delta}{N-j}\right]$$

$$f_C = \binom{Z-1}{N-1}^{-1} \sum_{j=0}^{N-1} \binom{k-1}{j}\binom{Z-k}{N-j-1}\left[\Pi_C(j+1) + \frac{awN\delta}{j+1}\right]$$
from which we aim at finding the value of w (with respect to k) that minimizes $F = f_D - f_C$. Since $\Pi_D(j)$, $\Pi_C(j+1)$ and c do not depend on w, these quantities do not affect the choice of the optimal w, leaving us with the problem of minimizing the following expression:
$$F' = -N\delta \sum_{j=0}^{N-1} \binom{Z-1}{N-1}^{-1}\binom{k}{j}\binom{Z-k-1}{N-j-1}\frac{b(1-w)}{N-j} \;-\; N\delta \sum_{j=0}^{N-1} \binom{Z-1}{N-1}^{-1}\binom{k-1}{j}\binom{Z-k}{N-j-1}\frac{aw}{j+1}$$
Since $\binom{k}{j} = \frac{k}{k-j}\binom{k-1}{j}$ and $\binom{Z-k-1}{N-j-1} = \frac{Z-k-N+j+1}{Z-k}\binom{Z-k}{N-j-1}$,
$$F' = -N\delta \sum_{j=0}^{N-1} \binom{Z-1}{N-1}^{-1}\binom{k-1}{j}\binom{Z-k}{N-j-1}\left[\frac{aw}{j+1} + \frac{b(1-w)}{N-j}\,\frac{k(Z-k-N+j+1)}{(k-j)(Z-k)}\right]$$

$$= -N\delta \sum_{j=0}^{N-1} \binom{Z-1}{N-1}^{-1}\binom{k-1}{j}\binom{Z-k}{N-j-1}\,w\left[\frac{a}{j+1} - \frac{b}{N-j}\,\frac{k(Z-k-N+j+1)}{(k-j)(Z-k)}\right] \;-\; N\delta \sum_{j=0}^{N-1} \binom{Z-1}{N-1}^{-1}\binom{k-1}{j}\binom{Z-k}{N-j-1}\,\frac{b}{N-j}\,\frac{k(Z-k-N+j+1)}{(k-j)(Z-k)}$$
The second summation does not depend on w; thus the optimal policy is given by the minimization of:
$$F'' = -N\delta \sum_{j=0}^{N-1} \binom{Z-1}{N-1}^{-1}\binom{k-1}{j}\binom{Z-k}{N-j-1}\,w\left[\frac{a}{j+1} - \frac{b}{N-j}\,\frac{k(Z-k-N+j+1)}{(k-j)(Z-k)}\right]$$
Since N and δ are always positive, the whole expression can be divided by Nδ without changing the optimization problem. Moreover, by multiplying the expression by (−1), it can finally be shown that minimizing $f_D - f_C$ is equivalent to maximizing the following expression:
Figure 4. Optimal switching point s* as a function of the ratio a/b, for different values of N (see Methods).
$$\sum_{j=0}^{N-1} w\,\binom{Z-1}{N-1}^{-1}\binom{k-1}{j}\binom{Z-k}{N-j-1}\left[\frac{a}{j+1} - \frac{b}{N-j}\,\frac{k(Z-k-N+j+1)}{(k-j)(Z-k)}\right]$$
where j represents the number of Cs in a group of size N, sampled without replacement from a population of size Z containing k Cs. Now, let us consider that the optimal switching point s* depends on k. Since this sum decreases as k increases, containing only one root, the solution to this optimization problem corresponds to having w set to 1 (pure Reward) for positive values of the sum, suddenly switching to w = 0 (pure Punishment) once the sum becomes negative. The optimal switching point s* depends on the ratio a/b, group size N and population size Z. The effect of population size (Z) and group size (N) on s* is limited, while the impact of the efficiency of reward (a) and punishment (b) is illustrated in Fig. 4. For a/b = 1 the switching point is s* = 0.5 (see Fig. 4). Interestingly, we note that, also in the CRD, s* is not impacted by the group success threshold (M) or the risk associated with losing the retained endowment when collective success is not attained (r). This is the case as we assume that the decision to punish or reward is independent of M or r. Notwithstanding, the model that we present can, in the future, be tuned to test more sophisticated incentive tools, such as rewarding or punishing depending on (i) how far group contributions remained from (or surpassed) the minima to achieve group success or (ii) how soft/strict the dilemma at stake is, given the likelihood of losing everything when collective success is not accomplished.
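For illustration, the switching point can be located numerically by scanning k for the sign change of the sum above. The sketch below is our own; it uses the default parameters quoted in the figures and, for a = b, recovers a switching point close to s* = 0.5:

```python
from math import comb

def switching_point(a=1.0, b=1.0, Z=50, N=10):
    """Return s* = k/Z at which the w-coefficient sum first turns negative,
    i.e. where the optimal policy flips from pure Reward to pure Punishment."""
    def S(k):
        total = 0.0
        for j in range(N):
            base = comb(k - 1, j) * comb(Z - k, N - 1 - j)
            if base == 0:
                continue  # nonzero base implies j <= k-1, guarding the denominators
            total += base * (a / (j + 1)
                             - (b / (N - j)) * k * (Z - k - N + j + 1)
                             / ((k - j) * (Z - k)))
        return total / comb(Z - 1, N - 1)

    for k in range(1, Z):
        if S(k) < 0:
            return k / Z
    return 1.0
```

Because the sum has a single root in k, a simple linear scan suffices; a bisection would work equally well for large Z.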
Received: 20 May 2019; Accepted: 15 September 2019;
Published: xx xx xxxx
References
1. Barrett, S. Self-enforcing international environmental agreements. Oxford Economic Papers 46, 878–894 (1994).
2. Barrett, S. Why cooperate? The incentive to supply global public goods. (Oxford UP, 2007).
3. Dreber, A. & Nowak, M. A. Gambling for Global Goods. Proc Natl Acad Sci USA 105, 2261–2262 (2008).
4. Hardin, G. The Tragedy of the Commons. Science 162, 1243 (1968).
5. Milinski, M., Sommerfeld, R. D., Krambeck, H. J., Reed, F. A. & Marotzke, J. The collective-risk social dilemma and the prevention of simulated dangerous climate change. Proc Natl Acad Sci USA 105, 2291–2294 (2008).
6. Pacheco, J. M., Santos, F. C., Souza, M. O. & Skyrms, B. Evolutionary dynamics of collective action in N-person stag hunt dilemmas. Proc R Soc Lond B 276, 315–321 (2009).
7. Tavoni, A., Dannenberg, A., Kallis, G. & Löschel, A. Inequality, communication and the avoidance of disastrous climate change in a public goods game. Proc Natl Acad Sci USA 108, 11825–11829 (2011).
8. Bosetti, V., Heugues, M. & Tavoni, A. Luring others into climate action: coalition formation games with threshold and spillover effects. Oxford Economic Papers 69, 410–431 (2017).
9. Santos, F. C. & Pacheco, J. M. Risk of collective failure provides an escape from the tragedy of the commons. Proc Natl Acad Sci USA 108, 10421–10425 (2011).
10. Vasconcelos, V. V., Santos, F. C. & Pacheco, J. M. A bottom-up institutional approach to cooperative governance of risky commons. Nat. Clim. Change 3, 797–801 (2013).
11. Sigmund, K., Hauert, C. & Nowak, M. A. Reward and punishment. Proc. Natl. Acad. Sci. USA 98, 10757–10762 (2001).
12. Chen, X., Sasaki, T., Brännström, Å. & Dieckmann, U. First carrot, then stick: how the adaptive hybridization of incentives promotes cooperation. Journal of The Royal Society Interface 12, 20140935 (2015).
13. Hilbe, C. & Sigmund, K. Incentives and opportunism: from the carrot to the stick. Proceedings of the Royal Society of London B: Biological Sciences 277, 2427–2433 (2010).
14. Gneezy, A. & Fessler, D. M. Conflict, sticks and carrots: war increases prosocial punishments and rewards. Proceedings of the Royal Society of London B: Biological Sciences, rspb20110805 (2011).
15. Sasaki, T. & Uchida, S. Rewards and the evolution of cooperation in public good games. Biology Letters 10, 20130903 (2014).
16. Fehr, E. & Gächter, S. Altruistic punishment in humans. Nature 415, 137–140 (2002).
17. Sigmund, K. Punish or perish? Retaliation and collaboration among humans. Trends in Ecology & Evolution 22, 593–600 (2007).
18. Masclet, D., Noussair, C., Tucker, S. & Villeval, M.-C. Monetary and nonmonetary punishment in the voluntary contributions mechanism. Am. Econ. Rev. 93, 366–380 (2003).
19. Charness, G. & Haruvy, E. Altruism, equity, and reciprocity in a gift-exchange experiment: an encompassing approach. Games and Economic Behavior 40, 203–231 (2002).
20. Andreoni, J., Harbaugh, W. & Vesterlund, L. The carrot or the stick: Rewards, punishments, and cooperation. The American Economic Review 93, 893–902 (2003).
21. Szolnoki, A. & Perc, M. Reward and cooperation in the spatial public goods game. EPL (Europhysics Letters) 92, 38003 (2010).
22. Perc, M. et al. Statistical physics of human cooperation. Physics Reports 687, 1–51 (2017).
23. Fang, Y., Benko, T. P., Perc, M., Xu, H. & Tan, Q. Synergistic third-party rewarding and punishment in the public goods game. Proc. Roy. Soc. A 475, 20190349 (2019).
24. Milinski, M., Semmann, D., Krambeck, H. J. & Marotzke, J. Stabilizing the Earth’s climate is not a losing game: Supporting evidence from public goods experiments. Proc Natl Acad Sci USA 103, 3994–3998 (2006).
25. Chen, X., Szolnoki, A. & Perc, M. Averting group failures in collective-risk social dilemmas. EPL (Europhysics Letters) 99, 68003 (2012).
26. Chakra, M. A. & Traulsen, A. Evolutionary dynamics of strategic behavior in a collective-risk dilemma. PLoS Comput Biol 8, e1002652 (2012).
27. Chen, X., Szolnoki, A. & Perc, M. Risk-driven migration and the collective-risk social dilemma. Physical Review E 86, 036101 (2012).
28. Pacheco, J. M., Vasconcelos, V. V. & Santos, F. C. Climate change governance, cooperation and self-organization. Phys Life Rev 11, 595–597 (2014).
29. Vasconcelos, V. V., Santos, F. C., Pacheco, J. M. & Levin, S. A. Climate policies under wealth inequality. Proc Natl Acad Sci USA 111, 2212–2216 (2014).
30. Hilbe, C., Chakra, M. A., Altrock, P. M. & Traulsen, A. The evolution of strategic timing in collective-risk dilemmas. PloS ONE 8, e66490 (2013).
31. Barrett, S. Avoiding disastrous climate change is possible but not inevitable. Proc Natl Acad Sci USA 108, 11733 (2011).
32. Barrett, S. & Dannenberg, A. Climate negotiations under scientific uncertainty. Proc Natl Acad Sci USA 109, 17372–17376 (2012).
33. Milinsi, M., öhl, T. & Marotze, J. Cooperative interaction of rich and poor can be catalyzed by intermediate climate targets.
Climatic change 1–8 (2011).
34. Boesch, C. Cooperative hunting roles among Tai chimpanzees. Human. Nature 13, 27–46 (2002).
35. Creel, S. & Creel, N. M. Communal hunting and pac size in African wild dogs, Lycaon pictus. Animal Behaviour 50, 1325–1339
(1995).
36. Blac, J., Levi, M. D. & De Meza, D. Creating a good atmosphere: minimum participation for tacling the’greenhouse eect’.
Economica 281–293 (1993).
37. Stander, P. E. Cooperative hunting in lions: the role of the individual. Behavioral ecology and sociobiology 29, 445–454 (1992).
38. Alvard, M. S. et al. ousseau’s whale hunt? Coordination among big-game hunters. Current anthropology 43, 533–559 (2002).
39. Souza, M. O., Pacheco, J. M. & Santos, F. C. Evolution of cooperation under N-person snowdri games. J eor Biol 260, 581–588
(2009).
40. Pacheco, J. M., Vasconcelos, V. V., Santos, F. C. & Syrms, B. Co-evolutionary dynamics of collective action with signaling for a
quorum. PLoS Comput Biol 11, e1004101 (2015).
41. Syrms, B. e Stag Hunt and the Evolution of Social Structure. (Cambridge Univ Press, 2004).
42. Barrett, S. Environment and statecra: the strategy of environmental treaty-maing. (Oxford UP, 2005).
43. Sigmund, . e Calculus of Selshness. (Princeton Univ Press, 2010).
44. Traulsen, A., Nowa, M. A. & Pacheco, J. M. Stochastic dynamics of invasion and xation. Phys. ev. E 74, 011909 (2006).
45. Traulsen, A., Hauert, C., De Silva, H., Nowa, M. A. & Sigmund, . Exploration dynamics in evolutionary games. PNAS 106,
709–712 (2009).
46. Paiva, A., Santos, F. P. & Santos, F. C. Engineering pro-sociality with autonomous agents in irty-Second AAAI Conference on
Articial Intelligence, pp. 7994–7999 (2018).
47. Shirado, H. & Christais, N. A. Locally noisy autonomous agents improve global human coordination in networ experiments.
Nature 545, 370 (2017).
48. Santos, F. P., Pacheco, J. M., Paiva, A. & Santos, F. C. Evolution of collective fairness in hybrid populations of humans and agents in
Proceedings of the irty-ird AAAI Conference on Articial Intelligence, Vol. 33, pp. 6146–6153 (2019).
49.  ahwan, I. et al. Machine behaviour. Nature 568, 477 (2019).
50. Powers, S. T., van Schai, C. P. & Lehmann, L. How institutions shaped the last major evolutionary transition to large-scale human
societies. Philosophical Transactions of the oyal Society B: Biological Sciences 371, 20150098 (2016).
51. Sigmund, ., De Silva, H., Traulsen, A. & Hauert, C. Social learning promotes institutions for governing the commons. Nature 466,
861 (2010).
52. Santos, F. C., Pacheco, J. M. & Syrms, B. Co-evolution of pre-play signaling and cooperation. J eor Biol 274, 30–35 (2011).
53. ularni, V. G. Modeling and analysis of stochastic systems. (Chapman and Hall/CC, 2016).
54. Hindersin, L., Wu, B., Traulsen, A. & García, J. Computation and simulation of evolutionary game dynamics in nite populations.
Sci. ep. 9, 6946 (2019).
Acknowledgements
This research was supported by Fundação para a Ciência e Tecnologia (FCT) through grants PTDC/EEI-SII/5081/2014 and PTDC/MAT/STA/3358/2014 and by multiannual funding of INESC-ID and CBMA (under the projects UID/CEC/50021/2019 and UID/BIA/04050/2013). F.P.S. acknowledges support from the James S. McDonnell Foundation 21st Century Science Initiative in Understanding Dynamic and Multi-scale Systems - Postdoctoral Fellowship Award. All authors declare no competing financial or non-financial interests in relation to the work described.
Author contributions
A.R.G., F.P.S., J.M.P. and F.C.S. designed and implemented the research; A.R.G., F.P.S., J.M.P. and F.C.S. prepared all the Figures; A.R.G., F.P.S., J.M.P. and F.C.S. wrote the manuscript; A.R.G., F.P.S., J.M.P. and F.C.S. reviewed the manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to F.C.S.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2019
... Evolutionary game theory (EGT) [15] provides an appropriate tool to study the evolution of cooperative behaviour in social dilemmas, as they are governed by institutional incentives [7,[16][17][18][19][20][21][22] and prior commitments [1,[23][24][25]. However, prior works have not examined the interplay between different forms of incentive and commitment behaviours, including participation in and compliance with a prior commitment. ...
... However, prior works have not examined the interplay between different forms of incentive and commitment behaviours, including participation in and compliance with a prior commitment. On the one hand, existing institutional incentive models do not capture the commitment formation process explicitly [7,16,19,20,22]. These works fail to consider the need for improving participation in a commitment before the interaction. ...
... Also, it might be more costly for institutions to provide rewards when compliant behaviour is frequent. Thus, an interesting direction is to consider how to combine reward and punishment in a cost-efficient way, taking into account that they might have different levels of cost-efficiency, as have been done in the context of social dilemmas (without considering commitment-based behaviours) [7,[16][17][18][19]21]. Our results suggest that further behavioural experiments are needed to examine what incentive mechanisms are efficient and preferred (by people) for ensuring the commitment compliance and cooperation. ...
Article
Both conventional wisdom and empirical evidence suggest that arranging a prior commitment or agreement before an interaction takes place enhances the chance of reaching mutual cooperation. Yet it is not clear what mechanisms might underlie the participation in and compliance with such a commitment, especially when participation is costly and non-compliance can be profitable. Here, we develop a theory of participation and compliance with respect to an explicit commitment formation process and to institutional incentives where individuals, at first, decide whether or not to join a cooperative agreement to play a one-shot social dilemma game. Using a mathematical model, we determine whether and when participating in a costly commitment, and complying with it, is an evolutionarily stable strategy, resulting in high levels of cooperation. We show that, given a sufficient budget for providing incentives, rewarding of commitment compliant behaviours better promotes cooperation than punishment of non-compliant ones. Moreover, by sparing part of this budget for rewarding those willing to participate in a commitment, the overall level of cooperation can be significantly enhanced for both reward and punishment. Finally, the presence of mistakes in deciding to participate favours evolutionary stability of commitment compliance and cooperation.
... In the future, the author will investigate whether the proposed pool punishment similarly does not allow the invasion of defectors due to mutation and can maintain high average payoff in the cases where second-order free riders are punished 14 , or all types of players can punish other players 5 . The author also intends to devise the probabilistic pool reward and introduce the combination of reward and punishment like the following previous studies [47][48][49] . Szolnoki and Perc 47 discuss whether the combined application of reward and punishment is evolutionary advantageous, and find rich dynamical behaviour that shows intricate phase diagrams where continuous and discontinuous phase transitions successively occur. ...
... They find that this policy establishes and recovers full cooperation at lower cost and under a wider range of conditions than either rewards or penalties alone. Góis et al. 49 show similar results that rewards (positive incentives) are essential to initiate cooperation and sanctions (negative incentives) are instrumental to maintain cooperation. As each parameter value of this study conforms to Traulsen et al. 12 , a factor multiplying all summed-up contributions (r) equals 3, which is relatively large and somewhat induces cooperation. ...
Article
Full-text available
The public goods game is a multiplayer version of the prisoner’s dilemma game. In the public goods game, punishment on defectors is necessary to encourage cooperation. There are two types of punishment: peer punishment and pool punishment. Comparing pool punishment with peer punishment, pool punishment is disadvantageous in comparison with peer punishment because pool punishment incurs fixed costs especially if second-order free riders (those who invest in public goods but do not punish defectors) are not punished. In order to eliminate such a flaw of pool punishment, this study proposes the probabilistic pool punishment proportional to the difference of payoff. In the proposed pool punishment, each punisher pays the cost to the punishment pool with the probability proportional to the difference of payoff between his/her payoff and the average payoff of his/her opponents. Comparing the proposed pool punishment with previous pool and peer punishment, in pool punishment of previous studies, cooperators who do not punish defectors become dominant instead of pool punishers with fixed costs. However, in the proposed pool punishment, more punishers and less cooperators coexist, and such state is more robust against the invasion of defectors due to mutation than those of previous pool and peer punishment. The average payoff is also comparable to peer punishment of previous studies.
... For example, fairness concerns emerge and play a crucial role in group interactions, when agents must decide upon outcomes possibly favouring different parts unequally (Teixeira et al., 2021). These concerns arise in many domains -hybrid collectives of humans and machines (Paiva et al., 2018), wildlife management (Levin, 2000), conflict resolution (Pritchett and Genton, 2017) or enforcing global climate change actions (Góis et al., 2019, Ostrom, 2010, Pacheco et al., 2014, just to name a few. In this context, several mechanisms have been identified to explain why fairness is widespread in human decision-making, but it is typically assumed to emerge from the actions of individuals within the system. ...
... Existing models of institutional incentives aimed at promoting collective behaviours, such as cooperation and fairness (Góis et al., 2019, Han, 2022, Sasaki et al., 2012, Sigmund et al., 2001, 2010, Sun et al., 2021, usually ignore the problem of cost-efficiency. These works often consider non-adaptive incentive mechanisms, studying how minimal incentive mechanisms can promote cooperation. ...
Preprint
Full-text available
Institutions and investors are constantly faced with the challenge of appropriately distributing endowments. No budget is limitless and optimising overall spending without sacrificing positive outcomes has been approached and resolved using several heuristics. To date, prior works have failed to consider how to encourage fairness in a population where social diversity is ubiquitous, and in which investors can only partially observe the population. Herein, by incorporating social diversity in the Ultimatum game through heterogeneous graphs, we investigate the effects of several interference mechanisms which assume incomplete information and flexible standards of fairness. We quantify the role of diversity and show how it reduces the need for information gathering, allowing us to relax a strict, costly interference process. Furthermore, we find that the influence of certain individuals, expressed by different network centrality measures, can be exploited to further reduce spending if minimal fairness requirements are lowered. Our results indicate that diversity changes and opens up novel mechanisms available to institutions wishing to promote fairness. Overall, our analysis provides novel insights to guide institutional policies in socially diverse complex systems.
... However, what should be noted is that people may expect the impact of reward and punishment to lead to symmetrical results. In that context, the study of Góis et al (2019) is interesting in looking at the dilemma between reward and punishment policies. Their research found that punishment was effective for high cooperation in the same way as a reward was effective for low cooperation. ...
Article
Full-text available
The discipline of employees regarding working hours at the office of BPJS Health Kotabumi Branch is still relatively low. This low discipline of employees towards working hours can cause a lack of employee productivity. Problems concerning discipline in a company or government agency must be solved by making policies to overcome them. This study aimed to analyze the effect of implementing reward and punishment policies as a solution to increase employee productivity. The type of this study was field research with the quantitative method to analyze the obtained data. Samples in this study were 30 employees of BPJS Health for Kotabumi Branch, selected using the total sampling technique. Meanwhile, to analyze the obtained data, the researcher used multiple linear regression analysis. The results of the calculations using the Multiple Linear Regression analysis at a significance level of 0.05 () showed as follows. First, the p-value for was 0.02 (< 0.05), meaning that was rejected, concluding that the reward policy had an effect on increasing employee performance productivity. Second, the p-value for was 0.00 (< 0.05), meaning that was also rejected, concluding that the punishment policy had an effect on increasing employee performance productivity. Third, the p-value for was 0.00 (< 0.05), meaning that was rejected, concluding that the reward and punishment policy had an effect on increasing employee performance productivity. These findings contributed to increasing the productivity of employee performance. In addition, they can be used by policymakers in a company for increasing the productivity of employee performance.
... Sometimes [9,15] their simultaneous action is desirable for averting the TOC. Both reward [16] and punishment [17] can initiate the first seed of cooperation for it to take over globally. Both are incapable of achieving global cooperation where the population is divided into localized subgroups, and the payoffs are locally inefficient [18]. ...
Article
Full-text available
We consider an unstructured population of individuals who are randomly matched in an underlying population game in which the payoffs depend on the evolving state of the common resource exploited by the population. There are many known mechanisms for averting the overexploitation (tragedy) of the (common) resource. Probably one of the most common mechanism is reinforcing cooperation through rewards and punishments. Additionally, the depleting resource can also provide feedback that reinforces cooperation. Thus, it is an interesting question that how reward and punishment comparatively fare in averting the tragedy of the common in the game-resource feedback evolutionary dynamics. Our main finding is that, while averting the tragedy of the common completely, rewarding cooperators cannot get rid of all the defectors, unlike what happens when defectors are punished; and as a consequence, in the completely replete resource state, the outcome of the population game can be socially optimal in the presence of the punishment but not so in the presence of the reward.
Article
IMPACT Some countries are still struggling to vaccinate residents against Covid 19 despite the wide availability of vaccines. This situation becomes more complex when considering the possible need for regular booster shots. Repeated vaccine mandates that impose fines on vaccine refusers may increase vaccination uptake. However, the uptake may not be sufficient to lift all Covid 19 restrictions. This article recommends that policy-makers consider an alternative financial incentive system that relies on rewards in addition to fines. Theoretical and empirical evidence suggests that a combination can yield a stronger response than using rewards or fines alone.
Article
Exploring the evolution of cooperation has garnered increasing interest in a variety of fields, yet the majority of existing models assume that individuals’ preferences are fixed and stable. Preference reversal, a systematic disparity between people’s valuations and choices, implies that individuals’ preferences are inherently unstable and changing. To better understand how cooperation evolves, we develop a simple model with preference reversal in the context of the public goods game, in which the decision on whether and how to punish defectors is heavily influenced by the cooperators’ preference for punishment in two decision-making processes. We explore the effects of preference reversal on the evolutionary dynamics of cooperation in infinitely well-mixed populations. We also specify the scenarios where preference reversal favors cooperation. The chance of preference reversal as well as the predisposing conditions for preference reversal are critical in determining whether preference reversal facilitates the emergence of cooperation.
Article
Full-text available
Home assistant chat-bots, self-driving cars, drones or automated negotiation systems are some of the several examples of autonomous (artificial) agents that have pervaded our society. These agents enable the automation of multiple tasks, saving time and (human) effort. However, their presence in social settings raises the need for a better understanding of their effect on social interactions and how they may be used to enhance cooperation towards the public good, instead of hindering it. To this end, we present an experimental study of human delegation to autonomous agents and hybrid human-agent interactions centered on a non-linear public goods dilemma with uncertain returns in which participants face a collective risk. Our aim is to understand experimentally whether the presence of autonomous agents has a positive or negative impact on social behaviour, equality and cooperation in such a dilemma. Our results show that cooperation and group success increases when participants delegate their actions to an artificial agent that plays on their behalf. Yet, this positive effect is less pronounced when humans interact in hybrid human-agent groups, where we mostly observe that humans in successful hybrid groups make higher contributions earlier in the game. Also, we show that participants wrongly believe that artificial agents will contribute less to the collective effort. In general, our results suggest that delegation to autonomous agents has the potential to work as commitment devices, which prevent both the temptation to deviate to an alternate (less collectively good) course of action, as well as limiting responses based on betrayal aversion.
Chapter
Globally, the collapse of democracy is perceived as a crisis, and some say that a world governed by AI-driven systems, or under the central control of a tyrannical government, is a dystopia. However, in the COVID-19 pandemic, the negative aspects of democratic governance and human rights that contributed to the spread of the infection should not be ignored. As the author has pointed out regarding the relevance of AI's ubiquity to our cyber-physical world (Shibuya, 2020a), Spinoza's pantheistic ethics (2008) and Eastern thought may also bear on this context. Here, the author looks beyond the conflicts between Eastness and Westness to consider what kind of life world is being reconstituted by digital transformation.
Article
The view that altruistic punishment plays an important role in supporting public cooperation among human beings and other species is widely accepted. However, the positive role of altruistic punishment in enhancing cooperation is undermined once corruption is considered. Recently, behavioral experiments have confirmed this finding and further investigated the effects of the leader's punitive power and the economic potential. Nevertheless, relatively few studies examine how these factors affect the evolution of cooperation from a theoretical perspective. Here, we combine institutional-punishment public goods games with bribery games to investigate the effects of the above factors on the evolution of cooperation. Theoretical and numerical results reveal that the existence of corruption reduces the level of cooperation when cooperators are more inclined to provide bribes. In addition, we demonstrate that a stronger leader and richer economic potential are both important for enhancing cooperation. In particular, when defectors are more inclined to provide bribes, stronger leaders can sustain cooperators' contributions to public goods if the economic potential is weak.
Article
Full-text available
The study of evolutionary dynamics increasingly relies on computational methods, as more and more cases outside the range of analytical tractability are explored. The computational methods for simulation and numerical approximation of the relevant quantities are diverging without being compared for accuracy and performance. We thoroughly investigate these algorithms in order to propose a reliable standard. For expositional clarity we focus on symmetric 2 × 2 games leading to one-dimensional processes, noting that extensions can be straightforward and lessons will often carry over to more complex cases. We provide time-complexity analysis and systematically compare three families of methods to compute fixation probabilities, fixation times and long-term stationary distributions for the popular Moran process. We provide efficient implementations that substantially improve wall times over naive or immediate implementations. Implications are also discussed for the Wright-Fisher process, as well as structured populations and multiple types.
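For the one-dimensional processes this study focuses on, the fixation probability of a single mutant has a standard closed form that any of the compared numerical methods should reproduce. A minimal sketch (assuming generic forward and backward transition probabilities supplied by the caller; the function name is hypothetical):

```python
def fixation_probability(t_plus, t_minus, Z):
    """Fixation probability of a single mutant in a one-dimensional
    birth-death (Moran-type) process on a population of size Z.

    t_plus(k) / t_minus(k): probabilities of moving from k mutants
    to k+1 / k-1. Uses the standard formula
    rho = 1 / (1 + sum_{k=1}^{Z-1} prod_{j=1}^{k} t_minus(j)/t_plus(j)).
    """
    total, prod = 1.0, 1.0
    for k in range(1, Z):
        prod *= t_minus(k) / t_plus(k)
        total += prod
    return 1.0 / total
```

A useful sanity check when benchmarking implementations: under neutral drift, where forward and backward probabilities coincide, the formula reduces to 1/Z.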
Article
Full-text available
Machines powered by artificial intelligence increasingly mediate our social, cultural, economic and political interactions. Understanding the behaviour of artificial intelligence systems is essential to our ability to control their actions, reap their benefits and minimize their harms. Here we argue that this necessitates a broad scientific research agenda to study machine behaviour that incorporates and expands upon the discipline of computer science and includes insights from across the sciences. We first outline a set of questions that are fundamental to this emerging field and then explore the technical, legal and institutional constraints on the study of machine behaviour.
Article
This paper envisions a future where autonomous agents are used to foster and support pro-social behavior in a hybrid society of humans and machines. Pro-social behavior occurs when people and agents perform costly actions that benefit others. Acts such as helping others voluntarily, donating to charity, providing information or sharing resources are all forms of pro-social behavior. We discuss two questions that challenge a purely utilitarian view of human decision making and contextualize its role in hybrid societies: i) What are the conditions and mechanisms that lead societies of agents and humans to be more pro-social? ii) How can we engineer autonomous entities (agents and robots) that lead to more altruistic and cooperative behaviors in a hybrid society? We propose using social simulations, game theory, population dynamics, and studies with people in virtual or real environments (with robots) where both agents and humans interact. This research will constitute the basis for establishing the foundations for the new field of Pro-social Computing, aiming at understanding, predicting and promoting pro-sociality among humans, through artificial agents and multiagent systems.
Article
Fairness plays a fundamental role in decision-making, as evidenced by the high incidence of human behaviors that result in egalitarian outcomes. This is often shown in the context of dyadic interactions, resorting to the Ultimatum Game. The peculiarities of group interactions, and their effect in eliciting fair actions, remain, however, largely unexplored. Focusing on groups raises several questions related to the effect of group size, group decision rules and the interrelation of human and agent behaviors in hybrid groups. To address these topics, here we test a Multiplayer version of the Ultimatum Game (MUG): proposals are made to groups of Responders that, collectively, accept or reject them. Firstly, we run an online experiment to evaluate how humans react to different group decision rules. We observe that people become increasingly fair if groups adopt stricter decision rules, i.e., if more individuals are required to accept a proposal for it to be accepted by the group. Secondly, we propose a new analytical model to shed light on how such behaviors may have evolved. Thirdly, we adapt our model to include agents with fixed behaviors. We show that including hardcoded Pro-social agents favors the evolutionary stability of fair states, even for soft group decision rules. This suggests that judiciously introducing agents with particular behaviors into a population may leverage long-term social benefits.
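The group decision rule tested in this study can be sketched as a simple acceptance function (a hypothetical illustration: how the offered share is divided among responders is an assumption here, as conventions differ across MUG variants):

```python
def group_payoffs(offer, acceptance_thresholds, M):
    """Proposer offers a fraction `offer` of the pie to a group of
    responders; each responder i accepts iff offer >= their threshold.
    The group accepts iff at least M responders accept (the decision
    rule: larger M means a stricter rule).

    Returns (proposer_payoff, list_of_responder_payoffs). On acceptance
    the offered share is split equally among responders (an assumed
    convention); on rejection everyone gets nothing.
    """
    votes = sum(1 for t in acceptance_thresholds if offer >= t)
    n = len(acceptance_thresholds)
    if votes >= M:
        return 1.0 - offer, [offer / n] * n
    return 0.0, [0.0] * n
```

Raising `M` makes low offers riskier for the proposer, which is consistent with the experimental finding that stricter rules elicit fairer proposals.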
Article
We study the evolution of cooperation in the spatial public goods game in the presence of third-party rewarding and punishment. The third party executes public intervention, punishing groups where cooperation is weak and rewarding groups where cooperation is strong. We consider four different scenarios to determine what works best for cooperation: neither rewarding nor punishment, only rewarding, only punishment, or both rewarding and punishment. We observe strong synergistic effects when rewarding and punishment are applied simultaneously, effects that are absent when only one incentive, or neither, is applied by the third party. We find that public cooperation can be sustained at comparatively low third-party costs under adverse conditions, which is impossible if just positive or negative incentives are applied. We also examine the impact of defection tolerance and application frequency, showing that the higher the tolerance and the frequency of rewarding and punishment, the more cooperation thrives. Phase diagrams and characteristic spatial distributions of strategies are presented to corroborate these results, which will hopefully prove useful for more efficient public policies in support of cooperation in social dilemmas.
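The third-party intervention described here can be sketched as a per-group payoff adjustment (an illustrative simplification: the threshold, incentive size, and the handling of borderline groups are assumptions, and the actual model is spatial rather than a single isolated group):

```python
def third_party_adjust(payoffs, strategies, threshold, incentive):
    """Apply third-party intervention to one group.

    strategies: 1 for cooperator, 0 for defector.
    If the cooperator fraction falls below `threshold` (weak cooperation),
    defectors in the group are fined; otherwise (strong cooperation),
    cooperators are rewarded. Returns the adjusted payoff list.
    """
    frac_c = sum(strategies) / len(strategies)
    out = list(payoffs)
    for i, s in enumerate(strategies):
        if frac_c < threshold and s == 0:      # weak group: punish defectors
            out[i] -= incentive
        elif frac_c >= threshold and s == 1:   # strong group: reward cooperators
            out[i] += incentive
    return out
```

Lowering `threshold` plays the role of a higher defection tolerance: fewer groups are classified as weak, so punishment is applied less often and reward more often.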
Book
Building on the author's more than 35 years of teaching experience, Modeling and Analysis of Stochastic Systems, Third Edition, covers the most important classes of stochastic processes used in the modeling of diverse systems. For each class of stochastic process, the text includes its definition, characterization, applications, transient and limiting behavior, first passage times, and cost/reward models. The third edition has been updated with several new applications, including the Google search algorithm in discrete time Markov chains, several examples from health care and finance in continuous time Markov chains, and square root staffing rule in Queuing models. More than 50 new exercises have been added to enhance its use as a course text or for self-study. The sequence of chapters and exercises has been maintained between editions, to enable those now teaching from the second edition to use the third edition. Rather than offer special tricks that work in specific problems, this book provides thorough coverage of general tools that enable the solution and analysis of stochastic models. After mastering the material in the text, readers will be well-equipped to build and analyze useful stochastic models for real-life situations.
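As a small worked example of the limiting behaviour such texts cover, the stationary distribution of an irreducible, aperiodic discrete-time Markov chain can be approximated by repeatedly applying the transition matrix (a minimal stdlib-only sketch, not taken from the book; for large chains one would solve pi P = pi directly instead):

```python
def stationary_distribution(P, iters=10000):
    """Approximate the stationary distribution pi of an irreducible,
    aperiodic discrete-time Markov chain with transition matrix P
    (list of rows) by power iteration: pi_{t+1} = pi_t P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi
```

For the two-state chain with P = [[0.9, 0.1], [0.5, 0.5]], balance gives pi = (5/6, 1/6), which the iteration recovers.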
Article
Coordination in groups faces a sub-optimization problem and theory suggests that some randomness may help to achieve global optima. Here we performed experiments involving a networked colour coordination game in which groups of humans interacted with autonomous software agents (known as bots). Subjects (n = 4,000) were embedded in networks (n = 230) of 20 nodes, to which we sometimes added 3 bots. The bots were programmed with varying levels of behavioural randomness and different geodesic locations. We show that bots acting with small levels of random noise and placed in central locations meaningfully improve the collective performance of human groups, accelerating the median solution time by 55.6%. This is especially the case when the coordination problem is hard. Behavioural randomness worked not only by making the task of humans to whom the bots were connected easier, but also by affecting the gameplay of the humans among themselves and hence creating further cascades of benefit in global coordination in these heterogeneous systems.