The Spillover Effects of Monitoring: A Field Experiment
Michèle Belot* and Marina Schröder†
September 26, 2014
We provide field experimental evidence of the effects of monitoring in
a context where productivity is multi-dimensional and only one dimension
is monitored and incentivized. We hire students to do a job for us. The
job consists of identifying euro coins. We study the direct effects of
monitoring and penalizing mistakes on work quality and evaluate spillovers on
unmonitored dimensions of productivity (punctuality and theft). We find
that monitoring improves work quality only if incentives are strong, but
substantially reduces punctuality irrespectively of the associated
incentives. Monitoring does not affect theft, with ten percent of participants
stealing overall. Our findings are supportive of a reciprocity mechanism,
whereby workers retaliate for being distrusted.
Keywords: counterproductive behavior, monitoring, field experiment
JEL: C93, J24, J30, M42, M52
* University of Edinburgh, School of Management, 30 Buccleuch Place, Edinburgh, EH8
9JT, UK, m ichele.b firstname.lastname@example.org.
† University of Cologne, Faculty of Management, Economics and Social Sciences,
Albertus-Magnus-Platz, 50923 Cologne, Germany, marina.schroeder@uni-koeln.de.
Experts estimate that, globally, occupational fraud causes annual losses of more
than $3.5 trillion (Association of Certified Fraud Examiners 2012). The question
is what an organization can do to prevent such behavior. One straightforward
instrument regularly applied in practice is to monitor workers and punish them
if they do not comply (or reward them if they do). But are such measures
effective? There is experimental evidence that monitoring and incentivizing may
actually backfire (see Frey 1993; Falk and Kosfeld 2006; Frey and Jegen 2014 for
reviews of this literature). However, the evidence is so far limited to situations
where productivity is unidimensional, such as the number of units produced or
sold, performance on a test, or monetary transfers in an experimental game (see,
for example, Gneezy and Rustichini 2000a; Nagin et al. 2002; Falk and Kosfeld
2006; Fisman and Miguel 2007; Dickinson and Villeval 2008; Boly 2011). These
studies assess the direct effects of monitoring on work behavior in the monitored
productivity dimension. In typical work surroundings, however, productivity is
multi-dimensional and there are multiple ways in which workers can behave
counterproductively: from showing up late to doing sloppy work, stealing,
bullying, or sabotaging other people’s work, counterproductive behavior has many
possible facets. Negative crowding out effects of monitoring may spill over to
other productivity dimensions. These spillover effects should be incorporated
when evaluating and designing monitoring and incentive schemes.
We study an experimental setup with multiple observable dimensions of
productivity, in which only one dimension is monitored and incentivized. We vary
(1) whether workers are monitored or not and (2) how "harsh" the incentives
are. We then evaluate the effects of monitoring on the monitored dimension
and on the other non-monitored dimensions. The experimental setup we use is
related to the euro currency. It is a field version of the laboratory task proposed
in Belot and Schröder (2013). We recruited students to identify the provenance
of euro coins. Every worker receives four boxes of coins and is asked to identify
and return the coins by an appointed date. The task has the advantage of
offering a menu of observable forms of counterproductive behavior that are very
common in the workplace, i.e., sloppy work, tardiness, and theft. These forms
of counterproductive behavior vary in their nature and, perhaps importantly, in
the non-monetary (or moral) costs associated with them (Robinson and Bennett
1995).
While it is obvious that sloppy work and theft affect the principal
negatively, tardiness is also generally considered undesirable behavior (Robinson
and Bennett 1995; Gneezy and Rustichini 2000a; Gubler, Larkin and Pierce
2013). However, tardiness is not perceived in the same way across countries
(Basu and Weibull 2003; Krupka and Weber 2013). The experiment was
conducted in Germany, where there is a strong social norm of punctuality. Proper
business etiquette is to be exactly on time. For example, a website targeting
English-speaking businessmen living in Germany (www.thelocal.de) ranks
punctuality as the most important aspect of etiquette for doing business in Germany.
Quoting: "1. Be on time. Being late in Germany is a cardinal sin. Seriously.
Turning up even five or ten minutes after the arranged time - especially for a
first meeting - is considered personally insulting and can create a disastrous first
impression. Minimise reputation damage by calling ahead with a watertight
excuse if you’re going to be held up." This advice is echoed on many international
business websites and guides to German etiquette.1
We compare three treatments with different degrees of monitoring and
incentives for work quality. The first treatment (no monitoring) entails no monitoring
at all. We contrast this treatment to treatments with monitoring and incentives.
We consider two alternative monitoring and incentive schemes. The first scheme
is a "low pain, low gain" incentive scheme (monitoring & mild incentives), which
introduces a productivity target that is relatively easy to pass and a low penalty
for failing to meet it. The second is a "high pain, high gain" incentive scheme
(monitoring & harsh incentives), which introduces a difficult productivity
target and a high penalty for failing to meet it. These two schemes are interesting
because it is not clear a priori which of the two triggers greater effort. Harsh
incentives may discipline workers and increase productivity, but incentives may
also discourage the workers if the target is perceived as not worth achieving.
Thus, the effects of these incentive schemes on productivity are unclear ex
ante.
1 See for example www2.uni-frankfurt.de/46329991/Guide-to-German-culture_and-etiquette.pdf
and www.kwintessential.co.uk/etiquette/doing-business-germany.html
We find evidence for negative spillover effects that appear as soon as
monitoring is introduced. Specifically, we find that tardiness increases substantially:
The fraction of participants who show up late increases by about 35 percentage
points as soon as monitoring is implemented, and the magnitude of the increase
is similar independent of the incentives. Theft, on the other hand, remains
constant across treatments: On average, 10% of the participants steal coins. In
our experiment, the direct effect on work quality seems to be driven by
incentives. We find a positive effect on work quality only when incentives are
harsh. Mild incentives lead to no improvement in work quality at all, while
harsh incentives reduce the number of mistakes by 40%. In a companion
laboratory experiment, we replicate this result and find that the combination of
the productivity target and the penalty is crucial to determine the
effectiveness of incentives.2
Overall, our experimental results reveal negative spillover effects of
monitoring on unmonitored productivity dimensions. The positive direct effects of
monitoring seem to be contingent on harsh incentives and cannot be achieved
by monitoring per se. Our results are most supportive of an interpretation
related to negative reciprocity, whereby workers wish to punish the principal (for
monitoring them) and do so in the least costly manner for themselves (both in
monetary and non-monetary terms).
Our results suggest that monitoring can only be efficient in combination with
harsh incentives. Whether or not monitoring with harsh incentives is efficient
depends on the ratio of the gains in the monitored productivity dimension to
the losses in other unmonitored productivity dimensions.
The rest of the paper is structured as follows: We present the experimental
design in Section 2 and present the results in Section 3. We discuss the
interpretation of the results in Section 4 and conclude in Section 5.
2 In this laboratory experiment we vary the threshold and the penalty independently. We
briefly describe the design and findings in the Results section. For a detailed description,
please see the Appendix.
2 Experimental design and procedure
We recruited students to support a research project. The task is adapted from
Belot and Schröder (2013) and consists of identifying the value and country of
origin of euro coins that were collected in various countries in the euro zone.
Participants in our experiment had one day to complete the task from home and
were requested to return the work materials at a specific deadline. Our design
has several methodological advantages. It involves a job that could realistically
be advertised by an economics department and that can be executed in a natural
work environment, i.e., workers can take the coins home rather than working in
an experimental laboratory. Additionally, we can observe multiple dimensions
of productivity that arise naturally: Participants can do a poor job, be late in
completing the job or steal some of the coins. Still, it is straightforward for us to
design a monitoring scheme targeting only one of these dimensions. Also, in this
job, participants who failed to comply in any of these three dimensions can be
categorized as behaving counterproductively, since it is possible for participants
to do a perfect job, provided they are willing to do it.
We recruited student workers via a notice posted at various places on the
campus of the University of Magdeburg. Interested students were asked to
contact the research team by email. Those who had not participated in any
previous related studies received a response mail briefly explaining the task.
In the email, we suggested two collection dates with the corresponding return
dates and asked students to choose one of them.3 At collection, each participant
received standardized verbal instructions on how to perform the job and on the
monitoring procedure.4 After answering all open questions in a standardized
way, we asked participants to indicate the exact time at which they would return
the coins the next day.5
3 Collection was always either Monday or Wednesday in the morning between 10:00 a.m.
and 12:30 p.m. and return was the next day between 3:30 p.m. and 6:00 p.m.
4 For a detailed overview on the written and verbal communication as well as the work
material, please refer to the online appendix.
We contrast one treatment with neither monitoring nor incentives to two
treatments with monitoring and incentives. In the no monitoring treatment, there
is no monitoring at all. In the two monitoring treatments, 1 out of the 4 boxes
is checked. Before starting to work, participants in both monitoring treatments
were informed that 1 out of the 4 boxes would be checked after returning the
coins. While we kept monitoring fixed in these two treatments, we varied the
incentives associated with monitoring. In the monitoring & mild incentives
treatment participants were allowed to make 10 mistakes. If we found more than 10
mistakes in the box randomly chosen for checking, the participant would only
receive €19 instead of €20. In the monitoring & harsh incentives treatment,
the threshold number of mistakes was only 2. If we found more than 2 mistakes
in the checked box, the participant’s payment was only €5 instead of €20. The
first incentive scheme is mild: It is an easy threshold to pass and the penalty is
small. The second incentive scheme is harsh: It leaves little room for mistakes and the
penalty is large.6 Note that we played on two variables at the same time to vary
the incentives (threshold and penalty) and chose combinations of the two that
are probably most common in the workplace. However, to get more insight into
how the incentive schemes work (and affect performance in the monitored task
in particular), we conducted additional treatments in a laboratory experiment
that vary the penalty and the threshold independently (in a 2x2 design). We
will comment more extensively on the results in the next section.
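The payment rules described above can be written down compactly. The following is a minimal sketch, not the authors' procedure: the treatment labels and function names are hypothetical, and the expected payment assumes that each of the four boxes is equally likely to be the one checked.

```python
from fractions import Fraction

# Payment rules as described in the text: (mistake threshold, full pay, reduced pay) in euros.
# The treatment labels are hypothetical identifiers chosen for this sketch.
SCHEMES = {
    "no_monitoring": None,   # no box is checked, full payment of 20 euros
    "mild": (10, 20, 19),    # more than 10 mistakes in the checked box: 19 instead of 20
    "harsh": (2, 20, 5),     # more than 2 mistakes in the checked box: 5 instead of 20
}


def payment(treatment, mistakes_in_checked_box):
    """Return the payment in euros given the mistakes found in the checked box."""
    scheme = SCHEMES[treatment]
    if scheme is None:
        return 20
    threshold, full, reduced = scheme
    return reduced if mistakes_in_checked_box > threshold else full


def expected_payment(treatment, mistakes_per_box):
    """Expected payment when one box is drawn at random (assumed uniform) for checking."""
    total = sum(payment(treatment, m) for m in mistakes_per_box)
    return Fraction(total, len(mistakes_per_box))


# A worker with 3 mistakes in one box and at most 1 in the others:
print(expected_payment("mild", [0, 1, 3, 0]))   # never exceeds the mild threshold
print(expected_payment("harsh", [0, 1, 3, 0]))  # one of four boxes exceeds the harsh threshold
```

Under these assumptions the same work quality is penalized only under the harsh scheme, which illustrates why the two schemes can differ in the effort they trigger.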
Ninety-one students participated in this study, 30 each in the no monitoring
and monitoring & mild incentives treatments and 31 in the monitoring &
harsh incentives treatment. All participants were allowed to take the materials
home. They received a catalog illustrating the most common euro coins and four
identification tables. Each participant received a set of 4 boxes of euro coins
collected in 4 different countries of the euro zone. The lid of each box indicated
the country the coins were collected in. Within one set, the composition of boxes
varied with respect to the value and the number of coins. Across sets, however,
the composition of boxes was similar. Each participant received a total of 780
coins with a value of €114.70.
5 We gave participants enough time to check their schedule for the best suitable time in the
time horizon between 3:30 p.m. and 6:00 p.m. Once a participant had decided on the exact
return time, we noted the time in our calendar and wrote the time on a sheet of paper that
was handed to the participant.
6 The incentive scheme was framed in a neutral language for participants. We did not use
the words reward or punishment.
When participants returned the work materials, we wrote down the exact
time the materials were returned. We also asked the participants for an estimate
of the time they had worked on the task, for their field of study, and we recorded
the gender. Participants in the no monitoring treatment immediately received
the full payment of €20 in cash. Participants in the two monitoring treatments
directly received the sure part of the payment and could collect the remaining
part later (usually a day later) if they met the work quality requirements of
the corresponding treatment. Participants were informed about the payment
procedure before working on the task.
Compared to the no monitoring treatment, the two monitoring treatments
are associated with a different payment procedure that generates some
inconvenience for participants. We see this as a necessary and inherent part of
introducing the monitoring technology. If we had asked participants in
the no monitoring treatment to come back a day later to collect their payment,
they might have felt monitored as well. Given the nature of the task, it was
impossible to run the monitoring treatments without having participants
come back. Nevertheless, we believe such inconveniences are not atypical and are
often an inherent part of a monitoring scheme. In many real world examples,
monitoring is indeed associated with inconveniences for the worker, e.g.,
monitored workers have to write extra reports, make detours in order to reach central
time measurement stations, cope with delays due to quality control, or bear the
discomfort of camera surveillance. Thus, we are convinced that inconveniences
are a natural element of monitoring mechanisms.
When the experiment was over, we checked all returned materials with
respect to coin composition and mistakes in the identification task. Whenever we
observed deviations in the composition of coins, we replaced coins with identical
coins or coins with similar collector’s value before handing the materials to the
3.1 Summary statistics
Table 1 shows summary statistics for the behaviors of interest across the three
treatments. Regarding the productivity in the monitored dimension first, we
find that the quality of work is on average higher in the monitoring & harsh
incentives treatment than in the no monitoring and monitoring & mild incentives
treatments. In fact, quality in the no monitoring and the monitoring & mild
incentives treatments is very similar. In these two treatments, workers make 10
mistakes on average (2.5 per box), while they make on average 7 mistakes (1.7
per box) in the monitoring & harsh incentives treatment.
Looking in more detail at the distribution of mistakes, we find that most
boxes have no more than 2 mistakes, but this share is larger in the treatment
with harsh incentives (it is 76.1% in the no monitoring treatment, 71.7% in
the monitoring & mild incentives treatment, and 83.1% in the monitoring &
harsh incentives treatment). Most boxes have no more than 10 mistakes, suggesting
that this threshold was indeed an easy threshold to reach (97% in the no monitoring
treatment, 95% in the monitoring & mild incentives treatment, and 98% in the
monitoring & harsh incentives treatment).7
7 In the monitoring & mild incentives treatment, all checked boxes were below the tolerated
number of mistakes. Half of the participants in the monitoring & mild incentives treatment
came back to collect the remaining payment. Comparing those participants who collected the
remaining payment to those who did not, we do not find significant differences in the number of
mistakes made (U-test, p>0.10, two-tailed), stealing (Fisher Exact Test, p>0.10, two-tailed),
or punctuality (Fisher Exact Test, p>0.10, two-tailed). In the monitoring & harsh incentives
treatment, 5 participants did not meet the quality requirements. Of the 26 participants who
met the requirements, 24 came back to collect the remaining payment.
Table 1 Summary statistics (standard deviations in brackets)

                                            no monitoring     monitoring &      monitoring &
                                                              mild incentives   harsh incentives
                                            (1)               (2)               (3)
avg. total no. of mistakes in all 4 boxes   10.23 (16.23)     9.97 (13.45)      6.90 (10.93)
% boxes with 0-2 mistakes                   76.1%             71.7%             83.1%
% boxes with 3-10 mistakes                  20.6%             23.3%             14.4%
% boxes with more than 10 mistakes          3.3%              5.0%              2.5%
% participants on time (within 5 min)       56.7%             33.3%             35.5%
% participants too early (1 min.)           46.6%             33.3%             35.5%
median advance in min. (if early)           11 (584.90)       20 (17.04)        10 (130.31)
% participants too late (1 min)             13.3%             43.3%             45.2%
median delay in min. (if late)              4 (6.29)          5 (15.48)         8 (38.93)
no. of participants who stole coins         3                 3                 3
avg. reported working time (in min)         111.83 (42.6)     112.5 (45.0)      124.5 (47.7)
% participants eligible for full payment    100%              100%              83.9%
% collected full payment if eligible        100%              50%               92.6%
Turning to the other dimensions of productivity, we find that punctuality
varies substantially across treatments. The percentage of participants showing
up on time is much higher in the absence of monitoring. Figure 1 shows
histograms of the deviation from the appointed return time for the separate
treatments.8 While only four participants in the no monitoring treatment came
back late (compared to sharp punctuality), more than 40 percent showed up late
in the two monitoring treatments. In all treatments, a substantial fraction of
the participants came back too early.9
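As an illustration of how the difference in lateness could be checked directly from counts, the sketch below implements a two-sided Fisher exact test for a 2x2 table using only the standard library. It is not the analysis reported in the paper, which relies on Probit regressions for punctuality. The function name is hypothetical, and the late counts (4 of 30 under no monitoring, 13 of 30 under monitoring & mild incentives) are derived from the percentages in Table 1.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, holding the margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Late vs. not late: no monitoring (4 of 30) against monitoring & mild incentives (13 of 30).
p_late = fisher_exact_two_sided(4, 26, 13, 17)
print(f"two-sided p-value for lateness: {p_late:.4f}")

# Theft: 3 of 30 participants in each of the two treatments.
p_theft = fisher_exact_two_sided(3, 27, 3, 27)
print(f"two-sided p-value for theft: {p_theft:.4f}")
```

The exact test is a natural choice here because the cell counts are small; the same function applies to any pairwise comparison of treatments.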
Turning to theft, we find that 10% of the participants (9 out of 91
participants) steal coins. The prevalence of theft is identical across treatments.
8 In the graph, we exclude outliers with a deviation above 50 minutes.
9 It is unclear what causes participants to come back early. It could be that they try really
hard not to be late and take any potentially delaying eventualities (that do not occur) into
account. However, it could also be plain unpunctuality. Also, the consequences of coming
back early are different to those of coming back late. By waiting, early participants can still
be on time. This is clearly not the case for late participants.
Most delayed participants returned the coins within the time frame. Only one participant
(in the monitoring & harsh incentives treatment) returned the coins after 6:00 p.m. For early
participants, we find that 15 participants (3 in the no monitoring and 6 in each monitoring
treatment) returned the work material before 3:30 p.m.
Figure 1: Deviation from the appointed return time (histograms of the deviation in minutes;
panels: no monitoring, monitoring & mild incentives, monitoring & harsh incentives, overall)
Overall, it seems that theft in our experiment is motivated by the collectors’
value of coins, rather than the nominal value of circulating coins. Participants
especially steal coins that at the time of the experiment were rarely found in
Germany, such as coins from the Vatican, Slovenia, or Slovakia. These are coins
that have a higher collectors’ value than their actual nominal value. For
example, in three cases a 50 cent coin from the Vatican was stolen. On the German
eBay platform this coin was sold for €3 (plus shipping) at the time of the
experiment. In two cases (that occurred in different treatments) participants replaced
coins from the Vatican with other coins that had the same nominal value. We
categorize these acts as theft as the participants did not inform us that they
replaced the coins.
Our results allow us to observe multiple dimensions of counterproductive
behavior. We find that counterproductive behavior in the different dimensions is
not correlated, i.e., participants who behave counterproductively in one
dimension are neither more nor less likely to behave counterproductively in another
dimension than other participants. Comparing individuals who steal to those
who do not steal, we do not find a significant difference in tardiness (U-test,
p>0.10, two-tailed) or the number of mistakes (U-test, p>0.10, two-tailed).
Further, the number of mistakes is not correlated with the delay in minutes
(Spearman Correlation, p>0.10, two-tailed).
3.2 Regression analysis
We now present a regression analysis of the number of mistakes and tardiness
(we do not analyze theft since there is no variation across treatments), which
allows us to control for some observable characteristics of the workers. Starting
with work quality, Col. (1) shows the results of a Poisson regression.10 We find
that there are 40% fewer mistakes under the monitoring & harsh incentives
treatment than under no monitoring. On the other hand, we observe no significant
differences between monitoring & mild incentives and no monitoring. It seems
that monitoring alone does not have an effect on work quality. Work quality is
only improved if monitoring is associated with harsh incentives.
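For readers who want to translate the Poisson estimate into a proportional change, the sketch below applies the standard transformation exp(b) - 1 to a coefficient b on a treatment dummy. It assumes that Col. (1) of Table 2 reports raw Poisson coefficients (the table note mentions marginal effects only for Col. (2-10)); the function names are hypothetical. Reading the coefficient of -.407 directly as a percentage is a log-point approximation, and the exact transformation gives a somewhat smaller reduction, close to the difference in raw means from Table 1.

```python
from math import exp


def poisson_pct_change(coef):
    """Percent change in the expected count implied by a Poisson coefficient on a dummy."""
    return (exp(coef) - 1.0) * 100.0


def raw_pct_change(treated_mean, control_mean):
    """Percent change between two raw means, for comparison with the model-based figure."""
    return (treated_mean / control_mean - 1.0) * 100.0


# Coefficient on monitoring & harsh incentives, Col. (1) of Table 2.
print(f"model-based change: {poisson_pct_change(-0.407):.1f}%")

# Average total mistakes from Table 1: 6.90 (harsh incentives) vs. 10.23 (no monitoring).
print(f"raw-mean change:    {raw_pct_change(6.90, 10.23):.1f}%")
```

Both figures point to a reduction of about one third in the number of mistakes, which is consistent with the qualitative conclusion in the text.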
Turning to punctuality, we first run a regression (Col. (2)) on whether the
participant showed up on time (within 5 minutes of the appointed time). We
find that participants are significantly less likely to show up on time as soon
as monitoring is introduced. Participants are 22 and 20 percentage points less
likely to show up on time in the monitoring & mild incentives and monitoring &
harsh incentives treatments, respectively. One question is whether participants
show up late because they put more effort into the identification task. We asked
participants how much time they spent on the task and the average reported
working time was 112 minutes for the no monitoring treatment, 113 minutes for
the monitoring & mild incentives treatment, and 124 minutes for the monitoring
& harsh incentives treatment, with none of these differences being statistically
significant (U-test, p>0.10, two-tailed). Since the average time reported is far
below 24 hours, it is unlikely that participants were under time pressure. In
Col. (3) we nevertheless control for the reported working time and the quality of
work to check whether they explain the differences in punctuality. In Col. (4)
we additionally control for the day of the week on which participants had to
return the work material, for the time coins were collected, and for the
appointed return time. The results remain unchanged when controlling for these
additional variables.
10 The distribution of the number of mistakes is not normal. There is a substantial fraction
of zeros and small positive values. In such cases, count data models are more appropriate.
This is why we use a Poisson regression.
The question is whether this decrease in punctuality is driven by the fact
that more participants come early or whether it is driven by more participants
coming late. Col. (5-10) look at the probability of returning the work material
early or late (compared to sharp punctuality). We only find significant
differences in the probability of being late. Participants are 35 and 36 percentage
points more likely to be late under monitoring & mild incentives and monitoring
& harsh incentives, respectively (Col. (8)). The effects of monitoring remain if
we control for the total number of mistakes and the reported work time (Col. (6)
and (9)), which indicates that there is no relationship between effort in the
identification task and tardiness. We also control for the day of the week, the
actual collection time, and the appointed return time (Col. (7) and (10)). Again,
we find that participants are significantly more likely to be late in the two
monitoring treatments compared to the no monitoring treatment.11 It seems that
introducing monitoring per se results in a negative spillover effect on
punctuality and that these spillovers are unaffected by the level of incentives
associated with monitoring.
11 Interestingly, we also find significant effects of the day of the week and the appointed return
time on the probability of being late. Participants who return the work material on a Tuesday
are 23 percentage points more likely to be late compared to participants who return the
material on a Thursday. Further, the probability of being late decreases the later the
appointed return time.
Table 2 Regression analysis

                               (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)        (10)
monitoring & mild incentives   .003       -.221      -.229      -.243      -.137      -.136      -.120      .348       .356       .372
                               (.082)     (.117)*    (.117)*    (.118)*    (.119)     (.119)     (.123)     (.132)**   (.132)***  (.143)**
monitoring & harsh incentives  -.407      -.205      -.240      -.250      -.105      -.109      -.135      .363       .389       .491
                               (.089)***  (.117)*    (.117)*    (.118)**   (.120)     (.122)     (.122)     (.129)***  (.131)***  (.136)***
female                         -.298      -.034      -.047      -.031      .070       .066       .064       -.000      .015       .027
                               (.073)***  (.107)     (.107)     (.110)     (.105)     (.106)     (.108)     (.103)     (.104)     (.109)
total mistakes                 -          -          -.007      -.007      -          -.001      -.001      -          .005       .005
                                                     (.004)     (.005)                (.004)     (.004)                (.004)     (.004)
reported work time             -          -          .001       .001       -          .000       .000       -          .000       .001
                                                     (.001)     (.001)                (.001)     (.001)                (.001)     (.001)
Tuesday                        -          -          -          -.001      -          -          -.166      -          -          .231
                                                                (.112)                           (.108)                           (.111)**
collection time                -          -          -          -.074      -          -          -.022      -          -          .084
                                                                (.072)                           (.066)                           (.066)
app. return time               -          -          -          -.024      -          -          .0589      -          -          -.080
                                                                (.034)                           (.039)                           (.037)**
constant                       2.435      -          -          -          -          -          -          -          -          -
(pseudo) R2                    .027       .034       .056       .070       .014       .016       .050       .081       .098       .182
Obs.                           91         91         91         91         91         91         91         91         91         91

* significance at p<0.10, ** significance at p<0.05, *** significance at p<0.001. Marginal effects are reported for Probit estimates in Col. (2-10).
Dependent variables: Col. (1): number of mistakes in the identification task; Col. (2-4): dummy indicating whether the participant showed up on time
(within 5 minutes of the appointed time); Col. (5-7): dummy indicating whether the participant showed up early (compared to sharp punctuality);
Col. (8-10): dummy indicating whether the participant showed up late (compared to sharp punctuality).
Our experimental design varies the incentives by playing on two variables at
the same time: the threshold and the size of the reward for meeting the
threshold. Since we see a substantial increase in the productivity in the identification
task with harsher incentives, the question is whether this increase is driven by
the higher penalty, the more difficult threshold, or both. To see how these two
variables affect work quality independently of each other and in combination,
we conducted additional treatments in a laboratory setting where we varied the
threshold and the penalty in a 2x2 design. We find that both matter: a higher
penalty increases productivity and a more difficult threshold further reinforces
the productivity increase when the penalty is high. Harsh incentives (difficult
threshold, large penalty) appear to be the most effective way of triggering effort,
while a difficult threshold with a small penalty seems to be least effective. In
the latter case (difficult threshold, small penalty), incentives have an adverse
effect as the number of mistakes is substantially higher than in the absence of
incentives (no threshold, no penalty). We present these results in the Appendix.
We find that monitoring has a negative effect on punctuality. Independent
of the level of incentives associated with monitoring, punctuality significantly
decreases as soon as monitoring is introduced. What drives this crowding out
effect? In the following we will summarize some existing theories on crowding
out effects and will discuss whether they can explain the observed behavior in
One mechanism that has been proposed to explain crowding out effects is
through information. Bénabou and Tirole (2003) argue that monitoring could
negatively affect workers’ perception of a task. Workers who are monitored infer
that the task is difficult or unpleasant and as a consequence put less effort into
the monitored task (Bénabou and Tirole 2003).
Sliwka (2007) proposes that monitoring could reveal information about peers’
behavior. In his model, monitoring work quality signals that the principal
expects a large fraction of workers to work sloppily. Workers who aim to
conform to their peers’ behavior respond to this signal and choose to behave
sloppily as well. It is important to note that in our task the signal is only
informative for peers’ behavior in the monitored productivity dimension. We
showed in the results section that individuals who work sloppily are neither
more nor less likely to steal or to be late. Thus, a signal on peers’ work quality
is not informative on their behavior in other productivity dimensions of our
experiment. Both the model by Bénabou and Tirole (2003) and the model by
Sliwka (2007) only predict crowding out effects on the monitored productivity
dimension and cannot explain our observation that crowding out effects spill
over to other productivity dimensions.
Another mechanism driving crowding out effects could be reciprocity (Rabin
1993; Frey 1993). There are multiple ways by which monitoring negatively
affects workers. For a given level of effort, monitoring effectively reduces the
expected payment for a worker because it is associated with a fine. Additionally,
workers incur inconveniences due to the process of monitoring. Monitoring
may further reduce workers’ utility due to a reduction in autonomy. Reciprocal
workers may want to reduce the principal’s payoff as a consequence of the
reduction in their own utility (Rabin 1993; Dufwenberg and Kirchsteiger 2004).
It could also be that workers reciprocate distrust. Monitoring and incentives
(independent of the level) may be perceived as a signal of distrust, and workers
may reciprocate distrust by being less trustworthy, i.e., by caring less about
the payoff of the principal (Frey 1993).
In a multi-dimensional context, workers should always choose the cheapest
way of reciprocating. In our design, there are three ways in which workers can
negatively reciprocate: (1) They can put in less effort, (2) they can steal coins, and
(3) they can be late in returning the work material.12 The first way is costly
to the workers because it reduces their expected payment. The other two do
not entail monetary costs for the worker (theft is even associated with monetary
gains) but are associated with costs of breaking social norms. The social and the
legal norm for theft is stronger than that for punctuality (e.g., Robinson and
Bennett 1995). It seems reasonable to assume that tardiness is the cheapest
way of reciprocating. Thus, our finding that punctuality decreases as soon as
monitoring is implemented is in line with a reciprocity interpretation. It seems
that workers want to retaliate for being monitored by being unpunctual.13
12 All experiments were run by the researchers involved in this project. Since monitoring is
not an essential part of a usual work-relation, it is clear that the monitoring choice was made
by the experimenter and that tardiness would affect the experimenter.
With respect to the direct effect of monitoring, we find that monitoring
improves work behavior only if it is associated with harsh incentives. If the
incentives associated with monitoring are mild, monitoring workers does not have
any effect on the monitored productivity dimension. If the incentives are harsh,
the number of mistakes falls significantly. Thus, the improvement in work
quality in the field experiment is due to incentives rather than monitoring. In a
laboratory experiment, we disentangle the effect of our two incentive
components (threshold and penalty). We find that a large penalty always results in
a lower number of mistakes compared to a small penalty. With respect to the
threshold, we find that a difficult threshold only improves work behavior when
it is associated with a large penalty. The combination of a difficult threshold
and a small penalty has an adverse effect on work behavior as the number of
mistakes made increases substantially compared to a situation without
monitoring and incentives. Our findings are in line with the existing literature on the
(adverse) effects of incentives on performance (Gneezy and Rustichini 2000b;
Gneezy, Meier, and Rey-Biel 2011) and contribute to this literature in showing
that the combination of threshold and monetary incentives matters.
This paper provides field evidence on the effect of monitoring and incentives in a
context where productivity is multi-dimensional and only one of the dimensions
(work quality) is monitored. We observe negative spillovers of monitoring on
unmonitored productivity dimensions. These spillover effects arise independent
of the level of incentives. Thus, they appear to be driven by the mere presence
of monitoring. These observed crowding-out effects are in line with a model
of reciprocal behavior. Workers choose to punish the principal for monitoring
them, but they choose to do this through dimensions that have low costs for
them.
13 The negative effect of monitoring on workers in our experiment involves multiple dimensions,
e.g., reduced expected payment, inconveniences associated with the procedure, reduced
autonomy, and distrust. More research is needed to be able to disentangle the effects of the
separate dimensions of monitoring on work behavior.
We find that monitoring improves productivity in the monitored dimension
only if it is associated with harsh incentives. Introducing monitoring and mild
incentives has no effect at all on work quality. Thus, monitoring associated
with mild incentives is inefficient: there is no significant improvement in work
quality, and tardiness increases significantly. Monitoring with harsh incentives
is more effective. The number of mistakes falls substantially, but at the same
time the negative spillover effects are as large as in the monitoring treatment
with mild incentives.
Based on these results, we conclude that introducing a monitoring technology
only pays off if (1) the incentives associated with monitoring are sufficiently
harsh, (2) either the dimensions that cannot be monitored entail high moral
costs or the relative gains in productivity in the monitored dimension more
than compensate for the losses in other dimensions, and (3) monitoring costs
for the employer are sufficiently low.
These findings relate more broadly to the literature on adverse effects of
incentives (see Gneezy, Meier, and Rey-Biel 2011 for a recent review) and the
adverse effects of control (Falk and Kosfeld 2006) and monitoring (Frey 1993).
In line with this literature, we find that monitoring and mild incentives are less
effective than no monitoring at all.
Appendix A Laboratory Experiment: Threshold
We conducted five additional treatments in the laboratory to find out how the
threshold and the penalty affect effort in the identification task. In the laboratory
experiments, we computerized the identification task and asked students
to identify coins on a screen. They had to identify 204 coins that corresponded
to the coins from one of the boxes in the field experiment. Since the duration
of the task was shorter (50 minutes on average), we adjusted incentives to make
them comparable to the field experiment and to be in accordance with expected
earnings in a typical laboratory experiment.
We introduced a treatment without incentives, where participants were paid
a €10 flat fee. Additionally, we ran four treatments with incentives, varying
the threshold and the penalty in a 2x2 design. We offered a €10 payment
to those who met the performance requirements (fewer than 2 or 10 mistakes),
while those who failed would receive either €9.50 (small penalty of €0.50) or
€2.50 (large penalty of €7.50). The five treatments are summarized in Table A1.
Note that T1 corresponds to the "no monitoring" treatment, T2 corresponds to the
"monitoring & mild incentives" treatment, and T5 corresponds to the "monitoring
& harsh incentives" treatment in the field experiment.
Table A1: Experimental Design and Number of Participants
                         no threshold    easy threshold    difficult threshold
no penalty               T1, N = 30
small penalty (€0.50)                    T2, N = 32        T3, N = 32
large penalty (€7.50)                    T4, N = 31        T5, N = 32
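The payment rule behind the five treatments can be written out explicitly. The following Python sketch is an illustration only, not the software used in the laboratory sessions. The names BASE_PAY, TREATMENTS, and payment are hypothetical, and the mapping of the easy threshold to "fewer than 10 mistakes" and of the difficult threshold to "fewer than 2 mistakes" is assumed from the description of the performance requirements.

```python
# Illustrative sketch of the laboratory payment rule (not the original software).
# Assumption: easy threshold = fewer than 10 mistakes, difficult = fewer than 2.
BASE_PAY = 10.0  # euros paid when the performance requirement is met

# treatment -> (threshold, penalty in euros); the requirement is met when the
# number of mistakes is strictly below the threshold.
TREATMENTS = {
    "T1": (None, 0.0),  # no threshold, no penalty (flat fee)
    "T2": (10, 0.50),   # easy threshold, small penalty (payment falls to 9.50)
    "T3": (2, 0.50),    # difficult threshold, small penalty
    "T4": (10, 7.50),   # easy threshold, large penalty (payment falls to 2.50)
    "T5": (2, 7.50),    # difficult threshold, large penalty
}


def payment(treatment: str, mistakes: int) -> float:
    """Return the payment in euros for a treatment and a number of mistakes."""
    threshold, penalty = TREATMENTS[treatment]
    # Full pay if there is no threshold or the mistakes stay below it.
    if threshold is None or mistakes < threshold:
        return BASE_PAY
    return BASE_PAY - penalty
```

Under these assumptions, a participant in T5 with two mistakes would receive payment("T5", 2), i.e. €2.50, whereas the same performance in T2 would still earn the full €10.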
We ran sessions for each treatment with a between-subjects design. We had
between 30 and 32 participants per treatment. Sessions were run in the Cologne
Laboratory for Economic Research and subjects were recruited via ORSEE (Greiner 2004).
Table A2 summarizes our results from this laboratory study. We replicate
what we find in the field experiment: mild incentives (T2) do not significantly
increase effort relative to no incentives (T1) (U-test, p=0.17, two-tailed). However,
harsh incentives (T5) lead to significantly fewer mistakes than mild incentives
(U-test, p<0.05, two-tailed) and than no incentives at all (U-test, p<0.01,
two-tailed).
Do these effects come from the change in the threshold or the change in
the penalty? We see that increasing the penalty always decreases the number
of mistakes, irrespective of the threshold (U-test, p<0.10, two-tailed). Making
the threshold more difficult, on the other hand, leads to a substantial increase in
the number of mistakes made when the penalty is small (U-test, p<0.05, two-tailed).
When the penalty is large (€7.50), a difficult threshold increases the
level of effort compared to an easy threshold, but only slightly (U-test, p<0.10,
two-tailed).
These results show that harsh incentives increase productivity through both
channels: a higher penalty increases productivity, and a more difficult threshold
further reinforces the productivity increase when the penalty is high. Harsh
incentives (difficult threshold, large penalty) appear to be the most effective
way of triggering effort, while a difficult threshold with a small penalty seems to
be the least effective. In the latter case, it seems that many participants do not put
much effort at all into the task (41% made more than 10 mistakes, compared
to 0% in T5 (harsh incentives), 6% in T2 (mild incentives), and 3% in T1 (no
incentives) and T4 (large penalty and easy threshold)).
Table A2: Average number of mistakes
(standard deviations in brackets)
                         no threshold    easy threshold    difficult threshold
no penalty               3.7 (3.2)
small penalty (€0.50)                    4.6 (9.8)         54.5 (77.8)
large penalty (€7.50)                    1.9 (2.8)         0.9 (1.4)
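The treatment comparisons reported in this appendix are two-tailed U-tests on the number of mistakes. As a minimal sketch of how such a comparison can be computed, and taking "U-test" to mean the Mann-Whitney U-test, the Python function below returns the U statistic together with a p-value from the tie-corrected normal approximation without continuity correction. The function name mann_whitney_u is hypothetical, and the normal approximation is an assumption: the text does not state whether exact or approximate p-values were used.

```python
import math
from collections import Counter
from itertools import product


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-tailed p-value via normal approximation)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")
    # U counts the pairs in which the value from sample_a is larger;
    # tied pairs count one half.
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a, b in product(sample_a, sample_b))
    n = n_a + n_b
    # Tie correction: every group of t equal values reduces the variance of U.
    tie_term = sum(t ** 3 - t
                   for t in Counter(list(sample_a) + list(sample_b)).values())
    variance = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        # All observations are identical, so the samples cannot be told apart.
        return u_a, 1.0
    z = (u_a - n_a * n_b / 2.0) / math.sqrt(variance)
    # Two-tailed p-value of a standard normal: P(|Z| >= |z|).
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_two_tailed
```

Called with two lists of individual mistake counts, one per treatment, the function returns the U statistic for the first list and the two-tailed p-value. With samples as small as those used here, an exact test may give slightly different p-values.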
The authors thank Uri Gneezy, Bernd Irlenbusch, Karim Sadrieh, and three
anonymous referees for valuable suggestions and comments that led to substantial
improvements. We also benefited from comments from participants at the
European Workshop on Experimental and Behavioral Economics in Frankfurt
2013, the Royal Economic Society 2013 Conference, the 2013 Florence Workshop
on Behavioural and Experimental Economics, and seminars in Cologne
and Trier. We thank Claudia Gorylla, Markus Hartmann, and Linh Nguyen
for help in conducting the experiments. Financial support by the Institute for
Fraud Prevention and the Deutsche Forschungsgemeinschaft (DFG FOR 1371)
is gratefully acknowledged.
Association of Certified Fraud Examiners. 2012. 2012 Report
to the Nations on Occupational Fraud and Abuse. Available at
report-to-nations.pdf, last access 25.02.2014.
Basu, K., J. W. Weibull. 2003. Punctuality: A Cultural Trait as Equilibrium.
In Economics for an Imperfect World: Essays in Honor of Joseph E. Stiglitz,
ed. R. Arnott, B. Greenwald, R. Kanbur, B. Nalebuff, 163–182. London: The
Belot, M., M. Schröder. 2013. Sloppy Work, Lies and Theft: A Novel Experimental
Design to Study Counterproductive Behavior. Journal of Economic Behavior
and Organization 93 233–238.
Bénabou, R., J. Tirole. 2003. Intrinsic and Extrinsic Motivation. Review of
Economic Studies 70 489–520.
Boly, A. 2011. On the Incentive Effects of Monitoring: Evidence from the Lab
and the Field. Experimental Economics 14(2) 241–253.
Dickinson, D., M.-C. Villeval. 2008. Does Monitoring Decrease Work Effort?
The Complementarity Between Agency and Crowding-Out Theories. Games and
Economic Behavior 63(1) 56–76.
Dufwenberg, M., G. Kirchsteiger. 2004. A Theory of Sequential Reciprocity.
Games and Economic Behavior 47 268–298.
Falk, A., M. Kosfeld. 2006. The Hidden Costs of Control. American Economic
Review 96(5) 1611–1630.
Fisman, R., E. Miguel. 2007. Corruption, Norms, and Legal Enforcement: Ev-
idence from Diplomatic Parking Tickets. Journal of Political Economy 115(6)
Frey, B. S. 1993. Does Monitoring Increase Work Effort? The Rivalry with Trust
and Loyalty. Economic Inquiry 31(4) 663–670.
Frey, B. S., R. Jegen. 2001. Motivational Interactions: Effects on Behavior.
Annales of Economics and Statistics 63/64 131–153.
Gneezy, U., S. Meier, P. Rey-Biel. 2011. When and Why Incentives (Don’t)
Work to Modify Behavior. Journal of Economic Perspectives 25(4) 191–210.
Gneezy, U., A. Rustichini. 2000a. A Fine is a Price. Journal of Legal Studies
Gneezy, U., A. Rustichini. 2000b. Pay Enough or Don’t Pay at All. Quarterly
Journal of Economics 115(3) 791–810.
Greiner, B. 2004. An Online Recruitment System for Economic Experiments. In
Forschung und wissenschaftliches Rechnen 2003, ed. K. Kremer, V. Macho, 73–93.
GWDG Bericht 63, Göttingen.
Gubler, T., I. Larkin, L. Pierce. 2013. The Dirty Laundry of Employee Award
Programs: Evidence from the Field. Harvard Business School Working Paper
Krupka, E. L., R. A. Weber. 2013. Identifying Social Norms Using Coordination
Games: Why does Dictator Game Sharing Vary? Journal of the European
Economic Association 11(3) 495–524.
Kwintessential. Doing Business in Germany. Available at
last access 25.02.2014.
Nagin, D. S., J. B. Rebitzer, S. Sanders, L. J. Taylor. 2002. Monitoring, Motivation,
and Management: The Determinants of Opportunistic Behavior in a
Field Experiment. American Economic Review 92(2) 850–873.
Rabin, M. 1993. Incorporating Fairness into Game Theory and Economics.
American Economic Review 83(5) 1281–302.
Robinson, S. L., R. J. Bennett. 1995. A Typology of Deviant Workplace Be-
haviors: A Multidimensional Scaling Study. Academy of Management Journal
Sliwka, D. 2007. Trust as a Signal of a Social Norm and the Hidden Costs of
Incentive Schemes. American Economic Review 97(3) 999–1012.
The Local: Germany’s news in English, Ten tips for German business etiquette.
Available at http://www.thelocal.de/galleries/news/1773, last access
University of Frankfurt (International Office), 2013. Guide to German
culture, customs and etiquette. Available at http://www2.uni-
02_12_13.pdf, last access 25.02.2014.