Using tri-reference point theory to evaluate risk attitude and the effects of financial incentives in a gamified crowdsourcing task

ORIGINAL PAPER
Christopher Harris · Chen Wu
Published online: 21 February 2014
© Springer-Verlag Berlin Heidelberg 2014
Abstract Crowdsourcing has rapidly developed as a mechanism to accomplish
tasks that are easy for humans to accomplish but are challenging for machines.
However, unlike machines, humans need to be cajoled to perform tasks, usually
through some type of incentive. Since participants from the crowd are typically
anonymous and have no expectation of an ongoing work relationship with a task
requester, the types of incentives offered to workers are usually short-term monetary
bonuses, which have had an inconclusive impact on crowdsourcing worker quality.
In this paper, we explore the notion that the risk attitude of crowdsourcing workers
may play an important role in the effectiveness of incentives on task accuracy.
Traditional utility theories, such as prospect theory, depend on decisions made
relative to a singular reference point, whereas the tri-reference point (TRP) theory
(Wang and Johnson, J Exp Psychol Gen 141:743–756, 2012) holds that three reference
points impact decision making. Using the TRP theory as a guide, we develop
a game that provides workers with three reference points and subsequently explores
the role of multiple reference points on worker risk aversion and task accuracy.
Keywords Utility theory · Risk aversion · Gamification · Tri-reference point theory · Crowdsourcing · Financial incentives
JEL Classification D81 · D82
C. Harris (✉)
Department of Computer Science, State University of New York (SUNY) Oswego, Oswego,
NY 13126, USA
e-mail: Christopher.Harris@oswego.edu
C. Wu
School of Business, Black Hills State University, Spearfish, SD 57799, USA
e-mail: Chen.Wu@bhsu.edu
J Bus Econ (2014) 84:281–302
DOI 10.1007/s11573-014-0718-4
1 Introduction
Over the past decade, a number of crowdsourcing platforms have been developed to
serve as a marketplace for microtasks—quick, simple, and discretely-defined tasks
that are straightforward for humans to accomplish yet remain challenging for
machines. This includes tasks such as annotating images, performing relevance
assessments, or validating common sense facts. These online labor markets, such as
Amazon Mechanical Turk (MTurk),¹ have proliferated, matching workers with
requesters for tens of thousands of microtasks each day, lowering per-task costs and
reducing task completion times. However, findings on task quality have been highly
variable: although considerable research has demonstrated non-expert quality that
approaches expert quality in some tasks (Callison-Burch 2009; Harris and Xu 2011;
Snow et al. 2008), many other studies have demonstrated poor non-expert worker
quality (Mason and Watts 2010; Sheng et al. 2008).
Economic theory has long established that rational workers will choose to
improve their performance in response to incentives (Laffont and Martimort 2002;
Prendergast 1999). Motivational techniques for workers have been discussed since
at least the dawn of the Industrial Age, but these techniques usually depend on an
ongoing, long-term relationship between workers and employers. In online labor
markets, however, the relationship between crowdworker (as workers in online
labor markets are known) and requester (as employers in online labor markets are
known) rarely spans more than a single set of tasks. Likewise, due to mutual
anonymity, there is only a slight opportunity for an ongoing work relationship to
develop.
To date, the evaluation of incentive schemes in online labor markets has received
scant scholarly attention. Although the literature offers a fair amount of theoretical
guidance, remarkably few empirical studies have been conducted to evaluate
incentive schemes on short-term work arrangements. In this paper, we offer the
following contributions. First, using a game, we ask workers from the crowd to
respond to a series of questions. We offer these crowdworkers a monetary incentive
to provide accurate answers, and evaluate the effectiveness of this incentive. Next,
borrowing from game shows and the theory of economic behavior, we introduce and
evaluate incentive schemes with a ‘walk-away bonus’ condition. This reserve
amount offer, which borrows from utility theory, is modified to encourage
lower-performing workers to withdraw from the task while encouraging the
higher-performing workers to complete the entire task. We examine if this reserve amount
offer provides the desired effect on task quality. Last, we vary some of the
incentives to evaluate crowdworker decisions against the three reference points that
comprise the tri-reference point (TRP) theory of behavioral economics: a minimum
requirement (MR), a status quo (SQ) and a goal (G). Most other studies examine
these points relative to games of chance, whereas we evaluate these points against
worker ability. We use our game to establish these points for our workers, observe
their risk attitude and performance, and discuss how our findings fit with these
theories.
¹ http://www.mturk.com.
This paper is organized as follows. In the next section, we discuss the related
work, background and motivation for our study. We then develop our research
questions and explain our experimental design. Last, we present our results, discuss
our findings and conclude with an indication of our planned future work.
2 Background and motivation
We discuss some of the elements involved in our work and illustrate why they are
important to our study.
2.1 Crowdsourcing platforms
Crowdsourcing platforms, such as MTurk, are online labor marketplaces in which
requesters (individuals or corporations) list tasks called “human intelligence tasks”
along with the compensation to be provided. The compensation usually ranges
from fractions of a penny to ten US cents per task, which may include annotating
an image or choosing the best answer from several answers. Crowdworkers obtain a
task description and then may elect to perform a task. Upon completion of that task,
workers are compensated through the crowdsourcing platform by the task requester.
Requesters also may provide additional incentives to workers in the form of a bonus
payment.
2.2 Worker incentives
Research shows that under certain conditions the provision of extrinsic (e.g.,
financial) incentives can undermine “intrinsic motivation” (e.g., enjoyment, desire
to help others), possibly leading to poor outcomes (Gneezy and Rustichini 2000;
Heyman and Ariely 2004). Likewise, some studies have found that students who are
already intrinsically motivated actually become discouraged when extrinsic
incentives are introduced, even during games (Kohn 1999; Lepper et al. 1973).
Other studies contradict these findings and illustrate the positive effects of tangible
extrinsic rewards in many settings (Cameron and Pierce 1994; Eisenberger
and Cameron 1996, 1998). An excellent overview of the literature on the effects of
extrinsic incentives is provided by Deci et al. (1999).
In crowdsourcing, Mason and Watts (2010) examined financial incentives on two
tasks offered through MTurk. They found that increased financial incentives may
increase the quantity, but not necessarily the quality, of work performed. The
authors indicate this could be the result of an “anchoring” effect, where
workers who were paid more believed the value of their work to be greater, and
were no more motivated than workers who received smaller financial incentives.
Harris (2011) examined various types of financial incentives on MTurk in a resume
evaluation task. This human resources task offered
three treatments: one where crowdworkers were given positive incentives for
matching resume ratings with an oracle, another where crowdworkers were given
negative incentives and would have compensation removed for providing an answer that
differed from an oracle, and a third offering a combination of both positive and
negative (hybrid) incentives. The hybrid scheme provided the most risk averse
ratings from the crowd, but outperformed the other incentive schemes.
Shaw et al. (2011) examined the role of incentives on non-expert raters using
controlled experiments on MTurk. They evaluated 14 incentive schemes designed to
increase task accuracy; however, only 2 of these incentive structures significantly
improved worker accuracy. We use one scheme from their study design as a
foundation for our own experiment, which we discuss further in Sect. 4.2 of this
paper.
2.3 Risk attitude
Risk attitude plays an essential role in evaluating incentives under uncertainty.
Humans are typically grouped into one of three risk attitude categories: risk-averse
(or risk-avoiding), risk-neutral, or risk-loving (or risk-seeking). Risk attitude is often
evaluated based on the Arrow–Pratt measure of absolute risk aversion (Arrow 1971;
Pratt 1964). This measure associates one of three risk attitudes with the curvature of
an individual’s utility function: risk neutral individuals have linear utility functions,
risk seekers have convex utility functions and risk averse individuals have concave
utility functions, as illustrated in Fig. 1.
From Fig. 1, we can examine the ratio of $\Delta U(W)$, the change in utility (or value)
of the increased reward, to $\Delta W$, the change in certain compensation, for different
attitudes towards risk. This simplified view of the Arrow–Pratt measure states that
for risk seekers, $\Delta U(W)/\Delta W > 1$; for risk neutral individuals, $\Delta U(W)/\Delta W = 1$; and
for risk averse individuals, $\Delta U(W)/\Delta W < 1$. The estimate of relative risk aversion
varies for different populations (Choi and Menezes 1992; Hartley et al. 2006), and
we know of no other published studies on the risk attitude of crowdworkers.
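As a rough illustration of the curvature argument above, the following Python sketch estimates the Arrow–Pratt coefficient of absolute risk aversion, A(W) = −U''(W)/U'(W), by finite differences and maps its sign to a risk attitude. The utility functions, the step size, and the tolerance are illustrative assumptions and are not taken from the study.

```python
import math

def arrow_pratt(u, w, h=1e-3):
    """Finite-difference estimate of A(w) = -U''(w) / U'(w)."""
    first = (u(w + h) - u(w - h)) / (2 * h)
    second = (u(w + h) - 2 * u(w) + u(w - h)) / (h * h)
    return -second / first

def risk_attitude(u, w, tol=1e-4):
    """Classify by the sign of the coefficient (tol is an assumed threshold)."""
    a = arrow_pratt(u, w)
    if a > tol:
        return "risk-averse"   # concave utility
    if a < -tol:
        return "risk-seeking"  # convex utility
    return "risk-neutral"      # linear utility

# Hypothetical utility functions, chosen only to show the three shapes
print(risk_attitude(math.sqrt, 1.0))
print(risk_attitude(lambda w: 2 * w, 1.0))
print(risk_attitude(lambda w: w ** 2, 1.0))
```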
From a behavioral perspective, several studies provide insight on people’s view
of risk. Kahneman and Tversky (1979) introduced prospect theory, which describes how
people evaluate risk in decision-making events (Tversky and Kahneman 1981).
According to their paper on prospect theory (1979), whether an outcome is
perceived as a gain or as a loss compared with the reference point has a strong
influence on risk attitudes. In general, people exhibit risk aversion when they hold a
gain, risk-seeking behavior when they face a loss, and a greater aversion to losses
than appreciation of gains by a factor of nearly two-to-one.
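The two-to-one asymmetry can be made concrete with a minimal piecewise-linear value function, sketched here in Python. The loss-aversion factor of 2.0 is an assumption standing in for the "nearly two-to-one" ratio above, and the linear form deliberately omits the curvature that prospect theory also posits.

```python
LOSS_AVERSION = 2.0  # assumed factor; the text says "nearly two-to-one"

def subjective_value(outcome, reference=0.0, loss_aversion=LOSS_AVERSION):
    """Value of an outcome relative to a reference point; losses weigh more than gains."""
    change = outcome - reference
    return change if change >= 0 else loss_aversion * change

gain = subjective_value(0.05)    # a $0.05 gain relative to the reference point
loss = subjective_value(-0.05)   # a $0.05 loss is felt twice as strongly
```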
Other studies attempt to identify situations where risk aversion is likely to occur.
Gal (2006) illustrated that loss aversion phenomena are more likely to result from
inertia than from loss/gain asymmetry, but found that loss aversion may be more
salient when people are in a competition. A recent study by Gil and Prowse (2012)
demonstrated that people are loss averse in a competitive environment that involves
real effort. In another study, Harinck et al. (2007) found that loss aversion does not
exist when the payout magnitudes are relatively small. Holt and Laury (2002)
examined risk aversion and the effects of incentives and found that when the payout
was tangible (i.e., paid in cash), people became more risk averse. In our study, we
evaluate the effects of risk attitude in a competitive environment requiring real
effort, with small payments offered in cash.
Others have investigated risk and loss aversion in game show environments.
Deck et al. (2008) and Post et al. (2008) examined risk attitude on “Deal or No
Deal”, while Hartley et al. (2006) and Lam et al. (2002) examined risk attitude on
the popular game show “Who Wants to Be a Millionaire”. Each discovered that risk
aversion is affected by the scale of the financial incentive at risk as well as by the
probability of success. Game show design typically involves a rapid increase in task
difficulty as the game progresses, making the increased risk apparent to the player
through various clues (such as a higher walk-away amount being offered to the
player).
Risk aversion studies that offer insight on the decisions made by game show
participants are not easily duplicated through a traditional crowdsourcing task
design. This motivates our use of a game to examine people’s perspectives on risk.
First, crowdsourcing platforms do not offer a rapidly-increasing level of difficulty
and a corresponding increase in reserve payments. With the growth of games with a
purpose (GWAP) (Von Ahn 2006) and serious games (Abt 1987), game-like designs
that adjust incentives for each player are likely to occur. Also, game
show winnings are often a large percentage of a contestant’s annual income and thus
provide a large wealth effect, which is a challenge to replicate with the relatively
small bonuses offered on crowdsourcing microtasks. Although not explored in this
study, other types of incentives, such as recognition for superior performance, have
been shown to be as powerful as financial incentives (Locke 1968; Mason and Watts
2010).
2.4 Reserve (walk-away) amounts
Fig. 1 Utility function shapes for risk averse, risk neutral, and risk seeking individuals
Most discussion in the literature about reserve amounts (also called reservation or
walk-away amounts) comes from negotiation theory and the aforementioned studies
on game show risk attitude. Van Poucke and Buelens (2002) explain that a reserve
amount is an indifference point where a negotiator should be indifferent between
accepting the offer or ending the negotiation, i.e., accepting the walk away amount.
It therefore represents the lowest outcome a negotiator is willing to accept. In this
paper, we regard the reserve amount as the compensation offered to a worker if they
choose to quit the task. This represents the point of indifference between continuing
with the game (to seek a greater bonus) and quitting the game. The walk-away
amount is typically a percentage of the cumulative bonus earned by that worker at a
specific point in the game, adjusted for risk. Since a walk-away amount represents
“certain compensation,” accepting this amount when there is a likelihood of higher
(but less certain) compensation from continuing the task represents a form of risk
aversion as long as the amount offered is less than the potential bonus amount.
2.5 Tri-reference points
One basic assumption in utility-based theories such as prospect theory is that there is
a single fixed reference point upon which decisions are made. This point is usually
defined as the decision maker’s current level of wealth, or status quo. Koop and
Johnson (2012) have recently shown that decision makers, in a
single decision setting, seem to be sensitive to multiple reference points as inputs.
While the status quo could be considered the most commonly-applied benchmark in
utility theories, a number of studies have shown that reference points besides the
status quo can also have a significant impact on decision behavior. Several studies
have examined the impact of goals and aspirations as reference points (Chiles and
McMackin 1996; Heath et al. 1999; Lopes and Oden 1999; March and Shapira
1992; Sullivan and Kida 1995) or the impact of survival requirements on behavior
(Lopes and Oden 1999). These minimum requirements typically refer to the
minimum safe amount that a participant is willing to accept when evaluating a
decision. The evaluation of decisions made using these three reference points (status
quo, goals, and minimum requirements) and the actions taken by the decision
maker in response make up the TRP theory.
Wang and Johnson (2012) introduced a TRP theory of risky decision making that
explicitly considers the effects of the three reference points: MR, SQ, and G. This
TRP theory makes several specific assumptions about reference dependence (Wang
and Johnson 2012). First, it assumes that decision makers simultaneously desire to
surpass a G, stay above a MR, and improve from their SQ. These reference points
effectively carve the outcome space into distinct regions of failure (below MR), loss
(at or above MR but below SQ), gain (between SQ and G), and success (at or above
G). The asymmetries in subjective value that prospect theory often attributes to the
SQ (i.e., “losses loom larger than gains”) also appear around these other reference
points.
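The four regions described above can be expressed as a small classifier. This is a minimal Python sketch: the function name is hypothetical, and assigning an outcome exactly equal to SQ to the gain region is an assumption, since the text only says that gain lies between SQ and G.

```python
def trp_region(outcome, mr, sq, g):
    """Return the TRP region for an outcome, given reference points MR < SQ < G."""
    if not (mr < sq < g):
        raise ValueError("expected MR < SQ < G")
    if outcome < mr:
        return "failure"   # below MR
    if outcome < sq:
        return "loss"      # at or above MR but below SQ
    if outcome < g:
        return "gain"      # between SQ and G (SQ itself included by assumption)
    return "success"       # at or above G

# Illustrative reference points only
for amount in (0.10, 0.80, 2.00, 3.00):
    print(amount, trp_region(amount, mr=0.20, sq=1.50, g=3.00))
```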
TRP theory is applicable to incentives offered to the crowd and through games
for several reasons. First, when individuals utilize multiple reference points, unique
patterns of risk-related behavior emerge. For example, a study of investment
managers (Sullivan and Kida 1995) found that contrary to the tenets of prospect
theory, these investment managers were not universally risk averse when their
performance was above their SQ, but if there was little danger in falling below their
SQ, they tended to be risk seeking in order to reach another reference point, G.
Sullivan and Kida hypothesized that when multiple reference points are important
for assessment, each of these reference points will concurrently impact decision-making
behavior. With crowdworkers, however, it remains to be seen how these
points will affect accuracy.
Koop and Johnson (2012) conducted a study which provided participants with
pairwise choices with monetary bonuses specifically designed to examine the notion
that people use multiple reference points. Three reference points were established
simultaneously by providing participants with an initial amount to gamble (SQ),
presenting participants with the possibility of earning bonus entries into a raffle (G),
and retaining the possibility of failing to gain entry to the raffle (MR). They found
four distinct regions: decision makers that were at or above G were risk averse, but
those between SQ and G were risk seekers as they attempted to meet or exceed G.
Decision makers who were in the region between MR and SQ were risk averse, but
those below MR were in ‘survival mode’ and were risk-seeking.
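The pattern Koop and Johnson report alternates from one region to the next. The lookup below is a small Python sketch of that summary, using the region names failure, loss, gain, and success from the TRP description. The helper name is hypothetical.

```python
# Regions ordered from below MR up to at or above G
REGIONS = ["failure", "loss", "gain", "success"]

# Risk attitude reported for each region, as summarized in the paragraph above
ATTITUDE = {
    "failure": "risk-seeking",  # below MR ("survival mode")
    "loss": "risk-averse",      # between MR and SQ
    "gain": "risk-seeking",     # between SQ and G
    "success": "risk-averse",   # at or above G
}

def attitude_changes(before, after):
    """True when moving between two regions is expected to flip the risk attitude."""
    return ATTITUDE[before] != ATTITUDE[after]
```

Under this summary, crossing any single reference point flips the predicted attitude, for example `attitude_changes("loss", "gain")` is true when a player passes SQ.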
To examine if this theory holds with crowdworkers, we consider the effects of the
TRPs in our game. In Fig. 2, we observe that at point O, participants are provided an
amount they begin the task with, SQ, a goal payment, G, and a minimum
requirement they will accept, MR. As a player progresses through the game and
reaches point A, the bonus payment is increased for each correct answer or reduced
for each question missed, and $SQ_{OA} \le G$. A rational player will evaluate his or
her score based on his or her performance on questions between O and A, and
extrapolate the amount of their bonus to $SQ_{OA}$; the player then compensates by
adjusting their risk attitude. If at point A, a walk-away option, $A'$, representing a
fixed percentage of A is offered to the player, he or she must evaluate the likelihood
and payoffs of several bonus amounts: the goal, G, the status quo, SQ, the minimum
requirement amount, MR, and the reserve amount, $A'$. If $SQ_{OA} < MR$, this
rational player is likely to accept the certain payment (the reserve amount, $A'$) as
the best option and quit the game.
If the player chooses to continue, i.e., not accept the walk away amount, and
reaches point B, these decisions are recalibrated based on $SQ_{OB}$, an extrapolation
of the player’s performance between O and B. Decision points (G and $SQ_{OB}$) are
then evaluated to reflect the player’s task performance and once again a decision is
offered to continue or quit and accept the walk away amount, $B'$.
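The evaluation at each decision point can be sketched as follows. This Python example assumes a linear extrapolation of the bonus earned so far. The text says only that the player extrapolates, so the linear rule, the function names, and the example figures are all assumptions.

```python
def extrapolate_bonus(start_bonus, current_bonus, answered, total):
    """Project the end-of-game bonus by scaling the change so far (assumed linear rule)."""
    change_per_question = (current_bonus - start_bonus) / answered
    return start_bonus + change_per_question * total

def walk_away_offer(current_bonus, fraction):
    """Reserve amount offered at a decision point: a fixed fraction of the current bonus."""
    return current_bonus * fraction

def rational_choice(projected_bonus, mr):
    """Accept the certain reserve amount when the projection falls below MR."""
    return "quit" if projected_bonus < mr else "weigh options"

# Hypothetical player halfway through a 30-question game
projected = extrapolate_bonus(1.50, 1.75, answered=15, total=30)
print(projected, rational_choice(projected, mr=0.20))
```

When the projection is at or above MR, the sketch returns "weigh options" rather than "continue", because the text leaves that choice to the player's assessment of G, SQ, MR, and the reserve amount.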
3 Research questions
In this study, our objective is to evaluate financial incentives and how they affect
non-expert worker quality. Specifically, we examine the following research
questions.
1. Will an incentive scheme that provides an additional monetary compensation
for correct answers but reduces compensation for wrong answers increase
crowdworker accuracy?
2. Will an incentive scheme that provides a walk-away amount entice the least
accurate workers to quit the task early, leaving only the most accurate workers
to remain and increase overall accuracy?
3. Do crowdworkers demonstrate the same risk attitudes as those explained by the
TRP theory?
4 Experimental design
We solicited participants through the use of MTurk. We indicated that they would
be compensated $0.25 for completing the task and pre- and post-task surveys, but
might qualify for a bonus of up to $3.00. Those who agreed to participate were
redirected to a unique URL. The base compensation amount of $0.25 represents less
than $0.01 per question—well within the range of compensation typically offered on
MTurk for this level of effort.
4.1 Metrics
Accuracy is the most appropriate quality indicator for this task evaluation; we
therefore measure mean accuracy for each treatment. As with many game show
designs, performing well in our game required knowledge of uncommon facts; in
our case, we had participants match descriptions with titles of English-language
films scheduled to be released at least 3 months after our task was offered (see
Fig. 3 for an example). The questions in this task were designed to be challenging
(i.e., have a mean accuracy rate of <50 %) and therefore we are able to evaluate the
trade-off choices crowdworkers make. We also examine the retention rate, which is
the percentage of participants who answer each question, to evaluate if reserve
bonus treatments are effective in encouraging the low-accuracy workers to
withdraw from the task early, leaving the high-accuracy workers to complete the
entire 30-question task. Last, we evaluate the risk attitude, or ratio of decisions
made that use insurance to those made that do not use insurance.
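The three metrics above can be computed along the following lines. This is a Python sketch under assumed data layouts: a list of correct/incorrect flags, the number of questions each participant answered before stopping, and a list of insured/uninsured flags. None of these structures comes from the study itself.

```python
def mean_accuracy(correct_flags):
    """Share of answered questions that were correct."""
    return sum(correct_flags) / len(correct_flags)

def retention_rate(questions_answered, n_questions=30):
    """For each question position, the fraction of participants who answered it.

    questions_answered holds, per participant, how many questions they
    answered before finishing or withdrawing (assumed to be sequential).
    """
    n = len(questions_answered)
    return [sum(1 for k in questions_answered if k > q) / n
            for q in range(n_questions)]

def insurance_ratio(insured_flags):
    """Ratio of insured decisions to uninsured decisions (None if all were insured)."""
    insured = sum(insured_flags)
    uninsured = len(insured_flags) - insured
    return insured / uninsured if uninsured else None
```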
Fig. 2 Evaluating expected bonus payment reference points SQ, G, and MR in a series of tasks
4.2 Game design
To evaluate the role of incentive structure in task accuracy, we use a Flash-based
game. In relevance assessment tasks, games have been shown to provide quality
inputs with less spam and at lower cost than crowdsourcing (Eickhoff et al. 2012).
GWAP, a rapidly-growing area of research, are often designed for repetitive tasks
such as labeling images (Von Ahn and Dabbish 2004) or evaluating common sense
facts (Von Ahn et al. 2006). The task’s game format allows us to dynamically
evaluate and vary incentives with the crowd that would be challenging to
accomplish through a standard crowdsourcing platform.
Participants were each randomly assigned to either a control group or to one of
three treatment groups. At the completion of the game, all participants were given a
three-question survey on their game experience. We asked if the time given to
answer questions was sufficient, if the task was challenging enough, and for those
players that chose to leave before the game’s completion, we asked them to provide
their reasons for doing so. Players could participate in our study only once and
all IP addresses were logged.
In each round of our game, a worker is presented with an image or short text
snippet, along with four answer choices, only one of which was correct. The player
(worker) is instructed to select the answer that represents the consensus choice—the
one the player believes a majority of other players will select based on the image or
text snippet. The wording resembles that used in the “Bayesian Truth Serum”
incentive described by Shaw et al. (2011). This incentive type was one that
demonstrated a significant increase in worker accuracy over their baseline. In their
analysis, these authors inferred two reasons for this increased accuracy: first,
judgment based on consensus created some confusion among subjects about how
exactly they were being evaluated; second, it created an incentive for subjects to
think carefully about the responses of other subjects. This unexpected combination
of confusion and cognitive demand probably elicited greater engagement with the
question, which encouraged better performance.
Fig. 3 Screenshot from the game (left), with the offer of insurance after the user’s selection (right)
With consensus voting mechanisms, there are a few extra issues to address. First,
it is necessary to avoid the “cold start” problem frequently faced by consensus-based
schemes such as recommender systems. In these schemes, the first participant is
evaluated with very limited information: they are required to provide an answer that
matches a consensus decision, but no votes have been received, rendering a
consensus vote impossible. We addressed this by priming each question: we provided
three initial votes for the answer matching the gold standard. A second issue is the
potential that the first few participants are evaluated against answers that received the
majority vote but do not match the gold standard. With a small number of responses,
a few incorrect answers can produce large swings in the majority vote. By priming the
questions with three votes (rather than one), we limited the variance for early
participants. We conducted a post hoc analysis and compared the consensus decision
with the gold standard for each participant. We found the consensus decisions to be
remarkably accurate over time, even with the first few participants. The consensus
(majority) decision coincided with the ground truth 99.8 % of the time.
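The priming step can be illustrated with a short Python sketch. The class name, the choice to score an answer before counting its vote, and the tie-breaking behaviour (the earliest-listed choice wins) are assumptions. Only the three priming votes for the gold-standard answer come from the description above.

```python
from collections import Counter

PRIMING_VOTES = 3  # initial votes given to the gold-standard answer

class ConsensusQuestion:
    def __init__(self, choices, gold):
        # Start every choice at zero, then prime the gold-standard answer
        self.votes = Counter({choice: 0 for choice in choices})
        self.votes[gold] += PRIMING_VOTES

    def consensus(self):
        """Current majority answer."""
        return self.votes.most_common(1)[0][0]

    def submit(self, choice):
        """Score a player's answer against the current consensus, then count the vote."""
        matched = choice == self.consensus()
        self.votes[choice] += 1
        return matched

question = ConsensusQuestion(["A", "B", "C", "D"], gold="B")
print(question.submit("A"))   # an early wrong answer does not flip the consensus
print(question.consensus())
```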
The task description was worded as follows.
Which title do you believe most other players will choose
based on this description?
Figure 3 displays screenshots taken from the game. Participants were asked to
answer 30 questions, divided into 6 rounds of 5 tasks each. Each participant
evaluated the same 30 questions; however, questions were assigned in random order
to ensure the average task difficulty was constant. Each question had 4 possible
choices, randomly presented to the user to avoid selection bias. For each of the 30
questions, workers had 20 s to choose an answer. Failure to make a selection on
three consecutive questions before time ran out indicated withdrawal from the task.
A time limit is provided to enhance flow, a concept describing the delicate balance
between anxiety due to a task that is too difficult and boredom due to a task that is
too easy (Csikszentmihalyi 1991). In a preliminary study, we found that 95 % of
workers were able to make a selection within 20 s. We believe that the challenge of
making a selection within the prescribed time, coupled with the financial incentive,
provides a sufficient ongoing balance of flow to a majority of participants.²
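The withdrawal rule described above (three consecutive questions without a selection inside the 20 s limit) could be checked as in this Python sketch. The function name and the representation of a missed question as `None` are assumptions.

```python
TIME_LIMIT_S = 20             # seconds allowed per question
MAX_CONSECUTIVE_TIMEOUTS = 3  # missed questions in a row that indicate withdrawal

def withdrawal_index(response_times):
    """Return the index of the question at which withdrawal is inferred, or None.

    response_times holds the seconds taken per question, with None when
    no selection was made before time ran out.
    """
    streak = 0
    for index, seconds in enumerate(response_times):
        if seconds is None or seconds > TIME_LIMIT_S:
            streak += 1
            if streak == MAX_CONSECUTIVE_TIMEOUTS:
                return index
        else:
            streak = 0  # any timely answer resets the count
    return None
```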
As has been illustrated in studies, such as that conducted by Covey et al. (1989),
cheating is an issue for tasks that provide explicit incentives. In our study, we took the
following steps to mitigate the possibility of cheating by participants. First, as mentioned
earlier, we logged all IP addresses. Second, questions were presented to players in random
order. In addition, the four answer choices for each question were randomized as well.
Third, users were given a time limit of 20 s to answer each question. This time limit
reduced the chance they would be able to use external resources such as Wikipedia. Last,
we performed post hoc analysis to detect any possibility of collusion between task
participants, but did not detect any behavior we considered suspicious.
² This was reinforced by the responses we received in a post-task survey, where we asked if participants
felt rushed or bored while participating in the task. The percentages we received from this survey
(n = 240) were 12 and 9 %, respectively, which indicate a reasonable balance between the two. If more
than 20 % of participants had felt either rushed or bored, it would be likely that flow was insufficient and we
would need to adjust the time allowed.
4.3 Incentives
Our initial evaluation was to determine if accuracy increased due to incentives. At
the beginning of the 30-question task, we set a value for MR consistent with a study
conducted by Koop and Johnson (2012). In the Koop and Johnson study,
participants were provided with pairwise choices that offered bonuses and were
specifically designed to examine the notion that people use multiple reference
points. Three reference points were established simultaneously by providing
participants with an initial amount to gamble (SQ), presenting participants with the
possibility of earning bonus entries into a raffle (G), and retaining the possibility of
failing to gain entry to the raffle (MR).
All three treatment groups began the game with a bonus amount of $1.50, with
$0.05 added to the bonus for each correct answer and a deduction of $0.05 for each
incorrect answer. This $1.50 serves as their SQ. Treatment groups were also told
before beginning the game that they could qualify for a potential performance-based
bonus of up to $3.00, which serves as G. These three treatment group players were
also told at the beginning of the game that a “pay-in” of $0.20 was required in order
to play. Therefore, the first $0.20 of any bonus that a player earned would be surrendered
to cover this pay-in amount, serving as MR.
After treatment group players selected their choice from the four answers provided
and before the answer was revealed, they were asked if they wanted to take “insurance”
on their decision. It has long been established that obtaining self-insurance is a risk
averse behavior (Briys and Schlesinger 1990; Ehrlich and Becker 1972; McGuire et al.
1991). Insurance in our game has the effect of halving the loss while simultaneously
halving the gain, i.e., by accepting the offer to take insurance, if they missed the question,
only half the usual $0.05 bonus deduction was made; however, if they were correct, only
half the $0.05 bonus reward was provided. Since taking insurance is considered a risk-averse
decision, we use this as a measurement of player tolerance of risk. A sample of the
summary screen provided to this treatment group follows.
Round 3 Summary
You got 3 of 5 correct in this round and 10 of 15 correct
overall.
The first $0.20 of your bonus will be used to participate in
this game.
The maximum bonus you could earn after 6 rounds is $2.30
($2.50 less the $0.20 pay-in).
Evaluating your current performance, the bonus amount you
could expect to earn at this level of performance is $1.80
($2.00 less the $0.20 pay-in).
<Continue Playing> <Quit>
This first treatment group allows us to examine our first research question on the
effectiveness of financial incentives.
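The bonus arithmetic in this section can be traced with a short Python sketch. The starting bonus, the $0.05 step, the halving effect of insurance, and the $0.20 pay-in come from the description above. The function names and the use of `Decimal` to avoid floating-point rounding are assumptions.

```python
from decimal import Decimal

START_BONUS = Decimal("1.50")  # SQ
STEP = Decimal("0.05")         # added for a correct answer, deducted for an incorrect one
PAY_IN = Decimal("0.20")       # MR
N_QUESTIONS = 30

def apply_answer(bonus, correct, insured=False):
    """Update the bonus after one question; insurance halves both the gain and the loss."""
    delta = STEP / 2 if insured else STEP
    return bonus + delta if correct else bonus - delta

def max_final_bonus(bonus, answered):
    """Bonus if every remaining question is answered correctly without insurance."""
    return bonus + STEP * (N_QUESTIONS - answered)

# A player with 10 of 15 correct and no insurance, as in the sample screen
bonus = START_BONUS
for correct in [True] * 10 + [False] * 5:
    bonus = apply_answer(bonus, correct)

print(bonus, max_final_bonus(bonus, 15), max_final_bonus(bonus, 15) - PAY_IN)
```

For this player the running bonus is $1.75 and the maximum attainable bonus is $2.50, or $2.30 after the pay-in, which agrees with the figures on the sample summary screen. A perfect uninsured game reaches $1.50 + 30 × $0.05 = $3.00, the stated value of G.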
4.4 Walk-away amounts
A second treatment group was identical to the first, but also provided players with a
walk-away option. The amount offered in the walk-away option was a percentage
of the player’s current bonus (33 % was used in this experiment) and this amount
could be redeemed at the end of the first five rounds (i.e., except at the end of the
game) if the player chose to quit the task.
At the end of each five-question round, players were provided a summary screen
that associates the walk-away bonus amount offered to the player with their goal G,
the status quo SQ, and their pay-in amount, MR. A sample of the information
presented to the walk away treatment group follows.
Round 3 Summary
You got 3 of 5 correct in this round and 10 of 15 correct
overall.
The first $0.20 of your bonus will be used to participate in
this game.
The maximum bonus you could earn after 6 rounds is $2.30
($2.50 less the $0.20 pay-in).
Evaluating your current performance, the bonus amount you
could expect to earn at this level of performance is $1.80
($2.00 less the $0.20 pay-in).
You have the option to quit and walk away with a bonus of
$0.13 ($0.33 less the $0.20 pay-in). This is $1.50 less than
you would expect to earn at your current rate and $2.00 less
than your maximum potential bonus at the end of the game.
<Continue Playing> <Quit and take my $0.13>
This summary information presents the TRPs to the player and asks them to
make a decision either to quit the task and accept the walk-away amount (the
difference of the current bonus amount and the pay-in of $0.20, multiplied by 0.33)
or to continue with the game with the opportunity to increase their bonus. We
examine player decisions using this information and compare it with our evaluation
of incentives using this same game as well as to the results from other TRP studies.
This second treatment group allows us to examine our second research question on
the effectiveness of walk-away options on task accuracy.
4.5 Lowering G and raising MR
Since most players will fall between MR and SQ or between SQ and G, we
conducted a separate treatment to increase the number of participants falling above the G and
below the MR thresholds as follows. To increase the number of players that fall below MR, we
increased the pay-in from $0.20 to $0.50. To increase the number of players that meet
or exceed G, we indicated to participants that all participants earning a bonus of at least
$2.50 would earn the full $3.00 bonus, and the top two most accurate
participants would earn a bonus of $4.00.
This third treatment group, along with the results from the first two treatment
groups, will allow us to examine if crowdworkers demonstrate the same risk
attitudes as those mentioned in the Koop and Johnson study.
A summary of treatments is provided in Table 1.
5 Results and discussion
We evaluate worker task accuracy, the retention rate, the amount of the bonus
earned, and the risk aversity. Risk aversity measures the percentage of times
insurance is used; e.g., if a player used insurance for 50 % of the questions answered,
that player would have a risk aversity of 0.50. Bonus earned is before the pay-in
amount is deducted. Walk-away amounts are included if the user withdrew from the
game prior to completing all six rounds. Table 2 reports the results for each
treatment plus the control.
Table 3 reports the Cohen’s d and effect size, r, for each treatment. Effect sizes
reflect the average percentile standing of the average treated (or experimental)
participant relative to the average control participant. An effect size of 0.0 indicates
that the mean of the treated group is at the 50th percentile of the control group; an
effect size of 0.8 indicates that the mean of the treated group is at the 79th percentile
of the control group. Negative values indicate the association is reversed.
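As a worked illustration of how these effect sizes relate to the summary statistics, the sketch below computes Cohen's d and r from group means and standard deviations. The exact formulas used in the paper are not stated; the ones here (root-mean-square pooling of the two standard deviations for equal group sizes, control mean minus treatment mean, and r = d / sqrt(d^2 + 4)) are assumptions, chosen because they appear to agree with the task accuracy entry for Treatment 1 in Table 3 to three decimals. The function names are hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two equal-sized groups, pooling the SDs by root mean square."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


def d_to_r(d):
    """Convert Cohen's d to the effect-size correlation r (equal group sizes)."""
    return d / math.sqrt(d ** 2 + 4)


# Task accuracy from Table 2: control (mean 0.3897, SD 0.0473)
# versus Treatment 1 (mean 0.4320, SD 0.0219)
d = cohens_d(0.3897, 0.0473, 0.4320, 0.0219)
r = d_to_r(d)
print(round(d, 3), round(r, 3))
```

With the control group as the first argument, a negative d means the treatment mean is higher than the control mean, which is how the signs in Table 3 seem to be oriented.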
5.1 Effects of incentives on task accuracy
Figure 4 shows the accuracy for each treatment plus the control across the
30-question task. A one-way between subjects ANOVA test was used to compare
the effect of the treatments on task accuracy. There was a significant effect of
incentive conditions on accuracy at the p < 0.05 level for the three treatments plus the
control [F(3, 236) = 42.682, p < 0.001]. Post hoc comparisons using the Bonferroni test
indicated that all four groups differed on task accuracy with the exception of
Treatments 1 and 2. Taken together, these results suggest that incentive conditions
overall have a positive effect on task accuracy, i.e., task accuracy increases when
the incentives are applied; however, the walk-away amount introduced in Treatment
2, by itself, does not affect task accuracy over the incentive offered in Treatment 1.
The additional incentive conditions provided in Treatment 3 did significantly
improve task accuracy over those offered in Treatments 1 and 2.
5.2 Effects of incentives on worker retention
A one-way between subjects ANOVA test was used to compare the effect of the
treatments on worker retention. There was a significant effect of incentive
conditions on retention at the p < 0.05 level for the three conditions plus the control
[F(3, 236) = 22.624, p < 0.001]. Post hoc comparisons using the Bonferroni test
indicated that the retention rate did not differ between Treatment 1 and the
control and did not differ between Treatments 2 and 3, indicating that the walk-away
bonus was responsible for a drop in retention rates, but no other incentives had
an influence on worker retention.
Table 1 Summary of conditions provided for the control group plus three treatment groups

| Group | Incentive bonus offered | Walk-away bonus | Insurance offered | Pay-in required |
|---|---|---|---|---|
| Control | None | None | Yes, only number of correct displayed (no financial incentive offered) | None |
| Treatment 1 | $1.50 + $0.05 for each correct answer, -$0.05 for each wrong answer | None | Yes, if accepted reduces bonus amount by 0.50 for that question | $0.20 |
| Treatment 2 | $1.50 + $0.05 for each correct answer, -$0.05 for each wrong answer | At the end of each of the first five rounds, 0.33 × current bonus | Yes, if accepted reduces bonus amount by 0.50 for that question | $0.20 |
| Treatment 3 | $1.50 + $0.05 for each correct answer, -$0.05 for each wrong answer; bonuses above $2.50 pay $3.00; top 2 overall scores get paid a total of $4.00 | At the end of each of the first five rounds, 0.33 × current bonus | Yes, if accepted reduces bonus amount by 0.50 for that question | $0.50 |
Table 2 Results for the control group plus three treatment groups on selected measurements

| Group | N | Task accuracy (mean) | Task accuracy (std dev) | Retention rate (mean) | Retention rate (std dev) | Bonus earned (mean) | Bonus earned (std dev) | Risk aversity (mean) | Risk aversity (std dev) |
|---|---|---|---|---|---|---|---|---|---|
| Control | 60 | 0.3897 | 0.0473 | 0.9467 | 0.0230 | $0.00 | $0.00 | 0.0394 | 0.0317 |
| Treatment 1 | 60 | 0.4320 | 0.0219 | 0.9737 | 0.0167 | $1.07 | $0.22 | 0.1706 | 0.1364 |
| Treatment 2 | 60 | 0.4513 | 0.0226 | 0.8983 | 0.0586 | $1.05 | $0.18 | 0.1866 | 0.1668 |
| Treatment 3 | 60 | 0.4717 | 0.0146 | 0.8883 | 0.0135 | $1.34 | $0.15 | 0.2540 | 0.1637 |
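Table 3 Cohen’s d (d) and effect size (r) for the three treatment groups for selected measurements

| Group | Task accuracy (d) | Task accuracy (r) | Retention rate (d) | Retention rate (r) | Bonus earned (d) | Bonus earned (r) | Risk aversity (d) | Risk aversity (r) |
|---|---|---|---|---|---|---|---|---|
| Treatment 1 | -1.148 | -0.498 | -1.343 | -0.558 | -6.878 | -0.960 | -1.325 | -0.552 |
| Treatment 2 | -1.662 | -0.639 | 1.087 | 0.478 | -8.250 | -0.972 | -1.226 | -0.523 |
| Treatment 3 | -2.343 | -0.761 | 3.098 | 0.840 | -12.634 | -0.988 | -1.821 | -0.673 |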
5.3 Effects of incentives on bonus earned
A one-way between subjects ANOVA was also conducted on the three treatment
groups to compare the reserve option effects on the bonus amount earned. There
was a significant effect of the reserve option on bonus earned at the p < 0.05 level for
the three treatment conditions [F(2, 177) = 11.218, p < 0.001]. Post hoc
comparisons using the Bonferroni test indicated that Treatment 3 differed from the
other treatments, but Treatments 1 and 2 did not differ from one another. This
suggests the bonus earned by those with the higher MR and incentives to surpass G
may have provided the right incentive to boost scores over Treatment 2, which does
not offer these conditions. The walk-away reserve, offered in Treatment 2 but not
Treatment 1, had no effect on bonus earned.
5.4 Effects of incentives on risk aversity
A one-way between subjects ANOVA was also conducted to compare the reserve option
effects on risk aversity. There was a significant effect of the reserve option on risk aversity at
the p < 0.05 level for the three conditions plus the control [F(3, 236) = 26.056,
p < 0.001]. Post hoc comparisons using the Bonferroni test indicated that all four groups
differed on risk aversity with the exception of Treatments 1 and 2. This suggests that
incentive conditions overall have a positive effect on risk aversity, i.e., risk aversity
increases when the incentives are applied; however, the walk-away amount introduced
in Treatment 2, by itself, does not affect risk aversity over the incentive offered in
Treatment 1. The additional incentive conditions provided in Treatment 3 did
significantly increase risk aversity over those offered in Treatments 1 and 2.
5.5 Relationship between risk aversity and bonus earned
Does risk aversion translate into an increase in bonus earned for workers as we
expect? The incentives applied to Treatments 1 and 3 indicate a significant increase
in both risk aversity and bonus amount earned, while the addition of a walk-away
incentive with Treatment 2 did not provide a significant increase in either measure.

Fig. 4 Mean accuracy rates for the control group (n = 60) plus three treatment groups (n = 60 each), by
question number (x-axis: question number; y-axis: mean accuracy). Vertical lines indicate the end of each five-question task

A Pearson product-moment correlation coefficient was computed to examine this
relationship. We found a positive correlation between bonus earned by
the crowdworkers and application of insurance, an indication of risk aversity,
r = 0.846, n = 240, p < 0.001. We believe that most players have a sense of their
answer’s accuracy, and when they are unsure, they take a risk averse action, if one is
available, to reduce the effects of being incorrect, which increased their bonus.
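For readers unfamiliar with the statistic, the Pearson product-moment coefficient can be computed as in the minimal sketch below. The per-player values are hypothetical and chosen only to make the example run; they are not data from this study, and the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-player values, for illustration only (not from the study)
insurance_rate = [0.00, 0.10, 0.20, 0.30, 0.40]   # risk aversity per player
bonus_earned = [0.90, 1.00, 1.20, 1.25, 1.40]     # bonus before pay-in, in dollars
print(round(pearson_r(insurance_rate, bonus_earned), 3))
```

A value near +1 indicates that players who insure more often also tend to earn larger bonuses; the coefficient alone does not establish the direction of causation.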
5.6 Relationship between risk aversity and task accuracy
We examined if there was a relationship between player success on a given question
(task accuracy) and their application of insurance (risk aversity). In the 16.42 % of
questions answered by crowdworkers who accepted insurance, were they correct to
take insurance in those situations? Also, for the 83.58 % of questions where they did
not accept insurance, would they have been better off if they had accepted the insurance?
Players who did not take insurance on a question were not significantly more
accurate on that question than the mean (two tailed t test, p = 0.154). The accuracy
rate for those players who did choose to take insurance was lower than the overall
mean task accuracy, indicating that taking insurance on those questions was prudent
(two tailed t test, p = 0.021). This may indicate that when players did choose to
take insurance, they were far less certain about their answers than usual, but the
converse was not true.
We found the most successful players (those meeting or exceeding G) used
insurance significantly less than the average player (two tailed t test, p < 0.001). For
those who used insurance less often, we found they were slightly more likely to
exceed the SQ (two tailed t test, p = 0.061). In contrast, for those crowdworkers
who used insurance more than the average, we found they were slightly less
accurate than the mean, and more likely to obtain a final bonus amount less than the SQ
amount of $1.50 (two tailed t test, p = 0.053); however, neither of these last two
tests was significant at p < 0.05.
5.7 Evaluation of risk attitude
The effects incentives have on risk aversity only tell part of the overall picture. To
examine how risk attitude is affected by the TRPs, we examine risk aversity in the
regions bounded by the three reference points for the three treatment groups. These
represent the raw amounts for the 153 participants who did not accept the walk-away
bonus before the end of the game. These values are provided in Table 4.
A one-way between subjects ANOVA was conducted to compare the effects on
risk aversity between regions. There was a significant effect of region on
risk aversity at the p < 0.05 level for the four regions [F(3, 149) = 24.908,
p < 0.001]. Post hoc comparisons using the Bonferroni test showed that each of the
four regions significantly differed from one another. This provides the strongest
evidence that the more successful players were more risk averse overall, accepting
insurance around a third of the time. Those who performed worst used
insurance sparingly, only 2.5 % of the time.
Table 4 only indicates the final region in which each player ended the game, not
the region where they answered most questions; we recognize that a player may
apply a different strategy depending on which region they happen to occupy. To
examine the TRP theory accurately, we wish to see what strategy is applied in each
region. Table 5 provides this information.
A one-way between subjects ANOVA was conducted to compare the effects on
risk aversity on the questions answered in each of the four regions. There was a
significant effect of region on risk aversity at the p < 0.05 level for the four
regions [F(3, 25,653) = 904.695, p < 0.001]. Post hoc comparisons using the
Bonferroni test showed that all regions significantly differed from one another.
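The degrees of freedom quoted with each F statistic in this section follow directly from the number of groups and observations in a one-way ANOVA (k - 1 between groups, N - k within groups). The short check below uses the group sizes listed in Tables 2, 4 and 5; the helper name is hypothetical and the snippet is a reader's cross-check, not part of the original analysis.

```python
def anova_df(group_sizes):
    """Degrees of freedom (between, within) for a one-way between subjects ANOVA."""
    k = len(group_sizes)          # number of groups
    n_total = sum(group_sizes)    # total number of observations
    return k - 1, n_total - k


# Control plus three treatments, 60 players each (Table 2)
print(anova_df([60, 60, 60, 60]))
# Three treatment groups only, as in the bonus-earned test
print(anova_df([60, 60, 60]))
# Players per final region (Table 4)
print(anova_df([13, 66, 62, 12]))
# Questions answered per region (Table 5)
print(anova_df([2177, 12158, 9486, 1836]))
```

These calls give (3, 236), (2, 177), (3, 149) and (3, 25653), matching the degrees of freedom reported with the F statistics above.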
5.8 Evaluation of reference points
We now apply these findings to Wang and Johnson’s TRP theory. Table 6 provides
a comparison between Wang and Johnson’s (2012) assessment of risk attitude with
our own observations based on our findings indicated in Table 5.
The information in Table 6 provides the most compelling evidence of risk
attitude of crowdworkers by region and backs up the overall observations of risk
attitude made by Koop and Johnson, particularly those who are in the lowest
(x ≤ MR) and highest (x > G) regions.
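The four regions used in Tables 4 to 6 can be expressed as a small classifier. This is a sketch under stated assumptions: the function and dictionary names are hypothetical, the quantity x is taken to be a player's current bonus, and the threshold values in the usage lines (MR = $0.20, SQ = $1.50, G = $3.00) are illustrative figures taken from the treatment descriptions rather than a confirmed mapping used in the analysis. The attitude labels are those listed in the Wang and Johnson column of Table 6.

```python
# Risk attitude per region, as listed in the Wang and Johnson column of Table 6
EXPECTED_ATTITUDE = {
    "Failure": "risk seeking",
    "Loss": "risk averse",
    "Gain": "risk seeking",
    "Success": "risk averse",
}


def classify_region(x, mr, sq, g):
    """Place outcome x into one of the four regions bounded by MR, SQ and G."""
    if not (mr < sq < g):
        raise ValueError("reference points must satisfy MR < SQ < G")
    if x <= mr:
        return "Failure"   # x <= MR
    if x <= sq:
        return "Loss"      # MR < x <= SQ
    if x <= g:
        return "Gain"      # SQ < x <= G
    return "Success"       # x > G


# Illustrative thresholds only (assumed, see the lead-in)
for bonus in (0.15, 1.20, 1.80, 3.10):
    region = classify_region(bonus, mr=0.20, sq=1.50, g=3.00)
    print(bonus, region, EXPECTED_ATTITUDE[region])
```

Keeping the thresholds as parameters lets the same function cover the treatment in which MR was raised and G lowered.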
5.9 Implications for crowdsourcing design
From our study, we do observe that financial incentives, when properly applied, can
improve task accuracy. These incentives work because they align the goals of
crowdworkers and requesters alike. By establishing different incentive strategies
based on the risk seeking or risk averse attitude of the crowdworkers, requesters can
improve the effectiveness of these incentives. We observe that several reference
points, not only a single reference point as established utility theory implies, are
likely to determine the risk attitude of crowdworkers, which in turn affects the
incentives that apply best.
Table 4 Mean risk aversity rates by region that each player occupied at the completion of the
30-question task

| Region (a) | Definition | N | Risk aversity (mean) | Risk aversity (std dev) |
|---|---|---|---|---|
| Failure | x ≤ MR | 13 | 0.0251 | 0.0266 |
| Loss | MR < x ≤ SQ | 66 | 0.1554 | 0.1022 |
| Gain | SQ < x ≤ G | 62 | 0.1275 | 0.1284 |
| Success | x > G | 12 | 0.3320 | 0.1470 |

(a) We use the region terms and definition from Koop and Johnson (2012)
In some crowdsourcing tasks, such as requesting relevance judgments, we wish
to encourage risk averse behavior. In other tasks, such as brainstorming, we wish to
take advantage of the diversity of the crowd, and wish to encourage risk seeking
behavior. It is therefore essential for task requesters to understand that incentives
can be used in tandem with the risk behavior. We find setting risk behavior works
best if the context of these reference points is communicated to the crowdworker in
the task design.
5.10 Implications for game design
To keep players engaged, games frequently require more design complexity than
crowdsourcing tasks, and typically only offer non-monetary incentives to players.
Will the risk attitudes explored here translate to non-monetary incentives as well?
Although not explicitly explored in this paper, based on other research, we believe
they will be as effective.
Properly-applied incentives work because they align the behavior of a participant
with the behavior the incentive provider expects. As long as the incentive motivates
the player to take the correct action, we can anticipate the expected player behavior
based on these same reference points. Therefore, game designers can design
incentives into their game to encourage risk attitudes. This can encourage the types
of behavior that will make the game more challenging or more rewarding, further
enhancing the game’s appeal.
In our study, we combined intrinsic and extrinsic incentives. This combination
can have a powerful effect; in an earlier study, Toomim and Landay (2010) found
systems that employ this mix achieved an increase in worker utility over those that
used extrinsic incentives alone. This makes the combination of incentive types an
attractive option on crowdsourcing platforms. The primary reason for the benefit, we
believe, is that crowdworkers expect tasks to be largely absent of intrinsic
incentives; thus providing intrinsic incentives likely increases worker performance
over the use of extrinsic incentives alone. In our study, the intrinsic incentives were
not varied; only the extrinsic incentives (the bonus payments) varied for our three
treatment groups, allowing us to measure their effect on our four metrics.

Table 5 Mean risk aversity rates by question answered by players while occupying each region

| Region | Definition | Number of questions answered while in this region | Percent of total | Risk aversity (mean) | Risk aversity (std dev) |
|---|---|---|---|---|---|
| Failure | x ≤ MR | 2,177 | 9.10 | 0.0630 | 0.1502 |
| Loss | MR < x ≤ SQ | 12,158 | 50.84 | 0.1859 | 0.1697 |
| Gain | SQ < x ≤ G | 9,486 | 36.33 | 0.1412 | 0.1884 |
| Success | x > G | 1,836 | 3.73 | 0.3391 | 0.2627 |

Table 6 Risk aversity regions, the Wang and Johnson assessment of each region, and our findings for
each region

| Region | Definition | Wang and Johnson assessment | Our assessment | Risk aversity (mean) | Risk aversity (std dev) |
|---|---|---|---|---|---|
| Failure | x ≤ MR | Risk seeking | Most risk seeking | 0.0630 | 0.1502 |
| Loss | MR < x ≤ SQ | Risk averse | Slightly risk averse | 0.1859 | 0.1697 |
| Gain | SQ < x ≤ G | Risk seeking | Slightly risk seeking | 0.1412 | 0.1884 |
| Success | x > G | Risk averse | Most risk averse | 0.3391 | 0.2627 |
6 Conclusion
We listed a series of tasks on an online labor market website that solicits work from
largely anonymous participants in exchange for payment. These tasks offered
participants a chance to play a game requiring factual knowledge of future film
releases for a small amount of compensation. Participants were given the
opportunity to receive an additional financial incentive for good performance.
The task was viewed as work with the added benefit of being in a game format.
Using this game format, we observed the effects of different incentive types on
crowdworker task accuracy. We applied three different incentive treatments and
examined their effects on task accuracy, worker retention rate, the amount of
bonuses earned by the workers, and risk attitude. By offering workers in the three
treatment groups plus our control group the option to exercise risk averse
behavior, our study allowed us to make over 25,000 individual observations on risk
attitude. Risk aversity could be measured by the acceptance of ‘insurance’, which
halved the risk and return associated with the bonus earned for an individual
question. By observing the region they occupied while answering each question
relative to three reference points (minimum requirements, status quo, and goal), we
were able to show a strong association between crowdworker risk attitude and the risk
behaviors expected using the TRP theory presented by Wang and Johnson (2012)
(Fig. 5).
As expected, the walk-away option offered in our Treatment 2 did have an effect
on retention rate, since the enticement to quit early and keep some of the bonus
compensation will appeal to some workers. However, we note that in our study, the
walk-away treatment did not affect the overall task accuracy or the amount of
bonus earned by crowdworkers. The latter is surprising, since we would rationally
expect low-performing workers would accept the walk-away amount, raising the
bonus for those high-performing workers who remained. Our investigation found
that the low and high performing workers took advantage of the walk-away option
in equal numbers, with the net result of only a marginal effect on these two metrics.
The walk-away option also did not affect crowdworker risk attitude. While our
study does not uncover the exact reason for this lack of association, we believe this
may be a result of both risk-seeking and risk-averse workers taking advantage of the
walk-away option, but for different reasons. Risk seekers may have been
emboldened by having this walk-away option available to them, facilitating their
risk-seeking behavior; risk averse crowdworkers may have observed the opportunity
cost associated with our task and exercised the walk-away option to find another
task that involved a higher payment certainty.
We examined the application of risk aversion techniques and found that the most accurate
crowdworkers did not make use of insurance, but those who used it more often than
average did not exceed the status quo. When crowdworkers did use insurance for a
given question, however, they usually did not choose the correct answer, indicating
that insurance was the prudent action.
In future work, we plan to continue our examination of incentives, including
applying a non-uniform bonus allocation, providing different levels of feedback
information to players during the game, and varying intrinsic incentives as opposed
to varying the extrinsic incentives examined in this study. Game formats provide
numerous advantages over static crowdsourcing tasks for evaluating treatments;
therefore, we plan to examine this area in greater detail, providing more complex
tradeoffs to workers in order to examine the decisions they choose to make.
Fig. 5 Retention rates for the control group (n = 60) plus three treatment groups (n = 60 each), by
question number (x-axis: question number; y-axis: worker retention). Vertical lines indicate the end of each five-question task

References
Abt CC (1987) Serious games. University Press of America, USA
Arrow KJ (1965) Aspects of the theory of risk-bearing. Yrjö Jahnssonin Säätiö, Helsinki
Briys E, Schlesinger H (1990) Risk aversion and the propensities for self-insurance and self-protection.
South Econ J 57(2):458–467
Callison-Burch C (2009) Fast, cheap, and creative: evaluating translation quality using Amazon’s
Mechanical Turk. Paper presented at the proceedings of the 2009 conference on empirical methods
in natural language processing, vol 1, Singapore. Association for Computational Linguistics,
Stroudsburg, pp 286–295
Cameron J, Pierce WD (1994) Reinforcement, reward, and intrinsic motivation: a meta-analysis. Rev
Educ Res 64:363–423
Chiles TH, McMackin JF (1996) Integrating variable risk preferences, trust, and transaction cost
economics. Acad Manag Rev 21(1):73–99
Choi EK, Menezes CF (1992) Is relative risk aversion greater than one? Int Rev Econ Financ 1(1):43–54
Covey MK, Saladin S, Killen PJ (1989) Self-monitoring, surveillance, and incentive effects on cheating.
J Soc Psychol 129(5):673–679
Csikszentmihalyi M (1991) Flow: the psychology of optimal experience. Harper Perennial, New York
Deci EL, Koestner R, Ryan RM (1999) A meta-analytic review of experiments examining the effects of
extrinsic rewards on intrinsic motivation. Psychol Bull 125(6):627
Deck C, Lee J, Reyes J (2008) Risk attitudes in large stake gambles: evidence from a game show. Appl
Econ 40(1):41–52
Ehrlich I, Becker G (1972) Market insurance, self-protection and self-insurance. J Polit Econ
80(4):623–648
Eickhoff C, Harris CG, de Vries AP, Srinivasan P (2012) Quality through flow and immersion: gamifying
crowdsourced relevance assessments. Paper presented at SIGIR’12, Portland. ACM, New York,
pp 871–880. doi:10.1145/2348283.2348400
Eisenberger R, Cameron J (1996) Detrimental effects of reward: reality or myth? Am Psychol
51:1153–1166
Eisenberger R, Cameron J (1998) Reward, intrinsic interest, and creativity: new findings. Am Psychol
53:676–679
Gal D (2006) A psychological law of inertia and the illusion of loss aversion. Judgm Decis Mak
1(1):23–32
Gill D, Prowse V (2012) A structural analysis of disappointment aversion in a real effort competition. Am
Econ Rev 102(1):469–503
Gneezy U, Rustichini A (2000) Pay enough or don’t pay at all. Q J Econ 115(3):791–810
Harinck F, Van Dijk E, Van Beest I, Mersmann P (2007) When gains loom larger than losses: reversed
loss aversion for small amounts of money. Psychol Sci J Am Psychol Soc/APS 18(12):1099
Harris C (2011) You’re hired! an examination of crowdsourcing incentive models in human resource
tasks. In: Proceedings of the Workshop on Crowdsourcing for Search and Data Mining (CSDM) at
the Fourth ACM International Conference on Web Search and Data Mining (WSDM), pp 15–18
Harris CG, Xu T (2011) The importance of visual context clues in multimedia translation. In Multiling
Multimodal Inf Access Eval: 6941: 107–118. Springer
Hartley R, Lanot G, Walker I (2006) Who really wants to be a millionaire? Estimates of risk aversion
from gameshow data. Working Paper No. 719, Department of Economics, University of Warwick
Heath C, Larrick RP, Wu G (1999) Goals as reference points. Cogn Psychol 38(1):79–109
Heyman J, Ariely D (2004) Effort for payment a tale of two markets. Psychol Sci 15(11):787–793
Holt CA, Laury SK (2002) Risk aversion and incentive effects. Am Econ Rev 92(5):1644–1655
Kahneman D, Tversky A (1979) Prospect theory: an analysis of decision under risk. Econom J Econom
Soc 47(2):263–292
Kohn A (1999) Punished by rewards: the trouble with gold stars, incentive plans, A’s, praise, and other
bribes. Houghton Mifflin Harcourt, Boston
Koop GJ, Johnson JG (2012) The use of multiple reference points in risky decision making. J Behav
Decis Mak 25(1):49–62
Laffont JJ, Martimort D (2002) The theory of incentives: the principal-agent model. Princeton University
Press, Princeton
Lam SK, Pennock DM, Cosley D, Lawrence S (2002) 1 Billion pages = 1 million dollars? Mining the
web to play ‘who wants to be a millionaire?’. In: Proceedings of the nineteenth conference on
uncertainty in artificial intelligence (UAI’03), Uffe Kjærulff and Christopher Meek (eds) Morgan
Kaufmann Publishers Inc., San Francisco, pp 337–345
Lepper MR, Greene D, Nisbett RE (1973) Undermining children’s intrinsic interest with extrinsic reward:
a test of the ‘overjustification’ hypothesis. J Personal Soc Psychol 28(1):129–137
Locke EA (1968) Toward a theory of task motivation and incentives. Organ Behav Hum Perform
3(2):157–189. doi:10.1016/0030-5073(68)90004-4
Lopes LL, Oden GC (1999) The role of aspiration level in risky choice: a comparison of cumulative
prospect theory and SP/A theory. J Math Psychol 43(2):286–313
March JG, Shapira Z (1992) Variable risk preferences and the focus of attention. Psychol Rev
99(1):172–183
Mason W, Watts DJ (2010) Financial incentives and the performance of crowds. ACM SIGKDD Explor
Newsl 11:100–108
McGuire M, Pratt J, Zeckhauser R (1991) Paying to improve your chances: gambling or insurance? J Risk
Uncertain 4(4):329–338
Post T, Van den Assem MJ, Baltussen G, Thaler RH (2008) Deal or no deal? Decision making under risk
in a large-payoff game show. Am Econ Rev 98(1):38–71
Pratt JW (1964) Risk aversion in the small and in the large. Econom J Econom Soc 32(1/2):122–136
Prendergast C (1999) The provision of incentives in firms. J Econ Lit 37(1):7–63
Shaw AD, Horton JJ, Chen DL (2011) Designing incentives for inexpert human raters. Paper presented at
the proceedings of the ACM 2011 conference on computer supported cooperative work, Hangzhou,
pp 275–284
Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? Improving data quality and data mining
using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference
on knowledge discovery and data mining (KDD’08). ACM, New York, pp 614–622.
doi:10.1145/1401890.1401965
Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast—but is it good?: Evaluating non-expert
annotations for natural language tasks. Paper presented at the proceedings of the Conference on
empirical methods in natural language processing, Honolulu
Sullivan K, Kida T (1995) The effect of multiple reference points and prior gains and losses on managers’
risky decision making. Organ Behav Hum Decis Process 64(1):76–83
Toomim M, Landay JA (2010) Measuring utility of human-computer interaction. In: Proceedings of the
ACM SIGKDD workshop on human computation (HCOMP’10). ACM, New York, pp 53–53.
doi:10.1145/1837885.1837901
Tversky A, Kahneman D (1981) The framing of decisions and the psychology of choice. Science
211(4481):453–458
Van Poucke D, Buelens M (2002) Predicting the outcome of a two-party price negotiation: contribution of
reservation price, aspiration price and opening offer. J Econ Psychol 23(1):67–76.
doi:10.1016/s0167-4870(01)00068-x
Von Ahn L (2006) Games with a purpose. Computer 39(6):92–94
Von Ahn L, Dabbish L (2004) Labeling images with a computer game. In: Proceedings of the SIGCHI
conference on human factors in computing systems (CHI’04). ACM, New York, pp 319–326.
doi:10.1145/985692.985733
Von Ahn L, Kedia M, Blum M (2006) Verbosity: a game for collecting common-sense facts. In: Grinter
R, Rodden T, Aoki P, Cutrell E, Jeffries R, Olson G (eds) Proceedings of the SIGCHI conference on
human factors in computing systems (CHI’06). ACM, New York, pp 75–78.
doi:10.1145/1124772.1124784
Wang XT, Johnson JG (2012) A tri-reference point theory of decision making under risk. J Exp Psychol
Gen 141(4):743–756. doi:10.1037/a0027415
302 C. Harris, C. Wu
123
... We assume that the ultimate goal of the decision maker responsible for solving problem (1) is to acquire and apply additional preference information to select a compromise solution out of P(U,F,θ). As such, the aforementioned reference points serve as additional preference information in the extended TOPSIS [11], bipolar [16], reference set (RefSet, [5]) methods or in risk analysis [6], [7] and marketing [4]. A method of preference aggregation in problem (1) with multiple reference points was proposed in [12]. ...
... In order to obtain the desired properties of the level sets of v, we imposed an additional condition (7), symmetric to (6), which allowed us (cf. [12]) to formulate the definition of mutual consistency, ...
Chapter
Full-text available
This article proposes an algorithm to reconcile inconsistent recommendations of experts involved in a multicriteria decision support procedure. The algorithm yields a consistent set of reference points transformed from an arbitrary collection. By assumption, experts S1,…,Sn are agents independently involved in a decision making process of other agents termed decision makers. Experts formulate recommendations as reference points in the criteria space and simultaneously communicate them to decision makers. The above decision-making process corresponds to the schemes ‘one decision maker – multiple recommending experts’ (group decision support) or ‘multiple decision makers – multiple experts’ (group decision making and support). The assumed independence of expert judgments may provide different types of recommendation inconsistency. We define several variants of the internal and mutual inconsistency, which may occur simultaneously in the same decision-making problem. We will also assume that expert recommendations may belong to four predefined characteristic reference sets. The proposed new preference aggregation procedure regularizes the set of reference values pointed out by multiple experts. The aggregation-regularization operations include recommendation merging, reference point averaging, splitting of reference classes, moving a reference point between classes, or removing it from a class. A real-life example displaying the implementation of content-based multimedia retrieval from a knowledge repository will illustrate the above approach. In the final section, we will discuss the dependence of aggregation-regularization process outcomes on the sequence of reference classes, with reference points within each class checked for inconsistencies.KeywordsInformation inconsistencyGroup recommendationMulticriteria decision makingReference set methodPreference aggregation
... Benefiting from the foundational theories of their predecessors, Wang and Johnson (2012) propose tri-reference point (TRP) theory with minimum requirement (MR), status quo (SQ), and goal (G) as the reference points for decision makers, which is the theoretical tool that this paper emphasizes. Although TRP theory is a recent innovation, it has been widely used in various areas related to decision-making management, such as gamified crowdsourcing tasks (Harris & Wu, 2014), salary perception (Zhao et al., 208), HR management (Hu & Wang, 2014), food choice (Lagerkvist et al., 2015), and even international second home retirement motives in Malaysia (Keemun & Musa, 2015). ...
... If the preset number of iterations is reached, the position of the optimal particle and the objective function value are output; otherwise, return to step 2. Particle representation: Each particle's position, denoted $X$ (a two-component vector), is a feasible solution, and this particle's velocity is $Y = (y_1, y_2)$. Initialization: Under the constraints of formulas (8) and (9), the positions of particles are randomly initialized, and the velocities of the particles are randomly initialized in the interval (-0.05, 0.05). If the initial solution does not satisfy constraint (5), it is reinitialized. ...
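The initialization step in the excerpt above can be sketched in Python as rejection sampling. This is a minimal illustration, not the cited authors' implementation: the `bounds` argument stands in for formulas (8) and (9), and the `is_feasible` predicate stands in for constraint (5), neither of which appears in the excerpt, so both are hypothetical placeholders.

```python
import random


def initialize_swarm(n_particles, bounds, is_feasible, v_max=0.05,
                     rng=None, max_tries=10_000):
    """Randomly initialize particle positions and velocities.

    bounds      -- list of (low, high) pairs, one per dimension
                   (hypothetical stand-in for formulas (8) and (9))
    is_feasible -- predicate on a position tuple
                   (hypothetical stand-in for constraint (5))
    v_max       -- velocities are drawn between -v_max and +v_max
    """
    rng = rng or random.Random()
    swarm = []
    for _ in range(n_particles):
        # Redraw the position until it satisfies the feasibility check.
        for _ in range(max_tries):
            position = tuple(rng.uniform(low, high) for low, high in bounds)
            if is_feasible(position):
                break
        else:
            raise RuntimeError("no feasible position found; check the constraints")
        velocity = tuple(rng.uniform(-v_max, v_max) for _ in bounds)
        swarm.append((position, velocity))
    return swarm


# Example with made-up bounds and a made-up feasibility rule.
example_bounds = [(0.0, 1.0), (0.0, 1.0)]
example_swarm = initialize_swarm(
    n_particles=20,
    bounds=example_bounds,
    is_feasible=lambda x: x[0] + x[1] <= 1.5,
    rng=random.Random(0),
)
```

Rejection sampling is the simplest way to honor "reinitialize if infeasible"; it can become slow when the feasible region is a small fraction of the bounding box, which is why the sketch caps the number of retries.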
... For example, managing dependencies between tasks and sub-tasks drives the ability to coordinate large-scale tasks and improves the completion and acceptance rates of such tasks (Chen et al., 2014). Incentive management, meanwhile, manages the dependencies between workers' performance and different types of rewards available to them (Harris and Wu, 2014). For example, high pricing of tasks lowers the output for the job provider due to budgetary constraints. ...
... The degree to which crowdwork platform governance manages the interdependencies between workers' performance and the incentives and rewards available to them (Harris and Wu, 2014). AMT job providers may provide extra incentives to workers in the form of a reward payment (Harris, 2015). TopCoder awards prizes to winners (Kittur et al., 2013). Contract Management: The degree to which the work contracts make it possible for the platform to manage interdependencies between job providers and workers (Agrawal et al., 2015; Vakharia and Lease, 2015). AMT facilitates routine work based on a set fee but does not provide a contract. ...
Article
Crowdwork, a new form of digitally mediated employment and part of the so-called gig economy, has the capacity to change the nature of work organization and to provide strategic value to workers, job providers, and intermediary platform owners. However, because crowdwork is temporary, large-scale, distributed, and mediated, its governance remains a challenge that often casts a shadow over its strategic value. The objective of this paper is to shed light on the making of value-adding crowdwork arrangements. Specifically, the paper explores crowdwork platform governance mechanisms and the relationships between these mechanisms and organizational value creation. Building on a comprehensive review of the extant literature on governance and crowdwork, we construct an overarching conceptual model that integrates control system and coordination system as two complementary mechanisms that drive crowdwork platform governance effectiveness and the consequent job provider benefits. Furthermore, the model accentuates the role of the degree of centralization and the degree of routinization as critical moderators in crowdwork platform governance. Overall, the paper highlights the potential of crowdwork to contribute not only to inclusion, fair wages and flexible work arrangements for workers but also to organizations’ value and competitive edge.
... If the function is linear, the person is said to be risk-neutral. If the function is concave, the individual is risk-averse, and if the function is convex, the person is risk-prone, that is, they prefer an uncertain reward to a smaller certain reward (Harris and Wu, 2014). Expected utility theory proposes that this parameter be included in the subjective utility function that assigns subjective values to gambles. ...
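The link between the curvature of the utility function and risk attitude described in the excerpt above can be checked numerically with a certainty equivalent: the sure amount whose utility equals the expected utility of a gamble. The following Python sketch is illustrative only; the three utility functions and the 50/50 gamble over 0 and 100 are assumptions chosen for the example, not values from the cited work.

```python
import math


def certainty_equivalent(utility, inverse_utility, outcomes, probabilities):
    """Sure amount whose utility equals the gamble's expected utility."""
    expected_utility = sum(p * utility(x) for x, p in zip(outcomes, probabilities))
    return inverse_utility(expected_utility)


def classify(ce, expected_value, tol=1e-9):
    """Compare the certainty equivalent with the gamble's expected value."""
    if abs(ce - expected_value) <= tol:
        return "risk-neutral"
    return "risk-averse" if ce < expected_value else "risk-prone"


# Assumed example gamble: 50% chance of 0, 50% chance of 100.
outcomes = [0.0, 100.0]
probabilities = [0.5, 0.5]
expected_value = sum(p * x for x, p in zip(outcomes, probabilities))

# Assumed example utilities, each paired with its inverse.
utilities = {
    "linear": (lambda x: x, lambda u: u),
    "concave": (math.sqrt, lambda u: u ** 2),
    "convex": (lambda x: x ** 2, math.sqrt),
}

attitudes = {
    name: classify(
        certainty_equivalent(u, u_inv, outcomes, probabilities), expected_value
    )
    for name, (u, u_inv) in utilities.items()
}
```

With the square-root utility the certainty equivalent falls below the expected value of 50, so the decision maker would accept a smaller sure amount; with the squared utility it rises above 50, so the gamble is preferred to a smaller certain reward.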
Thesis
The brain mechanisms involved in decision making are far from understood. Our choices are most often far from rational, which is linked to the subjective value that our brain assigns to the options presented. The work carried out in this thesis aims to understand the brain's valuation system in greater depth using fMRI. In particular, we studied decision-making biases through three studies: probability distortion for an immediate or delayed reward; the framing effect in a context of gains or losses for a decision made for oneself or for a close other; and loss aversion for primary or secondary rewards. These studies were conducted in both a healthy population and a population suffering from behavioral addiction: gambling addiction for the first two studies and anorexia nervosa for the last.
... While some discussions of the gig economy have focused on the specifics and work ethics of "giggers" (Tan et al., 2021), their motivations (Harris and Wu, 2014), potential benefits (Kittur et al., 2013;Mason and Watts, 2009) and types of crowdsourcing platforms (Josserand and Kaine, 2019), more and more researchers are wondering about the potential of "giggers" for organizations (Kuhn et al., 2019). They include people who perform creative professions, innovators (Manyika et al., 2016) and talent (Meijerink, 2020). ...
Research Proposal
The Learning Organization Journal. Scopus CiteScore 2020: 5.1. ISSN: 0969-6474. Call for papers for a Special Issue on Gig Workers and Learning Organizations. Opening date for manuscript submissions: 01/08/2022. Closing date for manuscript submissions: 30/01/2023.
... Risk-taking aptitude is different from investor to investor. Generally, the investors are categorised into three on the basis of their risk-taking abilities (Harris and Wu 2014). They can be risk averse, risk neutral and risk seeking. ...
Article
It is essential to understand, raise awareness among, and encourage domestic retail investors to invest in the capital market in a developing economy such as India, in order to tackle capital insufficiency and financial instability. The study therefore aimed to identify the dimensions of cognition that affect investment attitude and the characteristics of risk absorption that affect investment decision making. It also aimed to find the direct impact of investors' cognition on the level of interest in investment, and its mediated impact through risk-absorption scenarios. The study used a causal research design and, using stratified random sampling, received 392 responses from investors with risk-absorption characteristics from four strata of Odisha (a state of India) through a self-constructed questionnaire. Factor analysis was used to identify the factors of cognition and risk absorption. Multiple linear regression was used to estimate the effect of both sets of factors on the intensity of purchasing financial products, that is, the level of interest in investment. Mediation analysis was used to separate the direct impact of cognition on interest in investment from its indirect impact through the risk-absorption factors. The study found that the dimensions of cognition (hot, cold, social and meta) have a significant impact on the level of interest in investment, so financial product sellers must use these dimensions and sources of cognition to raise domestic investors' interest in investing in the domestic capital market. It also found that risk-absorption characteristics play a vital mediating role in the relation between investors' cognition and level of interest in investment. Therefore, it is imperative to raise risk-absorption capacity through the different dimensions of cognition and sources of information, which can result in a better understanding of the market and investment scenarios.
... Financial remuneration is a key apparatus of creative crowdwork platform governance that controls workers' behaviors by motivating them to take part in the crowdwork platform and by incentivizing them to deliver quality work (Harris & Wu, 2014;Gol et al., 2018). In Topcoder, financial remuneration is competition-based. ...
Conference Paper
Crowdwork is a novel form of digitally mediated work arrangement that is managed and organized through online labor platforms. This paper focuses on the governance of platforms that facilitate creative work—that is, complex work tasks that require high-level skill and creative workers. Crowdwork platform governance faces numerous challenges as a result of technology mediation, scalable and distributed workers, and temporary work arrangements. Creative crowdwork platforms, such as Topcoder, typically require additional governance structures to manage complex tasks. However, we know relatively little about creative crowdwork platform governance, as most existing studies focus on routine work platforms, such as Amazon Mechanical Turk. Accordingly, this paper explores how incumbent and insurgent creative crowdwork platforms are governed under centralized and decentralized modes. We conducted a comparative case study based on the analysis of two different cases: Topcoder, a successful commercial platform with a largely centralized governance structure, and CanYa, an emerging innovative platform based on blockchain technology with more decentralized governance. We identified and classified different governance elements related to work control and work coordination. In addition, we explored the characteristics of creative crowdwork platform governance with different degrees of centralization.
... Participants in the 3-collaborator models had a higher retention rate than those in the 4 or 5 collaborator models. Nearly a third (32.6%) of all participants played the maximum of 10 sessions, which is a far higher retention rate than typical for repeated tasks conducted through crowdsourcing (e.g., [7,8,17]). This indicates the monetary incentives we offered worked well and the task provided was sufficiently engaging. ...
Data
... Knoller (2016) argue that the very gear for the presence of "cushion effect" in demand for principal-protected life annuities is the attitude to risk around personal reference points. Harris and Wu (2014) investigate the role of multiple reference points in making financial incentives to get efficiency in production. These applicative issues are left to future research. ...
Article
In their pioneering works on prospect theory, Kahneman and Tversky (1979, 1992) propose the groundbreaking idea that, in making decisions under risk, individuals evaluate losses and gains asymmetrically against a personal reference point. According to the Kahneman and Tversky (1979) statement "losses loom larger than gains", individuals display loss aversion. However, Sacchi and Stanca (2014) argue that people may exhibit gain appetite, which states that "gains loom larger than losses". Although prospect theory dates back more than thirty years, how to formalize asymmetrical preferences relative to a reference point is still an open issue (see Abdellaoui et al., 2007; and Ghossoub, 2012). In this short note we set a preference-based definition for loss aversion, gain appetite and equally weighted preferences "in the small", i.e. for outcomes around a given reference point; and "in the large", i.e. for any outcome of the domain. The classical Kahneman and Tversky (1979, page 279) loss aversion definition follows as a special case. Keywords: Loss-gain asymmetry; Preference-based definition of loss aversion and gain appetite; Multiple reference points
Article
Conducted a field experiment with 3-5 yr old nursery school children to test the "overjustification" hypothesis suggested by self-perception theory (i.e., a person's intrinsic interest in an activity may be decreased by inducing him to engage in that activity as an explicit means to some extrinsic goal). 51 Ss who showed intrinsic interest in a target activity during baseline observations were exposed to 1 of 3 conditions: in the expected-award condition, Ss agreed to engage in the target activity in order to obtain an extrinsic reward; in the unexpected-award condition, Ss had no knowledge of the reward until after they had finished with the activity; and in the no-award condition, Ss neither expected nor received the reward. Results support the prediction that Ss in the expected-award condition would show less subsequent intrinsic interest in the target activity than Ss in the other 2 conditions. (25 ref)
Article
This article reviews research on the effects of reinforcement/reward on intrinsic motivation. The main meta-analysis included 96 experimental studies that used between-groups designs to compare rewarded subjects to nonrewarded controls on four measures of intrinsic motivation. Results indicate that, overall, reward does not decrease intrinsic motivation. When interaction effects are examined, findings show that verbal praise produces an increase in intrinsic motivation. The only negative effect appears when expected tangible rewards are given to individuals simply for doing a task. Under this condition, there is a minimal negative effect on intrinsic motivation as measured by time spent on task following the removal of reward. A second analysis was conducted on five studies that used within-subject designs to evaluate the effects of reinforcement on intrinsic motivation; results suggest that reinforcement does not harm an individual’s intrinsic motivation.
Article
Economics has much to do with incentives--not least, incentives to work hard, to produce quality products, to study, to invest, and to save. Although Adam Smith amply confirmed this more than two hundred years ago in his analysis of sharecropping contracts, only in recent decades has a theory begun to emerge to place the topic at the heart of economic thinking. In this book, Jean-Jacques Laffont and David Martimort present the most thorough yet accessible introduction to incentives theory to date. Central to this theory is a simple question as pivotal to modern-day management as it is to economics research: What makes people act in a particular way in an economic or business situation? In seeking an answer, the authors provide the methodological tools to design institutions that can ensure good incentives for economic agents. This book focuses on the principal-agent model, the "simple" situation where a principal, or company, delegates a task to a single agent through a contract--the essence of management and contract theory. How does the owner or manager of a firm align the objectives of its various members to maximize profits? Following a brief historical overview showing how the problem of incentives has come to the fore in the past two centuries, the authors devote the bulk of their work to exploring principal-agent models and various extensions thereof in light of three types of information problems: adverse selection, moral hazard, and non-verifiability. Offering an unprecedented look at a subject vital to industrial organization, labor economics, and behavioral economics, this book is set to become the definitive resource for students, researchers, and others who might find themselves pondering what contracts, and the incentives they embody, are really all about.
Article
A menu of paired lottery choices is structured so that the crossover point to the high-risk lottery can be used to infer the degree of risk aversion. With normal laboratory payoffs of several dollars, most subjects are risk averse and few are risk loving. Scaling up all payoffs by factors of twenty, fifty, and ninety makes little difference when the high payoffs are hypothetical. In contrast, subjects become sharply more risk averse when the high payoffs are actually paid in cash. A hybrid "power/expo" utility function with increasing relative and decreasing absolute risk aversion nicely replicates the data patterns over this range of payoffs from several dollars to several hundred dollars.
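The crossover idea in the abstract above can be sketched in Python: for each row of a ten-row menu, the probability of the high payoff rises, and the row at which a decision maker switches from the safe to the risky lottery depends on their risk aversion. This is a rough illustration under stated assumptions, not the authors' procedure: the dollar payoffs below are illustrative values of a few dollars chosen for the sketch, and the constant relative risk aversion (CRRA) utility is a simpler stand-in for the hybrid "power/expo" function the abstract mentions.

```python
import math

# Assumed illustrative payoffs (high, low) in dollars; not taken from the abstract.
SAFE_LOTTERY = (2.00, 1.60)
RISKY_LOTTERY = (3.85, 0.10)


def crra_utility(x, r):
    """CRRA utility; r > 0 is risk averse, r = 0 neutral, r < 0 risk loving."""
    if abs(r - 1.0) < 1e-12:
        return math.log(x)
    return x ** (1.0 - r) / (1.0 - r)


def expected_utility(lottery, p_high, r):
    high, low = lottery
    return p_high * crra_utility(high, r) + (1.0 - p_high) * crra_utility(low, r)


def count_safe_choices(r, rows=10):
    """Number of menu rows in which the safe lottery has higher expected utility."""
    safe_count = 0
    for row in range(1, rows + 1):
        p_high = row / rows  # probability of the high payoff in this row
        if expected_utility(SAFE_LOTTERY, p_high, r) > expected_utility(RISKY_LOTTERY, p_high, r):
            safe_count += 1
    return safe_count


# Safe-choice counts for a few assumed risk-aversion coefficients.
safe_choices_by_r = {r: count_safe_choices(r) for r in (-0.5, 0.0, 0.5)}
```

Read in reverse, the mapping gives the inference the abstract describes: an observed number of safe choices brackets the range of `r` values consistent with it, so a later crossover point indicates greater risk aversion.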