Conducting behavioral research on Amazon's Mechanical Turk.

Yahoo! Research, New York, USA.
Behavior Research Methods (Impact Factor: 2.12). 06/2011; 44(1):1-23. DOI: 10.3758/s13428-011-0124-6
Source: PubMed

ABSTRACT Amazon's Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this article is to demonstrate how to use this Web site for conducting behavioral research and to lower the barrier to entry for researchers who could benefit from this platform. We describe general techniques that apply to a variety of types of research and experiments across disciplines. We begin by discussing some of the advantages of doing experiments on Mechanical Turk, such as easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments. While other methods of conducting behavioral research may be comparable to or even better than Mechanical Turk on one or more of the axes outlined above, we will show that, taken as a whole, Mechanical Turk can be a useful tool for many researchers. We will discuss how the behavior of workers compares with that of experts and laboratory subjects. Then we will illustrate the mechanics of putting a task on Mechanical Turk, including recruiting subjects, executing the task, and reviewing the work that was submitted. We also provide solutions to common problems that a researcher might face when executing research on this platform, including techniques for conducting synchronous experiments, methods for ensuring high-quality work, ways to keep data private, and ways to maintain code security.
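One quality-assurance technique in the spirit of those the article discusses is embedding questions with known answers ("gold standard" items) in a task and filtering out workers who miss too many of them. A minimal sketch of that idea in Python; the field names, sample data, and 0.75 accuracy threshold are all illustrative, not taken from the article:

```python
# Filter Mechanical Turk submissions by accuracy on embedded
# gold-standard questions (items whose correct answers are known).
# Field names and the 0.75 threshold are hypothetical.

def gold_accuracy(submission, gold_answers):
    """Fraction of gold questions this worker answered correctly."""
    correct = sum(
        1 for qid, answer in gold_answers.items()
        if submission.get(qid) == answer
    )
    return correct / len(gold_answers)

def filter_submissions(submissions, gold_answers, threshold=0.75):
    """Keep only submissions whose gold accuracy meets the threshold."""
    return [s for s in submissions
            if gold_accuracy(s, gold_answers) >= threshold]

gold = {"g1": "cat", "g2": "dog"}
subs = [
    {"worker": "A", "g1": "cat", "g2": "dog", "q1": "yes"},  # 2/2 gold
    {"worker": "B", "g1": "cat", "g2": "fox", "q1": "no"},   # 1/2 gold
]
kept = filter_submissions(subs, gold)  # worker B is filtered out
```

In practice the gold items would be interleaved with real task items so that workers cannot distinguish them.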

  • Source
    ABSTRACT: In four experiments, we demonstrated a new phenomenon called "slow-change deafness." In Experiment 1 we presented listeners with continuous speech that changed three semitones in pitch over time, and we found that nearly 50 % failed to notice the change. Experiments 2 and 3 replicated the finding, demonstrated that the changes in the stimuli were well above threshold, and showed that when listeners were alerted to the possibility of a change, detection rates improved dramatically. Experiment 4 showed that increasing the magnitude of the change that occurred in the stimulus decreased the rate of change deafness. Our results are consistent with previous work that had shown that cueing listeners to potential auditory changes can significantly reduce change deafness. These findings support an account of change deafness that is dependent on both the magnitude of a stimulus change and listener expectations.
    Attention, Perception, & Psychophysics 03/2015; DOI:10.3758/s13414-015-0871-z · 1.97 Impact Factor
  • Source
    ABSTRACT: We study the causal effects of financial incentives on the quality of crowdwork. We focus on performance-based payments (PBPs), bonus payments awarded to workers for producing high quality work. We design and run randomized behavioral experiments on the popular crowdsourcing platform Amazon Mechanical Turk with the goal of understanding when, where, and why PBPs help, identifying properties of the payment, payment structure, and the task itself that make them most effective. We provide examples of tasks for which PBPs do improve quality. For such tasks, the effectiveness of PBPs is not too sensitive to the threshold for quality required to receive the bonus, while the magnitude of the bonus must be large enough to make the reward salient. We also present examples of tasks for which PBPs do not improve quality. Our results suggest that for PBPs to improve quality, the task must be effort-responsive: the task must allow workers to produce higher quality work by exerting more effort. We also give a simple method to determine if a task is effort-responsive a priori. Furthermore, our experiments suggest that all payments on Mechanical Turk are, to some degree, implicitly performance-based in that workers believe their work may be rejected if their performance is sufficiently poor. Finally, we propose a new model of worker behavior that extends the standard principal-agent model from economics to include a worker's subjective beliefs about his likelihood of being paid, and show that the predictions of this model are in line with our experimental findings. This model may be useful as a foundation for theoretical studies of incentives in crowdsourcing markets.
  • Source
    ABSTRACT: We recruit an online labor force through Amazon's Mechanical Turk platform to identify clouds and cloud shadows in Landsat satellite images. We find that a large group of workers can be mobilized quickly and relatively inexpensively. Our results indicate that workers' accuracy is insensitive to wage, but deteriorates with the complexity of images and with time-on-task. In most instances, human interpretation of cloud-impacted area using a majority rule was more accurate than an automated algorithm (Fmask) commonly used to identify clouds and cloud shadows. However, cirrus-impacted pixels were better identified by Fmask than by human interpreters. Crowd-sourced interpretation of cloud-impacted pixels appears to be a promising means by which to augment or potentially validate fully automated algorithms.
    Remote Sensing 02/2015; 7(3):2334-2351. DOI:10.3390/rs70302334 · 2.62 Impact Factor
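The performance-based-payment design studied in the crowdwork-incentives abstract above can be sketched as a simple payment rule: a worker always receives a base payment, and additionally receives a bonus only when measured quality clears a threshold. A toy illustration in Python; the dollar amounts and threshold are hypothetical, not values from the study:

```python
def payment(quality, base=0.50, bonus=1.00, threshold=0.8):
    """Base pay always; bonus only when quality clears the threshold.

    `quality` is a score in [0, 1]. The amounts (in dollars) and the
    threshold are illustrative; the cited study varies the bonus
    magnitude and quality threshold to test when PBPs improve work.
    """
    return base + (bonus if quality >= threshold else 0.0)
```

The study's finding that the bonus magnitude must be large enough to be salient corresponds here to choosing `bonus` large relative to `base`.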
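The majority rule used in the cloud-identification study above, in which several workers' labels for the same pixel are combined by taking the most common one, can be sketched as follows; the label values are hypothetical:

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label among workers for one pixel."""
    return Counter(labels).most_common(1)[0][0]

def aggregate(pixel_labels):
    """Apply majority rule to each pixel's list of worker labels."""
    return [majority_label(labels) for labels in pixel_labels]

# Two pixels, three worker labels each:
result = aggregate([["cloud", "cloud", "clear"],
                    ["clear", "shadow", "clear"]])
```

With an odd number of workers per pixel and binary labels, this rule always produces a unique winner; with ties, `Counter.most_common` breaks them by insertion order.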

Full-text (4 sources), available from May 20, 2014.