Conducting behavioral research on Amazon's Mechanical Turk.

Yahoo! Research, New York, USA.
Behavior Research Methods (Impact Factor: 2.12). 06/2011; 44(1):1-23. DOI: 10.3758/s13428-011-0124-6
Source: PubMed

ABSTRACT: Amazon's Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this article is to demonstrate how to use this Web site for conducting behavioral research and to lower the barrier to entry for researchers who could benefit from this platform. We describe general techniques that apply to a variety of types of research and experiments across disciplines. We begin by discussing some of the advantages of doing experiments on Mechanical Turk, such as easy access to a large, stable, and diverse subject pool, the low cost of doing experiments, and faster iteration between developing theory and executing experiments. While other methods of conducting behavioral research may be comparable to or even better than Mechanical Turk on one or more of the axes outlined above, we will show that, when taken as a whole, Mechanical Turk can be a useful tool for many researchers. We will discuss how the behavior of workers compares with that of experts and laboratory subjects. Then we will illustrate the mechanics of putting a task on Mechanical Turk, including recruiting subjects, executing the task, and reviewing the work that was submitted. We also provide solutions to common problems that a researcher might face when executing research on this platform, including techniques for conducting synchronous experiments, methods for ensuring high-quality work, how to keep data private, and how to maintain code security.


Available from: Winter Mason, Apr 21, 2015
  • ABSTRACT: When people rapidly judge the truth of claims presented with or without related but nonprobative photos, the photos tend to inflate the subjective truth of those claims, a "truthiness" effect (Newman et al., 2012). For example, people more often judged the claim "Macadamia nuts are in the same evolutionary family as peaches" to be true when the claim appeared with a photo of a bowl of macadamia nuts than when it appeared alone. We report several replications of that effect and 3 qualitatively new findings: (a) in a within-subjects design, when people judged claims paired with a mix of related, unrelated, or no photos, related photos produced truthiness but unrelated photos had no significant effect relative to no photos; (b) in a mixed design, when people judged claims paired with related (or unrelated) and no photos, related photos produced truthiness and unrelated photos produced "falsiness"; and (c) in a fully between-subjects design, when people judged claims paired with either related, unrelated, or no photos, neither truthiness nor falsiness occurred. Our results suggest that photos influence people's judgments when a discrepancy arises in the expected ease of processing, and also support a mechanism in which, against a backdrop of an expected standard, related photos help people generate pseudoevidence to support claims.
    Journal of Experimental Psychology Learning Memory and Cognition 03/2015; DOI:10.1037/xlm0000099 · 3.10 Impact Factor
  • Journal of Educational Psychology 01/2015; DOI:10.1037/edu0000031 · 3.08 Impact Factor
  • ABSTRACT: We recruit an online labor force through Amazon's Mechanical Turk platform to identify clouds and cloud shadows in Landsat satellite images. We find that a large group of workers can be mobilized quickly and relatively inexpensively. Our results indicate that workers' accuracy is insensitive to wage, but deteriorates with the complexity of images and with time-on-task. In most instances, human interpretation of cloud-impacted area using a majority rule was more accurate than an automated algorithm (Fmask) commonly used to identify clouds and cloud shadows. However, cirrus-impacted pixels were better identified by Fmask than by human interpreters. Crowd-sourced interpretation of cloud-impacted pixels appears to be a promising means by which to augment or potentially validate fully automated algorithms.
    Remote Sensing 02/2015; 7(3):2334-2351. DOI:10.3390/rs70302334 · 2.62 Impact Factor