Article

Evaluating On-Line Labor Markets for Experimental Research: Amazon.com's Mechanical Turk

Abstract

We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples (the modal sample in published experimental political science) but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples. © The Author 2012. Published by Oxford University Press on behalf of the Society for Political Methodology. All rights reserved.

... The use of MTurk is controversial. Although some researchers have evaluated the participant samples to be skewed (Casler et al., 2013), other studies have shown that MTurk has an adequate representation of the population (Aguinis et al., 2021; Berinsky et al., 2012; Burnham et al., 2018; Hauser et al., 2019; Huff & Tingley, 2015). However, MTurk is a widely used "pull" method of online recruitment, so we chose it as our final online recruitment site to compare with Facebook and ResearchMatch. ...
Article
Full-text available
Online survey studies have become an increasingly popular method of data collection, particularly within behavioral sciences. Though there are many benefits, many challenges also arise when using internet recruitment to conduct online studies. There is a paucity of peer-reviewed guidance on how to protect data against these challenges. We present a detailed account of our online recruitment and survey methodology when recruiting adults (n = 241) for a study on a sensitive topic requiring anonymous data collection. We evaluated the influence of two compensation amounts (10 and 15 USD) and three online recruitment sites on data validity. The results indicated that both the type of recruitment site and the compensation amount influenced the likelihood that participants would provide a valid survey. The findings have implications for the development of best practice guidelines for online studies and to provide recommendations for future studies using online recruitment and data collection.
... The survey participants were drawn from an online convenience sample via Amazon's Mechanical Turk (hereafter "MTurk"). There have been some concerns about MTurk stressed since its inception as a tool for experimental research; however, the reliability of MTurk has been substantiated through the successful replication of numerous major American surveys (Berinsky et al., 2012). The potency of these findings also suggests a noteworthy degree of generalizability for associated findings (Mullinix et al., 2015). ...
Article
Worldwide many public services are delivered by nonprofit organizations, both secular and faith based. The reliance on nonprofits for service delivery is especially prominent in the provision of relief efforts in response to natural and human-caused disasters. Although there is a growing literature on sector bias (public, private and nonprofit) in public service delivery, the role of faith based nonprofits has generally been ignored despite their prominence in practice. Using two randomized experiments involving US subjects focused on the delivery of humanitarian aid to Somalia, we examine the question of bias in the evaluation of performance based on the type of organization delivering the service. The first experiment contrasts government delivery of aid versus that provided by denomination based organizations or generic faith based organizations that are nondenominational. The second experiment varies the denominational affiliation of faith based nonprofits to examine those that are Methodist, Catholic, or Muslim. We find that US residents view faith based nonprofits as less effective than secular nonprofits; but there is no bias in terms of discounting performance information based on which type of organization was delivering the services. The second experiment showed that there were no differences in assessment based on the denominational affiliation of the nonprofit and no biases in discounting performance. The implications of these findings for the delivery of public services are then discussed.
... MTurk was selected because it provides a diverse and relatively representative sample of the general population, allows for efficient and cost-effective data collection, and has been widely used in social science research. Data collected on MTurk has been proven to be as reliable as those obtained from traditional survey methods (53)(54)(55). ...
Article
Full-text available
Introduction: Social media plays a crucial role in shaping health behaviors by influencing users' perceptions and engagement with health-related content. Understanding these dynamics is important as new social media technologies and changing health behaviors shape how people engage with health messages. Aim: The current study explored the relationship between the characteristics of content creators, the messaging strategies employed in social media, and users' engagement with social media content, and whether these features are linked to users' behavioral intentions. Methods: This study adopts a cross-sectional survey design. A total of 1,141 participants were recruited. We have developed a structural equation model to investigate the relationships between the characteristics of content creators, the messaging strategies employed in social media, users’ perceived HBM constructs, user engagement, and users' behavioral intentions. Results: Results revealed that social media posts focusing on self-efficacy were linked to increased willingness to engage in healthy behaviors. Additionally, individuals who demonstrate stronger perceptions of HBM constructs, such as higher perceived susceptibility and benefits of vaccination, are more likely to engage with posts, which was associated with higher vaccination intention. Posts authored by celebrities garnered a relatively higher number of favorites, while a greater proportion of politicians as content creators was linked to increased user comment intention. Conclusion: Our study underscores the potential of integrating the Health Belief Model into social media to help promote health behaviors like the COVID-19 vaccination. Furthermore, our findings offer valuable insights for professionals and policymakers, guiding them in crafting effective message strategies and selecting appropriate sources to promote health behaviors on social media platforms.
... Moreover, using the MTurk platform did provide advantages over alternative methods of data collection, including greater demographic [58] and geographic diversity than is found in college student samples often used in psychological research. Importantly, MTurk samples generally perform similarly to samples drawn from other sources across many tasks [59]. Moreover, relations that have been reported elsewhere in the literature (e.g., between disgust sensitivity and perceived vulnerability to disease) successfully replicate in this sample [19,51]. ...
Article
Full-text available
Recent theorizing suggests that people gravitate toward conspiracy theories during difficult times because such beliefs promise to alleviate threats to psychological motives. Surprisingly, however, previous research has largely failed to find beneficial intrapersonal effects of endorsing an event conspiracy theory for outcomes like well-being. The current research provides correlational evidence for a link between well-being and an event conspiracy belief by teasing apart this relation from (1) the influence of experiencing turmoil that nudges people toward believing the event conspiracy theory in the first place and (2) conspiracist ideation—the general tendency to engage in conspiratorial thinking. Across two studies we find that, when statistically accounting for the degree of economic turmoil recently experienced and conspiracist ideation, greater belief in COVID-19 conspiracy theories concurrently predicts less stress and longitudinally predicts greater contentment. However, the relation between COVID-19 conspiracy belief and contentment diminishes in size over time. These findings suggest that despite their numerous negative consequences, event conspiracy beliefs are associated with at least temporary intrapersonal benefits.
... Prolific.co, a fee-for-service panel, was used for data collection. Prolific attracts diverse samples across the United States and globally and has been found satisfactory in collecting high-quality data from nationally representative samples despite some drawbacks [33,34]. The survey opportunity was listed to prospective subjects on Prolific.co ...
Article
Full-text available
Background: Robust evidence indicates that having few or poor-quality social connections is associated with poorer physical health outcomes and risk for earlier death (Snyder-Mackler N, Science 368, 2020; Vila J, Front Psychol 12:717164, 2021). Aim: This study sought to determine whether recent attention on social connection and loneliness brought on by the COVID-19 pandemic may influence risk perception and whether these perceptions were heightened among those who are lonely. Methods: Two waves of online survey data were collected. The first included data from 1,486 English-speaking respondents in the US, UK, and Australia, and a second sample of 999 nationally representative US adults, with a final sample of 2392 respondents from the US and UK. Results: Perceptions of risk have remained consistent, underestimating the influence of social factors on health outcomes and longevity, even among respondents who reported moderate-to-severe levels of loneliness. Conclusions: Despite heightened awareness and discourse during the COVID-19 pandemic, public perception in the US and UK continues to significantly underestimate the impact of social factors on physical health and mortality. This underestimation persists regardless of individual loneliness levels, underscoring the need for enhanced public education and policy efforts to recognize social connection as a crucial determinant of health outcomes.
... Online behavioral research has thrived over the last two decades, thanks in no small part to the proliferation of crowdsourcing platforms such as Amazon Mechanical Turk (MTurk) and Prolific (Berinsky et al., 2012; Buhrmester et al., 2011; Litman et al., 2017; Palan & Schitter, 2018; Peer et al., 2017). These widely utilized platforms have dramatically reduced the resources required for data collection, thus revolutionizing how researchers across disciplines collect data (Chandler & Shapiro, 2016; Zhou & Fishbach, 2016). ...
Article
Full-text available
Online crowdsourcing platforms such as MTurk and Prolific have revolutionized how researchers recruit human participants. However, since these platforms primarily recruit computer-based respondents, they risk not reaching respondents who may have exclusive access or spend more time on mobile devices that are more widely available. Additionally, there have been concerns that respondents who heavily utilize such platforms with the incentive to earn an income provide lower-quality responses. Therefore, we conducted two studies by collecting data from the popular MTurk and Prolific platforms, Pollfish, a self-proclaimed mobile-first crowdsourcing platform, and the Qualtrics audience panel. By distributing the same study across these platforms, we examine data quality and factors that may affect it. In contrast to MTurk and Prolific, most Pollfish and Qualtrics respondents were mobile-based. Using an attentiveness composite score we constructed, we find mobile-based responses comparable with computer-based responses, demonstrating that mobile devices are suitable for crowdsourcing behavioral research. However, platforms differ significantly in attentiveness, which is also affected by factors such as the respondents’ incentive for completing the survey, their activity before engaging, environmental distractions, and having recently completed a similar study. Further, we find that a stronger system 1 thinking is associated with lower levels of attentiveness and acts as a mediator between some of the factors explored, including the device used and attentiveness. In addition, we raise a concern that most MTurk users can pass frequently used attention checks but fail less utilized measures, such as the infrequency scale. Supplementary information The online version contains supplementary material available at 10.3758/s13428-025-02618-1.
... Second, I weighted the data to approximate representativeness within year, race, and party. Many of the studies in this analysis were fielded by industry-leading polling firms on close-to-representative samples, but others, including my own, were fielded on platforms such as Mechanical Turk and Lucid, which tend to yield samples that are somewhat less representative (Berinsky, Huber, and Lenz 2012; Coppock and McClellan 2019). Increased use of the latter category of platforms in recent years could be an issue for the over-time analysis I am conducting if the non-representativeness of these samples leads to a greater bias in overall estimates for more recent years. ...
Article
Full-text available
Equitable representation of minority groups is a challenge for democratic government. One way to resolve this dilemma is for majority-group voters to support minority-group candidates, but this support is often elusive. To understand how such inter-group coalitions become possible, this paper investigates the case of white Democratic Americans’ growing support for Black political candidates. I show that as white Democrats’ racial attitudes have liberalized, an increasing number of majority-white districts have elected Black congressional representatives. White Democratic survey respondents have also come to prefer Black candidate profiles, as demonstrated in a meta-analysis of 42 experiments. White Democratic respondents in a series of original conjoint experiments were most likely to prefer Black profiles when they expressed awareness of racial discrimination, low racial resentment, and dislike towards Trump. Additional tests underscore the association between majority-group voters’ concern about racial injustice and their support for minority-group candidates.
... 5). MTurk respondents are also typically more diverse (Weigold & Weigold, 2022) and perhaps even more "representative" of the United States relative to other convenience samples typically employed in psychological research (e.g., college students; Berinsky et al., 2012). In fact, the MTurk recruitment strategy reflects what other scholars have utilized to examine threat perceptions of immigrants (Guillermo et al., 2021; Rowatt et al., 2020; Wei et al., 2019). ...
Article
Full-text available
The present research used Intergroup Threat Theory (ITT) to examine the extent to which threat perceptions of undocumented Mexican immigrants predicted attitudes toward immigration policies. We conceptualized intergroup threat as symbolic threat, realistic threat, anxiety, and negative stereotypes, and extended prior ITT research by assessing positive stereotypes. Two of the four dependent measures were not specific to country of origin: opinions on general immigration policy and pro-immigration stances. The two country-specific measures included opinions on policies that directly affect immigration from Mexico and agreement with providing Mexican immigrants basic resources in detention centers. We hypothesized that symbolic threat, realistic threat, anxiety, and negative stereotypes would predict more punitive opinions on all outcome measures, whereas positive stereotypes would predict more favorable stances. The study recruited 175 U.S. adults through Amazon Mechanical Turk. In regression analyses, symbolic threat corresponded with more punitive stances on all dependent measures except pro-immigration stances. Realistic threat predicted punitive general policy and lower pro-immigration stances. Anxiety predicted harsher opinions on general and Mexico-specific policies. Stereotypes showed evidence of unique predictive effects: negative stereotypes predicted punitive opinions on general and Mexico-specific policies, whereas positive stereotypes were related to more pro-immigration stances and to support for providing resources in detention (and to slightly less punitive opinions on general policies). Exploratory structural equation modeling revealed that only symbolic threat predicted stances on general and Mexico-specific policies, and only positive stereotypes predicted opinions on resources in detention. The present research highlights the distinct ways in which perceptions of intergroup threat, including positive stereotypes, are associated with stances on immigration.
... CloudResearch is a U.S.-based platform that extensively screens study participants to prevent threats to online data quality. Prior work has demonstrated that participants from Mechanical Turk offer valid data and that CloudResearch screening can improve the quality of responses (Berinsky et al., 2012; Coppock, 2019; Litman et al., 2017). Due to a widespread concern that MTurk samples tend to skew more liberal than nationally representative samples, we preregistered that we would oversample Republican respondents if self-identifying Democrats and Democratic leaners exceeded 55% of the first 1,000 responses. ...
Article
State media outlets spread propaganda disguised as news online, prompting social media platforms to attach state-affiliated media tags to their accounts. Do these tags reduce belief in state media misinformation? Previous studies suggest the tags reduce misperceptions but focus on Russia, and current research does not compare these tags with other interventions. Contrary to expectations, a preregistered U.S. experiment found no effect of Twitter-style tags on belief in false state media claims, seemingly because they were rarely noticed. By contrast, fact-check labels decreased belief in false information from state outlets. We recommend platforms design state media tags that are more visible to users.
... The veterinary clinic staff survey was opportunistically distributed and, based on the respondent demographics, geographically biased. Similarly, the veterinary client survey lacked representation from some geographic regions and, while Amazon's Mechanical Turk platform has been shown to be more representative of the US population than convenience sampling, there is typically an over-representation of younger and urban based respondents (41,42). Respondents' previous experience with disasters, which was not accounted for in this study, may have influenced their decision to participate in the project and their responses to individual questions. ...
Article
Full-text available
Climate change has made disasters, and their associated health risks, more frequent and severe. Despite these growing risks, a substantial proportion of adults in the US do not have a disaster plan. Even for those who have disaster plans, it is unclear if these always include pets. The objective of this project was to explore the potential for veterinary teams to facilitate the development of pet-inclusive disaster plans through conversations during routine veterinary visits. We conducted two separate anonymous surveys, one for veterinary staff and one for veterinary clients. Overall, we found that both groups believe disasters are increasing and likely to impact people and their pets; however, respondents remain largely unprepared for these events. Although both groups reported that the topic of disaster preparedness was not typically covered during veterinary visits, pet owners overwhelmingly agreed that pet health professionals are trustworthy sources of information, and that it would be helpful to have support from their veterinary team in developing a disaster plan that includes their pets. Barriers to such conversations, and potential solutions, were explored. Collectively these findings reinforce the role of veterinary professionals as trusted community members who can enhance public health and community resilience by integrating disaster preparedness into their practice.
... Lucid distributes surveys to respondents who have opted into various online survey platforms and quota samples with respect to age, gender, race/ethnicity, and region to achieve a diverse national sample. Though online opt-in panels are not nationally representative as with a probability-based sample, they can provide better estimates of causal effects in experimental designs than other convenience samples (Berinsky, Huber, and Lenz 2012). I analyze the 1,535 respondents in the sample who identified as white and who completed the survey. ...
Article
Full-text available
Imagining oneself in another’s position can soften animus and promote empathy. When one’s loved ones have intense contact with carceral institutions, it can provoke a sense of injustice and political mobilization. Drawing on these insights, I design a survey experiment which assigns respondents to a no-treatment condition, an informational control, an egocentric perspective-taking exercise (imagining they are incarcerated), or a surrogate perspective-taking exercise (imagining someone close to them is incarcerated). I test the effects of the treatments on attitudes toward prisoner release and a semi-behavioral measure—whether respondents write a message to their sheriff in support of release. Relative to the no-treatment condition, the informational control doesn’t elicit changes. However, egocentric and surrogate perspective-taking can increase pro-release attitudes and mobilize respondents to write in support of release. These results push forward the literature on punitive attitudes by considering what forces might mobilize Americans against the carceral state.
... student assistants). These populations differ from the general population on important characteristics like age, education, and cultural context (Smart et al., 2024; Berinsky et al., 2012; Ouyang et al., 2022), and these characteristics impact the labels they assign (Sap et al., 2022; Fleisig et al., 2023; Kirk et al., 2024). The code for experiments will be available at https://github.com/soda-lmu/PAIR. Fortunately, survey researchers have developed robust statistical techniques to estimate population-level parameters from non-representative samples (Eckman et al., 2024; Bethlehem et al., 2011a). ...
Preprint
Full-text available
Models trained on crowdsourced labels may not reflect broader population views when annotator pools are not representative. Since collecting representative labels is challenging, we propose Population-Aligned Instance Replication (PAIR), a method to address this bias through statistical adjustment. Using a simulation study of hate speech and offensive language detection, we create two types of annotators with different labeling tendencies and generate datasets with varying proportions of the types. Models trained on unbalanced annotator pools show poor calibration compared to those trained on representative data. However, PAIR, which duplicates labels from underrepresented annotator groups to match population proportions, significantly reduces bias without requiring new data collection. These results suggest statistical techniques from survey research can help align model training with target populations even when representative annotator pools are unavailable. We conclude with three practical recommendations for improving training data quality.
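The replication idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `replicate_to_population`, the `group` field, and the rounding rule are all assumptions made for illustration. It duplicates labeled rows from underrepresented annotator groups until the group shares approximate the given population shares.

```python
from collections import Counter


def replicate_to_population(rows, target_share, group_key="group"):
    """Duplicate rows from underrepresented groups to approximate target shares.

    rows: list of dicts, each with an annotator-group field (hypothetical schema).
    target_share: dict mapping each group to its assumed population proportion.
    """
    counts = Counter(row[group_key] for row in rows)
    # The group that is most over-represented relative to its target keeps its
    # size; every other group is scaled up to match it, so no rows are dropped.
    scale = max(counts[g] / target_share[g] for g in counts)
    out = []
    for g, n in counts.items():
        wanted = round(scale * target_share[g])
        members = [r for r in rows if r[group_key] == g]
        # Cycle through the group's rows until the wanted count is reached.
        out.extend(members[i % n] for i in range(wanted))
    return out
```

For example, with eight rows from one annotator type and two from another, and an assumed 50/50 population split, the sketch keeps the eight rows and repeats the two minority rows until there are eight of them as well.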
Article
Comparing mobility is an important but controversial issue. In this paper, we argue that in a specific and relevant case, there exists a univocal and non-controversial definition of greater (exchange) mobility that allows for unambiguous comparisons. We conducted a questionnaire experiment to investigate whether people’s perceptions of social mobility align with this definition, and we found that people’s choices are broadly in line with the theoretical predictions.
Article
Full-text available
It is commonly argued that factual knowledge about a political issue increases attitude polarization due to politically motivated reasoning. By this account, individuals ignore counter-attitudinal facts and direct their attention to pro-attitudinal facts; reject counter-attitudinal facts when directly confronted with them; and use pro-attitudinal facts to counterargue, all making them more polarized. The observation that more knowledgeable partisans are often more polarized is widely taken as support for this account. Yet these data are only correlational. Here, we directly test the causal effect of increasing issue-relevant knowledge on attitude polarization. Specifically, we randomize whether N = 1,011 participants receive a large, credible set of both pro- and counter-attitudinal facts on a contentious political issue – gun control – and provide a modest incentive for them to learn this information. We find evidence that people are willing to engage with and learn policy-relevant facts both for and against their initial attitudes; and that this increased factual knowledge shifts individuals towards more moderate policy attitudes, a durable effect that is still visible after one month. Our results suggest that the impact of directionally motivated reasoning on the processing of political information might be more limited than previously thought.
Article
Large language models (LLMs) provide cost-effective but possibly inaccurate predictions of human behavior. Despite growing evidence that predicted and observed behavior are often not interchangeable, there is limited guidance on using LLMs to obtain valid estimates of causal effects and other parameters. We argue that LLM predictions should be treated as potentially informative observations, while human subjects serve as a gold standard in a mixed subjects design. This paradigm preserves validity and offers more precise estimates at a lower cost than experiments relying exclusively on human subjects. We demonstrate, and extend, prediction-powered inference (PPI), a method that combines predictions and observations. We define the PPI correlation as a measure of interchangeability and derive the effective sample size for PPI. We also introduce a power analysis to optimally choose between informative but costly human subjects and less informative but cheap predictions of human behavior. Mixed subjects designs could enhance scientific productivity and reduce inequality in access to costly evidence.
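As a rough illustration of how predictions and observations can be combined, the sketch below implements the basic prediction-powered estimator of a mean as it is commonly stated in the PPI literature: the average prediction on cases without human data, corrected by the average prediction error on cases with human data. It does not reproduce the extensions, PPI correlation, or power analysis this abstract describes, and the function and argument names are hypothetical.

```python
from statistics import mean, variance


def ppi_mean(y_human, pred_for_human, pred_only):
    """Basic PPI estimate of a population mean, with a standard error.

    y_human: outcomes observed from human subjects.
    pred_for_human: model predictions for those same subjects.
    pred_only: model predictions for cases with no human outcome.
    """
    # Rectifier: how far the predictions are off, measured on the human sample.
    rectifier = [y - f for y, f in zip(y_human, pred_for_human)]
    estimate = mean(pred_only) + mean(rectifier)
    # The two terms come from independent samples, so their variances add.
    se = (variance(pred_only) / len(pred_only)
          + variance(rectifier) / len(rectifier)) ** 0.5
    return estimate, se
```

When the predictions track the human outcomes closely, the rectifier has little variance and most of the precision comes from the large, cheap prediction-only sample; when they do not, the estimate remains centered on the human data but gains little from the predictions.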
Article
Full-text available
In a field experiment where revelation of co-worker earnings and the shape of the earnings distribution are exogenously controlled, I test whether relative earnings information itself influences effective labor supply and labor supply elasticity. Piece-rate workers shown their peer earnings standing provide significantly more labor effort. However, the productivity boost from earnings disclosure disappears when inequalities in the underlying piece rate exist. By cross-randomizing net of tax piece rates, labor supply elasticity with respect to the net of tax wage is also estimated. Unlike labor level, I find this labor elasticity is unchanged by the relative standing information. Taken together, these findings have direct implications for how to best model relative status concerns in utility functions, supporting some and precluding other common ways. More speculatively, they also suggest social comparisons could be strategically used to grow firm output or the tax base, and, that underlying inequalities in compensation schemes inhibit the ability of social comparisons to incentivize work.
Article
Full-text available
It is well known that voters’ evaluation of candidates on leadership traits influences their overall candidate assessment and vote choice (i.e., leader effects). It remains unclear, however, whether positive or negative leader trait evaluations are most influential. We argue that especially in current-day political reality, in which ideological and affective polarization are skyrocketing and the political climate is fueled with negativity, high levels of incivility, and negative campaigning, the negative leader effects outweigh the positive ones. Moreover, we expect this negativity bias in leader effects to be conditioned by partisanship and political dissatisfaction. To test these expectations, we triangulate multiple studies. First, we use data from a multi-country election survey to examine the relation between perceived leadership traits of real candidates and party preferences, providing observational evidence from the US, the Netherlands, France, and Germany. Second, focusing on the causal mechanism, we test the negativity bias in a survey experiment among American voters. Here, we manipulate how leadership traits (competence, leadership, integrity, empathy) of a fictitious candidate are presented in terms of valence (positive, negative), and test the impact of these cues on voters’ candidate evaluations and vote choices. The findings indicate, as predicted, that negative leader effects influence voters most strongly. Thus, the role of party leaders is mainly a push instead of a pull factor in elections. Additionally, we show that partisanship and political dissatisfaction seem relevant only for candidate evaluations, not for vote choice. This article pushes the field of candidate evaluations forward by examining the dynamics of the negativity bias in leader effects in an era of negative politics.
Article
Partisan gerrymandering threatens the health of democracy by manipulating formal institutions away from majority rule. In the conventional formulation, institutional manipulation mechanically alters political outcomes. Yet research has neglected the psychological effects of partisan redistricting, which can provoke an emotional backlash from voters. This study presents a theory of voting behavior in which citizen anger over political machinations incites greater turnout that can, in turn, partially compensate for partisan gerrymandering. The constituents that politicians target for disadvantage are agents who can learn they are being targeted, react emotionally, and become that much more motivated to vote. In two survey experiments on large samples, citizens were randomly assigned to receive information about gerrymandering that aimed to either advantage or disadvantage their party. Advantaged citizens on average feel positive emotions but do not significantly alter their intended turnout behavior. Disadvantaged citizens, on the contrary, report greater amounts of fear and anger. The angry participants declare significantly higher rates of voting intent. The results indicate that institutional manipulation may not result simply in mechanical effects but might also provoke psychological backlash that may be partly offsetting, suggesting an avenue for democratic resilience.
Article
Full-text available
Survey research in the Global South has traditionally required large budgets and lengthy fieldwork. The expansion of digital connectivity presents an opportunity for researchers to engage global subject pools and study settings where in-person contact is challenging. This paper evaluates Facebook advertisements as a tool to recruit diverse survey samples in the Global South. Using Facebook’s advertising platform, we quota-sample respondents in Mexico, Kenya, and Indonesia and assess how well these samples perform on a range of survey indicators, identify sources of bias, replicate a canonical experiment, and highlight trade-offs for researchers to consider. This method can quickly and cheaply recruit respondents, but these samples tend to be more educated than corresponding national populations. Weighting ameliorates sample imbalances. This method generates comparable data to a commercial online sample for a fraction of the cost. Our analysis demonstrates the potential of Facebook advertisements to cost-effectively conduct research in diverse settings.
Article
Have concerns about equal rights and equal chances crowded out economic equality as a priority of left parties? Despite the increased importance of inequality in political science, this contentiously fought debate has been standing on shaky empirical foundations. While voters’ equality preferences are well understood, parties’ equality emphases remain uncharted territory. This research note assesses whether the Left has replaced its emphasis on economic equality with a focus on equal chances and equal rights. Based on a new dataset of 300,000 party statements, we use online crowd‐coding to map the equality trajectories of left parties in 12 OECD countries from 1970 to 2020. We examine if trade‐offs between economic and non‐economic aspects of inequality have come to dominate left parties’ equality profiles. Distinguishing social democratic, green and far‐left parties, we refute a meritocratic or ‘woke’ crowding out of redistribution. Yet, Social Democrats have indeed forsaken the once complementary link between economic equality and equal rights in favour of a weak trade‐off.
Article
Full-text available
Despite their positive relationships with health outcomes, few studies directly assess the relationships among religiosity and hope. Using item factor analysis (N = 630) within a religiously diverse United States sample, we hypothesized fundamentalism (H1: Intratextual Fundamentalism Scale, ITFS; Multidimensional Fundamentalism Inventory, MDFI; and Religious Fundamentalism Scale, RFS) and hope measures (H2: Adult Hope Scale, AHS; Integrative Hope Scale, IHS) would demonstrate acceptable psychometrics and statistically (H3) and practically significant relationships (H4). The ITFS possessed near perfect psychometrics (CFI = 1.000, TLI = 1.000, RMSEA = 0.000, SRMR = 0.009, omega = 0.92), but other measures needed modifications. After Bonferroni corrections, we found statistically significant relationships among the RFS and the AHS (r = .16) and IHS (r = .155) as well as the ITFS and the AHS (r = .155) and IHS (r = .162), all at a small effect size. However, there were no statistically significant relationships among the MDFI and the hope measures (H3). No association among a fundamentalism measure and a hope measure reached practical significance (H4). Given these results, we computed correlations among the original scales and found similar results among hope and fundamentalism measures (r = − .002–0.152). These findings indicate direct relationships among measures of hope and fundamentalism, but no relationship met the criteria for practical significance—before or after scale modifications. We discuss the implications of these findings regarding the integration of religiosity and spirituality into mental health research, training, and practice.
Article
I introduce and test for preference for simplicity in choice under risk. I characterize the theory axiomatically, and derive its properties and unique predictions relative to canonical models. By designing and running theoretically motivated experiments, I document that people value simplicity in ways not fully captured by existing models that study risk premia in financial markets. Participants' risk premia increase as complexity increases, holding moments fixed; their dominance violations increase in complexity; their behavior is predicted by simplicity's characterizing axiom; and their complexity aversion is heterogeneous in cognitive ability. None of expected utility theory, cumulative prospect theory, prospect theory, rational inattention, sparsity, salience, or probability weighting that differs by number of outcomes fully capture the experimental findings. I generalize the underlying theory to additionally capture broader measures of complexity, including obfuscation, computation, and language effects.
Article
Full-text available
Racial equity in education is often framed around “closing the achievement gap,” but many scholars argue this frame perpetuates deficit mindsets. The “opportunity gap” (OG) frame has been offered as an alternative to focus attention on structural injustices. In a preregistered survey experiment, I estimate the effects of framing racial equity in education around “achievement gaps” (AGs) versus OGs. I find U.S. adult respondents on MTurk gave higher priority to “closing the racial opportunity gap” versus “closing the racial achievement gap” (effect size = 0.11 SD). When randomly assigned to read an OG frame before being asked to explain the Black/White “achievement gap,” respondents were less likely to endorse cultural or individual-level explanations compared with respondents only shown AG statistics (effect size = –0.10 SD). I find no evidence the OG frame affected respondents’ racial stereotypes or policy preferences.
Article
Full-text available
Purpose: The two primary purposes of the current study are to further understand the impact of corrective messages on misperceptions about election fraud in the US and to test the effect of party affiliation of the accused politician on participants’ election misperceptions. Design/methodology/approach: To assess these relationships, we conducted a between-subjects randomized online experiment. Findings: Our results show that participants in the control condition held higher misperceptions than those who were exposed to a correction message. Findings also showed that liberal media use was negatively associated with election fraud misperceptions, while conservative media use, information from Donald Trump, authoritarianism and self-reported conservatives were positively associated with election fraud misperceptions. Originality/value: Experimental test to understand election fraud misperceptions, using our own original stimulus materials.
Article
This study investigates how team identification, moral emotions, and moral decoupling influence fans’ punitive behaviors in response to sports-related crises. Using a sample of 437 sports fans, the researchers examined whether negative moral emotions (anger, contempt, and disgust) drive punitive behaviors and how team identification affects moral decoupling. Findings reveal that higher team identification is associated with increased moral decoupling, enabling fans to separate their support for team performance from ethical transgressions. Additionally, negative moral emotions were significant predictors of punitive behaviors, with perceived crisis severity mediating the relationship between team identification and fan punishment. Forgiveness emerged as a key moderator, reducing the likelihood of punitive actions among highly identified fans. However, forgiveness did not moderate the relationship between team identification and moral decoupling. Both theoretical and practical implications are discussed.
Article
Hashtags are integral to conversations and discussions on social media and are used to communicate emotions and meaning. However, little is known about how hashtags form impressions of social media users' personalities, credibility, and attractiveness. This study utilizes Social Information Processing Theory (SIPT) to examine if impressions can be formed about the sender based on their use of hashtags on social media. A between-subjects post-test-only experimental design was implemented in which participants (N = 322) viewed mock Instagram posts that differed in the number of hashtags utilized (low vs. high) and framing (positive, negative, neutral) and completed assessments on source perceptions of extraversion, neuroticism, narcissism, trustworthiness, and social attractiveness. Posts with a high number of positive hashtags were associated with higher perceptions of extraversion. Posts with positive hashtags made the source perceived as more socially attractive and trustworthy than those with neutral and negative hashtags. This research significantly advances theory by extending tenets of SIPT to understand how hashtags influence person perception on social media.
Article
High-profile incidents of police violence against Black citizens over the past decade have spawned contentious debates in the United States on the role of police. This debate has played out prominently in the news media, leading to a perception that media outlets have become more critical of the police. There is currently, however, little empirical evidence supporting this perceived shift. We construct a large dataset of local news reporting on the police from 2013 to 2023 in 10 politically diverse U.S. cities. Leveraging advanced language models, we measure criticism by analyzing whether reporting supports or is critical of two contentions: 1) that the police protect citizens and 2) that the police are racist. To validate this approach, we collect labels from members of different political parties. We find that contrary to public perceptions, local media criticism of the police has remained relatively stable along these two dimensions over the past decade. While criticism spiked in the aftermath of high-profile police killings, such as George Floyd’s murder, these events did not produce sustained increases in negative police news. In fact, reporting supportive of police effectiveness has increased slightly since Floyd’s death. We find only small differences in coverage trends in more conservative and more liberal cities, undermining the idea that local outlets cater to the politics of their audiences. Last, although Republicans are more likely to view a piece of news as supportive of the police than Democrats, readers across parties see reporting as no more critical than it was a decade ago.
Article
The present study examined search strategies that those in need of mental health services might use to select a provider. Specifically, 176 participants were recruited using Amazon Mechanical Turk and asked to imagine a scenario in which they had taken a job in a new city and had begun feeling isolated and depressed to the point that they decided to seek the services of a mental health provider. Participants responded to a survey that queried their likelihood of using various referral sources based on a five-category Likert-style response scale. An exploratory factor analysis yielded four main factors of potential referrals: promotional sources, nonmedical professional sources, traditional healthcare providers, and personal sources. These factors were then explored using multivariate analyses. The implications of these findings for providing pathways to greater access to mental health services as well as limitations of the study and future research directions are discussed.
Article
This study contributes the first experiment (n = 1,342) comparing audience reception of solidarity reporting to monitorial reporting. Solidarity reporting prioritizes insights of people impacted, while monitorial reporting focuses on officials. We find that covering abortion access protests using a solidarity approach improved news story credibility perceptions for Democrats, while Republicans rated the solidarity and monitorial stories as equally credible. Solidarity reporting may help newsrooms cover contentious issues inclusively, without diminishing audience perceptions of credibility.
Article
The COVID‐19 pandemic is a critical challenge to public health, with authorities emphasizing the importance of measures like vaccination to curb its spread. Yet, pandemic misperceptions, including distrust in scientists and conspiratorial beliefs about the disease, pose significant barriers to these efforts. Amid the turmoil of the COVID‐19 pandemic, that is, there are some who revel in mayhem. Our research investigates the need for chaos (NFC)—the drive to disrupt societal institutions—as a predictor of pandemic misperceptions. In an online sample (N = 1079 individuals), we found that those high in the NFC are also more anti‐intellectual, less cognitively sophisticated, more prone to conspiratorial thinking, including about COVID‐19, and reported reduced willingness to engage in other forms of disease mitigation, such as vaccination and social distancing. These observations emerged while controlling for ideology and other psychological, political, and demographic variables. We also find evidence that the relationships between NFC and COVID‐19‐specific behaviors may be explained by greater endorsement of COVID‐19 conspiracy theories (CTs). We consider the implications of these findings for a scientific understanding of pandemic psychology, political misperceptions, and the challenges that surround effective disease mitigation and other issues concerning public health.
Article
Conventional wisdom holds that women’s political underrepresentation partly stems from gendered stereotypes, with women candidates perceived as lower in ability and assertiveness, and as less competent to handle key issues like the economy and national security. However, recent research uncovers how societal leadership stereotypes have become less advantageous for men. Two conjoint experiments show that Americans’ stereotypes about political candidates follow similar trends: although women candidates (following conventional expectations) are perceived as friendlier and more moral than men, they are also seen as higher in ability and as equally assertive. Similarly, men and women candidates are perceived as equally competent to handle the economy, crime, and national security. Further analyses reveal that liberals and individuals low in hostile sexism hold stereotypes most favorable to women. These findings suggest that gendered candidate stereotypes likely constitute less of a hindrance to women seeking political nominations than in the past.
Article
Increasingly, crowdfunding is transforming financing for many people worldwide. Yet we know relatively little about how, why, and when funding outcomes are impacted by signaling between funders. We conduct two studies of N=500 and N=750 participants involved in crowdfunding to investigate the effect of “crowd signals,” i.e., certain characteristics deduced from the amounts and timing of contributions, on the decision to fund. In our first study, we find that, under a variety of conditions, contributions of heterogeneous amounts arriving at varying time intervals are significantly more likely to be selected than homogeneous contribution amounts and times. The impact of signaling is strongest among participants who are susceptible to social influence. The effect is remarkably general across different project types, fundraising goals, participant interest in the projects, and participants’ altruistic attitudes. Our second study using less strict controls indicates that the role of crowd signals in decision-making is typically unrecognized by participants. Our results underscore the fundamental nature of social signaling in crowdfunding. They highlight the importance of designing around these crowd signals and inform user strategies both on the project creator and funder side.
Article
The shift of public discourse to online platforms has intensified the debate over content moderation by platforms and the regulation of online speech. Designing rules that are met with wide acceptance requires learning about public preferences. We present a visual vignette study using a sample (N = 2,622) of German and U.S. citizens that were exposed to 20,976 synthetic social media vignettes mimicking actual cases of hateful speech. We find people's evaluations to be primarily shaped by message type and severity, and less by contextual factors. While focused measures like deleting hateful content are popular, more extreme sanctions like job loss find little support even in cases of extreme hate. Further evidence suggests in-group favoritism among political partisans. Experimental evidence shows that exposure to hateful speech reduces tolerance of unpopular opinions.
Article
Full-text available
Does shaming human rights violators shape attitudes at home? A growing literature studies the effect of shaming on public attitudes in the target state, but far less is known about its effect in countries initiating the criticism – that is the shamers. In this article, I theorize that when governments shame human rights violators they shape both government approval and human rights attitudes at home. Utilizing two US-based survey experiments, I demonstrate that by shaming foreign countries, governments can improve their image at home and virtue signal their dedication to human rights. At the same time, shaming can modestly shape tolerance towards certain domestic human rights violations. I consider the generalizability of my results through comprehensive supplementary analyses, where experimental insights are corroborated with cross-national observational data. Overall, my findings can provide valuable insight into governments’ incentives to engage in foreign criticism.
Article
Full-text available
Does an immigrant’s country of origin shape Americans’ immigration preferences? If so, are some attributes of origin countries likely to provoke particularly strong opposition over others? We answer these questions using three conjoint experimental studies that focus on one of these potential attributes: religion. In doing so we find consistent evidence of strong opposition to immigration from Muslim-majority countries. Furthermore, individual Muslim immigrants face stronger opposition than non-Muslims, independently of their origin countries. Aversion to Muslim immigration is found across the partisan divide, even though it is lower among Democrats than among Republicans. Our findings suggest that exclusionary immigration policies aimed at Muslims, like President Donald Trump’s travel ban, can have non-trivial support amongst the American public. Methodologically, we demonstrate the limitations of relying on country of origin as a catch-all attribute in conjoint experiments and suggest that, instead, researchers should directly manipulate the relevant characteristics of potential immigrants.
Article
Full-text available
Does the public disapprove of leaders who back down to initiate negotiation with non-state armed actors? This study advances our understanding of leader-public interactions during domestic security crises. First, we argue that the public disapproves of their leaders when they back down to initiate negotiations with non-state armed actors. Second, the public’s disapproval stems from their transfer of negative emotional reactions against armed groups to their leaders if they fail to escalate. Using survey experiments with Indian and Nigerian respondents, we provide novel empirical evidence to shed more light on strategic policy choices confronting leaders during domestic security crises.
Article
Individuals in developing countries are the ultimate end users of foreign aid. While the international donor community has emphasized the importance of aligning aid with recipient countries’ preferences, the literature on public opinion and foreign aid has remained largely focused on donors. Using an original conjoint experiment conducted in seven developing countries, we examine the determinants of public attitudes toward foreign aid in recipient countries. We find that the characteristics of donor countries and foreign aid projects significantly influence recipient attitudes, often more than the size of the aid packages themselves. Individuals in recipient countries consistently prefer aid from democracies and donors with transparent aid agencies, as well as aid delivered by international organizations rather than directly from donor countries’ aid agencies. These findings underscore the importance of multilateral aid agencies in aligning the preferences of donors and recipients.
Article
Amazon’s Mechanical Turk (MTurk) and Prolific are popular online platforms for connecting academic researchers with respondents. A broad literature has sought to assess the extent to which these respondents are representative of the U.S. population in terms of their demographic background, yet no work has assessed the representativeness of their daily lives. The authors provide this analysis by collecting time diaries from 136 MTurk and 156 Prolific respondents, which they compare with diary responses from 468 contemporaneous responses to the American Time Use Survey (ATUS). Responses from MTurk and Prolific respondents include several notable differences relative to ATUS responses, including doing less housework and care work, spending less time traveling, spending more time at home, and spending more time alone. In general, MTurk respondents worked more than ATUS respondents, and Prolific respondents spent more time in leisure. These differences persist even after adjusting for demographic differences. The present findings highlight time use as a potential major source of differences across samples that go beyond demographic differences. Thus, scholars interested in these samples should consider how time use may moderate processes of interest.
Article
There have been many attempts to use prebunking strategies to address the problem of misinformation. While extant research supports their efficacy, it also finds that they could make people suspicious even of true information. This study employs survey experiments in Taiwan to assess the effect of warning messages, with or without mentioning punishments one might incur if spreading misinformation, on people’s beliefs in false and true news, and their subsequent intention to share the information with others. The findings suggest that warning messages mentioning monetary punishments are the most effective in affecting people’s beliefs and intentions to behave.
Article
Research has demonstrated a relationship between moral disengagement and cyberbullying perpetration among adolescents, indicating youth with higher levels of moral disengagement are more likely to engage in cyberbullying. Less is known, however, regarding the impact of moral disengagement, particularly the influence of the four dimensions, on adults’ involvement in cyberbullying perpetration. Using a sample of adults (n = 652; aged 18–50 years) in the United States, this study examined the relationship between the four dimensions of moral disengagement and cyberbullying perpetration. Findings revealed that approximately 12% had ever engaged in cyberbullying perpetration. Further, logistic regression results revealed that moral disengagement was a significant predictor of cyberbullying perpetration among adults, with the dimension of minimizing responsibility driving the significant relationship.
Article
Healthy eating is critical to consumers’ overall health. The purpose of this study was to examine body mass index (BMI), obesity knowledge, and self-efficacy, along with online nutrition information seeking (ONIS), as antecedents to healthy food purchase (HFP) in a moderated mediation model. An online survey was conducted using Amazon Mechanical Turk to recruit 897 participants, with 484 women and 380 men. A moderated mediation analysis was then used to explore the mediating effect of ONIS, and the moderating effects of obesity knowledge and self-efficacy. Results found the impact of ONIS on HFP was significantly generated by obesity knowledge but not by BMI. Both ONIS and self-efficacy yielded individual and interactive effects on HFP, and ONIS did not only generate a direct effect on HFP but also interacted with self-efficacy for HFP. Practically, it was suggested that online health information should be strategically crafted to promote healthy eating behavior, given that consumers in various health conditions were activated to purchase healthier foods via ONIS. Through the ONIS’s mediation of the relationship between obesity knowledge and HFP, consumers with poor obesity knowledge would be cultivated well to further develop their better eating habits.
Chapter
The concept of implicit bias – the idea that the unconscious mind might hold and use negative evaluations of social groups that cannot be documented via explicit measures of prejudice – is a hot topic in the social and behavioral sciences. It has also become a part of popular culture, while interventions to reduce implicit bias have been introduced in police forces, educational settings, and workplaces. Yet researchers still have much to understand about this phenomenon. Bringing together a diverse range of scholars to represent a broad spectrum of views, this handbook documents the current state of knowledge and proposes directions for future research in the field of implicit bias measurement. It is essential reading for those who wish to alleviate bias, discrimination, and inter-group conflict, including academics in psychology, sociology, political science, and economics, as well as government agencies, non-governmental organizations, corporations, judges, lawyers, and activists.
Article
Full-text available
We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.
Article
Full-text available
Taking part in an experiment is "a special form of social interaction." The S plays a role and places himself under the control of the E; he may agree "to tolerate a considerable degree of discomfort, boredom, or actual pain, if required to do so." The very high degree of control inherent in the experimental situation itself may lead to difficulties in experimental design. The S "must be recognized as an active participant in any experiment." With understanding of factors intrinsic to experimental context, experimental method in psychology may become a more effective tool in predicting behavior in nonexperimental contexts. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Indicates that research in social psychology has largely been based on college students tested in academic laboratories on academiclike tasks. How this dependence on one narrow data base may have biased the main substantive conclusions of sociopsychological research in this era is discussed. Research on the full life span suggests that, compared with older adults, college students are likely to have less crystallized attitudes, less formulated senses of self, stronger cognitive skills, stronger tendencies to comply with authority, and more unstable peer-group relationships. These peculiarities of social psychology's predominant data base may have contributed to central elements of its portrait of human nature. According to this view, people are quite compliant and their behavior is easily socially influenced, readily change their attitudes and behave inconsistently with them, and do not rest their self-perceptions on introspection. The data base may also contribute to this portrait of human nature's strong emphasis on cognitive processes and to its lack of emphasis on personality dispositions, material self-interest, emotionally based irrationalities, group norms, and stage-specific phenomena. The analysis implies the need both for more careful examination of sociopsychological propositions for systematic biases introduced by dependence on this data base and for increased reliance on adults tested in their natural habitats with materials drawn from ordinary life. (127 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
We report the results of the first large-scale experiment involving paid political advertising. During the opening months of a 2006 gubernatorial campaign, approximately $2 million of television and radio advertising on behalf of the incumbent candidate was deployed experimentally. In each experimental media market, the launch date and volume of television advertising were randomly assigned. In order to gauge movement in public opinion, a tracking poll conducted brief telephone interviews with approximately 1,000 registered voters each day and a brief follow-up one month after the conclusion of the television campaign. Results indicate that televised ads have strong but short-lived effects on voting preferences. The ephemeral nature of these effects is more consistent with psychological models of priming than with models of on-line processing.
Article
Full-text available
A flurry of recent studies indicates that candidates who simply look more capable or attractive are more likely to win elections. In this article, the authors investigate whether voters' snap judgments of appearance travel across cultures and whether they influence elections in new democracies. They show unlabeled, black-and-white pictures of Mexican and Brazilian candidates' faces to subjects living in America and India, asking them which candidates would be better elected officials. Despite cultural, ethnic, and racial differences, Americans and Indians agree about which candidates are superficially appealing (correlations ranging from .70 to .87). Moreover, these superficial judgments appear to have a profound influence on Mexican and Brazilian voters, as the American and Indian judgments predict actual election returns with surprising accuracy. These effects, the results also suggest, may depend on the rules of the electoral game, with institutions exacerbating or mitigating the effects of appearance.
Article
We examine how people evaluate a series of competing political messages received over the course of a campaign or policy debate. Instead of assuming a message has a fixed effect, we emphasize the variable effect of a message depending on when it is received in relation to other messages. We present data from two experiments showing there are critical differences in the psychology of framing between static and dynamic contexts. Competition between messages received concurrently tends to lead to cancellation of framing effects. However, when competing messages are received sequentially, individuals typically give disproportionate weight to the most recent frame, as the accessibility of earlier arguments decays over time. Therefore, framing effects on political preferences are unlikely to be negated simply by democratic competition. Recent messages, however, do not dominate for all individuals. Biases in how people evaluate competing messages over time vary across individuals depending on how they process information. Some individuals, owing to motivational or contextual factors, are more likely to process information in a manner that generates strong attitudes that endure. As hypothesized, individuals in our experiments who formed stronger attitudes when processing information gave greater weight to the first randomly assigned message they encountered in a sequence of messages, while individuals who formed weaker attitudes favored the last randomly assigned message they received. In the former case, competition over time produced primacy effects rather than balancing, and in the latter case, competition over time produced recency effects. We conclude by discussing the implications of our findings for understanding the power of communications in contemporary politics.
Article
Embedding experiments within surveys has reinvigorated survey research. Several survey experiments are generally embedded within a survey, and analysts treat each of these experiments as self-contained. We investigate whether experiments are self-contained or if earlier treatments affect later experiments, which we call “experimental spillover.” We consider two types of bias that might be introduced by spillover: mean and inference biases. Using a simple procedure, we test for experimental spillover in two data sets: the 1991 Race and Politics Survey and a survey containing several experiments pertaining to foreign policy attitudes. We find some evidence of spillover and suggest solutions to avoid bias.
Conference Paper
In this paper we discuss a screening process used in conjunction with a survey administered via Amazon.com's Mechanical Turk. We sought an easily implementable method to disqualify those people who participate but don't take the study tasks seriously. By using two previously pilot-tested screening questions, we identified 764 of 1,962 people who did not answer conscientiously. Young men seem to be most likely to fail the qualification task. Those that are professionals, students, and non-workers seem to be more likely to take the task seriously than financial workers, hourly workers, and other workers. Men over 30 and women were more likely to answer seriously.
Conference Paper
User studies are important for many aspects of the design process and involve techniques ranging from informal surveys to rigorous laboratory studies. However, the costs involved in engaging users often require practitioners to trade off between sample size, time requirements, and monetary costs. Micro-task markets, such as Amazon's Mechanical Turk, offer a potential paradigm for engaging a large number of users for low time and monetary costs. Here we investigate the utility of a micro-task market for collecting user measurements, and discuss design considerations for developing remote micro user evaluation tasks. Although micro-task markets have great potential for rapidly collecting user measurements at low costs, we found that special care is needed in formulating tasks in order to harness the capabilities of the approach. Author Keywords: Remote user study, Mechanical Turk, micro task, Wikipedia.
Article
The relationship between financial incentives and performance, long of interest to social scientists, has gained new relevance with the advent of web-based "crowd-sourcing" models of production. Here we investigate the effect of compensation on performance in the context of two experiments, conducted on Amazon's Mechanical Turk (AMT). We find that increased financial incentives increase the quantity, but not the quality, of work performed by participants, where the difference appears to be due to an "anchoring" effect: workers who were paid more also perceived the value of their work to be greater, and thus were no more motivated than workers paid less. In contrast with compensation levels, we find the details of the compensation scheme do matter: specifically, a "quota" system results in better work for less pay than an equivalent "piece rate" system. Although counterintuitive, these findings are consistent with previous laboratory studies, and may have real-world analogs as well.
Article
Crowdsourcing is a form of "peer production" in which work traditionally performed by an employee is outsourced to an "undefined, generally large group of people in the form of an open call." We present a model of workers supplying labor to paid crowdsourcing projects. We also introduce a novel method for estimating a worker's reservation wage (the smallest wage a worker is willing to accept for a task and the key parameter in our labor supply model). It shows that the reservation wages of a sample of workers from Amazon's Mechanical Turk (AMT) are approximately log normally distributed, with a median wage of $1.38/hour. At the median wage, the point elasticity of extensive labor supply is 0.43. We discuss how to use our calibrated model to make predictions in applied work. Two experimental tests of the model show that many workers respond rationally to offered incentives. However, a non-trivial fraction of subjects appear to set earnings targets. These "target earners" consider not just the offered wage (which is what the rational model predicts) but also their proximity to earnings goals. Interestingly, a number of workers clearly prefer earning total amounts evenly divisible by 5, presumably because these amounts make good targets.
Article
Although Mechanical Turk has recently become popular among social scientists as a source of experimental data, doubts may linger about the quality of data provided by subjects recruited from online labor markets. We address these potential concerns by presenting new demographic data about the Mechanical Turk subject population, reviewing the strengths of Mechanical Turk relative to other online and offline methods of recruiting subjects, and comparing the magnitude of effects obtained using Mechanical Turk and traditional subject pools. We further discuss some additional benefits, such as the possibility of longitudinal, cross-cultural, and prescreening designs, and offer some advice on how to best manage a common subject pool.
Article
People prefer a sure gain to a probable larger gain when the two choices are presented from a gain perspective, but a probable larger loss to a sure loss when the objectively identical choices are presented from a loss perspective. Such reversals of preference due to the context of the problem are known as framing effects. In the present study, schema activation and subjects' interpretations of the problems were examined as sources of the framing effects. Results showed that such effects could be eliminated by introducing into a problem a causal schema that provided a rationale for the reciprocal relationship between the gains and the losses. Moreover, when subjects were freed from framing they were consistently risk seeking in decisions about human life, but risk averse in decisions about property. Irrationality in choice behaviors and the ecological implication of framing effects are discussed.
Article
The rapid growth of the Internet provides a wealth of new research opportunities for psychologists. Internet data collection methods, with a focus on self-report questionnaires from self-selected samples, are evaluated and compared with traditional paper-and-pencil methods. Six preconceptions about Internet samples and data quality are evaluated by comparing a new large Internet sample (N = 361,703) with a set of 510 published traditional samples. Internet samples are shown to be relatively diverse with respect to gender, socioeconomic status, geographic region, and age. Moreover, Internet findings generalize across presentation formats, are not adversely affected by nonserious or repeat responders, and are consistent with findings from traditional methods. It is concluded that Internet methods can contribute to many areas of psychology.
Article
To understand why firms rarely cut nominal wages, we hired workers for
Article
Analyses of question wording experiments on the General Social Survey spending items showed consistent wording effects for several issues across three years. An examination of types of wording change indicates that even minor changes can affect responses. However, an examination of interactions with respondent individual differences showed no consistent pattern.
Article
Two experimental studies were conducted to examine the influence of elaboration on the framing of a medical decision. Subjects (N = 344) were undergraduate students randomly assigned to one cell of a 2 × 2 design (high- and low-elaboration conditions; positive and negative decision frame versions). In the low-elaboration condition, a framing effect (Tversky & Kahneman, 1981) was observed: Most of the subjects chose the riskless option when decision options were phrased positively in terms of gains, whereas most chose the risky option when options were phrased negatively in terms of losses. However, in the high-elaboration condition, the framing effect was not observed.
Article
In this article, we examine the effect of citizens’ risk orientations on policy choices that are framed in various ways. We introduce an original risk orientations scale and test for the relationship between risk orientations and policy preferences using an original survey experiment. We find that individuals with higher levels of risk acceptance are more likely to prefer probabilistic outcomes as opposed to certain outcomes. Mortality and survival frames influence the choices citizens make, but so does our individual-difference measure of risk acceptance. Finally, using a unique within-subject design, we find that risk acceptance undercuts susceptibility to framing effects across successive framing scenarios. The findings suggest that citizens’ risk orientations are consequential in determining their policy views and their susceptibility to framing effects.
Article
We conduct the first natural field experiment to explore the relationship between the "meaningfulness" of a task and worker effort. We employed about 2,500 workers from Amazon's Mechanical Turk (MTurk), an online labor market, to label medical images. Although given an identical task, we experimentally manipulated how the task was framed. Subjects in the meaningful treatment were told that they were labeling tumor cells in order to assist medical researchers, subjects in the zero-context condition (the control group) were not told the purpose of the task, and, in stark contrast, subjects in the shredded treatment were not given context and were additionally told that their work would be discarded. We found that when a task was framed more meaningfully, workers were more likely to participate. We also found that the meaningful treatment increased the quantity of output (with an insignificant change in quality) while the shredded treatment decreased the quality of output (with no change in quantity). We believe these results will generalize to other short-term labor markets. Our study also discusses MTurk as an exciting platform for running natural field experiments in economics.
Article
Amazon's Mechanical Turk (MTurk) is a relatively new website that contains the major elements required to conduct research: an integrated participant compensation system; a large participant pool; and a streamlined process of study design, participant recruitment, and data collection. In this article, we describe and evaluate the potential contributions of MTurk to psychology and other social sciences. Findings indicate that (a) MTurk participants are slightly more demographically diverse than are standard Internet samples and are significantly more diverse than typical American college samples; (b) participation is affected by compensation rate and task length, but participants can still be recruited rapidly and inexpensively; (c) realistic compensation rates do not affect data quality; and (d) the data obtained are at least as reliable as those obtained via traditional methods. Overall, MTurk can be used to obtain high-quality data inexpensively and rapidly.
Article
How do people make sense of politics? Integrating empirical results in communication studies on framing with models of comprehension in cognitive psychology, we argue that people understand complicated event sequences by organizing information in a manner that conforms to the structure of a good story. To test this claim, we carried out a pair of experiments. In each, we presented people with news reports on the 1999 Kosovo crisis that were framed in story form, either to promote or prevent U.S. intervention. Consistent with expectations, we found that framing news about the crisis as a story affected what people remembered, how they structured what they remembered, and the opinions they expressed on the actions government should take.
Article
Online labor markets have great potential as platforms for conducting experiments, as they provide immediate access to a large and diverse subject pool and allow researchers to conduct randomized controlled trials. We argue that online experiments can be just as valid – both internally and externally – as laboratory and field experiments, while requiring far less money and time to design and to conduct. In this paper, we first describe the benefits of conducting experiments in online labor markets; we then use one such market to replicate three classic experiments and confirm their results. We confirm that subjects (1) reverse decisions in response to how a decision-problem is framed, (2) have pro-social preferences (value payoffs to others positively), and (3) respond to priming by altering their choices. We also conduct a labor supply field experiment in which we confirm that workers have upward sloping labor supply curves. In addition to reporting these results, we discuss the unique threats to validity in an online setting and propose methods for coping with these threats. We also discuss the external validity of results from online domains and explain why online results can have external validity equal to or even better than that of traditional methods, depending on the research question. We conclude with our views on the potential role that online experiments can play within the social sciences, and then recommend software development priorities and best practices.
Article
The experimental approach has begun to permeate political science research, increasingly so in the last decade. Laboratory researchers face at least two challenges: determining who to study and how to lure them into the lab. Most experimental studies rely on student samples, yet skeptics often dismiss student samples for lack of external validity. In this article, we propose another convenience sample for laboratory research: campus staff. We report on a randomized experiment to investigate the characteristics of samples drawn from a general local population and from campus staff. We report that campus staff evidence significantly higher response rates, and we find few discernible differences between the two samples. We also investigate the second challenge facing researchers: how to lure subjects into the lab. We use evidence from three focus groups to identify ways of luring this alternative convenience sample into the lab. We analyze the impact of self-interest, social-utility, and neutral appeals on encouraging study participation, and we find that campus staff respond better to a no-nonsense approach compared to a hard-sell that promises potential policy benefits to the community or, and especially, to the self. We conclude that researchers should craft appeals with caution as they capitalize on this heretofore largely untapped reservoir for experimental research: campus employees.
Article
The term framing is used to refer to the various ways decision situations are presented that lead decision makers to construct markedly different representations of such situations. In two experiments using Asian disease-like decision problems, we tested for the persistence of framing effects dependent on the amount and quality of information presented. In standard wordings these problems are not fully described, yet it is hardly ever reported that information is missing. Additionally, we investigated the perceived ambiguity of the values presented in the problem descriptions. Variation of missing items of information produced markedly different framing effects: With fully described problems, no framing effects emerged. With standard wording, the framing effect was most pronounced in the negative framing condition. While risk-aversion with positive framing was not very strong, with some of the problems worded in a novel fashion we found a reversal of the standard framing effect. The problems were interpreted by subjects as ambiguous to a considerable degree, but ambiguity was not related to subjects' choices. The results are discussed in a framework explicitly distinguishing domain effects from framing effects. Unlike earlier attempts with prospect theory, or, more recently, with fuzzy-trace theory, we propose that probabilistic mental models theory explains framing effects.
Article
This paper examines Tversky and Kahneman's well-known Asian disease framing problem (A. Tversky, D. Kahneman, Science 211 (1981) 453–458). I describe an experiment where respondents received a version of the disease problem using a survival format, a mortality format, or both formats. The results from the survival and mortality formats replicate Tversky and Kahneman's original experiment both in terms of statistical significance and, in contrast to some other studies, in terms of magnitude. I then argue that the “both format” condition constitutes an important and previously unused baseline for evaluating the strength of framing effects. This standard of comparison provides a way to evaluate the impact of a frame on unadulterated preferences – that is, preferences unaffected by a particular frame. The implications for future framing effect experiments are discussed.
Conference Paper
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon's Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated-labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as the cost of processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a robust technique that combines different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
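Result (i) above, that repeated labeling helps "but not always", can be illustrated with a minimal sketch. It assumes binary labels, independent labelers who all share the same accuracy p, and simple majority voting; these are simplifying assumptions chosen for illustration, not necessarily the model used in the paper, and the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority vote over n labels is correct.

    Illustrative assumptions: binary labels, n independent labelers,
    each correct with the same probability p, and n odd (no ties).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    # Sum the binomial probabilities of every winning vote count.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.7):
        row = ", ".join(f"n={n}: {majority_vote_accuracy(p, n):.3f}" for n in (1, 3, 5))
        print(f"p={p}: {row}")
```

Under these assumptions, extra labels raise the quality of the aggregated label when individual labelers are better than chance (p > 0.5), leave it unchanged at p = 0.5, and lower it when p < 0.5.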
Conference Paper
Amazon Mechanical Turk (MTurk) is a crowdsourcing system in which tasks are distributed to a population of thousands of anonymous workers for completion. This system is increasingly popular with researchers and developers. Here we extend previous studies of the demographics and usage behaviors of MTurk workers. We describe how the worker population has changed over time, shifting from a primarily moderate-income, U.S.-based workforce towards an increasingly international group with a significant population of young, well-educated Indian workers. This change in population points to how workers may treat Turking as a full-time job, which they rely on to make ends meet.
Article
Participants were exposed to the "Asian disease" problem (Tversky & Kahneman, 1981). When the problem was subtly framed as a medical decision problem, previous findings replicated: Participants avoided the risky option when the problem was framed positively, but preferred the risky option when the problem was framed negatively. This reversal of preferences was eliminated, however, when the same problem was subtly introduced as a statistical problem. The results are interpreted as evidence for the impact of context cues on the representation of decision problems.
Article
The psychological principles that govern the perception of decision problems and the evaluation of probabilities and outcomes produce predictable shifts of preference when the same problem is framed in different ways. Reversals of preference are demonstrated in choices regarding monetary outcomes, both hypothetical and real, and in questions pertaining to the loss of human lives. The effects of frames on preferences are compared to the effects of perspectives on perceptual appearance. The dependence of preferences on the formulation of decision problems is a significant concern for the theory of rational choice.
Article
Here we show that rapid judgments of competence based solely on the facial appearance of candidates predicted the outcomes of gubernatorial elections, the most important elections in the United States next to the presidential elections. In all experiments, participants were presented with the faces of the winner and the runner-up and asked to decide who is more competent. To ensure that competence judgments were based solely on facial appearance and not on prior person knowledge, judgments for races in which the participant recognized any of the faces were excluded from all analyses. Predictions were as accurate after a 100-ms exposure to the faces of the winner and the runner-up as exposure after 250 ms and unlimited time exposure (Experiment 1). Asking participants to deliberate and make a good judgment dramatically increased the response times and reduced the predictive accuracy of judgments relative to both judgments made after 250 ms of exposure to the faces and judgments made within a response deadline of 2 s (Experiment 2). Finally, competence judgments collected before the elections in 2006 predicted 68.6% of the gubernatorial races and 72.4% of the Senate races (Experiment 3). These effects were independent of the incumbency status of the candidates. The findings suggest that rapid, unreflective judgments of competence from faces can affect voting decisions. Keywords: face perception, social judgments, voting decisions.
Conference Paper
We show how to outsource data annotation to Amazon Mechanical Turk. Doing so has produced annotations in quite large numbers relatively cheaply. The quality is good, and can be checked and controlled. Annotations are produced quickly. We describe results for several different annotation problems. We describe some strategies for determining when the task is well specified and properly priced.
) has collected reports of successful replications of several canonical experiments from a diverse group of researchers, including the Asian Disease Problem discussed in this section
  • Lawson
Simas also employed a within-subjects design to show that high levels on the risk acceptance scale reduce susceptibility to framing effects across successive framing scenarios. We replicated these results as well (see Supplementary data)
  • Kam
conclude that Internet samples tend to be diverse, are not adversely affected by nonserious or habitual responders, and produce findings consistent with traditional methods
  • Gosling