Article

A creative destruction approach to replication: Implicit work and sex morality across cultures

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

How can we maximize what is learned from a replication study? In the creative destruction approach to replication, the original hypothesis is compared not only to the null hypothesis, but also to predictions derived from multiple alternative theoretical accounts of the phenomenon. To this end, new populations and measures are included in the design in addition to the original ones, to help determine which theory best accounts for the results across multiple key outcomes and contexts. The present pre-registered empirical project compared the Implicit Puritanism account of intuitive work and sex morality to theories positing regional, religious, and social class differences; explicit rather than implicit cultural differences in values; self-expression vs. survival values as a key cultural fault line; the general moralization of work; and false positive effects. Contradicting Implicit Puritanism's core theoretical claim of a distinct American work morality, a number of targeted findings replicated across multiple comparison cultures, whereas several failed to replicate in all samples and were identified as likely false positives. No support emerged for theories predicting regional variability and specific individual-differences moderators (religious affiliation, religiosity, and education level). Overall, the results provide evidence that work is intuitively moralized across cultures.


Research
Full-text available
My updated CV.
Article
Full-text available
Cryptocurrencies have ballooned into a billion-dollar business. To inform regulations aimed at protecting consumers vulnerable to suboptimal financial decisions, we investigate crypto investment intentions as a function of consumer gender, financial overconfidence (greater subjective versus objective financial knowledge), and the Big Five personality traits. Study 1 (N = 126) found that people believe that each Big Five personality trait, as well as consumer gender and financial overconfidence, predicts consumers’ crypto investment intentions. Study 2 (N = 1,741) revealed that fewer than 1 in 10 consumers from a nationally representative sample (Norway) are willing to invest in crypto. However, the proportion of male (vs. female) consumers considering such investments is more than twice as large, with less (vs. more) agreeable, less (vs. more) conscientious, and more (vs. less) open consumers also being increasingly inclined to consider crypto investments. Financial overconfidence, agreeableness, and conscientiousness mediate the link between consumer gender and crypto investment intentions. These results hold after accounting for a theoretically relevant confounding factor (financial self-efficacy). Together, this research offers novel implications for marketing theory and practice that help explain the observed gender differences in consumers’ crypto investments.
Preprint
Full-text available
Age of acquisition (AoA) refers to the age at which people learn a particular item and the AoA effect refers to the phenomenon that early-acquired items are processed more quickly and accurately than those acquired later. Over several decades, the AoA effect has been investigated using neuroscientific, behavioural, corpus and computational techniques. We review the current evidence for the AoA effect stemming from a range of methodologies and paradigms, and apply these findings to current explanations of how and where the AoA effect occurs. We conclude that the AoA effect can be found both in the connections between levels of representations and within these representations themselves, and that the effect itself occurs through the process of the distinct coding of early and late items, together with the nature of the connections between levels of representation. This approach strongly suggests that the AoA effect results from the construction of perceptual-semantic representations and the mappings between representations.
Article
This study assessed the prevalence of childhood stuttering in adults with dyslexia (AWD) and the prevalence of dyslexia in adults who stutter (AWS). In addition, the linguistic profiles of 50 AWD, 30 AWS and 84 neurotypical adults were measured. We found that 17 out of 50 AWD (34%) reported stuttering during childhood compared to 1% of the neurotypical population. This was moderated by the severity of dyslexia: People with mild dyslexia showed a lower prevalence rate (15%) of childhood stuttering than those with severe dyslexia (47%). In addition, we observed that 50% of the AWS (n = 30) fulfilled the diagnostic criteria of dyslexia, even though they had never been diagnosed as dyslexic. Compared to neurotypical adults, phonological working memory, awareness, and retrieval were similarly reduced in AWS and AWD. The findings support the view that stuttering and dyslexia may share a phonological deficit.
Article
Full-text available
We examined the evidence for heterogeneity (of effect sizes) when only minor changes to sample population and settings were made between studies and explored the association between heterogeneity and average effect size in a sample of 68 meta-analyses from 13 preregistered multilab direct replication projects in social and cognitive psychology. Among the many examined effects, examples include the Stroop effect, the "verbal overshadowing" effect, and various priming effects such as "anchoring" effects. We found limited heterogeneity; 48/68 (71%) meta-analyses had nonsignificant heterogeneity, and most (49/68; 72%) were most likely to have zero to small heterogeneity. Power to detect small heterogeneity (as defined by Higgins, Thompson, Deeks, & Altman, 2003) was low for all projects (mean 43%), but good to excellent for medium and large heterogeneity. Our findings thus show little evidence of widespread heterogeneity in direct replication studies in social and cognitive psychology, suggesting that minor changes in sample population and settings are unlikely to affect research outcomes in these fields of psychology. We also found strong correlations between observed average effect sizes (standardized mean differences and log odds ratios) and heterogeneity in our sample. Our results suggest that heterogeneity and moderation of effects is unlikely for a zero average true effect size, but increasingly likely for larger average true effect sizes.
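The heterogeneity statistics this abstract relies on (Cochran's Q and the I² measure of Higgins et al.) can be sketched in a few lines. This is a minimal illustrative implementation with made-up replication estimates, not the projects' actual data or analysis code:

```python
def q_and_i2(effects, variances):
    """Cochran's Q and Higgins' I^2 for a set of study effect sizes."""
    w = [1.0 / v for v in variances]  # inverse-variance weights
    # Fixed-effect pooled estimate
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Q: weighted squared deviations from the pooled estimate
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    # I^2: percentage of observed variation beyond what chance predicts
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Five tightly clustered hypothetical replication estimates (Cohen's d)
# with their sampling variances
q, i2 = q_and_i2([0.20, 0.25, 0.18, 0.22, 0.21],
                 [0.010, 0.012, 0.009, 0.011, 0.010])
```

With estimates this consistent, Q falls below its degrees of freedom and I² is truncated to zero, i.e. no detectable heterogeneity, which mirrors the "zero to small heterogeneity" pattern the meta-analyses report.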
Article
Full-text available
To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from two separate large samples (total N > 15,000) were then randomly assigned to complete one version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: materials from different teams rendered statistically significant effects in opposite directions for four out of five hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for two hypotheses, and a lack of support for three hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, while considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.
Article
Full-text available
Most meta-analyses focus on the behavior of meta-analytic means. In many cases, however, this mean is difficult to defend as a construct because the underlying distribution of studies reflects many factors, including how we as researchers choose to design studies. We present an alternative goal for meta-analysis. The analyst may ask about relations that are stable across all the studies. In a typical meta-analysis, there is a hypothesized direction (e.g., that violent video games increase, rather than decrease, aggressive behavior). We ask whether all studies in a meta-analysis have true effects in the hypothesized direction. If so, this is an example of a stable relation across all the studies. We propose 4 models: (a) all studies are truly null; (b) all studies share a single true nonzero effect; (c) studies differ, but all true effects are in the same direction; and (d) some study effects are truly positive, whereas others are truly negative. We develop Bayes factor model comparison for these models and apply them to 4 extant meta-analyses to show their usefulness.
Article
Full-text available
We conducted preregistered replications of 28 classic and contemporary published findings with protocols that were peer reviewed in advance to examine variation in effect magnitudes across sample and setting. Each protocol was administered to approximately half of 125 samples and 15,305 total participants from 36 countries and territories. Using conventional statistical significance (p < .05), fifteen (54%) of the replications provided evidence that was statistically significant and in the same direction as the original finding. With a strict significance criterion (p < .0001), fourteen (50%) provided such evidence, reflecting the extremely high-powered design. Seven (25%) of the replications had effect sizes larger than the original finding and 21 (75%) had effect sizes smaller than the original finding. The median comparable Cohen’s d effect size was 0.60 for original findings and 0.15 for replications. Sixteen replications (57%) had small effect sizes (< .20) and 9 (32%) were in the opposite direction from the original finding. Across settings, 11 (39%) showed significant heterogeneity using the Q statistic and most of those were among the findings eliciting the largest overall effect sizes; only one effect that was near zero in the aggregate showed significant heterogeneity. Only one effect showed a Tau > 0.20, indicating moderate heterogeneity. Nine others had a Tau near or slightly above 0.10, indicating slight heterogeneity. In moderation tests, very little heterogeneity was attributable to task order, administration in lab versus online, and exploratory WEIRD versus less WEIRD culture comparisons. Cumulatively, variability in observed effect sizes was more attributable to the effect being studied than the sample or setting in which it was studied.
Article
Full-text available
Understanding and improving reproducibility is crucial for scientific progress. Prediction markets and related methods of eliciting peer beliefs are promising tools to predict replication outcomes. We invited researchers in the field of psychology to judge the replicability of 24 studies replicated in the large-scale Many Labs 2 project. We elicited peer beliefs in prediction markets and surveys about two replication success metrics: the probability that the replication yields a statistically significant effect in the original direction (p < 0.001), and the relative effect size of the replication. The prediction markets correctly predicted 75% of the replication outcomes, and were highly correlated with the replication outcomes. Survey beliefs were also significantly correlated with replication outcomes, but had larger prediction errors. The prediction markets for relative effect sizes attracted little trading and thus did not work well. The survey beliefs about relative effect sizes performed better and were significantly correlated with observed relative effect sizes. The results suggest that replication outcomes can be predicted and that the elicitation of peer beliefs can increase our knowledge about scientific reproducibility and the dynamics of hypothesis testing.
Article
Full-text available
Can recent failures to replicate psychological research be explained by typical magnitudes of statistical power, bias or heterogeneity? A large survey of 12,065 estimated effect sizes from 200 meta-analyses and nearly 8,000 papers is used to assess these key dimensions of replicability. First, our survey finds that psychological research is, on average, afflicted with low statistical power. The median of median power across these 200 areas of research is about 36%, and only about 8% of studies have adequate power (using Cohen’s 80% convention). Second, the median proportion of the observed variation among reported effect sizes attributed to heterogeneity is 74% (I2). Heterogeneity of this magnitude makes it unlikely that the typical psychological study can be closely replicated when replication is defined as study-level null hypothesis significance testing. Third, the good news is that we find only a small amount of average residual reporting bias, allaying some of the often-expressed concerns about the reach of publication bias and questionable research practices. Nonetheless, the low power and high heterogeneity that our survey finds fully explain recent difficulties in replicating highly regarded psychological studies and reveal challenges for scientific progress in psychology.
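The low-power claim above is easy to make concrete. The sketch below approximates the power of a two-sided, two-sample test under a normal approximation, using the standard library's `NormalDist`; the effect size and sample size are illustrative values chosen here, not figures from the survey:

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for
    a true standardized effect size d (normal approximation)."""
    z = NormalDist()
    se = (2.0 / n_per_group) ** 0.5        # SE of d under the approximation
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = d / se                          # noncentrality
    # Probability the test statistic lands beyond either critical bound
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# A modest true effect with a once-typical sample size
p = power_two_sample(d=0.35, n_per_group=30)
```

For d = 0.35 and 30 participants per group, power comes out below 30%, in the neighborhood of the survey's 36% median and far short of Cohen's 80% convention.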
Article
Full-text available
Concerns about the veracity of psychological research have been growing. Many findings in psychological science are based on studies with insufficient statistical power and nonrepresentative samples, or may otherwise be limited to specific, ungeneralizable settings or populations. Crowdsourced research, a type of large-scale collaboration in which one or more research projects are conducted across multiple lab sites, offers a pragmatic solution to these and other current methodological challenges. The Psychological Science Accelerator (PSA) is a distributed network of laboratories designed to enable and support crowdsourced research projects. These projects can focus on novel research questions or replicate prior research in large, diverse samples. The PSA’s mission is to accelerate the accumulation of reliable and generalizable evidence in psychological science. Here, we describe the background, structure, principles, procedures, benefits, and challenges of the PSA. In contrast to other crowdsourced research networks, the PSA is ongoing (as opposed to time limited), efficient (in that structures and principles are reused for different projects), decentralized, diverse (in both subjects and researchers), and inclusive (of proposals, contributions, and other relevant input from anyone inside or outside the network). The PSA and other approaches to crowdsourced psychological science will advance understanding of mental processes and behaviors by enabling rigorous research and systematic examination of its generalizability.
Article
Full-text available
Being able to replicate scientific findings is crucial for scientific progress. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.
Article
Full-text available
Does religion promote prosocial behaviour? Despite numerous publications that seem to answer this question affirmatively, divergent results from recent meta-analyses and pre-registered replication efforts suggest that the issue is not yet settled. Uncertainty lingers around (i) whether the effects of religious cognition on prosocial behaviour were obtained through implicit cognitive processes, explicit cognitive processes or both and (ii) whether religious cognition increases generosity only among people disinclined to share with anonymous strangers. Here, we report two experiments designed to address these concerns. In Experiment 1, we sought to replicate Shariff and Norenzayan's demonstration of the effects of implicit religious priming on Dictator Game transfers to anonymous strangers; unlike Shariff and Norenzayan, however, we used an online environment where anonymity was virtually assured. In Experiment 2, we introduced a ‘taking’ option to allow greater expression of baseline selfishness. In both experiments, we sought to activate religious cognition implicitly and explicitly, and we investigated the possibility that religious priming depends on the extent to which subjects view God as a punishing, authoritarian figure. Results indicated that in both experiments, religious subjects transferred more money on average than did non-religious subjects. Bayesian analyses supported the null hypothesis that implicit religious priming did not increase Dictator Game transfers in either experiment, even among religious subjects. Collectively, the two experiments furnished support for a small but reliable effect of explicit priming, though among religious subjects only. Neither experiment supported the hypothesis that the effect of religious priming depends on viewing God as a punishing figure. Finally, in a meta-analysis of relevant studies, we found that the overall effect of implicit religious priming on Dictator Game transfers was small and did not statistically differ from zero.
Article
Full-text available
Research on moral judgment has been dominated by rationalist models, in which moral judgment is thought to be caused by moral reasoning. The author gives 4 reasons for considering the hypothesis that moral reasoning does not cause moral judgment; rather, moral reasoning is usually a post hoc construction, generated after a judgment has been reached. The social intuitionist model is presented as an alternative to rationalist models. The model is a social model in that it deemphasizes the private reasoning done by individuals and emphasizes instead the importance of social and cultural influences. The model is an intuitionist model in that it states that moral judgment is generally the result of quick, automatic evaluations (intuitions). The model is more consistent than rationalist models with recent findings in social, cultural, evolutionary, and biological psychology, as well as in anthropology and primatology.
Article
Full-text available
When psychologists test a commonsense (CS) hypothesis and obtain no support, they tend to erroneously conclude that the CS belief is wrong. In many such cases it appears, after many years, that the CS hypothesis was valid after all. It is argued that this error of accepting the "theoretical" null hypothesis reflects confusion between the operationalized hypothesis and the theory or generalization that it is designed to test. That is, on the basis of reliable null data one can accept the operationalized null hypothesis (e.g., "A measure of attitude x is not correlated with a measure of behavior y"). In contrast, one cannot generalize from the findings and accept the abstract or theoretical null (e.g., "We know that attitudes do not predict behavior"). The practice of accepting the theoretical null hypothesis hampers research and reduces the trust of the public in psychological research.
Article
Full-text available
Responding to recent concerns about the reliability of the published literature in psychology and other disciplines, we formed the X-Phi Replicability Project (XRP) to estimate the reproducibility of experimental philosophy (osf.io/dvkpr). Drawing on a representative sample of 40 x-phi studies published between 2003 and 2015, we enlisted 20 research teams across 8 countries to conduct a high-quality replication of each study in order to compare the results to the original published findings. We found that x-phi studies – as represented in our sample – successfully replicated about 70% of the time. We discuss possible reasons for this relatively high replication rate in the field of experimental philosophy and offer suggestions for best research practices going forward.
Article
Full-text available
Religious people are more trusted than nonreligious people. Although most theorists attribute these perceptions to the beliefs of religious targets, religious individuals also differ in behavioral ways that might cue trust. We examined whether perceivers might trust religious targets more because they heuristically associate religion with slow life-history strategies. In three experiments, we found that religious targets are viewed as slow life-history strategists and that these findings are not the result of a universally positive halo effect; that the effect of target religion on trust is significantly mediated by the target’s life-history traits (i.e., perceived reproductive strategy); and that when perceivers have direct information about a target’s reproductive strategy, their ratings of trust are driven primarily by his or her reproductive strategy, rather than religion. These effects operate over and above targets’ belief in moralizing gods and offer a novel theoretical perspective on religion and trust.
Article
Full-text available
Model comparison in Bayesian mixed models is becoming popular in psychological science. Here we develop a set of nested models that account for order restrictions across individuals in psychological tasks. An order-restricted model addresses the question “Does everybody,” as in “Does everybody show the usual Stroop effect,” or “Does everybody respond more quickly to intense noises than subtle ones?” The crux of the modeling is the instantiation of 10s or 100s of order restrictions simultaneously, one for each participant. To our knowledge, the problem is intractable in frequentist contexts but relatively straightforward in Bayesian ones. We develop a Bayes factor model-comparison strategy using Zellner and Siow’s default g-priors appropriate for assessing whether effects obey equality and order restrictions. We apply the methodology to seven data sets from Stroop, Simon, and Eriksen interference tasks. Not too surprisingly, we find that everybody Stroops—that is, for all people congruent colors are truly named more quickly than incongruent ones. But, perhaps surprisingly, we find these order constraints are violated for some people in the Simon task, that is, for these people spatially incongruent responses occur truly more quickly than congruent ones! Implications of the modeling and conjectures about the task-related differences are discussed.
Article
Full-text available
We propose to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.
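The practical cost of the proposed 0.005 threshold is larger samples. A rough sense of the trade-off comes from the standard sample-size formula for a two-sample comparison under a normal approximation; the effect size below (d = 0.4) is an illustrative choice, not a figure from the proposal:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha, power=0.80):
    """Approximate required n per group for a two-sided two-sample
    z-test at effect size d (normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # critical value for the threshold
    z_b = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

n_05 = n_per_group(d=0.4, alpha=0.05)    # conventional threshold
n_005 = n_per_group(d=0.4, alpha=0.005)  # proposed threshold
```

Under these assumptions, moving from α = 0.05 to α = 0.005 at 80% power requires roughly 70% more participants per group, which is the order of magnitude the proposal's critics and supporters both cite.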
Article
Full-text available
Although the individualism–collectivism dimension is usually examined in a U.S. versus Asian context, there is variation within the United States. The authors created an eight-item index ranking states in terms of collectivist versus individualist tendencies. As predicted, collectivist tendencies were strongest in the Deep South, and individualist tendencies were strongest in the Mountain West and Great Plains. In Part 2, convergent validity for the index was obtained by showing that state collectivism scores predicted variation in individual attitudes, as measured by a national survey. In Part 3, the index was used to explore the relationship between individualism–collectivism and a variety of demographic, economic, cultural, and health-related variables. The index may be used to complement traditional measures of collectivism and individualism and may be of use to scholars seeking a construct to account for unique U.S. regional variation.
Article
Full-text available
The data includes measures collected for the two experiments reported in “False-Positive Psychology” [1] where listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2). These data are useful because they illustrate inflations of false positive rates due to flexibility in data collection, analysis, and reporting of results. Data are useful for educational purposes.
Article
Full-text available
According to a recent meta-analysis, religious priming has a positive effect on prosocial behavior (Shariff et al., 2015). We first argue that this meta-analysis suffers from a number of methodological shortcomings that limit the conclusions that can be drawn about the potential benefits of religious priming. Next we present a re-analysis of the religious priming data using two different meta-analytic techniques. A Precision-Effect Testing–Precision-Effect-Estimate with Standard Error (PET-PEESE) meta-analysis suggests that the effect of religious priming is driven solely by publication bias. In contrast, an analysis using Bayesian bias correction suggests the presence of a religious priming effect, even after controlling for publication bias. These contradictory statistical results demonstrate that meta-analytic techniques alone may not be sufficiently robust to firmly establish the presence or absence of an effect. We argue that a conclusive resolution of the debate about the effect of religious priming on prosocial behavior – and about theoretically disputed effects more generally – requires a large-scale, preregistered replication project, which we consider to be the sole remedy for the adverse effects of experimenter bias and publication bias.
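The PET-PEESE procedure mentioned above amounts to two weighted regressions: PET regresses effect sizes on their standard errors, PEESE on their variances, and in each case the intercept estimates the effect a hypothetical infinitely precise study would find. The sketch below uses made-up study data showing the small-study pattern the method is designed to correct, not the religious-priming data analyzed in the paper:

```python
def weighted_intercept(y, x, w):
    """Intercept of a weighted least-squares line y = b0 + b1*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)))
    return my - b1 * mx

def pet_peese(effects, ses):
    """PET and PEESE bias-corrected estimates (sketch).

    PET: intercept of effect ~ SE; PEESE: intercept of effect ~ SE^2.
    In the full procedure, PEESE is used only when PET rejects a
    zero effect; here both intercepts are simply returned.
    """
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pet = weighted_intercept(effects, ses, w)
    peese = weighted_intercept(effects, [se ** 2 for se in ses], w)
    return pet, peese

# Hypothetical studies whose observed effects grow with their standard
# errors, the classic signature of publication bias
effects = [0.10, 0.20, 0.35, 0.50, 0.60]
ses = [0.05, 0.10, 0.20, 0.30, 0.35]
pet, peese = pet_peese(effects, ses)
```

On data like these, PET drives the corrected estimate toward zero while PEESE retains a small positive effect, which illustrates how the two corrections, and the Bayesian alternative the abstract discusses, can point in different directions on the same data.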
Article
Full-text available
Empirically analyzing empirical evidence One of the central goals in any scientific endeavor is to understand causality. Experiments that seek to demonstrate a cause/effect relation most often manipulate the postulated causal factor. Aarts et al. describe the replication of 100 experiments reported in papers published in 2008 in three high-ranking psychology journals. Assessing whether the replication and the original experiment yielded the same result according to several criteria, they find that about one-third to one-half of the original findings were also observed in the replication study. Science, this issue: 10.1126/science.aac4716
Book
Described by the philosopher A.J. Ayer as a work of ‘great originality and power’, this book revolutionized contemporary thinking on science and knowledge. Ideas such as the now legendary doctrine of ‘falsificationism’ electrified the scientific community, influencing even working scientists, as well as post-war philosophy. This astonishing work ranks alongside The Open Society and Its Enemies as one of Popper’s most enduring books and contains insights and arguments that demand to be read to this day. © 1959, 1968, 1972, 1980 Karl Popper and 1999, 2002 The Estate of Karl Popper. All rights reserved.
Chapter
Introduction When on board H.M.S. ‘Beagle,’ as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me...
Chapter
Two books have been particularly influential in contemporary philosophy of science: Karl R. Popper's Logic of Scientific Discovery, and Thomas S. Kuhn's Structure of Scientific Revolutions. Both agree upon the importance of revolutions in science, but differ about the role of criticism in science's revolutionary growth. This volume arose out of a symposium on Kuhn's work, with Popper in the chair, at an international colloquium held in London in 1965. The book begins with Kuhn's statement of his position followed by seven essays offering criticism and analysis, and finally by Kuhn's reply. The book will interest senior undergraduates and graduate students of the philosophy and history of science, as well as professional philosophers, philosophically inclined scientists, and some psychologists and sociologists.
Article
Although the benefits of crowdsourcing research models have been outlined elsewhere, very little attention has been paid to the application of these models to cross-cultural behavioral research. In this manuscript, we delineate two types of crowdsourcing initiatives—researcher crowdsourced and participant crowdsourced. Researcher crowdsourced refers to initiatives where researchers are gathered to work toward a shared goal. Participant crowdsourced refers to those which allow a researcher to gather a large number of participants within a short time frame. We explore the utility of each type of initiative while providing readers with a framework that can be used when deciding whether researcher or participant crowdsourcing initiatives would be most fruitful for their work. Perceived strengths of a researcher crowdsourced initiative with a cross-cultural focus are based on contributor data from Psi Chi’s Network for International Collaborative Exchange (NICE) and are integrated into this framework. Claims are made for the utility of both researcher and participant crowdsourcing as a way to increase generalizability and reliability, decrease time burdens, democratize research, educate individuals on open science, and provide mentorship. These claims are supported with data from NICE contributors.
Article
In recent years, psychology has wrestled with the broader implications of disappointing rates of replication of previously demonstrated effects. This article proposes that many aspects of this pattern of results can be understood within the classic framework of four proposed forms of validity: statistical conclusion validity, internal validity, construct validity, and external validity. The article explains the conceptual logic for how differences in each type of validity across an original study and a subsequent replication attempt can lead to replication “failure.” Existing themes in the replication literature related to each type of validity are also highlighted. Furthermore, empirical evidence is considered for the role of each type of validity in non-replication. The article concludes with a discussion of broader implications of this classic validity framework for improving replication rates in psychological research.
Book
Statistical Inference as Severe Testing, by Deborah G. Mayo (Cambridge Core, Statistical Theory and Methods).
Article
We analyze how academic experts and nonexperts forecast the results of 15 piece-rate and behavioral treatments in a real-effort task. The average forecast of experts closely predicts the experimental results, with a strong wisdom-of-crowds effect: the average forecast outperforms 96 percent of individual forecasts. Citations, academic rank, field, and contextual experience do not correlate with accuracy. Experts as a group do better than nonexperts, but not if accuracy is defined as rank-ordering treatments. Measures of effort, confidence, and revealed ability are predictive of forecast accuracy to some extent and allow us to identify “superforecasters” among the nonexperts.
Article
The bureaucratization of psychological science exacts intellectual costs that go beyond the sheer amount of time that is drained away from creative scientific activity. Additional administrative hurdles are now being generated in an attempt to ensure the replicability of psychological effects. A cognitive analysis of those hurdles shows that impairment of scientific creativity is a foreseeable consequence, owing to their frequent verbatim-processing focus and the negative emotional context in which they are embedded. We consider whether it is possible to enhance replicability without increasing bureaucratic obstacles and to enhance scientific creativity in the presence of such obstacles.
Article
Replication is the scientific gold standard that enables the confirmation of research findings. Concerns related to publication bias, flexibility in data analysis, and high-profile cases of academic misconduct have led to recent calls for more replication and systematic accumulation of scientific knowledge in psychological science. This renewed emphasis on replication may pose specific challenges to cross-cultural research due to inherent practical difficulties in emulating an original study in other cultural groups. The purpose of the present article is to discuss how the core concepts of this replication debate apply to cross-cultural psychology. Distinct to replications in cross-cultural research are examinations of bias and equivalence in manipulations and procedures, and that targeted research populations may differ in meaningful ways. We identify issues in current psychological research (analytic flexibility, low power) and possible solutions (preregistration, power analysis), and discuss ways to implement best practices in cross-cultural replication attempts.
Article
In 2010–2012, a few largely coincidental events led experimental psychologists to realize that their approach to collecting, analyzing, and reporting data made it too easy to publish false-positive findings. This sparked a period of methodological reflection that we review here and call Psychology’s Renaissance. We begin by describing how psychologists’ concerns with publication bias shifted from worrying about file-drawered studies to worrying about p-hacked analyses. We then review the methodological changes that psychologists have proposed and, in some cases, embraced. In describing how the renaissance has unfolded, we attempt to describe different points of view fairly but not neutrally, so as to identify the most promising paths forward. In so doing, we champion disclosure and preregistration, express skepticism about most statistical solutions to publication bias, take positions on the analysis and interpretation of replication failures, and contend that meta-analytical thinking increases the prevalence of false positives. Our general thesis is that the scientific practices of experimental psychologists have improved dramatically.
Article
Psychological scientists draw inferences about populations based on samples—of people, situations, and stimuli—from those populations. Yet, few papers identify their target populations, and even fewer justify how or why the tested samples are representative of broader populations. A cumulative science depends on accurately characterizing the generality of findings, but current publishing standards do not require authors to constrain their inferences, leaving readers to assume the broadest possible generalizations. We propose that the discussion section of all primary research articles specify Constraints on Generality (i.e., a “COG” statement) that identify and justify target populations for the reported findings. Explicitly defining the target populations will help other researchers to sample from the same populations when conducting a direct replication, and it could encourage follow-up studies that test the boundary conditions of the original finding. Universal adoption of COG statements would change publishing incentives to favor a more cumulative science.
Article
A major challenge for accumulating knowledge in psychology is the variation in methods and participant populations across studies in a single domain. We offer a systematic approach to addressing this challenge and implement it in the domain of money priming. In three preregistered experiments (N = 4,649), participants were exposed to one of a number of money manipulations before completing self-report measures of money activation (Study 1); engaging in a behavioral-persistence task (Study 3); completing self-report measures of subjective wealth, self-sufficiency, and communion-agency (Studies 1-3); and completing demographic questions (Studies 1-3). Four of the five manipulations we tested activated the concept of money, but, contrary to what we expected based on the preponderance of the published literature, no manipulation consistently affected any dependent measure. Moderation by sociodemographic characteristics was sparse and inconsistent across studies. We discuss implications for theories of money priming and explain how our approach can complement recent efforts to build a reproducible, cumulative psychological science.
Article
Pre-registration of studies before they are conducted has recently become more feasible for researchers, and is encouraged by an increasing number of journals. However, because the practice of pre-registration is relatively new to psychological science, specific guidelines for the content of registrations are still in a formative stage. After giving a brief history of pre-registration in medical and psychological research, we outline two different models that can be applied—reviewed and unreviewed pre-registration—and discuss the advantages of each model to science as a whole and to the individual scientist, as well as some of their drawbacks and limitations. Finally, we present and justify a proposed standard template that can facilitate pre-registration. Researchers can use the template before and during the editorial process to meet article requirements and enhance the robustness of their scholarly efforts.
Article
Evidence is reviewed which suggests that there may be little or no direct introspective access to higher order cognitive processes. Subjects are sometimes (a) unaware of the existence of a stimulus that importantly influenced a response, (b) unaware of the existence of the response, and (c) unaware that the stimulus has affected the response. It is proposed that when people attempt to report on their cognitive processes, that is, on the processes mediating the effects of a stimulus on a response, they do not do so on the basis of any true introspection. Instead, their reports are based on a priori, implicit causal theories, or judgments about the extent to which a particular stimulus is a plausible cause of a given response. This suggests that though people may not be able to observe directly their cognitive processes, they will sometimes be able to report accurately about them. Accurate reports will occur when influential stimuli are salient and are plausible causes of the responses they produce, and will not occur when stimuli are not salient or are not plausible causes.
Article
Another social science looks at itself. Experimental economists have joined the reproducibility discussion by replicating selected published experiments from two top-tier journals in economics. Camerer et al. found that two-thirds of the 18 studies examined yielded replicable estimates of effect size and direction. This proportion is somewhat lower than unaffiliated experts were willing to bet in an associated prediction market, but roughly in line with expectations from sample sizes and P values. Science, this issue, p. 1433.
Article
Third-party punishment (TPP), in which unaffected observers punish selfishness, promotes cooperation by deterring defection. But why should individuals choose to bear the costs of punishing? We present a game theoretic model of TPP as a costly signal of trustworthiness. Our model is based on individual differences in the costs and/or benefits of being trustworthy. We argue that individuals for whom trustworthiness is payoff-maximizing will find TPP to be less net costly (for example, because mechanisms that incentivize some individuals to be trustworthy also create benefits for deterring selfishness via TPP). We show that because of this relationship, it can be advantageous for individuals to punish selfishness in order to signal that they are not selfish themselves. We then empirically validate our model using economic game experiments. We show that TPP is indeed a signal of trustworthiness: third-party punishers are trusted more, and actually behave in a more trustworthy way, than non-punishers. Furthermore, as predicted by our model, introducing a more informative signal - the opportunity to help directly - attenuates these signalling effects. When potential punishers have the chance to help, they are less likely to punish, and punishment is perceived as, and actually is, a weaker signal of trustworthiness. Costly helping, in contrast, is a strong and highly used signal even when TPP is also possible. Together, our model and experiments provide a formal reputational account of TPP, and demonstrate how the costs of punishing may be recouped by the long-run benefits of signalling one's trustworthiness.
Article
There is increasing concern about the reproducibility of scientific research. For example, the costs associated with irreproducible preclinical research alone have recently been estimated at US$28 billion a year in the United States. However, there are currently no mechanisms in place to quickly identify findings that are unlikely to replicate. We show that prediction markets are well suited to bridge this gap. Prediction markets set up to estimate the reproducibility of 44 studies published in prominent psychology journals and replicated in The Reproducibility Project: Psychology predict the outcomes of the replications well and outperform a survey of individual forecasts.
Article
Trying to remember something now typically improves your ability to remember it later. However, after watching a video of a simulated bank robbery, participants who verbally described the robber were 25% worse at identifying the robber in a lineup than were participants who instead listed U.S. states and capitals—this has been termed the “verbal overshadowing” effect (Schooler & Engstler-Schooler, 1990). More recent studies suggested that this effect might be substantially smaller than first reported. Given uncertainty about the effect size, the influence of this finding in the memory literature, and its practical importance for police procedures, we conducted two collections of preregistered direct replications (RRR1 and RRR2) that differed only in the order of the description task and a filler task. In RRR1, when the description task immediately followed the robbery, participants who provided a description were 4% less likely to select the robber than were those in the control condition. In RRR2, when the description was delayed by 20 min, they were 16% less likely to select the robber. These findings reveal a robust verbal overshadowing effect that is strongly influenced by the relative timing of the tasks. The discussion considers further implications of these replications for our understanding of verbal overshadowing. Multilab direct replication of: Study 4 (modified) and Study 1 from Schooler, J. W., & Engstler-Schooler, T. Y. (1990). Verbal overshadowing of visual memories: Some things are better left unsaid. Cognitive Psychology, 22, 36–71.
Book
Part I. From There to Here - Theoretical Background:
1. From viciousness to viciousness: theories of intergroup relations
2. Social dominance theory as a new synthesis
Part II. Oppression and its Psycho-Ideological Elements:
3. The psychology of group dominance: social dominance orientation
4. Let's both agree that you're really stupid: the power of consensual ideology
Part III. The Circle of Oppression - The Myriad Expressions of Institutional Discrimination:
5. You stay in your part of town and I'll stay in mine: discrimination in the housing and retail markets
6. They're just too lazy to work: discrimination in the labor market
7. They're just mentally and physically unfit: discrimination in education and health care
8. The more of 'them' in prison, the better: institutional terror, social control and the dynamics of the criminal justice system
Part IV. Oppression as a Cooperative Game:
9. Social hierarchy and asymmetrical group behavior: social hierarchy and group difference in behavior
10. Sex and power: the intersecting political psychologies of patriarchy and arbitrary-set hierarchy
11. Epilogue
Article
The developmental course of implicit and explicit gender attitudes from age 5 to adulthood is investigated. Findings demonstrate that implicit and explicit own-gender preferences emerge early in both boys and girls, but implicit own-gender preferences are stronger in young girls than boys. In addition, female participants' attitudes remain largely stable over development, whereas male participants' implicit and explicit attitudes show an age-related shift towards increasing female positivity. Gender attitudes are an anomaly in that social evaluations dissociate from social status, with both male and female participants tending to evaluate female more positively than male.
Article
I suspect that many members of our field, including those in leadership positions, believe that our hypercommitment to theory - and particularly the requirement that every article must contribute to theory - is somehow on the side of the angels. They may believe that this is a hallmark of a serious field. They may believe that theory is good and that the "mere" description of phenomena and generation of facts are bad. Worse yet, they may have given no thought to these matters, accepting our field's zeal about theory as simply part of the cosmos. My aim has been to promote a rethinking of these positions. Theory is critically important for our field, and we should remain committed to it. And, for sure, the greatest acclaim will always go to those who develop breakthrough theories. So there is plenty of incentive to keep working on theory. But it takes much more than theory for an academic field to advance. Indeed, various types of atheoretical or pretheoretical work can be instrumental in allowing theory to emerge or develop. Thus, our insistence in the field of management that all papers contribute to theory may actually have the unintended perverse effect of stymying the discovery of important theories. More broadly, this norm - or policy, really - is holding back our field.