Article

Psychological Science Replicates Just Fine, Thanks

Article
Full-text available
Meehl argued in 1978 that theories in psychology come and go, with little cumulative progress. We believe that this assessment still holds, as also evidenced by increasingly common claims that psychology is facing a “theory crisis” and that psychologists should invest more in theory building. In this article, we argue that the root cause of the theory crisis is that developing good psychological theories is extremely difficult and that understanding the reasons why it is so difficult is crucial for moving forward in the theory crisis. We discuss three key reasons based on philosophy of science for why developing good psychological theories is so hard: the relative lack of robust phenomena that impose constraints on possible theories, problems of validity of psychological constructs, and obstacles to discovering causal relationships between psychological variables. We conclude with recommendations on how to move past the theory crisis.
Article
Full-text available
For any scientific report, repeating the original analyses upon the original data should yield the original outcomes. We evaluated analytic reproducibility in 25 Psychological Science articles awarded open data badges between 2014 and 2015. Initially, 16 (64%, 95% confidence interval [43,81]) articles contained at least one ‘major numerical discrepancy’ (>10% difference), prompting us to request input from original authors. Ultimately, target values were reproducible without author involvement for 9 (36% [20,59]) articles; reproducible with author involvement for 6 (24% [8,47]) articles; not fully reproducible with no substantive author response for 3 (12% [0,35]) articles; and not fully reproducible despite author involvement for 7 (28% [12,51]) articles. Overall, 37 major numerical discrepancies remained out of 789 checked values (5% [3,6]), but original conclusions did not appear affected. Non-reproducibility was primarily caused by unclear reporting of analytic procedures. These results highlight that open data alone is not sufficient to ensure analytic reproducibility.
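The bracketed ranges above are 95% confidence intervals on proportions. A minimal sketch of how such an interval can be computed, assuming a Wilson score interval (the article's exact method is not stated in this abstract):

```python
# Minimal sketch: 95% CI for a reproducibility proportion, e.g. 9 of 25
# articles reproducible without author involvement. Assumes a Wilson
# score interval; the original article's exact method may differ.
from statsmodels.stats.proportion import proportion_confint

successes, n = 9, 25
lower, upper = proportion_confint(successes, n, alpha=0.05, method="wilson")
print(f"{successes}/{n} = {successes / n:.0%}, 95% CI [{lower:.0%}, {upper:.0%}]")
```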
Article
Full-text available
Access to data is a critical feature of an efficient, progressive and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data (‘analytic reproducibility’). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data availability statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles, all target values were reproduced, but 11 of these required author assistance. For 13 articles, at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.
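The policy effect was estimated with interrupted time-series analysis. A minimal sketch of the underlying segmented-regression idea, using hypothetical monthly sharing rates rather than the journal's actual data:

```python
# Minimal sketch of an interrupted time series (segmented regression):
# regress an outcome on time, a post-policy indicator (level change), and
# time elapsed since the policy (slope change). The data below are
# hypothetical, not the journal's actual monthly data-sharing rates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
months = np.arange(48)                      # 48 months of observations
policy = (months >= 24).astype(float)       # policy introduced at month 24
since = np.clip(months - 24, 0, None)       # months elapsed since policy
rate = 0.25 + 0.40 * policy + 0.005 * since + rng.normal(0, 0.05, 48)

X = sm.add_constant(np.column_stack([months, policy, since]))
fit = sm.OLS(rate, X).fit()
print(fit.params)  # [intercept, pre-trend, level change, slope change]
```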
Article
Full-text available
We propose to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.
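A stricter threshold mainly costs sample size. A rough sketch of the trade-off, assuming a two-sample t test, 80% power, and an illustrative effect size of d = 0.4:

```python
# Rough sketch: sample size per group needed for 80% power at the
# conventional and the proposed significance thresholds, assuming a
# two-sample t test and an effect size of d = 0.4 (illustrative choice).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.005):
    n = analysis.solve_power(effect_size=0.4, alpha=alpha, power=0.8)
    print(f"alpha = {alpha}: about {n:.0f} participants per group")
```

For this effect size, moving from alpha = .05 to alpha = .005 increases the required group size by roughly 70%.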
Article
Full-text available
We argue that over the last 50 years, incentives for academic scientists have become increasingly perverse in terms of competition for research funding, development of quantitative metrics to measure performance, and a changing business model for higher education itself. Furthermore, decreased discretionary funding at the federal and state level is creating a hypercompetitive environment between government agencies (e.g., EPA, NIH, CDC), for scientists in these agencies, and for academics seeking funding from all sources. The combination of perverse incentives and decreased funding increases pressures that can lead to unethical behavior. If a critical mass of scientists becomes untrustworthy, a tipping point is possible in which the scientific enterprise itself becomes inherently corrupt and public trust is lost, risking a new dark age with devastating consequences to humanity. Academia and federal agencies should better support science as a public good, and incentivize altruistic and ethical outcomes, while de-emphasizing output.
Article
Full-text available
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favor them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing (no deliberate cheating or loafing) by scientists, only that publication is a principal factor for career advancement. Some normative methods of analysis have almost certainly been selected to further publication instead of discovery. In order to improve the culture of science, a shift must be made away from correcting misunderstandings and towards rewarding understanding. We support this argument with empirical evidence and computational modeling. We first present a 60-year meta-analysis of statistical power in the behavioral sciences and show that power has not improved despite repeated demonstrations of the necessity of increasing power. To demonstrate the logical consequences of structural incentives, we then present a dynamic model of scientific communities in which competing laboratories investigate novel or previously published hypotheses using culturally transmitted research methods. As in the real world, successful labs produce more "progeny", such that their methods are more often copied and their students are more likely to start labs of their own. Selection for high output leads to poorer methods and increasingly high false discovery rates. We additionally show that replication slows but does not stop the process of methodological deterioration. Improving the quality of research requires change at the institutional level.
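The selection dynamic is easy to caricature in code. The following is a toy illustration, not the authors' actual model: labs vary in rigor, lax labs generate more publishable positives, and the most productive labs are preferentially copied, so mean rigor drifts downward:

```python
# Toy illustration (not the authors' actual model): labs with lower rigor
# have inflated false-positive rates, so they produce more positive,
# publishable results; productive labs are copied, and mean rigor falls.
import random

random.seed(1)
labs = [random.uniform(0.2, 0.9) for _ in range(100)]  # rigor in (0.2, 0.9)

def publications(rigor, n_studies=50, base_rate=0.1):
    pubs = 0
    for _ in range(n_studies):
        true_effect = random.random() < base_rate
        if true_effect:
            found = random.random() < 0.8            # power held constant
        else:
            found = random.random() < 0.05 / rigor   # low rigor inflates FPR
        pubs += found                                # only positives publish
    return pubs

for generation in range(20):
    scored = sorted(labs, key=publications, reverse=True)
    survivors = scored[:50]                          # top half reproduces
    labs = survivors + [min(0.9, max(0.2, r + random.gauss(0, 0.02)))
                        for r in survivors]          # copied with mutation
    if generation % 5 == 0:
        print(f"gen {generation}: mean rigor = {sum(labs) / len(labs):.2f}")
```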
Article
Full-text available
Openness is a core value of scientific practice. The sharing of research materials and data facilitates critique, extension, and application within the scientific community, yet current norms provide few incentives for researchers to share evidence underlying scientific claims. In January 2014, the journal Psychological Science adopted such an incentive by offering “badges” to acknowledge and signal open practices in publications. In this study, we evaluated the effect that two types of badges—Open Data badges and Open Materials badges—have had on reported data and material sharing, as well as on the actual availability, correctness, usability, and completeness of those data and materials both in Psychological Science and in four comparison journals. We report an increase in reported data sharing of more than an order of magnitude from baseline in Psychological Science, as well as an increase in reported materials sharing, although to a weaker degree. Moreover, we show that reportedly available data and materials were more accessible, correct, usable, and complete when badges were earned. We demonstrate that badges are effective incentives that improve the openness, accessibility, and persistence of data and materials that underlie scientific research.
Chapter
Full-text available
In this chapter, Binswanger (a critic of the current scientific process) explains how artificially staged competitions affect science and how they result in nonsense. An economist himself, Binswanger provides examples from his field and shows how impact factors and publication pressure reduce the quality of scientific publications. Some might know his work and arguments from his book ‘Sinnlose Wettbewerbe’.
Article
Full-text available
The data include measures collected for the two experiments reported in “False-Positive Psychology” [1], where listening to a randomly assigned song made people feel older (Study 1) or actually be younger (Study 2). These data are useful for educational purposes because they illustrate how flexibility in data collection, analysis, and reporting of results inflates false-positive rates.
Article
Full-text available
Academic systems rely on the existence of a supply of “outsiders” ready to forgo wages and employment security in exchange for the prospect of uncertain security, prestige, freedom and reasonably high salaries that tenured positions entail. Drawing on data from the US, Germany and the UK, this paper looks at how the academic job market is structured in many respects like a drug gang, with an expanding mass of outsiders and a shrinking core of insiders.
Article
Full-text available
The veracity of substantive research claims hinges on the way experimental data are collected and analyzed. In this article, we discuss an uncomfortable fact that threatens the core of psychology’s academic enterprise: almost without exception, psychologists do not commit themselves to a method of data analysis before they see the actual data. It then becomes tempting to fine-tune the analysis to the data in order to obtain a desired result—a procedure that invalidates the interpretation of the common statistical tests. The extent of the fine-tuning varies widely across experiments and experimenters but is almost impossible for reviewers and readers to gauge. To remedy the situation, we propose that researchers preregister their studies and indicate in advance the analyses they intend to conduct. Only these analyses deserve the label “confirmatory,” and only for these analyses are the common statistical tests valid. Other analyses can be carried out but these should be labeled “exploratory.” We illustrate our proposal with a confirmatory replication attempt of a study on extrasensory perception.
Article
Full-text available
Theories in "soft" areas of psychology (e.g., clinical, counseling, social, personality, school, and community) lack the cumulative character of scientific knowledge because they tend neither to be refuted nor corroborated, but instead merely fade away as people lose interest. Even though intrinsic subject-matter difficulties (20 are listed) contribute to this, the excessive reliance on significance testing in the tradition of R. A. Fisher is partly responsible. Karl Popper's approach, with modifications, would be prophylactic. Since the null hypothesis is quasi-always false, tables summarizing research in terms of patterns of "significant differences" are little more than complex, causally uninterpretable outcomes of statistical power functions. Multiple paths to estimating numerical point values ("consistency tests") are better, even if approximate with rough tolerances; and lacking this, ranges, orderings, 2nd-order differences, curve peaks and valleys, and function forms should be used. Such methods are usual in developed sciences that seldom report statistical significance. Consistency tests of a conjectural taxometric model yielded 94% success with no false negatives.
Article
Full-text available
Examines certain constraints on the character of the knowledge claims made by the psychology of the past century, as well as some "in-principle" constraints. A syndrome of "ameaningful thinking" is seen to underlie much of modern scholarship, especially the inquiring practices of the psychological sciences. Ameaningful thought regards knowledge as an almost automatic result of a self-corrective rule structure, a fail-proof heuristic, a methodology—rather than of discovery. In consequence, much of psychological history can be seen as a form of scientistic role playing which, however sophisticated, entails the trivialization, and even evasion, of significant problems. Against a background of such considerations, the author considers whether, after the century-long march of psychology under the banner of "independent, experimental science," the field actually is (a) independent and (b) a science.
Article
Full-text available
In this methodological commentary, we use Bem's (2011) recent article reporting experimental evidence for psi as a case study for discussing important deficiencies in modal research practice in empirical psychology. We focus on (a) overemphasis on conceptual rather than close replication, (b) insufficient attention to verifying the soundness of measurement and experimental procedures, and (c) flawed implementation of null hypothesis significance testing. We argue that these deficiencies contribute to weak method-relevant beliefs that, in conjunction with overly strong theory-relevant beliefs, lead to a systemic and pernicious bias in the interpretation of data that favors a researcher's theory. Ultimately, this interpretation bias increases the risk of drawing incorrect conclusions about human psychology. Our analysis points to concrete recommendations for improving research practice in empirical psychology. We recommend (a) a stronger emphasis on close replication, (b) routinely verifying the integrity of measurement instruments and experimental procedures, and (c) using stronger, more diagnostic forms of null hypothesis testing.
Article
Full-text available
"If psychology is to live up to the purview of its very definition, then it must be that science whose problems lie closest to those of the humanities; indeed it must be that area in which the problems of the sciences, as traditionally conceived, and the humanities intersect… . It is clear that psychology needs many individuals having sensitivities overlapping with those of the humanist." Psychology must take the lead in exploring the relations between science and the humanities. From Psyc Abstracts 36:04:4AK29K. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
An academic scientist's professional success depends on publishing. Publishing norms emphasize novel, positive results. As such, disciplinary incentives encourage design, analysis, and reporting decisions that elicit positive results and ignore negative results. Prior reports demonstrate how these incentives inflate the rate of false effects in published science. When incentives favor novelty over replication, false results persist in the literature unchallenged, reducing efficiency in knowledge accumulation. Previous suggestions to address this problem are unlikely to be effective. For example, a journal of negative results publishes otherwise unpublishable reports. This enshrines the low status of the journal and its content. The persistence of false findings can be meliorated with strategies that make the fundamental but abstract accuracy motive (getting it right) competitive with the more tangible and concrete incentive (getting it published). This article develops strategies for improving scientific practices and knowledge accumulation that account for ordinary human motivations and biases.
Article
Full-text available
The perspective that behavior is often driven by unconscious determinants has become widespread in social psychology. Bargh, Chen, and Burrows' (1996) famous study, in which participants unwittingly exposed to the stereotype of age walked slower when exiting the laboratory, was instrumental in defining this perspective. Here, we present two experiments aimed at replicating the original study. Despite the use of automated timing methods and a larger sample, our first experiment failed to show priming. Our second experiment was aimed at manipulating the beliefs of the experimenters: Half were led to think that participants would walk slower when primed congruently, and the other half were led to expect the opposite. Strikingly, we obtained a walking speed effect, but only when experimenters believed participants would indeed walk slower. This suggests that both priming and experimenters' expectations are instrumental in explaining the walking speed effect. Further, debriefing was suggestive of awareness of the primes. We conclude that unconscious behavioral priming, while real, involves mechanisms different from those typically assumed to cause the effect.
Article
Full-text available
In this article, we accomplish two things. First, we show that despite empirical psychologists' nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.
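The kind of simulation the authors describe is compact enough to sketch. Below is a minimal example of one researcher degree of freedom, optional stopping, under a true null hypothesis; the authors' actual simulations cover several such degrees of freedom:

```python
# Minimal sketch of one researcher degree of freedom: optional stopping.
# Under a true null, testing at n = 20 per group and, if non-significant,
# again after adding 10 more observations per group pushes the
# false-positive rate above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
false_positives = 0
n_sims = 10_000
for _ in range(n_sims):
    a, b = rng.normal(size=(2, 30))          # two groups, no true difference
    if stats.ttest_ind(a[:20], b[:20]).pvalue < 0.05:
        false_positives += 1                 # significant at first peek
    elif stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1                 # significant after adding 10
print(f"false-positive rate: {false_positives / n_sims:.1%}")  # > 5%
```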
Article
Full-text available
The term psi denotes anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms. Two variants of psi are precognition (conscious cognitive awareness) and premonition (affective apprehension) of a future event that could not otherwise be anticipated through any known inferential process. Precognition and premonition are themselves special cases of a more general phenomenon: the anomalous retroactive influence of some future event on an individual's current responses, whether those responses are conscious or nonconscious, cognitive or affective. This article reports 9 experiments, involving more than 1,000 participants, that test for retroactive influence by "time-reversing" well-established psychological effects so that the individual's responses are obtained before the putatively causal stimulus events occur. Data are presented for 4 time-reversed effects: precognitive approach to erotic stimuli and precognitive avoidance of negative stimuli; retroactive priming; retroactive habituation; and retroactive facilitation of recall. The mean effect size (d) in psi performance across all 9 experiments was 0.22, and all but one of the experiments yielded statistically significant results. The individual-difference variable of stimulus seeking, a component of extraversion, was significantly correlated with psi performance in 5 of the experiments, with participants who scored above the midpoint on a scale of stimulus seeking achieving a mean effect size of 0.43. Skepticism about psi, issues of replication, and theories of psi are also discussed.
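For context on the replication issues mentioned: an effect of d = 0.22 requires substantial samples to detect reliably. A quick sketch, assuming a one-sample t test at 80% power (an illustrative design choice, not necessarily Bem's):

```python
# Quick sketch: sample size needed for 80% power to detect d = 0.22,
# assuming a one-sample t test at alpha = .05 (an illustrative design,
# not necessarily the original experiments' exact one).
from statsmodels.stats.power import TTestPower

n = TTestPower().solve_power(effect_size=0.22, alpha=0.05, power=0.8)
print(f"about {n:.0f} participants")  # roughly 160+
```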
Article
Full-text available
Previous research has shown that trait concepts and stereotypes become active automatically in the presence of relevant behavior or stereotyped-group features. Using the same priming procedures as in previous impression-formation research, Experiment 1 showed that participants whose concept of rudeness was primed interrupted the experimenter more quickly and frequently than did participants primed with polite-related stimuli. In Experiment 2, participants for whom an elderly stereotype was primed walked more slowly down the hallway when leaving the experiment than did control participants, consistent with the content of that stereotype. In Experiment 3, participants for whom the African American stereotype was primed subliminally reacted with more hostility to a vexatious request of the experimenter. Implications of this automatic behavior priming effect for self-fulfilling prophecies are discussed, as is whether social behavior is necessarily mediated by conscious choice processes.
Article
Registered reports present a substantial departure from traditional publishing models with the goal of enhancing the transparency and credibility of the scientific literature. We map the evolving universe of registered reports to assess their growth, implementation and shortcomings at journals across scientific disciplines.
Article
The “replication crisis” has been attributed to misguided external incentives gamed by researchers (the strategic-game hypothesis). Here, I want to draw attention to a complementary internal factor, namely, researchers’ widespread faith in a statistical ritual and associated delusions (the statistical-ritual hypothesis). The “null ritual,” unknown in statistics proper, eliminates judgment precisely at points where statistical theories demand it. The crucial delusion is that the p value specifies the probability of a successful replication (i.e., 1 – p), which makes replication studies appear to be superfluous. A review of studies with 839 academic psychologists and 991 students shows that the replication delusion existed among 20% of the faculty teaching statistics in psychology, 39% of the professors and lecturers, and 66% of the students. Two further beliefs, the illusion of certainty (e.g., that statistical significance proves that an effect exists) and Bayesian wishful thinking (e.g., that the probability of the alternative hypothesis being true is 1 – p), also make successful replication appear to be certain or almost certain, respectively. In every study reviewed, the majority of researchers (56%–97%) exhibited one or more of these delusions. Psychology departments need to begin teaching statistical thinking, not rituals, and journal editors should no longer accept manuscripts that report results as “significant” or “not significant.”
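The "1 − p" replication delusion can be checked by simulation: even with a real effect and an original result near p = .05, an exact replication succeeds at a rate set by the study's power, not by 1 − p. A minimal sketch (the effect size and sample size are illustrative choices that put power near 50%):

```python
# Minimal sketch of the 'replication delusion': an original result with
# p close to .05 does not imply a 95% chance that a replication succeeds.
# With a true effect sized so that power is about 50%, an exact
# replication is significant only about half the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, d = 50, 0.28                         # chosen so power is near 50%
successes = trials = 0
for _ in range(20_000):
    original = rng.normal(d, 1, n)
    if 0.04 < stats.ttest_1samp(original, 0).pvalue < 0.06:   # p near .05
        trials += 1
        replication = rng.normal(d, 1, n)
        successes += stats.ttest_1samp(replication, 0).pvalue < 0.05
print(f"replication success rate: {successes / trials:.0%}")  # ~50%, not 95%
```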
Article
The issues raised by the "historical" view of social psychology were discussed by Lewin as early as 1927. Lewin sharply separates historical and systematic analysis of psychological events. His distinction between "historical-geographic" concepts and "conditional-genetic" concepts is examined. Perhaps surprisingly, Lewin supports many of the views of the "historians," while rejecting their conclusion that laws are impossible. Lewin argues that lawful concepts are concrete, real, content-full and refer to potentialities. They include feedback loops, values, and perceptual-cognitive variables and apply to unique events. Lewin's work is placed in a neo-Kantian tradition which contrasts with our dominant tradition, which is based on Hume and Locke. This Lockean-Humean world view may be the source of much of our current frustration.
Article
In this article it is argued that—in spite of contrary semantic and substantive criticisms that have been put forward—the crisis in psychology is a real problem facing the discipline. The crisis is discussed as a nexus of philosophical tensions, which divide individuals, departments, and psychological organizations, and which are therefore primarily responsible for the fragmentation of psychology. Some of the major existing analyses of the crisis are critiqued, and it is subsequently concluded that even the major analyses themselves perpetuate the crisis since they fail to direct unification efforts to the underlying philosophical tensions without at least approaching them with unbracketed a priori theoretical commitments and assumptions. The article concludes with a discussion of the difficulties facing those psychologists who take on the program of research necessary for resolving the crisis in psychology.
Article
This article describes the logical structure of one type of empirical argument commonly used in psychological research. A characteristic flaw in its application is identified and illustrated with an analysis of a number of experiments. Intraindividual as well as social factors that contribute to the flaw's occurrence are discussed. The operation of the social factor is explored with an analysis of citation patterns in the literature. The citation analysis reveals the degree to which the flaw goes unnoticed, in deference to building a consensus of support for broad theoretical claims. The article closes with an outline of the decisions involved in choosing a research strategy and indicates the epistemic consequences of these choices.
Article
In this introductory article, we provide a historical and philosophical framework for studying crisis discussions in psychology. We first trace the various meanings of crisis talk outside and inside of the sciences. We then turn to Kuhn's concept of crisis, which is mainly an analyst's category referring to severe clashes between theory and data. His view has also dominated many discussions on the status of psychology: Can it be considered a "mature" science, or are we dealing here with a pre- or multi-paradigmatic discipline? Against these Kuhnian perspectives, we point out that, especially but not only in psychology, distinctive crisis declarations and debates have taken place since at least the late 19th century. In these, quite different usages of crisis talk have emerged, which can be determined by looking at (a) the content and (b) the dimensions of the declarations, as well as (c) the functions these declarations had for their authors. Thus, in psychology at least, 'crisis' has been a vigorous actor's category, occasionally having actual effects on the future course of research. While such crisis declarations need not be taken at face value, they nevertheless help to break the spell of Kuhnian analyses of psychology's history. They should inform ways in which the history and philosophy of psychology is studied further.
Article
Cases of clear scientific misconduct have received significant media attention recently, but less flagrantly questionable research practices may be more prevalent and, ultimately, more damaging to the academic enterprise. Using an anonymous elicitation format supplemented by incentives for honest reporting, we surveyed over 2,000 psychologists about their involvement in questionable research practices. The impact of truth-telling incentives on self-admissions of questionable research practices was positive, and this impact was greater for practices that respondents judged to be less defensible. Combining three different estimation methods, we found that the percentage of respondents who have engaged in questionable practices was surprisingly high. This finding suggests that some questionable practices may constitute the prevailing research norm.
Priming effects replicate just fine, thanks
  • J A Bargh
In peer-review we (don't) trust: How peer-review's filtering poses a systemic risk to science
  • H Crane
Shall we really do it again? The powerful concept of replication is neglected in the social sciences
  • S Schmidt