
Data from Paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”

Authors: Joseph P. Simmons (The Wharton School, University of Pennsylvania), Leif D. Nelson (Haas Marketing Group, University of California, Berkeley), Uri Simonsohn (The Wharton School, University of Pennsylvania)

Abstract

The data include measures collected for the two experiments reported in “False-Positive Psychology” [1], in which listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2). These data are useful because they illustrate how flexibility in data collection, analysis, and reporting of results inflates false-positive rates, making them especially suitable for educational purposes.

Keywords: False-Positive psychology; methodology; motivated reasoning; publication bias; disclosure; p-hacking
(1) Overview
Collection date(s)
2010
Background
We wanted an experiment arriving at a necessarily false finding. We settled on age based on self-reported birthday, as that would seem impossible to shift even through measurement error.
(2) Methods
Sample
The Wharton School has a behavioral lab where people are paid to participate. They usually complete several studies in a single session and receive a flat fee plus any additional payment that individual experiments within the session may offer. More specific demographics are included in the data themselves.
Given the light nature of the study, we did not monitor incomplete submissions, so we do not know whether anyone started without finishing; however, this seldom if ever happens in this lab.
Materials
In both experiments, people listened to one of three music files: “Kalimba” by Mr. Scruff, which comes free with the Windows 7 operating system; “Hot Potato” by the Australian band The Wiggles; or “When I’m Sixty-Four” by the Beatles. Copyright restrictions make it impossible to post those songs here.
The questions were posted on Qualtrics (an online survey provider); after participants listened to the song through headphones, they proceeded to answer all questions.
Procedures
See above.
Quality control
None, given the setting.
Ethical issues
The study followed the ethical standards of the American Psychological Association and was approved by the Institutional Review Board of the Wharton School. There are no personal identifiers in the data beyond age and parents’ age, which are insufficient to identify individuals.
(3) Dataset description
Object name
Text files
• Study 1.txt
• Study 2.txt
• Codebook.txt
Excel
• Post Data - False Positive Psychology.xlsx
Data type
Raw data file
Format names and versions
The data are provided both as .txt files with a .txt codebook and as a self-contained Excel workbook (.xlsx).
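For users who work with the data programmatically, a minimal loading sketch in Python follows. This is illustrative only: the tab delimiter is an assumption (common for .txt exports), and the actual column names are defined in Codebook.txt.

import pandas as pd

# Illustrative sketch; Codebook.txt defines the actual columns, and the
# tab delimiter is an assumption -- inspect the files and adjust `sep`.
study1 = pd.read_csv("Study 1.txt", sep="\t")
study2 = pd.read_csv("Study 2.txt", sep="\t")
print(study1.shape, sorted(study1.columns))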
Data collectors
Paid staff at the lab.
Language
English
License
CC0
Repository location
http://doi.org/10.5281/zenodo.7664
Publication date
13 January 2014
(4) Reuse potential
Data from this highly cited paper are especially useful for educational purposes (e.g., teaching statistics) as well as for future research concerned with various statistical approaches.
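To illustrate the educational use named above, here is a minimal Python simulation of one form of the flexibility documented in the original paper: measuring two correlated dependent variables and reporting whichever reaches significance. It is a sketch under stated assumptions (n = 20 per cell, DVs correlated at r = .50), not the paper's own simulation code, which explored additional degrees of freedom such as optional stopping and covariates.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_cell, rho = 10_000, 20, 0.5
cov = [[1.0, rho], [rho, 1.0]]  # two DVs correlated at r = .50
false_positives = 0

for _ in range(n_sims):
    # Two conditions with NO true effect on either dependent variable.
    control = rng.multivariate_normal([0.0, 0.0], cov, size=n_per_cell)
    treatment = rng.multivariate_normal([0.0, 0.0], cov, size=n_per_cell)
    # Flexible analysis: t-test each DV, count a "finding" if either works.
    pvals = [stats.ttest_ind(treatment[:, k], control[:, k]).pvalue
             for k in range(2)]
    if min(pvals) < 0.05:
        false_positives += 1

print(f"False-positive rate: {false_positives / n_sims:.3f}")

With a nominal alpha of .05, this single undisclosed choice already pushes the false-positive rate to roughly .08–.10, in line with the inflation the original article reports for two correlated dependent variables.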
References
1. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366. DOI: http://dx.doi.org/10.1177/0956797611417632
How to cite this article: Simmons, J P, Nelson, L D and Simonsohn, U 2014 Data from Paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”. Journal of Open Psychology Data, 2(1): e1, DOI:
Published: 21 February 2014
Copyright
The Journal of Open Psychology Data is an open-access journal published by Ubiquity Press.