DATA PAPER

Data from Paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis”

Joseph P. Simmons,1 Leif D. Nelson,2 Uri Simonsohn1
1 The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
2 Haas Marketing Group, University of California, Berkeley, California, United States of America

Keywords: False-Positive psychology; methodology; motivated reasoning; publication bias; disclosure; p-hacking

Abstract

The data include measures collected for the two experiments reported in “False-Positive Psychology” [1], in which listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2). These data are useful because they illustrate how flexibility in data collection, analysis, and reporting of results inflates false-positive rates, and they are well suited for educational purposes.
(1) Overview
Collection date(s)
2010
Background
We wanted an experiment that would arrive at a necessarily false finding. We settled on age, based on self-reported birthday, as that would seem impossible to move around even through measurement error.
(2) Methods
Sample
The Wharton School has a behavioral lab where people are paid for participating. They usually complete several studies in a single session and receive a flat fee plus any additional payments that individual experiments within the session may offer. More specific demographics are included in the data themselves.
Given the light nature of the study we did not monitor incomplete submissions, so we do not know whether people started without completing, but this seldom if ever happens in this lab.
Materials
In both experiments people listened to one of three music files: “Kalimba” by Mr. Scruff, which comes free with the Windows 7 operating system; “Hot Potato” by the Australian band The Wiggles; or “When I’m Sixty-Four” by the Beatles. Copyright restrictions make it impossible to post those songs here.
The questions were posted on Qualtrics (an online survey provider); after participants listened to the song with headphones, they proceeded to answer all questions.
Procedures
See above.
Quality control
None, given the setting.
Ethical issues
The study followed the ethical standards of the American Psychological Association and was approved by the Institutional Review Board of the Wharton School. There are no personal identifiers in the data beyond participants’ and parents’ ages, which are insufficient to identify anyone.
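Before turning to the dataset itself, the following is a minimal sketch (in Python) of the kind of covariate-adjusted comparison the original paper [1] reports for Study 2, where controlling for father’s age let a randomly assigned song appear to change participants’ age. The column names ('age', 'condition', 'dad_age') and the tab delimiter are illustrative assumptions; consult Codebook.txt for the actual variables.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Load Study 2 (tab delimiter assumed; see Codebook.txt for real column names).
df = pd.read_csv("Study 2.txt", sep="\t")

# Regress reported age on the experimental condition, with father's age as a
# covariate -- the researcher degree of freedom highlighted in the original
# paper. 'age', 'condition', and 'dad_age' are placeholder names.
model = smf.ols("age ~ C(condition) + dad_age", data=df).fit()
print(model.summary())
```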
(3) Dataset description
Object name
Text files
• Study 1.txt
• Study 2.txt
• Codebook.txt
Excel
• Post Data - False Positive Psychology.xlsx
Data type
Raw data file
Format names and versions
Plain text files (.txt) with a .txt codebook, and a self-contained Excel workbook (.xlsx).
Data collectors
Paid staff at the lab.
Language
English
License
CC0
Repository location
http://doi.org/10.5281/zenodo.7664
Publication date
13 January 2014
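Working with the files above is straightforward; the following is a minimal loading sketch in Python. The tab delimiter and encoding are assumptions; Codebook.txt documents the actual variables.

```python
import pandas as pd

# The archive at http://doi.org/10.5281/zenodo.7664 contains Study 1.txt,
# Study 2.txt, and Codebook.txt. Delimiter and encoding are assumptions.
study1 = pd.read_csv("Study 1.txt", sep="\t")
study2 = pd.read_csv("Study 2.txt", sep="\t")

# Inspect dimensions and variable names before analyzing.
print(study1.shape, list(study1.columns))
print(study2.shape, list(study2.columns))
```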
(4) Reuse potential
Data from this highly cited paper are especially useful for educational purposes (the teaching of statistics) as well as for future research concerned with various statistical approaches.
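For classroom use, the inflation mechanism the data illustrate can also be reproduced with a short simulation. The sketch below is an assumption-laden illustration, not the original paper’s code: with no true effect, testing two correlated dependent variables plus their average, and adding observations once before re-testing, pushes the false-positive rate well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def one_study(n_initial=20, n_added=10, alpha=0.05):
    """One two-condition study with no true effect, exploiting two researcher
    degrees of freedom: (1) two correlated DVs (and their average), and
    (2) optional stopping (adding observations, then re-testing)."""
    cov = [[1.0, 0.5], [0.5, 1.0]]  # two DVs correlated at r = .50
    a = rng.multivariate_normal([0, 0], cov, size=n_initial)
    b = rng.multivariate_normal([0, 0], cov, size=n_initial)

    def any_significant(x, y):
        # "Significant" if either DV, or their average, yields p < alpha.
        ps = [stats.ttest_ind(x[:, i], y[:, i]).pvalue for i in range(2)]
        ps.append(stats.ttest_ind(x.mean(axis=1), y.mean(axis=1)).pvalue)
        return min(ps) < alpha

    if any_significant(a, b):
        return True
    # Not significant yet: collect more observations and test again.
    a = np.vstack([a, rng.multivariate_normal([0, 0], cov, size=n_added)])
    b = np.vstack([b, rng.multivariate_normal([0, 0], cov, size=n_added)])
    return any_significant(a, b)

n_sims = 10_000
rate = sum(one_study() for _ in range(n_sims)) / n_sims
print(f"False-positive rate: {rate:.3f} (nominal alpha = .05)")
```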
References
1. Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359-1366. DOI: http://dx.doi.org/10.1177/0956797611417632
How to cite this article: Simmons, J P, Nelson, L D and Simonsohn, U 2014 Data from Paper “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis”. Journal of Open Psychology Data, 2(1): e1.
Published: 21 February 2014
Copyright: The Journal of Open Psychology Data is an open-access journal published by Ubiquity Press.