An exploratory test for an excess of significant findings

University of Ioannina, Yannina, Epirus, Greece
Clinical Trials (Impact Factor: 1.93). 02/2007; 4(3):245-53. DOI: 10.1177/1740774507079441
Source: PubMed


The published clinical research literature may be distorted by the pursuit of statistically significant results.
We aimed to develop a test to explore biases stemming from the pursuit of nominal statistical significance.
The exploratory test evaluates whether there is a relative excess of formally significant findings in the published literature due to any reason (e.g., publication bias, selective analyses and outcome reporting, or fabricated data). The number of expected studies with statistically significant results is estimated and compared against the number of observed significant studies. The main application uses alpha = 0.05, but a range of alpha thresholds is also examined. Different values or prior distributions of the effect size are assumed. Given the typically low power (few studies per research question), the test may be best applied across domains of many meta-analyses that share common characteristics (interventions, outcomes, study populations, research environment).
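Under simplifying assumptions, the core calculation described above can be sketched as follows. This is only an illustration, not the paper's exact procedure: it assumes a two-sided z-test per study, a single assumed true effect size `theta`, and a one-sided binomial tail at the mean per-study power (the paper also considers ranges of alpha, prior distributions for the effect, and other comparison statistics).

```python
import math
from statistics import NormalDist

def study_power(theta, se, alpha=0.05):
    """Power of a two-sided z-test to detect a true effect `theta`
    in a study whose estimate has standard error `se` (normal
    approximation; `theta` is an assumed value, as in the test)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(-z_crit + theta / se) + nd.cdf(-z_crit - theta / se)

def excess_significance(observed_sig, std_errors, theta, alpha=0.05):
    """Expected number of significant studies E = sum of per-study
    powers; compared with the observed count O via a one-sided
    binomial tail at the mean per-study power (a simplification:
    the exact calculation sums over unequal per-study powers)."""
    powers = [study_power(theta, se, alpha) for se in std_errors]
    expected = sum(powers)
    n = len(std_errors)
    p_bar = expected / n
    # one-sided P(X >= O) under Binomial(n, mean power)
    p_tail = sum(math.comb(n, k) * p_bar**k * (1 - p_bar)**(n - k)
                 for k in range(observed_sig, n + 1))
    return expected, p_tail
```

For example, 10 studies that each estimate the effect with standard error 0.5 have roughly 52% power against an assumed effect of 1.0, so about 5 significant results are expected; observing 9 gives a small tail probability and flags a possible excess.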
As an illustration, we evaluated eight meta-analyses of clinical trials with >50 studies each and 10 meta-analyses of the clinical efficacy of neuroleptic agents in schizophrenia; the 10 neuroleptic meta-analyses were also examined together as a composite domain. The test often gave different results from commonly used tests of publication bias. We demonstrated a clear or possible excess of significant studies in six of the eight large meta-analyses and in the wide domain of neuroleptic treatments.
The proposed test is exploratory, may depend on prior assumptions, and should be applied cautiously.
An excess of significant findings may be documented in some clinical research fields.

    • "They declare that a right skewed p-curve is evidence of biased analysis or selective reporting[4]. Ioannidis also concluded that significant p values were over-represented in a review of meta-analyses of neuroleptic agents for schizophrenia[16]. Apart from publication bias and bias in analysis and outcome reporting, Ioannidis added data fabrication as another possible cause of an over-representation of significant p values."
    ABSTRACT: Background: A relatively high incidence of p values immediately below 0.05 (such as 0.047 or 0.04) compared to p values immediately above 0.05 (such as 0.051 or 0.06) has been noticed anecdotally in published medical abstracts. If p values immediately below 0.05 are over-represented, such a distribution may reflect the true underlying distribution of p values or may be due to error (a false distribution). If due to error, a consistent over-representation of p values immediately below 0.05 would be a systematic error due either to publication bias or (overt or inadvertent) bias within studies. Methods: We searched the Medline 2012 database to identify abstracts containing a p value. Two thousand abstracts out of 80,649 abstracts were randomly selected. Two independent researchers extracted all p values. The p values were plotted and compared to a predicted curve. Chi square test was used to test assumptions and significance was set at 0.05. Results: 2798 p value ranges and 3236 exact p values were reported. 4973 of these (82 %) were significant (<0.05). There was an over-representation of p values immediately below 0.05 (between 0.01 and 0.049) compared to those immediately above 0.05 (between 0.05 and 0.1) (p = 0.001). Conclusion: The distribution of p values in reported medical abstracts provides evidence for systematic error in the reporting of p values. This may be due to publication bias, methodological errors (underpowering, selective reporting and selective analyses) or fraud.
    Full-text · Article · Nov 2015 · BMC Research Notes
    • "To minimize the fourth concern— overestimation of mean effect sizes due to publication bias—we searched for, retrieved, and included results from as many unpublished experiments as possible. Moreover, we applied both classic and more recently developed statistical techniques to assess and correct meta-analytic estimates for the influence of small-study effects such as publication bias (Duval & Tweedie, 2000; Ioannidis & Trikalinos, 2007; Stanley & Doucouliagos, 2014). "
    ABSTRACT: Failures of self-control are thought to underlie various important behaviors (e.g., addiction, violence, obesity, poor academic achievement). The modern conceptualization of self-control failure has been heavily influenced by the idea that self-control functions as if it relied upon a limited physiological or cognitive resource. This view of self-control has inspired hundreds of experiments designed to test the prediction that acts of self-control are more likely to fail when they follow previous acts of self-control (the depletion effect). Here, we evaluated the empirical evidence for this effect with a series of focused, meta-analytic tests that address the limitations in prior appraisals of the evidence. We find very little evidence that the depletion effect is a real phenomenon, at least when assessed with the methods most frequently used in the laboratory. Our results strongly challenge the idea that self-control functions as if it relies on a limited psychological or physical resource. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
    Full-text · Article · Jun 2015 · Journal of Experimental Psychology General
    • "This IC index is not close to the > .90 criterion that has been suggested for inferring that more nonsignificant findings might exist than have been reported (Ioannidis & Trikalinos, 2007; Schimmack, 2012). Because Schimmack (2012) warns that a low IC score is a necessary but not sufficient condition to ensure credibility, we also looked at two other indices of publication bias."
    ABSTRACT: A rich tradition in self-control research has documented the negative consequences of exerting self-control in 1 task for self-control performance in subsequent tasks. However, there is a dearth of research examining what happens when people exert self-control in multiple domains simultaneously. The current research aims to fill this gap. We integrate predictions from the most prominent models of self-control with recent neuropsychological insights in the human inhibition system to generate the novel hypothesis that exerting effortful self-control in 1 task can simultaneously improve self-control in completely unrelated domains. An internal meta-analysis on all 18 studies we conducted shows that exerting self-control in 1 domain (i.e., controlling attention, food consumption, emotions, or thoughts) simultaneously improves self-control in a range of other domains, as demonstrated by, for example, reduced unhealthy food consumption, better Stroop task performance, and less impulsive decision making. A subset of 9 studies demonstrates the crucial nature of task timing: when the same tasks are executed sequentially, our results suggest the emergence of an ego depletion effect. We provide conservative estimates of the self-control facilitation (d = |0.22|) as well as the ego depletion effect size (d = |0.17|) free of data selection and publication biases. These results (a) shed new light on self-control theories, (b) confirm recent claims that previous estimates of the ego depletion effect size were inflated due to publication bias, and (c) provide a blueprint for how to handle the power issues and associated file drawer problems commonly encountered in multistudy research projects. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
    No preview · Article · Mar 2015 · Journal of Experimental Psychology General
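The BMC Research Notes study cited above compares counts of p values immediately below 0.05 with counts immediately above it. A minimal sketch of such a comparison, assuming hypothetical bin counts and a simple 50/50 null (the published study compared against a fitted predicted curve, so this is only illustrative):

```python
from statistics import NormalDist

def below_vs_above(n_below, n_above):
    """Chi-square goodness-of-fit (1 df): are p values just below
    0.05 over-represented relative to those just above, under a
    null in which the two bins are equally likely?"""
    expected = (n_below + n_above) / 2
    stat = ((n_below - expected) ** 2 + (n_above - expected) ** 2) / expected
    # tail of a chi-square with 1 df via the normal distribution (X = Z**2)
    p = 2 * (1 - NormalDist().cdf(stat ** 0.5))
    return stat, p
```

With, say, 120 p values in the bin just below 0.05 and 80 just above, the statistic is 8.0 and the tail probability falls below 0.01, suggesting over-representation under this simplified null.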