Preprint

Spurious Precision in Meta-Analysis


Abstract

Meta-analysis upweights studies reporting lower standard errors and hence more precision. But in empirical practice, notably in observational research, precision is not given to the researcher. Precision must be estimated, and thus can be p-hacked to achieve statistical significance. Simulations show that a modest dose of spurious precision creates a formidable problem for inverse-variance weighting and bias-correction methods based on the funnel plot. Selection models fail to solve the problem, and the simple mean can beat sophisticated estimators. Cures to publication bias may become worse than the disease. We introduce an approach that surmounts spuriousness: the Meta-Analysis Instrumental Variable Estimator (MAIVE), which employs inverse sample size as an instrument for reported variance.
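The abstract's identification idea, instrumenting the reported variance with the inverse sample size, can be sketched in a few lines. The snippet below is a minimal Python illustration, assuming a PEESE-type second stage whose intercept serves as the corrected mean effect; the function name maive_sketch, the plain OLS first stage, and the toy data are illustrative choices rather than the paper's exact specification.

```python
import numpy as np

def maive_sketch(estimates, ses, ns):
    """Illustrative MAIVE-style estimator (see assumptions in the text).

    Stage 1: regress the reported variances (SE^2) on inverse sample size.
    Stage 2: regress the estimates on the fitted variances (a PEESE-type
    specification); the intercept is the bias-corrected mean effect.
    """
    estimates, ses, ns = map(np.asarray, (estimates, ses, ns))
    var = ses ** 2
    inv_n = 1.0 / ns

    # Stage 1: instrument the reported variance with inverse sample size.
    X1 = np.column_stack([np.ones_like(inv_n), inv_n])
    fitted_var = X1 @ np.linalg.lstsq(X1, var, rcond=None)[0]

    # Stage 2: estimates on fitted variance; intercept = corrected effect.
    X2 = np.column_stack([np.ones_like(fitted_var), fitted_var])
    beta = np.linalg.lstsq(X2, estimates, rcond=None)[0]
    return beta[0]

# Toy data: precision tied to sample size, true effect 0.2.
rng = np.random.default_rng(1)
n = rng.integers(30, 500, 40)
se = 1.0 / np.sqrt(n)
est = 0.2 + rng.normal(0.0, se)
print(round(maive_sketch(est, se, n), 3))
```

Because the reported variance is replaced by its projection on inverse sample size, spuriously small standard errors no longer translate into spuriously large weights.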


... Irsova et al. (2023) provide a link between sample size and precision and suggest using the sample size as an instrument. Worldwide studies employ large data sets of observations that make partial correlations virtually free from small-sample bias. ...
Article
Full-text available
We analyze diverse and heterogeneous literature to grasp the general effect size of financial development on economic growth on a world scale. For that, we perform by far the largest available meta-analysis of the finance–growth nexus using 3561 estimates collected from 177 studies. Our meta-synthesis results show that large heterogeneity in empirical evidence is, in fact, driven by only a limited number of variables (moderators). By using advanced techniques, we also document the existence of publication selection bias, which propagates through the literature in a nonlinear fashion. We account for uncertainty in moderator selection by employing model-averaging techniques. After adjusting for publication bias, the results of our meta-regression provide evidence of a small but genuine positive effect of financial development on growth that very mildly declines over time. Finance channeled via capital markets seems to be more beneficial for economic growth than that provided in the form of private credit. Our evidence goes against arguments about the damaging role of financial development and is in line with century-old theoretical foundations that favor the positive role of finance in economic growth.
Article
Estimates of the exchange rate pass‐through vary significantly across studies. Therefore, I conduct a meta‐analysis to understand why estimates differ and provide consensus for the conflicting results. The dataset includes 72 primary studies containing 1219 estimates of the pass‐through from nominal effective exchange rates to consumer prices for 111 countries. Because there are many potential causes of heterogeneity, I use Bayesian model averaging to identify the important ones. I find that results vary mainly due to a combination of country‐specific and methodological characteristics, even though factors such as asymmetry and product‐specific characteristics also play a role. The country‐specific characteristics include trade openness, exchange rate flexibility, economic development status, exchange rate persistence, and commodity dependence. On the other hand, the methodological factors include estimation methods, data characteristics, endogeneity bias, and the researcher's choice of control variables. Finally, I model the exchange rate pass‐through, taking into account asymmetry and the best practices in the literature. I find that a 1% increase in the exchange rate leads to a 0.09% decrease in the consumer price level, whereas a 1% decrease leads to a 0.19% increase.
Article
We assess statistical power and excess statistical significance among 31 leading economics general interest and field journals using 22,281 parameter estimates from 368 distinct areas of economics research. Median statistical power in leading economics journals is very low (only 7%), and excess statistical significance is quite high (19%). Power this low and excess significance this high raise serious doubts about the credibility of economics research. We find that 26% of all reported results have undergone some process of selection for statistical significance and 56% of statistically significant results were selected to be statistically significant. Selection bias is greater at the top five journals, where 66% of statistically significant results were selected to be statistically significant. A large majority of empirical evidence reported in leading economics journals is potentially misleading. Results reported to be statistically significant are about as likely to be misleading as not (falsely positive) and statistically nonsignificant results are much more likely to be misleading (falsely negative). We also compare observational to experimental research and find that the quality of experimental economic evidence is notably higher.
Article
Full-text available
A key concern for property owners about the setup of proximate wind turbines is the potential devaluation of their property. However, there is no consensus in the empirical hedonic literature estimating this price-distance relationship. It remains unclear whether proximity to wind turbines reduces, increases, or has no significant effect on property values. This article addresses this ambiguity, combining 720 estimates from 25 hedonic pricing studies in a first comprehensive meta-analysis on this topic. Using Bayesian model averaging techniques and novel publication bias correction methods, I calculate an average of the reported estimates that is free from misspecification and publication bias. In economic terms, I find an average reduction in property values of 0.68% for properties 1.89 miles away, which turns to zero beyond 2.8 miles. Next to publication selection, the studies’ ability to control for confounding factors such as pre-existing price differentials and spatial effects explains the variance in reported effect sizes.
Article
This article provides concise, nontechnical, step‐by‐step guidelines on how to conduct a modern meta‐analysis, especially in social sciences. We treat publication bias, p‐hacking, and systematic heterogeneity as phenomena meta‐analysts must always confront. To this end, we provide concrete methodological recommendations. Meta‐analysis methods have advanced notably over the last few years. Yet many meta‐analyses still rely on outdated approaches, some ignoring publication bias and systematic heterogeneity. While limitations persist, recently developed techniques allow robust inference even in the face of formidable problems in the underlying empirical literature. The purpose of this paper is to summarize the state of the art in a way accessible to aspiring meta‐analysts in any field. We also discuss how meta‐analysts can use advances in artificial intelligence to work more efficiently.
Article
We examine whether estimates of hedge fund performance reported in prior empirical research are affected by publication bias. Using a sample of 1019 intercept terms from regressions of hedge fund returns on risk factors (the “alphas”) collected from 74 studies published between 2001 and 2021, we show that the selective publication of empirical results does not significantly contaminate inferences about hedge fund returns. Most of our monthly alpha estimates adjusted for the (small) bias fall within a relatively narrow range of 30–40 basis points, indicating positive abnormal returns of hedge funds: Hedge funds generate money for investors. Studies that explicitly control for potential biases in the underlying data (e.g., backfilling and survivorship biases) report lower but still positive alphas. Our results demonstrate that despite the prevalence of publication selection bias in many other research settings, publication may not be selective when there is no strong a priori theoretical prediction about the sign of the estimated coefficients.
Article
Full-text available
The log response ratio, lnRR, is the most frequently used effect size statistic for meta‐analysis in ecology. However, often missing standard deviations (SDs) prevent estimation of the sampling variance of lnRR. We propose new methods to deal with missing SDs via a weighted average coefficient of variation (CV) estimated from studies in the dataset that do report SDs. Across a suite of simulated conditions, we find that using the average CV to estimate sampling variances for all observations, regardless of missingness, performs with minimal bias. Surprisingly, even with missing SDs, this simple method outperforms the conventional approach (basing each effect size on its individual study‐specific CV) with complete data. This is because the conventional method ultimately yields less precise estimates of the sampling variances than using the pooled CV from multiple studies. Our approach is broadly applicable and can be implemented in all meta‐analyses of lnRR, regardless of ‘missingness’.
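A minimal sketch of this approach follows, assuming the standard first-order approximation var(lnRR) ≈ CV²/n for each group and a sample-size-weighted pooled squared CV; the function names and the weighting choice are illustrative rather than the authors' exact estimator.

```python
import numpy as np

def pooled_cv2(means, sds, ns):
    """Weighted-average squared coefficient of variation computed from the
    subset of studies that do report SDs (sample-size weights assumed)."""
    cv2 = (np.asarray(sds) / np.asarray(means)) ** 2
    w = np.asarray(ns)
    return np.sum(w * cv2) / np.sum(w)

def lnrr_variance(n_treat, n_ctrl, cv2_bar):
    """First-order sampling variance of lnRR with a single pooled squared
    CV used for every observation, regardless of missing SDs."""
    return cv2_bar / np.asarray(n_treat) + cv2_bar / np.asarray(n_ctrl)
```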
Article
Full-text available
New meta-regression methods are introduced that identify whether the magnitude of heterogeneity across study findings is correlated with their standard errors. Evidence from dozens of meta-analyses robustly confirms this correlation and shows that small-sample studies typically have higher heterogeneity. This correlated heterogeneity violates the random-effects (RE) model of additive and independent heterogeneity. When small studies not only have inadequate statistical power but also high heterogeneity, their scientific contribution is even more dubious. When the heterogeneity variance is correlated with the sampling-error variance to the degree we find, simulations show that RE is dominated by an alternative weighted average, the unrestricted weighted least squares (UWLS). Meta-research evidence combined with simulations establishes that UWLS should replace RE as the conventional meta-analysis summary of psychological research.
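A common presentation of UWLS regresses the t-statistics on precision with no intercept: the slope equals the inverse-variance weighted mean, while the regression's residual dispersion inflates its standard error multiplicatively rather than adding a heterogeneity variance as RE does. A minimal sketch under that formulation (names illustrative):

```python
import numpy as np

def uwls(estimates, ses):
    """UWLS sketch: regress t-statistics on precision without an intercept.
    The slope is the inverse-variance weighted mean; the residual variance
    (phi^2) scales its standard error multiplicatively."""
    y = np.asarray(estimates) / np.asarray(ses)   # t-statistics
    x = 1.0 / np.asarray(ses)                     # precision
    slope = np.sum(x * y) / np.sum(x ** 2)
    resid = y - slope * x
    phi2 = np.sum(resid ** 2) / (len(y) - 1)      # multiplicative dispersion
    return slope, np.sqrt(phi2 / np.sum(x ** 2))
```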
Article
Full-text available
While the effect of higher public debt levels on economic growth has received much attention, the literature partly points to contradictory results. This paper applies meta‐regression methods to 816 estimates from 47 primary studies. The unweighted mean of the reported results suggests that a 10 percentage points increase in public‐debt‐to‐GDP is associated with a decline in annual growth rates by 0.14 percentage points, with a 95% confidence interval from 0.09 to 0.19 percentage points. However, we cannot reject a zero effect after correcting for publication bias. Furthermore, the meta‐regression analysis shows that tackling endogeneity between public debt and growth leads to less adverse effects of public debt. In testing for nonlinear effects, our results do not point to a uniform public‐debt‐to‐GDP threshold beyond which growth slows. Threshold estimates are sensitive to data and econometric choices. These findings imply a lack of evidence of a consistently negative growth effect of higher public‐debt‐to‐GDP. The main policy implication is that there should be caution with regard to “one‐size‐fits‐all” fiscal policy prescriptions in dealing with higher public debt levels.
Article
Full-text available
Meta-analyses are essential for cumulative science, but their validity can be compromised by publication bias. To mitigate the impact of publication bias, one may apply publication-bias-adjustment techniques such as precision-effect test and precision-effect estimate with standard errors (PET-PEESE) and selection models. These methods, implemented in JASP and R, allow researchers without programming experience to conduct state-of-the-art publication-bias-adjusted meta-analysis. In this tutorial, we demonstrate how to conduct a publication-bias-adjusted meta-analysis in JASP and R and interpret the results. First, we explain two frequentist bias-correction methods: PET-PEESE and selection models. Second, we introduce robust Bayesian meta-analysis, a Bayesian approach that simultaneously considers both PET-PEESE and selection models. We illustrate the methodology on an example data set, provide an instructional video ( https://bit.ly/pubbias ) and an R-markdown script ( https://osf.io/uhaew/ ), and discuss the interpretation of the results. Finally, we include concrete guidance on reporting the meta-analytic results in an academic article.
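For readers who want to see the regressions behind PET-PEESE, a minimal weighted-least-squares sketch follows. It returns both intercepts and omits standard errors and the conventional decision rule (use the PEESE intercept only when the PET intercept differs significantly from zero), which the tutorial handles in JASP and R; the function name is illustrative.

```python
import numpy as np

def pet_peese(estimates, ses):
    """PET-PEESE sketch with WLS (weights 1/SE^2).
    PET:   effect = b0 + b1 * SE    -> intercept b0
    PEESE: effect = b0 + b1 * SE^2  -> intercept b0
    """
    est, se = np.asarray(estimates), np.asarray(ses)
    W = np.diag(1.0 / se ** 2)

    def wls_intercept(x):
        X = np.column_stack([np.ones_like(x), x])
        return np.linalg.solve(X.T @ W @ X, X.T @ W @ est)[0]

    return {"PET": wls_intercept(se), "PEESE": wls_intercept(se ** 2)}
```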
Article
Full-text available
A key parameter in the analysis of wage inequality is the elasticity of substitution between skilled and unskilled labor. We show that the empirical literature is consistent with both publication and attenuation bias in the estimated inverse elasticities. Publication bias, which exaggerates the mean reported inverse elasticity, dominates and results in corrected inverse elasticities closer to zero than the typically published estimates. The implied mean elasticity is 4, with a lower bound of 2. Elasticities are smaller for developing countries. To derive these results, we use nonlinear tests for publication bias and model averaging techniques that account for model uncertainty.
Article
Full-text available
Publication bias is a ubiquitous threat to the validity of meta‐analysis and the accumulation of scientific evidence. In order to estimate and counteract the impact of publication bias, multiple methods have been developed; however, recent simulation studies have shown the methods' performance to depend on the true data generating process, and no method consistently outperforms the others across a wide range of conditions. Unfortunately, when different methods lead to contradicting conclusions, researchers can choose those methods that lead to a desired outcome. To avoid the condition‐dependent, all‐or‐none choice between competing methods and conflicting results, we extend robust Bayesian meta‐analysis and model‐average across two prominent approaches of adjusting for publication bias: (1) selection models of p‐values and (2) models adjusting for small‐study effects. The resulting model ensemble weights the estimates and the evidence for the absence/presence of the effect from the competing approaches with the support they receive from the data. Applications, simulations, and comparisons to preregistered, multi‐lab replications demonstrate the benefits of Bayesian model‐averaging of complementary publication bias adjustment methods.
Article
Full-text available
Animal communication is central to many animal societies, and effective signal transmission is crucial for individuals to survive and reproduce successfully. One environmental factor that exerts selection pressure on acoustic signals is ambient noise. To maintain signal efficiency, species can adjust signals through phenotypic plasticity or microevolutionary response to natural selection. One of these signal adjustments is the increase in signal amplitude, called the Lombard effect, which has been frequently found in birds and mammals. However, the evolutionary origin of the Lombard effect is largely unresolved. Using a phylogenetically controlled meta-analysis, we show that the Lombard effect is also present in fish and amphibians, and contradictory results in the literature can be explained by differences in signal-to-noise ratios among studies. Our analysis also demonstrates that subcortical processes are sufficient to elicit the Lombard effect and that amplitude adjustments do not require vocal learning. We conclude that the Lombard effect is a widespread mechanism based on phenotypic plasticity in vertebrates for coping with changes in ambient noise levels.
Article
Full-text available
The empirical literature on the impact of corporate taxes on economic growth reaches ambiguous conclusions: corporate tax cuts increase, reduce, or do not significantly affect growth. We apply meta-regression methods to a novel data set with 441 estimates from 42 primary studies. There is evidence for publication selectivity in favour of reporting growth-enhancing effects of corporate tax cuts. Correcting for this bias, we cannot reject the hypothesis of a zero effect of corporate taxes on growth. Several factors influence reported estimates, including researcher choices concerning the measurement of growth and corporate taxes, and controlling for other budgetary components.
Article
Full-text available
We introduce a new meta-analysis estimator, the weighted and iterated least squares (WILS), that greatly reduces publication selection bias (PSB) when selective reporting for statistical significance (SSS) is present. WILS is a simple weighted average that, when there is SSS, has smaller bias and lower rates of false positives than conventional meta-analysis estimators, the unrestricted weighted least squares (UWLS), and the weighted average of the adequately powered (WAAP). As a simple weighted average, it is not vulnerable to the violations of publication bias correction models’ assumptions too often seen in application. WILS is based on the novel idea of allowing excess statistical significance (ESS), which is a necessary condition of SSS, to identify when and how to reduce PSB. We show in comparisons with large-scale preregistered replications and in evidence-based simulations that the remaining bias is small. The routine application of WILS in place of random effects would do much to reduce conventional meta-analysis’s notable biases and high rates of false positives.
Article
Full-text available
What is the best way to estimate the size of important effects? Should we aggregate across disparate findings using statistical meta-analysis, or instead run large, multi-laboratory replications (MLR)? A recent paper by Kvarven, Strømland and Johannesson (Kvarven et al. 2020 Nat. Hum. Behav. 4, 423-434. (doi:10.1038/s41562-019-0787-z)) compared effect size estimates derived from these two different methods for 15 different psychological phenomena. The authors reported that, for the same phenomenon, the meta-analytic estimate tended to be about three times larger than the MLR estimate. These results are a specific example of a broader question: What is the relationship between meta-analysis and MLR estimates? Kvarven et al. suggested that their results undermine the value of meta-analysis. By contrast, we argue that both meta-analysis and MLR are informative, and that the discrepancy between the two estimates that they observed is in fact still largely unexplained. Informed by re-analyses of Kvarven et al.'s data and by other empirical evidence, we discuss possible sources of this discrepancy and argue that understanding the relationship between estimates obtained from these two methods is an important puzzle for future meta-scientific research.
Article
Full-text available
We outline a Bayesian model‐averaged (BMA) meta‐analysis for standardized mean differences in order to quantify evidence for both treatment effectiveness δ and across‐study heterogeneity τ. We construct four competing models by orthogonally combining two present‐absent assumptions, one for the treatment effect and one for across‐study heterogeneity. To inform the choice of prior distributions for the model parameters, we used 50% of the Cochrane Database of Systematic Reviews to specify rival prior distributions for δ and τ. The relative predictive performance of the competing models and rival prior distributions was assessed using the remaining 50% of the Cochrane Database. On average, ℋ1r—the model that assumes the presence of a treatment effect as well as across‐study heterogeneity—outpredicted the other models, but not by a large margin. Within ℋ1r, predictive adequacy was relatively constant across the rival prior distributions. We propose specific empirical prior distributions, both for the field in general and for each of 46 specific medical subdisciplines. An example from oral health demonstrates how the proposed prior distributions can be used to conduct a BMA meta‐analysis in the open‐source software R and JASP. The preregistered analysis plan is available at https://osf.io/zs3df/.
Article
Full-text available
Publication bias threatens the validity of quantitative evidence from meta‐analyses as it results in some findings being overrepresented in meta‐analytic datasets because they are published more frequently or sooner (e.g. ‘positive’ results). Unfortunately, methods to test for the presence of publication bias, or assess its impact on meta‐analytic results, are unsuitable for datasets with high heterogeneity and non‐independence, as is common in ecology and evolutionary biology. We first review both classic and emerging publication bias tests (e.g. funnel plots, Egger's regression, cumulative meta‐analysis, fail‐safe N , trim‐and‐fill tests, p ‐curve and selection models), showing that some tests cannot handle heterogeneity, and, more importantly, none of the methods can deal with non‐independence. For each method, we estimate current usage in ecology and evolutionary biology, based on a representative sample of 102 meta‐analyses published in the last 10 years. Then, we propose a new method using multilevel meta‐regression, which can model both heterogeneity and non‐independence, by extending existing regression‐based methods (i.e. Egger's regression). We describe how our multilevel meta‐regression can test not only publication bias, but also time‐lag bias, and how it can be supplemented by residual funnel plots. Overall, we provide ecologists and evolutionary biologists with practical recommendations on which methods are appropriate to employ given independent and non‐independent effect sizes. No method is ideal, and more simulation studies are required to understand how Type 1 and Type 2 error rates are impacted by complex data structures. Still, the limitations of these methods do not justify ignoring publication bias in ecological and evolutionary meta‐analyses.
Article
Full-text available
Recent, high-profile, large-scale, preregistered failures to replicate uncover that many highly-regarded experiments are ‘false positives;’ that is, statistically significant results of underlying null effects. Large surveys of research reveal that statistical power is often low and inadequate. When the research record includes selective reporting, publication bias and/or questionable research practices, conventional meta-analyses are also likely to be falsely positive. At the core of research credibility lies the relation of statistical power to the rate of false positives. This study finds that high (> 50–60%) median retrospective power (MRP) is associated with credible meta-analysis and large-scale, preregistered, multi-lab ‘successful’ replications; that is, with replications that corroborate the effect in question. When median retrospective power is low (< 50%), positive meta-analysis findings should be interpreted with great caution or discounted altogether.
Article
Full-text available
The purpose of this study is to show how Monte Carlo analysis of meta‐analytic estimators can be used to select estimators for specific research situations. Our analysis conducts 1620 individual experiments, where each experiment is defined by a unique combination of sample size, effect size, effect size heterogeneity, publication selection mechanism, and other research characteristics. We compare 11 estimators commonly used in medicine, psychology, and the social sciences. These are evaluated on the basis of bias, mean squared error (MSE), and coverage rates. For our experimental design, we reproduce simulation environments from four recent studies. We demonstrate that relative estimator performance differs across performance measures. Estimator performance is a complex interaction of performance indicator and aspects of the application. An estimator that may be especially good with respect to MSE may perform relatively poorly with respect to coverage rates. We also show that the size of the meta‐analyst's sample and effect heterogeneity are important determinants of relative estimator performance. We use these results to demonstrate how these observable characteristics can guide the meta‐analyst to choose the most appropriate estimator for their research circumstances.
Article
Full-text available
We propose sensitivity analyses for publication bias in meta‐analyses. We consider a publication process such that ‘statistically significant’ results are more likely to be published than negative or “non‐significant” results by an unknown ratio, η. Our proposed methods also accommodate some plausible forms of selection based on a study's standard error. Using inverse probability weighting and robust estimation that accommodates non‐normal population effects, small meta‐analyses, and clustering, we develop sensitivity analyses that enable statements such as ‘For publication bias to shift the observed point estimate to the null, “significant” results would need to be at least 30 fold more likely to be published than negative or “non‐significant” results’. Comparable statements can be made regarding shifting to a chosen non‐null value or shifting the confidence interval. To aid interpretation, we describe empirical benchmarks for plausible values of η across disciplines. We show that a worst‐case meta‐analytic point estimate for maximal publication bias under the selection model can be obtained simply by conducting a standard meta‐analysis of only the negative and ‘non‐significant’ studies; this method sometimes indicates that no amount of such publication bias could ‘explain away’ the results. We illustrate the proposed methods by using real meta‐analyses and provide an R package: PublicationBias.
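The worst-case bound described above is straightforward to illustrate: meta-analyze only the nonaffirmative studies. The sketch below uses a plain common-effect (inverse-variance) average and therefore omits the robust estimation, non-normal population effects, and clustering features of the actual method; the definition of "affirmative" (positive and statistically significant) follows the text, and the function name is illustrative.

```python
import numpy as np

def worst_case_mean(estimates, ses, z_crit=1.96):
    """Common-effect meta-analysis of the nonaffirmative studies only,
    i.e. studies that are not both positive and statistically significant.
    Serves as a simplified worst-case bound under maximal publication bias."""
    est, se = np.asarray(estimates), np.asarray(ses)
    nonaffirm = ~((est > 0) & (est / se > z_crit))
    w = 1.0 / se[nonaffirm] ** 2
    return np.sum(w * est[nonaffirm]) / np.sum(w)
```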
Article
Full-text available
Meta‐analysis has become the conventional approach to synthesizing the results of empirical economics research. To further improve the transparency and replicability of the reported results and to raise the quality of meta‐analyses, the Meta‐Analysis of Economics Research Network has updated the reporting guidelines that were published by this Journal in 2013. Future meta‐analyses in economics will be expected to follow these updated guidelines or give valid reasons why a meta‐analysis should deviate from them.
Article
Full-text available
Many researchers rely on meta-analysis to summarize research evidence. However, there is a concern that publication bias and selective reporting may lead to biased meta-analytic effect sizes. We compare the results of meta-analyses to large-scale preregistered replications in psychology carried out at multiple laboratories. The multiple-laboratory replications provide precisely estimated effect sizes that do not suffer from publication bias or selective reporting. We searched the literature and identified 15 meta-analyses on the same topics as multiple-laboratory replications. We find that meta-analytic effect sizes are significantly different from replication effect sizes for 12 out of the 15 meta-replication pairs. These differences are systematic and, on average, meta-analytic effect sizes are almost three times as large as replication effect sizes. We also implement three methods of correcting meta-analysis for bias, but these methods do not substantively improve the meta-analytic results. Kvarven, Strømland and Johannesson compare meta-analyses to multiple-laboratory replication projects and find that meta-analyses overestimate effect sizes by a factor of almost three. Commonly used methods of adjusting for publication bias do not substantively improve results.
Article
Full-text available
As Mohnen (1996: 40) has indicated, research and development (R&D) externalities are a two-sided theoretical issue. Their ‘dark’ side concerns the under-investment problem caused by non-appropriability of R&D benefits. On the ‘bright’ side, R&D spillovers are a source of productivity gains. Both aspects have been invoked to justify public support for R&D investment directly and indirectly. To establish whether public support can be justified due to productivity gains from spillovers, we meta-analyse 983 productivity estimates for spillovers and 501 estimates for own-R&D from 60 empirical studies. Our findings indicate that the average spillover effect is: (i) positive but heterogeneous and smaller than what is reported in most narrative reviews; (ii) usually smaller than that of own-R&D capital; (iii) too small to be practically significant when evidence with adequate statistical power is considered. Controlling for observable sources of heterogeneity and best-practice research, the meta-effect is insignificant in the full sample but significant and large among OECD firms/industries/countries. We discuss the implications of these findings for future research and public support for R&D investment.
Article
Full-text available
Publication bias and questionable research practices in primary research can lead to badly overestimated effects in meta-analysis. Methodologists have proposed a variety of statistical approaches to correct for such overestimation. However, it is not clear which methods work best for data typically seen in psychology. Here, we present a comprehensive simulation study in which we examined how some of the most promising meta-analytic methods perform on data that might realistically be produced by research in psychology. We simulated several levels of questionable research practices, publication bias, and heterogeneity, and used study sample sizes empirically derived from the literature. Our results clearly indicated that no single meta-analytic method consistently outperformed all the others. Therefore, we recommend that meta-analysts in psychology focus on sensitivity analyses—that is, report on a variety of methods, consider the conditions under which these methods fail (as indicated by simulation studies such as ours), and then report how conclusions might change depending on which conditions are most plausible. Moreover, given the dependence of meta-analytic methods on untestable assumptions, we strongly recommend that researchers in psychology continue their efforts to improve the primary literature and conduct large-scale, preregistered replications. We provide detailed results and simulation code at https://osf.io/rf3ys and interactive figures at http://www.shinyapps.org/apps/metaExplorer/.
Article
Full-text available
Publication bias distorts the available empirical evidence and misinforms policymaking. Evidence of publication bias is mounting in virtually all fields of empirical research. This paper proposes the Endogenous Kink (EK) meta‐regression model as a novel method of publication bias correction. The EK method fits a piecewise linear meta‐regression of the primary estimates on their standard errors, with a kink at the cutoff value of the standard error below which publication selection is unlikely. We provide a simple method of endogenously determining this cutoff value as a function of a first‐stage estimate of the true effect and an assumed threshold of statistical significance. Our Monte Carlo simulations show that EK is less biased and more efficient than other related regression‐based methods of publication bias correction in a variety of research conditions.
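A simplified version of the kinked meta-regression, assuming the cutoff standard error is already known, is sketched below; the endogenous determination of that cutoff from a first-stage estimate of the true effect, which is the method's distinguishing feature, is deliberately left out, and the function name is illustrative.

```python
import numpy as np

def ek_regression(estimates, ses, cutoff):
    """Piecewise-linear (kinked) WLS meta-regression for a given cutoff a:
        effect = b0 + b1 * max(SE - a, 0) + error,   weights 1/SE^2.
    The intercept b0 is the publication-bias-corrected effect."""
    est, se = np.asarray(estimates), np.asarray(ses)
    X = np.column_stack([np.ones_like(se), np.maximum(se - cutoff, 0.0)])
    W = np.diag(1.0 / se ** 2)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ est)[0]
```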
Article
Full-text available
We surveyed 807 researchers (494 ecologists and 313 evolutionary biologists) about their use of Questionable Research Practices (QRPs), including cherry picking statistically significant results, p-hacking, and hypothesising after the results are known (HARKing). We also asked them to estimate the proportion of their colleagues that use each of these QRPs. Several of the QRPs were prevalent within the ecology and evolution research community. Across the two groups, we found 64% of surveyed researchers reported they had at least once failed to report results because they were not statistically significant (cherry picking); 42% had collected more data after inspecting whether results were statistically significant (a form of p-hacking) and 51% had reported an unexpected finding as though it had been hypothesised from the start (HARKing). Such practices have been directly implicated in the low rates of reproducible results uncovered by recent large-scale replication studies in psychology and other disciplines. The rates of QRPs found in this study are comparable with the rates seen in psychology, indicating that the reproducibility problems discovered in psychology are also likely to be present in ecology and evolution.
Article
Full-text available
The vast majority of published results in the literature is statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. The original study and replication were statistically significant in 36.1% in RPP and 68.8% in EE-RP suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and quantifies the amount of evidence in favor of a zero, small, medium and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the method’s performance, and demonstrate the necessity to control for the original study’s significance to enable the accumulation of evidence for a true zero effect. Then we applied the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the included studies in EE-RP are generally larger than in RPP, but that the sample sizes of especially the included studies in RPP are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication akin to power analysis in null hypothesis significance testing and present an easy-to-use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method.
Article
Loss aversion is one of the most widely used concepts in behavioral economics. We conduct a large-scale, interdisciplinary meta-analysis to systematically accumulate knowledge from numerous empirical estimates of the loss aversion coefficient reported from 1992 to 2017. We examine 607 empirical estimates of loss aversion from 150 articles in economics, psychology, neuroscience, and several other disciplines. Our analysis indicates that the mean loss aversion coefficient is 1.955 with a 95 percent probability that the true value falls in the interval [1.820, 2.102]. We record several observable characteristics of the study designs. Few characteristics are substantially correlated with differences in the mean estimates. (JEL D81, D91)
Article
Meta-analysts’ practice of transcribing and numerically combining all results in a research literature can generate uninterpretable and/or misleading conclusions. Meta-analysts should instead critically evaluate studies, draw conclusions only from those that are valid and provide readers with enough information to evaluate those conclusions.
Preprint
As traditionally conceived, publication bias arises from selection operating on a collection of individually unbiased estimates. A canonical form of such selection across studies (SAS) is the preferential publication of affirmative studies (i.e., significant, positive estimates) versus nonaffirmative studies (i.e., nonsignificant or negative estimates). However, meta-analyses can also be compromised by selection within studies (SWS), in which investigators “p-hack” results within their study to obtain an affirmative estimate. Published estimates can then be biased even conditional on affirmative status, compromising existing methods that only consider SAS. We propose two sensitivity analyses that accommodate joint SAS and SWS; both analyze only the published nonaffirmative estimates. First, assuming that p-hacked studies never publish nonaffirmative estimates (e.g., their investigators p-hack until they obtain an affirmative estimate), we propose estimating the underlying meta-analytic mean by fitting a “right-truncated meta-analysis” (RTMA) to the published nonaffirmative estimates, which are unhacked. Second, we propose conducting a standard meta-analysis of only the nonaffirmative studies (MAN); this estimate is conservative (negatively biased) under weakened assumptions, including when nonaffirmative estimates from p-hacked studies are sometimes published. We provide an R package, phacking. Our proposed methods supplement existing methods by assessing the robustness of meta-analyses to joint SAS and SWS.
Article
Nudge interventions have quickly expanded from academic studies to larger implementation in so‐called Nudge Units in governments. This provides an opportunity to compare interventions in research studies, versus at scale. We assemble a unique data set of 126 RCTs covering 23 million individuals, including all trials run by two of the largest Nudge Units in the United States. We compare these trials to a sample of nudge trials in academic journals from two recent meta‐analyses. In the Academic Journals papers, the average impact of a nudge is very large—an 8.7 percentage point take‐up effect, which is a 33.4% increase over the average control. In the Nudge Units sample, the average impact is still sizable and highly statistically significant, but smaller at 1.4 percentage points, an 8.0% increase. We document three dimensions which can account for the difference between these two estimates: (i) statistical power of the trials; (ii) characteristics of the interventions, such as topic area and behavioral channel; and (iii) selective publication. A meta‐analysis model incorporating these dimensions indicates that selective publication in the Academic Journals sample, exacerbated by low statistical power, explains about 70 percent of the difference in effect sizes between the two samples. Different nudge characteristics account for most of the residual difference.
Article
We show that the large elasticity of substitution between capital and labor estimated in the literature on average, 0.9, can be explained by three issues: publication bias, use of cross-country variation, and omission of the first-order condition for capital. The mean elasticity conditional on the absence of these issues is 0.3. To obtain this result, we collect 3,186 estimates of the elasticity reported in 121 studies, codify 71 variables that reflect the context in which researchers produce their estimates, and address model uncertainty by Bayesian and frequentist model averaging. We employ nonlinear techniques to correct for publication bias, which is responsible for at least half of the overall reduction in the mean elasticity from 0.9 to 0.3. Our findings also suggest that a failure to normalize the production function leads to a substantial upward bias in the estimated elasticity. The weight of evidence accumulated in the empirical literature emphatically rejects the Cobb-Douglas specification.
Article
A key parameter estimated by lab and field experiments in economics is the individual discount rate—and the results vary widely. We examine the extent to which this variance can be attributed to observable differences in methods, subject pools, and potential publication bias. To address the model uncertainty inherent to such an exercise we employ Bayesian and frequentist model averaging. We obtain evidence consistent with publication bias against unintuitive results. The corrected mean annual discount rate is 0.33. Our findings also suggest that discount rates are independent across domains: people tend to be less patient when health is at stake compared to money. Negative framing is associated with more patience. Finally, the results of lab and field experiments differ systematically, and it also matters whether the experiment relies on students or uses broader samples of the population.
Article
A key theoretical prediction in financial economics is that under risk neutrality and rational expectations a currency’s forward rates should form unbiased predictors of future spot rates. Yet scores of empirical studies report negative slope coefficients from regressions of spot rates on forward rates. We collect 3,643 estimates from 91 research articles and using recently developed techniques investigate the effect of publication and misspecification biases on the reported results. Correcting for these biases yields slope coefficients in the intervals (0.23,0.45) and (0.95,1.16) for the currencies of developed and emerging countries respectively, which implies that empirical evidence is in line with the theoretical prediction for emerging economies and less puzzling than commonly thought for developed economies. Our results also suggest that the coefficients are systematically influenced by the choice of data, numeraire currency, and estimation method.
Article
We investigate the prevalence and sources of reporting errors in 30,993 hypothesis tests from 370 articles in three top economics journals. We define reporting errors as inconsistencies between reported significance levels by means of eye‐catchers and calculated p‐values based on reported statistical values, such as coefficients and standard errors. While 35.8% of the articles contain at least one reporting error, only 1.3% of the investigated hypothesis tests are afflicted by reporting errors. For strong reporting errors for which either the eye‐catcher or the calculated p‐value signals statistical significance but the respective other one does not, the error rate is 0.5% for the investigated hypothesis tests corresponding to 21.6% of the articles having at least one strong reporting error. Our analysis suggests a bias in favor of errors for which eye‐catchers signal statistical significance but calculated p‐values do not. Survey responses from the respective authors, replications, and exploratory regression analyses indicate some solutions to mitigate the prevalence of reporting errors in future research.
Article
Meta‐studies are often conducted on empirical findings obtained from overlapping samples. Sample overlap is common in research fields that strongly rely on aggregated observational data (e.g., economics and finance), where the same set of data may be used in several studies. More generally, sample overlap tends to occur whenever multiple estimates are sampled from the same study. We show analytically how failing to account for sample overlap causes high rates of false positives, especially for large meta‐sample sizes. We propose a generalized‐weights (GW) meta‐estimator, which solves the sample overlap problem by explicitly modeling the variance‐covariance matrix that describes the structure of dependence among estimates. We show how this matrix can be constructed from information that is usually available from basic sample descriptions in the primary studies (i.e., sample sizes and number of overlapping observations). The GW meta‐estimator amounts to weighting each empirical outcome according to its share of independent sampling information. We use Monte Carlo simulations to (a) demonstrate how the GW meta‐estimator brings the rate of false positives to its nominal level, and (b) quantify the efficiency gains of the GW meta‐estimator relative to standard meta‐estimators. The GW meta‐estimator is fairly straightforward to implement and can solve any case of sample overlap, within or between studies. Highlights: Meta‐analyses are often conducted on empirical outcomes based on samples containing common observations. Sample overlap induces a correlation structure among empirical outcomes that harms the statistical properties of meta‐analysis methods. We derive the analytic conditions under which sample overlap causes conventional meta‐estimators to exhibit high rates of false positives. We propose a generalized‐weights (GW) solution to sample overlap, which involves approximating the variance‐covariance matrix that describes the correlation structure between outcomes; we show how to construct this matrix from information typically reported in the primary studies. We conduct Monte Carlo simulations to quantify the efficiency gains of the proposed GW estimator and show how it brings the rate of false positives near its nominal level. Although we focus on meta‐analyses of regression coefficients, our approach can, in principle, be modified and extended to effect sizes more commonly used in other research fields, such as Cohen's d or odds ratios.
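As an illustration of the generalized-weights idea, the sketch below builds an approximate variance-covariance matrix from sample sizes and pairwise overlaps and computes the corresponding GLS weighted mean. The correlation approximation rho_ij = n_overlap_ij / sqrt(n_i * n_j) is an assumption made for this example and need not coincide with the exact formula of the published estimator; the function name is illustrative.

```python
import numpy as np

def gw_mean(estimates, ses, n, n_overlap):
    """GLS weighted mean with a covariance matrix built from sample overlap.
    n_overlap is a symmetric matrix of shared observations between samples."""
    est, se, n = map(np.asarray, (estimates, ses, n))
    n_overlap = np.asarray(n_overlap, dtype=float)
    rho = n_overlap / np.sqrt(np.outer(n, n))   # assumed correlation structure
    np.fill_diagonal(rho, 1.0)
    V = rho * np.outer(se, se)                  # variance-covariance matrix
    Vi = np.linalg.inv(V)
    ones = np.ones(len(est))
    w = Vi @ ones / (ones @ Vi @ ones)          # generalized weights
    return w @ est
```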
Article
The relationship between social capital and health has received extensive attention in fields such as public health, medicine, epidemiology, gerontology and other health-related disciplines. In contrast, the economics literature on this subject is relatively small. To address this research gap, we investigate the cross-disciplinary empirical literature using meta-analysis. We analyze 12,778 estimates from 470 studies. Our analysis finds that social capital is significantly related to a variety of positive health outcomes. However, the effect sizes are consistently very small. This finding is robust across different types of social capital (e.g., cognitive, structural, bonding, bridging, linking), and for many different measures of health outcomes (e.g., mortality, disease/illnesses, depression). The small effects that we estimate cast doubt on recent initiatives to promote health through social capital such as those by the WHO, the OECD, and US Healthy People 2020.
Article
When instruments are weakly correlated with endogenous regressors, conventional methods for instrumental variables (IV) estimation and inference become unreliable. A large literature in econometrics has developed procedures for detecting weak instruments and constructing robust confidence sets, but many of the results in this literature are limited to settings with independent and homoskedastic data, while data encountered in practice frequently violate these assumptions. We review the literature on weak instruments in linear IV regression with an emphasis on results for nonhomoskedastic (heteroskedastic, serially correlated, or clustered) data. To assess the practical importance of weak instruments, we also report tabulations and simulations based on a survey of papers published in the American Economic Review from 2014 to 2018 that use IV. These results suggest that weak instruments remain an important issue for empirical practice, and that there are simple steps that researchers can take to better handle weak instruments in applications.
Article
In this article, we consider inference in the linear instrumental-variables models with one or more endogenous variables and potentially weak instruments. I developed a command, twostepweakiv, to implement the two-step identification-robust confidence sets proposed by Andrews (2018, Review of Economics and Statistics 100: 337-348) based on Wald tests and linear combination tests (Andrews, 2016, Econometrica 84: 2155-2182). Unlike popular procedures based on first-stage F statistics (Stock and Yogo, 2005, Testing for weak instruments in linear IV regression, in Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg), the two-step identification-robust confidence sets control coverage distortion without assuming the data are homoskedastic. I demonstrate the use of twostepweakiv with an example of analyzing the effect of wages on married female labor supply. For inference on subsets of parameters, twostepweakiv also implements the refined projection method (Chaudhuri and Zivot, 2011, Journal of Econometrics 164: 239-251). I illustrate that this method is more powerful than the conventional projection method using Monte Carlo simulations.
Article
Meta-analysis is the quantitative, scientific synthesis of research results. Since the term and modern approaches to research synthesis were first introduced in the 1970s, meta-analysis has had a revolutionary effect in many scientific fields, helping to establish evidence-based practice and to resolve seemingly contradictory research outcomes. At the same time, its implementation has engendered criticism and controversy, in some cases general and others specific to particular disciplines. Here we take the opportunity provided by the recent fortieth anniversary of meta-analysis to reflect on the accomplishments, limitations, recent advances and directions for future developments in the field of research synthesis.
Article
Some empirical results are more likely to be published than others. Such selective publication leads to biased estimates and distorted inference. This paper proposes two approaches for identifying the conditional probability of publication as a function of a study's results, the first based on systematic replication studies and the second based on meta-studies. For known conditional publication probabilities, we propose median-unbiased estimators and associated confidence sets that correct for selective publication. We apply our methods to recent large-scale replication studies in experimental economics and psychology, and to meta-studies of the effects of minimum wages and de-worming programs.
Article
We investigate two critical dimensions of the credibility of empirical economics research: statistical power and bias. We survey 159 empirical economics literatures that draw upon 64,076 estimates of economic parameters reported in more than 6,700 empirical studies. Half of the research areas have nearly 90% of their results under-powered. The median statistical power is 18%, or less. A simple weighted average of those reported results that are adequately powered (power ≥ 80%) reveals that nearly 80% of the reported effects in these empirical economics literatures are exaggerated; typically, by a factor of two and with one-third inflated by a factor of four or more.
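Retrospective power of the kind surveyed here can be computed from a study's standard error and a proxy for the true effect (for example, a bias-corrected meta-analytic estimate). A minimal sketch for a two-sided z-test, with the 80% adequacy threshold applied by the survey left to the caller:

```python
from scipy.stats import norm

def retrospective_power(true_effect, se, alpha=0.05):
    """Power of a two-sided z-test when the underlying effect equals
    `true_effect` and the study's standard error is `se`."""
    z = abs(true_effect) / se
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - z) + norm.cdf(-z_crit - z)

# Example: SE equal to half the assumed true effect gives ~52% power.
print(round(retrospective_power(0.2, 0.1), 2))
```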
Article
In empirical work in economics it is common to report standard errors that account for clustering of units. Typically, the motivation given for the clustering adjustments is that unobserved components in outcomes for units within clusters are correlated. However, because correlation may occur across more than one dimension, this motivation makes it difficult to justify why researchers use clustering in some dimensions, such as geographic, but not others, such as age cohorts or gender. It also makes it difficult to explain why one should not cluster with data from a randomized experiment. In this paper, we argue that clustering is in essence a design problem, either a sampling design or an experimental design issue. It is a sampling design issue if sampling follows a two stage process where in the first stage, a subset of clusters were sampled randomly from a population of clusters, while in the second stage, units were sampled randomly from the sampled clusters. In this case the clustering adjustment is justified by the fact that there are clusters in the population that we do not see in the sample. Clustering is an experimental design issue if the assignment is correlated within the clusters. We take the view that this second perspective best fits the typical setting in economics where clustering adjustments are used. This perspective allows us to shed new light on three questions: (i) when should one adjust the standard errors for clustering, (ii) when is the conventional adjustment for clustering appropriate, and (iii) when does the conventional adjustment of the standard errors matter.
Article
Meta-regression models were originally developed for the synthesis of experimental research where randomization ensures unbiased and consistent estimation of the effect of interest. Most economics research is, however, observational and specification searches may often result in estimates that are biased and inconsistent, for example, due to omitted-variable biases. We show that if the authors of primary studies search for statistically significant estimates in observational research, meta-regression models tend to make false-positive findings of genuine empirical effects. More research is needed to better understand how meta-regression models need to be specified to help identifying genuine empirical effects in observational research.
Article
We introduce the class of conditional linear combination tests, which reject null hypotheses concerning model parameters when a data-dependent convex combination of two identification-robust statistics is large. These tests control size under weak identification and have a number of optimality properties in a conditional problem. We show that the conditional likelihood ratio test of Moreira (2003) is a conditional linear combination test in models with one endogenous regressor, and that the class of conditional linear combination tests is equivalent to a class of quasi-conditional likelihood ratio tests. We suggest using minimax regret conditional linear combination tests and propose a computationally tractable class of tests that plug in an estimator for a nuisance parameter. These plug-in tests perform well in simulation and have optimal power in many strongly identified models, thus allowing powerful identification-robust inference in a wide range of linear and nonlinear models without sacrificing efficiency if identification is strong.