The Stata Journal (yyyy) vv, Number ii, pp. 111
metaan: random effects meta-analysis
Evangelos Kontopantelis
National Primary Care
Research & Development Centre
University of Manchester
Manchester, UK
e.kontopantelis@manchester.ac.uk
David Reeves
Health Sciences Primary Care
Research Group
University of Manchester
Manchester, UK
david.reeves@manchester.ac.uk
Abstract. This article describes a new meta-analysis command, metaan, which can
be used to perform fixed- or random-effects meta-analysis, offering a wide choice
of available models: maximum likelihood, profile likelihood, restricted maximum
likelihood and a permutation method, besides the standard DerSimonian and Laird
approach. The command reports a variety of heterogeneity measures including
Cochran’s Q, $I^2$, $H^2_M$ and the between-study variance estimate $\hat{\tau}^2$. A forest plot
and a graph of the maximum likelihood function can also be generated.
Keywords: st0001, metaan, meta-analysis, random-effect(s), effect size(s), maxi-
mum likelihood, profile likelihood, restricted maximum likelihood, REML, permu-
tation(s) method, forest plot
1 Introduction
Meta-analysis is a statistical methodology that combines or integrates the results of
several independent clinical trials, or studies in general, considered by the analyst to
be ‘combinable’ (Huque 1988). Usually, this is a two-stage process: in the first stage
the appropriate summary statistic for each study is estimated, then at the second stage
these are combined into a weighted average. Methods also exist for combining and
meta-analysing data across studies at the individual patient level (IPD methods). An
IPD analysis provides advantages such as standardization (of marker values, outcome
definitions etc), follow-up information updating, detailed data-checking, subgroup anal-
yses and the ability to include participant-level covariates (Stewart and Clarke 1995;
Lambert et al. 2002). However, individual observations are rarely available; addition-
ally, if the main interest is in mean effects then the two-stage and the IPD approaches
can provide equivalent results (Olkin and Sampson 1998).
This paper concerns itself with the second stage of the two-stage approach to meta-
analysis. At this stage, researchers can select between two main approaches, the fixed-
or the random-effects model, in their effort to combine the study-level summary esti-
mates and calculate an overall average effect. The fixed-effect model is simpler and
assumes the true effect to be the same (homogeneous) across studies. However, homo-
geneity has been found to be the exception rather than the rule and some degree of
true effect variability between studies is to be expected (Thompson and Pocock 1991).
This between-study heterogeneity stems from differences in populations, interventions,
outcomes or follow-up times (clinical heterogeneity), or differences in trial design and
quality (methodological heterogeneity) (Higgins and Green 2008; Thompson 1994). The
most common approach to modelling the between-study variance is the method proposed
by DerSimonian and Laird (1986), which is widely used in generic and specialist meta-
analysis statistical packages alike. In Stata the DerSimonian and Laird (DL) model is
used in the most popular meta-analysis commands, the recently updated metan and
the older but still useful meta (Harris et al. 2008). However, the between-study vari-
ance component can be estimated using more advanced iterative (and computationally
expensive) techniques: maximum likelihood, profile likelihood and restricted maximum
likelihood (Hardy and Thompson 1996; Thompson and Sharp 1999). Alternatively, the
estimate can be obtained using non-parametric approaches, such as the ‘permutations’
method proposed by Follmann and Proschan (1999).
We have implemented these methods in metaan, which performs the second stage
of a two-stage meta-analysis, offering alternatives to the DerSimonian-Laird random-
effects model. The command requires the study effect estimates and standard errors as
input. We have also created metaeff (not discussed in the present paper), a command
which supports the first stage of the two-stage process and complements
metaan. The metaeff command calculates the effect size (standardised mean difference)
and its standard error from the input parameters supplied by the user, for each study,
using one of the methods described in the Cochrane Handbook for Systematic Reviews of
Interventions (Higgins and Green 2006). For more details type ssc describe metaeff
in Stata, or see Kontopantelis and Reeves (2009).
The metaan command does not offer the plethora of options metan does for inputting
various types of binary or continuous data. Other useful features in metan (and not
available in metaan) include: stratified meta-analysis, user-input study weights, vaccine
efficacy calculations, Mantel-Haenszel fixed-effect method, L’Abbe and funnel plots.
The REML model, assumed to be the best method to fit a random-effects meta-analysis
model even though this assumption has not been thoroughly investigated (Thompson
and Sharp 1999), has recently been coded in the updated meta-regression command
metareg (Harbord and Higgins 2008) and the new multivariate random-effects meta-
analysis command mvmeta (White 2009). However, the output and options provided by
metaan can be more useful in the univariate meta-analysis context.
2 The metaan command
2.1 Syntax
metaan varname1 varname2 [if] [in], fe dl ml reml pl pe varc
label(varname) forest forestw(#) plplot(string)
where
varname1 the study effect sizes.
varname2 the study effect variation, with standard error used as default.
2.2 Options
fe Fixed-effect (FE) model that assumes there is no heterogeneity between the studies.
The model assumes that within-study variances may differ, but that there is homo-
geneity of effect size across studies. Often the homogeneity assumption is unlikely
and variation in the true effect across studies is to be expected. Therefore, caution
is required when using this model. Reported heterogeneity measures are estimated
using the dl model.
dl DerSimonian-Laird (DL), the most commonly used random-effects model. Models
heterogeneity between the studies i.e. assumes that the true effect can be differ-
ent for each study. The method assumes that the individual study true effects are
distributed with a variance $\tau^2$ around an ‘overall’ true effect, but makes no
assumptions about the form of the distribution of either the within- or between-study
effects. Reported heterogeneity measures are estimated using the dl model.
ml Maximum-likelihood (ML) random-effects model. Makes the additional assump-
tion (necessary to derive the log-likelihood function, and also true for reml and pl
below) that both the within-study and between-study effects have Normal distribu-
tions. The log-likelihood function is solved iteratively to produce an estimate of the
between-study variance. However, the method does not always converge while in
some cases the between-study variance estimate is negative and set to zero (in which
case the model is reduced to the fe model). Estimates are reported as missing in
the event of non-convergence. Reported heterogeneity measures are estimated using
the ml model.
reml Restricted maximum-likelihood (REML) random-effects model. Similar method
to ml and using the same assumptions. The log-likelihood function is maximized
iteratively to provide estimates as in ml. However, under reml only the part of
the likelihood function which is location invariant is maximized (i.e. maximizing
the portion of the likelihood that does not involve $\mu$, if estimating $\tau^2$, and vice
versa). The method does not always converge while in some cases the between-study
variance estimate is negative and set to zero (in which case the model is reduced to
the fe model). Estimates are reported as missing in the event of non-convergence.
Reported heterogeneity measures are estimated using the reml model.
pl Profile-likelihood (PL) random-effects model. Profile likelihood uses the same like-
lihood function as ml, but takes into account the uncertainty associated with the
between-study variance estimate when calculating an overall effect, by using nested
iterations to converge to a maximum. The confidence intervals provided by the
method are asymmetric and hence so is the diamond in the forest plot. However,
the method does not always converge. Values that were not computed are reported
as missing. Reported heterogeneity measures are estimated using the ml model,
since $\hat{\mu}$ and $\hat{\tau}^2$, the effect and between-study variance estimates, are the same (only
their confidence intervals are re-estimated). The method also provides a confidence
interval for the between-study variance estimate.
pe Permutations (PE) random-effects model. A non-parametric random-effects method
which utilises dl and does not assume a normal distribution for the random effects.
The confidence interval provided by the method is asymmetric and hence so is the
diamond in the forest plot. Reported heterogeneity measures are estimated using
the dl model.
varc Informs the program that the study effect variation variable varname2 holds vari-
ance values. If this option is omitted the program assumes the variable contains
standard error values (the default).
label(varname) Selects labels for the studies. Up to two variables can be selected and
converted to strings. If two variables are selected they will be separated by a comma.
Usually, the author names and the year of study are selected as labels. The final
string is truncated to 20 characters.
forest Requests a forest plot. The weights from the specified analysis are used for
plotting symbol sizes (PE uses DL weights).
forestw(#) Requests a forest plot with adjusted weight ratios for better display. The
value can be in the [1,50] range. For example, if the largest to smallest weight ratio
is 60 and the graph looks awkward, the user can use this option to improve the
appearance, by requesting the weight to be rescaled to a largest/smallest weight
ratio of 30. It should be noted that only the weight squares in the plot are affected
and not the model. The confidence intervals in the plot are unaffected.
plplot(string) Requests a plot of the likelihood function for the average effect or
between-study variance estimate of the ml, pl or reml models. Option plplot(mu)
fixes the average effect parameter to its model estimate, in the likelihood function,
and creates a two way plot of $\tau^2$ vs the likelihood function. Option plplot(tsq)
fixes the between-study variance to its model estimate, in the likelihood function,
and creates a two way plot of µ vs the likelihood function.
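As a rough illustration of what the fe and varc options describe, the following Python sketch pools study effects with inverse-variance weights. It is not part of metaan: the function name fixed_effect, its arguments and the normal-based 95% interval are assumptions made for illustration. The data are the seven effect sizes and standard errors listed in section 2.5.

```python
import math

def fixed_effect(effects, spread, varc=False):
    """Inverse-variance fixed-effect pooling (illustrative, hypothetical helper).

    `spread` is read as standard errors unless varc=True, in which case it is
    read as variances, mirroring the meaning of the varc option.
    """
    variances = list(spread) if varc else [s * s for s in spread]
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mu = sum(w * y for w, y in zip(weights, effects)) / total
    se_mu = math.sqrt(1.0 / total)
    # Assumption: a normal-based 95% confidence interval.
    return mu, se_mu, (mu - 1.96 * se_mu, mu + 1.96 * se_mu)

# Effect sizes and standard errors from the example dataset in section 2.5.
effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]

mu_fe, se_fe, ci_fe = fixed_effect(effsize, se)
print(round(mu_fe, 3), round(se_fe, 3))
```

Passing squared standard errors together with varc=True gives the same pooled estimate, which is the only difference the option is described as making.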
2.3 Saved results
metaan saves the following scalar results (some varying by selected method) in r():
All methods
  r(Hsq)          heterogeneity measure $H^2_M$
  r(Isq)          heterogeneity measure $I^2$
  r(Q)            Cochran’s Q value
  r(Qpval)        p-value for Cochran’s Q
  r(df)           degrees of freedom
  r(effvar)       effect variance
  r(eff)          effect size
  r(efflo)        effect size, lower 95% CI
  r(effup)        effect size, upper 95% CI
fe, dl methods
  r(tausq_dl)     $\hat{\tau}^2$, from the DL method
ml method
  r(tausq_dl)     $\hat{\tau}^2$, from the DL method
  r(tausq_ml)     $\hat{\tau}^2$, from the ML method
  r(conv_ml)      ML convergence information
reml method
  r(tausq_dl)     $\hat{\tau}^2$, from the DL method
  r(tausq_reml)   $\hat{\tau}^2$, from the REML method
  r(conv_reml)    REML convergence information
pl method
  r(tausq_dl)     $\hat{\tau}^2$, from the DL method
  r(tausq_pl)     $\hat{\tau}^2$, from the PL method
  r(tausqlo_pl)   $\hat{\tau}^2$ (PL), lower 95% CI
  r(tausqup_pl)   $\hat{\tau}^2$ (PL), upper 95% CI
  r(cloeff_pl)    convergence information, PL effect size (lower CI)
  r(cupeff_pl)    convergence information, PL effect size (upper CI)
  r(ctausqlo_pl)  convergence information, PL $\hat{\tau}^2$ (lower CI)
  r(ctausqup_pl)  convergence information, PL $\hat{\tau}^2$ (upper CI)
  r(conv_ml)      ML convergence information
pe method
  r(tausq_dl)     $\hat{\tau}^2$, from the DL method
  r(exec_pe)      information on PE execution
In each case, heterogeneity measures $H^2_M$ and $I^2$ are computed using the returned
between-study variance estimate $\hat{\tau}^2$. Convergence (and PE execution) information is
returned as 1 if successful and as 0 otherwise. r(effvar) cannot be computed for PE. r(effvar)
is the same for ML and PL, but for PL the confidence intervals are ‘amended’ to take
into account the $\hat{\tau}^2$ uncertainty.
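The text does not give the formulas that map a between-study variance estimate to the heterogeneity measures. One plausible mapping, offered here only as an assumption, inverts the DerSimonian-Laird moment estimator so that $H^2_M = \hat{\tau}^2 C/(k-1)$ with $C = \sum w_i - \sum w_i^2 / \sum w_i$ and $w_i = 1/\hat{\sigma}^2_i$, and then sets $I^2 = H^2_M/(1 + H^2_M)$. The Python sketch below (the function name is hypothetical) applies it to the example data of section 2.5 with the ML estimate of 0.121 printed there.

```python
def heterogeneity_from_tausq(tausq, variances):
    """Map a between-study variance estimate to H^2_M and I^2 (in percent).

    Assumed relations, not confirmed by the article:
      H^2_M = tausq * C / (k - 1),  C = sum(w) - sum(w^2) / sum(w),  w = 1 / variance
      I^2   = 100 * H^2_M / (1 + H^2_M)
    """
    k = len(variances)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    hsq = tausq * c / (k - 1)
    isq = 100.0 * hsq / (1.0 + hsq)
    return hsq, isq

# Standard errors from the example dataset in section 2.5.
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]
variances = [s * s for s in se]

# 0.121 is the (rounded) tau^2 estimate printed in the example output.
hsq, isq = heterogeneity_from_tausq(0.121, variances)
print(round(hsq, 2), round(isq, 2))
```

With the rounded input the results land close to the $H^2$ and $I^2$ values printed in the example, which suggests, but does not prove, that the command uses a relation of this kind.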
2.4 Methods
The metaan command offers six meta-analysis methods for calculating a mean effect
estimate and its confidence intervals: fixed-effect model (FE), random-effects DerSimo-
nian & Laird method (DL), maximum-likelihood random-effects model (ML), restricted
maximum-likelihood random-effects model (REML), profile-likelihood random-effects
model (PL) and permutations method utilising a DL random-effects model (PE). Mod-
els of the random-effects family take into account the identified between-study variation,
estimate it and usually produce wider confidence intervals for the overall effect than a
fixed-effect analysis. Brief descriptions of the methods have been provided in section 2.2.
In this section, we will provide a few more details and practical advice in selecting be-
tween the methods. Their complexity prohibits complete descriptions in this paper and
users wishing to look into method details are encouraged to refer to the original papers
which have described them (DerSimonian and Laird 1986; Hardy and Thompson 1996;
Follmann and Proschan 1999; Brockwell and Gordon 2001).
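For orientation, the DerSimonian and Laird (1986) approach can be sketched in a few lines of Python. This is a minimal illustration of the standard moment estimator, not the code used by metaan; the function name is hypothetical and the data are the effect sizes and standard errors from section 2.5.

```python
def dersimonian_laird(effects, ses):
    """DerSimonian-Laird random-effects pooling (standard moment estimator)."""
    k = len(effects)
    variances = [s * s for s in ses]
    # Fixed-effect weights and pooled estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    # Moment estimate of the between-study variance, truncated at zero.
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tausq = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tausq to every within-study variance.
    w_re = [1.0 / (v + tausq) for v in variances]
    sum_w_re = sum(w_re)
    mu_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se_re = (1.0 / sum_w_re) ** 0.5
    return mu_re, se_re, tausq, q

effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]

mu_dl, se_dl, tausq_dl, q = dersimonian_laird(effsize, se)
print(round(mu_dl, 3), round(tausq_dl, 3), round(q, 2))
```

When the truncation sets the variance estimate to zero, the random-effects weights equal the fixed-effect weights and the two analyses coincide.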
The three maximum likelihood methods are iterative and usually computationally
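expensive. ML and PL derive the $\mu$ (overall effect) and $\tau^2$ estimates by maximizing the
log-likelihood function in (1), under different conditions. REML estimates $\tau^2$ and $\mu$ by
maximizing the restricted log-likelihood function in (2).

\[
\log L(\mu, \tau^2) = -\frac{1}{2}\left[\sum_{i=1}^{k}\log\left(2\pi(\hat{\sigma}^2_i + \tau^2)\right) + \sum_{i=1}^{k}\frac{(\hat{y}_i - \mu)^2}{\hat{\sigma}^2_i + \tau^2}\right], \quad \mu \in \mathbb{R} \ \&\ \tau^2 \geq 0 \tag{1}
\]

\[
\log L'(\mu, \tau^2) = -\frac{1}{2}\left[\sum_{i=1}^{k}\log\left(2\pi(\hat{\sigma}^2_i + \tau^2)\right) + \sum_{i=1}^{k}\frac{(\hat{y}_i - \hat{\mu})^2}{\hat{\sigma}^2_i + \tau^2}\right] - \frac{1}{2}\log\sum_{i=1}^{k}\frac{1}{\hat{\sigma}^2_i + \tau^2}, \quad \hat{\mu} \in \mathbb{R} \ \&\ \tau^2 \geq 0 \tag{2}
\]

where k is the number of studies to be meta-analysed, $\hat{y}_i$ and $\hat{\sigma}^2_i$ are the effect and
variance estimates for study i and $\hat{\mu}$ is the overall effect estimate.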
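The two log-likelihood functions can be evaluated directly. The Python sketch below is an illustration under stated assumptions rather than the command's algorithm: metaan is described as iterative, whereas this sketch simply searches a grid of $\tau^2$ values, and for each value it takes $\hat{\mu}$ to be the inverse-variance weighted mean with weights $1/(\hat{\sigma}^2_i + \tau^2)$. Function names and the grid are hypothetical; the data come from section 2.5.

```python
import math

def pooled_mean(tausq, effects, variances):
    """Weighted mean with weights 1 / (variance + tausq)."""
    w = [1.0 / (v + tausq) for v in variances]
    return sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

def loglik(mu, tausq, effects, variances):
    """Log-likelihood, equation (1)."""
    total = 0.0
    for y, v in zip(effects, variances):
        s = v + tausq
        total += math.log(2.0 * math.pi * s) + (y - mu) ** 2 / s
    return -0.5 * total

def restricted_loglik(tausq, effects, variances):
    """Restricted log-likelihood, equation (2), with mu-hat the pooled mean."""
    mu_hat = pooled_mean(tausq, effects, variances)
    penalty = 0.5 * math.log(sum(1.0 / (v + tausq) for v in variances))
    return loglik(mu_hat, tausq, effects, variances) - penalty

effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]
variances = [s * s for s in se]

# Assumption: a plain grid over tau^2 in [0, 1] stands in for the iterations.
grid = [0.001 * j for j in range(1001)]
tausq_ml = max(grid, key=lambda t: loglik(pooled_mean(t, effsize, variances),
                                          t, effsize, variances))
tausq_reml = max(grid, key=lambda t: restricted_loglik(t, effsize, variances))
mu_ml = pooled_mean(tausq_ml, effsize, variances)
print(round(tausq_ml, 3), round(mu_ml, 3), round(tausq_reml, 3))
```

On these data the restricted likelihood peaks at a larger $\tau^2$ than the ordinary likelihood, in line with the downward bias of ML discussed in this section.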
Maximum likelihood follows the simplest approach, maximizing (1) in a single itera-
tion loop. A criticism of ML is that it takes no account of the loss in degrees of freedom
that results from estimating the overall effect. Restricted Maximum Likelihood derives
the likelihood function in a way that adjusts for this and removes downward bias in
the between-studies variance estimator. A useful description for REML, in the meta-
analysis context, has been provided by Normand (1999). Profile likelihood uses the
same likelihood function as ML, but takes into account the uncertainty associated with
the between-study variance estimate when calculating an overall effect, through the use
of nested iterations to converge to a maximum. By incorporating this extra factor
of uncertainty, PL yields confidence intervals that are usually wider than for DL and
also asymmetric. PL has been shown to outperform DL in various scenarios (Brockwell
and Gordon 2001).
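The idea behind the profile likelihood interval for the overall effect can be sketched as follows in Python. The assumptions are loud ones: a nested grid search replaces the command's nested iterations, and the interval collects all $\mu$ whose profile log-likelihood lies within 1.9207 (half the 95% point of a $\chi^2_1$ distribution) of the maximum. Function names and grids are hypothetical; the data are those of section 2.5.

```python
import math

def loglik(mu, tausq, effects, variances):
    """Log-likelihood of equation (1)."""
    total = 0.0
    for y, v in zip(effects, variances):
        s = v + tausq
        total += math.log(2.0 * math.pi * s) + (y - mu) ** 2 / s
    return -0.5 * total

def profile_ci_mu(effects, variances, mu_grid, tausq_grid, cutoff=1.9207):
    """Grid-based profile likelihood interval for mu (illustrative only)."""
    # Inner step: for each mu, maximize the log-likelihood over tau^2.
    profile = [max(loglik(mu, t, effects, variances) for t in tausq_grid)
               for mu in mu_grid]
    best = max(profile)
    # Outer step: keep every mu whose profile value is within the cutoff.
    inside = [mu for mu, p in zip(mu_grid, profile) if p >= best - cutoff]
    return min(inside), max(inside)

effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]
variances = [s * s for s in se]

mu_grid = [-1.5 + 0.005 * i for i in range(401)]    # -1.5 to 0.5
tausq_grid = [0.005 * j for j in range(201)]        # 0 to 1
lo, hi = profile_ci_mu(effsize, variances, mu_grid, tausq_grid)
print(round(lo, 3), round(hi, 3))
```

Because the inner maximization lets $\tau^2$ change with $\mu$, the interval need not be symmetric about the point estimate. The grid resolution (0.005) limits the accuracy of the end points.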
The PE method (Follmann and Proschan 1999) can be described as follows: First, in
line with a Null hypothesis that all true study effects are zero and observed effects are due
to random variation, a dataset of all possible combinations of observed study outcomes
is created by permuting the sign of each observed effect. Next, the dl method is used
to compute an overall effect for each combination. Finally, the resulting distribution of
overall effect sizes is used to derive a confidence interval for the observed overall effect.
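The first two steps of this description can be sketched in Python as below. This is a simplified illustration, not the Follmann and Proschan procedure as implemented in metaan: it enumerates all sign combinations, computes a DL overall effect for each, and then summarizes the resulting distribution with a two-sided permutation p-value instead of deriving the confidence interval. All function names are hypothetical; the data are those of section 2.5.

```python
from itertools import product

def dl_pooled(effects, variances):
    """DerSimonian-Laird overall effect for one set of study effects."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tausq = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (v + tausq) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)

def sign_flip_distribution(effects, variances):
    """DL overall effect for every combination of effect signs (2^k values)."""
    return [dl_pooled([s * y for s, y in zip(signs, effects)], variances)
            for signs in product((1, -1), repeat=len(effects))]

effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]
variances = [s * s for s in se]

observed = dl_pooled(effsize, variances)
null = sign_flip_distribution(effsize, variances)
# Assumption: summarize with a two-sided p-value; small slack guards rounding.
p_value = sum(abs(m) >= abs(observed) - 1e-12 for m in null) / len(null)
print(len(null), round(observed, 3), round(p_value, 3))
```

With k studies there are $2^k$ sign combinations, so full enumeration is only practical for a modest number of studies.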
Method performance is known to be affected by three factors: the number of studies
in the meta-analysis, the degree of heterogeneity in true effects and - provided there is
heterogeneity present - the distribution of the true effects (Brockwell and Gordon 2001).
Heterogeneity is a major problem researchers have to face, when combining study re-
sults in a meta-analysis, which is attributed to clinical and/or methodological diver-
sity (Higgins and Green 2006). The variability that arises from different interventions,
populations, outcomes or follow-up times is described by clinical heterogeneity, while
differences in trial design and quality are accounted for by methodological heterogene-
ity (Thompson 1994). Traditionally, heterogeneity is tested with Cochran’s Q which
provides a p-value for the test of homogeneity, when compared with a $\chi^2_{k-1}$ distribution
(Brockwell and Gordon 2001) (where k is the number of studies). However, the test is
known to be poor at detecting heterogeneity since its power is low when the number of
studies is small (Hardy and Thompson 1998). An alternative measure is $I^2$, which is
thought to be more informative in assessing inconsistency between studies, with values
of 25%, 50% and 75% corresponding to low, moderate and high heterogeneity respectively
(Higgins et al. 2003). Another measure is $H^2_M$, the measure least affected by the
value of k, taking values in the $[0, +\infty)$ range with 0 indicating perfect homogeneity
(Mittlbock and Heinzl 2006). The between-study variance estimate $\hat{\tau}^2$ can
also be informative about the presence or not of heterogeneity.
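The measures discussed above can be computed from Cochran's Q as in the following Python sketch. The formulas are the commonly used Q-based definitions, $I^2 = (Q - (k-1))/Q$ and $H^2_M = (Q - (k-1))/(k-1)$, both truncated at zero; treating these as the definitions is an assumption, and metaan is described as deriving its reported measures from the model's own $\hat{\tau}^2$, so its output can differ. The function name is hypothetical and the data are those of section 2.5.

```python
def q_based_heterogeneity(effects, ses):
    """Cochran's Q with Q-based I^2 (percent) and H^2_M."""
    k = len(effects)
    w = [1.0 / (s * s) for s in ses]
    sum_w = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    isq = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    hsq_m = max(0.0, (q - df) / df)
    return q, df, isq, hsq_m

effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]

q, df, isq, hsq_m = q_based_heterogeneity(effsize, se)
print(round(q, 2), df, round(isq, 1), round(hsq_m, 1))
```

Q here agrees with the value printed in the section 2.5 example, while the Q-based $I^2$ and $H^2_M$ are larger than the printed ones, which are tied to the ML variance estimate.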
The test for heterogeneity is often used as the basis for applying a fixed-effect or
a random-effects model. However, the often low power of the Q test makes it unwise
to base a decision on the result of the test alone. Research studies, even on the same
topic, can vary on a large number of factors, hence homogeneity is often an unlikely
assumption and some degree of variability between studies is to be expected (Thompson
and Pocock 1991). Some authors recommend the adoption of a random-effects model,
unless there are compelling reasons for doing otherwise, irrespective of the outcome of
the test for heterogeneity (Brockwell and Gordon 2001).
However, even though random-effects methods model heterogeneity, the performance
of the maximum likelihood methods (ML, REML and PL) in situations where the true
effects violate the assumptions of a Normal distribution may not be optimal (Brockwell
and Gordon 2001; Hardy and Thompson 1998; Bohning et al. 2002; Sidik and Jonkman
2007). The number of studies in the analysis is also an issue, since most meta-analysis
methods (including DL, ML, REML, PL, but not PE) are only asymptotically correct:
i.e. they provide the theoretical 95% coverage only as the number of studies increases
(approaches infinity). Method performance is therefore affected when the number of
studies is small, but the extent depends on the method (some are more susceptible),
along with the degree of heterogeneity and the distribution of the true effects (Brockwell
and Gordon 2001).
2.5 Example
As an example, we apply the metaan command to health risk outcome data from seven
studies. The information was collected for an unpublished meta-analysis and the data
is available from the authors. Using describe and list commands we provide details
of the dataset and proceed to perform a univariate meta-analysis with metaan.
. use metaan_example.dta,
. describe
Contains data from metaan_example.dta
  obs:             7
 vars:             4                          19 Apr 2010 12:19
 size:           532 (99.9% of memory free)

               storage  display    value
variable name    type   format     label     variable label
study            str16  %16s                 First author and year
outcome          str48  %35s                 Outcome description
effsize          float  %9.0g                effect sizes
se               float  %9.0g                SE of the effect sizes

Sorted by:  study  outcome

. list study outcome effsize se, noobs clean

study              outcome                      effsize          se
Bakx A, 1985       Serum cholesterol (mmol/L)   -.3041526   .0958199
Campbell A, 1998   Diet                          .2124063   .0812414
Cupples, 1994      BMI                           .0444239    .090661
Eckerlund          SBP                          -.3991309     .12079
Moher, 2001        Cholesterol (mmol/l)         -.9374746   .0691572
Woolard A, 1995    Alcohol intake (g/week)      -.3098185    .206331
Woolard B, 1995    Alcohol intake (g/week)      -.4898825   .2001602
. metaan effsize se, pl label(study) forest
Profile Likelihood method selected

Study                  Effect   [95% Conf. Interval]   % Weight
Bakx A, 1985           -0.304     -0.492    -0.116        15.09
Campbell A, 1998        0.212      0.053     0.372        15.40
Cupples, 1994           0.044     -0.133     0.222        15.20
Eckerlund              -0.399     -0.636    -0.162        14.49
Moher, 2001            -0.937     -1.073    -0.802        15.62
Woolard A, 1995        -0.310     -0.714     0.095        12.01
Woolard B, 1995        -0.490     -0.882    -0.098        12.19
Overall effect (pl)    -0.308     -0.622     0.004       100.00

ML method succesfully converged
PL method succesfully converged for both upper and lower CI limits

Heterogeneity Measures
                value    df   p-value
Cochrane Q     139.81     6     0.000
I^2 (%)         91.96
H^2             11.44

                value   [95% Conf. Interval]
tau^2 est       0.121      0.000     0.449

Estimate obtained with Maximum likelihood - Profile likelihood provides the CI
PL method succesfully converged for both upper and lower CI limits of the tau
> estimate
The PL method used in the example converged successfully, as did ML, whose convergence
is a prerequisite. The overall effect is not found to be significant at the 5% level (the
95% confidence interval includes zero) and there is considerable heterogeneity across
studies, according to the measures. The method also displays a 95% confidence interval
for the between-study variance estimate $\hat{\tau}^2$ (provided convergence is achieved, as is
the case in this example). The forest plot created by the command is displayed in Figure 1.
[Forest plot. Rows, top to bottom: Overall effect (pl); Woolard B, 1995; Woolard A, 1995; Moher, 2001; Eckerlund; Cupples, 1994; Campbell A, 1998; Bakx A, 1985. Horizontal axis: effect sizes and CIs, from −1 to .5. Original weights (squares) displayed. Largest to smallest ratio: 1.30]
Figure 1: Forest plot displaying profile-likelihood meta-analysis.
Re-executing the analysis with the plplot(mu) and plplot(tsq) options we obtain
the log-likelihood function plots (Figures 2 & 3).
[Likelihood plot for mu fixed to the ML/PL estimate. Vertical axis: log-likelihood, from −10 to −2. Horizontal axis: tau² values, from 0 to .2]
Figure 2: Log-likelihood function plot, µ fixed to the model estimate.
[Likelihood plot for tau² fixed to the ML/PL estimate. Vertical axis: log-likelihood, from −25 to 0. Horizontal axis: mu values, from −1.5 to .5]
Figure 3: Log-likelihood function plot, $\tau^2$ fixed to the model estimate.
3 Discussion
The metaan command can be a useful meta-analysis tool which includes newer and, in
certain circumstances, better performing methods than the standard DerSimonian-Laird
random-effects model. Unpublished results exploring method performance in various
scenarios are available from the authors. Future work will involve implementing more
methods in the metaan command and embellishing the forest plot.
4 Acknowledgments
We would like to thank the authors of meta and metan for all their work and the
anonymous reviewer whose useful comments improved the paper considerably.
5 References
Bohning, D., U. Malzahn, E. Dietz, P. Schlattmann, C. Viwatwongkasem, and A. Big-
geri. 2002. Some General Points in Estimating Heterogeneity Variance with the
DerSimonian-Laird Estimator. Biostatistics 3(4): 445–457.
Brockwell, S. E., and I. R. Gordon. 2001. A Comparison of Statistical Methods for
Meta-Analysis. Statistics in Medicine 20(6): 825–840.
DerSimonian, R., and N. Laird. 1986. Meta-Analysis in Clinical Trials. Controlled
Clinical Trials 7(3): 177–188.
Follmann, D. A., and M. A. Proschan. 1999. Valid Inference in Random Effects Meta-
Analysis. Biometrics 55(3): 732–737.
Harbord, R. M., and J. P. T. Higgins. 2008. Meta-regression in Stata. Stata Journal
8(4): 493–519.
Hardy, R. J., and S. G. Thompson. 1996. A Likelihood Approach to Meta-Analysis with
Random Effects. Statistics in Medicine 15(6): 619–629.
———. 1998. Detecting and Describing Heterogeneity in Meta-Analysis. Statistics in
Medicine 17(8): 841–856.
Harris, R., M. Bradburn, J. Deeks, R. Harbord, D. Altman, and J. Sterne. 2008. Metan:
Fixed- and Random-Effects Meta-Analysis. Stata Journal 8(1): 3–28.
Higgins, J. P., and S. Green. 2006. Cochrane Handbook for Systematic Reviews of
Interventions: Version 4.2.6.
http://www.cochrane.org/resources/handbook/Handbook4.2.6Sep2006.pdf.
———. 2008. Cochrane Handbook for Systematic Reviews of Interventions: Version
5.0.1. http://www.cochrane-handbook.org/.
Higgins, J. P., S. G. Thompson, J. J. Deeks, and D. G. Altman. 2003. Measuring
Inconsistency in Meta-Analyses. British Medical Journal 327(7414): 557–560.
Huque, M. F. 1988. Experiences with Meta-Analysis in NDA Submissions. Proceedings
of the Biopharmaceutical Section of the American Statistical Association 2: 28–33.
Kontopantelis, E., and D. Reeves. 2009. MetaEasy: A Meta-Analysis Add-In for Mi-
crosoft Excel. Journal of Statistical Software 30(7): 1–25.
Lambert, P. C., A. J. Sutton, K. R. Abrams, and D. R. Jones. 2002. A comparison
of summary patient-level covariates in meta-regression with individual patient data
meta-analysis. J Clin Epidemiol 55(1): 86–94.
Mittlbock, M., and H. Heinzl. 2006. A Simulation Study Comparing Properties of
Heterogeneity Measures in Meta-Analyses. Statistics in Medicine 25(24): 4321–4333.
Normand, S. T. 1999. Tutorial in biostatistics. Meta-analysis: formulating, evaluating,
combining, and reporting. Stat Med 18: 321–359.
Olkin, I., and A. Sampson. 1998. Comparison of meta-analysis versus analysis of variance
of individual patient data. Biometrics 54(1): 317–322.
Sidik, K., and J. N. Jonkman. 2007. A comparison of heterogeneity vari-
ance estimators in combining results of studies. Stat Med 26(9): 1964–1981.
http://dx.doi.org/10.1002/sim.2688.
Stewart, L. A., and M. J. Clarke. 1995. Practical methodology of meta-analyses
(overviews) using updated individual patient data. Cochrane Working Group. Stat
Med 14(19): 2057–2079.
Thompson, S. G. 1994. Why Sources of Heterogeneity in Meta-Analysis Should be
Investigated. British Medical Journal 309(6965): 1351–1355.
Thompson, S. G., and S. J. Pocock. 1991. Can Meta-Analyses be Trusted? The Lancet
338(8775): 1127–1130.
Thompson, S. G., and S. J. Sharp. 1999. Explaining heterogeneity in meta-analysis: a
comparison of methods. Stat Med 18(20): 2693–2708.
White, I. R. 2009. Multivariate random-effects meta-analysis. Stata Journal 9(1): 40–56.
About the authors
Evangelos (Evan) Kontopantelis is a research fellow in statistics at the National Primary Care
Research and Development Centre, University of Manchester, England. His research interests
include statistical methods in health sciences with a focus on meta-analysis, longitudinal data
modeling and large clinical database management.
David Reeves is a senior research fellow in statistics at the Health Sciences Primary Care
Research Group, University of Manchester, England. David has worked as a statistician in
health services research for nearly three decades, mainly in the fields of learning disability
and primary care. His methodological research interests include the robustness of statistical
methods, the analysis of observational studies, and applications of social network analysis
methods to health systems.
... Where possible, we generated pool effect estimates from direct comparisons of each treatment pair using a randomeffect REML model. 11 We assessed heterogeneity using I 2 statistics and explored potentials for risk of publication bias using a funnel plot. We then performed a network meta-analysis within a frequentist framework fitting multivariate meta-analysis models with random effect using the network package in STATA 12,13 exploiting the direct and indirect randomised evidence to determine the relative effects and ranking. ...
Article
Full-text available
Background Tubal ectopic pregnancy (TEP) is a common gynaecological emergency. Several medical and surgical treatment options exist, but it is not clear which is the safest and most effective treatment. Objectives To compare the effectiveness of expectant, medical and surgical treatment options for TEP using a systematic review and network meta‐analysis. Search Strategy MEDLINE, EMBASE, and CENTRAL from inception till September 2022. Selection Criteria Randomised trials that evaluated any treatment option for woman with a TEP. Data Collection and Analysis We performed pairwise and network meta‐analyses using a random effect model. We assessed the studies' risk of bias, heterogeneity and network inconsistency. We reported primarily on TEP resolution and treatment failure using relative risk (RR) and 95% confidence‐intervals (CI). Main Results We included 31 randomised trials evaluating ten treatments (n = 2938 women). Direct meta‐analysis showed no significant benefit for using methotrexate compared to expectant management for TEP resolution. Network meta‐analysis showed similar effect‐size for most conservative treatment options compared to expectant management for TEP resolution (glucose intra‐sac instillation vs. expectant RR 0.84, 95% CI 0.63–1.12; methotrexate intra‐sac instillation vs. expectant RR 0.91, 95% CI 0.75–1.10; multi‐dose methotrexate vs. expectant RR 1.00, 95% CI 0.88–1.15; prostaglandin intra‐sac instillation vs. expectant RR 0.75, 95% CI 0.53–1.07; salpingotomy vs. expectant RR 0.99, 95% CI 0.84–1.16; single dose methotrexate vs. expectant RR 0.97, 95% CI 0.85–1.10; single dose methotrexate + mifepristone vs. expectant RR 1.09, 95% CI 0.89–1.33). All treatment options showed a higher risk of failure compared to salpingectomy. Conclusions There is insufficient evidence to support the use of any medical treatment option for TEP over expectant management.
... We then calculated the Hedges's g between treatment and controls to indicate the effect size for each reading comprehension outcome and used all eligible effect sizes in each study. However, because currently, the BNMA method cannot handle dependency effects (only allows one effect size for group comparison [strategy vs. control or strategy vs. strategy] in one study), we synthesized multiple effect sizes within each group comparison into one effect size for either a specific strategy vs. control comparison or specific strategy vs. strategy comparison based on fixed effects within each study using metaan in Stata (Kontopantelis & Reeves, 2010). ...
Article
Full-text available
Based on 52 studies with samples mostly from English-speaking countries, the current study used Bayesian network meta-analysis to investigate the intervention effectiveness of different reading comprehension strategy combinations on reading comprehension among students with reading difficulties in 3rd through 12th grade. We focused on commonly researched strategies: Main idea, inference, text structure, retell, prediction, self-monitoring, and graphic organizers. Results showed 1) instruction of more strategies did not necessarily have stronger effects on reading comprehension, 2) there was no single reading comprehension strategy that produced the strongest effect, 3) main idea, text structure, and retell, taught together as the primary strategies, seemed the most effective, and 4) the effects of strategies only held when background knowledge instruction was included. These findings suggest strategy instruction among students with reading difficulties follows an ingredient-interaction model. That is, no single strategy works the best. It is not “the more we teach, the better outcomes to expect”. Instead, different strategy combinations may produce different effects on reading comprehension. Main idea, text structure, and retell together may best optimize the cognitive load during reading comprehension. Background knowledge instruction should be combined with strategy instruction to facilitate knowledge retrieval as to reduce the cognitive load of using strategies.
... Where specific measures or measures of uncertainty are not reported, methods described by Debray et al. will be used to estimate them [30,31]. Random-effects meta-analyses of O:E and AUC values will be conducted with REML estimation using the metaan procedure in Stata 17 (RRID:SCR_012763) (Stata Corp, College Station TX) [31,32]. These will be conducted, as recommended, on the log scale for O:E ratios and the logit scale for AUC values [31,33]. ...
Article
Background: Hip fracture results in high mortality and, for many survivors, long-term functional limitations. Multivariable prediction models for hip fracture outcomes have the potential to aid clinical decision-making as well as risk adjustment in national audits of care. The aim of this study is to identify, critically appraise and synthesise published multivariable prediction models for long-term outcomes after hip fracture. Protocol: The systematic review will include a literature search of electronic databases (MEDLINE, Embase, Scopus, Web of Science and CINAHL) for journal articles. Search terms related to hip fracture, prognosis and outcomes will be included. Study selection criteria include studies of people with hip fracture where the study aimed to predict one or more long-term outcomes through derivation or validation of a multivariable prediction model. Studies will be excluded if they focus only on the predictive value of individual factors, or only include patients with periprosthetic fractures, fractures managed non-surgically or younger patients. Covidence software will be used for data management. Two review authors will independently conduct study selection, data extraction and appraisal. Data will be extracted based on the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist. Risk of bias assessment will be conducted using the Prediction model Risk of Bias Assessment Tool (PROBAST). Characteristics and results of all studies will be narratively synthesised and presented in tables. Where the same model has been validated in multiple studies, a meta-analysis of discrimination and calibration will be conducted. Conclusions: This systematic review will aim to identify multivariable models for hip fracture outcome prognosis that have been derived using high-quality methods. Results will highlight whether current models have the potential for further assessment for use in both clinical decision-making and improving methods of national hip fracture audits. PROSPERO registration: CRD42022330019 (25th May 2022).
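The protocol excerpt for this entry mentions pooling O:E ratios on the log scale and AUC values on the logit scale. The transformations can be sketched in Python as below. The standard errors use the delta method, which is an assumption of this sketch rather than a detail stated in the excerpt, and all function names and numbers are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Back-transform from the logit scale to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))


def auc_to_logit(auc, se_auc):
    """Logit-transformed AUC and its delta-method standard error."""
    if not 0.0 < auc < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return logit(auc), se_auc / (auc * (1.0 - auc))


def oe_to_log(oe, se_oe):
    """Log-transformed O:E ratio and its delta-method standard error."""
    if oe <= 0.0:
        raise ValueError("O:E ratio must be positive")
    return math.log(oe), se_oe / oe


# Hypothetical validation-study results.
logit_auc, se_logit_auc = auc_to_logit(0.80, 0.02)
log_oe, se_log_oe = oe_to_log(1.25, 0.10)
auc_back = inv_logit(logit_auc)  # recovers the original AUC
```

Pooling would then be carried out on the transformed values, with the pooled estimate and its interval limits back-transformed for reporting.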
... Therefore, we used random-effects models to combine the effect sizes. This method not only accounted for sampling errors, but also the within-study and between-study variance in the three-level meta-analyses; moreover, it was stricter than the fixed-effects model (Assink & Wibbelink, 2016; Brockwell & Gordon, 2001; Kontopantelis & Reeves, 2010). We constructed a forest plot and computed heterogeneity statistics (i.e., the Q test and I²) to examine the amount of variance across studies for all analyses. ...
Article
Family function reflects the operating status of the family system, which plays a vital role in children’s mental health. The current meta-analysis examined the association between family function and post-traumatic stress disorder (PTSD) in children and adolescents for the first time. Studies published from 1980 to 2021 were identified via searching and screening. We identified 31 studies (91 unique effects) with 8,684 children. A three-level meta-analysis revealed that overall family function was negatively associated with PTSD (r = −0.205). Among elements of family function, family affect (r = −0.251), communication (r = −0.221), and cohesion (r = −0.184) were associated with less PTSD, whereas family conflict (r = 0.228) was associated with more PTSD in children. Family flexibility (r = −0.103) was not associated with PTSD. Moderator analyses revealed differences between various types of trauma events and family function scales. The findings highlight the differences in the roles of the elements of family function and suggest that interventions should be focused on targeting specific elements of family function.
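The excerpt for this entry refers to the Q test and I² as heterogeneity statistics. For the standard two-level case, they can be computed as in the Python sketch below. This is an illustration only and does not reproduce the three-level model used in the cited study; the function name and input values are hypothetical.

```python
def cochran_q_i2(effects, variances):
    """Cochran's Q, its degrees of freedom, and I² (as a percentage)."""
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two effects with matching variances")
    weights = [1.0 / v for v in variances]
    # Fixed-effect (inverse-variance) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I²: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical study-level effects with equal sampling variances.
q_stat, q_df, i2_pct = cochran_q_i2([0.1, 0.5, 0.9], [0.04, 0.04, 0.04])
```

Under the null hypothesis of homogeneity, Q is compared against a chi-squared distribution with `q_df` degrees of freedom.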
Chapter
The volume and complexity of biological and biomedical research continues to grow exponentially with cutting-edge technologies such as high-throughput sequencing. Unfortunately, bioinformatics analysis is often considered only after data have been generated, which significantly limits the ability to make sense of complex big data. This unique book introduces the idea of No-Boundary Thinking (NBT) in biological and biomedical research, which aims to access, integrate, and synthesize data, information, and knowledge from bioinformatics to define important problems and articulate impactful research questions. This interdisciplinary volume brings together a team of bioinformatics specialists who draw on their own experiences with NBT to illustrate the importance of collaborative science. It will help stimulate discussion and application of NBT, and will appeal to all biomedical researchers looking to maximize their use of bioinformatics for making scientific discoveries.
Article
Background: Complete resolution of hypertension (CRH) after adrenalectomy for primary aldosteronism is far from a certainty. Although several prognostic models have been proposed to predict outcome after adrenalectomy, studies have not clarified which of the available models can be used reliably in clinical practice. Objectives: To identify, describe and appraise all prognostic models developed to predict CRH, and meta-analyse their predictive performances. Methods: We searched MEDLINE, Embase, and Web of Science for development and validation studies of prognostic models. After selection, we extracted descriptive statistics and aggregated area under the receiver operator curve (AUC) using meta-analysis. Results: From 25 eligible studies, we identified 12 prognostic models used for predicting CRH after total adrenalectomy in primary aldosteronism. We report the results for three models that had available data from at least three external validation studies: the Primary Aldosteronism Surgical Outcome (PASO) score (AUC: 0.81; 95% confidence interval [CI]: 0.74-0.86; 95% predictive interval [PI]: 0.04-1.00), Utsumi nomogram (AUC: 0.79; 95% CI: 0.72-0.85; 95% PI: 0.03-1.00), and the Aldosteronoma Resolution Score (ARS) model (AUC: 0.77; 95% CI: 0.74-0.80; 95% PI: 0.59-0.86 for all studies and AUC: 0.80; 95% CI: 0.75-0.85; 95% PI: 0.57-0.93 for the studies with the same adrenal vein sampling-guided adrenalectomy rate compared to the models meta-analyzed). Conclusions: The PASO score, Utsumi nomogram, and ARS model showed comparable discrimination performance to predict CRH in primary aldosteronism. Unlike the ARS model, the number of external validation studies for the PASO score and the Utsumi nomogram was relatively low to draw definite conclusions.
Article
Background: Antipsychotic treatment resistance affects up to a third of individuals with schizophrenia, with recent research finding systematic biological differences between antipsychotic-resistant and responsive patients. Our aim was to determine whether cognitive impairment at first episode significantly differs between future antipsychotic responders and resistant cases. Methods: Analysis of data from seven international cohorts of first-episode psychosis (FEP) with cognitive data at baseline (N = 683) and follow-up data on antipsychotic treatment response: 605 treatment-responsive and 78 treatment-resistant cases. Cognitive measures were grouped into seven cognitive domains based on the pre-existing literature. We ran multiple imputation for missing data and used logistic regression to test for associations between cognitive performance at FEP and treatment-resistant status at follow-up. Results: On average, patients who were later classified as treatment resistant reported poorer performance across most cognitive domains at baseline. Univariate logistic regressions showed that antipsychotic treatment-resistant cases had significantly poorer IQ/general cognitive functioning at FEP (OR = 0.70, p = .003). These findings remained significant after adjusting for additional variables in multivariable analyses (OR = 0.76, p = .049). Conclusions: Although replication in larger studies is required, it appears that deficits in IQ/general cognitive functioning at first episode are associated with future treatment resistance. Cognitive variables may be able to provide further insight into neurodevelopmental factors associated with treatment resistance or act as early predictors of treatment resistance, which could allow prompt identification of refractory illness and timely interventions.
Article
In recent years, meta-analysis has evolved into a critically important field of Statistics, and has significant applications in Medicine and Health Sciences. In this work we briefly present existing methodologies to conduct meta-analysis along with the discussion and recent developments accompanying them. Undoubtedly, studies brought together in a systematic review will differ in one way or another. This yields a considerable amount of variability, any kind of which may be termed heterogeneity. To this end, reports of meta-analyses commonly present a statistical test of heterogeneity when attempting to establish whether the included studies are indeed similar in terms of the reported output or not. We intend to provide an overview of the topic, discuss the potential sources of heterogeneity commonly met in the literature and provide useful guidelines on how to detect and address heterogeneity. Moreover, we review the recent developments in the Bayesian approach along with the various graphical tools and statistical software that are currently available to the analyst. In addition, we discuss sensitivity analysis issues and other approaches to understanding the causes of heterogeneity. Finally, we briefly explore heterogeneity in meta-analysis of time-to-event data, pointing out its unique characteristics.
Article
Background: Antimicrobial resistance of bacterial pathogens is an increasing clinical problem and alternative approaches to antibiotic chemotherapy are needed. One of these approaches is the use of lytic bacterial viruses known as phage therapy. We aimed to assess the efficacy of phage therapy in preclinical animal models of bacterial infection. Methods: In this systematic review and meta-analysis, MEDLINE/Ovid, Embase/Ovid, CINAHL/EbscoHOST, Web of Science/Wiley, Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews, and Google Scholar were searched from inception to Sept 30, 2021. Studies assessing phage efficacy in animal models were included. Only studies that assessed the efficacy of phage therapy in treating established bacterial infections in terms of survival and bacterial abundance or density were included. Studies reporting only in-vitro or ex-vivo results and those with incomplete information were excluded. Risk-of-bias assessment was performed using the Systematic Review Centre for Laboratory Animal Experimentation tool. The main endpoints were animal survival and tissue bacterial burden, which were reported using pooled odds ratios (ORs) and mean differences with random-effects models. The I² measure and its 95% CI were also calculated. This study is registered with PROSPERO, CRD42022311309. Findings: Of the 5084 references screened, 124 studies fulfilled the selection criteria. Risk of bias was high for 70 (56%) of the 124 included studies; therefore, only studies classified as having a low-to-moderate risk of bias were considered for quantitative data synthesis (n=32). Phage therapy was associated with significantly improved survival at 24 h in systemic infection models (OR 0·08 [95% CI 0·03 to 0·20]; I²=55% [95% CI 8 to 77]), skin infection (OR 0·08 [0·04 to 0·19]; I² = 0% [0 to 79]), and pneumonia models (OR 0·13 [0·06 to 0·31]; I²=0% [0 to 68]) when compared with placebo. Animals with skin infections (mean difference –2·66 [95% CI –3·17 to –2·16]; I² = 95% [90 to 96]) and those with pneumonia (mean difference –3·35 [–6·00 to –0·69]; I² = 99% [98 to 99]) treated with phage therapy had significantly lower tissue bacterial loads at 5 ± 2 days of follow-up compared with placebo. Interpretation: Phage therapy significantly improved animal survival and reduced organ bacterial loads compared with placebo in preclinical animal models. However, high heterogeneity was observed in some comparisons. More evidence is needed to identify the factors influencing phage therapy performance to improve future clinical application. Funding: Swiss National Foundation and Swiss Heart Foundation.
Article
Exploring the possible reasons for heterogeneity between studies is an important aspect of conducting a meta-analysis. This paper compares a number of methods which can be used to investigate whether a particular covariate, with a value defined for each study in the meta-analysis, explains any heterogeneity. The main example is from a meta-analysis of randomized trials of serum cholesterol reduction, in which the log-odds ratio for coronary events is related to the average extent of cholesterol reduction achieved in each trial. Different forms of weighted normal errors regression and random effects logistic regression are compared. These analyses quantify the extent to which heterogeneity is explained, as well as the effect of cholesterol reduction on the risk of coronary events. In a second example, the relationship between treatment effect estimates and their precision is examined, in order to assess the evidence for publication bias. We conclude that methods which allow for an additive component of residual heterogeneity should be used. In weighted regression, a restricted maximum likelihood estimator is appropriate, although a number of other estimators are also available. Methods which use the original form of the data explicitly, for example the binomial model for observed proportions rather than assuming normality of the log-odds ratios, are now computationally feasible. Although such methods are preferable in principle, they often give similar results in practice. Copyright © 1999 John Wiley & Sons, Ltd.
Article
This article describes updates of the meta-analysis command metan and options that have been added since the command's original publication (Bradburn, Deeks, and Altman, metan – an alternative meta-analysis command, Stata Technical Bulletin Reprints, vol. 8, pp. 86–100). These include version 9 graphics with flexible display options, the ability to meta-analyze precalculated effect estimates, and the ability to analyze subgroups by using the by() option. Changes to the output, saved variables, and saved results are also described.
Article
Meta-analysis involves combining summary information from related but independent studies. The objectives of a meta-analysis include increasing power to detect an overall treatment effect, estimation of the degree of benefit associated with a particular study treatment, assessment of the amount of variability between studies, or identification of study characteristics associated with particularly effective treatments. This article presents a tutorial on meta-analysis intended for anyone with a mathematical statistics background. Search strategies and review methods of the literature are discussed. Emphasis is focused on analytic methods for estimation of the parameters of interest. Three modes of inference are discussed: maximum likelihood, restricted maximum likelihood, and Bayesian. Finally, software for performing inference using restricted maximum likelihood and fully Bayesian methods is demonstrated. Methods are illustrated using two examples: an evaluation of mortality from prophylactic use of lidocaine after a heart attack, and a comparison of length of hospital stay for stroke patients under two different management protocols.
Book
The Cochrane Handbook for Systematic Reviews of Interventions (the Handbook) has undergone a substantial update, and Version 5 of the Handbook is now available online at www.cochrane-handbook.org and in RevMan 5. In addition, for the first time, the Handbook will soon be available as a printed volume, published by Wiley-Blackwell. We are anticipating release of this at the Colloquium in Freiburg. Version 5 of the Handbook describes the new methods available in RevMan 5, as well as containing extensive guidance on all aspects of Cochrane review methodology. It has a new structure, with 22 chapters divided into three parts. Part 1, relevant to all reviews, introduces Cochrane reviews, covering their planning and preparation, and their maintenance and updating, and ends with a guide to the contents of a Cochrane protocol and review. Part 2, relevant to all reviews, provides general methodological guidance on preparing reviews, covering question development, eligibility criteria, searching, collecting data, within-study bias (including completion of the Risk of Bias table), analysing data, reporting bias, presenting and interpreting results (including Summary of Findings tables). Part 3 addresses special topics that will be relevant to some, but not all, reviews, including particular considerations in addressing adverse effects, meta-analysis with non-standard study designs and using individual participant data. This part has new chapters on incorporating economic evaluations, non-randomized studies, qualitative research, patient-reported outcomes in reviews, prospective meta-analysis, reviews in health promotion and public health, and the new review type of overviews of reviews.
Article
Although meta-analysis is now well established as a method of reviewing evidence, an uncritical use of the technique can be very misleading. One common problem is the failure to investigate appropriately the sources of heterogeneity, in particular the clinical differences between the studies included. This paper distinguishes between the concepts of clinical and statistical heterogeneity and exemplifies the importance of investigating heterogeneity by using published meta-analyses of epidemiological studies of serum cholesterol concentration and clinical trials of its reduction. Although not without some dangers of speculative conclusions, prompted by overzealous inspection of the data to hand, a sensible investigation of sources of heterogeneity should increase both the scientific and the clinical relevance of the results of meta-analyses. (This paper was presented at a meeting on Systematic Reviews organised jointly by the BMJ and the UK Cochrane Centre and held in London in July 1993; it is the last in this series.) The purpose of a meta-analysis of a set of clinical trials is rather different from the specific aims of an individual trial. For example, a particular clinical trial investigating the effect of serum cholesterol reduction on the risk of ischaemic heart disease tests a particular treatment regimen, given for a specified duration to participants fulfilling certain selection criteria, using a particular definition of outcome measures. The purpose of a meta-analysis of cholesterol lowering trials is broader - that is, to estimate the extent to which serum cholesterol reduction, achieved by a variety of means, generally influences the risk of ischaemic heart disease. A meta-analysis also attempts to gain greater objectivity, generalisability, and precision by including all the available evidence from randomised trials that pertain to the issue [1]. Because of the broader aims of a meta-analysis, the trials included usually encompass a substantial variety of specific …
Article
Meta-analyses using updated individual patient data may provide the most reliable means of combining data from similar randomized controlled trials. The benefits of this approach to systematic reviews are described. Guidance, based on the experience of several groups who have undertaken such projects, is given. This includes practical advice on initiating and maintaining collaboration, the time and resources required to undertake these usually international projects and methods of data checking and validation. Example proforma are included.
Article
The investigation of heterogeneity is a crucial part of any meta-analysis. While it has been stated that the test for heterogeneity has low power, this has not been well quantified. Moreover the assumptions of normality implicit in the standard methods of meta-analysis are often not scrutinized in practice. Here we simulate how the power of the test for heterogeneity depends on the number of studies included, the total information (that is total weight or inverse variance) available and the distribution of weights among the different studies. We show that the power increases with the total information available rather than simply the number of studies, and that it is substantially lowered if, as is quite common in practice, one study comprises a large proportion of the total information. We also describe normal plots that are useful in assessing whether the data conform to a fixed effect or random effects model, together with appropriate tests, and give an application to the analysis of a multi-centre trial of blood pressure reduction. We conclude that the test of heterogeneity should not be the sole determinant of model choice in meta-analysis, and inspection of relevant normal plots, as well as clinical insight, may be more relevant to both the investigation and modelling of heterogeneity. © 1998 John Wiley & Sons, Ltd.
Article
Multivariate meta-analysis combines estimates of several related parameters over several studies. These parameters can, for example, refer to multiple outcomes or comparisons between more than two groups. A new Stata command, mvmeta, performs maximum likelihood, restricted maximum likelihood, or method-of-moments estimation of random-effects multivariate meta-analysis models. A utility command, mvmeta_make, facilitates the preparation of summary datasets from more detailed data. The commands are illustrated with data from the Fibrinogen Studies Collaboration, a meta-analysis of observational studies; I estimate the shape of the association between a quantitative exposure and disease events by grouping the quantitative exposure into several categories. Copyright 2009 by StataCorp LP.