The Stata Journal (yyyy) vv, Number ii, pp. 1–11
metaan: random eﬀects meta-analysis
Evangelos Kontopantelis
National Primary Care Research & Development Centre
University of Manchester
David Reeves
Health Sciences Primary Care Research Group
University of Manchester
Abstract. This article describes a new meta-analysis command, metaan, which can
be used to perform ﬁxed- or random-eﬀects meta-analysis, oﬀering a wide choice
of available models: maximum likelihood, proﬁle likelihood, restricted maximum
likelihood and a permutation method, besides the standard DerSimonian and Laird
approach. The command reports a variety of heterogeneity measures, including
Cochran’s Q, $I^2$, and the between-study variance estimate $\hat{\tau}^2$. A forest plot
and a graph of the maximum likelihood function can also be generated.
Keywords: st0001, metaan, meta-analysis, random-eﬀect(s), eﬀect size(s), maxi-
mum likelihood, proﬁle likelihood, restricted maximum likelihood, REML, permu-
tation(s) method, forest plot
1 Introduction
Meta-analysis is a statistical methodology that combines or integrates the results of
several independent clinical trials, or studies in general, considered by the analyst to
be ‘combinable’ (Huque 1988). Usually, this is a two-stage process: in the ﬁrst stage
the appropriate summary statistic for each study is estimated, then at the second stage
these are combined into a weighted average. Methods also exist for combining and
meta-analysing data across studies at the individual patient level (IPD methods). An
IPD analysis provides advantages such as standardization (of marker values, outcome
deﬁnitions etc), follow-up information updating, detailed data-checking, subgroup anal-
yses and the ability to include participant-level covariates (Stewart and Clarke 1995;
Lambert et al. 2002). However, individual observations are rarely available; addition-
ally, if the main interest is in mean eﬀects then the two-stage and the IPD approaches
can provide equivalent results (Olkin and Sampson 1998).
This paper concerns itself with the second stage of the two-stage approach to meta-
analysis. At this stage, researchers can select between two main approaches, the ﬁxed-
or the random-eﬀects model, in their eﬀort to combine the study-level summary esti-
mates and calculate an overall average eﬀect. The ﬁxed-eﬀect model is simpler and
assumes the true eﬀect to be the same (homogeneous) across studies. However, homo-
geneity has been found to be the exception rather than the rule and some degree of
true eﬀect variability between studies is to be expected (Thompson and Pocock 1991).
This between-study heterogeneity stems from diﬀerences in populations, interventions,
outcomes or follow-up times (clinical heterogeneity), or diﬀerences in trial design and
quality (methodological heterogeneity) (Higgins and Green 2008; Thompson 1994). The
most common approach to modelling the between-study variance is the method proposed
by DerSimonian and Laird (1986), which is widely used in generic and specialist
meta-analysis statistical packages alike. In Stata the DerSimonian and Laird (DL) model is
used in the most popular meta-analysis commands, the recently updated metan and
the older but still useful meta (Harris et al. 2008). However, the between-study
variance component can be estimated using more advanced iterative (and computationally
expensive) techniques: maximum likelihood, profile likelihood and restricted maximum
likelihood (Hardy and Thompson 1996; Thompson and Sharp 1999). Alternatively, the
estimate can be obtained using non-parametric approaches, such as the ‘permutations’
method proposed by Follmann and Proschan (1999).
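The second-stage calculation described above can be illustrated outside Stata. The following is a minimal Python sketch, not the metaan implementation: it pools the seven effect sizes and standard errors listed in the example section with inverse-variance weights, first under the fixed-effect model and then with the DerSimonian-Laird moment estimate of the between-study variance. The function names pool and dl_tau2 are hypothetical, chosen for this illustration.

```python
import math

# Effect sizes and standard errors of the seven studies in the example section
effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]


def pool(y, se, tau2=0.0):
    """Weighted average with weights 1 / (se_i^2 + tau2) and its standard error."""
    w = [1.0 / (s * s + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, math.sqrt(1.0 / sum(w))


def dl_tau2(y, se):
    """DerSimonian-Laird moment estimate of the between-study variance."""
    w = [1.0 / (s * s) for s in se]
    mu_fe, _ = pool(y, se)
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)  # negative estimates are set to zero


mu_fe, se_fe = pool(effsize, se)            # fixed-effect model
tau2_dl = dl_tau2(effsize, se)              # between-study variance
mu_dl, se_dl = pool(effsize, se, tau2_dl)   # DL random-effects model

print(f"FE: effect {mu_fe:.3f}, SE {se_fe:.3f}")
print(f"DL: effect {mu_dl:.3f}, SE {se_dl:.3f}, tau^2 {tau2_dl:.3f}")
```

With these data the random-effects standard error is larger than the fixed-effect one, in line with the wider confidence intervals that random-effects models usually produce.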
We have implemented these methods in metaan, which performs the second stage
of a two-stage meta-analysis, oﬀering alternatives to the DerSimonian-Laird random-
eﬀects model. The command requires the study eﬀect estimates and standard errors as
input. We have also created metaeff (not discussed in the present paper), a command
that provides support in the first stage of the two-stage process and complements
metaan. The metaeff command calculates the eﬀect size (standardised mean diﬀerence)
and its standard error from the input parameters supplied by the user, for each study,
using one of the methods described in the Cochrane Handbook for Systematic Reviews of
Interventions (Higgins and Green 2006). For more details type ssc describe metaeff
in Stata, or see Kontopantelis and Reeves (2009).
The metaan command does not oﬀer the plethora of options metan does for inputting
various types of binary or continuous data. Other useful features in metan (and not
available in metaan) include: stratiﬁed meta-analysis, user-input study weights, vaccine
eﬃcacy calculations, Mantel-Haenszel ﬁxed-eﬀect method, L’Abbe and funnel plots.
The REML model, assumed to be the best method to ﬁt a random-eﬀects meta-analysis
model even though this assumption has not been thoroughly investigated (Thompson
and Sharp 1999), has recently been coded in the updated meta-regression command
metareg (Harbord and Higgins 2008) and the new multivariate random-eﬀects meta-
analysis command mvmeta (White 2009). However, the output and options provided by
metaan can be more useful in the univariate meta-analysis context.
2 The metaan command
2.1 Syntax
metaan varname1 varname2, {fe | dl | ml | reml | pl | pe} [varc label(varname) forest forestw(#) plplot(string)]
varname1 is the variable holding the study effect sizes.
varname2 is the variable holding the study effect variation; standard errors are assumed unless varc is specified.
E. Kontopantelis and D. Reeves 3
2.2 Options
fe Fixed-eﬀect (FE) model that assumes there is no heterogeneity between the studies.
The model assumes that within-study variances may diﬀer, but that there is homo-
geneity of eﬀect size across studies. Often the homogeneity assumption is unlikely
and variation in the true eﬀect across studies is to be expected. Therefore, caution
is required when using this model. Reported heterogeneity measures are estimated
using the dl model.
dl DerSimonian-Laird (DL), the most commonly used random-effects model. Models
heterogeneity between the studies, i.e. assumes that the true effect can differ
from study to study. The method assumes that the individual study true effects are
distributed with variance $\tau^2$ around an ‘overall’ true effect, but makes no
assumptions about the form of the distribution of either the within- or between-study
effects. Reported heterogeneity measures are estimated using the dl model.
ml Maximum-likelihood (ML) random-effects model. Makes the additional assumption
(necessary to derive the log-likelihood function, and also made by reml and pl
below) that both the within-study and between-study effects have Normal distributions.
The log-likelihood function is maximized iteratively to produce an estimate of the
between-study variance. The method does not always converge, and in some cases the
between-study variance estimate is negative and is set to zero (in which case the
model reduces to the fe model). Estimates are reported as missing in the event of
non-convergence. Reported heterogeneity measures are estimated using the ml model.
reml Restricted maximum-likelihood (REML) random-effects model. Similar to ml and
based on the same assumptions. The log-likelihood function is maximized iteratively
to provide estimates, as in ml. However, under reml only the part of the likelihood
function that is location invariant is maximized (i.e. the portion of the likelihood
that does not involve $\mu$ when estimating $\tau^2$, and vice versa). The method does
not always converge, and in some cases the between-study variance estimate is negative
and is set to zero (in which case the model reduces to the fe model). Estimates are
reported as missing in the event of non-convergence. Reported heterogeneity measures
are estimated using the reml model.
pl Profile-likelihood (PL) random-effects model. Profile likelihood uses the same
likelihood function as ml, but takes into account the uncertainty associated with the
between-study variance estimate when calculating an overall effect, by using nested
iterations to converge to a maximum. The confidence intervals provided by the
method are asymmetric and hence so is the diamond in the forest plot. However,
the method does not always converge. Values that were not computed are reported
as missing. Reported heterogeneity measures are estimated using the ml model,
since $\hat{\mu}$ and $\hat{\tau}^2$, the effect and between-study variance estimates, are the
same (only their confidence intervals are re-estimated). The method also provides a
confidence interval for the between-study variance estimate.
pe Permutations (PE) random-eﬀects model. A non-parametric random-eﬀects method
which utilises dl and does not assume a normal distribution for the random eﬀects.
The conﬁdence interval provided by the method is asymmetric and hence so is the
diamond in the forest plot. Reported heterogeneity measures are estimated using
the dl model.
varc Informs the program that the study eﬀect variation variable varname2 holds vari-
ance values. If this option is omitted the program assumes the variable contains
standard error values (the default).
label(varname) Selects labels for the studies. Up to two variables can be selected and
converted to strings. If two variables are selected they will be separated by a comma.
Usually, the author names and the year of study are selected as labels. The ﬁnal
string is truncated to 20 characters.
forest Requests a forest plot. The weights from the speciﬁed analysis are used for
plotting symbol sizes (PE uses DL weights).
forestw(#) Requests a forest plot with adjusted weight ratios for better display. The
value must lie in the [1,50] range. For example, if the largest-to-smallest weight ratio
is 60 and the graph looks awkward, the user can use this option to improve the
appearance by requesting that the weights be rescaled to a largest/smallest weight
ratio of 30. Only the weight squares in the plot are affected, not the model; the
confidence intervals in the plot are also unaffected.
plplot(string) Requests a plot of the likelihood function for the average effect or
between-study variance estimate of the ml, pl or reml models. Option plplot(mu)
fixes the average effect parameter to its model estimate in the likelihood function
and creates a two-way plot of $\tau^2$ vs the likelihood function. Option plplot(tsq)
fixes the between-study variance to its model estimate in the likelihood function
and creates a two-way plot of $\mu$ vs the likelihood function.
2.3 Saved results
metaan saves the following scalar results (some varying by selected method) in r():
r(Hsq)          heterogeneity measure $H^2_M$
r(Isq)          heterogeneity measure $I^2$
r(Q)            Cochran’s Q value
r(Qpval)        p-value for Cochran’s Q
r(df)           degrees of freedom
r(effvar)       effect variance
r(eff)          effect size
r(efflo)        effect size, lower 95% CI
r(effup)        effect size, upper 95% CI
fe, dl methods
r(tausq_dl)     $\hat{\tau}^2$, from the DL method
ml method
r(tausq_dl)     $\hat{\tau}^2$, from the DL method
r(tausq_ml)     $\hat{\tau}^2$, from the ML method
r(conv_ml)      ML convergence information
reml method
r(tausq_dl)     $\hat{\tau}^2$, from the DL method
r(tausq_reml)   $\hat{\tau}^2$, from the REML method
r(conv_reml)    REML convergence information
pl method
r(tausq_dl)     $\hat{\tau}^2$, from the DL method
r(tausq_pl)     $\hat{\tau}^2$, from the PL method
r(tausqlo_pl)   $\hat{\tau}^2$ (PL), lower 95% CI
r(tausqup_pl)   $\hat{\tau}^2$ (PL), upper 95% CI
r(cloeff_pl)    convergence information, PL effect size (lower CI)
r(cupeff_pl)    convergence information, PL effect size (upper CI)
r(ctausqlo_pl)  convergence information, PL $\hat{\tau}^2$ (lower CI)
r(ctausqup_pl)  convergence information, PL $\hat{\tau}^2$ (upper CI)
r(conv_ml)      ML convergence information
pe method
r(tausq_dl)     $\hat{\tau}^2$, from the DL method
r(exec_pe)      information on PE execution
In each case, the heterogeneity measures $H^2_M$ and $I^2$ are computed using the returned
between-study variance estimate $\hat{\tau}^2$. Convergence (and PE execution) information is returned
as 1 if successful and as 0 otherwise. r(effvar) cannot be computed for PE. r(effvar)
is the same for ML and PL, but for PL the confidence intervals are ‘amended’ to take
into account the uncertainty in $\hat{\tau}^2$.
3 Methods
The metaan command oﬀers six meta-analysis methods for calculating a mean eﬀect
estimate and its conﬁdence intervals: ﬁxed-eﬀect model (FE), random-eﬀects DerSimo-
nian & Laird method (DL), maximum-likelihood random-eﬀects model (ML), restricted
maximum-likelihood random-eﬀects model (REML), proﬁle-likelihood random-eﬀects
model (PL) and permutations method utilising a DL random-eﬀects model (PE). Mod-
els of the random-eﬀects family take into account the identiﬁed between-study variation,
estimate it and usually produce wider conﬁdence intervals for the overall eﬀect than a
ﬁxed-eﬀect analysis. Brief descriptions of the methods have been provided in section 2.2.
In this section, we provide a few more details and practical advice on selecting between
the methods. Their complexity prohibits complete descriptions in this paper, and
users wishing to look into method details are encouraged to refer to the original papers
that describe them (DerSimonian and Laird 1986; Hardy and Thompson 1996;
Follmann and Proschan 1999; Brockwell and Gordon 2001).
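The three maximum likelihood methods are iterative and usually computationally
expensive. ML and PL derive the $\mu$ (overall effect) and $\tau^2$ estimates by maximizing the
log-likelihood function in (1), under different conditions. REML estimates $\tau^2$ and $\mu$ by
maximizing the restricted log-likelihood function in (2).
$$\log L(\mu, \tau^2) = -\frac{1}{2}\left[\sum_{i=1}^{k} \log\left\{2\pi\left(\hat{\sigma}_i^2 + \tau^2\right)\right\} + \sum_{i=1}^{k} \frac{(\hat{y}_i - \mu)^2}{\hat{\sigma}_i^2 + \tau^2}\right], \quad \mu \in \Re \;\&\; \tau^2 \geq 0 \tag{1}$$
$$\log L'(\hat{\mu}, \tau^2) = -\frac{1}{2}\left[\sum_{i=1}^{k} \log\left\{2\pi\left(\hat{\sigma}_i^2 + \tau^2\right)\right\} + \sum_{i=1}^{k} \frac{(\hat{y}_i - \hat{\mu})^2}{\hat{\sigma}_i^2 + \tau^2}\right] - \frac{1}{2}\log\left(\sum_{i=1}^{k} \frac{1}{\hat{\sigma}_i^2 + \tau^2}\right), \quad \hat{\mu} \in \Re \;\&\; \tau^2 \geq 0 \tag{2}$$
where k is the number of studies to be meta-analysed, $\hat{y}_i$ and $\hat{\sigma}_i^2$ are the effect and
variance estimates for study i, and $\hat{\mu}$ is the overall effect estimate.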
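As an illustration of how these functions can be evaluated and maximized, the following Python sketch (an assumption-laden illustration, not the metaan code) writes the log-likelihood (1) as minus one half of the sum over studies of log(2π(σ̂ᵢ² + τ²)) + (ŷᵢ − µ)²/(σ̂ᵢ² + τ²), and the restricted version (2) as the same expression minus one half of the log of the summed weights 1/(σ̂ᵢ² + τ²). The ML estimates are then obtained with a simple fixed-point iteration; the function names and the convergence tolerance are hypothetical choices made here.

```python
import math

# Effect sizes and standard errors of the seven studies in the example section
effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]
s2 = [s * s for s in se]  # within-study variances


def loglik(mu, tau2, y, s2):
    """Log-likelihood (1): each effect is Normal with variance s2_i + tau2."""
    return -0.5 * sum(
        math.log(2 * math.pi * (v + tau2)) + (yi - mu) ** 2 / (v + tau2)
        for yi, v in zip(y, s2)
    )


def restricted_loglik(mu, tau2, y, s2):
    """Restricted log-likelihood (2): (1) minus half the log of the summed weights."""
    return loglik(mu, tau2, y, s2) - 0.5 * math.log(sum(1.0 / (v + tau2) for v in s2))


def fit_ml(y, s2, tol=1e-10, max_iter=1000):
    """Fixed-point iteration for the ML estimates; returns (mu, tau2, converged)."""
    tau2, mu = 0.0, 0.0
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in s2]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi ** 2 * ((yi - mu) ** 2 - v) for wi, yi, v in zip(w, y, s2))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w))  # truncate at zero
        if abs(new_tau2 - tau2) < tol:
            return mu, new_tau2, True
        tau2 = new_tau2
    return mu, tau2, False


mu_hat, tau2_hat, converged = fit_ml(effsize, s2)
print(f"converged: {converged}, mu: {mu_hat:.3f}, tau^2: {tau2_hat:.3f}")
print(f"log-likelihood (1): {loglik(mu_hat, tau2_hat, effsize, s2):.3f}")
print(f"restricted log-likelihood (2): {restricted_loglik(mu_hat, tau2_hat, effsize, s2):.3f}")
```

For the seven example studies the iteration should settle close to the ML estimates reported in the example output (an overall effect of about -0.308 and a between-study variance of about 0.121).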
Maximum likelihood follows the simplest approach, maximizing (1) in a single itera-
tion loop. A criticism of ML is that it takes no account of the loss in degrees of freedom
that results from estimating the overall eﬀect. Restricted Maximum Likelihood derives
the likelihood function in a way that adjusts for this and removes downward bias in
the between-studies variance estimator. A useful description for REML, in the meta-
analysis context, has been provided by Normand (1999). Proﬁle likelihood uses the
same likelihood function as ML, but takes into account the uncertainty associated with
the between-study variance estimate when calculating an overall effect, through the use
of nested iterations to converge to a maximum. By incorporating this extra factor
of uncertainty, PL yields conﬁdence intervals that are usually wider than for DL and
also asymmetric. PL has been shown to outperform DL in various scenarios (Brockwell
and Gordon 2001).
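The nested iterations behind the profile-likelihood interval can be sketched as follows. This Python illustration is an assumption about one reasonable way to do it, not a transcription of metaan: for each candidate overall effect, an inner iteration finds the between-study variance that maximizes the log-likelihood, and the 95% interval collects the candidate effects whose profile log-likelihood lies within half the 0.95 quantile of a chi-squared distribution with one degree of freedom (3.841459 / 2) of the maximum. All function names are hypothetical.

```python
import math

effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]
s2 = [s * s for s in se]  # within-study variances


def loglik(mu, tau2, y, s2):
    """Random-effects log-likelihood under Normal within- and between-study effects."""
    return -0.5 * sum(
        math.log(2 * math.pi * (v + tau2)) + (yi - mu) ** 2 / (v + tau2)
        for yi, v in zip(y, s2)
    )


def tau2_given_mu(mu, y, s2, tol=1e-10, max_iter=1000):
    """Inner iteration: tau2 maximizing the log-likelihood for a fixed mu (>= 0)."""
    tau2 = 0.0
    for _ in range(max_iter):
        w2 = [1.0 / (v + tau2) ** 2 for v in s2]
        num = sum(w * ((yi - mu) ** 2 - v) for w, yi, v in zip(w2, y, s2))
        new_tau2 = max(0.0, num / sum(w2))
        if abs(new_tau2 - tau2) < tol:
            return new_tau2
        tau2 = new_tau2
    return tau2


def profile_loglik(mu, y, s2):
    return loglik(mu, tau2_given_mu(mu, y, s2), y, s2)


def fit_ml(y, s2, tol=1e-10, max_iter=1000):
    """Outer iteration: alternate the weighted mean and the inner tau2 update."""
    tau2, mu = 0.0, 0.0
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in s2]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        new_tau2 = tau2_given_mu(mu, y, s2)
        if abs(new_tau2 - tau2) < tol:
            return mu, new_tau2
        tau2 = new_tau2
    return mu, tau2


def pl_interval(y, s2, crit=3.841459):
    """Profile-likelihood point estimate and 95% limits for the overall effect."""
    mu_hat, tau2_hat = fit_ml(y, s2)
    cutoff = loglik(mu_hat, tau2_hat, y, s2) - crit / 2.0

    def limit(direction):
        inside, step = mu_hat, 0.1
        outside = mu_hat + direction * step
        while profile_loglik(outside, y, s2) > cutoff:  # widen until outside the CI
            inside, step = outside, step * 2
            outside = mu_hat + direction * step
        for _ in range(60):  # bisection between an inside and an outside point
            mid = 0.5 * (inside + outside)
            if profile_loglik(mid, y, s2) > cutoff:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return mu_hat, limit(-1), limit(+1)


mu_hat, lower, upper = pl_interval(effsize, s2)
print(f"overall effect {mu_hat:.3f}, 95% PL interval ({lower:.3f}, {upper:.3f})")
```

Applied to the seven example studies, the limits should fall close to the interval reported by metaan in the example output (-0.622 to 0.004).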
The PE method (Follmann and Proschan 1999) can be described as follows. First, in
line with a null hypothesis that all true study effects are zero and observed effects are due
to random variation, a dataset of all possible combinations of observed study outcomes
is created by permuting the sign of each observed effect. Next, the dl method is used
to compute an overall effect for each combination. Finally, the resulting distribution of
overall effect sizes is used to derive a confidence interval for the observed overall effect.
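The first two steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the metaan code: it enumerates all 2^k sign combinations for the seven example studies, computes a DerSimonian-Laird overall effect for each, and summarizes the resulting distribution with a two-sided permutation p-value for the null hypothesis. The confidence-interval construction of Follmann and Proschan (1999) is more involved and is not reproduced here; the function name dl_pooled is hypothetical.

```python
import itertools

effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]


def dl_pooled(y, se):
    """Overall effect under the DerSimonian-Laird random-effects model."""
    w = [1.0 / (s * s) for s in se]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_re = [1.0 / (s * s + tau2) for s in se]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)


observed = dl_pooled(effsize, se)

# One overall effect per sign combination (2^7 = 128 for seven studies)
permuted = [
    dl_pooled([sign * yi for sign, yi in zip(signs, effsize)], se)
    for signs in itertools.product((1, -1), repeat=len(effsize))
]

# Share of combinations at least as extreme as the observed overall effect
p_value = sum(abs(m) >= abs(observed) for m in permuted) / len(permuted)
print(f"observed DL effect {observed:.3f}, permutation p-value {p_value:.3f}")
```

Because every sign combination has a mirror image, the permutation distribution is symmetric around zero, and the number of combinations doubles with each additional study.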
Method performance is known to be affected by three factors: the number of studies
in the meta-analysis, the degree of heterogeneity in true effects and, provided there is
heterogeneity present, the distribution of the true effects (Brockwell and Gordon 2001).
Heterogeneity, attributed to clinical and/or methodological diversity, is a major
problem researchers face when combining study results in a meta-analysis
(Higgins and Green 2006). The variability that arises from different interventions,
populations, outcomes or follow-up times is described as clinical heterogeneity, while
differences in trial design and quality are described as methodological heterogeneity
(Thompson 1994). Traditionally, heterogeneity is tested with Cochran’s Q, which
provides a p-value for the test of homogeneity when compared with a $\chi^2_{k-1}$ distribution
(Brockwell and Gordon 2001), where k is the number of studies. However, the test is
known to be poor at detecting heterogeneity, since its power is low when the number of
studies is small (Hardy and Thompson 1998). An alternative measure is $I^2$, which is
thought to be more informative in assessing inconsistency between studies, with values
of 25%, 50% and 75% corresponding to low, moderate and high heterogeneity, respectively
(Higgins et al. 2003). Another measure is $H^2_M$, the measure least affected by the
value of k, taking values in the [0, +∞) range, with 0 indicating perfect homogeneity
(Mittlbock and Heinzl 2006). Obviously, the between-study variance estimate $\hat{\tau}^2$ can
also be informative about the presence or absence of heterogeneity.
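These measures can be computed directly from the study estimates. The Python sketch below is an illustration, not the metaan code. It assumes that $I^2$ and $H^2_M$ are obtained from a between-study variance estimate and a 'typical' within-study variance, defined here as (k − 1)Σw / ((Σw)² − Σw²) with w the inverse-variance weights; that definition is an assumption of the sketch, although it appears consistent with the $I^2$ value shown in the example output. With the DL estimate the two measures reduce to (Q − df)/Q and (Q − df)/df.

```python
effsize = [-0.3041526, 0.2124063, 0.0444239, -0.3991309,
           -0.9374746, -0.3098185, -0.4898825]
se = [0.0958199, 0.0812414, 0.090661, 0.12079,
      0.0691572, 0.206331, 0.2001602]


def heterogeneity(y, se, tau2=None):
    """Cochran's Q plus I^2 (%) and H^2_M derived from a tau^2 estimate.

    If tau2 is None, the DerSimonian-Laird moment estimate is used.
    """
    w = [1.0 / (s * s) for s in se]
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    typical_var = df * sw / (sw * sw - sw2)  # assumed 'typical' within-study variance
    if tau2 is None:
        tau2 = max(0.0, (q - df) / (sw - sw2 / sw))
    return {
        "Q": q,
        "df": df,
        "tau2": tau2,
        "I2": 100.0 * tau2 / (tau2 + typical_var),
        "H2M": tau2 / typical_var,
    }


dl = heterogeneity(effsize, se)              # based on the DL estimate
ml = heterogeneity(effsize, se, tau2=0.121)  # based on the ML estimate in the example output
print(f"Q = {dl['Q']:.2f} on {dl['df']} df")
print(f"DL-based: I^2 = {dl['I2']:.2f}%, H^2_M = {dl['H2M']:.2f}")
print(f"ML-based: I^2 = {ml['I2']:.2f}%, H^2_M = {ml['H2M']:.2f}")
```

The Q statistic does not depend on the chosen model, whereas $I^2$ and $H^2_M$ change with the between-study variance estimate supplied, which is why the reported values can differ between methods.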
The test for heterogeneity is often used as the basis for applying a ﬁxed-eﬀect or
a random-eﬀects model. However, the often low power of the Q test makes it unwise
to base a decision on the result of the test alone. Research studies, even on the same
topic, can vary on a large number of factors, hence homogeneity is often an unlikely
assumption and some degree of variability between studies is to be expected (Thompson
and Pocock 1991). Some authors recommend the adoption of a random-eﬀects model,
unless there are compelling reasons for doing otherwise, irrespective of the outcome of
the test for heterogeneity (Brockwell and Gordon 2001).
However, even though random-eﬀects methods model heterogeneity, the performance
of the maximum likelihood methods (ML, REML and PL) in situations where the true
eﬀects violate the assumptions of a Normal distribution may not be optimal (Brockwell
and Gordon 2001; Hardy and Thompson 1998; Bohning et al. 2002; Sidik and Jonkman
2007). The number of studies in the analysis is also an issue, since most meta-analysis
methods (including DL, ML, REML, PL, but not PE) are only asymptotically correct:
i.e. they provide the theoretical 95% coverage only as the number of studies increases
(approaches inﬁnity). Method performance is therefore aﬀected when the number of
studies is small, but the extent depends on the method (some are more susceptible),
along with the degree of heterogeneity and the distribution of the true eﬀects (Brockwell
and Gordon 2001).
4 Example
As an example, we apply the metaan command to health risk outcome data from seven
studies. The information was collected for an unpublished meta-analysis and the data
is available from the authors. Using describe and list commands we provide details
of the dataset and proceed to perform a univariate meta-analysis with metaan.
. use metaan_example.dta,
. describe
Contains data from metaan_example.dta
  vars:             4                          19 Apr 2010 12:19
  size:           532 (99.9% of memory free)
               storage  display   value
variable name  type     format    label    variable label
study          str16    %16s               First author and year
outcome        str48    %35s               Outcome description
effsize        float    %9.0g              effect sizes
se             float    %9.0g              SE of the effect sizes
Sorted by: study outcome
. list study outcome effsize se, noobs clean
study              outcome                       effsize          se
Bakx A, 1985       Serum cholesterol (mmol/L)  -.3041526    .0958199
Campbell A, 1998   Diet                         .2124063    .0812414
Cupples, 1994      BMI                          .0444239     .090661
Eckerlund          SBP                         -.3991309      .12079
Moher, 2001        Cholesterol (mmol/l)        -.9374746    .0691572
Woolard A, 1995    Alcohol intake (g/week)     -.3098185     .206331
Woolard B, 1995    Alcohol intake (g/week)     -.4898825    .2001602
. metaan effsize se, pl label(study) forest
Profile Likelihood method selected
Study                  Effect   [95% Conf. Interval]   % Weight
Bakx A, 1985           -0.304     -0.492     -0.116       15.09
Campbell A, 1998        0.212      0.053      0.372       15.40
Cupples, 1994           0.044     -0.133      0.222       15.20
Eckerlund              -0.399     -0.636     -0.162       14.49
Moher, 2001            -0.937     -1.073     -0.802       15.62
Woolard A, 1995        -0.310     -0.714      0.095       12.01
Woolard B, 1995        -0.490     -0.882     -0.098       12.19
Overall effect (pl)    -0.308     -0.622      0.004      100.00
ML method succesfully converged
PL method succesfully converged for both upper and lower CI limits
               value    df   p-value
Cochrane Q    139.81     6     0.000
I^2 (%)        91.96
               value   [95% Conf. Interval]
tau^2 est      0.121      0.000      0.449
Estimate obtained with Maximum likelihood - Profile likelihood provides the CI
PL method succesfully converged for both upper and lower CI limits of the tau
The PL method used in the example converged successfully, as did ML, whose convergence
is a prerequisite. The overall effect is not statistically significant at the 5% level (the 95%
confidence interval includes zero), and there is considerable heterogeneity across studies
according to the measures. The method also displays a 95% confidence interval for the
between-study variance estimate (provided convergence is achieved, as is the case in this
example). The forest plot created by the command is displayed in Figure 1.
[Forest plot: one row per study label plus "Overall effect (pl)"; x axis "Effect sizes and CIs", ticks from −1 to .5. Note on plot: Original weights (squares) displayed. Largest to smallest ratio: 1.30]
Figure 1: Forest plot displaying proﬁle-likelihood meta-analysis.
Re-executing the analysis with the plplot(mu) and plplot(tsq) options we obtain
the log-likelihood function plots (Figures 2 & 3).
[Plot: log-likelihood function for mu fixed to the ML/PL estimate]
Figure 2: Log-likelihood function plot, µ ﬁxed to the model estimate.
[Plot: log-likelihood function for tau² fixed to the ML/PL estimate]
Figure 3: Log-likelihood function plot, $\tau^2$ fixed to the model estimate.
5 Discussion
The metaan command can be a useful meta-analysis tool that includes newer and, in
certain circumstances, better performing methods than the standard DerSimonian-Laird
random-effects model. Unpublished results exploring method performance in various
scenarios are available from the authors. Future work will involve implementing more
methods in the metaan command and embellishing the forest plot.
6 Acknowledgments
We would like to thank the authors of meta and metan for all their work and the
anonymous reviewer whose useful comments improved the paper considerably.
7 References
Bohning, D., U. Malzahn, E. Dietz, P. Schlattmann, C. Viwatwongkasem, and A. Big-
geri. 2002. Some General Points in Estimating Heterogeneity Variance with the
DerSimonian-Laird Estimator. Biostatistics 3(4): 445–457.
Brockwell, S. E., and I. R. Gordon. 2001. A Comparison of Statistical Methods for
Meta-Analysis. Statistics in Medicine 20(6): 825–840.
DerSimonian, R., and N. Laird. 1986. Meta-Analysis in Clinical Trials. Controlled
Clinical Trials 7(3): 177–188.
Follmann, D. A., and M. A. Proschan. 1999. Valid Inference in Random Eﬀects Meta-
Analysis. Biometrics 55(3): 732–737.
Harbord, R. M., and J. P. T. Higgins. 2008. Meta-regression in Stata. Stata Journal
Hardy, R. J., and S. G. Thompson. 1996. A Likelihood Approach to Meta-Analysis with
Random Eﬀects. Statistics in Medicine 15(6): 619–629.
———. 1998. Detecting and Describing Heterogeneity in Meta-Analysis. Statistics in
Medicine 17(8): 841–856.
Harris, R., M. Bradburn, J. Deeks, R. Harbord, D. Altman, and J. Sterne. 2008. Metan:
Fixed- and Random-Eﬀects Meta-Analysis. Stata Journal 8(1): 3–28.
Higgins, J. P., and S. Green. 2006. Cochrane Handbook for Systematic Reviews of
Interventions: Version 4.2.6.
———. 2008. Cochrane Handbook for Systematic Reviews of Interventions: Version
Higgins, J. P., S. G. Thompson, J. J. Deeks, and D. G. Altman. 2003. Measuring
Inconsistency in Meta-Analyses. British Medical Journal 327(7414): 557–560.
Huque, M. F. 1988. Experiences with Meta-Analysis in NDA Submissions. Proceedings
of the Biopharmaceutical Section of the American Statistical Association 2: 28–33.
Kontopantelis, E., and D. Reeves. 2009. MetaEasy: A Meta-Analysis Add-In for Mi-
crosoft Excel. Journal of Statistical Software 30(7): 1–25.
Lambert, P. C., A. J. Sutton, K. R. Abrams, and D. R. Jones. 2002. A comparison
of summary patient-level covariates in meta-regression with individual patient data
meta-analysis. Journal of Clinical Epidemiology 55(1): 86–94.
Mittlbock, M., and H. Heinzl. 2006. A Simulation Study Comparing Properties of
Heterogeneity Measures in Meta-Analyses. Statistics in Medicine 25(24): 4321–4333.
Normand, S. T. 1999. Tutorial in biostatistics. Meta-analysis: formulating, evaluating,
combining, and reporting. Statistics in Medicine 18: 321–359.
Olkin, I., and A. Sampson. 1998. Comparison of meta-analysis versus analysis of variance
of individual patient data. Biometrics 54(1): 317–322.
Sidik, K., and J. N. Jonkman. 2007. A comparison of heterogeneity variance
estimators in combining results of studies. Statistics in Medicine 26(9): 1964–1981.
Stewart, L. A., and M. J. Clarke. 1995. Practical methodology of meta-analyses
(overviews) using updated individual patient data. Cochrane Working Group.
Statistics in Medicine 14(19): 2057–2079.
Thompson, S. G. 1994. Why Sources of Heterogeneity in Meta-Analysis Should be
Investigated. British Medical Journal 309(6965): 1351–1355.
Thompson, S. G., and S. J. Pocock. 1991. Can Meta-Analyses be Trusted? The Lancet
Thompson, S. G., and S. J. Sharp. 1999. Explaining heterogeneity in meta-analysis: a
comparison of methods. Statistics in Medicine 18(20): 2693–2708.
White, I. R. 2009. Multivariate random-eﬀects meta-analysis. Stata Journal 9(1): 40–
About the authors
Evangelos (Evan) Kontopantelis is a research fellow in statistics at the National Primary Care
Research and Development Centre, University of Manchester, England. His research interests
include statistical methods in health sciences with a focus on meta-analysis, longitudinal data
modeling and large clinical database management.
David Reeves is a senior research fellow in statistics at the Health Sciences Primary Care
Research Group, University of Manchester, England. David has worked as a statistician in
health services research for nearly three decades, mainly in the ﬁelds of learning disability
and primary care. His methodological research interests include the robustness of statistical
methods, the analysis of observational studies, and applications of social network analysis
methods to health systems.