STATISTICS IN MEDICINE
Statist. Med. 2001; 20:825–840
A comparison of statistical methods for meta-analysis
Sarah E. Brockwell and Ian R. Gordon∗,†
Department of Mathematics and Statistics, Richard Berry Building, The University of Melbourne,
Victoria 3010, Australia
SUMMARY
Meta-analysis may be used to estimate an overall effect across a number of similar studies. A number of
statistical techniques are currently used to combine individual study results. The simplest of these is based
on a fixed effects model, which assumes the true effect is the same for all studies. A random effects model,
however, allows the true effect to vary across studies, with the mean true effect the parameter of interest. We
consider three methods currently used for estimation within the framework of a random effects model, and
illustrate them by applying each method to a collection of six studies on the effect of aspirin after myocardial
infarction. These methods are compared using estimated coverage probabilities of confidence intervals for
the overall effect. The techniques considered all generally have coverages below the nominal level, and in
particular it is shown that the commonly used DerSimonian and Laird method does not adequately reflect
the error associated with parameter estimation, especially when the number of studies is small. Copyright
© 2001 John Wiley & Sons, Ltd.
1. INTRODUCTION
Meta-analysis refers to the process of locating, selecting, assessing and combining information relevant
to a particular research question. Over the past 20 years the number of published meta-analyses
and discussions on meta-analysis methodology has dramatically increased. This has occurred particularly
in the areas of medical and epidemiological research [1], but also in the sociological and
behavioural sciences [2]. A Medline search found 269 meta-analyses published in 1990. This figure
has risen steadily, to 575 in 1997. Owing to this rapid rise in the popularity of meta-analysis, it
is becoming increasingly important that the methodology and statistics used are sound.
In general it is instructive to identify the sources of heterogeneity between studies, possibly by
modelling the outcome of interest in terms of features of the studies (‘meta-regression’). However,
in many meta-analyses the number of studies is small and such an approach is not feasible. This
paper focuses on the non-Bayesian statistical methods commonly used for meta-analysis, when the
goal of the analysis is to estimate an effect from a relatively small number of similar studies. Our
results suggest that, as usually applied, these methods have important deficiencies. In particular,
confidence intervals obtained from the combined information do not adequately account for the
variation introduced both from the data, and the estimation procedure itself.
The statistical methods are generally based on standard fixed or random effects models. These are
outlined briefly below, and the random effects model is discussed in more detail in the following
two sections.
∗Correspondence to: Ian R. Gordon, Department of Mathematics and Statistics, Richard Berry Building, The University of
Melbourne, Victoria 3010, Australia
†E-mail: i.gordon@ms.unimelb.edu.au
Received September 1998
Accepted February 2000
Consider a collection of $k$ studies, the $i$th of which has estimated effect size $Y_i$ and true effect
size $\theta_i$. A general model is then specified by

$$Y_i = \theta_i + e_i, \quad \text{where } e_i \sim N(0, \sigma_i^2), \quad i = 1, 2, \ldots, k$$

The $e_i$ indicate random deviations from the true effect size and are assumed independent with mean
zero and variance $\sigma_i^2$. This implies that the estimated effect size $Y_i$ is normally distributed with
mean $\theta_i$ and variance $\sigma_i^2$. $Y_i$ can be any measure of effect, provided the assumption of normality
is (at least approximately) appropriate. Common examples are a log-odds ratio or difference in
means.
In general the parameter of interest is the overall effect, denoted by $\mu$. The fixed effects model
assumes $\theta_i = \mu$ for $i = 1, 2, \ldots, k$, implying that each study in the meta-analysis has the same
underlying effect. Note that even if the $\theta_i$ are assumed to be the same, the $Y_i$ are not identically
distributed due to the possibility of differing $\sigma_i^2$. The estimator of $\mu$ is generally a simple weighted
average of the $Y_i$, with the optimal weights proportional to $w_i = 1/\mathrm{var}(Y_i)$. In practice the variances
are not known so estimated variances $\hat\sigma_i^2$ are used to estimate both $\mu$ and $\mathrm{var}(\hat\mu)$. Any effect of this
is generally ignored in practice, but to indicate this estimation we use the notation $\hat\sigma_i^2$ throughout.
Hence we define $\hat w_i = 1/\hat\sigma_i^2$ giving

$$\hat\mu = \frac{\sum \hat w_i Y_i}{\sum \hat w_i} = \frac{\sum Y_i/\hat\sigma_i^2}{\sum 1/\hat\sigma_i^2} \quad \text{and} \quad \widehat{\mathrm{var}}(\hat\mu) = \frac{1}{\sum 1/\hat\sigma_i^2}$$

In contrast to the fixed effects model, the random effects model does not assume that the $\theta_i$ are
equal, but that they are normally distributed. This gives the two-stage model

$$\begin{aligned} Y_i &= \theta_i + e_i, & \text{where } e_i &\sim N(0, \hat\sigma_i^2) \\ \theta_i &= \mu + \varepsilon_i, & \text{where } \varepsilon_i &\sim N(0, \tau^2) \end{aligned} \qquad (1)$$

The error terms $e_i$ and $\varepsilon_i$ are assumed to be independent. In this case, the true effect for study $i$
is centred around the overall effect, allowing individual studies to vary both in estimated effect and
true effect. The random effects variance parameter $\tau^2$ is a measure of the heterogeneity between
studies. Note that the fixed effects model is a special case of the random effects model, with $\tau^2 = 0$.
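As a rough illustration of the fixed effects estimator defined above, the following Python sketch applies the inverse-variance weighted average to the log-odds ratios and estimated variances printed in Table I. Python and the names `fixed_effects`, `log_or` and `var_hat` are choices made for this illustration and are not part of the original analysis. Because the inputs are rounded to four decimal places, the output should agree with the fixed effects row of Table II only to about three decimal places.

```python
import math

# Log-odds ratios and estimated variances as printed in Table I (rounded values).
log_or = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var_hat = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]


def fixed_effects(y, v):
    """Inverse-variance weighted mean and its estimated variance."""
    w = [1.0 / vi for vi in v]                      # w_i = 1 / sigma_i^2
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu_hat, 1.0 / sum(w)


mu_fe, var_fe = fixed_effects(log_or, var_hat)
se_fe = math.sqrt(var_fe)
# Normal-theory 95 per cent interval, as used throughout the paper.
ci_fe = (mu_fe - 1.96 * se_fe, mu_fe + 1.96 * se_fe)
print(f"estimate {mu_fe:.4f}, SE {se_fe:.4f}, CI ({ci_fe[0]:.4f}, {ci_fe[1]:.4f})")
```

The relative weights quoted in the FE column of Table I correspond to `w[i] / sum(w)` inside `fixed_effects`.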
It is generally agreed [3, 4] that in the presence of heterogeneity, the random effects model should
be used. Heterogeneity is commonly tested using a statistic defined by Cochran [5]:
$Q_w = \sum w_i (Y_i - \hat\mu)^2$. The null hypothesis of homogeneity is $H_0: \tau^2 = 0$ against a one-sided
alternative. If we assume the variances $\sigma_i^2$ are known, then under $H_0$, $Q_w \sim \chi^2_{k-1}$. In practice,
estimates $\hat w_i = 1/\hat\sigma_i^2$ are used giving the statistic

$$Q_{\hat w} = \sum \hat w_i (Y_i - \hat\mu)^2$$

which is approximately $\chi^2_{k-1}$ under $H_0$. A large value of $Q_{\hat w}$ indicates large study-to-study variation,
and the hypothesis of homogeneity is rejected. In such cases the random effects model is generally
adopted. Hardy and Thompson [6] suggest, however, that the power of this test can be low. They
therefore recommend that this should not be the only means by which the fixed effects model is
rejected.

Table I. Columns 2–5 show the sample sizes and observed proportions dying for the six studies
on the effect of aspirin after myocardial infarction. Columns 6 and 7 give the observed
log-odds ratio and estimated variances for each of the studies. For a discussion of the fixed
effects and DerSimonian and Laird random effects weights see Section 5.

Study   Treatment               Control                 $\log(\mathrm{OR}_i)$   $\hat\sigma_i^2$   Weights
        $n_{ti}$   $\hat p_{ti}$      $n_{ci}$   $\hat p_{ci}$                                       FE∗     RE∗
1       615      0.0797         624      0.1074         −0.3289        0.0389       0.11    0.15
2       758      0.0580         771      0.0830         −0.3845        0.0412       0.10    0.14
3       317      0.0852         309      0.1054         −0.2158        0.0753       0.05    0.09
4       832      0.1226         850      0.1482         −0.2196        0.0205       0.20    0.20
5       810      0.1049         406      0.1281         −0.2257        0.0352       0.12    0.15
6       2267     0.1085         2257     0.0970          0.1246        0.0096       0.43    0.26

∗Fixed effects weights $\hat\sigma_i^{-2} / \sum \hat\sigma_i^{-2}$; random effects weights
$(\hat\sigma_i^2 + \hat\tau^2)^{-1} / \sum (\hat\sigma_i^2 + \hat\tau^2)^{-1}$.
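The homogeneity test described above can be sketched in a few lines of Python. The sketch computes Cochran's statistic from the rounded Table I values and refers it to a chi-squared distribution on $k - 1$ degrees of freedom. The function names are hypothetical, and the upper-tail probability is obtained from a standard power series for the regularized incomplete gamma function so that only the standard library is needed; this is an illustrative choice, not the computation used in the paper.

```python
import math

# Log-odds ratios and estimated variances as printed in Table I (rounded values).
log_or = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var_hat = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]


def cochran_q(y, v):
    """Q = sum w_i (Y_i - mu_hat)^2 with w_i = 1 / sigma_i^2."""
    w = [1.0 / vi for vi in v]
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - mu_hat) ** 2 for wi, yi in zip(w, y))


def chi2_sf(x, df):
    """P(X > x) for X chi-squared on df degrees of freedom (series expansion)."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= h / (a + n)        # next term of sum h^n / (a (a+1) ... (a+n))
        total += term
        n += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return 1.0 - lower


q_all = cochran_q(log_or, var_hat)
p_all = chi2_sf(q_all, len(log_or) - 1)
# Same test with the large sixth study left out.
q_five = cochran_q(log_or[:5], var_hat[:5])
p_five = chi2_sf(q_five, 4)
print(f"all six: Q = {q_all:.2f}, P = {p_all:.3f}")
print(f"first five: Q = {q_five:.2f}, P = {p_five:.3f}")
```

Comparing the two calls shows how strongly a single study can drive the statistic, which is one reason the test alone is a fragile basis for choosing between the models.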
The random effects model is outlined and discussed in detail in Sections 2 and 3. In particular
we consider methods for estimating $\mu$ and $\tau^2$, and for incorporating $\hat\tau^2$ into $\hat\mu_{\hat\tau}$ (the random effects
estimator of $\mu$) and the variance of $\hat\mu_{\hat\tau}$. In Section 2 we outline the commonly-used DerSimonian
and Laird [7] random effects method, and in Section 3, likelihood techniques.
In Section 4, confidence intervals for $\mu$, calculated using several methods, are compared using
coverage probabilities. We first compare the fixed effects method and DerSimonian and Laird
random effects method, where these are used irrespective of the observed value of $Q_{\hat w}$. These are
also compared to a Q-based method which reflects the common practice of using the $\chi^2$-test to
determine whether a fixed or random effects approach should be adopted. We also compare the
DerSimonian and Laird intervals to two likelihood based intervals.
In Section 5 four methods are applied to a collection of studies on the effect of aspirin after
myocardial infarction. This set of studies appears repeatedly in the literature [4, 7, 8], owing to its
interesting range of observed measures of effect and sample sizes.
The collection consists of six studies, each examining the effect of aspirin after myocardial
infarction. In each study the number of patients who died after having been given either aspirin
or a control drug is recorded. Sample sizes for all the studies are quite large, the smallest
involving 626 patients. Of the six studies however, one is particularly large, involving a total of
4524 patients. Table I gives the sample sizes, $n_{ti}$ and $n_{ci}$, and the proportions dying, $\hat p_{ti}$ and $\hat p_{ci}$, for
each of the treatment ($t$) and control ($c$) groups, where $i$ denotes the study number. As shown,
the large sixth study is the only one for which $\hat p_t > \hat p_c$.
Table I also gives the observed log-odds ratios ($\log(\mathrm{OR})$) and corresponding estimated variances
where

$$\mathrm{OR}_i = \frac{\hat p_{ti}(1 - \hat p_{ci})}{(1 - \hat p_{ti})\,\hat p_{ci}} \quad \text{and} \quad \hat\sigma_i^2 = \widehat{\mathrm{var}}[\log(\mathrm{OR}_i)] = \frac{1}{x_{ti}} + \frac{1}{n_{ti} - x_{ti}} + \frac{1}{x_{ci}} + \frac{1}{n_{ci} - x_{ci}}$$

and $x_{ti}$ and $x_{ci}$ denote the observed number of deaths for the treatment and control groups
respectively for study $i$. From this table we can see that for the sixth study the log-odds ratio is
considerably different from that of the remaining five studies. This table also shows the effect of
the large differences in sample sizes with the estimated variance for the sixth study being considerably
smaller than for the other five studies. Figure 1 shows the estimated log-odds ratios and
corresponding confidence intervals (using a normal approximation) for each of the six studies.

Figure 1. Estimated log-odds ratios and confidence intervals for the six studies used
in the aspirin meta-analysis.
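The two formulas above translate directly into code. The short Python sketch below evaluates them for the first study in Table I. The death counts are not printed in the table, so they are recovered here by rounding $n \hat p$ to the nearest integer; that step, like the function name `log_odds_ratio`, is an assumption made for this illustration.

```python
import math


def log_odds_ratio(x_t, n_t, x_c, n_c):
    """Log-odds ratio (treatment vs control) and its estimated variance."""
    # p_t (1 - p_c) / ((1 - p_t) p_c) reduces to a ratio of the two odds.
    log_or = math.log(x_t / (n_t - x_t)) - math.log(x_c / (n_c - x_c))
    var_hat = 1 / x_t + 1 / (n_t - x_t) + 1 / x_c + 1 / (n_c - x_c)
    return log_or, var_hat


# Study 1 of Table I; counts recovered from the printed proportions (assumption).
x_t = round(615 * 0.0797)   # deaths in the treatment group
x_c = round(624 * 0.1074)   # deaths in the control group
lor_1, var_1 = log_odds_ratio(x_t, 615, x_c, 624)
print(f"log(OR) = {lor_1:.4f}, variance = {var_1:.4f}")
```

A zero cell would make both the logarithm and the variance undefined; the sketch makes no attempt to handle that case, which does not arise in the aspirin data.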
The results of combining these studies, using four different methods, are presented and discussed
in Section 5. In Section 6 some conclusions are drawn and alternative methods are briefly discussed.
2. THE RANDOM EFFECTS MODEL AND ESTIMATION OF $\tau^2$
The random effects model given in (1) can also be written

$$Y_i = \mu + \varepsilon_i + e_i, \quad \text{where } e_i \sim N(0, \hat\sigma_i^2) \text{ and } \varepsilon_i \sim N(0, \tau^2)$$

relating the $Y_i$ directly to the overall measure of effect $\mu$. By the independence of $\varepsilon_i$ and $e_i$ we
then have $Y_i \sim N(\mu, \hat\sigma_i^2 + \tau^2)$. A weighted average is again used to estimate $\mu$, giving

$$\hat\mu_\tau = \frac{\sum \hat w_i(\tau) Y_i}{\sum \hat w_i(\tau)} \quad \text{with variance} \quad \mathrm{var}(\hat\mu_\tau) = \frac{1}{\sum \hat w_i(\tau)}$$

where $\hat w_i(\tau) = [\tau^2 + \hat w_i^{-1}]^{-1}$ and $\hat w_i$ are as defined above. Assuming $\tau^2$ is known, we then have

$$\hat\mu_\tau \sim N\left(\mu, \frac{1}{\sum \hat w_i(\tau)}\right)$$

Note that $\hat w_i(\tau) \le \hat w_i$. This implies $\mathrm{var}(\hat\mu_\tau) \ge \mathrm{var}(\hat\mu)$, and hence random effects model confidence
intervals for $\mu$ are generally wider than those constructed from the fixed effects model.
In practice, $\tau^2$ is unknown. The most commonly used estimator of $\tau^2$ is a method of moments
based estimator proposed by DerSimonian and Laird [7], derived by equating an estimate of the
expected value of $Q_{\hat w}$ with its observed value. Note that

$$E(Q_{\hat w}) = k - 1 + \tau^2 \left( \sum \hat w_i - \frac{\sum \hat w_i^2}{\sum \hat w_i} \right)$$

Suppose that $t$ is obtained by solving

$$q_{\hat w} = k - 1 + t \left( \sum \hat w_i - \frac{\sum \hat w_i^2}{\sum \hat w_i} \right) \quad \text{giving} \quad t = \frac{q_{\hat w} - (k - 1)}{\sum \hat w_i - \sum \hat w_i^2 / \sum \hat w_i}$$

It is possible that this value is negative, which is unacceptable as a value for $\tau^2$, so we define

$$\hat\tau^2 = \begin{cases} t & \text{if } t > 0 \\ 0 & \text{if } t \le 0 \end{cases}$$

Note that due to the truncation, $\hat\tau^2$ is a biased estimator for $\tau^2$.
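The moment estimator and its truncation can be sketched as follows, using the rounded log-odds ratios and variances from Table I. The function name `dl_tau2` is hypothetical. With rounded inputs the result matches the value $\hat\tau^2 = 0.0269$ quoted in Section 5 only to about three decimal places.

```python
# Log-odds ratios and estimated variances as printed in Table I (rounded values).
log_or = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var_hat = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]


def dl_tau2(y, v):
    """DerSimonian and Laird moment estimator of tau^2, truncated at zero."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_hat) ** 2 for wi, yi in zip(w, y))
    t = (q - (k - 1)) / (sw - sw2 / sw)
    return max(t, 0.0)             # negative solutions are replaced by zero


tau2_all = dl_tau2(log_or, var_hat)
# For the first five studies Q is below its degrees of freedom, so t < 0.
tau2_five = dl_tau2(log_or[:5], var_hat[:5])
print(f"all six: {tau2_all:.4f}; first five: {tau2_five:.4f}")
```

The second call illustrates the truncation: whenever the observed statistic falls below $k - 1$, the estimate is set to zero and the random effects analysis coincides with the fixed effects one.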
DerSimonian and Laird proposed that this estimate can then be incorporated into the random
effects weights giving

$$\hat w_i(\hat\tau) = (\hat\tau^2 + \hat\sigma_i^2)^{-1}$$

An estimator of $\mu$ is then given by

$$\hat\mu_{\hat\tau} = \frac{\sum \hat w_i(\hat\tau) Y_i}{\sum \hat w_i(\hat\tau)}$$

with variance estimated by

$$\widehat{\mathrm{var}}(\hat\mu_{\hat\tau}) = \frac{1}{\sum \hat w_i(\hat\tau)}$$

Note that this is simply a straight substitution of $\hat\tau^2$ into the variance of $\hat\mu_\tau$, derived assuming $\tau^2$
is known. In obtaining confidence intervals for $\mu$ using $\hat\mu_{\hat\tau}$ and $\widehat{\mathrm{var}}(\hat\mu_{\hat\tau})$, it is common practice to
maintain the assumption of normality for $\hat\mu_{\hat\tau}$, despite the use of $\hat\tau^2$ and $\hat\sigma_i^2$ in place of $\tau^2$ and $\sigma_i^2$,
respectively.
Two main issues arise from the general application of the random effects model as described
above:
(i) The assumption of normality poses problems, first in its validity, and secondly in our ability
to check that validity for meta-analyses based on a small number of studies. In particular,
the assumption of normally distributed random effects, or between study errors $\varepsilon_i$, is not
easily verified or justified. The issue of validating the assumption of normality is addressed
by Hardy and Thompson [6], however they consider only relatively large values of $k$.
(ii) The variation between true study effect sizes is taken into account via the inclusion of
errors with variance $\tau^2$. It is however only an estimate of this variance which is added
into the weights, and the model takes no account of the uncertainty associated with this
estimate. In particular the distribution used for $\hat\mu_{\hat\tau}$ is not altered. As shown in Section 4.2,
this results in confidence intervals for $\mu$ which are narrower on average than they should
be. It is common practice to use a t-distribution to account for the error associated with
a variance estimate [9]. This approach is not valid in the random effects meta-analysis
context.
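The DerSimonian and Laird interval described in this section can be sketched in Python as below, again from the rounded Table I values. The function name and the returned tuple are illustrative choices. The sketch deliberately keeps the plug-in normal interval criticised in point (ii), so that it mirrors the method as usually applied rather than any corrected version.

```python
import math

# Log-odds ratios and estimated variances as printed in Table I (rounded values).
log_or = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var_hat = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]


def dersimonian_laird(y, v):
    """Return (estimate, standard error, tau2_hat) for the DL random effects method."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    tau2_hat = max((q - (k - 1)) / (sw - sum(wi * wi for wi in w) / sw), 0.0)
    # Random effects weights: tau2_hat is simply substituted for tau^2.
    w_re = [1.0 / (tau2_hat + vi) for vi in v]
    mu_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return mu_re, math.sqrt(1.0 / sum(w_re)), tau2_hat


mu_dl, se_dl, tau2_dl = dersimonian_laird(log_or, var_hat)
ci_dl = (mu_dl - 1.96 * se_dl, mu_dl + 1.96 * se_dl)
print(f"estimate {mu_dl:.4f}, SE {se_dl:.4f}, CI ({ci_dl[0]:.4f}, {ci_dl[1]:.4f})")
```

Nothing in the interval reflects the sampling variability of `tau2_hat`, which is the deficiency examined in Section 4.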
3. LIKELIHOOD METHODS
Maximum likelihood theory is widely used for estimation and inference. In this section we review
two methods which use established likelihood theory to obtain confidence intervals for $\mu$. These
are considered as alternatives to the DerSimonian and Laird method.
3.1. Estimating $\mu$ and $\tau^2$ using maximum likelihood
Recall that the standard random effects model has $Y_i \sim N(\mu, \hat\sigma_i^2 + \tau^2)$, $i = 1, 2, \ldots, k$, and that the
$\hat\sigma_i^2$ are treated as known constants. The log-likelihood function is

$$\log L(\mu, \tau^2) = -\frac{1}{2} \sum \log\left(2\pi(\hat\sigma_i^2 + \tau^2)\right) - \frac{1}{2} \sum \frac{(y_i - \mu)^2}{\hat\sigma_i^2 + \tau^2}, \quad \mu \in \mathbb{R}, \ \tau^2 \ge 0 \qquad (2)$$

Maximum likelihood estimates $\hat\mu_{ml}$ and $\hat\tau^2_{ml}$ can be found in standard ways (see Appendix A for
details).
One major advantage of maximum likelihood estimation lies in the large body of asymptotic
theory existing for such estimators. In regular cases a maximum likelihood estimator from a sample
of $k$ independent and identically distributed random variables has a normal distribution as $k \to \infty$.
It should be noted that the $k$ variables $Y_i$, $i = 1, 2, \ldots, k$, from a meta-analysis are independent but
not identically distributed, since $\mathrm{var}(Y_i) = \hat\sigma_i^2 + \tau^2$. In any realistic meta-analysis with large $k$, the
standard assumptions will still apply however.
Using this asymptotic distribution, it is possible to construct a confidence interval for $\mu$. However,
this is only an approximate interval, since the asymptotic variance of $\hat\mu_{ml}$ depends on the
unknown $\tau^2$. This variance is derived from the covariance matrix of $(\hat\mu_{ml}, \hat\tau^2_{ml})$ and is given by

$$\mathrm{var}(\hat\mu_{ml}) = \frac{1}{\sum (\hat\sigma_i^2 + \tau^2)^{-1}}$$

Under the assumption of asymptotic normality we therefore have

$$\hat\mu_{ml} \sim N\left(\mu, \frac{1}{\sum (\hat\sigma_i^2 + \tau^2)^{-1}}\right) \quad \text{as } k \to \infty$$

This distribution is used for $\hat\mu_{ml}$ even though the likelihood estimate of $\tau^2$ may lie on the boundary
of the parameter space, namely, $\tau^2 = 0$.
Note that the variance of $\hat\mu_{ml}$ is of the same form as that for the DerSimonian and Laird
random effects method. In this case $\mathrm{var}(\hat\mu_{ml})$ is estimated using $\hat\tau^2_{ml}$ without any modification to
the distribution of $\hat\mu_{ml}$; 95 per cent confidence intervals are therefore

$$\hat\mu_{ml} \pm \frac{1.96}{\sqrt{\sum (\hat\sigma_i^2 + \hat\tau^2_{ml})^{-1}}}$$

This method is referred to as the simple likelihood method.
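Once $\hat\tau^2_{ml}$ is available, the simple likelihood interval is a one-line calculation. The Python sketch below takes the value $\hat\tau^2_{ml} = 0.0390$ reported in Table II as given, rather than re-deriving it, and combines it with the rounded variances and log-odds ratios from Table I. It relies on the fact that, for a fixed value of $\tau^2$, the log-likelihood (2) is maximized in $\mu$ by the weighted mean with weights $(\hat\sigma_i^2 + \tau^2)^{-1}$. The function name is hypothetical.

```python
import math

# Log-odds ratios and estimated variances as printed in Table I (rounded values).
log_or = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var_hat = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]
TAU2_ML = 0.0390  # maximum likelihood estimate as reported in Table II


def simple_likelihood_interval(y, v, tau2_ml):
    """95 per cent interval: weighted mean +/- 1.96 / sqrt(sum of weights)."""
    w = [1.0 / (vi + tau2_ml) for vi in v]
    centre = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    half_width = 1.96 / math.sqrt(sum(w))
    return centre - half_width, centre + half_width


sl_lower, sl_upper = simple_likelihood_interval(log_or, var_hat, TAU2_ML)
print(f"simple likelihood interval: ({sl_lower:.4f}, {sl_upper:.4f})")
```

The interval is symmetric by construction, which is the main structural difference from the profile likelihood interval of Section 3.2.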
3.2. Profile likelihood intervals
An alternative method, which uses a profile likelihood function, is proposed by Hardy and
Thompson [10]. Unlike the simple likelihood method this method allows for asymmetric intervals
and some imprecision in the estimate of $\tau^2$.
The profile likelihood function for $\mu$ is defined as

$$L^{*}(\mu_0) = L(\mu_0, \hat\tau^2(\mu_0))$$

where $\hat\tau^2(\mu_0)$ satisfies

$$\hat\tau^2(\mu_0) = \sum \frac{(y_i - \mu_0)^2 - \hat\sigma_i^2}{(\hat\sigma_i^2 + \hat\tau^2(\mu_0))^2} \Bigg/ \sum \frac{1}{(\hat\sigma_i^2 + \hat\tau^2(\mu_0))^2} \qquad (3)$$

Clearly $\hat\tau^2(\mu_0)$ is not assumed fixed for all $\mu_0$.
An approximate 95 per cent confidence interval for $\mu$ is then given by values of $\mu_0$ which satisfy

$$\log(L^{*}(\mu_0)) > \log(L(\hat\mu_{ml}, \hat\tau^2_{ml})) - \tfrac{1}{2} C_{0.95}(\chi^2_1) \qquad (4)$$

where $C_\alpha(\chi^2_1)$ is the $\alpha$-quantile of the $\chi^2_1$ distribution. Details of the derivation and implementation
of this method are given in Appendix B.
Biggerstaff and Tweedie [11] use a similar method to find a confidence interval for $\tau^2$, but as
the focus of this paper is finding confidence intervals for $\mu$, intervals for $\tau^2$ are not considered in
detail here.
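The profile likelihood construction in (3) and (4) can be explored numerically. The Python sketch below is one possible implementation under stated assumptions, not the procedure of Appendix B. Instead of solving (3) directly, it maximizes the log-likelihood (2) over $\tau^2 \ge 0$ for each fixed $\mu_0$ with a coarse grid followed by a ternary search, and it locates the interval end points by bisection, assuming the profile falls away on either side of its maximum. All function names are hypothetical, and the sketch makes no claim to reproduce the Table II values.

```python
import math

# Log-odds ratios and estimated variances as printed in Table I (rounded values).
Y = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
V = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]
CHI2_1_Q95 = 3.841459  # 0.95 quantile of the chi-squared distribution on 1 df


def log_lik(mu, tau2):
    """Log-likelihood (2), with the estimated variances treated as known."""
    return -0.5 * sum(
        math.log(2 * math.pi * (v + tau2)) + (y - mu) ** 2 / (v + tau2)
        for y, v in zip(Y, V)
    )


def argmax_1d(f, lo, hi, grid=60, iters=60):
    """Maximise f on [lo, hi]: coarse grid, then ternary search near the best point."""
    xs = [lo + (hi - lo) * j / grid for j in range(grid + 1)]
    best = max(range(grid + 1), key=lambda j: f(xs[j]))
    a, b = xs[max(best - 1, 0)], xs[min(best + 1, grid)]
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if f(m1) < f(m2):
            a = m1
        else:
            b = m2
    return (a + b) / 2


def profile_tau2(mu0):
    """tau^2 >= 0 maximising the likelihood for fixed mu0."""
    # The score in tau^2 is negative beyond max (y_i - mu0)^2, so search up to there.
    upper = max((y - mu0) ** 2 for y in Y)
    return argmax_1d(lambda t: log_lik(mu0, t), 0.0, upper)


def profile_log_lik(mu0):
    return log_lik(mu0, profile_tau2(mu0))


def crossing(inside, outside, cutoff, iters=60):
    """Bisect between a point inside and a point outside the interval."""
    for _ in range(iters):
        mid = (inside + outside) / 2
        if profile_log_lik(mid) > cutoff:
            inside = mid
        else:
            outside = mid
    return (inside + outside) / 2


mu_ml = argmax_1d(profile_log_lik, min(Y), max(Y))
tau2_ml = profile_tau2(mu_ml)
cutoff = log_lik(mu_ml, tau2_ml) - 0.5 * CHI2_1_Q95   # right-hand side of (4)
pl_lower = crossing(mu_ml, mu_ml - 5.0, cutoff)
pl_upper = crossing(mu_ml, mu_ml + 5.0, cutoff)
print(f"ML estimate {mu_ml:.4f}, tau2 {tau2_ml:.4f}")
print(f"profile likelihood interval: ({pl_lower:.4f}, {pl_upper:.4f})")
```

Because $\hat\tau^2(\mu_0)$ is re-estimated at every $\mu_0$, the two end points need not be equidistant from the maximum likelihood estimate.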
4. COMPARISON OF METHODS
4.1. Coverage probabilities and simulation methods
The coverage probability of a random interval $(A, B)$ for $\mu$ is defined as $\Pr(\mu \in (A, B))$ which,
for a nominal 95 per cent confidence interval, should be close to 0.95. The exact coverage
can only be found if the distribution of the interval is known. If, however, as is more common,
the distribution is unknown, the coverage probability must be estimated using simulation. This is
done by simulating a large number of meta-analyses and for each meta-analysis calculating the
appropriate confidence interval. The estimated coverage probability is then the proportion of these
intervals which contain $\mu$.
The coverage probability is usually dependent on the parameters of the model and so the coverages
presented below are estimated for a range of values of $\tau^2$ and $k$. For all the intervals
considered here, the coverage probability is not dependent on the value of $\mu$ since the procedure
is invariant with respect to a location shift. The data for each meta-analysis are simulated using
the random effects model described in Section 1, assuming normal errors $e_i$ and $\varepsilon_i$, with zero mean
and variances $\hat\sigma_i^2$ and $\tau^2$, respectively. For all simulations we use $\mu = 0.5$.

Figure 2. Simulated meta-analyses with $\mu = 0.5$ and $k = 10$. In graph (a) $\tau^2 = 0.03$, a small
amount of between study variation. In graph (b) $\tau^2 = 0.07$; the larger amount of between-study
variation manifests itself in a larger spread of estimates. The width of the confidence intervals
for $\theta_i$ indicates the range of values of $\hat\sigma_i^2$ used.

The coverage probability is then estimated by simulating 25000 meta-analyses of $k$ studies, with
$\mu = 0.5$ and $\tau^2$ as specified. A 95 per cent confidence interval is then calculated for each of these
meta-analyses and the coverage is estimated as the proportion, out of 25000, which contain the
parameter value, $\mu = 0.5$. Since the true coverages are generally greater than 0.9, the standard
errors of the estimated coverage probabilities are essentially $\le 0.002$.
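The bound on the standard error quoted above follows from the binomial variance of a proportion. As a short worked step, writing $p$ for the true coverage and $N = 25000$ for the number of simulated meta-analyses (symbols introduced here only for this illustration):

```latex
\mathrm{SE}(\hat p) = \sqrt{\frac{p(1-p)}{N}}, \qquad
p \ge 0.9 \;\Rightarrow\; p(1-p) \le 0.9 \times 0.1 = 0.09, \qquad
\mathrm{SE}(\hat p) \le \sqrt{\frac{0.09}{25000}} \approx 0.0019
```

The inequality uses the fact that $p(1-p)$ is decreasing in $p$ for $p > 0.5$, so the worst case over $p \ge 0.9$ occurs at $p = 0.9$.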
To give the simulations authenticity, parameter values are chosen to correspond to a typical
scenario for estimating a common log-odds ratio (that is, $\exp(\mu)$ is the parameter of interest).
Accordingly, the $Y_i$ may be considered sample log-odds ratios, and the $\hat\sigma_i^2$ their estimated variances.
These variances are realizations from a $\chi^2_1$ distribution, multiplied by 0.25 and then restricted to
lie within the interval $(0.009, 0.6)$. Values produced in this way are consistent with the typical
distribution of $\hat\sigma_i^2$ for log-odds ratios in practice. The $\hat\sigma_i^2$ are varied for each of the 25000 simulations
in order to allow for the sampling error in these values, which although present in practice, is not
accounted for in the methods considered.
The simulation procedure described above is implemented for a given $k$ and $\tau^2$. The procedure
is repeated for each of 11 values of $\tau^2$ between 0.0 and 0.1, and values of $k$ between 3 and 35.
Figure 2 shows two simulated meta-analyses with $\tau^2 = 0.03$ and $\tau^2 = 0.07$; in each case $\mu = 0.5$
and $k = 10$. Such graphs were in part used to determine suitable values for $\tau^2$ and the $\hat\sigma_i^2$.
The simulations are implemented using a program written in C++, with each simulation generating
$k$ observations from normal distributions with mean $\mu$ and variance $\hat\sigma_i^2 + \tau^2$. The data are then
used to calculate the fixed effects estimate of $\mu$, the observed value of $Q_{\hat w}$, the DerSimonian and
Laird estimate of $\tau^2$ and the corresponding confidence intervals for $\mu$. Since in practice the outcome
of the $\chi^2$-test using $Q_{\hat w}$ is often used to determine which method shall be used, a combined
fixed effects/random effects method is considered here. This method reflects the $\chi^2$-test procedure,
selecting the fixed or random effects methods as determined by comparing the observed value of
$Q_{\hat w}$ with the 0.95 quantile of the $\chi^2_{k-1}$ distribution. This is referred to as the Q-based method.

Figure 3. Estimated coverage probabilities for the fixed effects (FE), DerSimonian and Laird random effects
(DL) and Q-based (Q) methods, for varying $k$ and $\tau^2$. In graph (a) $k = 10$ and in graph (b) $k = 20$. In graph
(c) $\tau^2 = 0.03$ and in graph (d) $\tau^2 = 0.07$.

Estimated coverage probabilities for the fixed effects method, the Q-based method, the
DerSimonian and Laird random effects method, the simple likelihood method and the profile
likelihood method are presented in the following section.
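The simulation design described in this section can be sketched compactly. The Python code below is not the authors' C++ program; it is a small re-implementation under stated assumptions, covering only the fixed effects, DerSimonian and Laird and Q-based intervals. It assumes that "restricted to lie within the interval" means redrawing until the value falls inside $(0.009, 0.6)$, it uses far fewer replications than the 25000 used in the paper, and it hard-codes the 0.95 quantile of $\chi^2_9$ (16.919) for $k = 10$. All names are hypothetical.

```python
import math
import random

Q_CRIT_K10 = 16.919  # 0.95 quantile of chi-squared on 9 df, i.e. for k = 10


def draw_variance(rng):
    """0.25 * chi-squared(1) draw, redrawn until it lies in (0.009, 0.6)."""
    while True:
        v = 0.25 * rng.gauss(0.0, 1.0) ** 2
        if 0.009 < v < 0.6:
            return v


def three_intervals(y, v, q_crit):
    """(centre, half-width) of the FE, DL and Q-based 95 per cent intervals."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    t = (q - (len(y) - 1)) / (sw - sum(wi * wi for wi in w) / sw)
    tau2_hat = max(t, 0.0)
    w_re = [1.0 / (vi + tau2_hat) for vi in v]
    sw_re = sum(w_re)
    mu_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    fe = (mu_fe, 1.96 / math.sqrt(sw))
    dl = (mu_re, 1.96 / math.sqrt(sw_re))
    # Q-based method: random effects only if the homogeneity test rejects.
    return fe, dl, (dl if q > q_crit else fe)


def estimate_coverage(k, tau2, q_crit, reps, mu=0.5, seed=2001):
    """Proportion of simulated intervals containing mu, per method."""
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(reps):
        v = [draw_variance(rng) for _ in range(k)]
        y = [rng.gauss(mu, math.sqrt(vi + tau2)) for vi in v]
        for j, (centre, half) in enumerate(three_intervals(y, v, q_crit)):
            hits[j] += abs(centre - mu) <= half
    return [h / reps for h in hits]


fe_cov, dl_cov, q_cov = estimate_coverage(k=10, tau2=0.06, q_crit=Q_CRIT_K10, reps=2000)
print(f"FE {fe_cov:.3f}  DL {dl_cov:.3f}  Q-based {q_cov:.3f}")
```

With only 2000 replications the estimates carry a standard error of up to about 0.01, so they indicate the pattern of the results rather than reproducing the figures.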
4.2. Simulation results
The results of the simulations are presented in two figures showing the estimated coverage
probabilities for different values of $k$ and $\tau^2$. Figure 3 shows the estimated coverage probabilities for
the fixed effects method (FE). Here the fixed effects method is used, despite the data being
simulated with $\tau^2 > 0$ in most cases. For $\tau^2 = 0$ the estimated coverage probability is close to 0.95
as expected; however, increasing $\tau^2$ substantially reduces the coverage. This reflects the theoretical
coverage probability result for the fixed effects model specified by

$$\text{coverage probability} = \Phi(z) - \Phi(-z) \quad \text{where} \quad z = \frac{1.96}{\sqrt{1 + \tau^2 \, \dfrac{\sum \hat w_i^2}{\sum \hat w_i}}} \qquad (5)$$

and $\Phi$ denotes the standard normal cumulative distribution function. A brief derivation of this
result is given in Appendix C. Clearly this coverage probability decreases as $\tau^2$ increases.
In practice it is more common for the fixed effects method to be used only after the $\chi^2$-test,
based on $Q_{\hat w}$, suggests that the hypothesis $H_0: \tau^2 = 0$ should not be rejected. However if this test is
used to select an appropriate method, the fixed effects method is selected a considerable proportion
of the time, even when the true value of $\tau^2$ is large. For example, with $k = 10$ and $\tau^2 = 0.06$, the
fixed effects method is selected an estimated 58 per cent of the time. This percentage decreases as
$k$ increases, with the fixed effects method selected in 26 per cent of the cases with $k = 25$ and $\tau^2 = 0.06$.

Figure 4. Coverages for the DerSimonian and Laird random effects (DL), profile likelihood (PL) and simple
likelihood (SL) methods, for varying $k$ and $\tau^2$. Note that the scale differs from that in Figure 3. In graph
(a) $k = 10$ and in graph (b) $k = 20$. In graph (c) $\tau^2 = 0.03$ and in graph (d) $\tau^2 = 0.07$.

Coverage probabilities for the Q-based method are also given in Figure 3. Not surprisingly,
these coverage probabilities lie between the fixed effects and DerSimonian and Laird random
effects coverages (denoted DL in Figure 3). These latter coverage probabilities are for confidence
intervals for $\mu$ constructed using the DerSimonian and Laird method in all cases, irrespective of
the value of $Q_{\hat w}$. As indicated in Figure 3, any use of the fixed effects method substantially reduces
the coverage probability.
Figure 4 shows estimated coverage probabilities for the two random effects, likelihood based
methods. Coverage probabilities for the DerSimonian and Laird method are also included for
comparison. Note that the scale differs from that of Figure 3. These plots indicate that, except when
$\tau^2$ is close to zero, the coverage probabilities for all three methods are below 0.95. For all values
of $k$ and $\tau^2$ the coverage probabilities for the simple likelihood method are below those of the
other two methods. Similarly the profile likelihood method consistently has the highest estimated
coverage probability. For the two cases where data are simulated with $\tau^2 = 0$, all three methods
have coverage probabilities above 0.95, some significantly so. For these simulations we obtain an
estimated value $\hat\tau^2 > 0$, resulting in wider confidence intervals and hence coverage probabilities
higher than the nominal value.
The DerSimonian and Laird method for establishing confidence intervals for $\mu$ is the most
commonly used technique, yet estimated values indicate that even for a large number of studies
the confidence intervals obtained have a coverage probability below 0.95. This suggests that the
use of $\hat\tau^2$ in the estimation of $\mu$ and the standard error of $\hat\mu_{\hat\tau}$, combined with the use of a normal
approximation for $\hat\mu_{\hat\tau}$, produce intervals which are on average too narrow. This problem also arises
when using the simple likelihood method. The standard error of $\hat\mu_{ml}$ is estimated using $\hat\tau^2_{ml}$, without
modification to the assumed distribution for $\hat\mu_{ml}$. In both cases, coverage probabilities estimated
using the true $\tau^2$ are close to the nominal value.

Table II. Estimated overall log-odds ratios and corresponding confidence intervals for four different
meta-analyses of the effect of aspirin after myocardial infarction.

Method                    $\hat\mu$        95 per cent CI         $\hat\tau^2$
Fixed effects             −0.1015    (−0.2269, 0.0238)      $\tau^2 = 0$ (assumed)
DerSimonian and Laird     −0.1689    (−0.3609, 0.0231)      $\hat\tau^2 = 0.0269$
Profile likelihood        −0.1175    (−0.3696, 0.0352)      $\hat\tau^2_{ml} = 0.0390$
Simple likelihood         −0.1175    (−0.3902, 0.0352)      $\hat\tau^2_{ml} = 0.0390$
5. AN EXAMPLE: ASPIRIN AFTER MYOCARDIAL INFARCTION
In this section, the four methods compared in Section 4 are applied to six studies on the effect
of aspirin after myocardial infarction reviewed in Section 1. Recall that one of the six studies is
considerably larger than the others and is the only study with an odds ratio greater than one.
The homogeneity statistic for these data is $q_{\hat w} = 9.88$ on 5 degrees of freedom, giving $P = 0.08$
for the hypotheses $H_0: \tau^2 = 0$ versus $H_1: \tau^2 > 0$. Whilst the null hypothesis is not rejected at the
0.05 level, the test does bring into question the validity of the fixed effects model. This is brought
about solely by the large sixth study, since the homogeneity statistic for the first five studies is
$q_{\hat w} = 0.63$ on 4 degrees of freedom, giving $P > 0.9$. Adopting the random effects model for all six
studies, we obtain $\hat\tau^2 = 0.0269$, using the DerSimonian and Laird estimator.
The fixed effects method and three random effects methods have all been used to combine these
data. Table II gives estimated values of $\mu$, the overall effect of aspirin versus the control, and $\tau^2$;
95 per cent confidence intervals for $\mu$ are also presented. The three random effects estimates of
$\mu$ are all considerably smaller than that obtained for the fixed effects model. This is due to the
reduction in the weight ascribed to the large sixth study, which has a positive log-odds ratio.
All of the confidence intervals obtained include zero, and the upper bounds for each are relatively
similar. The fixed effects confidence interval is considerably narrower than the three random effects
based methods. A rough estimate of the fixed effects coverage probability for these data can be
obtained from the theoretical coverage given in Section 4.2. Using $\hat\tau^2 = 0.0269$ and the estimated
values of $\sigma_i^2$ given in Table I we obtain

$$z = \frac{1.96}{\sqrt{1 + 0.0269 \, \dfrac{\sum (\hat\sigma_i^{-2})^2}{\sum \hat\sigma_i^{-2}}}} = 1.193$$

and estimated coverage probability

$$\Phi(z) - \Phi(-z) = 0.77$$

well below the nominal level of 0.95.
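The calculation above is easily reproduced. The Python sketch below evaluates the theoretical fixed effects coverage from Section 4.2 using the rounded variances in Table I and the quoted value $\hat\tau^2 = 0.0269$. The function name is hypothetical, and $\Phi(z) - \Phi(-z)$ is computed through the identity $\Phi(z) - \Phi(-z) = \mathrm{erf}(z/\sqrt{2})$.

```python
import math

# Estimated variances as printed in Table I (rounded values).
var_hat = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]
TAU2_DL = 0.0269  # DerSimonian and Laird estimate quoted in the text


def fixed_effects_coverage(v, tau2, z_crit=1.96):
    """Theoretical coverage of the fixed effects interval when tau^2 > 0."""
    w = [1.0 / vi for vi in v]
    z = z_crit / math.sqrt(1.0 + tau2 * sum(wi * wi for wi in w) / sum(w))
    return z, math.erf(z / math.sqrt(2.0))   # Phi(z) - Phi(-z)


z_aspirin, coverage_aspirin = fixed_effects_coverage(var_hat, TAU2_DL)
print(f"z = {z_aspirin:.3f}, coverage = {coverage_aspirin:.2f}")
```

Setting `tau2` to zero returns the nominal 0.95, and any positive value shrinks $z$ and therefore the coverage.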
The markedly lower bounds for the three random effects based methods are due to the allowance
of a non-zero value for $\tau^2$ in the model and the corresponding change in weights. Table I shows
the weights allocated to each study, using both the fixed and random effects models. As can be
seen in Table I, the fixed effects model assigns a large weight to studies with small variance, and
small weights to those with large variance, generally corresponding to smaller sample sizes. In
the random effects model, however, the weights tend to be relatively similar, and the sixth study
now counts for only 26 per cent of the total weight, as opposed to 43 per cent in the fixed effects
model. In general, in the random effects approach, large studies are downweighted relative to the
fixed effects model, and smaller studies given a greater weighting in estimation.
These tendencies produce interesting results for the aspirin meta-analysis. It is generally expected
that making adjustments to the weights does not substantially affect the estimate of $\mu$, giving
$\hat\mu \approx \hat\mu_{\hat\tau}$. Similarly, since $\hat\sigma_i^{-2} = \hat w_i \ge \hat w_i(\hat\tau) = (\hat\sigma_i^2 + \hat\tau^2)^{-1}$ we have $\mathrm{SE}(\hat\mu) < \mathrm{SE}(\hat\mu_{\hat\tau})$, where $\mathrm{SE}(\hat\mu)$ is
the standard error of $\hat\mu$. The expected result is therefore that

$$|z_{FE}| = \left| \frac{\hat\mu}{\mathrm{SE}(\hat\mu)} \right| > \left| \frac{\hat\mu_{\hat\tau}}{\mathrm{SE}(\hat\mu_{\hat\tau})} \right| = |z_{RE}|$$

For the aspirin meta-analysis, however, we observe the reverse effect. Although $\mathrm{SE}(\hat\mu) = 0.0640 <
0.0980 = \mathrm{SE}(\hat\mu_{\hat\tau})$, this difference is negated by the change in the estimates of $\mu$: $\hat\mu = -0.1015$
and $\hat\mu_{\hat\tau} = -0.1689$. We then have $|z_{FE}| = 1.587 < 1.724 = |z_{RE}|$. Again this effect is a result of the
change in weights allocated to the sixth study.
This outcome is interesting, since to some extent it suggests a problem with the way in which
the random effects model attempts to increase the uncertainty associated with model estimates.
By adding extra variation into the model, we expect to make it more difficult to reject a null
hypothesis such as $H_0: \mu = 0$, and on average this is true. As shown by the aspirin meta-analysis,
however, it can occur that the random effects model produces a larger z-statistic, and hence a
smaller P-value.
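The reversal can be checked directly from the rounded estimates and standard errors quoted in this section. The Python sketch below computes each z-statistic and the corresponding two-sided P-value under the usual normal approximation. The P-values are not reported in the text and the function name is hypothetical; the z-statistics differ from the quoted 1.587 and 1.724 in the third decimal place only because rounded inputs are used.

```python
import math


def z_and_p(estimate, se):
    """z-statistic and two-sided normal P-value for H0: mu = 0."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))   # 2 * (1 - Phi(|z|))


# Estimates and standard errors as quoted in Section 5.
z_fe, p_fe = z_and_p(-0.1015, 0.0640)   # fixed effects
z_re, p_re = z_and_p(-0.1689, 0.0980)   # DerSimonian and Laird random effects
print(f"FE: |z| = {abs(z_fe):.3f}, P = {p_fe:.3f}")
print(f"RE: |z| = {abs(z_re):.3f}, P = {p_re:.3f}")
```

Although the random effects standard error is larger, the shift in the point estimate is proportionally larger still, so the random effects analysis yields the smaller P-value.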
6. COMMENTS
As demonstrated by the coverage probabilities presented in Section 4, the ?xed e?ects method
does not perform well unless there is very little between-study variation. In practice this would
rarely be the case, and should not be assumed to be so. It has also been demonstrated that use
of the ?2-test to determine which model is appropriate leads to con?dence intervals which are
on average too narrow. For small k in particular, the ?xed e?ects method is frequently selected,
even when ?2is large. As shown, this substantially reduces the coverage probability of con?dence
intervals for ?. It is therefore recommended that the random e?ects model be adopted irrespective
of the outcome of the ?2-test for heterogeneity. This then simpli?es to the ?xed e?ects method
only when ˆ ?2=0.
The random effects methods generally perform better than the fixed effects methods, with respect
to coverage probabilities. However, particularly when the number of studies is modest (fewer than
20), the commonly used DerSimonian and Laird method has coverage probability considerably
below 0.95. This suggests that the error associated with the estimation of $\tau^2$ is not adequately
being accounted for either through modifications to $\widehat{\mathrm{var}}(\hat{\mu}_{\hat{\tau}})$ or the distribution used for $\hat{\mu}_{\hat{\tau}}$. Like
the DerSimonian and Laird method, the greatest source of error in simple likelihood confidence
intervals comes from estimating $\mathrm{var}(\hat{\mu}_{ml})$ by substituting $\hat{\tau}^2_{ml}$ in for $\tau^2$, and using a normal
approximation, despite this estimation. Although it lacks the simplicity of the other three techniques, the
profile likelihood method produced the highest coverage probabilities in all cases. In particular,
coverage probabilities for small $k$ were considerably closer to 0.95 than for the other two random
effects methods.
For both the fixed and random effects methods, inference is carried out ignoring the sampling
errors in the individual study variances. Estimated values $\hat{\sigma}_i^2$ are used without modification to the
form of $\hat{\mu}$, its variance or distribution. It has been shown, however, that the fixed effects variance
estimate, $\widehat{\mathrm{var}}(\hat{\mu})$, has a negative bias [12]. The estimation of $\sigma_i^2$ might also be expected to influence
random effects coverage probabilities. Hardy and Thompson [10] suggest however that in the case
of maximum likelihood procedures, allowing for the estimation of $\sigma_i^2$ does not greatly affect results
unless all studies in the meta-analysis are small.
Non-Bayesian alternatives to the standard random effects approaches considered here have
recently been proposed in an attempt to incorporate more adequately the estimation of $\tau^2$ into the
random effects model. The first, outlined by Biggerstaff and Tweedie [11], utilizes an approximate
distribution for $\hat{\tau}^2$ to modify the estimation of random effects weights. These new weights are then
used in the estimation of $\mu$ and its variance. However the variance of $\hat{\mu}_{\hat{\tau}}$ is still derived assuming
the weights are known, and the assumption of normality for $\hat{\mu}_{\hat{\tau}}$ is maintained. A second alternative
involves using an overdispersed generalized linear model to estimate the overall effect $\mu$, with the
heterogeneity between studies being reflected by the overdispersion parameter [13].
A third approach, currently being developed, uses simulations to model the 0.975 quantile of
$(\hat{\mu}_{\hat{\tau}} - \mu)/\sqrt{\mathrm{var}(\hat{\mu}_{\hat{\tau}})}$ as a function of $k$. The appropriate value, which will be greater than 1.96, is
then used in confidence intervals for $\mu$. Clearly all three methods need to be examined further
and incorporated into a comparative study of meta-analysis techniques. They do however provide
possible improvements to the deficiencies highlighted in currently used techniques.
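The idea behind the third approach can be sketched by simulation. The Python code below is only an illustration under strong assumptions that are not taken from the paper: all within-study variances are equal and known, study estimates are drawn directly from the random effects model, and $\tau^2$ is estimated by the DerSimonian and Laird moment estimator. The function names, default values and seed are hypothetical.

```python
import math
import random


def dl_tau2(y, var_within):
    """DerSimonian and Laird moment estimate of tau^2, truncated at zero."""
    w = [1.0 / v for v in var_within]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    return max(0.0, (q - (len(y) - 1)) / (sw - sum(wi ** 2 for wi in w) / sw))


def simulated_quantile(k, tau2, var_within=0.1, mu=0.0,
                       reps=4000, prob=0.975, seed=1):
    """Empirical quantile of (mu_hat - mu) / sqrt(estimated variance) for k studies."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    variances = [var_within] * k              # assumption: equal, known variances
    sd_total = math.sqrt(var_within + tau2)   # marginal sd of each study estimate
    stats = []
    for _ in range(reps):
        y = [rng.gauss(mu, sd_total) for _ in range(k)]
        t2 = dl_tau2(y, variances)
        w = [1.0 / (v + t2) for v in variances]
        mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        # Estimated variance of mu_hat is 1 / sum(w), so divide by its square root.
        stats.append((mu_hat - mu) * math.sqrt(sum(w)))
    stats.sort()
    return stats[min(reps - 1, int(prob * reps))]
```

Evaluating `simulated_quantile(k, tau2)` over a range of `k` gives empirical critical values that could be used in place of 1.96; the values obtained depend on the assumed variances and on the number of replications.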
APPENDIX A: MAXIMUM LIKELIHOOD ESTIMATION
If the log-likelihood function, given in (2), is partially differentiated with respect to $\mu$ and $\tau^2$ and
the derivatives are set to zero, some algebraic arrangement gives

$$\hat{\mu}_m = \frac{\sum_i y_i/(\hat{\sigma}_i^2 + \hat{\tau}_m^2)}{\sum_i 1/(\hat{\sigma}_i^2 + \hat{\tau}_m^2)} \qquad \text{(A1)}$$

$$\hat{\tau}_m^2 = \frac{\sum_i \left[(y_i - \hat{\mu}_m)^2 - \hat{\sigma}_i^2\right]/(\hat{\sigma}_i^2 + \hat{\tau}_m^2)^2}{\sum_i 1/(\hat{\sigma}_i^2 + \hat{\tau}_m^2)^2} \qquad \text{(A2)}$$

The maximum likelihood estimates $\hat{\mu}_{ml}$ and $\hat{\tau}^2_{ml}$ are then

$$(\hat{\mu}_{ml}, \hat{\tau}^2_{ml}) = \begin{cases} (\hat{\mu}_m, \hat{\tau}_m^2) & \text{if } \hat{\tau}_m^2 > 0 \\ (\hat{\mu}, 0) & \text{if } \hat{\tau}_m^2 \le 0 \end{cases}$$

where $\hat{\mu}$ is the fixed effects estimate of $\mu$. Note that since $\hat{\tau}_m^2$ may be less than zero, truncation is
required.
Equations (A1) and (A2) must be solved iteratively. Substituting equation (A1) into (A2) we
obtain

$$\hat{\tau}_m^2 = f(\hat{\tau}_m^2)$$

where $f$ is the resulting function of $\hat{\tau}_m^2$ and the data. This can be solved for $\hat{\tau}_m^2$ using the iteration

$$\hat{\tau}_t^2 = f(\hat{\tau}_{t-1}^2) \qquad \text{(A3)}$$

This simple dynamic system can be iterated until it converges to a fixed point which will be the
desired estimate, and $\hat{\mu}_m$ can then be evaluated by substituting $\hat{\tau}_m^2$ into (A1).
For our simulations the iterations are initialized with $\hat{\tau}_m^2 = \hat{\tau}^2 + 0.01$ and repeated until both
estimates converge to within $10^{-6}$. The small value is added to the DerSimonian and Laird estimate
of $\tau^2$ to prevent iterations starting at zero in cases where $\hat{\tau}^2 = 0$.
The convergence of this iterative procedure is not guaranteed however. Simulations show that in
some cases the iterative procedure does not converge to a single fixed point, but to a limit cycle
of higher order, most commonly two. Whilst convergence to such cyclic behaviour does not often
occur, the possibility implies that the iterative procedure will not necessarily produce estimates
which maximize the likelihood function [14].
An obvious alternative to the iterative routine is to find maximum likelihood estimates by direct
maximization of the likelihood function. The requirement that $\hat{\tau}^2_{ml} > 0$ necessitates maximization
subject to this constraint. This method is used for our simulations where the iterative procedure
has not converged within 1000 iterations. It is implemented using Powell's algorithm [15], which
minimizes the negative log-likelihood function.
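The iteration (A3), together with the truncation at zero, might be sketched in Python as follows. The function name, its arguments and the returned convergence flag are hypothetical. The sketch simply reports non-convergence instead of switching to direct maximization, and it stops early if an iterate would make any total variance non-positive, which is a safeguard added here and not a step described in the paper.

```python
def ml_fixed_point(y, var_within, tau2_start, tol=1e-6, max_iter=1000):
    """Iterate (A3) and return (mu_ml, tau2_ml, converged)."""

    def mu_given(tau2):
        # Equation (A1): weighted mean with weights 1 / (sigma_i^2 + tau^2).
        w = [1.0 / (v + tau2) for v in var_within]
        return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    def f(tau2):
        # Right-hand side of (A2) after substituting (A1) for mu.
        mu = mu_given(tau2)
        w2 = [1.0 / (v + tau2) ** 2 for v in var_within]
        num = sum(wi * ((yi - mu) ** 2 - v)
                  for wi, yi, v in zip(w2, y, var_within))
        return num / sum(w2)

    tau2, mu, converged = tau2_start, mu_given(tau2_start), False
    for _ in range(max_iter):
        tau2_new = f(tau2)
        if min(var_within) + tau2_new <= 0.0:
            # Some total variance would be non-positive: stop without convergence.
            tau2 = tau2_new
            break
        mu_new = mu_given(tau2_new)
        # Require both estimates to have settled to within the tolerance.
        converged = abs(tau2_new - tau2) < tol and abs(mu_new - mu) < tol
        tau2, mu = tau2_new, mu_new
        if converged:
            break

    if tau2 <= 0.0:
        # Truncation: fall back to the fixed effects estimate with tau^2 = 0.
        return mu_given(0.0), 0.0, converged
    return mu, tau2, converged
```

Following the initialization described above, `tau2_start` would be the DerSimonian and Laird estimate plus 0.01. A returned flag of `False` indicates that the iteration did not settle, for instance because it entered a limit cycle, in which case direct maximization would be needed.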
APPENDIX B: PROFILE LIKELIHOOD INTERVALS
The profile likelihood interval is derived using the standard result for the asymptotic distribution
of a likelihood ratio statistic [9]. Applying this result and assuming $H_0: \mu = \mu_0$ is true we have

$$-2\log\left[\frac{L^{*}(\mu_0)}{L^{*}(\hat{\mu}_{ml})}\right] \xrightarrow{d} \chi^2_1 \quad \text{as } k \to \infty$$

provided $L^{*}(\hat{\mu}_{ml})$ is the maximum likelihood value under the general hypothesis. Note
that $L^{*}(\hat{\mu}_{ml}) = L(\hat{\mu}_{ml}, \hat{\tau}^2_{ml})$, the likelihood function evaluated at the maximum likelihood
estimates.
Finding the profile likelihood interval requires finding values of $\mu_0$ which satisfy (4). This can
be achieved by implementing an iterative search for the lower and upper bounds of the interval.
We begin by selecting a value of $\mu_0$ well below the expected lower bound of the confidence
interval. Substituting this into (3) we can iteratively solve for $\hat{\tau}^2(\mu_0)$. A value of $\log(L^{*}(\mu_0))$ is
then obtained by substituting both $\mu_0$ and $\hat{\tau}^2(\mu_0)$ into the log-likelihood equation; $\log(L^{*}(\mu_0))$ is
then compared to the right-hand side of (4). If the value of $\mu_0$ is either too small (and hence
the inequality is false), or too large, so it is within the interval rather than on the boundary, it
is adjusted and $\log(L^{*}(\mu_0))$ recalculated. This is repeated until a confidence bound is obtained to
within $10^{-6}$. A similar procedure is then used to find the upper bound of the confidence interval.
As noted, each evaluation of $\log(L^{*}(\mu_0))$ requires finding $\hat{\tau}^2(\mu_0)$ which satisfies (3). As with
the iterative procedure discussed in Appendix A, iterations of this function will not necessarily
converge to a single value. Simulation suggests that for small $\tau^2$ and small $k$, convergence to a
cycle is more likely to occur. It is possible, however, to find $\hat{\tau}^2(\mu_0)$ by directly maximizing the
likelihood function with respect to $\tau^2$, with $\mu = \mu_0$ fixed. Again this maximization is implemented
using Powell's algorithm [15].
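The search described in this appendix can be sketched in Python. The sketch makes several assumptions that are not taken from the paper: the log-likelihood is taken to be the normal random effects log-likelihood with variances $\hat{\sigma}_i^2 + \tau^2$, the maximization over $\tau^2$ uses a golden-section search in place of Powell's algorithm, the likelihood is assumed unimodal in each argument, the cut-off in (4) is taken to be the maximized log-likelihood minus half the 0.95 quantile of $\chi^2_1$ (about 3.841459), and all function names are hypothetical.

```python
import math


def golden_max(func, a, b, tol=1e-10):
    """Maximize a function assumed unimodal on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:                       # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:                             # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return (a + b) / 2.0


def log_lik(mu, tau2, y, var_within):
    """Assumed normal random effects log-likelihood."""
    return -0.5 * sum(math.log(2.0 * math.pi * (v + tau2)) + (yi - mu) ** 2 / (v + tau2)
                      for yi, v in zip(y, var_within))


def profile_log_lik(mu, y, var_within):
    """log L*(mu): maximize over tau^2 >= 0 with mu held fixed."""
    # The derivative in tau^2 is negative beyond max (y_i - mu)^2, so search below it.
    upper = max((yi - mu) ** 2 for yi in y)
    tau2 = golden_max(lambda t: log_lik(mu, t, y, var_within), 0.0, upper)
    return log_lik(mu, tau2, y, var_within)


def profile_interval(y, var_within, crit=3.841459, tol=1e-6):
    """Return (lower, mu_ml, upper) for the profile likelihood interval."""
    # The ML estimate of mu is a weighted mean, so it lies within the range of y.
    mu_ml = golden_max(lambda m: profile_log_lik(m, y, var_within), min(y), max(y))
    cutoff = profile_log_lik(mu_ml, y, var_within) - crit / 2.0

    def bound(direction):
        step = (max(y) - min(y)) or 1.0
        inside, outside = mu_ml, mu_ml + direction * step
        # Move outwards until the candidate value falls outside the interval.
        while profile_log_lik(outside, y, var_within) >= cutoff:
            step *= 2.0
            outside = mu_ml + direction * step
        # Bisect between a point inside and a point outside the interval.
        while abs(outside - inside) > tol:
            mid = (inside + outside) / 2.0
            if profile_log_lik(mid, y, var_within) >= cutoff:
                inside = mid
            else:
                outside = mid
        return (inside + outside) / 2.0

    return bound(-1.0), mu_ml, bound(+1.0)
```

Bisection replaces the adjust-and-recalculate step described above; it brackets each bound between one value inside and one value outside the interval, and stops when the two agree to within the tolerance.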
APPENDIX C: FIXED EFFECTS COVERAGE PROBABILITY
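If the fixed effects method is used for any $\tau^2 > 0$, the coverage probability is as given in (5). In
such cases we have $Y_i \overset{d}{=} N(\mu, \hat{\sigma}_i^2 + \tau^2)$. This gives

$$\hat{\mu} = \frac{\sum \hat{w}_i Y_i}{\sum \hat{w}_i} \overset{d}{=} N\left(\mu, \frac{\sum \hat{w}_i + \tau^2 \sum \hat{w}_i^2}{(\sum \hat{w}_i)^2}\right)$$

The fixed effects coverage probability is given by $c$, where

$$c = \Pr\left(\hat{\mu} - \frac{1.96}{\sqrt{\sum \hat{w}_i}} < \mu < \hat{\mu} + \frac{1.96}{\sqrt{\sum \hat{w}_i}}\right) = \Pr\left(-\frac{1.96}{\sqrt{\sum \hat{w}_i}} < \hat{\mu} - \mu < \frac{1.96}{\sqrt{\sum \hat{w}_i}}\right)$$

Standardizing $\hat{\mu}$ we have

$$c = \Pr\left(\frac{-1.96\sqrt{\sum \hat{w}_i}}{\sqrt{\sum \hat{w}_i + \tau^2 \sum \hat{w}_i^2}} < Z < \frac{1.96\sqrt{\sum \hat{w}_i}}{\sqrt{\sum \hat{w}_i + \tau^2 \sum \hat{w}_i^2}}\right)$$

where $Z \overset{d}{=} N(0, 1)$. Hence

$$c = \Pr(Z < z) - \Pr(Z < -z) = \Phi(z) - \Phi(-z)$$

where

$$z = \frac{1.96}{\sqrt{1 + \tau^2 \sum \hat{w}_i^2 / \sum \hat{w}_i}}$$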
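The closed form for $c$ is easily evaluated numerically. The following Python sketch uses the standard library's `statistics.NormalDist` for $\Phi$ and takes the fixed effects weights to be the reciprocals of the within-study variances; the function name and arguments are hypothetical.

```python
from statistics import NormalDist


def fixed_effects_coverage(var_within, tau2, z_crit=1.96):
    """Coverage probability c of the fixed effects interval when tau^2 = tau2."""
    w = [1.0 / v for v in var_within]              # fixed effects weights
    ratio = sum(wi ** 2 for wi in w) / sum(w)      # sum(w_i^2) / sum(w_i)
    z = z_crit / (1.0 + tau2 * ratio) ** 0.5
    phi = NormalDist().cdf                         # standard normal CDF
    return phi(z) - phi(-z)
```

With `tau2 = 0` the function returns the nominal level of approximately 0.95, and the coverage decreases as `tau2` increases, in line with the behaviour of the fixed effects method discussed in Section 6.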
ACKNOWLEDGEMENT
The authors thank Ray Watson for valuable discussions and suggestions during the preparation of this paper.
REFERENCES
1. Olkin I. Meta-analysis: methods for combining independent studies. Editor’s introduction. Statistical Science 1992;
7:226.
2. Hunter JE, Schmidt FL, Jackson GB. Meta-analysis: cumulating research findings across studies. Sage: Beverly Hills,
1992.
3. Schmid JE, Koch GG, LaVange LM. An overview of statistical issues and methods of meta-analysis. Journal of
Biopharmaceutical Statistics 1991; 1:103–120.
4. Draper D, Gaver DP, Goel PK, Greenhouse JB, Hedges LV, Morris CN, Tucker JR, Waternaux CM. Combining
Information: Statistical Issues and Opportunities for Research. National Academy Press: Washington, 1992.
5. Cochran WG. Problems arising in the analysis of a series of similar experiments. Journal of the Royal Statistical
Society 1937; 4(Supplement):102–118.
6. Hardy RJ, Thompson SG. Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine 1998;
17:841–856.
7. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7:177–188.
8. Peto R. Aspirin after myocardial infarction. Lancet 1980; 1:1172–1173.
9. Watson R. Elementary Mathematical Statistics. VABA Publishing: Melbourne, 1978.
10. Hardy RJ, Thompson SG. A likelihood approach to meta-analysis with random effects. Statistics in Medicine 1996;
15:619–629.
11. Biggerstaff BJ, Tweedie RL. Incorporating variability in estimates of heterogeneity in the random effects model in
meta-analysis. Statistics in Medicine 1997; 16:753–768.
12. Yuan Zhang Li, Li Shi, Daniel RH. The bias of the commonly-used estimate of variance in meta-analysis.
Communications in Statistics; Part A – Theory and Methods 1994; 23:1063–1085.
13. Gordon IR. A simple method for dealing with between-study variation in meta-analysis. Statistical Consulting Centre,
University of Melbourne, Technical Report 1, 1998.
14. Thompson MT, Stewart HB. Nonlinear Dynamics and Chaos. Wiley: New York, 1986.
15. Press WH, Flannery BP, Teukolsky SA, Vetterling WT. Numerical Recipes in C. Cambridge University Press:
Cambridge, 1988.