
STATISTICS IN MEDICINE

Statist. Med. 2001; 20:825–840

A comparison of statistical methods for meta-analysis

Sarah E. Brockwell and Ian R. Gordon∗,†

Department of Mathematics and Statistics, Richard Berry Building, The University of Melbourne, Victoria 3010, Australia

SUMMARY

Meta-analysis may be used to estimate an overall effect across a number of similar studies. A number of statistical techniques are currently used to combine individual study results. The simplest of these is based on a fixed effects model, which assumes the true effect is the same for all studies. A random effects model, however, allows the true effect to vary across studies, with the mean true effect the parameter of interest. We consider three methods currently used for estimation within the framework of a random effects model, and illustrate them by applying each method to a collection of six studies on the effect of aspirin after myocardial infarction. These methods are compared using estimated coverage probabilities of confidence intervals for the overall effect. The techniques considered all generally have coverages below the nominal level, and in particular it is shown that the commonly used DerSimonian and Laird method does not adequately reflect the error associated with parameter estimation, especially when the number of studies is small. Copyright © 2001 John Wiley & Sons, Ltd.

∗Correspondence to: Ian R. Gordon, Department of Mathematics and Statistics, Richard Berry Building, The University of Melbourne, Victoria 3010, Australia

†E-mail: i.gordon@ms.unimelb.edu.au

Received September 1998

Accepted February 2000

1. INTRODUCTION

Meta-analysis refers to the process of locating, selecting, assessing and combining information relevant to a particular research question. Over the past 20 years the number of published meta-analyses and discussions on meta-analysis methodology has dramatically increased. This has occurred particularly in the areas of medical and epidemiological research [1], but also in the sociological and behavioural sciences [2]. A Medline search found 269 meta-analyses published in 1990. This figure has risen steadily, to 575 in 1997. Owing to this rapid rise in the popularity of meta-analysis, it is becoming increasingly important that the methodology and statistics used are sound.

In general it is instructive to identify the sources of heterogeneity between studies, possibly by modelling the outcome of interest in terms of features of the studies ('meta-regression'). However, in many meta-analyses the number of studies is small and such an approach is not feasible. This paper focuses on the non-Bayesian statistical methods commonly used for meta-analysis, when the goal of the analysis is to estimate an effect from a relatively small number of similar studies. Our results suggest that, as usually applied, these methods have important deficiencies. In particular, confidence intervals obtained from the combined information do not adequately account for the variation introduced both by the data and by the estimation procedure itself.

The statistical methods are generally based on standard fixed or random effects models. These are outlined briefly below, and the random effects model is discussed in more detail in the following two sections.

Consider a collection of $k$ studies, the $i$th of which has estimated effect size $Y_i$ and true effect size $\theta_i$. A general model is then specified by

$$Y_i = \theta_i + e_i \quad \text{where } e_i \sim N(0, \sigma_i^2), \quad i = 1, 2, \ldots, k$$

The $e_i$ indicate random deviations from the true effect size and are assumed independent with mean zero and variance $\sigma_i^2$. This implies that the estimated effect size $Y_i$ is normally distributed with mean $\theta_i$ and variance $\sigma_i^2$. $Y_i$ can be any measure of effect, provided the assumption of normality is (at least approximately) appropriate. Common examples are a log-odds ratio or difference in means.

In general the parameter of interest is the overall effect, denoted by $\mu$. The fixed effects model assumes $\theta_i = \mu$ for $i = 1, 2, \ldots, k$, implying that each study in the meta-analysis has the same underlying effect. Note that even if the $\theta_i$ are assumed to be the same, the $Y_i$ are not identically distributed due to the possibility of differing $\sigma_i^2$. The estimator of $\mu$ is generally a simple weighted average of the $Y_i$, with the optimal weights proportional to $w_i = 1/\mathrm{var}(Y_i)$. In practice the variances are not known so estimated variances $\hat\sigma_i^2$ are used to estimate both $\mu$ and $\mathrm{var}(\hat\mu)$. Any effect of this is generally ignored in practice, but to indicate this estimation we use the notation $\hat\sigma_i^2$ throughout. Hence we define $\hat w_i = 1/\hat\sigma_i^2$ giving

$$\hat\mu = \frac{\sum \hat w_i Y_i}{\sum \hat w_i} = \sum \frac{Y_i}{\hat\sigma_i^2} \Big/ \sum \frac{1}{\hat\sigma_i^2} \quad \text{and} \quad \widehat{\mathrm{var}}(\hat\mu) = 1 \Big/ \sum \frac{1}{\hat\sigma_i^2}$$

In contrast to the fixed effects model, the random effects model does not assume that the $\theta_i$ are equal, but that they are normally distributed. This gives the two-stage model

$$Y_i = \theta_i + e_i \quad \text{where } e_i \sim N(0, \hat\sigma_i^2)$$
$$\theta_i = \mu + \varepsilon_i \quad \text{where } \varepsilon_i \sim N(0, \tau^2) \qquad (1)$$

The error terms $e_i$ and $\varepsilon_i$ are assumed to be independent. In this case, the true effect for study $i$ is centred around the overall effect, allowing individual studies to vary both in estimated effect and true effect. The random effects variance parameter $\tau^2$ is a measure of the heterogeneity between studies. Note that the fixed effects model is a special case of the random effects model, with $\tau^2 = 0$.
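The fixed effects calculation described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `fixed_effects` is hypothetical, the inputs are the rounded log-odds ratios and variances listed in Table I, and the 1.96 multiplier assumes the usual normal 95 per cent interval.

```python
import math

# Log-odds ratios Y_i and estimated variances from Table I (rounded values).
y = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]

def fixed_effects(y, var, z=1.96):
    """Inverse-variance weighted mean, its standard error and a normal CI."""
    w = [1.0 / v for v in var]                      # w_i = 1 / sigma_i^2
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1.0 / sum(w))                    # sqrt of 1 / sum(w_i)
    return mu, se, (mu - z * se, mu + z * se)

mu_fe, se_fe, ci_fe = fixed_effects(y, var)
print(round(mu_fe, 4), round(se_fe, 4), [round(c, 4) for c in ci_fe])
```

Because the inputs are rounded to four decimals, the output should agree with the fixed effects row of Table II only up to rounding.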

It is generally agreed [3, 4] that in the presence of heterogeneity, the random effects model should be used. Heterogeneity is commonly tested using a statistic defined by Cochran [5]: $Q_w = \sum w_i (Y_i - \hat\mu)^2$. The null hypothesis of homogeneity is $H_0: \tau^2 = 0$ against a one-sided alternative. If we assume the variances $\sigma_i^2$ are known, then under $H_0$, $Q_w \sim \chi^2_{k-1}$. In practice, estimates $\hat w_i = 1/\hat\sigma_i^2$ are used giving the statistic

$$Q_{\hat w} = \sum \hat w_i (Y_i - \hat\mu)^2$$

which is approximately $\chi^2_{k-1}$ under $H_0$. A large value of $Q_{\hat w}$ indicates large study-to-study variation, and the hypothesis of homogeneity is rejected. In such cases the random effects model is generally adopted. Hardy and Thompson [6] suggest, however, that the power of this test can be low. They therefore recommend that this should not be the only means by which the fixed effects model is rejected.

Table I. Columns 2–5 show the sample sizes and observed proportions dying for the six studies on the effect of aspirin after myocardial infarction. Columns 6 and 7 give the observed log-odds ratio and estimated variances for each of the studies. For a discussion of the fixed effects and DerSimonian and Laird random effects weights see Section 5.

Study    Treatment              Control                log(OR_i)    Variance             Weights
         n_ti      p̂_ti         n_ci      p̂_ci                      $\hat\sigma_i^2$     FE∗      RE∗
1        615       0.0797       624       0.1074       −0.3289      0.0389               0.11     0.15
2        758       0.0580       771       0.0830       −0.3845      0.0412               0.10     0.14
3        317       0.0852       309       0.1054       −0.2158      0.0753               0.05     0.09
4        832       0.1226       850       0.1482       −0.2196      0.0205               0.20     0.20
5        810       0.1049       406       0.1281       −0.2257      0.0352               0.12     0.15
6        2267      0.1085       2257      0.0970        0.1246      0.0096               0.43     0.26

∗Fixed effects weights $\hat\sigma_i^{-2} / \sum \hat\sigma_i^{-2}$; random effects weights $(\hat\sigma_i^2 + \hat\tau^2)^{-1} / \sum (\hat\sigma_i^2 + \hat\tau^2)^{-1}$.
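As an illustration of the homogeneity test, the sketch below computes $Q_{\hat w}$ and its $\chi^2_{k-1}$ tail probability for the Table I values. The helper names `cochran_q` and `chi2_sf` are hypothetical; the tail probability is obtained from the standard series for the regularized incomplete gamma function so that only the Python standard library is needed.

```python
import math

# Log-odds ratios and estimated variances from Table I (rounded values).
y = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]

def cochran_q(y, var):
    """Q = sum of w_i (Y_i - mu_hat)^2 with w_i = 1 / sigma_i^2."""
    w = [1.0 / v for v in var]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))

def chi2_sf(x, df):
    """Upper tail probability Pr(chi-square_df > x) via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= h / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return 1.0 - lower

q = cochran_q(y, var)
p = chi2_sf(q, len(y) - 1)
print(round(q, 2), round(p, 3))
```

With the rounded inputs the statistic should come out close to the value 9.88 quoted in Section 5, with a P-value of roughly 0.08.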

The random effects model is outlined and discussed in detail in Sections 2 and 3. In particular we consider methods for estimating $\mu$ and $\tau^2$, and for incorporating $\hat\tau^2$ into $\hat\mu_{\hat\tau}$ (the random effects estimator of $\mu$) and the variance of $\hat\mu_{\hat\tau}$. In Section 2 we outline the commonly used DerSimonian and Laird [7] random effects method, and in Section 3, likelihood techniques.

In Section 4, confidence intervals for $\mu$, calculated using several methods, are compared using coverage probabilities. We first compare the fixed effects method and DerSimonian and Laird random effects method, where these are used irrespective of the observed value of $Q_{\hat w}$. These are also compared to a Q-based method which reflects the common practice of using the $\chi^2$-test to determine whether a fixed or random effects approach should be adopted. We also compare the DerSimonian and Laird intervals to two likelihood based intervals.

In Section 5 four methods are applied to a collection of studies on the effect of aspirin after myocardial infarction. This set of studies appears repeatedly in the literature [4, 7, 8], owing to its interesting range of observed measures of effect and sample sizes.

The collection consists of six studies, each examining the effect of aspirin after myocardial infarction. In each study the number of patients who died after having been given either aspirin or a control drug is recorded. Sample sizes for all the studies are quite large, the smallest involving 626 patients. Of the six studies, however, one is particularly large, involving a total of 4524 patients. Table I gives the sample sizes, $n_{ti}$ and $n_{ci}$, and the proportions dying, $\hat p_{ti}$ and $\hat p_{ci}$, for each of the treatment (t) and control (c) groups, where $i$ denotes the study number. As shown, the large sixth study is the only one for which $\hat p_t > \hat p_c$.

Table I also gives the observed log-odds ratios (log(OR)) and corresponding estimated variances, where

$$\mathrm{OR}_i = \frac{\hat p_{ti}(1 - \hat p_{ci})}{(1 - \hat p_{ti})\,\hat p_{ci}} \quad \text{and} \quad \hat\sigma_i^2 = \widehat{\mathrm{var}}[\log(\mathrm{OR}_i)] = \frac{1}{x_{ti}} + \frac{1}{n_{ti} - x_{ti}} + \frac{1}{x_{ci}} + \frac{1}{n_{ci} - x_{ci}}$$

and $x_{ti}$ and $x_{ci}$ denote the observed number of deaths for the treatment and control groups respectively for study $i$. From this table we can see that for the sixth study the log-odds ratio is considerably different from that of the remaining five studies. This table also shows the effect of the large differences in sample sizes, with the estimated variance for the sixth study being considerably smaller than for the other five studies. Figure 1 shows the estimated log-odds ratios and corresponding confidence intervals (using a normal approximation) for each of the six studies.

Figure 1. Estimated log-odds ratios and confidence intervals for the six studies used in the aspirin meta-analysis.

The results of combining these studies, using four different methods, are presented and discussed in Section 5. In Section 6 some conclusions are drawn and alternative methods are briefly discussed.
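The log-odds ratio and variance formulas used for Table I can be checked with a short sketch. The function name `log_odds_ratio` is hypothetical, and the death counts for study 1 are an assumption: Table I lists only sample sizes and proportions, so the counts are recovered here by rounding $n \hat p$.

```python
import math

def log_odds_ratio(x_t, n_t, x_c, n_c):
    """Log-odds ratio (treatment versus control) and its estimated variance,
    from death counts x and group sizes n."""
    log_or = math.log((x_t * (n_c - x_c)) / ((n_t - x_t) * x_c))
    variance = 1 / x_t + 1 / (n_t - x_t) + 1 / x_c + 1 / (n_c - x_c)
    return log_or, variance

# Study 1 of Table I. The counts are NOT given in the table; they are
# recovered by rounding n * p-hat, which is an assumption of this sketch.
x_t = round(615 * 0.0797)
x_c = round(624 * 0.1074)
log_or, variance = log_odds_ratio(x_t, 615, x_c, 624)
print(x_t, x_c, round(log_or, 4), round(variance, 4))
```

Under that assumption the result should match the first row of Table I to the precision shown there.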

2. THE RANDOM EFFECTS MODEL AND ESTIMATION OF $\tau^2$

The random effects model given in (1) can also be written

$$Y_i = \mu + \varepsilon_i + e_i \quad \text{where } e_i \sim N(0, \hat\sigma_i^2) \text{ and } \varepsilon_i \sim N(0, \tau^2)$$

relating the $Y_i$ directly to the overall measure of effect $\mu$. By the independence of $\varepsilon_i$ and $e_i$ we then have $Y_i \sim N(\mu, \hat\sigma_i^2 + \tau^2)$. A weighted average is again used to estimate $\mu$, giving

$$\hat\mu_\tau = \frac{\sum \hat w_i(\tau) Y_i}{\sum \hat w_i(\tau)}$$

with variance

$$\mathrm{var}(\hat\mu_\tau) = \frac{1}{\sum \hat w_i(\tau)}$$

where $\hat w_i(\tau) = [\tau^2 + \hat w_i^{-1}]^{-1}$ and $\hat w_i$ are as defined above. Assuming $\tau^2$ is known, we then have

$$\hat\mu_\tau \sim N\left(\mu, \frac{1}{\sum \hat w_i(\tau)}\right)$$

Note that $\hat w_i(\tau) \le \hat w_i$. This implies $\mathrm{var}(\hat\mu_\tau) \ge \mathrm{var}(\hat\mu)$, and hence random effects model confidence intervals for $\mu$ are generally wider than those constructed from the fixed effects model.

In practice, $\tau^2$ is unknown. The most commonly used estimator of $\tau^2$ is a method of moments based estimator proposed by DerSimonian and Laird [7], derived by equating an estimate of the expected value of $Q_{\hat w}$ with its observed value. Note that

$$E(Q_{\hat w}) = k - 1 + \tau^2 \left( \sum \hat w_i - \frac{\sum \hat w_i^2}{\sum \hat w_i} \right)$$

Suppose that $t$ is obtained by solving

$$q_{\hat w} = k - 1 + t \left( \sum \hat w_i - \frac{\sum \hat w_i^2}{\sum \hat w_i} \right) \quad \text{giving} \quad t = \frac{q_{\hat w} - (k - 1)}{\sum \hat w_i - \sum \hat w_i^2 / \sum \hat w_i}$$

It is possible that this value is negative, which is unacceptable as a value for $\tau^2$, so we define

$$\hat\tau^2 = \begin{cases} t & \text{if } t > 0 \\ 0 & \text{if } t \le 0 \end{cases}$$

Note that due to the truncation, $\hat\tau^2$ is a biased estimator for $\tau^2$.
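The moment estimator and its truncation can be sketched as follows. The function name `dl_tau2` is hypothetical and the inputs are the rounded Table I values, so the result is expected to match the estimate quoted in Section 5 only up to rounding.

```python
# Log-odds ratios and estimated variances from Table I (rounded values).
y = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]

def dl_tau2(y, var):
    """Method-of-moments estimate of tau^2; returns (tau2_hat, t, q)."""
    k = len(y)
    w = [1.0 / v for v in var]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    # Solve q = k - 1 + t * (sum w - sum w^2 / sum w) for t.
    t = (q - (k - 1)) / (sw - sum(wi * wi for wi in w) / sw)
    return max(t, 0.0), t, q        # truncate negative solutions at zero

tau2_hat, t, q = dl_tau2(y, var)
print(round(tau2_hat, 4))
```

When the observed effects are more alike than their variances would suggest, `t` is negative and the estimate is set to zero, which is the source of the bias noted above.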

DerSimonian and Laird proposed that this estimate can then be incorporated into the random effects weights giving

$$\hat w_i(\hat\tau) = (\hat\tau^2 + \hat\sigma_i^2)^{-1}$$

An estimator of $\mu$ is then given by

$$\hat\mu_{\hat\tau} = \frac{\sum \hat w_i(\hat\tau) Y_i}{\sum \hat w_i(\hat\tau)}$$

with variance estimated by

$$\widehat{\mathrm{var}}(\hat\mu_{\hat\tau}) = \frac{1}{\sum \hat w_i(\hat\tau)}$$

Note that this is simply a straight substitution of $\hat\tau^2$ into the variance of $\hat\mu_\tau$, derived assuming $\tau^2$ is known. In obtaining confidence intervals for $\mu$ using $\hat\mu_{\hat\tau}$ and $\widehat{\mathrm{var}}(\hat\mu_{\hat\tau})$, it is common practice to maintain the assumption of normality for $\hat\mu_{\hat\tau}$, despite the use of $\hat\tau^2$ and $\hat\sigma_i^2$ in place of $\tau^2$ and $\sigma_i^2$, respectively.

Two main issues arise from the general application of the random effects model as described above:

(i) The assumption of normality poses problems, first in its validity, and secondly in our ability to check that validity for meta-analyses based on a small number of studies. In particular, the assumption of normally distributed random effects, or between-study errors $\varepsilon_i$, is not easily verified or justified. The issue of validating the assumption of normality is addressed by Hardy and Thompson [6]; however, they consider only relatively large values of $k$.

(ii) The variation between true study effect sizes is taken into account via the inclusion of errors with variance $\tau^2$. It is however only an estimate of this variance which is added into the weights, and the model takes no account of the uncertainty associated with this estimate. In particular the distribution used for $\hat\mu_{\hat\tau}$ is not altered. As shown in Section 4.2, this results in confidence intervals for $\mu$ which are narrower on average than they should be. It is common practice to use a t-distribution to account for the error associated with a variance estimate [9]. This approach is not valid in the random effects meta-analysis context.
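The DerSimonian and Laird interval described in this section amounts to plugging $\hat\tau^2$ into the weights and keeping the normal multiplier. A minimal sketch, assuming the value $\hat\tau^2 = 0.0269$ reported in Section 5 is taken as given and using the rounded Table I inputs (the function name `dl_interval` is hypothetical):

```python
import math

# Log-odds ratios and estimated variances from Table I (rounded values).
y = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]
tau2_hat = 0.0269   # DerSimonian and Laird estimate quoted in Section 5

def dl_interval(y, var, tau2_hat, z=1.96):
    """Random effects estimate, normal-theory CI and normalized weights."""
    w = [1.0 / (v + tau2_hat) for v in var]     # w_i(tau_hat)
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    se = math.sqrt(1.0 / sw)                    # tau2_hat treated as known
    shares = [wi / sw for wi in w]
    return mu, se, (mu - z * se, mu + z * se), shares

mu_dl, se_dl, ci_dl, shares = dl_interval(y, var, tau2_hat)
print(round(mu_dl, 4), [round(c, 4) for c in ci_dl])
print([round(s, 2) for s in shares])
```

The interval uses 1.96 exactly as if $\tau^2$ were known, which is the practice criticized in point (ii).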

3. LIKELIHOOD METHODS

Maximum likelihood theory is widely used for estimation and inference. In this section we review two methods which use established likelihood theory to obtain confidence intervals for $\mu$. These are considered as alternatives to the DerSimonian and Laird method.

3.1. Estimating $\mu$ and $\tau^2$ using maximum likelihood

Recall that the standard random effects model has $Y_i \sim N(\mu, \hat\sigma_i^2 + \tau^2)$, $i = 1, 2, \ldots, k$, and that the $\hat\sigma_i^2$ are treated as known constants. The log-likelihood function is

$$\log L(\mu, \tau^2) = -\frac{1}{2} \sum \log\big(2\pi(\hat\sigma_i^2 + \tau^2)\big) - \frac{1}{2} \sum \frac{(y_i - \mu)^2}{\hat\sigma_i^2 + \tau^2}, \quad \mu \in \mathbb{R},\ \tau^2 \ge 0 \qquad (2)$$

Maximum likelihood estimates $\hat\mu_{ml}$ and $\hat\tau^2_{ml}$ can be found in standard ways (see Appendix A for details).

One major advantage of maximum likelihood estimation lies in the large body of asymptotic theory existing for such estimators. In regular cases a maximum likelihood estimator from a sample of $k$ independent and identically distributed random variables is asymptotically normally distributed as $k \to \infty$. It should be noted that the $k$ variables $Y_i$, $i = 1, 2, \ldots, k$, from a meta-analysis are independent but not identically distributed, since $\mathrm{var}(Y_i) = \hat\sigma_i^2 + \tau^2$. In any realistic meta-analysis with large $k$, the standard assumptions will still apply however.

Using this asymptotic distribution, it is possible to construct a confidence interval for $\mu$. However, this is only an approximate interval, since the asymptotic variance of $\hat\mu_{ml}$ depends on the unknown $\tau^2$. This variance is derived from the covariance matrix of $(\hat\mu_{ml}, \hat\tau^2_{ml})$ and is given by

$$\mathrm{var}(\hat\mu_{ml}) = \frac{1}{\sum (\hat\sigma_i^2 + \tau^2)^{-1}}$$

Under the assumption of asymptotic normality we therefore have

$$\hat\mu_{ml} \sim N\left(\mu, \frac{1}{\sum (\hat\sigma_i^2 + \tau^2)^{-1}}\right) \quad \text{as } k \to \infty$$

This distribution is used for $\hat\mu_{ml}$ even though the likelihood estimate of $\tau^2$ may lie on the boundary of the parameter space, namely, $\tau^2 = 0$.

Note that the variance of $\hat\mu_{ml}$ is of the same form as that for the DerSimonian and Laird random effects method. In this case $\mathrm{var}(\hat\mu_{ml})$ is estimated using $\hat\tau^2_{ml}$ without any modification to the distribution of $\hat\mu_{ml}$; 95 per cent confidence intervals are therefore

$$\hat\mu_{ml} \pm \frac{1.96}{\sqrt{\sum (\hat\sigma_i^2 + \hat\tau^2_{ml})^{-1}}}$$

This method is referred to as the simple likelihood method.
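One way to obtain the maximum likelihood estimates numerically is to alternate between the two score equations of (2). The sketch below does this with a plain fixed-point iteration; it is an assumed scheme chosen for brevity, not the procedure of Appendix A (which is not reproduced here), and the names `log_lik` and `ml_estimates` are hypothetical. The numbers it prints come from the rounded Table I inputs and are illustrative only; they are not claimed to reproduce any published estimate.

```python
import math

# Log-odds ratios and estimated variances from Table I (rounded values).
y = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]

def log_lik(mu, tau2, y, var):
    """Log-likelihood (2), treating the variances as known constants."""
    return sum(-0.5 * math.log(2 * math.pi * (v + tau2))
               - 0.5 * (yi - mu) ** 2 / (v + tau2)
               for yi, v in zip(y, var))

def ml_estimates(y, var, tol=1e-12, max_iter=10000):
    """Alternate the score equations for mu and tau^2 until tau^2 settles."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in var]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi * wi * ((yi - mu) ** 2 - v)
                  for wi, yi, v in zip(w, y, var))
        new_tau2 = max(0.0, num / sum(wi * wi for wi in w))  # keep tau^2 >= 0
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    w = [1.0 / (v + tau2) for v in var]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, tau2

mu_ml, tau2_ml = ml_estimates(y, var)
se_ml = math.sqrt(1.0 / sum(1.0 / (v + tau2_ml) for v in var))
ci_sl = (mu_ml - 1.96 * se_ml, mu_ml + 1.96 * se_ml)   # simple likelihood CI
print(round(mu_ml, 4), round(tau2_ml, 4), [round(c, 4) for c in ci_sl])
```

A fixed-point iteration of this kind is not guaranteed to converge for every data set; a general-purpose optimizer applied to `log_lik` would be a more robust choice in practice.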

3.2. Profile likelihood intervals

An alternative method, which uses a profile likelihood function, is proposed by Hardy and Thompson [10]. Unlike the simple likelihood method this method allows for asymmetric intervals and some imprecision in the estimate of $\tau^2$.

The profile likelihood function for $\mu$ is defined as

$$L^{*}(\mu_0) = L(\mu_0, \hat\tau^2(\mu_0))$$

where $\hat\tau^2(\mu_0)$ satisfies

$$\hat\tau^2(\mu_0) = \sum \frac{(y_i - \mu_0)^2 - \hat\sigma_i^2}{(\hat\sigma_i^2 + \hat\tau^2(\mu_0))^2} \Big/ \sum \frac{1}{(\hat\sigma_i^2 + \hat\tau^2(\mu_0))^2} \qquad (3)$$

Clearly $\hat\tau^2(\mu_0)$ is not assumed fixed for all $\mu_0$.

An approximate 95 per cent confidence interval for $\mu$ is then given by values of $\mu_0$ which satisfy

$$\log L^{*}(\mu_0) > \log L(\hat\mu_{ml}, \hat\tau^2_{ml}) - \tfrac{1}{2} C_{0.95}(\chi^2_1) \qquad (4)$$

where $C_\alpha(\chi^2_1)$ is the $\alpha$-quantile of the $\chi^2_1$ distribution. Details of the derivation and implementation of this method are given in Appendix B.

Biggerstaff and Tweedie [11] use a similar method to find a confidence interval for $\tau^2$, but as the focus of this paper is finding confidence intervals for $\mu$, intervals for $\tau^2$ are not considered in detail here.
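Condition (4) can be turned into an interval numerically. The sketch below is one possible implementation under stated assumptions, not the procedure of Appendix B: for each $\mu_0$ it maximizes the log-likelihood over $\tau^2 \in [0, 10]$ by golden-section search (assuming the function is unimodal there) instead of solving (3) directly, and it then locates the two crossings of the threshold by bisection within 2 units of the maximizer. All function names are hypothetical and the Table I inputs are rounded, so the output is illustrative only.

```python
import math

# Log-odds ratios and estimated variances from Table I (rounded values).
y = [-0.3289, -0.3845, -0.2158, -0.2196, -0.2257, 0.1246]
var = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]
CHI2_1_95 = 3.841458820694124   # 0.95 quantile of chi-square with 1 df

def log_lik(mu, tau2):
    """Log-likelihood (2) for the data above."""
    return sum(-0.5 * math.log(2 * math.pi * (v + tau2))
               - 0.5 * (yi - mu) ** 2 / (v + tau2)
               for yi, v in zip(y, var))

def golden_max(f, a, b, iters=100):
    """Golden-section search for the maximizer of a unimodal f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (c, fc) if fc > fd else (d, fd)

def profile(mu0):
    """Profile log-likelihood: maximize over tau^2 for fixed mu0."""
    return golden_max(lambda t2: log_lik(mu0, t2), 0.0, 10.0)[1]

mu_hat, max_ll = golden_max(profile, min(y), max(y))
threshold = max_ll - 0.5 * CHI2_1_95

def crossing(inside, outside, iters=60):
    """Bisection for the point where the profile meets the threshold."""
    for _ in range(iters):
        mid = 0.5 * (inside + outside)
        if profile(mid) > threshold:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)

lower = crossing(mu_hat, mu_hat - 2.0)
upper = crossing(mu_hat, mu_hat + 2.0)
print(round(mu_hat, 4), (round(lower, 4), round(upper, 4)))
```

Because the two end points are found separately, the resulting interval need not be symmetric about the point estimate, which is the feature that distinguishes it from the simple likelihood interval.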

4. COMPARISON OF METHODS

4.1. Coverage probabilities and simulation methods

The coverage probability of a random interval $(A, B)$ for $\mu$ is defined as $\Pr(\mu \in (A, B))$ which, for a nominal 95 per cent confidence interval, should be close to 0.95. The exact coverage can only be found if the distribution of the interval is known. If, however, as is more common, the distribution is unknown, the coverage probability must be estimated using simulation. This is done by simulating a large number of meta-analyses and for each meta-analysis calculating the appropriate confidence interval. The estimated coverage probability is then the proportion of these intervals which contain $\mu$.

The coverage probability is usually dependent on the parameters of the model and so the coverages presented below are estimated for a range of values of $\tau^2$ and $k$. For all the intervals considered here, the coverage probability is not dependent on the value of $\mu$ since the procedure is invariant with respect to a location shift. The data for each meta-analysis are simulated using the random effects model described in Section 1, assuming normal errors $e_i$ and $\varepsilon_i$, with zero mean and variances $\hat\sigma_i^2$ and $\tau^2$, respectively. For all simulations we use $\mu = 0.5$.

The coverage probability is then estimated by simulating 25000 meta-analyses of $k$ studies, with $\mu = 0.5$ and $\tau^2$ as specified. A 95 per cent confidence interval is then calculated for each of these meta-analyses and the coverage is estimated as the proportion, out of 25000, which contain the parameter value, $\mu = 0.5$. Since the true coverages are generally greater than 0.9, the standard errors of the estimated coverage probabilities are essentially $\le 0.002$.

To give the simulations authenticity, parameter values are chosen to correspond to a typical scenario for estimating a common log-odds ratio (that is, $\exp(\mu)$ is the parameter of interest). Accordingly, the $Y_i$ may be considered sample log-odds ratios, and the $\hat\sigma_i^2$ their estimated variances. These variances are realizations from a $\chi^2_1$ distribution, multiplied by 0.25 and then restricted to lie within the interval (0.009, 0.6). Values produced in this way are consistent with the typical distribution of $\hat\sigma_i^2$ for log-odds ratios in practice. The $\hat\sigma_i^2$ are varied for each of the 25000 simulations in order to allow for the sampling error in these values, which although present in practice, is not accounted for in the methods considered.

The simulation procedure described above is implemented for a given $k$ and $\tau^2$. The procedure is repeated for each of 11 values of $\tau^2$ between 0.0 and 0.1, and values of $k$ between 3 and 35. Figure 2 shows two simulated meta-analyses with $\tau^2 = 0.03$ and $\tau^2 = 0.07$; in each case $\mu = 0.5$ and $k = 10$. Such graphs were in part used to determine suitable values for $\tau^2$ and the $\hat\sigma_i^2$.

Figure 2. Simulated meta-analyses with $\mu = 0.5$ and $k = 10$. In graph (a) $\tau^2 = 0.03$, a small amount of between-study variation. In graph (b) $\tau^2 = 0.07$; the larger amount of between-study variation manifests itself in a larger spread of estimates. The widths of the confidence intervals for $\theta_i$ indicate the range of values of $\hat\sigma_i^2$ used.

The simulations are implemented using a program written in C++, with each simulation generating $k$ observations from normal distributions with mean $\mu$ and variance $\hat\sigma_i^2 + \tau^2$. The data are then used to calculate the fixed effects estimate of $\mu$, the observed value of $Q_{\hat w}$, the DerSimonian and Laird estimate of $\tau^2$ and the corresponding confidence intervals for $\mu$. Since in practice the outcome of the $\chi^2$-test using $Q_{\hat w}$ is often used to determine which method shall be used, a combined fixed effects/random effects method is considered here. This method reflects the $\chi^2$-test procedure, selecting the fixed or random effects methods as determined by comparing the observed value of $Q_{\hat w}$ with the 0.95 quantile of the $\chi^2_{k-1}$ distribution. This is referred to as the Q-based method.

Figure 3. Estimated coverage probabilities for the fixed effects (FE), DerSimonian and Laird random effects (DL) and Q-based (Q) methods, for varying $k$ and $\tau^2$. In graph (a) $k = 10$ and in graph (b) $k = 20$. In graph (c) $\tau^2 = 0.03$ and in graph (d) $\tau^2 = 0.07$.

Estimated coverage probabilities for the fixed effects method, the Q-based method, the DerSimonian and Laird random effects method, the simple likelihood method and the profile likelihood method are presented in the following section.
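The simulation design described in this section can be sketched on a small scale in Python. This is an illustration only, not the authors' C++ program: it uses 4000 replications instead of 25000, covers only the fixed effects, DerSimonian and Laird and Q-based methods for $k = 10$, and assumes that "restricted to lie within the interval" means redrawing values that fall outside it. The function names and the seed are hypothetical choices.

```python
import math
import random

CHI2_95_DF9 = 16.919   # 0.95 quantile of chi-square with 9 df (k = 10 only)

def draw_variance(rng):
    """0.25 * chi-square(1) draw, redrawn until it lies in (0.009, 0.6).
    Redrawing is an assumption about how the restriction was applied."""
    while True:
        v = 0.25 * rng.gauss(0.0, 1.0) ** 2
        if 0.009 < v < 0.6:
            return v

def covers(est, se, mu):
    """True if the nominal 95 per cent interval est +/- 1.96 se contains mu."""
    return abs(est - mu) <= 1.96 * se

def coverage(tau2, k=10, mu=0.5, n_sim=4000, seed=1, q_crit=CHI2_95_DF9):
    """Estimated coverage of the FE, DL and Q-based intervals."""
    rng = random.Random(seed)
    hits = {"FE": 0, "DL": 0, "Q": 0}
    for _ in range(n_sim):
        var = [draw_variance(rng) for _ in range(k)]
        y = [rng.gauss(mu, math.sqrt(v + tau2)) for v in var]
        w = [1.0 / v for v in var]
        sw = sum(w)
        mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
        q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
        t = (q - (k - 1)) / (sw - sum(wi * wi for wi in w) / sw)
        tau2_hat = max(t, 0.0)
        wr = [1.0 / (v + tau2_hat) for v in var]
        swr = sum(wr)
        mu_dl = sum(wi * yi for wi, yi in zip(wr, y)) / swr
        fe = covers(mu_fe, math.sqrt(1.0 / sw), mu)
        dl = covers(mu_dl, math.sqrt(1.0 / swr), mu)
        hits["FE"] += fe
        hits["DL"] += dl
        hits["Q"] += dl if q > q_crit else fe   # test-then-choose procedure
    return {m: h / n_sim for m, h in hits.items()}

print(coverage(0.0))
print(coverage(0.07))
```

The default `q_crit` is valid only for $k = 10$; other values of $k$ need the corresponding $\chi^2_{k-1}$ quantile. With so few replications the estimates are noticeably noisier than those reported in the paper.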

4.2. Simulation results

The results of the simulations are presented in two figures showing the estimated coverage probabilities for different values of $k$ and $\tau^2$. Figure 3 shows the estimated coverage probabilities for the fixed effects method (FE). Here the fixed effects method is used, despite the data being simulated with $\tau^2 > 0$ in most cases. For $\tau^2 = 0$ the estimated coverage probability is close to 0.95 as expected; however, increasing $\tau^2$ substantially reduces the coverage. This reflects the theoretical coverage probability result for the fixed effects model specified by

$$\text{coverage probability} = \Phi(z) - \Phi(-z) \quad \text{where } z = \frac{1.96}{\sqrt{1 + \tau^2 \sum \hat w_i^2 / \sum \hat w_i}} \qquad (5)$$

and $\Phi$ denotes the standard normal cumulative distribution function. A brief derivation of this result is given in Appendix C. Clearly this coverage probability decreases as $\tau^2$ increases.
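The form of (5) can be recovered in a few lines. The following is a sketch of the argument written for this section (the paper's own derivation is in Appendix C and is not reproduced here); it treats the $\hat\sigma_i^2$, and hence the $\hat w_i = 1/\hat\sigma_i^2$, as known constants.

```latex
% Under the random effects model, Y_i ~ N(mu, 1/w_i + tau^2) independently,
% so the fixed effects estimator is normal with mean mu and variance
\mathrm{var}(\hat\mu)
  = \frac{\sum \hat w_i^2 \,(\hat w_i^{-1} + \tau^2)}{\left(\sum \hat w_i\right)^2}
  = \frac{1}{\sum \hat w_i}\left(1 + \tau^2\,\frac{\sum \hat w_i^2}{\sum \hat w_i}\right).

% The fixed effects interval is \hat\mu \pm 1.96/\sqrt{\sum \hat w_i}, hence
\Pr\!\left(|\hat\mu - \mu| \le \frac{1.96}{\sqrt{\sum \hat w_i}}\right)
  = \Pr\!\left(|Z| \le \frac{1.96}{\sqrt{1 + \tau^2 \sum \hat w_i^2 / \sum \hat w_i}}\right)
  = \Phi(z) - \Phi(-z), \qquad Z \sim N(0, 1).
```

When $\tau^2 = 0$ the denominator is 1 and the coverage is the nominal 0.95; any positive $\tau^2$ shrinks $z$ below 1.96.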

In practice it is more common for the fixed effects method to be used only after the $\chi^2$-test, based on $Q_{\hat w}$, suggests that the hypothesis $H_0: \tau^2 = 0$ should not be rejected. However if this test is used to select an appropriate method, the fixed effects method is selected a considerable proportion of the time, even when the true value of $\tau^2$ is large. For example, with $k = 10$ and $\tau^2 = 0.06$, the fixed effects method is selected an estimated 58 per cent of the time. This percentage decreases as $k$ increases, with the fixed method selected in 26 per cent of the cases with $k = 25$ and $\tau^2 = 0.06$.

Coverage probabilities for the Q-based method are also given in Figure 3. Not surprisingly, these coverage probabilities lie between the fixed effects and DerSimonian and Laird random effects coverages (denoted DL in Figure 3). These latter coverage probabilities are for confidence intervals for $\mu$ constructed using the DerSimonian and Laird method in all cases, irrespective of the value of $Q_{\hat w}$. As indicated in Figure 3, any use of the fixed effects method substantially reduces the coverage probability.

Figure 4 shows estimated coverage probabilities for the two random effects, likelihood based methods. Coverage probabilities for the DerSimonian and Laird method are also included for comparison. Note that the scale differs from that of Figure 3. These plots indicate that, except when $\tau^2$ is close to zero, the coverage probabilities for all three methods are below 0.95. For all values of $k$ and $\tau^2$ the coverage probabilities for the simple likelihood method are below those of the other two methods. Similarly the profile likelihood method consistently has the highest estimated coverage probability. For the two cases where data are simulated with $\tau^2 = 0$, all three methods have coverage probabilities above 0.95, some significantly so. For these simulations we obtain an estimated value $\hat\tau^2 > 0$, resulting in wider confidence intervals and hence coverage probabilities higher than the nominal value.

Figure 4. Coverages for the DerSimonian and Laird random effects (DL), profile likelihood (PL) and simple likelihood (SL) methods, for varying $k$ and $\tau^2$. Note that the scale differs from that in Figure 3. In graph (a) $k = 10$ and in graph (b) $k = 20$. In graph (c) $\tau^2 = 0.03$ and in graph (d) $\tau^2 = 0.07$.

The DerSimonian and Laird method for establishing confidence intervals for $\mu$ is the most commonly used technique, yet estimated values indicate that even for a large number of studies the confidence intervals obtained have a coverage probability below 0.95. This suggests that the use of $\hat\tau^2$ in the estimation of $\mu$ and the standard error of $\hat\mu_{\hat\tau}$, combined with the use of a normal approximation for $\hat\mu_{\hat\tau}$, produces intervals which are on average too narrow. This problem also arises when using the simple likelihood method. The standard error of $\hat\mu_{ml}$ is estimated using $\hat\tau^2_{ml}$, without modification to the assumed distribution for $\hat\mu_{ml}$. In both cases, coverage probabilities estimated using the true $\tau^2$ are close to the nominal value.

5. AN EXAMPLE: ASPIRIN AFTER MYOCARDIAL INFARCTION

In this section, the four methods compared in Section 4 are applied to six studies on the effect of aspirin after myocardial infarction reviewed in Section 1. Recall that one of the six studies is considerably larger than the others and is the only study with an odds ratio greater than one.

The homogeneity statistic for these data is $q_{\hat w} = 9.88$ on 5 degrees of freedom, giving $P = 0.08$ for the hypotheses $H_0: \tau^2 = 0$ versus $H_1: \tau^2 > 0$. Whilst the null hypothesis is not rejected at the 0.05 level, the test does bring into question the validity of the fixed effects model. This is brought about solely by the large sixth study, since the homogeneity statistic for the first five studies is $q_{\hat w} = 0.63$ on 4 degrees of freedom, giving $P > 0.9$. Adopting the random effects model for all six studies, we obtain $\hat\tau^2 = 0.0269$, using the DerSimonian and Laird estimator.

The fixed effects method and three random effects methods have all been used to combine these data. Table II gives estimated values of $\mu$, the overall effect of aspirin versus the control, and $\tau^2$; 95 per cent confidence intervals for $\mu$ are also presented. The three random effects estimates of $\mu$ are all considerably smaller than that obtained for the fixed effects model. This is due to the reduction in the weight ascribed to the large sixth study, which has a positive log-odds ratio.

Table II. Estimated overall log-odds ratios and corresponding confidence intervals for four different meta-analyses of the effect of aspirin after myocardial infarction.

Method                   Estimate of $\mu$    95 per cent CI        Estimate of $\tau^2$
Fixed effects            −0.1015              (−0.2269, 0.0238)     $\tau^2 = 0$ (assumed)
DerSimonian and Laird    −0.1689              (−0.3609, 0.0231)     $\hat\tau^2 = 0.0269$
Profile likelihood       −0.1175              (−0.3696, 0.0352)     $\hat\tau^2_{ml} = 0.0390$
Simple likelihood        −0.1175              (−0.3902, 0.0352)     $\hat\tau^2_{ml} = 0.0390$

All of the confidence intervals obtained include zero, and the upper bounds for each are relatively similar. The fixed effects confidence interval is considerably narrower than those of the three random effects based methods. A rough estimate of the fixed effects coverage probability for these data can be obtained from the theoretical coverage given in Section 4.2. Using $\hat\tau^2 = 0.0269$ and the estimated values of $\sigma_i^2$ given in Table I we obtain

$$z = \frac{1.96}{\sqrt{1 + 0.0269 \sum (\hat\sigma_i^{-2})^2 / \sum \hat\sigma_i^{-2}}} = 1.193$$

and estimated coverage probability

$$\Phi(z) - \Phi(-z) = 0.77$$

well below the nominal level of 0.95.
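The arithmetic behind this rough coverage estimate can be reproduced directly from the theoretical coverage formula of Section 4.2. A minimal sketch using the rounded Table I variances (the function names `phi` and `fe_coverage` are hypothetical):

```python
import math

# Estimated variances from Table I (rounded values) and the DL estimate.
var = [0.0389, 0.0412, 0.0753, 0.0205, 0.0352, 0.0096]
tau2 = 0.0269

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fe_coverage(var, tau2, crit=1.96):
    """Theoretical coverage of the fixed effects interval when the true
    between-study variance is tau2."""
    w = [1.0 / v for v in var]
    z = crit / math.sqrt(1.0 + tau2 * sum(wi * wi for wi in w) / sum(w))
    return z, phi(z) - phi(-z)

z, cov = fe_coverage(var, tau2)
print(round(z, 3), round(cov, 2))
```

Calling `fe_coverage(var, 0.0)` returns the nominal coverage, and larger values of `tau2` give progressively smaller coverage.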

The markedly lower bounds for the three random effects based methods are due to the allowance of a non-zero value for $\tau^2$ in the model and the corresponding change in weights. Table I shows the weights allocated to each study, using both the fixed and random effects models. As can be seen in Table I, the fixed effects model assigns a large weight to studies with small variance, and small weights to those with large variance, generally corresponding to smaller sample sizes. In the random effects model, however, the weights tend to be relatively similar, and the sixth study now counts for only 26 per cent of the total weight, as opposed to 43 per cent in the fixed effects model. In general, in the random effects approach, large studies are downweighted relative to the fixed effects model, and smaller studies given a greater weighting in estimation.

These tendencies produce interesting results for the aspirin meta-analysis. It is generally expected that making adjustments to the weights does not substantially affect the estimate of $\mu$, giving $\hat\mu \approx \hat\mu_{\hat\tau}$. Similarly, since $\hat\sigma_i^{-2} = \hat w_i > \hat w_i(\hat\tau) = (\hat\sigma_i^2 + \hat\tau^2)^{-1}$ we have $\mathrm{SE}(\hat\mu) < \mathrm{SE}(\hat\mu_{\hat\tau})$, where $\mathrm{SE}(\hat\mu)$ is the standard error of $\hat\mu$. The expected result is therefore that

$$|z_{FE}| = \left| \frac{\hat\mu}{\mathrm{SE}(\hat\mu)} \right| > \left| \frac{\hat\mu_{\hat\tau}}{\mathrm{SE}(\hat\mu_{\hat\tau})} \right| = |z_{RE}|$$

For the aspirin meta-analysis, however, we observe the reverse effect. Although $\mathrm{SE}(\hat\mu) = 0.0640 < 0.0980 = \mathrm{SE}(\hat\mu_{\hat\tau})$, this difference is negated by the change in the estimates of $\mu$: $\hat\mu = -0.1015$ and $\hat\mu_{\hat\tau} = -0.1689$. We then have $|z_{FE}| = 1.587 < 1.724 = |z_{RE}|$. Again this effect is a result of the change in weights allocated to the sixth study.

This outcome is interesting, since to some extent it suggests a problem with the way in which the random effects model attempts to increase the uncertainty associated with model estimates. By adding extra variation into the model, we expect to make it more difficult to reject a null hypothesis such as $H_0: \mu = 0$, and on average this is true. As shown by the aspirin meta-analysis, however, it can occur that the random effects model produces a larger z-statistic, and hence a smaller P-value.
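The reversal of the z-statistics can be checked from the reported estimates and standard errors. In the sketch below the function name `z_and_p` is hypothetical, and the two-sided normal P-value is an assumption made for illustration, since the text does not state which P-value is meant. Because the inputs are the rounded published figures, the z-values differ slightly in the third decimal from those quoted above.

```python
import math

def z_and_p(estimate, se):
    """z-statistic for H0: mu = 0 and a two-sided normal P-value
    (the two-sided choice is an assumption of this sketch)."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

z_fe, p_fe = z_and_p(-0.1015, 0.0640)   # fixed effects
z_re, p_re = z_and_p(-0.1689, 0.0980)   # DerSimonian and Laird
print(round(abs(z_fe), 3), round(p_fe, 3))
print(round(abs(z_re), 3), round(p_re, 3))
```

Although the random effects standard error is about 50 per cent larger, the estimate moves further from zero by a larger factor, so the random effects analysis gives the smaller P-value here.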

6. COMMENTS

As demonstrated by the coverage probabilities presented in Section 4, the ?xed e?ects method

does not perform well unless there is very little between-study variation. In practice this would

rarely be the case, and should not be assumed to be so. It has also been demonstrated that use

of the ?2-test to determine which model is appropriate leads to con?dence intervals which are

on average too narrow. For small k in particular, the ?xed e?ects method is frequently selected,

even when ?2is large. As shown, this substantially reduces the coverage probability of con?dence

intervals for ?. It is therefore recommended that the random e?ects model be adopted irrespective

of the outcome of the ?2-test for heterogeneity. This then simpli?es to the ?xed e?ects method

only when ˆ ?2=0.

The random effects methods generally perform better than the fixed effects methods, with respect to coverage probabilities. However, particularly when the number of studies is modest (fewer than 20), the commonly used DerSimonian and Laird method has coverage probability considerably below 0.95. This suggests that the error associated with the estimation of $\tau^2$ is not adequately being accounted for either through modifications to $\widehat{\mathrm{var}}(\hat{\mu}_{\hat{\tau}})$ or the distribution used for $\hat{\mu}_{\hat{\tau}}$. As with the DerSimonian and Laird method, the greatest source of error in simple likelihood confidence intervals comes from estimating $\mathrm{var}(\hat{\mu}_{ml})$ by substituting $\hat{\tau}^2_{ml}$ in for $\tau^2$, and using a normal approximation, despite this estimation. Although it lacks the simplicity of the other three techniques, the profile likelihood method produced the highest coverage probabilities in all cases. In particular, coverage probabilities for small $k$ were considerably closer to 0.95 than for the other two random effects methods.

For both the fixed and random effects methods, inference is carried out ignoring the sampling errors in the individual study variances. Estimated values $\hat{\sigma}_i^2$ are used without modification to the form of $\hat{\mu}$, its variance or distribution. It has been shown, however, that the fixed effects variance estimate, $\widehat{\mathrm{var}}(\hat{\mu})$, has a negative bias [12]. The estimation of $\sigma_i^2$ might also be expected to influence random effects coverage probabilities. Hardy and Thompson [10] suggest, however, that in the case of maximum likelihood procedures, allowing for the estimation of $\sigma_i^2$ does not greatly affect results unless all studies in the meta-analysis are small.

Non-Bayesian alternatives to the standard random effects approaches considered here have recently been proposed in an attempt to incorporate more adequately the estimation of $\tau^2$ into the random effects model. The first, outlined by Biggerstaff and Tweedie [11], utilizes an approximate distribution for $\hat{\tau}^2$ to modify the estimation of random effects weights. These new weights are then used in the estimation of $\mu$ and its variance. However, the variance of $\hat{\mu}_{\hat{\tau}}$ is still derived assuming the weights are known, and the assumption of normality for $\hat{\mu}_{\hat{\tau}}$ is maintained. A second alternative involves using an overdispersed generalized linear model to estimate the overall effect $\mu$, with the heterogeneity between studies being reflected by the overdispersion parameter [13].

A third approach, currently being developed, uses simulations to model the 0.975 quantile of $(\hat{\mu}_{\hat{\tau}} - \mu)/\sqrt{\mathrm{var}(\hat{\mu}_{\hat{\tau}})}$ as a function of $k$. The appropriate value, which will be greater than 1.96, is then used in confidence intervals for $\mu$. Clearly all three methods need to be examined further and incorporated into a comparative study of meta-analysis techniques. They do, however, provide possible improvements to the deficiencies highlighted in currently used techniques.
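The paper gives no implementation details for the simulation-based third approach, so the following is only one possible reading of it: a Monte Carlo sketch that estimates the 0.975 quantile of the standardized DerSimonian and Laird estimate for several values of $k$. Every setting here is an assumption made for illustration (equal within-study variances of 0.1, $\tau^2 = 0.1$, 10000 replicates, a fixed seed, a simple order-statistic quantile), as are the function names.

```python
import math
import random

def dl_pivot(y, s2, mu):
    """Return (mu_hat - mu) / SE for the DerSimonian-Laird estimate."""
    k = len(y)
    w = [1.0 / v for v in s2]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    tau2 = max(0.0, (q - (k - 1)) / (sw - sum(wi ** 2 for wi in w) / sw))
    w_re = [1.0 / (v + tau2) for v in s2]
    sw_re = sum(w_re)
    mu_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    # The estimated variance of mu_re is 1 / sw_re
    return (mu_re - mu) * math.sqrt(sw_re)

def simulated_quantile(k, tau2=0.1, sigma2=0.1, mu=0.0,
                       reps=10000, p=0.975, seed=1):
    """Empirical p-quantile of the pivot for k studies (assumed settings)."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2 + tau2)      # marginal sd of each study estimate
    s2 = [sigma2] * k                  # equal, known within-study variances
    pivots = sorted(
        dl_pivot([rng.gauss(mu, sd) for _ in range(k)], s2, mu)
        for _ in range(reps)
    )
    return pivots[int(p * reps) - 1]   # simple order-statistic quantile

quantiles = {k: simulated_quantile(k) for k in (5, 10, 20, 40)}
for k, value in quantiles.items():
    print(k, round(value, 3))
```

A value taken from such a table would replace 1.96 in the usual interval $\hat{\mu}_{\hat{\tau}} \pm 1.96\,\mathrm{SE}(\hat{\mu}_{\hat{\tau}})$. In practice the quantile would also depend on $\tau^2$ and on the pattern of within-study variances, which this sketch holds fixed.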

APPENDIX A: MAXIMUM LIKELIHOOD ESTIMATION

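If the log-likelihood function, given in (2), is partially differentiated with respect to $\mu$ and $\tau^2$ and the derivatives are set to zero, some algebraic arrangement gives

\[
\hat{\mu}_m = \frac{\sum_i y_i / (\hat{\sigma}_i^2 + \hat{\tau}_m^2)}{\sum_i 1 / (\hat{\sigma}_i^2 + \hat{\tau}_m^2)} \tag{A1}
\]

\[
\hat{\tau}_m^2 = \frac{\sum_i \{ (y_i - \hat{\mu}_m)^2 - \hat{\sigma}_i^2 \} / (\hat{\sigma}_i^2 + \hat{\tau}_m^2)^2}{\sum_i 1 / (\hat{\sigma}_i^2 + \hat{\tau}_m^2)^2} \tag{A2}
\]

The maximum likelihood estimates $\hat{\mu}_{ml}$ and $\hat{\tau}^2_{ml}$ are then

\[
(\hat{\mu}_{ml}, \hat{\tau}^2_{ml}) =
\begin{cases}
(\hat{\mu}_m, \hat{\tau}_m^2) & \text{if } \hat{\tau}_m^2 > 0 \\
(\hat{\mu}, 0) & \text{if } \hat{\tau}_m^2 \le 0
\end{cases}
\]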


where $\hat{\mu}$ is the fixed effects estimate of $\mu$. Note that since $\hat{\tau}_m^2$ may be less than zero, truncation is required.

Equations (A1) and (A2) must be solved iteratively. Substituting equation (A1) into (A2) we obtain

\[
\hat{\tau}_m^2 = f(\hat{\tau}_m^2)
\]

where $f$ is the resulting function of $\hat{\tau}_m^2$ and the data. This can be solved for $\hat{\tau}_m^2$ using the iteration

\[
\hat{\tau}_t^2 = f(\hat{\tau}_{t-1}^2) \tag{A3}
\]

This simple dynamic system can be iterated until it converges to a fixed point which will be the desired estimate, and $\hat{\mu}_m$ can then be evaluated by substituting $\hat{\tau}_m^2$ into (A1). For our simulations the iterations are initialized with $\hat{\tau}_m^2 = \hat{\tau}^2 + 0.01$ and repeated until both estimates converge to within $10^{-6}$. The small value is added to the DerSimonian and Laird estimate of $\tau^2$ to prevent iterations starting at zero in cases where $\hat{\tau}^2 = 0$.

The convergence of this iterative procedure is not guaranteed, however. Simulations show that in some cases the iterative procedure does not converge to a single fixed point, but to a limit cycle of higher order, most commonly two. Whilst convergence to such cyclic behaviour does not often occur, the possibility implies that the iterative procedure will not necessarily produce estimates which maximize the likelihood function [14].

An obvious alternative to the iterative routine is to find maximum likelihood estimates by direct maximization of the likelihood function. The requirement that $\hat{\tau}^2_{ml} \ge 0$ necessitates maximization subject to this constraint. This method is used for our simulations where the iterative procedure has not converged within 1000 iterations. It is implemented using Powell's algorithm [15], which minimizes the negative log-likelihood function.
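The iteration (A3) can be sketched as follows, assuming the forms of (A1) and (A2) given in this appendix. The starting value (the DerSimonian and Laird estimate plus 0.01), the tolerance of $10^{-6}$ and the limit of 1000 iterations follow the description above. The function name, the returned `converged` flag, the decision to stop when a weight $1/(\hat{\sigma}_i^2 + \hat{\tau}^2)$ becomes undefined, and the data are assumptions made for illustration. The fallback to direct maximization is not implemented here.

```python
def ml_fixed_point(y, s2, tol=1e-6, max_iter=1000):
    """Iterate (A3); return (mu_ml, tau2_ml, converged)."""
    k = len(y)
    w = [1.0 / v for v in s2]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    tau2_dl = max(0.0, (q - (k - 1)) / (sw - sum(wi ** 2 for wi in w) / sw))

    def mu_of(tau2):
        # Equation (A1): weighted mean for a given tau^2
        wt = [1.0 / (v + tau2) for v in s2]
        return sum(a * b for a, b in zip(wt, y)) / sum(wt)

    def f(tau2):
        # Equation (A2) with (A1) substituted for mu
        m = mu_of(tau2)
        wt2 = [1.0 / (v + tau2) ** 2 for v in s2]
        num = sum(a * ((yi - m) ** 2 - v) for a, yi, v in zip(wt2, y, s2))
        return num / sum(wt2)

    tau2 = tau2_dl + 0.01              # start just above the DL estimate
    mu = mu_of(tau2)
    converged = False
    for _ in range(max_iter):
        new_tau2 = f(tau2)
        if min(s2) + new_tau2 <= 0.0:  # weights undefined: stop (assumption)
            break
        new_mu = mu_of(new_tau2)
        converged = abs(new_tau2 - tau2) < tol and abs(new_mu - mu) < tol
        tau2, mu = new_tau2, new_mu
        if converged:
            break
    if tau2 <= 0.0:                    # truncation: use the fixed effects estimate
        return mu_fe, 0.0, converged
    return mu, tau2, converged

# Hypothetical heterogeneous studies
y = [0.10, 0.50, -0.30, 0.80, 0.20]
s2 = [0.04, 0.09, 0.05, 0.10, 0.06]
mu_ml, tau2_ml, ok = ml_fixed_point(y, s2)
print(mu_ml, tau2_ml, ok)

# Hypothetical near-homogeneous studies: the solution is negative and truncated
mu0, tau20, ok0 = ml_fixed_point([0.20, 0.25, 0.15], [0.05, 0.05, 0.05])
print(mu0, tau20, ok0)
```

A caller would check the `converged` flag and switch to direct maximization when it is false, which covers both the limit cycles described above and the early stop.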

APPENDIX B: PROFILE LIKELIHOOD INTERVALS

The profile likelihood interval is derived using the standard result for the asymptotic distribution of a likelihood ratio statistic [9]. Applying this result and assuming $H_0\colon \mu = \mu_0$ is true we have

\[
-2 \log \left( \frac{L^{*}(\mu_0)}{L^{*}(\hat{\mu}_{ml})} \right) \xrightarrow{d} \chi^2_1 \quad \text{as } k \to \infty
\]

provided $L^{*}(\hat{\mu}_{ml})$ is the maximum likelihood value under the general hypothesis. Note that $L^{*}(\hat{\mu}_{ml}) = L(\hat{\mu}_{ml}, \hat{\tau}^2_{ml})$, the likelihood function evaluated at the maximum likelihood estimates.

Finding the profile likelihood interval requires finding values of $\mu_0$ which satisfy (4). This can be achieved by implementing an iterative search for the lower and upper bounds of the interval. We begin by selecting a value of $\mu_0$ well below the expected lower bound of the confidence interval. Substituting this into (3) we can iteratively solve for $\hat{\tau}^2(\mu_0)$. A value of $\log(L^{*}(\mu_0))$ is then obtained by substituting both $\mu_0$ and $\hat{\tau}^2(\mu_0)$ into the log-likelihood equation; $\log(L^{*}(\mu_0))$ is then compared to the right-hand side of (4). If the value of $\mu_0$ is either too small (and hence the inequality is false), or too large, so it is within the interval rather than on the boundary, it is adjusted and $\log(L^{*}(\mu_0))$ recalculated. This is repeated until a confidence bound is obtained to within $10^{-6}$. A similar procedure is then used to find the upper bound of the confidence interval.


As noted, each evaluation of $\log(L^{*}(\mu_0))$ requires finding $\hat{\tau}^2(\mu_0)$ which satisfies (3). As with the iterative procedure discussed in Appendix A, iterations of this function will not necessarily converge to a single value. Simulation suggests that for small $\tau^2$ and small $k$, convergence to a cycle is more likely to occur. It is possible, however, to find $\hat{\tau}^2(\mu_0)$ by directly maximizing the likelihood function with respect to $\tau^2$, with $\mu = \mu_0$ fixed. Again this maximization is implemented using Powell's algorithm [15].
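The search described in this appendix can be sketched with direct maximization over $\tau^2$ at each trial value of $\mu_0$. Several substitutions are made and should be read as assumptions: a golden-section search stands in for Powell's algorithm, the log-likelihood is taken to be the usual normal random effects form (assumed to be what equation (2) denotes), the profile log-likelihood is assumed unimodal in both $\tau^2$ and $\mu$, and each bound is located by bracketing and bisection to within $10^{-6}$. The constant 3.841459 is the 0.95 quantile of $\chi^2_1$. All names and the data are hypothetical.

```python
import math

GOLD = (math.sqrt(5.0) - 1.0) / 2.0

def maximize(func, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal func on [lo, hi]."""
    a, b = lo, hi
    c, d = b - GOLD * (b - a), a + GOLD * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:                    # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - GOLD * (b - a)
            fc = func(c)
        else:                          # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + GOLD * (b - a)
            fd = func(d)
    x = 0.5 * (a + b)
    return x, func(x)

def loglik(mu, tau2, y, s2):
    """Normal random effects log-likelihood (assumed form of equation (2))."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * (v + tau2)) + (yi - mu) ** 2 / (v + tau2)
        for yi, v in zip(y, s2)
    )

def profile_loglik(mu0, y, s2):
    """log L*(mu0): maximize over tau2 >= 0 with mu fixed at mu0."""
    # The log-likelihood decreases in tau2 beyond max (y_i - mu0)^2
    upper = max((yi - mu0) ** 2 for yi in y) + 1e-12
    _, value = maximize(lambda t: loglik(mu0, t, y, s2), 0.0, upper)
    return value

def profile_interval(y, s2, crit=3.841459, tol=1e-6):
    """Return (mu_ml, lower, upper) for a 95 per cent profile interval."""
    mu_ml, lmax = maximize(lambda m: profile_loglik(m, y, s2), min(y), max(y))
    target = lmax - crit / 2.0

    def bound(direction):
        step = 1.0
        inside, outside = mu_ml, mu_ml + direction * step
        while profile_loglik(outside, y, s2) > target:   # expand the bracket
            step *= 2.0
            outside = mu_ml + direction * step
        while abs(outside - inside) > tol:               # bisect to the boundary
            mid = 0.5 * (inside + outside)
            if profile_loglik(mid, y, s2) > target:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return mu_ml, bound(-1.0), bound(1.0)

# Hypothetical data
y = [0.10, 0.50, -0.30, 0.80, 0.20]
s2 = [0.04, 0.09, 0.05, 0.10, 0.06]
mu_ml, lower, upper = profile_interval(y, s2)
print(mu_ml, lower, upper)
```

Bracketing outward from $\hat{\mu}_{ml}$ replaces the paper's choice of a starting value well below the expected lower bound; both approaches locate the point where the profile log-likelihood crosses the threshold.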

APPENDIX C: FIXED EFFECTS COVERAGE PROBABILITY

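If the fixed effects method is used for any $\tau^2 > 0$, the coverage probability is as given in (5). In such cases we have $Y_i \overset{d}{=} N(\mu, \hat{\sigma}_i^2 + \tau^2)$. This gives

\[
\hat{\mu} = \frac{\sum \hat{w}_i Y_i}{\sum \hat{w}_i} \overset{d}{=} N\left( \mu, \frac{\sum \hat{w}_i + \tau^2 \sum \hat{w}_i^2}{(\sum \hat{w}_i)^2} \right)
\]

The fixed effects coverage probability is given by $c$, where

\[
c = \Pr\left( \hat{\mu} - \frac{1.96}{\sqrt{\sum \hat{w}_i}} < \mu < \hat{\mu} + \frac{1.96}{\sqrt{\sum \hat{w}_i}} \right)
= \Pr\left( \frac{-1.96}{\sqrt{\sum \hat{w}_i}} < \hat{\mu} - \mu < \frac{1.96}{\sqrt{\sum \hat{w}_i}} \right)
\]

Standardizing $\hat{\mu}$ we have

\[
c = \Pr\left( \frac{-1.96 \sqrt{\sum \hat{w}_i}}{\sqrt{\sum \hat{w}_i + \tau^2 \sum \hat{w}_i^2}} < Z < \frac{1.96 \sqrt{\sum \hat{w}_i}}{\sqrt{\sum \hat{w}_i + \tau^2 \sum \hat{w}_i^2}} \right)
\]

where $Z \overset{d}{=} N(0, 1)$. Hence

\[
c = \Pr(Z < z) - \Pr(Z < -z) = \Phi(z) - \Phi(-z)
\]

where

\[
z = \frac{1.96}{\sqrt{1 + \tau^2 \sum \hat{w}_i^2 / \sum \hat{w}_i}}
\]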
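The closing formula is straightforward to evaluate numerically. The sketch below assumes the weights $\hat{w}_i = 1/\hat{\sigma}_i^2$ and computes $c = \Phi(z) - \Phi(-z)$ for a few values of $\tau^2$. The function names and the within-study variances are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fixed_effects_coverage(s2, tau2):
    """Coverage of the nominal 95 per cent fixed effects interval."""
    w = [1.0 / v for v in s2]                       # fixed effects weights
    ratio = sum(wi ** 2 for wi in w) / sum(w)
    z = 1.96 / math.sqrt(1.0 + tau2 * ratio)
    return normal_cdf(z) - normal_cdf(-z)

s2 = [0.04, 0.09, 0.05, 0.10, 0.06]   # hypothetical within-study variances
coverage = {t: fixed_effects_coverage(s2, t) for t in (0.0, 0.05, 0.1, 0.5)}
for t, c in coverage.items():
    print(t, round(c, 4))
```

With $\tau^2 = 0$ the function returns the nominal level, and the coverage falls as $\tau^2$ grows, since $z$ shrinks below 1.96.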

ACKNOWLEDGEMENT

The authors thank Ray Watson for valuable discussions and suggestions during the preparation of this paper.

REFERENCES

1. Olkin I. Meta-analysis: methods for combining independent studies. Editor's introduction. Statistical Science 1992; 7:226.

2. Hunter JE, Schmidt FL, Jackson GB. Meta-analysis: cumulating research findings across studies. Sage: Beverly Hills, 1992.

3. Schmid JE, Koch GG, LaVange LM. An overview of statistical issues and methods of meta-analysis. Journal of Biopharmaceutical Statistics 1991; 1:103–120.

4. Draper D, Gaver DP, Goel PK, Greenhouse JB, Hedges LV, Morris CN, Tucker JR, Waternaux CM. Combining Information: Statistical Issues and Opportunities for Research. National Academy Press: Washington, 1992.
