Letter. Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: a comparison between DerSimonian-Laird and Restricted Maximum Likelihood.

Evangelos Kontopantelis

National Primary Care Research and Development Centre

University of Manchester, Williamson Building 5th floor.

Oxford Road, M13 9PL

UK

e.kontopantelis@manchester.ac.uk

David Reeves

Health Sciences Primary Care Research Group

University of Manchester, Williamson Building 5th floor.

Oxford Road, M13 9PL

UK

In a recent paper we evaluated the performance of seven different methods for random-effects meta-analyses under various non-normal distributions for the effect sizes.[1] However, due to computational limitations we did not include the Restricted Maximum Likelihood (REML) estimator for the between-study variance. Lately, we have observed that the iterative REML approach has been increasingly replacing the non-iterative DerSimonian-Laird (DL) method as the method of choice in published meta-analyses. Jackson et al. examined the performance of the two methods in terms of coverage, for normally distributed effects only, and found that results for the two methods were similar.[2] However, REML requires an assumption that study effects are normally distributed, which DL does not, and so the two methods may differ more substantially when this assumption is violated.
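To make the contrast with REML concrete, the non-iterative DL estimator can be sketched in a few lines. This is a minimal illustration in Python with NumPy, assuming the standard moment-based form of the estimator (Cochran's Q with truncation at zero); the function name and its arguments are hypothetical and not taken from the letter or from any Stata command.

```python
import numpy as np

def dersimonian_laird(y, v):
    """Moment-based DL estimate of the between-study variance.

    y: study effect estimates; v: within-study variance estimates.
    Returns (tau2, mu, se_mu). Illustrative sketch only.
    """
    y = np.asarray(y, dtype=float)
    v = np.asarray(v, dtype=float)
    k = y.size
    w = 1.0 / v                                  # fixed-effect weights
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fixed) ** 2)          # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)           # truncate at zero
    w_re = 1.0 / (v + tau2)                      # random-effects weights
    mu = np.sum(w_re * y) / np.sum(w_re)
    se_mu = np.sqrt(1.0 / np.sum(w_re))
    return tau2, mu, se_mu
```

No distributional assumption on the study effects enters this calculation, and no iteration is needed: the estimate is obtained in a single pass over the data.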

Using the same simulation method and scenarios as in our previous paper, we assessed

the performance of REML in terms of coverage, power and overall effect estimation when

effect sizes do not follow a normal distribution. REML is a computationally expensive iterative method (it took several months and a few computers to complete the simulations in STATA [3]) which estimates the between-study variance $\tau^2$ and the overall effect $\mu$ by maximising the restricted log-likelihood function:

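$$
\log L'(\hat{\mu}, \hat{\tau}^2) = -\frac{1}{2} \sum_{i=1}^{k} \left[ \log\left\{ 2\pi \left( \hat{\sigma}_i^2 + \hat{\tau}^2 \right) \right\} + \frac{(y_i - \hat{\mu})^2}{\hat{\sigma}_i^2 + \hat{\tau}^2} \right] - \frac{1}{2} \log \sum_{i=1}^{k} \frac{1}{\hat{\sigma}_i^2 + \hat{\tau}^2}, \qquad \hat{\tau}^2 \ge 0 \tag{1}
$$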

where $k$ is the number of studies being combined, $y_i$ and $\hat{\sigma}_i^2$ are the effect and variance estimates for study $i$, and $\hat{\mu}$ is the overall effect estimate, with $\hat{\mu} = \sum_{i=1}^{k} \left[ y_i / (\hat{\sigma}_i^2 + \hat{\tau}^2) \right] / \sum_{i=1}^{k} \left[ 1 / (\hat{\sigma}_i^2 + \hat{\tau}^2) \right]$. Non-negativity for $\hat{\tau}^2$ must be enforced at each iteration, and iteration continues until convergence or until the maximum number of iterations is reached. REML is considered an improvement over Maximum Likelihood (ML) since it adjusts for the loss of degrees of freedom due to the estimation of the overall effect $\hat{\mu}$.[4] Non-convergence is a possibility, although it was rare in our simulations (around 0.1%). The method has been implemented in the STATA command metaan.[5]
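The iterative scheme described above can be illustrated with a short Python/NumPy sketch. It assumes a fixed-point update obtained by setting the derivative of the restricted log-likelihood (1) with respect to the between-study variance to zero; this is one common way to maximise (1) and is not necessarily the iteration used inside metaan. The function names, the tolerance and the iteration cap are illustrative assumptions.

```python
import numpy as np

def restricted_loglik(tau2, y, v):
    """Restricted log-likelihood (1), evaluated at the weighted-mean effect."""
    y, v = np.asarray(y, dtype=float), np.asarray(v, dtype=float)
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return (-0.5 * np.sum(np.log(2.0 * np.pi * (v + tau2)) + w * (y - mu) ** 2)
            - 0.5 * np.log(np.sum(w)))

def reml(y, v, tol=1e-10, max_iter=1000):
    """Iterative REML estimate; returns (tau2, mu, converged)."""
    y, v = np.asarray(y, dtype=float), np.asarray(v, dtype=float)
    tau2, converged = 0.0, False
    for _ in range(max_iter):
        w = 1.0 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        # Fixed-point update from d(log L') / d(tau2) = 0 (assumed scheme)
        update = (np.sum(w ** 2 * ((y - mu) ** 2 - v)) / np.sum(w ** 2)
                  + 1.0 / np.sum(w))
        update = max(0.0, update)    # enforce non-negativity at each iteration
        converged = abs(update - tau2) < tol
        tau2 = update
        if converged:
            break
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    return tau2, mu, converged
```

The returned flag allows non-converged runs to be counted or discarded, in the spirit of the non-convergence rate reported above. When all within-study variances are equal, this update settles on the sample variance of the effects minus the common within-study variance, which gives a quick sanity check.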

Coverage, power and confidence interval estimation (estimated confidence interval as a percentage of the interval based on the true between-study variance) for the REML method are presented in Table 1, with results for DL provided for comparison. The two methods performed very similarly across all scenarios. In terms of coverage, DL outperformed REML slightly, by a maximum of 2%, particularly when heterogeneity was low. As expected, the picture was reversed with regard to power, with REML performing slightly better (maximum 2% for $\tau^2 = 0$). As heterogeneity and/or the number of studies increased, the two methods converged to almost identical results. However, power for DL caught up somewhat quicker than coverage for REML. Results for confidence interval estimation do not identify a clear ‘winner’: DL performed better (maximum 2%) in cases of small or moderate heterogeneity, and more so with larger numbers of studies, while REML returned a slightly more accurate interval (maximum 1%) in certain scenarios with large numbers of studies and high heterogeneity. Although the form of the effect size distribution had some overall impact on performance, it did not alter the comparison of results between methods.
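The three performance measures can be summarised from simulation output along the following lines. This is a rough Python/NumPy sketch under stated assumptions: coverage is taken as the share of confidence intervals containing the true overall effect, power as the share of intervals excluding a null value of zero, and confidence interval estimation as the mean ratio of the estimated interval width to the width obtained with the true between-study variance. The exact definitions used in the simulations (including the "25th centile" setting for power in Table 1) are not reproduced here, and all names are hypothetical.

```python
import numpy as np

def summarise_performance(lower, upper, lower_true, upper_true,
                          mu_true, mu_null=0.0):
    """Summarise simulated confidence intervals for one scenario.

    lower, upper: limits of the estimated intervals, one per replication.
    lower_true, upper_true: limits computed with the true between-study
    variance. Returns (coverage, power, ci_ratio). Illustrative only.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    lower_true = np.asarray(lower_true, dtype=float)
    upper_true = np.asarray(upper_true, dtype=float)
    # Share of intervals that contain the true overall effect
    coverage = np.mean((lower <= mu_true) & (mu_true <= upper))
    # Share of intervals that exclude the null value
    power = np.mean((lower > mu_null) | (upper < mu_null))
    # Estimated width relative to the width under the true variance
    ci_ratio = np.mean((upper - lower) / (upper_true - lower_true))
    return coverage, power, ci_ratio
```

Under this reading, a confidence interval estimation value below 1 indicates intervals that are too narrow on average, and a value above 1 indicates intervals that are too wide.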

In conclusion, it seems that REML’s performance does not justify the extra level of complexity associated with the method. In general, DL performed just as well in most scenarios and scored marginally better in some. We stand by our earlier recommendation to meta-analysts to use either DerSimonian-Laird or Profile Likelihood, depending on the scenario and the requirements, as described in our paper.

1. Kontopantelis E, Reeves D. Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: A simulation study. Stat Methods Med Res.
2. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986; 7(3): 177-88.
3. STATA Statistical Software for Windows: Release 10.0 [program]. 10 version. College Station, TX: Stata Corporation, 2007.
4. Jackson D, Bowden J, Baker R. How does the DerSimonian and Laird procedure for random effects meta-analysis compare with its more efficient but harder to compute counterparts? Journal of Statistical Planning and Inference 2010; 140(4): 961-970.
5. Kontopantelis E, Reeves D. metaan: random effects meta-analysis. The STATA Journal 2010; 10(3): 395-407.

Table 1: Coverage, power and confidence interval estimation by degree of heterogeneity, between-study effect distribution, and number of studies, assuming $\chi^2_1$-based within-study variances

Coverage

# of studies:                                 2-5        6-15       16-25       26-35
H^2   Distribution (skew, kurtosis)    REML    DL  REML    DL  REML    DL  REML    DL
1     None                             0.95  0.96  0.94  0.96  0.94  0.96  0.95  0.96
1.18  Normal (0,3)                     0.93  0.94  0.92  0.94  0.93  0.94  0.93  0.94
1.18  Skew-normal (1,4)                0.93  0.94  0.93  0.94  0.93  0.94  0.93  0.94
1.18  Skew-normal (2,9)                0.93  0.95  0.93  0.94  0.93  0.94  0.94  0.94
1.18  Uniform                          0.93  0.94  0.92  0.94  0.93  0.94  0.93  0.94
1.18  Bimodal                          0.93  0.94  0.92  0.94  0.93  0.94  0.93  0.94
1.18  D-spike                          0.93  0.94  0.92  0.94  0.93  0.94  0.93  0.94
1.54  Normal (0,3)                     0.90  0.91  0.91  0.92  0.92  0.92  0.93  0.93
1.54  Skew-normal (1,4)                0.90  0.91  0.91  0.92  0.92  0.92  0.93  0.93
1.54  Skew-normal (2,9)                0.91  0.92  0.91  0.92  0.92  0.93  0.92  0.93
1.54  Uniform                          0.89  0.90  0.90  0.91  0.92  0.92  0.93  0.93
1.54  Bimodal                          0.89  0.90  0.90  0.91  0.91  0.92  0.92  0.92
1.54  D-spike                          0.89  0.90  0.89  0.90  0.91  0.91  0.92  0.92
2.78  Normal (0,3)                     0.86  0.87  0.90  0.90  0.92  0.92  0.93  0.93
2.78  Skew-normal (1,4)                0.86  0.86  0.89  0.90  0.91  0.92  0.93  0.93
2.78  Skew-normal (2,9)                0.87  0.88  0.88  0.89  0.90  0.91  0.92  0.92
2.78  Uniform                          0.84  0.85  0.89  0.89  0.92  0.92  0.93  0.93
2.78  Bimodal                          0.82  0.83  0.88  0.88  0.92  0.92  0.93  0.93
2.78  D-spike                          0.80  0.81  0.87  0.87  0.92  0.92  0.93  0.93

Power (25th centile)

# of studies:                                 2-5        6-15       16-25       26-35
H^2   Distribution (skew, kurtosis)    REML    DL  REML    DL  REML    DL  REML    DL
1     None                             0.30  0.29  0.69  0.67  0.93  0.92  0.99  0.99
1.18  Normal (0,3)                     0.31  0.29  0.66  0.65  0.91  0.90  0.98  0.98
1.18  Skew-normal (1,4)                0.29  0.28  0.66  0.65  0.91  0.90  0.98  0.98
1.18  Skew-normal (2,9)                0.29  0.28  0.66  0.65  0.92  0.91  0.98  0.98
1.18  Uniform                          0.29  0.28  0.66  0.65  0.91  0.90  0.98  0.98
1.18  Bimodal                          0.30  0.29  0.66  0.65  0.91  0.90  0.98  0.98
1.18  D-spike                          0.30  0.29  0.65  0.64  0.91  0.90  0.98  0.98
1.54  Normal (0,3)                     0.33  0.32  0.65  0.65  0.90  0.90  0.97  0.97
1.54  Skew-normal (1,4)                0.31  0.30  0.66  0.65  0.91  0.91  0.98  0.98
1.54  Skew-normal (2,9)                0.29  0.28  0.64  0.63  0.90  0.89  0.98  0.98
1.54  Uniform                          0.32  0.31  0.65  0.64  0.89  0.89  0.98  0.98
1.54  Bimodal                          0.33  0.33  0.67  0.67  0.91  0.91  0.98  0.98
1.54  D-spike                          0.32  0.32  0.67  0.67  0.91  0.91  0.98  0.98
2.78  Normal (0,3)                     0.35  0.35  0.64  0.64  0.87  0.87  0.96  0.96
2.78  Skew-normal (1,4)                0.32  0.32  0.64  0.63  0.90  0.90  0.98  0.98
2.78  Skew-normal (2,9)                0.30  0.30  0.64  0.63  0.90  0.90  0.98  0.98
2.78  Uniform                          0.36  0.36  0.66  0.66  0.91  0.91  0.98  0.98
2.78  Bimodal                          0.36  0.36  0.67  0.67  0.92  0.92  0.99  0.99
2.78  D-spike                          0.34  0.34  0.66  0.67  0.92  0.92  0.99  0.99

Confidence interval estimation

# of studies:                                 2-5        6-15       16-25       26-35
H^2   Distribution (skew, kurtosis)    REML    DL  REML    DL  REML    DL  REML    DL
1     None                             1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
1.18  Normal (0,3)                     0.96  0.96  0.94  0.95  0.96  0.97  0.97  0.98
1.18  Skew-normal (1,4)                0.96  0.96  0.94  0.95  0.95  0.97  0.97  0.98
1.18  Skew-normal (2,9)                0.96  0.96  0.94  0.95  0.94  0.96  0.96  0.97
1.18  Uniform                          0.96  0.96  0.94  0.95  0.96  0.97  0.97  0.98
1.18  Bimodal                          0.96  0.96  0.95  0.95  0.96  0.97  0.97  0.98
1.18  D-spike                          1.00  1.00  1.06  1.07  1.13  1.13  1.14  1.15
1.54  Normal (0,3)                     0.90  0.91  0.93  0.94  0.97  0.97  0.98  0.98
1.54  Skew-normal (1,4)                0.90  0.91  0.93  0.94  0.96  0.97  0.97  0.98
1.54  Skew-normal (2,9)                0.90  0.90  0.90  0.92  0.94  0.95  0.95  0.96
1.54  Uniform                          0.91  0.91  0.94  0.95  0.97  0.98  0.98  0.99
1.54  Bimodal                          0.91  0.91  0.94  0.95  0.98  0.98  0.99  0.99
1.54  D-spike                          1.04  1.06  1.29  1.30  1.37  1.37  1.38  1.39
2.78  Normal (0,3)                     0.85  0.85  0.94  0.94  0.98  0.97  0.98  0.98
2.78  Skew-normal (1,4)                0.84  0.84  0.93  0.93  0.97  0.96  0.98  0.97
2.78  Skew-normal (2,9)                0.80  0.81  0.89  0.88  0.93  0.92  0.95  0.94
2.78  Uniform                          0.86  0.86  0.96  0.96  0.99  0.98  0.99  0.99
2.78  Bimodal                          0.87  0.87  0.97  0.97  0.99  0.99  0.99  0.99
2.78  D-spike                          1.35  1.36  1.80  1.80  1.87  1.87  1.89  1.89