Letter. Performance of statistical methods for meta-analysis when
true study effects are non-normally distributed: a comparison
between DerSimonian-Laird and Restricted Maximum
Likelihood.
Evangelos Kontopantelis
National Primary Care Research and Development Centre
University of Manchester, Williamson Building 5th floor.
Oxford Road, M13 9PL
UK
e.kontopantelis@manchester.ac.uk
David Reeves
Health Sciences Primary Care Research Group
University of Manchester, Williamson Building 5th floor.
Oxford Road, M13 9PL
UK
In a recent paper we evaluated the performance of seven different methods for
random-effects meta-analyses under various non-normal distributions for the effect sizes.¹
However, due to computational limitations we did not include the Restricted Maximum
Likelihood (REML) estimator for the between-study variance. Lately, we have observed that
the iterative REML approach has been increasingly replacing the non-iterative
DerSimonian-Laird (DL) as the method of choice in published meta-analyses. Jackson et al
examined the performance of the two methods in terms of coverage, for normally distributed
effects only, and found that results for the two methods were similar.² However, REML
requires an assumption that study effects are normally distributed, which DL does not, and
so the two methods may differ more substantially when this assumption is violated.
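The letter does not restate the DL formula, so as a point of reference here is a minimal sketch of the standard non-iterative moment estimator of DerSimonian and Laird. The function name and argument names are illustrative only, and the sketch assumes at least two studies with positive within-study variances.

```python
def dersimonian_laird(y, v):
    """Return (mu_hat, tau2_hat) from the DerSimonian-Laird moment estimator.

    y: list of study effect estimates
    v: list of within-study variance estimates (same length as y, at least 2)
    """
    k = len(y)
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q statistic
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    # Moment estimator of the between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects pooled estimate using the updated weights
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return mu, tau2
```

No iteration is involved: a single pass over the data gives the between-study variance, which is why DL is so much cheaper than REML in a large simulation.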
Using the same simulation method and scenarios as in our previous paper, we assessed
the performance of REML in terms of coverage, power and overall effect estimation when
effect sizes do not follow a normal distribution. REML is a computationally expensive
iterative method (it took several months and a few computers to complete the simulations in
STATA³) which estimates the between-study variance $\hat{\tau}^2$ and effect $\hat{\mu}$ by
maximising the restricted log-likelihood function:
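$$
\log L'(\hat{\mu}, \hat{\tau}^2) = -\frac{1}{2} \sum_{i=1}^{k} \left[ \log\{2\pi(\hat{\sigma}_i^2 + \hat{\tau}^2)\} + \frac{(y_i - \hat{\mu})^2}{\hat{\sigma}_i^2 + \hat{\tau}^2} \right] - \frac{1}{2} \log \sum_{i=1}^{k} \frac{1}{\hat{\sigma}_i^2 + \hat{\tau}^2}, \quad \hat{\tau}^2 \in \mathbb{R} \;\&\; \hat{\tau}^2 \ge 0 \tag{1}
$$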
where $k$ is the number of studies being combined, $y_i$ and $\hat{\sigma}_i^2$ are the effect
and variance estimates for study $i$ and $\hat{\mu}$ is the overall effect estimate with
$\hat{\mu} = \sum_{i=1}^{k} [y_i / (\hat{\sigma}_i^2 + \hat{\tau}^2)] / \sum_{i=1}^{k} [1 / (\hat{\sigma}_i^2 + \hat{\tau}^2)]$.
Non-negativity for $\hat{\tau}^2$ must be enforced at each iteration and iteration continues
until convergence or the maximum number of iterations is reached. REML is considered an
improvement over Maximum Likelihood (ML) since it adjusts for the loss of degrees of freedom
due to the estimation of the overall effect $\hat{\mu}$.⁴ Non-convergence is a possibility,
although it was rare in our simulations (around 0.1%). The method has been implemented in
the STATA command metaan.⁵
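To make the iterative procedure concrete, the following is a minimal sketch of one common way to maximise the restricted log-likelihood of equation (1): a fixed-point update derived from setting its derivative with respect to the between-study variance to zero, truncated at zero on every iteration. This is an assumption for illustration and is not necessarily the scheme used inside metaan; the function names, starting value, tolerance and iteration cap are all hypothetical.

```python
import math


def restricted_loglik(tau2, y, v):
    """Restricted log-likelihood of equation (1) at a given tau2, with mu
    set to the inverse-variance weighted mean for that tau2."""
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    main = sum(
        math.log(2.0 * math.pi * (vi + tau2)) + (yi - mu) ** 2 / (vi + tau2)
        for yi, vi in zip(y, v)
    )
    return -0.5 * main - 0.5 * math.log(sw)


def reml(y, v, tau2_start=0.0, tol=1e-10, max_iter=1000):
    """Return (mu_hat, tau2_hat, converged) from a fixed-point REML iteration.

    Assumed update (stationary point of the restricted log-likelihood):
        tau2 <- sum(w_i^2 * ((y_i - mu)^2 - v_i)) / sum(w_i^2) + 1 / sum(w_i)
    with w_i = 1 / (v_i + tau2) and mu the weighted mean under those weights.
    """
    tau2 = max(0.0, tau2_start)
    converged = False
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        sw2 = sum(wi ** 2 for wi in w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        new_tau2 = (
            sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v)) / sw2
            + 1.0 / sw
        )
        # Enforce non-negativity at each iteration
        new_tau2 = max(0.0, new_tau2)
        if abs(new_tau2 - tau2) < tol:
            tau2 = new_tau2
            converged = True
            break
        tau2 = new_tau2
    # Overall effect under the final between-study variance
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return mu, tau2, converged
```

The `converged` flag corresponds to the non-convergence case mentioned above: when the iteration cap is reached first, the last value is returned with `converged` set to `False`. Evaluating `restricted_loglik` at the returned value and at neighbouring values is a simple way to check that a maximum was found.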
Coverage, power and confidence interval estimation (estimated confidence interval as a
percentage of the interval based on the true between-study variance) for the REML method
are presented in Table 1, with results for DL provided for comparison. The two methods
performed very similarly across all scenarios. In terms of coverage DL outperformed REML
slightly, by a maximum of 2%, particularly when heterogeneity was low. As expected, the
picture was reversed with regard to power, with REML performing slightly better
(maximum 2% for $\tau^2 = 0$). As heterogeneity and/or the number of studies increased the
two methods converged to almost identical results. However, power for DL caught up somewhat
quicker than coverage for REML. Results for confidence interval estimation do not identify a
clear ‘winner’: DL performed better (maximum 2%) in cases of small or moderate
heterogeneity, and more so with larger numbers of studies, while REML returned a slightly
more accurate interval (maximum 1%) in certain large study number scenarios with high
heterogeneity. Although the form of the effect size distribution had some overall impact on
performance, it did not alter the comparison of results between methods.
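As an illustration of two of the performance measures discussed above, the sketch below computes coverage over a set of simulated intervals and the confidence interval estimation ratio for a single meta-analysis. It assumes a Wald-type 95% interval built from the normal quantile, which the letter does not specify; the function names are hypothetical.

```python
import math
from statistics import NormalDist

# 97.5th centile of the standard normal, for a two-sided 95% interval
Z_975 = NormalDist().inv_cdf(0.975)


def wald_ci(y, v, tau2, z=Z_975):
    """Wald-type interval for the overall effect given a between-study variance."""
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    half_width = z / math.sqrt(sw)
    return mu - half_width, mu + half_width


def coverage(intervals, true_mu):
    """Proportion of (lower, upper) intervals that contain the true effect."""
    hits = sum(1 for lower, upper in intervals if lower <= true_mu <= upper)
    return hits / len(intervals)


def ci_width_ratio(y, v, tau2_hat, tau2_true):
    """Width of the interval under the estimated tau2 relative to the width
    under the true tau2 (1.0 means the interval width is recovered exactly)."""
    lower_hat, upper_hat = wald_ci(y, v, tau2_hat)
    lower_true, upper_true = wald_ci(y, v, tau2_true)
    return (upper_hat - lower_hat) / (upper_true - lower_true)
```

Under this reading, values below 1 in the confidence interval estimation columns indicate intervals narrower than those based on the true between-study variance, and values above 1 indicate wider ones.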
In conclusion, it seems that REML’s performance does not justify the extra level of
complexity associated with the method. In general, DL performed just as well in most
scenarios and scored marginally better in some. We stand by our earlier recommendation to
meta-analysts to use either DerSimonian-Laird or Profile Likelihood, depending on the
scenario and the requirements, as described in our paper.
1. Kontopantelis E, Reeves D. Performance of statistical methods for meta-analysis
when true study effects are non-normally distributed: A simulation study. Stat
Methods Med Res.
2. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;
7(3): 177-88.
3. STATA Statistical Software for Windows: Release 10.0 [program]. 10 version.
College Station, TX: Stata Corporation, 2007.
4. Jackson D, Bowden J, Baker R. How does the DerSimonian and Laird procedure for
random effects meta-analysis compare with its more efficient but harder to compute
counterparts? Journal of Statistical Planning and Inference 2010; 140(4): 961-970.
5. Kontopantelis E, Reeves D. metaan: random effects meta-analysis. The STATA
Journal 2010; 10(3): 395-407.
Table 1: Coverage, power and confidence interval estimation by degree of heterogeneity, between-study effect distribution, and number of studies,
assuming $\chi^2_1$-based within-study variances

Coverage
# of studies:                        2-5          6-15         16-25        26-35
$H^2$ Distribution (skew, kurtosis)  REML  DL     REML  DL     REML  DL     REML  DL
1     None                           0.95  0.96   0.94  0.96   0.94  0.96   0.95  0.96
1.18  Normal (0,3)                   0.93  0.94   0.92  0.94   0.93  0.94   0.93  0.94
1.18  Skew-normal (1,4)              0.93  0.94   0.93  0.94   0.93  0.94   0.93  0.94
1.18  Skew-normal (2,9)              0.93  0.95   0.93  0.94   0.93  0.94   0.94  0.94
1.18  Uniform                        0.93  0.94   0.92  0.94   0.93  0.94   0.93  0.94
1.18  Bimodal                        0.93  0.94   0.92  0.94   0.93  0.94   0.93  0.94
1.18  D-spike                        0.93  0.94   0.92  0.94   0.93  0.94   0.93  0.94
1.54  Normal (0,3)                   0.90  0.91   0.91  0.92   0.92  0.92   0.93  0.93
1.54  Skew-normal (1,4)              0.90  0.91   0.91  0.92   0.92  0.92   0.93  0.93
1.54  Skew-normal (2,9)              0.91  0.92   0.91  0.92   0.92  0.93   0.92  0.93
1.54  Uniform                        0.89  0.90   0.90  0.91   0.92  0.92   0.93  0.93
1.54  Bimodal                        0.89  0.90   0.90  0.91   0.91  0.92   0.92  0.92
1.54  D-spike                        0.89  0.90   0.89  0.90   0.91  0.91   0.92  0.92
2.78  Normal (0,3)                   0.86  0.87   0.90  0.90   0.92  0.92   0.93  0.93
2.78  Skew-normal (1,4)              0.86  0.86   0.89  0.90   0.91  0.92   0.93  0.93
2.78  Skew-normal (2,9)              0.87  0.88   0.88  0.89   0.90  0.91   0.92  0.92
2.78  Uniform                        0.84  0.85   0.89  0.89   0.92  0.92   0.93  0.93
2.78  Bimodal                        0.82  0.83   0.88  0.88   0.92  0.92   0.93  0.93
2.78  D-spike                        0.80  0.81   0.87  0.87   0.92  0.92   0.93  0.93

Power (25th centile)
# of studies:                        2-5          6-15         16-25        26-35
$H^2$ Distribution (skew, kurtosis)  REML  DL     REML  DL     REML  DL     REML  DL
1     None                           0.30  0.29   0.69  0.67   0.93  0.92   0.99  0.99
1.18  Normal (0,3)                   0.31  0.29   0.66  0.65   0.91  0.90   0.98  0.98
1.18  Skew-normal (1,4)              0.29  0.28   0.66  0.65   0.91  0.90   0.98  0.98
1.18  Skew-normal (2,9)              0.29  0.28   0.66  0.65   0.92  0.91   0.98  0.98
1.18  Uniform                        0.29  0.28   0.66  0.65   0.91  0.90   0.98  0.98
1.18  Bimodal                        0.30  0.29   0.66  0.65   0.91  0.90   0.98  0.98
1.18  D-spike                        0.30  0.29   0.65  0.64   0.91  0.90   0.98  0.98
1.54  Normal (0,3)                   0.33  0.32   0.65  0.65   0.90  0.90   0.97  0.97
1.54  Skew-normal (1,4)              0.31  0.30   0.66  0.65   0.91  0.91   0.98  0.98
1.54  Skew-normal (2,9)              0.29  0.28   0.64  0.63   0.90  0.89   0.98  0.98
1.54  Uniform                        0.32  0.31   0.65  0.64   0.89  0.89   0.98  0.98
1.54  Bimodal                        0.33  0.33   0.67  0.67   0.91  0.91   0.98  0.98
1.54  D-spike                        0.32  0.32   0.67  0.67   0.91  0.91   0.98  0.98
2.78  Normal (0,3)                   0.35  0.35   0.64  0.64   0.87  0.87   0.96  0.96
2.78  Skew-normal (1,4)              0.32  0.32   0.64  0.63   0.90  0.90   0.98  0.98
2.78  Skew-normal (2,9)              0.30  0.30   0.64  0.63   0.90  0.90   0.98  0.98
2.78  Uniform                        0.36  0.36   0.66  0.66   0.91  0.91   0.98  0.98
2.78  Bimodal                        0.36  0.36   0.67  0.67   0.92  0.92   0.99  0.99
2.78  D-spike                        0.34  0.34   0.66  0.67   0.92  0.92   0.99  0.99

Confidence interval estimation
# of studies:                        2-5          6-15         16-25        26-35
$H^2$ Distribution (skew, kurtosis)  REML  DL     REML  DL     REML  DL     REML  DL
1     None                           1.00  1.00   1.00  1.00   1.00  1.00   1.00  1.00
1.18  Normal (0,3)                   0.96  0.96   0.94  0.95   0.96  0.97   0.97  0.98
1.18  Skew-normal (1,4)              0.96  0.96   0.94  0.95   0.95  0.97   0.97  0.98
1.18  Skew-normal (2,9)              0.96  0.96   0.94  0.95   0.94  0.96   0.96  0.97
1.18  Uniform                        0.96  0.96   0.94  0.95   0.96  0.97   0.97  0.98
1.18  Bimodal                        0.96  0.96   0.95  0.95   0.96  0.97   0.97  0.98
1.18  D-spike                        1.00  1.00   1.06  1.07   1.13  1.13   1.14  1.15
1.54  Normal (0,3)                   0.90  0.91   0.93  0.94   0.97  0.97   0.98  0.98
1.54  Skew-normal (1,4)              0.90  0.91   0.93  0.94   0.96  0.97   0.97  0.98
1.54  Skew-normal (2,9)              0.90  0.90   0.90  0.92   0.94  0.95   0.95  0.96
1.54  Uniform                        0.91  0.91   0.94  0.95   0.97  0.98   0.98  0.99
1.54  Bimodal                        0.91  0.91   0.94  0.95   0.98  0.98   0.99  0.99
1.54  D-spike                        1.04  1.06   1.29  1.30   1.37  1.37   1.38  1.39
2.78  Normal (0,3)                   0.85  0.85   0.94  0.94   0.98  0.97   0.98  0.98
2.78  Skew-normal (1,4)              0.84  0.84   0.93  0.93   0.97  0.96   0.98  0.97
2.78  Skew-normal (2,9)              0.80  0.81   0.89  0.88   0.93  0.92   0.95  0.94
2.78  Uniform                        0.86  0.86   0.96  0.96   0.99  0.98   0.99  0.99
2.78  Bimodal                        0.87  0.87   0.97  0.97   0.99  0.99   0.99  0.99
2.78  D-spike                        1.35  1.36   1.80  1.80   1.87  1.87   1.89  1.89