
University of Connecticut

DigitalCommons@UConn

CHIP Documents

Center for Health, Intervention, and Prevention

(CHIP)

6-1-2006

Assessing heterogeneity in meta-analysis: Q

statistic or I2 index?

Tania Huedo-Medina

University of Connecticut, tania.huedo-medina@uconn.edu

Julio Sanchez-Meca

University of Murcia, Spain

Fulgencio Marin-Martinez

University of Murcia, Spain

Juan Botella

Autonoma University of Madrid, Spain

This Article is brought to you for free and open access by the Center for Health, Intervention, and Prevention (CHIP) at DigitalCommons@UConn.

It has been accepted for inclusion in CHIP Documents by an authorized administrator of DigitalCommons@UConn. For more information, please

contact digitalcommons@uconn.edu.

Recommended Citation

Huedo-Medina, Tania; Sanchez-Meca, Julio; Marin-Martinez, Fulgencio; and Botella, Juan, "Assessing heterogeneity in meta-analysis: Q statistic or I2 index?" (2006). CHIP Documents. Paper 19.

http://digitalcommons.uconn.edu/chip_docs/19


ASSESSING HETEROGENEITY IN META-ANALYSIS:

Q STATISTIC OR I² INDEX?

Tania B. Huedo-Medina,¹ Julio Sánchez-Meca,¹ Fulgencio Marín-Martínez,¹

and Juan Botella²

Running head: Assessing heterogeneity in meta-analysis

2006

¹ University of Murcia, Spain

² Autónoma University of Madrid, Spain

Address for correspondence:

Tania B. Huedo-Medina

Dept. of Basic Psychology & Methodology, Faculty of Psychology, Espinardo Campus,

Murcia, Spain

Phone: + 34 968 364279

Fax: + 34 968 364115

E-mail: hmtania@um.es

* This work has been supported by Plan Nacional de Investigación Científica,

Desarrollo e Innovación Tecnológica 2004-07 from the Ministerio de Educación y

Ciencia and by funds from the Fondo Europeo de Desarrollo Regional, FEDER

(Project Number: SEJ2004-07278/PSIC).


Abstract

In meta-analysis, the usual way of assessing whether a set of single studies is homogeneous is by means of the Q test. However, the Q test only informs us about the presence versus the absence of heterogeneity; it does not report on the extent of such heterogeneity. Recently, the I² index has been proposed to quantify the degree of heterogeneity in a meta-analysis. In this paper, the performances of the Q test and the confidence interval around the I² index are compared by means of a Monte Carlo simulation. The results show the utility of the I² index as a complement to the Q test, although it has the same problems of power with a small number of studies.

KEY WORDS: Meta-analysis, effect size, heterogeneity, I² index, Monte Carlo method.


ASSESSING HETEROGENEITY IN META-ANALYSIS:

Q STATISTIC OR I² INDEX?

In the last 25 years meta-analysis has been widely accepted in the social and health

sciences as a very useful research methodology to quantitatively integrate the results of

a collection of single studies on a given topic. In a meta-analysis the result of every

study is quantified by means of an effect-size index (e.g., standardized mean difference,

correlation coefficient, odds ratio, etc.) that can be applied to all studies, enabling us to

give the study results in the same metric (Cooper, 1998; Cooper & Hedges, 1994;

Egger, Smith, & Altman, 2001; Glass, McGaw, & Smith, 1981; Hedges & Olkin, 1985;

Hunter & Schmidt, 2004; Rosenthal, 1991; Sutton, Abrams, Jones, Sheldon, & Song,

2000; Whitehead, 2002).

Typically, meta-analysis has three main goals: (a) to test whether the studies'

results are homogeneous, (b) to obtain a global index about the effect magnitude of the

studied relation, together with a confidence interval and its statistical significance, and (c) if

there is heterogeneity among studies, to identify possible variables or characteristics

moderating the results obtained. Here, we focus on how to assess the heterogeneity

among the results from a collection of studies. Basically, there can be two sources of

variability that explain the heterogeneity in a set of studies in a meta-analysis. One of

them is the variability due to sampling error, also named within-study variability. The

sampling error variability is always present in a meta-analysis, because every single

study uses different samples. The other source of heterogeneity is the between-studies

variability, which can appear in a meta-analysis when there is true heterogeneity among

the population effect sizes estimated by the individual studies. The between-studies

variability is due to the influence of an indeterminate number of characteristics that vary

among the studies, such as those related to the characteristics of the samples, variations

in the treatment, in the design quality, and so on (Brockwell & Gordon, 2001; Erez,

Bloom, & Wells, 1996; Field, 2003; Hunter & Schmidt, 2000; National Research

Council, 1992).

Assessing heterogeneity in meta-analysis is a crucial issue because the

presence versus the absence of true heterogeneity (between-studies variability) can

affect the statistical model that the meta-analyst decides to apply to the meta-analytic


database. So, when the studies’ results only differ by the sampling error (homogeneous

case) a fixed-effects model can be applied to obtain an average effect size. By contrast,

if the study results differ by more than the sampling error (heterogeneous case), then the

meta-analyst can assume a random-effects model, in order to take into account both

within- and between-studies variability, or can decide to search for moderator variables

from a fixed-effects model (Field, 2001, 2003; Hedges, 1994; Hedges & Olkin, 1985;

Hedges & Vevea, 1998; Overton, 1998; Raudenbush, 1994).

The usual way of assessing whether there is true heterogeneity in a meta-

analysis has been to use the Q test, a statistical test defined by Cochran (1954). The Q

test is computed by summing the squared deviations of each study’s effect estimate

from the overall effect estimate, weighting the contribution of each study by its inverse

variance. Under the hypothesis of homogeneity among the effect sizes, the Q statistic

follows a chi-square distribution with k – 1 degrees of freedom, k being the number of

studies. Not rejecting the homogeneity hypothesis usually leads the meta-analyst to

adopt a fixed-effects model because it is assumed that the estimated effect sizes only

differ by sampling error. In contrast, rejecting the homogeneity assumption can lead to

applying a random-effects model that includes both within- and between-studies

variability. A shortcoming of the Q statistic is that it has poor power to detect true

heterogeneity among studies when the meta-analysis includes a small number of studies

and excessive power to detect negligible variability with a high number of studies

(Alexander, Scozzaro, & Borodkin, 1989; Cornwell, 1993; Cornwell & Ladd, 1993;

Hardy & Thompson, 1998; Harwell, 1997; Osburn, Callender, Greener, & Ashworth,

1983; Paul & Donner, 1992; Sackett, Harris, & Orr, 1986; Sagie & Koslowsky, 1993;

Sánchez-Meca & Marín-Martínez, 1997; Spector & Levine, 1987). Thus, a non-

significant result for the Q test with a small number of studies can lead a reviewer to

erroneously assume a fixed-effects model when there is true heterogeneity among the

studies; and vice versa. On the other hand, the Q statistic does not inform us of the

extent of true heterogeneity, only of its statistical significance.¹

¹ It is important to note that the low statistical power of the Q test for a small number of studies has promoted the undesirable practice among some meta-analysts of ignoring the results of Q when it is not statistically significant, and searching for moderator variables. On the other hand, the meta-analyst can a priori adopt a statistical model (fixed- or random-effects model) on conceptual grounds. For example, if the meta-analyst wishes to generalize the meta-analytic results to a population of studies with characteristics similar to those represented in the meta-analysis, a fixed-effects model can be selected. If, on the contrary, the meta-analytic results have to be generalized to a wider population of studies, a random-effects model should be the best option (Field, 2001; Hedges & Vevea, 1998).

Another strategy for quantifying the true heterogeneity in a meta-analysis consists of estimating the between-studies variance, τ². Assuming a random-effects model, the between-studies variance reflects how much the true population effect sizes estimated in the single studies of a meta-analysis differ. Because τ² depends on the particular effect metric used in a meta-analysis, it is not possible to compare the τ² values estimated from meta-analyses that have used different effect-size indices (e.g., standardized mean differences, correlation coefficients, odds ratios, etc.).

In order to overcome the shortcomings of the Q test and τ², Higgins and Thompson (2002; see also Higgins, Thompson, Deeks, & Altman, 2003) have proposed three indices for assessing heterogeneity in a meta-analysis: the H², R², and I² indices. As they are inter-related, here we focus on the I² index, because of its easy interpretation. The I² index measures the extent of true heterogeneity by dividing the difference between the result of the Q test and its degrees of freedom (k – 1) by the Q value itself, and multiplying by 100. So, the I² index is similar to an intraclass correlation in cluster sampling (Higgins & Thompson, 2002). The I² index can be interpreted as the percentage of the total variability in a set of effect sizes due to true heterogeneity, that is, to between-studies variability. For example, a meta-analysis with I² = 0 means that all variability in effect size estimates is due to sampling error within studies. On the other hand, a meta-analysis with I² = 50 means that half of the total variability among effect sizes is caused not by sampling error, but by true heterogeneity between studies. Higgins and Thompson (2002) proposed a tentative classification of I² values with the purpose of helping to interpret its magnitude. Thus, percentages of around 25% (I² = 25), 50% (I² = 50), and 75% (I² = 75) would mean low, medium, and high heterogeneity, respectively. The I² index and the between-studies variance, τ², are directly related: the higher the τ², the higher the I² index. However, following Higgins and Thompson (2002), an advantage of the I² index with respect to τ² is that I² indices obtained from meta-analyses with different numbers of studies and different effect metrics are directly comparable.


Together with this descriptive interpretation of the I² index, Higgins and Thompson (2002) have derived a confidence interval for it that might be used in the same way as the Q test is used to assess heterogeneity in meta-analysis. Thus, if the confidence interval around I² contains the 0% value, then the meta-analyst can retain the homogeneity hypothesis. If, on the contrary, the confidence interval does not include the 0% value, then there is evidence for the existence of true heterogeneity. Using the I² index and its confidence interval is similar to applying the Q test. Because the I² index assesses not only heterogeneity in meta-analysis, but also the extent of that heterogeneity, it should be a more advisable procedure than the Q test in assessing whether or not there is true heterogeneity among the studies in a meta-analysis. However, the performance of the confidence interval around I² has not yet been studied in terms of the control of Type I error rate and statistical power.

The purpose of this paper is to compare, by a Monte Carlo simulation, the performance of the Q test and the confidence interval around the I² index, in terms of their control of Type I error rate and statistical power. Different effect-size indices were used and both the extent of true heterogeneity and the number of studies were varied. Thus, it is possible to test whether the confidence interval for I² overcomes the shortcomings of the Q test.

Effect-size indices

For each individual study, we assume two underlying populations representing the experimental versus control groups on a continuous outcome. Let $\mu_E$ and $\mu_C$ be the experimental and control population means, and $\sigma_E$ and $\sigma_C$ the population standard deviations, respectively. By including a control condition in the typical design we restrict the applicability of our results to research fields in which such designs make sense (e.g., treatment outcome evaluation in behavioral sciences, education, medicine, etc.). Under the assumptions of normal distributions and homoscedasticity, the usual parametric effect-size index is the standardized mean difference, δ, defined as the difference between the experimental and control population means, $\mu_E$ and $\mu_C$, divided by the pooled population standard deviation, σ (Hedges & Olkin, 1985, p. 76, eq. 2),

$$\delta = \frac{\mu_E - \mu_C}{\sigma}. \tag{1}$$

The best estimator of the parametric effect size, δ, is the sample standardized mean difference, d, proposed by Hedges and Olkin (1985, p. 81, eq. 10) and computed by

$$d = c(m)\,\frac{\bar{y}_E - \bar{y}_C}{S}, \tag{2}$$

with $\bar{y}_E$ and $\bar{y}_C$ being the sample means of the experimental and control groups, respectively, and S being a pooled estimate of the within-group standard deviation, given by (Hedges & Olkin, 1985, p. 79),

$$S = \sqrt{\frac{(n_E - 1)S_E^2 + (n_C - 1)S_C^2}{n_E + n_C - 2}}, \tag{3}$$

with $S_E^2$, $S_C^2$, $n_E$, and $n_C$ being the sample variances and the sample sizes of the experimental and control groups, respectively. The term c(m) is a correction factor for the positive bias suffered by the standardized mean difference with small sample sizes and estimated by (Hedges & Olkin, 1985, p. 81, eq. 7),

$$c(m) = 1 - \frac{3}{4m - 1}, \tag{4}$$

with $m = n_E + n_C - 2$. The sampling variance of the d index is estimated by Hedges and Olkin (1985, p. 86, eq. 15) as

$$S_d^2 = \frac{n_E + n_C}{n_E n_C} + \frac{d^2}{2(n_E + n_C)}. \tag{5}$$
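As a minimal sketch of equations (2) to (5), the following Python functions compute the bias-corrected d and its estimated sampling variance from group summary statistics. The function names, argument names, and example numbers are illustrative assumptions, not taken from the paper.

```python
import math

def correction_factor(m):
    """Small-sample bias correction c(m) = 1 - 3 / (4m - 1), equation (4)."""
    return 1.0 - 3.0 / (4.0 * m - 1.0)

def pooled_sd(var_e, var_c, n_e, n_c):
    """Pooled within-group standard deviation S, equation (3)."""
    return math.sqrt(((n_e - 1) * var_e + (n_c - 1) * var_c) / (n_e + n_c - 2))

def hedges_d(mean_e, mean_c, var_e, var_c, n_e, n_c):
    """Return (d, estimated sampling variance of d), equations (2) and (5)."""
    s = pooled_sd(var_e, var_c, n_e, n_c)
    d = correction_factor(n_e + n_c - 2) * (mean_e - mean_c) / s
    var_d = (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))
    return d, var_d

# Hypothetical study: the numbers below are made up for illustration.
d, var_d = hedges_d(mean_e=10.5, mean_c=10.0, var_e=1.0, var_c=1.0, n_e=25, n_c=25)
print(f"d = {d:.4f}, var(d) = {var_d:.4f}")
```

With equal unit variances the pooled standard deviation is 1, so d here is simply the raw mean difference shrunk by c(48).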

Another effect-size index from the d family is that proposed by Glass et al. (1981; see also Glass, 1976), consisting of dividing the difference between the experimental and control group means by the standard deviation of the control group. Here we will represent this index by g (Glass et al., 1981, p. 105):²

$$g = c(m)\,\frac{\bar{y}_E - \bar{y}_C}{S_C}, \tag{6}$$

where $S_C$ is the estimated standard deviation of the control group and c(m) is the correction factor for small sample sizes given by equation (4), but with $m = n_C - 1$ (Glass et al., 1981, p. 113). The g index is recommended when the homoscedasticity assumption is violated. Glass et al. (1981) proposed dividing the mean difference by the standard deviation of the control group because the experimental manipulation can change the variability in the group; thus, under this circumstance they argue that it is better to estimate the population standard deviation by the control group standard deviation. Therefore, in the strict sense, the g index is estimating a different population effect size from that defined in equation (1), δ, consisting of dividing the mean difference by the population standard deviation of the control group: $\delta_C = (\mu_E - \mu_C)/\sigma_C$ (Glass et al., 1981, p. 112). The sampling variance of the g index is given by Rosenthal (1994, p. 238) as

$$S_g^2 = \frac{n_E + n_C}{n_E n_C} + \frac{g^2}{2(n_C - 1)}. \tag{7}$$
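Equations (6) and (7) can be sketched in the same way. The function below standardizes by the control-group standard deviation only and uses m = n_C – 1 in the correction factor; its name, arguments, and example values are hypothetical.

```python
def glass_g(mean_e, mean_c, sd_c, n_e, n_c):
    """Return (g, estimated sampling variance of g), equations (6) and (7)."""
    m = n_c - 1                          # degrees of freedom of the control-group SD
    c_m = 1.0 - 3.0 / (4.0 * m - 1.0)    # correction factor, equation (4)
    g = c_m * (mean_e - mean_c) / sd_c
    var_g = (n_e + n_c) / (n_e * n_c) + g ** 2 / (2 * (n_c - 1))
    return g, var_g

# Hypothetical study with the same summary statistics in both groups.
g, var_g = glass_g(mean_e=10.5, mean_c=10.0, sd_c=1.0, n_e=25, n_c=25)
print(f"g = {g:.4f}, var(g) = {var_g:.4f}")
```

Because m is smaller here than in the pooled case, the correction shrinks g slightly more than d for the same data, and the second variance term is larger.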

² Although Glass et al. (1981) represented this effect-size index with the Greek symbol ∆, here we prefer to keep Greek symbols to represent parameters, not estimates. Thus, we have selected the Latin letter g to represent this effect-size index.

The statistical model

Once an effect-size estimate is obtained from each individual study, meta-analysis integrates them by calculating an average effect size, assessing the statistical heterogeneity around the average estimate, and searching for moderator variables when there is more heterogeneity than can be explained by chance. In general, the most realistic statistical model to integrate the effect estimates in a meta-analysis is the random-effects model, because it incorporates the two possible sources of heterogeneity among the studies in a meta-analysis: first, statistical variability caused by sampling error and, second, substantive variability.

Let $T_i$ be the ith effect estimate in a collection of k studies (i = 1, 2, ..., k). Here $T_i$ corresponds to the d and g effect indices defined in Section 2 by equations (2) and (6), respectively. In a random-effects model it is assumed that every $T_i$ effect is estimating a parametric effect size, $\theta_i$, with conditional variance $\sigma_i^2$, estimated by $\hat{\sigma}_i^2$. The estimated conditional variances, $\hat{\sigma}_i^2$, for the d and g indices proposed in Section 2 are defined by equations (5) and (7), respectively. The model can be formulated as $T_i = \theta_i + e_i$, where the errors, $e_i$, are normally and independently distributed with mean zero and variance $\sigma_i^2$ [$e_i \sim N(0, \sigma_i^2)$]. The conditional variance represents the within-study variability, that is, the variability produced by random sampling.

In turn, the parametric effect sizes $\theta_i$ pertain to an effect-parameter distribution with mean $\mu_\theta$ and unconditional variance τ². So, every $\theta_i$ parameter can be defined as $\theta_i = \mu_\theta + u_i$, where it is usually assumed that the errors $u_i$ are normally and independently distributed with mean zero and variance τ² [$u_i \sim N(0, \tau^2)$]. The unconditional variance, τ², represents the extent of true heterogeneity among the study effects produced by the influence of an innumerable number of substantive (e.g., type of treatment, characteristics of the subjects, setting, etc.) and methodological (e.g., type of design, attrition, sample size, random versus non-random assignment, etc.) characteristics of the studies (Lipsey, 1994). Therefore, the random-effects model can be formulated as (Hedges & Vevea, 1998; Overton, 1998; Raudenbush, 1994):

$$T_i = \mu_\theta + u_i + e_i, \tag{8}$$

where the errors $u_i$ and $e_i$ represent the two variability sources affecting the effect estimates, $T_i$, quantified by the between-studies, τ², and within-study, $\sigma_i^2$, variances. Therefore, the effect estimates $T_i$ will be normally and independently distributed with mean $\mu_\theta$ and variance $\tau^2 + \sigma_i^2$ [$T_i \sim N(\mu_\theta, \tau^2 + \sigma_i^2)$].
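One way to see what equation (8) implies is to draw effect estimates from it. The sketch below assumes a common, known within-study variance for every study and arbitrary parameter values (all hypothetical). With many draws, the sample mean and variance of the simulated estimates should land near the model mean and near the sum of the two variance components.

```python
import random

def simulate_effects(mu_theta, tau2, within_variances, rng):
    """Draw one effect estimate per study from T_i = mu_theta + u_i + e_i."""
    estimates = []
    for sigma2_i in within_variances:
        u_i = rng.gauss(0.0, tau2 ** 0.5)       # between-studies deviation
        e_i = rng.gauss(0.0, sigma2_i ** 0.5)   # within-study sampling error
        estimates.append(mu_theta + u_i + e_i)
    return estimates

rng = random.Random(2006)   # arbitrary seed, for reproducibility only
within = [0.08] * 20000     # hypothetical common within-study variance
draws = simulate_effects(mu_theta=0.5, tau2=0.04, within_variances=within, rng=rng)

mean = sum(draws) / len(draws)
variance = sum((t - mean) ** 2 for t in draws) / (len(draws) - 1)
print(f"mean = {mean:.3f}, variance = {variance:.3f}")  # expected near 0.5 and 0.04 + 0.08
```

Setting `tau2=0.0` in the same call reproduces the fixed-effects special case, in which only sampling error separates the estimates.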


When there is no true heterogeneity among the effect estimates, then the between-studies variance is zero (τ² = 0), and there will only be variability due to sampling error, which is represented in the model by the conditional within-study variance, $\sigma_i^2$. In this case, all the studies estimate one parametric effect size, $\theta_i = \theta$, and the statistical model simplifies to $T_i = \theta + e_i$, thus becoming a fixed-effects model. So, the fixed-effects model can be considered as a particular case of the random-effects model when there is no between-studies variability and, as a consequence, the effect estimates, $T_i$, are only affected by sampling error, $\sigma_i^2$, following a normal distribution with mean θ (being in this case $\theta = \mu_\theta$) and variance $\sigma_i^2$ [$T_i \sim N(\theta, \sigma_i^2)$] for large sample sizes.

Assessing the extent of heterogeneity in a meta-analysis helps to decide which of the two models is the most plausible and this decision affects, at least, the weighting factor used to obtain an average effect size. The usual estimate of a mean effect size consists of weighting every effect estimate, $T_i$, by its inverse variance, $w_i$:

$$\bar{T} = \frac{\sum_i w_i T_i}{\sum_i w_i}. \tag{9}$$

In a fixed-effects model, the weighting factor for the ith study is estimated by $w_i = 1/\hat{\sigma}_i^2$. In a random-effects model, the weights are estimated by $w_i = 1/(\hat{\tau}^2 + \hat{\sigma}_i^2)$. For the d and g indices the estimated within-study variances, $\hat{\sigma}_i^2$, are defined in equations (5) and (7), respectively. A commonly used estimator of the between-studies variance, τ², is an estimator based on the method of moments proposed by DerSimonian and Laird (1986):

$$\hat{\tau}^2 = \begin{cases} \dfrac{Q - (k - 1)}{c} & \text{for } Q > (k - 1) \\[1.5ex] 0 & \text{for } Q \le (k - 1) \end{cases} \tag{10}$$

with c being

$$c = \sum_i w_i - \frac{\sum_i w_i^2}{\sum_i w_i}, \tag{11}$$

where $w_i$ is the weighting factor for the ith study assuming a fixed-effects model ($w_i = 1/\hat{\sigma}_i^2$), k is the number of studies, and Q is the statistical test for heterogeneity proposed by Cochran (1954) and defined in equation (12). To avoid negative values for $\hat{\tau}^2$ when Q ≤ (k – 1), $\hat{\tau}^2$ is equated to 0. Note that due to this truncation, $\hat{\tau}^2$ is a biased estimator for τ².
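The weighting scheme of equation (9) and the method-of-moments estimator of equations (10) and (11) can be sketched as follows. The effect sizes and variances are invented for illustration, and the function names are assumptions rather than part of any established library.

```python
def weighted_mean(effects, weights):
    """Weighted mean effect size, equation (9)."""
    return sum(w * t for w, t in zip(weights, effects)) / sum(weights)

def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments estimate of tau^2, equations (10) and (11)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                     # fixed-effects weights
    t_bar = weighted_mean(effects, w)
    q = sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, effects))  # equation (12)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)       # equation (11)
    return max(0.0, (q - (k - 1)) / c)                   # truncated at zero

# Hypothetical effect sizes and within-study variances (not from the paper).
effects = [0.10, 0.30, 0.35, 0.65, 0.90]
variances = [0.04, 0.05, 0.03, 0.06, 0.05]

tau2 = dersimonian_laird_tau2(effects, variances)
fixed_mean = weighted_mean(effects, [1.0 / v for v in variances])
random_mean = weighted_mean(effects, [1.0 / (tau2 + v) for v in variances])
print(f"tau2 = {tau2:.4f}, fixed = {fixed_mean:.4f}, random = {random_mean:.4f}")
```

Adding the same estimated between-studies variance to every study's variance makes the random-effects weights more nearly equal than the fixed-effects weights, so large studies dominate the average less.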

Assessing heterogeneity in meta-analysis

Quantifying the extent of heterogeneity among a collection of studies is one of the most

troublesome aspects of a meta-analysis. It is important because it can affect the decision

about the statistical model to be selected, fixed- or random-effects. On the other hand, if

significant variability is found, potential moderator variables can be sought to explain

this variability.

The between-studies variance, τ², is the parameter in the statistical model that mainly represents the true (substantive, clinical) heterogeneity among the true effects of the studies. Therefore, a good procedure for determining whether there is true heterogeneity among a collection of studies should be positively correlated with τ². At the same time, it should not be affected by the number of studies, and should be scale-free in order to be comparable among meta-analyses that have applied different effect-size indices.

The statistical test usually applied in meta-analysis for determining whether there is true heterogeneity among the studies’ effects is the Q test, proposed by Cochran (1954) and defined as (Hedges & Olkin, 1985, p. 123, eq. 25):

$$Q = \sum_i w_i \left(T_i - \bar{T}\right)^2, \tag{12}$$

where $w_i$ is the weighting factor for the ith study assuming a fixed-effects model, and $\bar{T}$ is defined in equation (9). If we assume that the conditional within-study variances, $\sigma_i^2$, are known,³ then under the null hypothesis of homogeneity ($H_0: \delta_1 = \delta_2 = \dots = \delta_k$; or also $H_0: \tau^2 = 0$), the Q statistic has a chi-square distribution with k – 1 degrees of freedom. Thus, Q values higher than the critical point for a given significance level (α) enable us to reject the null hypothesis and conclude that there is statistically significant between-study variation.
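A rough sketch of the Q test follows. To stay within the Python standard library, the chi-square upper-tail probability is computed with a hand-written series for the incomplete gamma function. This helper is an assumption of the sketch, adequate for the moderate Q values typical of small meta-analyses, and a tested routine from a statistical package would normally be preferred.

```python
import math

def cochran_q(effects, variances):
    """Cochran's Q with fixed-effects weights, equation (12)."""
    w = [1.0 / v for v in variances]
    t_bar = sum(wi * ti for wi, ti in zip(w, effects)) / sum(w)
    return sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, effects))

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the lower incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= half / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(half) - half - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def q_test(effects, variances, alpha=0.05):
    """Return (Q, df, p, reject) for the homogeneity hypothesis."""
    q = cochran_q(effects, variances)
    df = len(effects) - 1
    p = chi2_sf(q, df)
    return q, df, p, p < alpha

# Q values reported later in the paper's example: Q(2) = 11.647 and Q(4) = 11.931.
print(round(chi2_sf(11.647, 2), 3))  # 0.003, as reported
print(round(chi2_sf(11.931, 4), 3))  # 0.018, as reported
```

The two printed probabilities agree with the p values the paper reports for its example, which is a useful check on the series.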

One problem with the Q statistic is that its statistical power depends on the number of studies, with power being very low or very high for a small or a large number of studies, respectively. To solve the problems of the Q statistic and the non-comparability of the between-studies variance, τ², among meta-analyses with different effect-size metrics, Higgins and Thompson (2002) have recently proposed the I² index. The I² index quantifies the extent of heterogeneity from a collection of effect sizes by comparing the Q value to its expected value assuming homogeneity, that is, to its degrees of freedom (df = k – 1):

$$I^2 = \begin{cases} \dfrac{Q - (k - 1)}{Q} \times 100\% & \text{for } Q > (k - 1) \\[1.5ex] 0 & \text{for } Q \le (k - 1) \end{cases} \tag{13}$$

When the Q statistic is smaller than its degrees of freedom, then I² is truncated to zero. The I² index can easily be interpreted as a percentage of heterogeneity, that is, the part of total variation that is due to between-studies variance, $\hat{\tau}^2$. Therefore, there is a direct relationship between $\hat{\tau}^2$ and I² that can be formalized from equations (10) and (13) as

$$I^2 = \frac{\hat{\tau}^2\, c}{Q}. \tag{14}$$
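Equations (13) and (14) translate directly into code. The sketch below uses the Q values that the paper reports for its rehabilitation-program example (CBT with k = 3, TC with k = 5). The function names are illustrative, and equation (14) is expressed as a percentage so that both functions share one scale.

```python
def i_squared(q, k):
    """I^2 as a percentage, equation (13); zero when Q <= k - 1."""
    df = k - 1
    return 0.0 if q <= df else 100.0 * (q - df) / q

def i_squared_from_tau2(tau2_hat, c, q):
    """Equation (14) as a percentage: 100 * c * tau2_hat / Q."""
    return 100.0 * c * tau2_hat / q

print(f"CBT: {i_squared(11.647, 3):.1f}%")  # 82.8%
print(f"TC:  {i_squared(11.931, 5):.1f}%")  # 66.5%
```

Both routes give the same number whenever the between-studies variance estimate is the untruncated value from equation (10), because substituting it into equation (14) cancels c.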

³ In practice, the population within-study variances will never be known, so they will have to be estimated

from the sample data. For example, equations (5) and (7) are used to estimate the within-study variances

for d and g indices.


To show this relation empirically, Figure 1 presents the results of a simulation, assuming a random-effects model with δ = 0.5, k = 50, an average sample size N = 50 ($n_E = n_C$ for every study), and manipulating the parametric between-studies variance, τ², with values from 0.0 to 0.45, and 5 replications per condition. Figure 1 represents the obtained values of $\hat{\tau}^2$ and I² for every replication. So, for the manipulated conditions, $\hat{\tau}^2$ values around 0.025, 0.05, and 0.15 correspond to I² values of 25%, 50%, and 75%, respectively. Further, note that beyond a certain value of τ² there is relatively little increase in I². In particular, I² values higher than 85% will subsequently increase only slightly even if the between-studies variance increases substantially. Therefore, the I² index seems particularly useful in describing heterogeneity in a meta-analysis with a medium-to-low between-studies variance, and not so useful for large τ² values.
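The flattening of I² for large τ² can be explored with a rough simulation in the spirit of Figure 1. This is not the authors' procedure. It draws d values from a normal approximation instead of simulating raw data, and it reads "N = 50" as 25 participants per group, which is an assumption. Since I² behaves roughly like τ²/(τ² + σ²) for a typical within-study variance σ² (Higgins & Thompson, 2002), it should level off as τ² grows.

```python
import random

def simulate_meta(tau2, k=50, n_e=25, n_c=25, delta=0.5, rng=random):
    """Return (tau2_hat, I2 in percent) for one simulated meta-analysis."""
    base = (n_e + n_c) / (n_e * n_c)
    effects, variances = [], []
    for _ in range(k):
        theta = rng.gauss(delta, tau2 ** 0.5)               # true study effect
        true_var = base + theta ** 2 / (2 * (n_e + n_c))    # equation (5) at theta
        d = rng.gauss(theta, true_var ** 0.5)               # normal approximation
        effects.append(d)
        variances.append(base + d ** 2 / (2 * (n_e + n_c))) # estimated variance
    w = [1.0 / v for v in variances]
    t_bar = sum(wi * ti for wi, ti in zip(w, effects)) / sum(w)
    q = sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    df = k - 1
    tau2_hat = max(0.0, (q - df) / c)
    i2 = 0.0 if q <= df else 100.0 * (q - df) / q
    return tau2_hat, i2

rng = random.Random(1)  # arbitrary seed
for tau2 in (0.0, 0.05, 0.15, 0.45):
    tau2_hat, i2 = simulate_meta(tau2, rng=rng)
    print(f"tau2 = {tau2:.2f}  tau2_hat = {tau2_hat:.3f}  I2 = {i2:.1f}%")
```

Single replications are noisy, so averaging several runs per condition gives a clearer picture of the curve.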

Higgins and Thompson (2002) have also developed a confidence interval for I². The interval is formulated by calculating another of their proposed measures of heterogeneity, the H² index obtained by (Higgins & Thompson, 2002, p. 1545, eq. 6),

$$H^2 = \frac{Q}{k - 1}, \tag{15}$$

also known as Birge’s ratio (Birge, 1932). Then they define I² in terms of H² by means of (Higgins & Thompson, 2002, p. 1546, eq. 10),

$$I^2 = \frac{H^2 - 1}{H^2} \times 100\%. \tag{16}$$

This allows us to express inferences about H² in terms of I². For practical application, Higgins and Thompson (2002, p. 1549) recommend a confidence interval for the natural logarithm of H, ln(H), assuming a standard normal distribution, that involves the Q statistic and k, given by

$$\exp\left\{\ln(H) \pm |z_{\alpha/2}|\,\mathrm{SE}[\ln(H)]\right\}, \tag{17}$$

where $|z_{\alpha/2}|$ is the absolute value of the (α/2) quantile of the standard normal distribution, and SE[ln(H)] is the standard error of ln(H) and is estimated by

$$\mathrm{SE}[\ln(H)] = \begin{cases} \dfrac{1}{2}\,\dfrac{\ln(Q) - \ln(k - 1)}{\sqrt{2Q} - \sqrt{2k - 3}} & \text{if } Q > k \\[2.5ex] \sqrt{\dfrac{1}{2(k - 2)}\left(1 - \dfrac{1}{3(k - 2)^2}\right)} & \text{if } Q \le k \end{cases} \tag{18}$$

The confidence limits obtained by equation (17) are in terms of the H index. Consequently, they can be easily translated into the I² metric by applying equation (16) to both confidence limits.
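The interval can be assembled from equations (15) to (18) as in the sketch below. The function name is hypothetical, limits that fall below H = 1 are truncated to I² = 0, and the guards on k and Q are additions of this sketch (the standard error formula is undefined for k = 2, and ln(H) for Q = 0).

```python
import math
from statistics import NormalDist

def i2_confidence_interval(q, k, alpha=0.05):
    """Return (I2, lower, upper) in percent via the ln(H) interval."""
    if k < 3 or q <= 0:
        raise ValueError("this sketch needs k >= 3 and Q > 0")
    h = math.sqrt(q / (k - 1))                            # equation (15)
    if q > k:                                             # equation (18), first case
        se = 0.5 * (math.log(q) - math.log(k - 1)) / (
            math.sqrt(2 * q) - math.sqrt(2 * k - 3))
    else:                                                 # equation (18), second case
        se = math.sqrt((1 - 1 / (3 * (k - 2) ** 2)) / (2 * (k - 2)))
    z = NormalDist().inv_cdf(1 - alpha / 2)               # |z_{alpha/2}|

    def to_i2(h_value):
        # Equation (16), truncated at zero when H <= 1.
        return max(0.0, 100.0 * (h_value ** 2 - 1) / h_value ** 2)

    lower_h = math.exp(math.log(h) - z * se)              # equation (17)
    upper_h = math.exp(math.log(h) + z * se)
    return to_i2(h), to_i2(lower_h), to_i2(upper_h)

# Q values from the paper's example: CBT (k = 3) and TC (k = 5).
for label, q, k in (("CBT", 11.647, 3), ("TC", 11.931, 5)):
    i2, lo, hi = i2_confidence_interval(q, k)
    print(f"{label}: I2 = {i2:.1f}%, 95% CI = [{lo:.1f}%, {hi:.1f}%]")
```

A lower limit of exactly 0 corresponds to an interval that contains the homogeneity value, which is the decision rule described earlier for the I² confidence interval.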

An example will help to illustrate the calculations for the Q statistic and the I² index. Figure 2 presents some of the results of a meta-analysis about the effectiveness of delinquent rehabilitation programs (Redondo, Sánchez-Meca, & Garrido, 1999). In particular, Figure 2 presents the results of eight studies that compared a control group with one of two different correctional programs: three studies that compared a control group with a cognitive-behavioral treatment (CBT) and five studies that compared a control group with a therapeutic community program (TC). The comparisons were measured by the d index as defined by equation (2). The purpose of the example is to illustrate the problems of the Q statistic and how the I² index is able to solve them.

As Figure 2 shows, the forest plots for the two groups of studies (the three studies for CBT and those for TC) reflect high heterogeneity in both cases, but heterogeneity is more pronounced for CBT studies than for TC studies. In fact, the estimated between-studies variance, $\hat{\tau}^2$, for CBT is clearly higher than for TC (0.24 and 0.06, respectively). However, the Q statistic is very similar and statistically significant in both cases [CBT: Q(2) = 11.647, p = .003; TC: Q(4) = 11.931, p = .018]. Thus, a direct comparison of the two Q values is not justified because their degrees of freedom differ, and can erroneously lead to the conclusion that the two groups of studies are similarly heterogeneous. But if we calculate the I² index for both groups, then differences in the extent of heterogeneity are clearly apparent: whereas CBT studies present an I² value of 82.8%, implying high heterogeneity, the TC studies present an I² value of medium size (66.5%). Thus, the I² index has been able to reflect differences in the degree of