
Reading Comprehension and Its Underlying Components in Second-Language Learners: A Meta-Analysis of Studies Comparing First- and Second-Language Learners


We report a systematic meta-analytic review of studies comparing reading comprehension and its underlying components (language comprehension, decoding, and phonological awareness) in first- and second-language learners. The review included 82 studies, and 576 effect sizes were calculated for reading comprehension and underlying components. Key findings were that, compared to first-language learners, second-language learners display a medium-sized deficit in reading comprehension (pooled effect size d = -0.62), a large deficit in language comprehension (pooled effect size d = -1.12), but only small differences in phonological awareness (pooled effect size d = -0.08) and decoding (pooled effect size d = -0.12). A moderator analysis showed that characteristics related to the type of reading comprehension test reliably explained the variation in the differences in reading comprehension between first- and second-language learners. For language comprehension, studies of samples from low socioeconomic backgrounds and samples where only the first language was used at home generated the largest group differences in favor of first-language learners. Test characteristics and study origin reliably contributed to the variations between the studies of language comprehension. For decoding, Canadian studies showed group differences in favor of second-language learners, whereas the opposite was the case for U.S. studies. Regarding implications, unless specific decoding problems are detected, interventions that aim to ameliorate reading comprehension problems among second-language learners should focus on language comprehension skills. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
Monica Melby-Lervåg and Arne Lervåg
University of Oslo
Keywords: reading comprehension, bilingual development, language comprehension, decoding
Supplemental materials: http://dx.doi.org/10.1037/a0033890.supp
In 2008, 21% (or 10.9 million) of children and youths between
five and 17 years of age in the United States spoke a language
other than English at home (National Center for Education Statis-
tics, 2011). The number of second-language learners in school has
steadily increased over the last decades both in the United States
(National Center for Education Statistics, 2011) and in Europe
(Organization for Economic Co-operation and Development
[OECD] Reviews of Migrant Education, 2009). Given the large
number of second-language learners, it is particularly concerning
that these students have higher dropout rates and poorer educa-
tional outcomes than their monolingual first-language learner
counterparts. In 2009, the dropout rate for foreign-born students in
the United States was 21%, and the dropout rate for children born
to foreign-born parents was 13%, whereas the national average
was only 8.4% (Child Trends Databank, 2011). The situation is
similar in European countries (OECD Reviews of Migrant Educa-
tion, 2009), where large-scale international comparative studies
have shown that second-language learners demonstrate poorer
learning outcomes in school than do first-language learners (e.g.,
Institute for Employment Studies, 2004; OECD, 2004).
A salient predictive factor for educational outcomes in most
school subjects is reading comprehension (e.g., OECD, 2000). As
the amount of text presented in all school subjects increases with
each grade level, children who possess poor reading comprehen-
sion skills will struggle academically throughout their education.
Poor reading comprehension skills can therefore be an important
cause of lower academic success. In this study, we aim to increase
our understanding of reading comprehension and its underlying
skills in second-language learners when compared to first-language
learners. Numerous prior studies indicate that decoding (the
process of accurately and fluently translating print into spoken
words or units), phonological awareness (the ability to manipulate
the sounds in spoken words), and language comprehension (the
ability to understand the meaning of words and sentences in
language) are crucial antecedents for reading comprehension (for a
review, see National Institute for Literacy, 2008).
This article was published Online First August 12, 2013.
Monica Melby-Lervåg, Department of Special Needs Education, University
of Oslo, Oslo, Norway; Arne Lervåg, Department of Educational
Research, University of Oslo.
The study was funded by the Norwegian Research Council, Grant
Education2020.
Correspondence concerning this article should be addressed to Arne
Lervåg, Department of Educational Research, University of Oslo, P.O. Box
1092 Blindern 0317, Oslo, Norway. E-mail: a.o.lervag@ped.uio.no
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Psychological Bulletin © 2013 American Psychological Association
2014, Vol. 140, No. 2, 409–433 0033-2909/14/$12.00 DOI: 10.1037/a0033890
We present a meta-analysis of the differences and similarities
between second-language learners and first-language learners in
terms of reading comprehension and its underlying skills (i.e.,
language comprehension, decoding, and phonological awareness).
As a background for the meta-analysis, we first present a narrative
overview of prior studies and reviews on the typical development
of reading comprehension as well as reading comprehension
among second-language learners. In the meta-analysis, we first
examine reading comprehension and then examine each of the
underlying components (i.e., language comprehension, decoding,
and phonological awareness). Our overall purpose in the review is
twofold. First, we seek to identify important information regarding
the areas of strengths and challenges for second-language learners
compared with first-language learners. Second, we aim to uncover
factors that may explain the performance differences between the
two groups and to determine the conditions under which the two
groups perform at a similar level. Detecting factors that may
explain this variation is crucial for understanding what affects the
reading comprehension level and underlying skills of second-
language learners. Taken together, this knowledge is of vital
importance when examining theoretical claims in the area of
literacy development among second-language learners and when
providing effective instruction and targeted interventions for them.
Providing effective instruction for the large number of second-
language learners is a challenge not only for children learning
English as a second language. Studies conducted by the European
Union (EU) and the OECD (see EU European Commission’s
Directorate-General for Education and Culture, 2008; OECD Re-
views of Migrant Education, 2009) reveal that challenges in coun-
tries and regions such as the Netherlands, France, or Scandinavia
related to second-language learners in school are very similar to
those demonstrated in Britain and also to some extent the United
States. Thus, in our meta-analysis, we include studies in which
second-language learners are learning either English or other Eu-
ropean languages as their second language.
The Development of Reading Comprehension
Reading Comprehension Development in
First-Language Learners
Individual differences in reading comprehension are often un-
derstood as the product of decoding and language comprehension
skills (Gough & Tunmer, 1986). Decoding refers to the process of
translating print into spoken words or units. It is often measured by
tests in which the children are asked to decipher a printed word or
a nonsense word into a pronounced unit (Hulme & Snowling,
2009). Fluency can be seen as a part of the decoding process and
refers to the degree of automatization of the decoding. A fluent
reader is able to read orally with speed, accuracy, and proper
expression (National Institute of Child Health and Human Devel-
opment [NICHD], 2000). Language comprehension is the ability
to attribute semantic meaning to spoken words, often measured by
tests of vocabulary (in this study, vocabulary refers to oral vocab-
ulary and not reading vocabulary), word definitions, or listening
comprehension (Gough & Tunmer, 1986). Numerous studies con-
sistently reveal that these two skills explain the majority of indi-
vidual differences in reading comprehension (for a review, see the
National Institute for Literacy, 2008). As noted by Snow and Kim
(2007), the area of language comprehension is a large problem
space; compared to phonological awareness and decoding, vocab-
ulary acquisition is the more difficult task.
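The product relation attributed above to Gough and Tunmer (1986) can be written compactly. In the sketch below, the symbols RC, D, and LC are shorthand chosen here for reading comprehension, decoding, and language comprehension, and the assumption that each component is scaled from 0 to 1 is an illustrative one; none of this notation is taken from the present article.

```latex
RC = D \times LC, \qquad 0 \le D \le 1, \quad 0 \le LC \le 1
```

Because the relation is multiplicative rather than additive, a value near zero on either component implies reading comprehension near zero, however strong the other component is.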
Various studies of first-language learners have shown that the
skills underlying the development of reading comprehension begin
to evolve in early childhood, long before children receive formal
reading instruction in school (for a review, see National Institute
for Literacy, 2008). Furthermore, the relative importance of de-
coding and language comprehension in explaining differences in
reading comprehension has been shown to change during the
developmental course of schooling. In the early school years,
much of the variation in reading comprehension is explained by
individual differences in accuracy and the fluency of decoding
words and texts. As children become older, their decoding skills
are automatized, and more resources are allocated to comprehension
(Lervåg & Aukrust, 2010; NICHD, 2000; NICHD Early Child
Care Research Network, 2005; Roth, Speece, & Cooper, 2002;
Schatschneider, Fletcher, Francis, Carlson, & Foorman, 2004;
Storch & Whitehurst, 2002). After the early primary school years,
language comprehension gradually accounts for a larger propor-
tion of individual differences in reading comprehension. For this
reason, the sample age is a crucial factor in explaining the results
of studies that compare reading comprehension between first- and
second-language learners; it will subsequently serve as a moder-
ator in our meta-analysis.
Reading Comprehension Development in
Second-Language Learners
Throughout the years, various theoretical claims have been
made concerning the nature of reading comprehension develop-
ment among second-language learners. These theoretical perspec-
tives underlie much extant research and are important for under-
standing what might moderate the differences between first- and
second-language learners. One influential theoretical account is
based on Cummins (1979). In this view, because of a common
underlying language proficiency, second-language learner status
can enhance second-language literacy skills because of the trans-
ference of skills from the first language. In addition, the develop-
ment of second-language skills is moderated by socioeconomic
status. Children from a higher socioeconomic background are
more likely to use context-independent language at home that
corresponds with the schooling language. This expansive use of
language will presumably facilitate language transfer and lead to
smaller group differences when second-language learners are com-
pared with first-language learners. Thus, socioeconomic back-
ground will also serve as a moderator in our meta-analysis.
A second influential theory is based on the notion of contrastive
analysis (Connor, 1996; Odlin, 1989). Within this perspective, the
first and second languages are analyzed for the purpose of identi-
fying structural (e.g., related to phonology, syntax, or semantics)
similarities and differences (Odlin, 1986), which can either facil-
itate or impede the acquisition of the second language. The degree
of structural similarity between the first and second languages may
affect the size of the group differences in reading comprehension
and underlying skills between first- and second-language learners.
Hence, structural similarities between language 1 (L1) and lan-
guage 2 (L2) constitute an important variable that will be exam-
ined as a moderator. This idea can also be applied to the similar-
ities and differences between writing systems. In other words,
learning to read a second orthography that is based on the same
principles for converting print to sound (e.g., the alphabetic prin-
ciple) as the first orthography should be easier than learning a
second orthography that uses a different principle for the conversion
(e.g., ideographic). Therefore, orthography is another potentially
important moderator factor.
The third theoretical perspective is the so-called time on task
hypothesis (Porter, 1990). Here, the time spent learning the first
language may have a negative impact on the learner’s second-language
skills. Consistent with this view, because learning a new language
depends on exposure to that language, an emphasis on the first
language at home and at school can negatively affect second-
language learning. According to this perspective, first-language
skills do not have a positive impact on second-language skills.
Consequently, children who use both the first and the second
language at home should have better second-language skills than
children who use only their first language at home. Likewise,
children who are instructed in the second language only at school
should have better second-language skills than children who are
instructed in both languages. Thus, language used at home and in
instruction is a potentially important additional moderator variable.
Previous single studies. Various studies using different re-
search designs have compared reading comprehension skills be-
tween first- and second-language learners. In concurrent studies,
there is substantial variation in the size of the group differences in
reading comprehension. Although some studies show that second-
language learners perform at the same level as (or better than)
first-language learners on reading comprehension tests (Chiappe,
Glaeser, & Ferko, 2007; Lesaux, Rupp, & Siegel, 2007; Verhoeven
& Vermeer, 2006), others indicate that second-language learners
perform worse than their first-language-learner counterparts (Hannon
& McNally, 1986; Kovelman, Baker, & Petitto, 2008; Lervåg
& Aukrust, 2010).
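Group differences of the kind described above are usually expressed as standardized mean differences. The following is a minimal sketch, not the authors' procedure: the function names, the example means, standard deviations, and sample sizes, and the use of fixed-effect (inverse-variance) pooling are all assumptions made for illustration. The sign convention (second-language mean minus first-language mean, so that negative values indicate a second-language deficit) is inferred from the effect sizes reported in the abstract.

```python
import math


def cohens_d(mean_l2, sd_l2, n_l2, mean_l1, sd_l1, n_l1):
    """Standardized mean difference (L2 minus L1) based on the pooled SD."""
    pooled_var = ((n_l2 - 1) * sd_l2 ** 2 + (n_l1 - 1) * sd_l1 ** 2) / (n_l2 + n_l1 - 2)
    return (mean_l2 - mean_l1) / math.sqrt(pooled_var)


def d_variance(d, n_l2, n_l1):
    """Common large-sample approximation of the sampling variance of d."""
    return (n_l2 + n_l1) / (n_l2 * n_l1) + d ** 2 / (2 * (n_l2 + n_l1))


def pooled_fixed_effect(effects):
    """Inverse-variance weighted mean of (d, variance) pairs."""
    weights = [1.0 / variance for _, variance in effects]
    weighted_sum = sum(w * d for w, (d, _) in zip(weights, effects))
    return weighted_sum / sum(weights)


# Invented numbers for two hypothetical studies (not data from the review).
d1 = cohens_d(mean_l2=45.0, sd_l2=10.0, n_l2=50, mean_l1=50.0, sd_l1=10.0, n_l1=50)
d2 = cohens_d(mean_l2=50.0, sd_l2=10.0, n_l2=50, mean_l1=50.0, sd_l1=10.0, n_l1=50)

effects = [(d1, d_variance(d1, 50, 50)), (d2, d_variance(d2, 50, 50))]
pooled = pooled_fixed_effect(effects)

print(f"study 1 d = {d1:.2f}, study 2 d = {d2:.2f}, pooled d = {pooled:.3f}")
```

A random-effects model, which allows the true effect to vary between studies, would weight the studies somewhat differently; the fixed-effect version is used here only because it is the shortest to write down.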
Longitudinal studies of reading comprehension development are
crucial for understanding the factors that cause this large variation
in results. Unfortunately, only a few studies that examine reading
comprehension in second-language learners have followed the
same group of children across time. In one such study, Lervåg and
Aukrust (2010) showed that for both first- and second-language
learners (Urdu as L1 and Norwegian as L2), only language com-
prehension, and not decoding skills, explains the increase in read-
ing comprehension skills from the middle of second to the end of
third grade beyond mother’s educational level and nonverbal abil-
ities. They also found that language comprehension (vocabulary)
was a particularly strong predictor among second-language learn-
ers compared to first-language learners. The limitations in lan-
guage comprehension skills among the second-language learners
were sufficient to explain the gap between the two groups in
reading comprehension. The differences between the two groups
with respect to reading comprehension were large at the onset of
the study, and the gap between the two groups increased for both
measures of reading comprehension (Woodcock Reading Mastery
Tests—Revised and Neale Analysis of Reading Ability) during the
study period.
Droop and Verhoeven (2003) demonstrated that for both
first- and second-language learners (Moroccan and Turkish as
L1, Dutch as L2), decoding and language comprehension ex-
plained variation in reading comprehension skills in third grade,
but the influence of decoding skills ceased by the end of fourth
grade. As for group differences, there was a large gap between
the groups in favor of the first-language learners at the onset of
the study, but the gap decreased, increased, or remained stable
across measures of reading comprehension. Similarly, in a
study by Verhoeven (2000) of children starting at age 6, language
comprehension had a greater impact on reading comprehension
among second-language learners (mixed L1 languages,
Dutch as L2) than among first-language learners. Also, whereas
the reading comprehension level of second-language learners
was similar to that of first-language learners at the study’s
onset, the level increased for one measure and decreased for
another measure during the period of the study. On some tests,
children approached the ceiling on reading comprehension; as a
result, few differences could be found.
Moreover, Hutchinson, Whiteley, Smith, and Connors (2003)
found that second-language learners (mixed Arabic languages
as L1, English as L2) demonstrated poorer reading comprehen-
sion skills than first-language learners in second grade. This
trend remained stable through the fourth grade. In addition,
earlier language comprehension skills were more important for
reading comprehension among second-language learners than
among first-language learners. Nakamoto, Lindsey, and Manis
(2007) showed that phonological processing and language com-
prehension explained the growth in reading comprehension
from the first throughout the sixth grade in a manner consistent
with the findings in studies of first-language learners. As for
group differences, second-language learners (Spanish as L1,
English as L2) started to lag behind beginning in third grade,
and this gap increased through fifth grade. Hacquebord (1994)
found a similar increase in group differences during secondary
school (Turkish as L1; Dutch as L2). Further, Lesaux et al.
(2007) found a similar predictive pattern for first- and second-
language learners. With respect to group differences, Lesaux et
al. found that in fourth grade, differences in reading compre-
hension between first- and second-language learners (mixed as
L1; English as L2) were negligible.
Overall, longitudinal studies of first- and second-language learn-
ers confirm the pattern that phonological awareness, decoding, and
language comprehension skills are crucial in predicting later read-
ing comprehension, but the majority of studies indicate that lan-
guage comprehension seems to be even more important for
second-language learners than for first-language learners. How-
ever, when the size of the differences in reading comprehension
between first- and second-language learners and how these differ-
ences change over time are considered, results from extant longi-
tudinal studies are inconsistent.
Prior reviews and meta-analyses. Lesaux, Koda, Siegel, and
Shanahan (2006) conducted a narrative review of reading compre-
hension in second-language learners. They concluded that the
general tendency is that the second-language learners performed
less well on reading comprehension when compared with first-
language learners. Lesaux et al. further suggested that the factors
that influence reading comprehension among second-language
learners generally fall into two categories: contextual, such as the
learners’ socioeconomic background and the type of reading com-
prehension test; or individual, such as word-reading skills and
background knowledge.
Additionally, a meta-analysis by Melby-Lervåg and Lervåg
(2011) examined the cross-linguistic transfer of reading compre-
hension, language comprehension, decoding, and phonological
awareness. For decoding and phonological awareness skills,
second-language learners can benefit from the transference of
skills used in their first language, thereby reducing the size of the
group difference for these skills between first-language learners
and second-language learners. However, for language comprehen-
sion, the transfer in skills from the first to the second language is
very small.
Development of Language Comprehension Skills
Development of Language Comprehension Skills in
First-Language Learners
The acquisition of language comprehension skills is an essential
aspect of child development, as these skills are crucial both for
individual reasoning and for communicating with others. It has
been estimated that high school graduates must know the meaning
of approximately 75,000 English words; to accomplish this feat,
they will have to learn an average of 10 to 12 words per day
between the ages of 2 and 17 (Snow & Kim, 2007). Achieving this
goal is clearly a complex and multiple-layered task. In addition,
studies have shown that the rank-ordering of children according to
their language comprehension skills remains nearly unchanged
from the age of 4 to the fourth grade, suggesting that the skills that
underlie language learning are formed at an early age (Gathercole,
Willis, Emslie, & Baddeley, 1992; Lervåg & Aukrust, 2010;
Melby-Lervåg, Lervåg, et al., 2012; Storch & Whitehurst, 2002).
It is also well established that being raised in an impoverished
environment leads to poorer overall outcomes with respect to
language comprehension than being raised in a middle- or upper-
class context (Hart & Risley, 1995; for a review, see Hoff, 2006).
A significant number of studies have shown that joint book reading
and exposure to literature enhance children’s language skills (Mol
& Bus, 2011) and that children from lower socioeconomic back-
grounds are typically less exposed to such experiences (Hoff,
2006). In addition, parents of high socioeconomic status (SES)
typically talk more often to their children, use a more elaborate
vocabulary, and engage their children more often in context-
independent conversation than do parents with lower socioeco-
nomic backgrounds (Hart & Risley, 1995; Hoff, 2006; Pan, Rowe,
Singer, & Snow, 2005). Thus, the weight of evidence suggests that
SES affects the quality and the quantity of the language to which
children are exposed. SES is thus an important variable that may
affect the results of studies comparing reading and language com-
prehension among first- and second-language learners, and it will
be used as a moderator variable in the subsequent meta-analysis.
Development of Language Comprehension Skills
Among Second-Language Learners
Prior single studies. Although some cross-sectional studies
show that second-language learners have language comprehension
skills similar to those of first-language learners (Bialystok, Shen-
field, & Codd, 2000; D’Angiulli, Siegel, & Sierra, 2001; Westman,
Korkman, Mickos, & Byring, 2008), the majority of studies dem-
onstrate that second-language learners have much poorer language
comprehension skills than first-language learners (e.g., Droop &
Verhoeven, 2003; Scruggs, Mastropieri, & Argulewicz, 1983).
This finding is not surprising, given that second-language learners
begin at a disadvantage in terms of their language comprehension
skills. Even if they can derive benefits when their first language
shares semantic features (cognates) with their second language
(see Odlin, 1986), second-language learners must develop lan-
guage comprehension skills at a faster pace if they are to achieve
the same level as first-language learners. Therefore, compared to
first-language learners, second-language learners often demonstrate
a restricted second-language vocabulary.
Longitudinal studies that compare language comprehension be-
tween first-language learners and second-language learners show
diverging results with respect to the gap between language com-
prehension skills and the stability or consistency of this difference
across time. Verhoeven (2000) showed that the group differences
in language comprehension were significant and that they favor
first-language learners at the beginning of first grade; however, by
the end of second grade, the gap between the two groups had
decreased (mixed as L1, Dutch as L2). Notably, according to
Verhoeven (2000), this gap may be due to ceiling effects on the
measure for first-language learners. Droop and Verhoeven (2003)
showed that second-language learners (Moroccan and Turkish as
L1, Dutch as L2) start out with much poorer language comprehension
skills than first-language learners and that from the beginning
of the third grade to the end of the fourth grade, the gap
between the groups increased. Jean and Geva (2009) found that
second-language learners began the fifth grade with less knowl-
edge of word meanings than first-language learners and that this
gap remained stable on a measure of receptive vocabulary but
increased on a measure of root word meanings in the sixth grade
(mixed as L1, English as L2).
Prior reviews and meta-analyses. No prior meta-analyses or
narrative reviews have been conducted on language comprehen-
sion skills among second-language learners. However, Geva
(2006) has conducted a narrative review of the relation between
language comprehension, decoding, and reading comprehension
among second-language learners. The conclusion was that al-
though English language comprehension plays a significant role in
the reading comprehension of second-language learners, multivar-
iate studies suggest that this relation is moderated by contextual
factors such as home language use, SES, and instructional expe-
riences.
Development of Decoding Skills
Development of Decoding Skills in
First-Language Learners
Developmental studies have shown that decoding is a code-
related skill that is heavily influenced by instruction. Overall,
decoding and fluency skills rapidly increase after the onset of
reading instruction and then level off in the early and middle
grades of primary school (Caravolas, Lervåg, Defior, Seidlová-Málková,
& Hulme, 2013; Seymour, Aro, & Erskine, 2003).
Studies of first-language learners reveal that proficient decoding
skills involve both visual and phonological processing (Seidenberg
& McClelland, 1989), in which visual processes activate skills that
allow the reader to link the visual symbol (letter) with the correct
sound. As for phonological processing, both longitudinal and
experimental training studies have shown that, in addition to letter
knowledge, phonological skills are a critical precursor for devel-
oping efficient decoding skills (e.g., de Jong & van der Leij, 1999;
Hulme et al., 2002; Lervåg, Bråten, & Hulme, 2009; Näslund &
Schneider, 1991; National Early Literacy Panel, 2008; Wagner,
Torgesen, & Rashotte, 1994). In particular, the awareness of
phonemes, rather than larger units such as rhymes or syllables,
seems to play a pivotal role in the development of decoding skills
(Castles & Coltheart, 2004; Hulme et al., 2002; Macmillan, 2002;
Melby-Lervåg, Lyster, & Hulme, 2012). It has been argued that
phonological awareness in children progresses from awareness of
larger units (rhymes, syllables) to phoneme awareness and that
phoneme awareness tasks are more difficult than corresponding
tasks with larger units (Carroll, Snowling, Hulme, & Stevenson,
2003; McBride-Chang, 2004).
When considering the differences between alphabetical orthog-
raphies and the development of decoding skills, some have argued
that phoneme awareness is most critical for learning to read in
irregular orthographies, such as English, and that it is of less
importance in more regular orthographies (Aro & Wimmer, 2003;
Seymour et al., 2003; Share, 2008; Wimmer, 1993). In contrast,
others have posited that the predictive relations between phono-
logical skills and reading ability do not show any substantial
differences between English and other, more consistent alphabetic
orthographies (Caravolas et al., 2012; Caravolas, Volin, & Hulme,
2005; Vaessen et al., 2010; Ziegler et al., 2010). There is also a
growing recognition that phonological skills may be involved in
learning to read in nonalphabetic ideographic orthographies, such
as Chinese (Hanley, 2005; Huang & Hanley, 1995; McBride-
Chang et al., 2005). Given the inconclusive results, it is important
to examine how the type of orthography (both in the first and
second languages) affects the results of studies comparing decod-
ing and phonological awareness skills among first- and second-
language learners. Type of orthography will therefore be used as a
moderator in the subsequent meta-analysis.
Development of Decoding Skills Among
Second-Language Learners
Prior single studies. Numerous studies have compared de-
coding and phonological awareness skills between first-language
learners and second-language learners. Theoretically, it has been
hypothesized that because second-language learners have the op-
tion of comparing the structure between two languages, they have
an advantage in developing metalinguistic awareness compared to
first-language learners (see Bialystok, Majumder, & Martin,
2003). However, in studies comparing decoding and phonological
awareness between first- and second-language learners, there are
significant variations in the size of the group differences for
both decoding (e.g., Chiappe et al., 2007; McBride-Chang, Bia-
lystok, Chong, & Li, 2004) and phonological awareness (e.g.,
Bialystok, Luk, & Kwan, 2005; Kovelman, Baker, & Petitto,
2008).
As for the longitudinal studies that have examined growth in
decoding skills among first-language learners compared to second-
language learners, Verhoeven (2000) found that the two groups
began at a similar level that remained stable throughout the study
(mixed as L1, Dutch as L2). A study by Jongejan, Verhoeven, and
Siegel (2007) showed similar findings; that is, first- and second-
language learners (mixed as L1, English as L2) performed at the
same level on both decoding and phonological awareness tasks,
and this similarity remained stable from the first to the fourth
grade. Jongejan et al. also determined that phonological awareness
remains the most important predictor of decoding skills in the third
and fourth grades. Similarly, the study by Hutchinson, Whiteley,
Smith, and Connors (2004) found that the two groups start out at
the same level with respect to phonological awareness and that this
level remained stable from second to sixth grade (mixed Asian
languages as L1, English as L2). In Droop and Verhoeven’s (2003)
study, the results were mixed and dependent on the type of
decoding test. In general, second-language learners (Turkish or
Moroccan as L1, Dutch as L2) were at the same level as first-
language learners from low socioeconomic backgrounds at the
onset of the study. However, the decoding skills of second-
language learners were considerably lower than those of first-
language learners from higher socioeconomic backgrounds. Thus,
sample age can affect the size of the gap between first- and
second-language learners when decoding and phonological aware-
ness are considered, and it will therefore be used as a moderator in
the subsequent meta-analysis.
Prior reviews and meta-analyses. Lesaux et al. (2006) con-
ducted a meta-analysis of 10 studies that compared decoding skills
among first- and second-language learners. The results showed a
minute and nonsignificant difference in favor of second-language
learners (d = -0.09). The results of the 10 studies were homogeneous,
and no analysis examined potential moderators to explain
the differences between studies. However, Lesaux et al. concluded
based on a narrative review that the process of learning to decode
in a second language, as in a first language, is highly influenced by
phonological processing. Because of large variations between the
studies, they concluded that it was not possible to draw any
conclusions regarding group differences in terms of phonological
awareness.
A meta-analysis by Adesope, Lavin, Thompson, and Unger-
leider (2010) found support for the hypothesis of a metalinguistic
advantage for second-language learners. On the basis of 29 studies,
they showed that second-language learners performed significantly
better than first-language learner controls (d = 0.33) in terms of
metalinguistic awareness (a broader construct than phonological
awareness that included measures of language reasoning and
grammatical judgments). There was a wide variation between
studies regarding the size of the group differences, but no moder-
ator analysis was conducted exclusively for metalinguistic aware-
ness.
Measurement of Reading Comprehension and
Underlying Skills
Measurement type can affect the results of the studies that
compare reading comprehension and underlying skills between
first- and second-language learners. For reading comprehension,
different reading comprehension tests often demonstrate modest
intercorrelations, suggesting that the tests are measuring different
factors. Keenan, Betjemann, and Olson (2008) showed that tests
that used a cloze procedure (i.e., the reader is asked to fill in a
missing word in a sentence) relied heavily on decoding skills,
whereas tests that use open-ended questions are more dependent
on language comprehension skills. The role of decoding has also
been related to text length, as tests that used single-sentence or
two-sentence passages proved to be more sensitive to decoding
skills than tests that used longer passages (Francis et al., 2006;
Keenan et al., 2008). Keenan and Betjemann (2006) demonstrated
that a problem with tests using a multiple-choice format was that
the child could answer test questions correctly independent of the
passage. In an analysis of passage-independent items (in the Gray
Oral Reading Test), these items were not sensitive to reading
disability, and the learners’ performance on such items did not
correlate with performances on other reading comprehension tests.
The type of test used to measure reading comprehension is thus an
important factor that can affect the results of studies that compare
reading comprehension between first- and second-language learn-
ers, and it will be used as a moderator in the subsequent meta-
analysis.
With respect to assessing language comprehension, several mea-
sures—including receptive picture vocabulary tests (pointing to
the correct picture after being presented a word), word definition
tests (defining the meaning of a word), and oral cloze tests (orally
filling in a missing word in a sentence)—are commonly used. It
has been suggested that picture vocabulary and definition tests rely
on different skills, as the picture vocabulary tests depend on the
breadth of word knowledge, and word definition tasks depend on
the depth of word knowledge (Ordoñez, Carlo, Snow, & Laughlin,
2002). Others, however, have found no conceptual distinction
between the two types of tests (Vermeer, 2001). Another important
factor is that the alpha reliability of these measure types can differ:
Although picture vocabulary tests often demonstrate high reliabil-
ity, word definition tests tend to show lower reliability (e.g.,
Lervåg & Aukrust, 2010). In a bivariate relationship, unreliability
always attenuates a relationship (Shadish, Cook, & Campbell,
2002), which can mask true group differences between first- and
second-language learners. The oral cloze assessment differs from
the word definition and picture vocabulary tests, because it often
uses a multiple-choice format and the words are presented in a
sentence where the meaning of the word can be guessed from the
context. Such a format may be easier than word definition and
picture vocabulary tests (Pearson, Hiebert, & Kamil, 2007), thus
perhaps reducing the differences between first- and second-
language learners. Thus, task type will serve as a moderator in the
meta-analysis.
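The attenuation argument in this paragraph can be made concrete with the standard result from classical test theory. The block below is a sketch under the usual assumptions (random measurement error that is uncorrelated with true scores); the symbols are introduced here for illustration and do not appear in the original text.

```latex
% r_{xy}:    observed correlation between measures x and y
% \rho_{xy}: correlation between the underlying true scores
% r_{xx}, r_{yy}: reliabilities of the two measures (e.g., alpha)
\begin{align}
  r_{xy} &= \rho_{xy}\,\sqrt{r_{xx}\,r_{yy}} \\
  % Standardized group difference on an outcome with reliability r_{yy},
  % assuming group membership itself is recorded without error:
  d_{\text{observed}} &\approx d_{\text{true}}\,\sqrt{r_{yy}}
\end{align}
```

Under these assumptions, a test with a reliability of .64 would be expected to show only about 80% of the true standardized group difference, because the square root of .64 is .80.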
Decoding is generally assessed by using either
untimed accuracy measures or timed fluency measures based on
the reading of real words or nonwords. Fluency measures are often
used to avoid ceiling effects in more transparent languages, in
which children learn to read at a faster pace than English-speaking
children do (Caravolas et al., 2013). When ceiling effects are
avoided, fluency and accuracy measures using both words and
nonwords tend to be highly reliable and highly correlated (see
Lervåg et al., 2009). On this basis, there is little reason to believe
that the decoding test type should affect the size of the gap
between first- and second-language learners. Still, accuracy versus
fluency tests will be used as a moderator for our later analyses. As
for measures of phonological awareness, it has been suggested that
tests of phoneme awareness (e.g., phoneme deletion) typically
demonstrate higher alpha reliability than tests of rhyme awareness
(e.g., rhyme detection; Muter, Hulme, Snowling, & Stevenson,
2004). This difference can deflate the size of the group variance
between first- and second-language learners regarding rhyme
awareness. Thus, test type will subsequently serve as a moderator
in the meta-analysis.
The Current Study
Based on the narrative review of prior studies, there are several
reasons why a meta-analysis is necessary at this time. First,
no prior meta-analysis has summarized group differences in lan-
guage comprehension, reading comprehension, and phonological
awareness between first- and second-language learners. As the
findings from prior single studies are highly inconsistent, a meta-
analysis that summarizes the differences in language comprehen-
sion skills and examines factors that affect those skills in second-
language learners seems crucial. Second, no meta-analyses have
systematically tested potential explanations for group differences
in reading comprehension and underlying skills between first- and
second-language learners. Such an analysis may shed light on
factors that may explain the size of the group differences between
first- and second-language learners. Also, because single studies
examine children of different ages, a merging of the studies in a
meta-analysis will provide data that can be used to generate
hypotheses concerning developmental relations and the stability of
group differences over time. Given the few longitudinal studies in
this field and their inconsistent results regarding group differences,
using age as a moderating variable can offer important directions
for future research. Finally, the meta-analysis of decoding skills by
Lesaux et al. (2006) must be updated, given that their analysis
concluded in 2002.
Hypotheses
Our meta-analyses examine first- and second-language learners
in relation to four different constructs: reading comprehension,
language comprehension, decoding, and phonological awareness.
First, in our meta-analysis of group differences in reading compre-
hension, we examined whether the moderators of age, socioeconomic
status, home language, instructional language, differences between first
and second language, consistency of first-language orthography,
and test type could explain differences between the studies with
respect to the size of group differences. Second, in our meta-
analysis of studies comparing language comprehension between
first- and second-language learners, we examined the moderators
of age, socioeconomic status, home language, instructional lan-
guage, language type, and test type. Finally, in our meta-analysis
of both decoding and phonological awareness skills, we utilized
the moderators of age, socioeconomic status, home language,
instructional language, writing system in the first language, con-
sistency of first-language orthography, and test type. For all out-
comes, we used nonverbal IQ as a moderator to rule out the
possibility that the group differences would be a function of this
important factor. Further, we examined whether variables related
to methodological quality (year of publication and distributional
characteristics) could explain the differences between studies.
Year of publication is important because it has been demonstrated
that effect sizes in published studies tend to fade and decrease as
a function of time (see Ioannidis, 1998; Jennions & Møller, 2001).
Distributional characteristics are important, as floor effects among
second-language learners could lead to small differences between
the groups (related to overly complex tests), and ceiling effects
could lead to small differences between the groups because of
overly easy tests. Finally, we tested whether study origin (Asia,
Australia, Europe, Canada, or the United States) could explain
variations in the size of group differences between studies. Indeed,
Antecol, Cobb-Clark, and Trejo (2003) found that “Australian and
Canadian immigrants have higher levels of English fluency, edu-
cation, and income (relative to natives) than do U.S. immigrants”
(p. 192). As our introduction suggests, these are issues that might
affect the differences between the reading and language skills of
first- and second-language learners.
On the basis of theory and prior studies, we identify the follow-
ing main hypotheses to test in our meta-analyses. In the hypotheses
below, group differences refer to differences between first- and
second-language learners.
1. As reading comprehension is the product of decoding and
language comprehension (Gough & Tunmer, 1986), we expect that
the size of the mean effect size will fall between that of the studies
pertaining to decoding and language comprehension. The relative
importance of decoding and language comprehension skills for
reading comprehension changes during the course of development
(e.g., NICHD Early Child Care Research Network, 2005). There-
fore, we expect that age will be an important moderating variable
and that the group differences will be smaller for children in their
first years of primary school than the group differences for older
children. We also expect that characteristics related to reading-
comprehension test type (Francis et al., 2006; Keenan & Betje-
mann, 2006; Keenan et al., 2008) will be important in explaining
why the different studies yielded different results.
2. Because of the complexity of language comprehension (Snow
& Kim, 2007) and because second-language learners begin at a
disadvantage (with a limited degree of transference of first-language
skills; Melby-Lervåg & Lervåg, 2011), we expect that
studies examining language comprehension will demonstrate large
group differences. We expect that the size of the group differences
will be moderated by socioeconomic background (Hart & Risley,
1995; Hoff, 2006; Pan et al., 2005), the degree of exposure to the
second language at home and in school (Odlin, 1989; Porter,
1990), and the extent to which their first and second languages
share cognates. Because the child can receive contextual support
from a sentence in oral cloze assessments, we expect this test to
generate smaller group differences than studies that use tests with
no contextual support (i.e., word definitions and picture vocabu-
lary).
3. In the area of phonological awareness and decoding, because
such skills are easily taught and are sensitive to transference from
the first language (Melby-Lervåg & Lervåg, 2011; Snow & Kim,
2007), there will be small group differences that may possibly
favor the second-language learners (Adesope et al., 2010). We
further expect that group differences will be moderated by socio-
economic status (Hart & Risley, 1995; Hoff, 2006; Pan et al.,
2005) and the degree of exposure to the second language at home
and in school (Odlin, 1996; Porter, 1990). We also expect that
second-language learners with an ideographic first language will
have poorer phonological awareness and decoding skills than will
those second-language learners who have an alphabetic first lan-
guage (McBride-Chang et al., 2005). We also expect that group
differences for phonological awareness will be moderated by the
type of test used, as phoneme-awareness tasks are presumably
more difficult than tasks using larger units (Castles & Coltheart,
2004; Hulme et al., 2002; Macmillan, 2002; Melby-Lervåg, Lyster,
& Hulme, 2012).
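Hypothesis 1 rests on the simple view of reading (Gough & Tunmer, 1986). For reference, that model can be written as below. The symbols are chosen here for illustration, and the final comment is only an informal reading of the hypothesis, not a derivation.

```latex
% RC: reading comprehension, D: decoding, LC: language comprehension.
% In Gough and Tunmer's formulation each component ranges from
% 0 (no skill) to 1 (perfect skill).
\begin{equation}
  RC = D \times LC, \qquad 0 \le D \le 1, \quad 0 \le LC \le 1
\end{equation}
% Because RC depends on both components, a group difference in RC is
% expected to lie between the group differences in D and in LC.
```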
Method
To ensure its methodological quality, our meta-analysis was
designed and reported to be consistent with Preferred Reporting
Items for Systematic Reviews and Meta-Analyses (PRISMA) rec-
ommendations (www.prisma-statement.org).
Literature Search, Inclusion Criteria,
and Study Coding
The literature search and inclusion criteria are shown in Figure
1. When we selected studies for the meta-analysis, “second-
language learners” were operationally defined as children/
youths who either use or study two languages. In addition, the
child/youth must be exposed to each language either regularly
at home with at least one parent or in school for at least 4 hours
per day. Control groups with monolingual first-language learn-
ers were defined as samples consisting of children who spoke
only one language at home, which had to be the same as the
instructional language.
Multiple methods were used to obtain a sample of relevant
studies. The electronic database search was conducted by investigators
under the supervision of librarians. Searches were developed from the
keywords bilingual*, L2 learners, second-language learners, English
language learners (ELL), English second language (ESL), English
additional language (EAL), language minority, limited English
proficient (LEP), limited English speaking, and multilingual*, paired
with phon* awareness, vocabulary, language comprehension and reading,
decoding, and word attack.
Search limits included publications in English from 1965 to May 10,
2013. Abstracts for peer-reviewed studies, non-peer-reviewed studies,
book chapters, dissertations, conference proceedings, and reports
were also examined. All issues of International Journal of Bilingual
Education and Bilingualism, Bilingualism, TESOL Quarterly, and
International Journal of Bilingualism after 1980 were hand searched
for relevant papers. Finally, authors who were represented by more
than three independent studies in the meta-analysis were contacted by
e-mail and asked for unpublished or in-press material.
The target constructs in this study were reading comprehension,
language comprehension, phonological awareness, and decoding.
For each of these constructs, criteria were established to determine
the types of measures that represented each. The criteria estab-
lished for the indicators of each construct were broad, and a broad
range of tests were judged as valid indicators for the target con-
structs to increase the power of the overall analysis. Because the
criteria for the indicators were broad, the differences between test
types for each construct were also examined.
To be considered a measure of reading comprehension, a test had to
require the child to read a passage or sentence and answer questions
in relation to the text.
To be considered a measure of language comprehension, a test had to
aim to measure expressive or receptive vocabulary by means
of pictures, oral cloze, or listening comprehension.
The reason we used this broad language comprehension construct
that also included listening comprehension was to increase the
power of the meta-analysis. Treating vocabulary and listening
comprehension as a single construct is also supported in a latent-variable
study by Lervåg (2010), which showed that after measurement
errors were taken into account by using latent variables,
listening comprehension was highly related to expressive and
receptive vocabulary (in 7-year-olds). Furthermore, they all loaded
on the same factor. To be considered as a measure of phonological
awareness, the task must involve deletion, blending, counting,
segmentation, generation, judgment, position analysis, or replacement
of phonemes, onsets, rhymes, and/or syllables in words. To be
considered as a decoding measure, the test should comprise read-
ing fluency and/or reading accuracy of words, nonwords, sentence
decoding, or passage decoding.
The abstracts from all search types were printed and judged
according to relevance, and papers that seemingly met the criteria
for inclusion based on the abstract were examined to determine
whether sufficient statistics for an effect-size calculation were
presented and to decide whether all inclusion criteria were met.
This process resulted in the coding of a total of 82 studies with 160
independent group comparisons that included 15,137 second-
language learners and 111,418 monolingual first-language learn-
ers.
Violating the assumption of independence by computing an
overall effect size based on information from the same sample
more than once can lead to incorrect estimates (Hunter & Schmidt,
1990). Thus, several considerations were made before the studies
were coded. First, studies from the same author were examined to
detect duplicate samples. When it was not possible to determine
whether samples were dependent, independence was assumed. For
longitudinal studies, information from only one time point was
coded. Because of attrition, the first time point usually provides the
largest sample and was therefore the preferred sample for coding.
An exception to this practice was if the longitudinal study began
before children had received formal reading instruction. In such
cases, the time point at which the children were first measured on
reading and reading-related measures was coded. For experimental
studies, only pretest data prior to any intervention were coded. In
the analysis, each construct (i.e., reading comprehension, language
comprehension, phonological awareness, and decoding) was analyzed
separately, and the overall effect sizes were estimated across
each type of construct.
Figure 1. Flow diagram for the search and inclusion of studies. L1 = first language; L2 = second language.
Search features: (a) electronic database searches (ERIC, Medline, PsycARTICLES, ProQuest dissertations, PsycINFO); (b) citation searches and scanning of reference lists; (c) hand searches of journals that specialize in bilingual research; (d) searches of prior meta-analyses and narrative reviews; (e) Google Scholar; and (f) contacts with researchers in the field by e-mail and requests for unpublished or in-press materials.
Inclusion criteria: Included studies must (a) report original empirical data based on direct tests (not teacher/parent rating scales or surveys) of phonological awareness, decoding, language comprehension, and/or reading comprehension; (b) use a design in which the above skills in L2 learners are compared with a monolingual L1 control group of the same chronological age; (c) report sample size, mean, and standard deviation on any of the above measures for L2 learners and L1 controls; (d) have a mean sample age below the age of 18, and no reported learning disabilities or samples based on foreign language learners; and (e) be published in English.
Screening: Records after duplicates removed (n = 2,890); abstracts screened (n = 2,890); abstracts excluded (n = 2,067).
Eligibility: Full-text articles assessed for eligibility (n = 823); full-text articles excluded (n = 741). Reasons: did not contain empirical data on any of the target measures; did not report sufficient data for effect size calculation; foreign language learners.
Included: Studies included in meta-analysis (n = 82).
With respect to independence, special considerations were made
concerning the coding of the measures for each of the four con-
structs, given that some studies reported multiple measures for each of
these constructs. Therefore, one indicator was coded for each construct
based on an established set of guidelines. For reading comprehen-
sion, individual tests were coded before group tests, and open-
ended tests were coded before multiple choice tests. For language
comprehension, picture vocabulary tests were coded before other
measurement types. For phonological awareness, phoneme-based
measures were coded before other types of phonological aware-
ness measures, such as awareness for larger units (i.e., rhymes or
syllables) or composite scores. If a study reported several
phoneme-based measures, phoneme deletion was chosen. For de-
coding, real-word reading was coded before nonword reading, and
single word decoding was coded before passage reading.
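The selection guidelines in this paragraph amount to a priority ordering within each construct. The following Python sketch shows one way such a rule could be expressed. The class, the feature labels (such as "open_ended"), and the order in which two criteria for the same construct are combined are assumptions made for illustration; they are not the authors' coding scheme.

```python
from dataclasses import dataclass

# Hypothetical sketch of the "one indicator per construct" rule.
# Feature labels and the way two criteria are combined for a single
# construct are illustrative assumptions, not taken from the paper.


@dataclass(frozen=True)
class Measure:
    name: str
    construct: str
    features: frozenset


def _prefer(features, label):
    """Return 0 when the preferred feature is present, 1 otherwise."""
    return 0 if label in features else 1


def priority(measure):
    """Smaller tuples are coded first; criteria follow the order in the text."""
    f = measure.features
    if measure.construct == "reading_comprehension":
        return (_prefer(f, "individual"), _prefer(f, "open_ended"))
    if measure.construct == "language_comprehension":
        return (_prefer(f, "picture_vocabulary"),)
    if measure.construct == "phonological_awareness":
        return (_prefer(f, "phoneme"), _prefer(f, "deletion"))
    if measure.construct == "decoding":
        return (_prefer(f, "real_word"), _prefer(f, "single_word"))
    raise ValueError(f"unknown construct: {measure.construct}")


def select_indicator(measures, construct):
    """Pick the single measure to code for one construct, or None."""
    candidates = [m for m in measures if m.construct == construct]
    # min() keeps the first-listed measure when priorities tie.
    return min(candidates, key=priority) if candidates else None


# Made-up example of measures reported by a single study.
reported = [
    Measure("rhyme detection", "phonological_awareness", frozenset({"rhyme"})),
    Measure("phoneme blending", "phonological_awareness", frozenset({"phoneme"})),
    Measure("phoneme deletion", "phonological_awareness",
            frozenset({"phoneme", "deletion"})),
    Measure("nonword reading", "decoding", frozenset({"nonword", "single_word"})),
    Measure("word reading", "decoding", frozenset({"real_word", "single_word"})),
]

chosen = select_indicator(reported, "phonological_awareness")
print(chosen.name)
```

Expressing the rule as a sort key keeps the ordering explicit and makes it easy to check that exactly one indicator per construct enters the analysis.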
The coding was conducted by the authors and one assistant.
Independent double coding was used for a random sample of 30%
of the studies. Before coding, the coder was trained in the proce-
dures and the criteria. The coder was a full-time employed research
assistant, with a master’s degree in education, who was trained in
meta-analyses. Intercoder correlation (Pearson’s) for the main
outcomes (i.e., reading comprehension, language comprehension,
phonological awareness, and decoding) was r = .99 with an
agreement rate of 89%. Also, intercoder correlation for continuous
moderator variables was r = .97 with an agreement rate of 90%.
Cohen’s kappa was used for categorical moderator variables and
was κ = .93. Disagreements were resolved through discussion or
by consulting the original paper.
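For readers unfamiliar with the three agreement indices named above, the following Python sketch computes each of them from two coders' ratings using the standard textbook formulas. The function names and the example ratings are hypothetical and are not data from the meta-analysis.

```python
from collections import Counter
from math import sqrt


def percent_agreement(coder_a, coder_b):
    """Proportion of items on which both coders gave the same code."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement for categorical codes.

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected by chance from each coder's marginal proportions.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def pearson_r(x, y):
    """Pearson correlation between two coders' continuous codes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up SES codes from two coders for six studies.
coder_1 = ["low", "low", "mixed", "middle/high", "low", "mixed"]
coder_2 = ["low", "low", "mixed", "middle/high", "mixed", "mixed"]

print(round(percent_agreement(coder_1, coder_2), 3))
print(round(cohens_kappa(coder_1, coder_2), 3))
```

Kappa is lower than raw agreement in this example because some matches would be expected by chance alone, given how often each coder used each category.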
Moderator Variables
We conducted a broad coding of a large number of moderators
that could potentially be important for explaining variations be-
tween studies. In addition to the moderators used in the analysis
(listed below), the age of second-language acquisition, length of
residence in the host country for children and parents, parental
second-language fluency, and motivational aspects were coded as
moderator variables. However, the impact of these variables could
not be analyzed, as too few studies (five) reported data on any of
these variables. As for methodological quality, publication status,
sampling method, and alpha reliability were coded. However, none
of these variables could be used as moderators, because only five
studies (with different outcomes) reported information about alpha
reliability. Also, despite special efforts to locate unpublished lit-
erature, only three studies that fulfilled the inclusion criteria could
be classified as such. Furthermore, because the vast majority of
studies used convenience sampling, this factor could not be used as
a moderator. Finally, the mean and standard deviation of language
comprehension skills in the first language and the mean and
standard deviation for language comprehension skills in the second
language were coded as moderators. Because of uncertainty and a
lack of information in the original papers as to whether the first
and second language tests used psychometrically comparable
scales, it was not meaningful to calculate an effect size for differ-
ences in first- and second-language competence in the second-
language learners.
Age. The mean ages of the second-language learners and the
first-language control children were coded. Studies that reported
information regarding only age range and in which the age range
exceeded 2 years were excluded from the age moderator analysis.
In cases where the study reported age within the range of 1 year,
the median in years was coded. When studies reported age accord-
ing to grade level, the median year that corresponded to the
reported grade was coded.
Nonverbal IQ. The means and standard deviations for non-
verbal IQ reported for each group in the original paper were coded.
Orthographic regularity. The degree of regularity between
letter–sound relationships was used as a moderator, and languages
were separated into two categories based on the degree: regular
orthographies or irregular (English).
Language differences. The differences between the first and
second language among second-language learners were coded into
two categories: (a) Indo-European first language/Indo-European
second language and (b) non-Indo-European first language/Indo-
European second language. Indo-European languages are broadly
a family of related languages that share cognates and a common
origin. The Indo-European languages include most languages of
Europe, the Middle East, and India and are distinct from a number
of unrelated language families that predominate elsewhere in the
world (Crystal, 1997).
Writing system. The writing system was coded into two
categories: alphabetic writing system or ideographic writing sys-
tem.
Instructional language. The language of instruction was
coded into two categories: (a) instruction in the second language
and (b) instruction in both first and second languages.
Home language. Home language was coded into two catego-
ries: (a) first language was the only language used with parents and
(b) use of first language was with one parent only; use of second
language was with the other parent.
Socioeconomic status. The information on the SES of
second-language learners was separated into four categories: high,
low, middle, or mixed. The coding of SES was based on informa-
tion reported in the papers concerning family or neighborhood
income and/or educational level. In the analysis, due to a small
number of studies in some of the categories, three categories were
used: (a) middle/high, (b) low, and (c) mixed.
Measure type. The test used to measure reading comprehen-
sion, language comprehension, phonological awareness, and de-
coding was coded.
Sample location. The location of the study—whether Asia,
Australia, Europe, the United States, or Canada—was coded.
Methodological quality. The year of publication and the ratio
between the standard deviation and the mean for each outcome
measure were coded as possible indicators of methodological
quality. The ratio between the standard deviation and the mean
(coefficient of variability) was calculated for each study by divid-
ing the standard deviation by the mean and multiplying by 100.
This calculation expresses the standard deviation as a percentage
of the mean. This moderator was used, as there were indications of
non-normal distributions. If the standard deviation was lower than
15% or higher than 75% of the mean, this was coded as an
indication of a non-normal distribution. We would have preferred
to analyze the mean/standard deviation ratio as a continuous vari-
able, but due to non-normality of the distribution this procedure
was not possible. Although this was not ideal from a methodolog-
ical point of view (see Preacher, Rucker, MacCallum, & Nicewan-
der, 2005, for a discussion), the categorization used a cutoff based
on the distribution of the mean/standard deviation ratio across the
studies. Year of publication was also used as an indicator of
methodological quality and a potential source of bias. It has been
demonstrated that the effect sizes in published studies tend to fade
and decrease as a function of time (Ioannidis, 1998; Jennions &
Møller, 2001). This trend is mainly due to publication bias: Studies
with large effect sizes tend to be published more easily and will be
published first, whereas studies with smaller or zero effect sizes
take longer to be published and will be published later.
Figure 2. Forest plot of the overall average effect size for group differences in reading comprehension between second-language learners and monolingual first-language learners, and of the effect size for each study (Cohen’s d, with confidence intervals represented by horizontal lines). Horizontal axis: effect size, from -4.00 to 4.00; negative values favor monolingual L1 learners, and positive values favor L2 learners. L1 = first language; L2 = second language.
Meta-Analytic Procedures
Effect size and heterogeneity. The majority of analyses were
conducted with Comprehensive Meta-Analysis software (Boren-
stein, Hedges, Higgins, & Rothstein, 2005). The analytic proce-
dures included the following steps. First, the effect sizes for the
studies entailing group comparisons were computed separately by
means of Cohen’s d based on Hedges’ formula (Hedges, 1981).
We used this calculation because it is corrected for sample size,
and, therefore, unlike other effect size estimates, it does not tend to
be upwardly biased for small samples. When Cohen’s d is ex-
pressed in positive terms, the second-language learners have better
performance on the test (i.e., a higher group mean) than the
monolingual children. A 95% confidence interval was calculated
for each effect size to examine whether it differed from zero. If
the confidence interval does not include zero, the effect is
statistically significant.
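
The effect size computation described above can be sketched as follows. This is a minimal illustration, assuming the usual pooled-standard-deviation form of Cohen’s d and the common approximation to the Hedges (1981) small-sample correction; the function name and the input values are hypothetical and are not taken from any study in the review.

```python
import math


def hedges_corrected_d(mean_l2, sd_l2, n_l2, mean_l1, sd_l1, n_l1):
    """Standardized mean difference (L2 minus L1) with small-sample correction.

    Returns (d, ci_lower, ci_upper) for a 95% confidence interval.
    A positive d means the second-language group scored higher.
    """
    df = n_l2 + n_l1 - 2
    pooled_sd = math.sqrt(
        ((n_l2 - 1) * sd_l2 ** 2 + (n_l1 - 1) * sd_l1 ** 2) / df
    )
    d_raw = (mean_l2 - mean_l1) / pooled_sd
    # Approximate correction factor for small-sample bias.
    correction = 1 - 3 / (4 * df - 1)
    d = correction * d_raw
    # Large-sample variance of the uncorrected d, then scaled by the correction.
    var_raw = (n_l2 + n_l1) / (n_l2 * n_l1) + d_raw ** 2 / (2 * (n_l2 + n_l1))
    se = correction * math.sqrt(var_raw)
    return d, d - 1.96 * se, d + 1.96 * se


# Hypothetical group statistics (assumed for illustration only).
d, lower, upper = hedges_corrected_d(45.0, 10.0, 30, 51.0, 10.0, 30)
# The effect is significant at the .05 level if the interval excludes zero.
significant = not (lower <= 0.0 <= upper)
```

With these assumed inputs the interval lies entirely below zero, so the hypothetical difference would count as significant in favor of the monolingual group.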
The overall effect size was estimated by calculating a weighted
average of the effect sizes for each outcome construct. The com-
putation of an overall effect size was based on a random effects
model, which rests on the assumption that variations between
studies can be systematic and are therefore not only due to random
error as in the fixed-effect model. Whether the overall effect size
differed from zero was tested with a z test, and a sensitivity
analysis was used to determine the impact of outliers. A sensitivity
analysis allows an adjusted overall effect size to be estimated
after removing studies, one by one, when extreme effect sizes are
detected.
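
A random effects pooled estimate with a z test and a leave-one-out sensitivity check can be sketched as below. This is an illustration only, assuming the DerSimonian and Laird estimator of between-studies variance; it is not the Comprehensive Meta-Analysis implementation, and the function names and the four effect sizes are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random effects mean with its SE, z value, and tau^2."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-studies variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"mean": mean, "se": se, "z": mean / se, "tau2": tau2}


def leave_one_out(effects, variances):
    """Pooled mean re-estimated with each effect size removed in turn."""
    means = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        means.append(random_effects_pool(rest_effects, rest_variances)["mean"])
    return means


# Hypothetical effect sizes and sampling variances (assumed values).
effects = [-0.90, -0.60, -0.50, -0.20]
variances = [0.04, 0.02, 0.05, 0.03]
result = random_effects_pool(effects, variances)
loo_means = leave_one_out(effects, variances)
```

The spread of `loo_means` plays the role of the range of adjusted overall effect sizes reported after removing outliers.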
The Q test of homogeneity was used to examine the variation in
effect sizes between studies (Hedges & Olkin, 1985); I squared (I²)
was used to determine the magnitude of heterogeneity. I² is the
proportion of the total variation between the effect sizes that is
caused by real heterogeneity rather than by chance (Borenstein,
Hedges, Higgins, & Rothstein, 2009).
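
The relation between Q and I² can be made concrete with a short sketch. It assumes the standard definition I² = 100 × (Q - df) / Q, truncated at zero; the function name is hypothetical, and the Q values and degrees of freedom are the ones reported in the Results section for reading comprehension and language comprehension.

```python
def i_squared(q, df):
    """Percentage of total variation attributable to heterogeneity.

    q is the Q statistic and df its degrees of freedom (number of
    effect sizes minus 1). Negative values are truncated at zero.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q and df as reported in the Results section.
reading_i2 = i_squared(416.45, 56)      # reading comprehension
language_i2 = i_squared(1609.93, 123)   # language comprehension
```

Under this definition the two calls give values that round to the reported 86.55 and 92.36.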
Table 1
Number of Effect Sizes, Effect Size, 95% Confidence Interval (CI), Heterogeneity Statistics, Differences in d Between Categories (With
Significance Test), and p Values for Moderators of Reading Comprehension Differences Between First- and
Second-Language Learners

Moderator variable                       k    d         95% CI            I²        Diff. in d p
Socioeconomic status
  Low                                    25   -0.73**   [-0.88, -0.57]    83.98**
  High/middle                            12   -0.53**   [-0.78, -0.27]    72.00**   0.20       .19
Instructional language
  Both L2 and L1                         12   -0.47**   [-0.71, -0.23]    66.93**
  L2                                     34   -0.52**   [-0.62, -0.40]    77.79**   0.05       .74
Home language
  Both                                   15   -0.53**   [-0.70, -0.36]    47.23
  L1                                     18   -0.76**   [-1.08, -0.44]    90.54**   0.23       .21
Orthography L2
  Irregular                              48   -0.63**   [-0.74, -0.52]    87.69**
  Regular                                9    -0.57**   [-0.77, -0.37]    85.25**   0.06       .60
Task type in test
  Cloze                                  34   -0.58**   [-0.78, -0.40]    85.06**
  Single questions to text               15   -0.78**   [-0.93, -0.61]    87.10**
  Multiple choice questions              5    -0.38**   [-0.60, -0.17]    67.23**   0.40       .01**
Text type in test
  Sentence                               28   -0.43**   [-0.60, -0.27]    70.19**
  Passage                                26   -0.78**   [-0.94, -0.62]    91.47**   0.35       .005**
Study origin
  Europe                                 15   -0.63**   [-0.79, -0.47]    80.86**
  United States                          40   -0.63**   [-0.75, -0.50]    87.26**   0          .95
Distribution
  Studies with floor or ceiling effects  30   -0.47**   [-0.59, -0.35]    90.07**
  Other studies                          27   -0.76**   [-0.90, -0.62]    61.98**   0.31       .003**

Note. d = the effect size for subsets of studies belonging to different categories of the moderator variable; k = number of studies;
I² = the proportion of total variation between the effect sizes that is caused by real heterogeneity rather than by chance;
Diff. in d = difference in d between the highest and lowest category; p = significance test of differences between categories (Q test);
L1 = first language; L2 = second language.
* p < .05. ** p < .01.

Moderator variables. In all analyses of moderator variables,
as when estimating an overall effect size, random effects models
were used. For the continuous moderator variables, meta-regression
based on the method of moments for random effects models was used
to predict study outcomes from the moderator variables. In a random
effects regression analysis based on the method of moments (also
known as the DerSimonian and Laird method), each study is weighted
by the inverse of the sum of its within-study variance and the
estimated between-studies variance (Borenstein et al., 2009).
Thus, this procedure will lead to each study being weighted more
evenly in the regression analysis, and the difference in weighting
between small and large studies that is present in a fixed effect
model will be less pronounced. Because we expect true differences
between studies, this method of regression is a more plausible
model than assigning weights to studies only on the basis of their
sample size. The meta-regression was conducted with macros
developed for SPSS (Lipsey & Wilson, 2001; Wilson, 2006). To
determine the strength of the predictors on the study’s outcome,
the percentage of between-study variance explained (R²) was used
as an effect size.
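
The weighting logic of a method-of-moments meta-regression can be sketched for a single continuous moderator. This is a minimal pure-Python illustration, not the SPSS macros used in the review. The function names and example data are hypothetical, the slope is unstandardized (the review reports standardized β), the moderator is assumed to vary across studies, and R² is computed here as the proportional reduction in the estimated between-studies variance, which is an assumption about how "variance explained" is operationalized.

```python
def wls_line(x, y, w):
    """Weighted least-squares intercept and slope for one predictor."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi ** 2 for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 ** 2
    slope = (s0 * sxy - s1 * sy) / det
    intercept = (sy - slope * s1) / s0
    return intercept, slope


def mom_meta_regression(x, y, v):
    """Random effects meta-regression of effect sizes y on moderator x.

    v holds the within-study variances. Returns the slope, the residual
    and total between-studies variance estimates, and an R^2 analogue.
    """
    k = len(y)
    w = [1 / vi for vi in v]
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi ** 2 for wi, xi in zip(w, x))
    t0 = sum(wi ** 2 for wi in w)
    t1 = sum(wi ** 2 * xi for wi, xi in zip(w, x))
    t2 = sum(wi ** 2 * xi ** 2 for wi, xi in zip(w, x))

    # Residual heterogeneity around the fixed effect regression line.
    b0, b1 = wls_line(x, y, w)
    q_res = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    trace = (s2 * t0 - 2 * s1 * t1 + s0 * t2) / (s0 * s2 - s1 ** 2)
    tau2_res = max(0.0, (q_res - (k - 2)) / (s0 - trace))

    # Total heterogeneity without the moderator.
    mean = sum(wi * yi for wi, yi in zip(w, y)) / s0
    q_tot = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))
    tau2_tot = max(0.0, (q_tot - (k - 1)) / (s0 - t0 / s0))

    # Refit with weights 1 / (within-study + between-studies variance).
    w_re = [1 / (vi + tau2_res) for vi in v]
    intercept, slope = wls_line(x, y, w_re)
    r2 = max(0.0, 1 - tau2_res / tau2_tot) if tau2_tot > 0 else 0.0
    return {"intercept": intercept, "slope": slope,
            "tau2_res": tau2_res, "tau2_tot": tau2_tot, "r2": r2}


# Hypothetical data: mean sample age (years) and effect sizes that
# follow age exactly, so the moderator explains all heterogeneity.
age = [5.0, 7.0, 9.0, 11.0, 13.0]
d_values = [-1.0 + 0.05 * a for a in age]
within_var = [0.001] * 5
fit = mom_meta_regression(age, d_values, within_var)
```

Because adding the between-studies variance to every study’s variance shrinks the relative differences between weights, small and large studies end up weighted more evenly than under a fixed effect model.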
For the categorical moderator variables, a Q test was used to
test the effect size differences between subgroups of studies
belonging to different categories in a moderator variable. Un-
fortunately, the number of studies was not sufficient to under-
take multivariate analysis of the categorical moderator vari-
ables. We therefore report results based on multiple one-way
significance tests (Q tests). Because these significance tests are
based on parts of the same data set, the risk of making Type I
errors increases (i.e., concluding that there is a significant
difference between subsets of studies when there is none; see
Pigott, 2012, for discussion). To deal with the elevated risk of
Type I errors due to multiple significance tests, we emphasize the
degree of overlap in confidence intervals between the categories
of each moderator variable rather than the significance tests. The
degree of overlap between the confidence intervals for the mean
effect size of each category yields the same information as a
significance test but is not affected by problems related to
multiple comparisons (Pigott, 2012; Valentine, Pigott, &
Rothstein, 2010).
For the categorical moderator analysis, if the omnibus test was
significant and there were three or more categories, we conducted
post hoc pairwise comparisons. Studies were separated into subsets
based on the categories of the moderator variable. The degree of
difference between the subsets of studies was determined by
comparing the overlap between confidence intervals for effect
sizes generated for each subset of studies and by comparing the
size of Cohen’s d between the study subsets.
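
The two ways of comparing subsets (confidence interval overlap and a Q test between two categories) can be sketched from summary values alone. This is a rough illustration with hypothetical function names: the standard errors are back-calculated from the reported 95% intervals, so the resulting Q and p are approximations and are not expected to reproduce the published test exactly. The example values are the sentence and passage categories of the text type moderator in Table 1, with negative signs indicating a difference in favor of first-language learners.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error implied by a symmetric 95% interval."""
    return (upper - lower) / (2 * z_crit)


def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def q_between_two(d_a, ci_a, d_b, ci_b):
    """Q statistic (1 df) and p value for the difference between two subsets."""
    var_a = se_from_ci(*ci_a) ** 2
    var_b = se_from_ci(*ci_b) ** 2
    q = (d_a - d_b) ** 2 / (var_a + var_b)
    # Upper-tail probability of a chi-square variable with 1 df.
    p = math.erfc(math.sqrt(q / 2))
    return q, p


# Text type in test (Table 1): sentence versus passage reading.
sentence_d, sentence_ci = -0.43, (-0.60, -0.27)
passage_d, passage_ci = -0.78, (-0.94, -0.62)

overlap = intervals_overlap(sentence_ci, passage_ci)
q, p = q_between_two(sentence_d, sentence_ci, passage_d, passage_ci)
```

Here the intervals do not overlap and the approximate Q test points the same way, which is the pattern the overlap criterion is meant to capture without running further significance tests.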
Figure 3. Forest plot of the overall average effect size and the
effect size for each study, for group differences in language
comprehension between second-language learners and monolingual
first-language learners (Cohen’s d, with confidence intervals
represented by horizontal lines). The x-axis shows effect size from
-4.00 to 4.00; negative values favor monolingual L1 learners, and
positive values favor L2 learners. L1 = first language; L2 = second
language.

Publication bias. A funnel plot was used to determine the
presence of publication bias. In the funnel plot, sample size is
plotted on the y-axis, and effect size is plotted on the x-axis. In
the absence of retrieval bias, this plot is expected to form an
inverted funnel. In the presence of bias, the funnel will be
asymmetric. To detect publication bias, we examined funnel plots
for all analyses presented. The trim and fill method (Duval &
Tweedie, 2000) was used to examine the impact of possible missing
studies. The trim and fill method imputes values in the funnel
plot to render it symmetrical and then calculates an estimated
overall effect size.
Missing data. During coding, numerous instances of miss-
ing data became apparent. If the data that were missing were
critical for calculating the effect size for the main outcomes, we
contacted the author. However, this approach was usually un-
successful. If data were missing for moderator variables, the
study was excluded from the moderator analysis for which data
were missing but was included in all moderator analyses for
which data were provided.
Results
Characteristics for each study included in the meta-analysis
are presented in Table S1 in the online supplemental materials.
A correlation matrix of all outcomes and continuous moderator
variables is shown in Table S2 (see supplemental materials).
The group differences between the first- and second-language
samples in reading comprehension, language comprehension,
phonological awareness, and decoding with confidence inter-
vals (CI), overall and for each study, are shown in Figures 2 to
5. The results for the categorical moderator variables are pre-
sented in Tables 1 to 4. The tables show the number of studies
in each category of the moderator variable, the effect sizes with
95% CI for the subsets of studies in each category, and the
differences between the highest and lowest categories with
significance test.
Reading Comprehension
A total of 57 independent effect sizes, including 6,464 second-
language learners (mean sample size = 113.40, SD = 268.64,
range = 17–1,876) and 33,534 monolingual first-language learners
(mean sample size = 588.32, SD = 2,278.98, range = 11–13,436),
examined the differences in reading comprehension between the
two groups. As demonstrated in Figure 2, the overall mean effect
size was moderate in favor of the monolingual first-language
learners, d = -0.62, 95% CI [-0.71, -0.52], and significant,
z(56) = -12.43, p < .01. The effect sizes varied between d = -3.47
and 0.91, and this variation was significant and large,
Q(56) = 416.45, p < .01, I² = 86.55. A sensitivity analysis
showed that after removal of outliers, the overall effect size was
within the range of d = -0.64, 95% CI [-0.73, -0.54] to d = -0.59,
95% CI [-0.68, -0.50]. As for publication bias, the funnel
plot indicated that studies to the left of the mean were missing (i.e.,
studies in which second-language learners compared with first-
language learners had poorer reading comprehension skills than
the overall average effect size). In a trim and fill analysis, four
studies were imputed to the left of the mean, and the adjusted
overall effect size was d = -0.67, 95% CI [-0.77, -0.58].

Table 2
Number of Effect Sizes, Effect Size, 95% Confidence Interval (CI), Heterogeneity Statistics, Differences in d Between Categories (With
Significance Test), and p Values for Moderators of Language Comprehension Differences Between First- and Second-Language
Learners

Moderator variable                           k    d         95% CI            I²        Diff. in d p
Socioeconomic status
  Low                                        42   -1.31**   [-1.55, -1.07]    95.03**
  High/middle                                31   -0.82**   [-1.08, -0.57]    86.81**   0.49       .007**
Instructional language
  Both L2 and L1                             30   -0.95**   [-1.26, -0.65]    89.09**
  L2                                         78   -1.20**   [-1.34, -1.04]    91.61**   0.25       .15
Home language
  Both                                       36   -0.76**   [-0.97, -0.56]    84.41**
  L1                                         45   -1.47**   [-1.69, -1.25]    91.39**   0.71**     .0001**
Language comprehension test type
  Expressive vocabulary test                 11   -1.31**   [-1.63, -0.99]    87.55**
  Oral cloze                                 10   -0.55*    [-0.96, -0.14]    93.56**
  Receptive picture vocabulary               89   -1.14**   [-1.28, -1.00]    87.07**   0.76**     .01**
Language type
  Indo-European L1 and L2                    68   -1.09**   [-1.24, -0.93]    88.78**
  Non-Indo-European L1/Indo-European L2      43   -1.24**   [-1.44, -1.02]    91.90**   0.15       .24
Study origin
  Canada                                     33   -0.73**   [-0.91, -0.55]    83.27**
  United Kingdom                             13   -1.33**   [-1.51, -1.15]    39.16
  Other European countries (non-English L2)  25   -1.59**   [-1.86, -1.32]    94.71**
  United States                              48   -1.10**   [-1.26, -0.95]    82.75**   0.86       .0001**
Distribution
  Studies with floor or ceiling effects      35   -0.95**   [-1.23, -0.64]    92.68**
  Other studies                              89   -1.19**   [-1.32, -1.05]    91.30**   0.24       .13

Note. d = the effect size for subsets of studies belonging to different categories of the moderator variable; k = number of studies;
I² = the proportion of total variation between the effect sizes that is caused by real heterogeneity rather than by chance;
Diff. in d = difference in d between the highest and lowest category; p = significance test of differences between categories (Q test);
L1 = first language; L2 = second language.
* p < .05. ** p < .01.

Figure 4. Forest plot of the overall average effect size and the
effect size for each study, for group differences in phonological
awareness between second-language learners and monolingual
first-language learners (Cohen’s d, with confidence intervals
represented by horizontal lines). The x-axis shows effect size from
-4.00 to 4.00; negative values favor monolingual L1 learners, and
positive values favor L2 learners. L1 = first language; L2 = second
language.

Table 1 shows the results of the moderator analysis. As
shown in Table 1, variables related to text type in tests reliably
explained the variations between the studies, and tests using
passage reading yielded significantly higher effect sizes than
tests using sentence reading. The confidence intervals for the
two text types are nonoverlapping. Similarly, task type reliably
explained the variations between the studies, as the confidence
intervals for multiple choice tests and for single open-ended
questions did not overlap. In short, open-ended question
tests yielded a higher mean effect size than multiple choice
tests. Also, as shown in Table 1, the difference between studies
with floor or ceiling effects and studies with more normal
distributions was significant, with nonoverlapping confidence
intervals. Those studies with floor or ceiling effects in their
reading comprehension measure had attenuated sizes of the
difference between groups.
Language comprehension reliably explained variance in reading
comprehension, β = 0.34, p < .01, k = 46, R² = .12. However,
four outliers demonstrated a large deficit in language
comprehension for second-language learners but a moderate to large
advantage in favor of the second-language learners on reading
comprehension. Presumably, these studies are outliers because they
examine children who are in the very beginning stages of developing
reading skills. After removal of these outliers, language
comprehension explained 30% of the variation in reading
comprehension, β = 0.55, p < .01, k = 42, R² = .30. When age and
language comprehension were analyzed together as predictors of
reading comprehension, age was significant, β = -0.23, p = .03,
k = 45. The total R² for the two variables was 0.17, which means
that the group differences in reading comprehension between first-
and second-language learners (in favor of the first-language
learners) decrease as a function of sample age. Notably, the
influence of language comprehension on reading comprehension
increases as children get older. Decoding significantly explains
variations in reading comprehension skills, β = 0.49, p < .01,
k = 41, R² = .25. When age was combined with decoding skills, age
was not a significant predictor, β = -0.15, p = .21, k = 41. When
language comprehension and decoding skills were entered together
(after removal of the four outliers), the total R² for explaining
variations in reading comprehension skills was 0.46; for decoding,
β = 0.36, p < .01; and for language comprehension, β = 0.52,
p < .01. The moderator variable of age entered alone did not
reliably explain variances in effect sizes across studies,
β = -0.16, p = .10, k = 55, R² = .03.

Table 3
Number of Effect Sizes, Effect Size, 95% Confidence Interval (CI), Heterogeneity Statistics, Differences in d Between Categories (With
Significance Test), and p Values for Moderators of Phonological Awareness Differences Between L1 and L2 Learners

Moderator variable                       k    d         95% CI            I²        Diff. in d p
Socioeconomic status
  Low                                    11   -0.26*    [-0.51, -0.02]    91.41**
  High/middle                            9    -0.23*    [-0.46, -0.02]    62.88**   0.03       .84
Instructional language
  Both L2 and L1                         5    0.24      [-0.28, 0.80]     94.55**
  L2                                     34   -0.01     [-0.15, 0.13]     78.71**   0.25       .64
Home language
  Both                                   8    -0.26     [-0.50, 0.01]     61.72**
  L1                                     16   -0.01     [-0.31, 0.28]     86.80**   0.25       .22
Phonological awareness test
  Rhyme/syllable test                    8    0.29      [-0.05, 0.63]     76.00**
  Phoneme test                           43   -0.14*    [-0.25, -0.02]    82.52**   0.15       .02
Writing system L1
  Alphabetic                             25   -0.07     [-0.26, 0.12]     89.13**
  Ideographic                            12   -0.05     [-0.30, 0.19]     73.52**   0.02       .97
Orthography L2
  Irregular (English L2)                 36   -0.09     [-0.22, 0.04]     81.08**
  Regular (non-English L2)               8    -0.29*    [-0.51, -0.07]    51.50     0.20       .12
Study origin
  Asia                                   8    0.10      [-0.18, 0.38]     72.11**
  Canada                                 23   0.00      [-0.17, 0.19]     78.84**
  Europe                                 12   -0.13     [-0.40, 0.14]     74.78**
  United States                          8    -0.44*    [-0.85, -0.03]    88.52**   0.44       .15
Distribution
  Studies with floor or ceiling effects  5    0.31      [-0.22, 0.84]     80.97**
  Other studies                          46   -0.09     [-0.22, 0.04]     90.73**   0.40       .12

Note. d = the effect size for subsets of studies belonging to different categories of the moderator variable; k = number of studies;
I² = the proportion of total variation between the effect sizes that is caused by real heterogeneity rather than by chance;
Diff. in d = difference in d between the highest and lowest category; p = significance test of differences between categories (Q test);
L1 = first language; L2 = second language.
* p < .05. ** p < .01.

As for methodological quality, no significant impact was detected
related to publication year, β = -0.08, p = .55, k = 57, R² = .01.
Language Comprehension
A total of 124 independent effect sizes consisting of 7,973
second-language learners (mean sample size = 64.29, SD = 181.19,
range = 8–1,876) and 23,345 monolingual first-language learners
(mean sample size = 188.27, SD = 1,210.29, range = 10–13,436)
examined the differences in language comprehension between the two
groups. As shown in Figure 3, the overall mean effect size was
large in favor of first-language learners, d = -1.12,
95% CI [-1.24, -1.00], and significant, z(123) = -18.12, p < .01.
The effect sizes varied between d = -3.20 and 0.85, and this
variation was significant and large, Q(123) = 1,609.93, p < .01,
I² = 92.36. A sensitivity analysis shows that after removing
outliers, the overall effect size was within the range of
d = -1.14, 95% CI [-1.26, -1.02] to d = -1.10, 95% CI [-1.22,
-0.99]. As for publication bias, the funnel plot indicated that
studies on the left side of the mean were missing (i.e., studies in
which second-language learners, compared with first-language
learners, had language comprehension skills poorer than the overall
average effect size). In a trim and fill analysis, 21 studies were
imputed on the left side of the mean, and the adjusted overall
effect size was d = -1.31, 95% CI [-1.45, -1.19].
Table 2 shows the results for the categorical moderator variables.
As shown in Table 2, socioeconomic status was a significant
moderator variable: The deficit in language comprehension relative
to monolingual first-language peers tended to be larger in samples
from low socioeconomic backgrounds than in samples from middle and
high socioeconomic backgrounds. The home language was also found to
be a significant moderator variable, and second-language learners
who used the first language exclusively at home tended to have
poorer language comprehension skills than second-language learners
who used both first language and second language at home. Also,
the difference between language comprehension test types was
significant, but here all the confidence intervals were
overlapping. Finally, study origin was a significant variable, and
pairwise comparisons showed that studies conducted in Canada tended
to demonstrate smaller group differences than did the U.S. studies,
Q(1) = 9.40, p < .01; European studies from non-English countries,
Q(1) = 26.33, p < .0001; and studies from the United Kingdom,
Q(1) = 21.02, p < .0001. European studies from non-English
countries also showed significantly higher mean effect sizes than
the U.S. studies, Q(1) = 9.14, p = .002. Studies from the United
States showed smaller group differences than studies from the
United Kingdom, but this difference did not reach significance,
Q(1) = 9.14, p = .06.
In addition, we conducted a meta-regression to examine whether
group differences are affected by the age of the sample. It was
determined that age had no significant impact on effect size,
β = 0.14, p = .15, k = 124, R² = .02. Thus, the group differences
in language comprehension between monolingual first-language
learners and second-language learners were stable across the age
range (3.5 to 15.5 years). Similarly, nonverbal IQ did not reliably
explain variations between studies, β = 0.41, p = .12, k = 13,
R² = .17. As for methodological quality, no significant impact was
detected according to publication year, β = 0.06, p = .53,
k = 127, R² = .00.
Phonological Awareness
A total of 51 independent effect sizes comprising 7,053 second-
language learners (mean sample size = 138.29, SD = 618.24,
range = 7–4,494) and 77,987 monolingual first-language learners
(mean sample size = 1,529.16, SD = 10,071.15, range = 7–72,716)
examined differences in phonological awareness between the two
groups. As displayed in Figure 4, the overall mean effect size was
small, d = -0.08, 95% CI [-0.18, 0.03], and was not significant,
z(50) = -1.32, p = .19. The effect sizes varied between d = -1.57
and 1.73, and this variation was significant and large,
Q(50) = 283.75, p < .01, I² = 82.37. A sensitivity analysis shows
that after removal of outliers, the overall effect size was within
the range of d = -0.11, 95% CI [-0.21, -0.01] to d = -0.05,
95% CI [-0.16, 0.05]. As for publication bias, the funnel plot
indicated that studies on the right side of the mean were missing
(i.e., studies in which second-language learners compared with
first-language learners had better phonological awareness skills
than the overall average effect size). In a trim and fill
analysis, 11 studies were imputed on the right side of the mean,
and the adjusted overall effect size was d = 0.11, 95% CI [0.02,
0.21].
Table 3 shows the analysis of the moderator variables. As seen
in Table 3, the only moderator variable that showed a reliable
difference between categories on the Q test was phonological
awareness test type, but here the confidence intervals were
overlapping. The impact of age on effect size was analyzed (age
range = 5.0 to 15.0), and the results showed that age could not
reliably explain variations in study outcomes, β = -0.13, p = .22,
k = 50, R² = .02. As for methodological quality, no significant
impact was detected due to publication year, β = -0.32, p = .12,
k = 51, R² = .10.
Decoding
A total of 79 independent effect sizes comprising 8,452
second-language learners (mean sample size = 106.99, SD = 501.28,
range = 4–4,494) and 79,860 monolingual first-language learners
(mean sample size = 1,010.89, SD = 8,122.2, range = 7–72,716)
examined differences in decoding between the two groups. As
demonstrated in Figure 5, the overall mean effect size was small
in favor of first-language learners, d = -0.12, 95% CI [-0.22,
-0.02], but significant, z(78) = -2.40, p = .02. The effect sizes
varied between d = -2.85 and 3.38, and this variation was
significant and large, Q(78) = 587.63, p < .01, I² = 86.73. A
sensitivity analysis shows that after removing outliers, the
overall effect size was within the range of d = -0.15, 95% CI
[-0.24, -0.06] to d = -0.09, 95% CI [-0.19, 0.004]. As for
publication bias, the funnel plot indicated that studies on the
right side of the mean were missing (i.e., studies in which
second-language learners compared with first-language learners had
better decoding skills than the overall average effect size). In a
trim and fill analysis, 27 studies were imputed to the right of
the mean, and the adjusted overall effect size was d = 0.14,
95% CI [0.04, 0.25].
Table 4 shows the results for the analysis of moderator vari-
ables. As apparent from Table 4, the origin of the study was a
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
424
MELBY-LERVÅG AND LERVÅG
significant moderator variable: Studies conducted in Canada reli-
ably showed group differences in favor of second-language learn-
ers, whereas the same was not true for either the U.S. or the
European studies. Pairwise comparisons showed that there was a
significant difference between Canadian studies and U.S. studies,
Q(1) = 17.01, p < .001, and between Canadian and European
studies, Q(1) = 4.76, p = .03. The European studies also demonstrated
significantly smaller group differences in favor of first-language
learners than did the U.S. studies, Q(1) = 14.44, p < .001.
There was no overlap between the confidence intervals. The
difference between low and middle/high socioeconomic background
was also significant, but here the confidence intervals
overlapped. The impact of age on effect size was analyzed (age
range = 5.0 to 15.0 years), and the results showed that age could
not explain variations in study outcomes, β = 0.003, p = .97, k =
75, R² = .00. As for methodological quality, no significant impact
was detected based on the publication year, β = -0.01, p = .90,
k = 79, R² = .00.
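The pairwise Q tests above each have one degree of freedom, so their p values can be recovered without statistical tables. The sketch below assumes that the between-category Q statistic follows a chi-square distribution with 1 degree of freedom under the null hypothesis; the function name and comparison labels are illustrative.

```python
import math


def p_from_q_df1(q: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > q) equals erfc(sqrt(q / 2)).
    """
    return math.erfc(math.sqrt(q / 2.0))


# Pairwise comparisons of study origin reported in the text.
comparisons = [
    ("Canada vs. U.S.", 17.01),
    ("Canada vs. Europe", 4.76),
    ("Europe vs. U.S.", 14.44),
]
for label, q in comparisons:
    print(f"{label}: Q(1) = {q}, p = {p_from_q_df1(q):.5f}")
```

Under this assumption, the two comparisons involving the U.S. studies fall below .001, and the Canada versus Europe comparison rounds to .03, in line with the reported values.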
Discussion
Our empirical review of reading comprehension skills and
their underlying components revealed a number of critical find-
ings concerning the differences and factors that moderate the
differences between first- and second-language learners. First,
first-language learners demonstrated moderately better reading
comprehension skills than did second-language learners. Both
language comprehension and decoding skills moderated this
relationship. Good language and decoding skills were associ-
ated with good reading comprehension skills, and the impact of
language comprehension on reading comprehension increased
with age. Furthermore, the differences between first- and
second-language learners in terms of reading comprehension
were also moderated by test-specific characteristics: For
second-language learners, answering single open-ended questions
was more difficult than answering multiple-choice or cloze
questions. Moreover, answering questions from passages was
more difficult than answering questions from single sentences
for second-language learners. Finally, floor and ceiling effects
on the reading comprehension tests were associated with attenuated
effect sizes for the differences between first- and
second-language learners.
Regarding the components of reading comprehension skills,
first-language learners had better oral language comprehension
skills than second-language learners. This difference was greater
for children from low-SES than from middle- or high-SES families. The
difference was also greater for children who spoke only their first
language at home. Finally, there were less-pronounced differences
between first- and second-language learners in Canada than in
other Western countries.
Overall, second-language learners were slightly worse on decoding
skills than were first-language learners. However,
second-language learners had better decoding skills than first-
language learners in Canada, although they had poorer decoding
skills than first-language learners in the USA. There were no
reliable differences between the first- and second-language learn-
ers on phonological awareness tasks.
[Figure 5: forest plot. Horizontal axis: effect size (Cohen's d), scaled from -4.00 to 4.00; negative values favor monolingual L1 learners and positive values favor L2 learners. Rows list the individual studies and the overall mean effect size.]
Figure 5. Forest plot of overall average effect size for group differences
in decoding between second-language learners and monolingual first-language
learners (Cohen's d, with confidence intervals represented by horizontal
lines) and effect sizes with confidence intervals for each study (Cohen's d,
with confidence intervals represented by horizontal lines). L1 = first
language; L2 = second language.
Reading Comprehension
In accordance with our hypothesis, both language comprehen-
sion and decoding skills moderated the degree to which the reading
comprehension skills of the second-language learner samples
lagged behind those of first-language-learner samples. Samples
with small differences between first- and second-language learners
in language comprehension and decoding skills showed compara-
ble small differences between reading comprehension skills. This
finding is in accordance both with single studies and with the
simple view of reading that sees reading comprehension as the
product of language comprehension and decoding skills. Unex-
pectedly, SES did not moderate the differences between first- and
second-language learners’ reading comprehension skills. At first
glance, this finding is in disagreement with Cummins’ notion that
children of high SES are more likely to be exposed to a rich
language that is less dependent on the here-and-now context,
which in turn should facilitate the transference of skills to the
second language and lead to better second-language skill acquisi-
tion (Cummins, 1979). Such facilitation could then be expected to
decrease the gap between first- and second-language learners. Still,
SES does moderate one of the underlying components of reading
comprehension skills (language comprehension): The gap between
first- and second-language learners was smaller for children of
middle and high SES. Given this finding, the inability of SES to
moderate reading comprehension differences between first- and
second-language learners becomes even more surprising. However,
a closer look at the 38 independent group comparisons of
reading comprehension and SES reveals that 24 of the
comparisons emanate from the study by Cobo-Lewis, Pearson, Eilers,
and Umbel (2002). In this study, all of the children came from the
Spanish-speaking population in Florida, and one may question
whether there are features specific to this study that led to this
unexpected result. It is not entirely clear how the children in
Cobo-Lewis et al. (2002) were separated into SES categories, but
when the mother’s language skills are examined, it appears that
there were small differences between children of low and middle/
high SES. Such small differences may explain why SES does not
seem to affect reading comprehension. Notably, the reading com-
prehension test used in Cobo-Lewis et al. is an oral cloze test based
on sentence reading. As noted, this type of reading comprehension
test tends to be more sensitive to decoding skills than to language
comprehension (Keenan et al., 2008). The test type may also
explain why the differences between the effect sizes of SES on
reading comprehension were small for the two groups. Without the
samples from the study by Cobo-Lewis et al., the number of
samples that reported SES is too few to conduct a moderator
analysis. Thus, more research on how SES relates to second-
language learner reading comprehension is warranted. In future
studies, one should also measure exposure to decontextualized
language or, preferably, decontextualized language skills to deter-
Table 4
Number of Effect Sizes, Effect Size, 95% Confidence Interval (CI), Heterogeneity Statistics, Differences in d Between Categories (With
Significance Test), and p Values for Moderators of Decoding Differences Between First- and Second-Language Learners

Moderator variable      Number of     Effect     95% CI            Heterogeneity   Difference in d     Significance test of
                        effect sizes  size (d)                     (I²)            (highest - lowest   differences between
                        (k)                                                        category)           categories (Q test), p
Socioeconomic status
  Low                   32            -0.30**    [-0.43, -0.19]    78.84**
  High/middle           19            0.14       [-0.26, 0.55]     93.69**         0.44                .04
Instructional language
  Both L2 and L1        22            -0.15      [-0.65, 0.35]     94.99**
  L2                    47            -0.06      [-0.13, 0.02]     40.37**         0.09                .72
Home language
  Both                  21            0.18       [-0.19, 0.54]     92.87**
L1 32 <