Technical report on change in language abilities of students
enrolled in the McMaster English Language Development
(MELD) program
Daniel Schmidtke
McMaster University, Canada
Sadaf Rahmanian
McMaster University, Canada
Anna Moro
McMaster University, Canada
Abstract
Using a large longitudinal database, this report investigated gains in English language skills in a
sample of 340 adult second language learners. All participants were enrolled in the McMaster
English Language Development (MELD) program, a university-level English bridging program at
McMaster University (Ontario, Canada). Two cohorts of students were administered a battery of
English language skill tests at the beginning and end of the 8-month program. The test battery
included assessments of passage reading comprehension, vocabulary knowledge and phonolog-
ical processing. Eye-movements were recorded during the reading comprehension assessment.
The study found across-the-board gains in phonological processing, vocabulary depth, vocabulary
breadth, reading fluency and reading comprehension. In particular, the magnitudes of increase
were substantial for phonological awareness and phonological memory. Furthermore, incoming
vocabulary knowledge and phonological awareness each uniquely contributed to reading compre-
hension and reading fluency outcomes at the end of the program. Results have implications for
English language instruction in university-level bridging programs. The report recommends as-
sessing phonological awareness early in the program in order to identify students who may fall
behind in developing vocabulary knowledge and reading proficiency.
Keywords: ESL instruction, language learning, reading development, eye movements, phono-
logical awareness, individual differences
Corresponding author:
Daniel Schmidtke
Department of Linguistics and Languages
Centre for Advanced Research in Experimental & Applied Linguistics (ARiEL)
McMaster English Language Development Program
L.R. Wilson Hall
McMaster University
1280 Main Street West
Hamilton, Ontario
Canada L8S 4M2
Email: schmiddf@mcmaster.ca
Acknowledgements
For data collection, we are thankful to Semona Basin, Joyce Bauman, Laura Beaudin, Kyla Belis-
ario, Cassandra Chapman, Sebastian Chong, Melda Coşkun, Abhijeet Dhey, Vera Filippava, Nabil
Hawwa, Meliha Horzum, Constance Imbault, Victoria Ivankovic, Alyssa Janes, Katya Kachu-
rina, Daniel Lamanna, Frank Luo, Richard Mah, Kamila Marchwica, Gajuna Mathiyalagan, Anya
Minchenkova, Yasmin Mohamed, Kelly Nisbet, Sadaf Rahmanian, Fareeha Rana, Bryor Snef-
jella, Laura Street, Chantelle Wark and Melek Yilmaz. For logistical and technical support, we
are grateful for the help of Pierre Auguste-Ahrens. We are indebted to Victor Kuperman for
providing infrastructure support.
Executive summary
This longitudinal study tracked the development of English language skills of students enrolled in
the McMaster English Language Development (MELD) program. A battery of English language
tests was administered at the beginning and the end of the program for students enrolled in the
program’s 2017-18 and 2018-19 academic sessions. This battery included tests that measured
phonological processing, vocabulary knowledge, reading fluency, and reading comprehension in
English.
The primary focus of this study is to assess the extent of gains in each of the above skills.
This will provide insight into how much change in these component English language skills can
be expected in an intensive eight-month English language instruction program. The secondary
aim is to identify early predictors of growth in language skills. This knowledge will help educa-
tional practitioners to understand which factors affect English language development and allocate
targeted language support to students as soon as possible.
1. Skill change at the cohort level
When examining gains at the level of the cohort, consistent changes were found in phonological
awareness, phonological memory, and receptive vocabulary knowledge. Across both cohorts, effect
size estimates of developmental shifts in phonological awareness and phonological memory were
in the medium to large range, while effect size estimates of changes in receptive vocabulary depth
were in the small to medium range.
Furthermore, an assessment of vocabulary breadth estimated that, at the beginning of
the program, the average MELD student already knew enough word families to study as
an undergraduate at university. Average vocabulary breadth also exhibited longitudinal
gains, albeit with a small effect size.
2. Individual change in language skills
Measures of within-student change showed that there was large variability in meaningful change
across students within each cohort. Most students demonstrated statistically reliable gains in
measures of phonological memory, phonological awareness and vocabulary knowledge. Changes
in expressive vocabulary exhibited relatively high proportions of deterioration, which is a result
of gross differences in test reliability across different versions of the test. Furthermore, by the
end of the MELD program, very few students fell within the expected age-matched range of the
native-speaker population for receptive and expressive vocabulary.
3. Precursors of growth in language skills
For all assessed skills, students entering with weaker ability in a skill also tended to make the most
gains in that same skill. For instance, a student entering the MELD program with a relatively
small vocabulary size will be most likely to undergo the greatest gains in vocabulary size.
Notably, phonological awareness at the beginning of the MELD program was a strong pre-
dictor of positive growth in phonological memory and vocabulary size. Thus, consistent with the
“Matthew effect” (Stanovich, 1986), students with phonological awareness skills in place at the
beginning of MELD have an advantage in acquiring more vocabulary and developing their phonological
memory, whereas students who enter with relatively weak phonological awareness will struggle to gain
as much as their peers in these skills.
4. Reading development
Both passage reading comprehension and reading speed (as gauged via eye-movement behaviour)
increased significantly through the MELD program. On average, faster readers in the MELD
program tended to comprehend text more poorly than their slower counterparts. This may be a
consequence of “goal-driven” reading. The reading task required participants to answer compre-
hension questions subsequent to the completion of passage reading; therefore, students who read
more slowly tended to adopt a more careful reading strategy.
Lastly, both phonological awareness and vocabulary knowledge are early indicators of post-
MELD reading comprehension and speed. That is, students entering MELD with greater phono-
logical awareness ability and vocabulary knowledge were also better readers by the end of the
program.
Summary and recommendations
Our findings have significant pedagogical implications for the MELD program. What arises from
the results of this report is that phonological processing, and in particular, phonological awareness,
is a foundational skill for growth in the tested language skills of students in the MELD program.
Failure to identify students with weak phonological awareness skills at the outset of the MELD
program could result in these students (i) falling behind their peers in expanding their vocabularies,
and (ii) ending up with poorer text comprehension and reading fluency by the end of the bridging
program. Thus, future iterations of the MELD program should consider ways to quickly assess
phonological awareness at the beginning of the program, and use this data to identify those
students who may need additional targeted support. This additional support may be delivered
through extra-curricular speech and language clinics which are already implemented in the MELD
program.
In addition, this report shows that although gains are made in deepening English vocabulary
knowledge (particularly receptive vocabulary), an overwhelming majority of MELD students fail
to move to within 2 standard deviations of the vocabulary knowledge depth of the average native
English speaker by the end of the program. This observation contrasts with the results of the
Vocabulary Size Test, which show that, in terms of vocabulary breadth, MELD students know
enough word families to be able to competently study at the undergraduate level at university. As
a result, we recommend intensive instruction which strengthens depth of vocabulary knowledge,
such as providing multiple contexts in which a word can be used, or through activities which force
students to process meanings of words in deep and thoughtful ways.
Outlook
Phonological awareness is shown here as an important foundational skill for reading comprehen-
sion and reading fluency development. This finding conforms with the idea that decoding skill,
the ability to convert graphemes to phonemes during reading, is an essential precondition for
recognizing meaning in printed text. Though phonological awareness does not directly tap into
decoding skill, it is well known that both phonological awareness and decoding are highly related
skill sets. Thus, it could be argued that in order to directly address the role of decoding in reading
development, measures of English word and non-word decoding would be a necessary inclusion
to the battery of assessments. Furthermore, as a measure of orthographic word knowledge, we
incorporate a spelling assessment in the test battery. With the inclusion of these assessments,
we hope to build a profile of a MELD student’s ability to link phonemes to graphemes.
Introduction
Proficiency in the language of instruction is essential for international students who use English
as a foreign language (henceforth referred to as EFL students). Since the academic success of
language minority students appears to be critically dependent upon L2 literacy skills (e.g., Daller
& Phelan, 2013; Masrai & Milton, 2017; Trenkic & Warmington, 2019; Yixin & Daller, 2014), it
is valuable to be able to identify the factors which contribute to the development of literacy skills
in this population. The present report addresses this issue by means of a longitudinal study of
students enrolled in an English bridging program. The bridging program is designed to enhance
the English language proficiency of EFL students in preparation for undergraduate university
education. We conducted a longitudinal study in two cohorts of this program with the aim of
examining the extent of change and identifying predictors of change in a number of key English
language skills.
The study population comprises students enrolled in the McMaster English Language Devel-
opment Diploma (MELD) program at McMaster University (Hamilton, Ontario, Canada). The
MELD program is an eight-month full-time intensive program of English language instruction.
The program provides two semesters of English instruction designed for international students
who meet the academic requirements for an undergraduate program but whose overall scores on
the Academic version of the International English Language Testing System (IELTS) assessment
do not meet the university’s English language proficiency threshold.¹
The research questions driving this report are summarized as follows:
1. How much change in English language proficiency is possible in the MELD program?
2. What are the incoming English language skills that facilitate gains in aspects of English
language proficiency?
There are multiple facets of language skills that comprise L2 proficiency. In this study, we
focus on skills which have been shown to predict reading proficiency: phonological awareness
(the ability to manipulate, isolate and synthesize the sound patterns of a language), phonological
memory (the ability to hold onto speech-based information in short-term memory), rapid symbolic
naming (expertise in quickly identifying and naming a series of common stimuli), vocabulary depth
(both productive and receptive knowledge) and vocabulary breadth. Reading comprehension was
assessed by a passage comprehension test, in which students read a series of stories silently and
answered comprehension questions after each passage. Students’ eye-movements were recorded
during the passage reading. Eye-movement measures provide information about the cognitive
processes which underpin reading comprehension, and thus supplement reading comprehension
scores.
The report is structured as follows. In Section 1 we quantify the extent of change in component
language skills of reading proficiency aggregated at the cohort level of MELD. This section
evaluates the average gains in various sub-components of phonological processing and
vocabulary knowledge. In Section 2, we bring into focus within-student change in component
language skills. This section presents measures geared toward determining the magnitude
of change for each individual MELD student. The purpose of the MELD program is to enhance
the English language proficiency of EFL students; it is therefore crucially important
to ascertain the early factors that affect this progression and whether the available data can
give insight into the factors that assist successful student progression. Section 3 addresses this
issue by identifying which incoming skills affect gains in phonological processing and vocabulary
knowledge during the course of the MELD program. Finally, Section 4 examines reading
comprehension and fluency. Here we first probe the degree of change in passage comprehension ability
and reading speed, and the relationship between both measures. In a subsequent analysis, we
identify the contributions of vocabulary knowledge and phonological awareness at the beginning
of MELD on reading comprehension and fluency upon completing MELD.

¹ The threshold is 6.5, and prospective students must obtain a minimum overall IELTS score of 5 to be admitted
to MELD. IELTS is one of the officially recognized English language proficiency qualifications for Canadian higher
education institutions. It is assessed on a 9-band scale, ranging from non-user (band score 1) to expert (band score 9),
with a band score of 6 equivalent to a competent user and a band score of 7 equivalent to a good user.
Participants
We tested two cohorts of MELD students: the 2017-18 cohort and the 2018-19 cohort. Data was
collected from a total of 70 students from the 2017-18 MELD cohort (34 students were female
and 36 students were male). Data was collected from a total of 280 students from the 2018-19
MELD cohort (144 students were female, 135 students were male, and the sex of one participant
was unaccounted for). The mean overall IELTS score aggregated over both student cohorts was
5.85 (SD = 0.76, range = 5-6). A breakdown of incoming IELTS scores is provided in Figure 1.
Across both cohorts, 99% of students were native speakers of Mandarin or Cantonese. Average
age at the beginning of testing was 19 years (SD = 0.76, range = 17-21). All participants had
normal or corrected-to-normal vision, and none had a diagnosed reading or learning disability.
Data collection
Students were administered a battery of tests that assessed their phonological processing, vocab-
ulary knowledge, and reading comprehension. We collected participants’ online eye-movements
during silent reading for the reading comprehension component of the test battery (see Appendix
A for a full descriptive overview of tests and testing procedure). The full battery of tests, in-
cluding eye-tracking, was administered at the beginning of the program (time 1: t1); the same
battery was repeated 7 months later at the end of the program (time 2: t2). Participants were
administered each test individually. For the 2017-18 cohort, all tests were administered within the
same testing session at both t1 and t2. The testing session lasted approximately 2 hours (read-
ing comprehension/eye-tracking, 60 minutes; phonological processing and vocabulary knowledge
tests, 60 minutes). For the 2018-19 cohort, tests at each time-point were split across two separate
45 minute sessions (session one: reading comprehension/eye-tracking and the expressive vocab-
ulary test; session two: phonological processing and receptive vocabulary tests). Participants
received course credit for their participation for each test session at both testing time-points. The
data collection procedure for the 2017-18 MELD student cohort was approved by the McMaster
University Research Ethics Board (protocol 2011-165). Data collection and analysis procedure
for the 2018-19 MELD student cohort was approved by the McMaster University Research Ethics
Board (protocol 2019-0239).
[Figure 1 near here: histograms of incoming IELTS scores for the Listening, Reading, Writing, Speaking,
and Overall bands; x-axis: IELTS score; y-axis: proportion of students (%); bars grouped by cohort
(MELD 17-18 vs. MELD 18-19).]
Figure 1: Distribution of incoming IELTS scores broken down by cohort.
1 Assessment of language skill change at the cohort level
1.1 Standardized language assessments
When examining changes on standardized behavioural assessments (i.e., response to English lan-
guage instruction), a series of paired samples one-tailed t-tests revealed effects of time-point on
all language assessments except expressive vocabulary knowledge (EVT) in both student cohorts
and rapid symbolic naming (CTOPP Rapid Symbolic Naming Composite) in the 2017-18 stu-
dent cohort. The t-tests revealed that phonological awareness (CTOPP Phonological Awareness
Composite), phonological memory (CTOPP Phonological Memory Composite), rapid symbolic
naming (2018-19 cohort only), and receptive vocabulary (PPVT Receptive Vocabulary Score)
scores increased by the end of MELD language instruction (all p < 0.001 after the application of
the Bonferroni adjustment for multiple comparisons). Figure 2 visualizes these effects: an effect is
significant if the error bar representing the 95% confidence interval falls outside of the no change
(dashed red) line.
A comparison of standard effect size estimates across skills (see Cohen’s d in Table 1) indicated
that the greatest significant gain was observed for phonological awareness in the 2018-19 cohort
(d = 1.11). Based on convention, d > .8 qualifies as a large effect size (Sawilowsky, 2009). Next,
significant changes in phonological memory fell within the medium effect size range (2017-18
cohort: d = 0.67; 2018-19 cohort: d = 0.56). Finally, the smallest significant shifts were found
for receptive vocabulary knowledge (2017-18 cohort: d = 0.4; 2018-19 cohort: d = 0.3) and
rapid symbolic naming (2018-19 cohort: d = 0.32).
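For reference, the cohort-level comparisons reported above can be reproduced with a short script of the
following form. This is a minimal sketch assuming two arrays of standard scores aligned by student; SciPy
is used here purely for illustration and is not necessarily the software used for the report's analyses, and
one common paired-samples formulation of Cohen's d is shown since the report does not state which variant
was computed.

import numpy as np
from scipy import stats

def paired_gain(t1_scores, t2_scores, n_comparisons=5):
    """Paired one-tailed t-test and Cohen's d for a pre/post skill measure.

    n_comparisons: number of assessments entering the Bonferroni adjustment
    (a placeholder; set it to the number of tests actually run).
    """
    t1 = np.asarray(t1_scores, dtype=float)
    t2 = np.asarray(t2_scores, dtype=float)
    t_stat, p_two = stats.ttest_rel(t2, t1)              # paired-samples t-test
    p_one = p_two / 2 if t_stat > 0 else 1 - p_two / 2   # one-tailed (gain) p-value
    p_bonf = min(p_one * n_comparisons, 1.0)             # Bonferroni adjustment
    # Cohen's d scaled by the average of the t1 and t2 standard deviations.
    pooled_sd = np.sqrt((t1.std(ddof=1) ** 2 + t2.std(ddof=1) ** 2) / 2)
    d = (t2.mean() - t1.mean()) / pooled_sd
    return t_stat, p_bonf, d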
[Figure 2 near here: mean change between pre- and post-language instruction (with 95% confidence
intervals) for phonological awareness, phonological memory, rapid symbolic naming, expressive
vocabulary, and receptive vocabulary, shown separately for the MELD 2017-2018 and 2018-2019 cohorts.]
Figure 2: Pre-to-post changes in standard scores on language tests by cohort. Positive scores indicate a
score increase, while negative scores indicate a score decrease. Error bars represent the 95% confidence
interval for the mean difference.
Table 1: List of standardized language assessments, with observed standard score means and standard
deviations at pre- and post-language instruction broken down by cohort. Cohen’s d effect sizes are
provided for the comparison of pre- and post-language instruction means, along with pairwise Pearson
correlation coefficients for the relationship between t1 and t2 scores. * indicates the presence of a
significant effect at p < 0.05 after Bonferroni correction for multiple comparisons.
Measure                      Cohort     t1 M (SD)        t2 M (SD)        d        r
Rapid symbolic naming (a)    2017-18    89.34 (15.16)    92.16 (13.66)    0.2      0.72*
                             2018-19    89.67 (12.39)    93.63 (12.51)    0.32*    0.68*
Phonological memory (a)      2017-18    88.03 (13.05)    97.76 (15.84)    0.67*    0.53*
                             2018-19    81.99 (11.06)    88.16 (10.81)    0.56*    0.62*
Phonological awareness (a)   2017-18    85.16 (13.98)    92.3 (12.76)     0.53*    0.44*
                             2018-19    72.17 (12.04)    85.02 (11.12)    1.11*    0.69*
Receptive vocabulary (b)     2017-18    34.2 (12.41)     40.51 (18.3)     0.4*     0.69*
                             2018-19    34.15 (11.46)    37.72 (12.07)    0.3*     0.63*
Expressive vocabulary (c)    2017-18    40.86 (14.06)    43.53 (15.18)    0.18     0.35
                             2018-19    40.87 (11.01)    42.28 (12.41)    0.12     0.26*
n = 70 (2017-18); n = 280 (2018-19)
(a) CTOPP scaled composite score reported.
(b) PPVT scaled score reported.
(c) EVT scaled score reported.
1.2 Vocabulary Size Test
A paired-samples t-test was conducted to compare receptive vocabulary breadth as measured
by the Vocabulary Size Test (VST; Beglar & Nation, 2007) at t1 and t2. This test
was introduced to the test battery in the 2018-19 cohort, and data was collected as a classroom
exercise. As a result of the VST assessment being administered as a controlled classroom exercise,
there was less student attrition compared to other tests, which provided a complete set of t1 and
t2 responses for 318 students. As would be expected, the VST correlates more strongly with
receptive vocabulary (r = .44) than with expressive vocabulary (r = .31).
Vocabulary size estimates were significantly greater at t2 (M = 7775, SD = 1393) compared
to t1 (M = 7592, SD = 1207.03); t(313) = 2.23, p = 0.03. This indicates that, on average,
MELD students had a larger receptive vocabulary size at the end of the program than they did
when they began (Δ = 183). The standardized effect size estimate, d = 0.14, indicates that the
shift in vocabulary size was small. Figure 3 displays the distribution of vocabulary size test scores
for both time-points.
The VST estimates the number of word families an individual knows. Previous research
indicates that approximately 5,000 to 6,000 word families are necessary for undergraduate non-
native speakers of non-European backgrounds to cope with study at an English-speaking university
(Beglar & Nation, 2007). Our results show that the average MELD student is already above this
threshold upon entry into the MELD program, and that this skill improves by the end of the
MELD program.
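As a guide to interpreting these scores: the version of the VST that samples the most frequent 14,000
word families contains 140 items (ten per 1,000-family frequency band), so total vocabulary size is
conventionally estimated as estimated word families ≈ number of correct items × 100 (e.g., 78 correct ≈
7,800 word families). This mapping is an assumption about the test version administered; the report
itself only presents the resulting size estimates.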
Figure 3: Histogram of vocabulary size test scores broken down by pre- and post-MELD instruction.
2 Assessment of individual change in language skills
While the MELD student population may show a change in their mean performance in a language
ability (see section 1.1), mean change does not provide any information about the variability of
response to language instruction within the MELD student population. In other words, there may
have been a significant overall change in a language skill at the cohort-level (see Figure 2 and
Table 1), but at the individual level, some students made larger gains than others. Indeed, some
students may even exhibit skill deterioration post-MELD (either through actual skill decline or
through low motivation at t2).
Pairwise correlations between t1 and t2 can be used to estimate variability in the rate of
change between pre- and post-MELD instruction testing, with weaker correlations signifying high
variability in the rate of individual change. The Pearson correlation coefficients (r) presented
in Table 1 were all positive, indicating that on average, better performance in a skill at t1 was
associated with better performance in that skill at t2. However, the magnitudes of the correlation
coefficients in Table 1 indicate that, in general, the rate of change was variable within each
cohort. Furthermore, correlation coefficients vary across tests, showing that the rate of change
was more variable for some tests compared to others. In particular, the rate of individual change
was most variable for the test of Expressive Vocabulary, as reflected by the small to medium
correlation coefficients for both 2017-18 and 2018-19 MELD cohorts.
2.1 Clinical significance tests
To assess individual change in language skill from the beginning to the end of MELD we computed
two standard measures of clinical significance (Jacobson & Truax, 1991) for each student. These
measures are geared to evaluate treatment efficacy in clinical contexts and define post-intervention
change in an individual as reliable if either of the following two criteria are satisfied:
(a) the magnitude of the individual’s change after intervention is not due to mere chance
(b) the individual has moved to be within the range of the normative population
First, the Reliable Change Index (RCI) is used to assess criterion (a). The RCI is defined as
the difference between an individual’s pre- and post-language instruction scores, divided by the
standard error of the pre-post difference. An RCI exceeding ±1.96 would be unlikely to occur (p
< .05) without an individual undergoing actual change. Therefore, a student with an RCI score
exceeding ±1.96 is considered to have undergone reliable change.
The second measure is the clinical significance cut-off criterion which determines whether
criterion (b) has been satisfied. The cut-off is defined as the test value that is 2 standard deviations
below the age-matched native-speaker population mean. Therefore, if a MELD student reaches
a score greater than this cut-off at t2, the student is assumed to be within the native-speaker
population range after language instruction. Both clinical significance diagnostics were computed
for the skill assessments for which an age-equivalent standard score is provided in the test kit
(i.e., all except the VST).
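The two diagnostics can be made concrete with the following minimal sketch; the reliability coefficient
and the normative mean and SD are placeholders that in practice would come from the relevant test manual,
and this is an illustration of the Jacobson and Truax (1991) formulas rather than the exact computation
used for this report.

import numpy as np

def reliable_change_index(x_t1, x_t2, sd_t1, reliability):
    """Criterion (a): Reliable Change Index (Jacobson & Truax, 1991).

    sd_t1: SD of the pre-test scores; reliability: test reliability (placeholder value).
    """
    se_measurement = sd_t1 * np.sqrt(1 - reliability)
    se_difference = np.sqrt(2) * se_measurement          # SE of the pre-post difference
    return (np.asarray(x_t2) - np.asarray(x_t1)) / se_difference

def clinical_cutoff(norm_mean, norm_sd):
    """Criterion (b): cut-off 2 SD below the age-matched native-speaker population mean."""
    return norm_mean - 2 * norm_sd

# Hypothetical usage with made-up values (standard scores normed to mean 100, SD 15):
rci = reliable_change_index(x_t1=[85, 90], x_t2=[96, 91], sd_t1=13.0, reliability=0.9)
reliably_improved = rci > 1.96
reached_norm_range = np.asarray([96, 91]) > clinical_cutoff(norm_mean=100, norm_sd=15)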
2.2 How to interpret clinical change plots
The scatter plots in Figure 4 visualize the clinical significance results for Rapid Symbolic Naming,
Phonological Memory, Phonological Awareness, Receptive Vocabulary, and Expressive Vocabulary.
In each plot, a data-point represents an individual MELD student’s performance in a skill test,
where pre-MELD (t1) scores (x-axis) are plotted against post-MELD (t2) scores (y-axis). The
solid diagonal line represents the line of zero change. The grey band around the zero change line
represents the confidence interval for the RCI measure, criterion (a). Data-points that lie above
the solid line of zero change and fall outside of the shaded confidence interval area represent
individual MELD students that underwent statistically reliable improvement during the MELD
program, i.e., RCI > 1.96, p < .05. Data-points that lie within the grey RCI confidence interval
represent individual MELD students that demonstrated no statistically reliable improvement or
deterioration from t1 to t2. Data-points that lie below the line of zero change and below the grey RCI
confidence interval represent individual MELD students that show statistically reliable deteriora-
tion from t1 to t2, i.e., RCI < -1.96, p < .05. The horizontal dashed line represents criterion
(b), the cut-off threshold for clinical significance. Any student that falls above this line therefore
reached the normal population range for that given skill at t2.
[Figure 4 near here: five scatter plots (A: Rapid Symbolic Naming; B: Phonological Memory; C: Phonological
Awareness; D: Receptive Vocabulary; E: Expressive Vocabulary) of pre-MELD scores (x-axis) against
post-MELD scores (y-axis), showing the Reliable Change Index band, the clinical significance cut-off line,
and points distinguished by cohort (17-18 vs. 18-19).]
Figure 4: Scatter plots of pre-MELD and post-MELD scores for each language ability. The grey band
shows 95% confidence interval for the Reliable Change Index. The horizontal line shows the cut-off
point for clinically significant change.
2.3 Results summary
2.3.1 Which skills undergo the most reliable change?
Figure 5 visualizes the proportions of reliably improved, improved, deteriorated, and reliably dete-
riorated students (based on the RCI measure) for each skill test, broken down by cohort. Phono-
logical Awareness in the 2018-19 cohort showed the most improvement with 93% of students
either improving or reliably improving. This proportion was greater than in the 2017-18 cohort, which
saw a combined improvement rate (reliable or not) of 73% in Phonological Awareness. Rapid
Symbolic Naming (RSN) shows a large proportion of students undergoing growth in the 2017-18
cohort and 2018-19 cohort, with 54% and 60% of students improving respectively. This observed
increase in the proportion of student improvement from the 2017-18 to 2018-19 iteration of the
MELD program was also observed for all other skill tests except Expressive Vocabulary.
Expressive Vocabulary showed the most deterioration in the 2018-19 cohort, with about half
of students (47%) having seemingly deteriorated by t2. Independent t-tests showed that Form A and
Form B were reliably different at both t1 and t2 (this was not the case for the PPVT). We
conclude that the EVT is unreliable because Forms A and B are not matched.
2.3.2 Which L2 skills end up most native-like?
Whereas the RCI provides information about the size of a student’s leap in English language skill
during the MELD program, the clinical significance cut-off threshold marks where the student
lands by the end of the program. A student who reaches the clinical significance cut-off threshold
for a particular skill by t2 is more likely to be drawn from the age-matched native English speaker
population for that skill. As presented in Figure 6, there was variability across language skills in
the proportion of students who reached the cut-off threshold at t2.

[Figure 5 near here: stacked bars showing the proportion of students (%) classified as reliably improved,
improved, deteriorated, or reliably deteriorated on each language skill (Rapid Naming, Phonological
Memory, Phonological Awareness, Receptive Vocabulary, Expressive Vocabulary), with separate panels for
the MELD 17-18 and 18-19 cohorts.]

Figure 5: Breakdown of reliable change for each language ability by cohort.
The 2017-18 cohort had a greater proportion of students reach the native-like range for
Phonological Memory (67%), Phonological Awareness (61%), Receptive Vocabulary (6%), and
Expressive Vocabulary (3%) than the 2018-19 cohort. In contrast, the proportions of MELD
students reaching clinical significance in these same measures for the 2018-19 cohort were 41%,
36%, 1%, and 2%, respectively. The one measure for which a greater proportion reached the
clinical significance cut-off in the 2018-19 cohort than in the 2017-18 cohort was Rapid Naming,
with proportions of 66% and 60%, respectively.
[Figure 6 near here: bars showing the proportion of students (%) reaching the clinical significance
cut-off for each language skill (Rapid Naming, Phonological Memory, Phonological Awareness, Receptive
Vocabulary, Expressive Vocabulary), with separate panels for the MELD 17-18 and 18-19 cohorts.]
Figure 6: Proportion of students reaching clinical significance cut-offs for each language ability broken
down by cohort.
3 Precursors of growth in language skills
Our goal in this section was to identify which incoming language skills are related to skill change
during the MELD program. In addition to pinpointing which early language skills are relevant
for skill growth during the MELD program, it is also important to determine the nature of the
relationship between t1 scores and skill growth. There are two possible relationships:
1. Negative relationship: Weaker t1 ability in a given skill is associated with greater growth in
a skill.
Implication: There is a boost to growth in language skill for students entering MELD with
poorer ability in a skill.
2. Positive relationship: Greater t1 ability is associated with greater growth in a given skill.
Implication: Students who enter MELD with greater ability in a given skill already in place
are expected to make greater gains in the skill under consideration.
To examine which skills are relevant for growth, and to identify which of the above scenarios
is supported by the data, we conducted a series of regression analyses. We collapsed students
from both cohorts to create one sample (n = 350) and fitted separate linear regression models to
RCI scores for each skill test. We ran models predicting change scores with t1 scores included as
predictor variables, as recommended by Castro-Schilo and Grimm (2018). We did not fit a model
to RCI scores for Expressive Vocabulary given the purported unreliability of the measure due to
form counterbalancing artifacts (see section 2.3). As predictors we included the t1 scores for each
language measure. We applied a backward elimination procedure, removing any variables that no
longer provided an improvement in model fit. We refitted models after removing outliers from all
datasets by excluding absolute standardized residuals exceeding 2.5 standard deviations (Crawley,
2002; Baayen & Milin, 2010).
3.1 Results
The regression models indicate that each skill at t1 significantly predicted growth for that same
skill. That is, phonological awareness at t1 significantly predicted growth in phonological awareness
[β = -0.05; SE = 0.004; t = -12.411; p < .0001], phonological memory at t1 was a reliable
predictor of growth in phonological memory [β = -0.068; SE = 0.008; t = -8.91; p < .0001],
t1 rapid symbolic naming reliably predicted growth in rapid symbolic naming [β = -0.046; SE
= 0.006; t = -8.184; p < .0001], and vocabulary knowledge at t1 was a significant predictor
of growth in vocabulary knowledge [β = -0.084; SE = 0.012; t = -7.189; p < .0001]. The
negative signs of the model estimates conform with scenario 1: a weaker t1 score for a particular
skill predicts greater growth in that skill. In sum, these patterns indicate that MELD students
entering the program with poor aptitude in a skill tend to make the most gains in the same skill.
In addition, the regression models indicate that phonological awareness at t1 is a predictor of
growth in phonological memory [β = 0.024; SE = 0.007; t = 3.502; p < .0001] and growth in
receptive vocabulary size [β = 0.043; SE = 0.01; t = 4.159; p < .0001]. These model estimates
confirm the patterns described in scenario 2: greater initial phonological awareness is associated
with larger longitudinal gains in phonological memory and vocabulary size. Importantly, these
effects hold even when controlling for baseline scores of vocabulary knowledge and phonological
memory.
According to the regression model estimates, students entering the MELD program with a t1
phonological awareness score of greater than 98 are predicted to undergo reliable change (RCI
> 1.96) in vocabulary size by t2. Moreover, students entering the MELD program with a t1
phonological awareness score of greater than 112 are predicted to experience reliable change in
phonological memory. Figure 7 plots the partial effect of t1 phonological awareness on vocabulary
size growth (panel A), and phonological memory growth (panel B). The dashed red line indicates
the t1 value that is associated with significant reliable change (i.e., RCI > 1.96).
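These threshold values follow from inverting the fitted line for the partial effect: if the predicted RCI
is written as a linear function of the t1 phonological awareness score x (with all other predictors held
at the values used for the partial effect), then the score x* at which predicted growth reaches the
reliable-change criterion is
\[
b_0 + b_{\mathrm{PA}}\, x^{*} = 1.96 \quad\Longrightarrow\quad x^{*} = \frac{1.96 - b_0}{b_{\mathrm{PA}}},
\]
where b_0 absorbs the intercept and the contributions of the other predictors; this is a reading of the
reported procedure rather than notation taken from the report.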
To conclude, the results indicate that the early ability to analyze, synthesize and manipulate
phonemes and syllables, as tapped by a composite measure of phonological awareness, drives
growth in vocabulary and phonological storage during the course of the MELD program. These
findings conform with research (see e.g. Bowey, 1996; Messbauer & de Jong, 2003; Windfuhr
& Snowling, 2001) which shows that phonological awareness plays a role in determining perfor-
mance on experimental tests of both verbal short-term memory and word learning because of
the need to accurately encode and maintain phonological information in such tasks. In sum, the
results indicate that early phonological awareness is a key building block for the development of
vocabulary and phonological memory.
[Figure 7 near here: panel A plots pre-MELD phonological awareness (50-120) against predicted vocabulary
size growth (RCI); panel B plots pre-MELD phonological awareness against predicted phonological memory
growth (RCI); a dashed line marks RCI = 1.96 in each panel.]
Figure 7: Phonological awareness at t1 as a predictor of vocabulary size growth (RCI) and phonological
memory (RCI). The dashed red line represents the threshold for reliable growth (RCI = 1.96) and the
associated t1 score value.
4 Reading development
In the following, we present the results of two analyses. First, we assess MELD students’ develop-
ment in reading comprehension and silent reading rate during the course of the MELD program.
Second, we analyse the contribution of language abilities upon entry to MELD (t1) to reading
comprehension and silent reading rate at the end of the MELD program (t2).
Dependent variables The dependent variables of interest were passage comprehension scores
and silent reading rate. Passage comprehension was computed as the proportion of correct responses
per story, and silent reading rate was operationalized as words per minute (wpm)
per story.
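Concretely, the rate measure for each passage can be read as (assuming the denominator is the total time
spent reading the passage, which the report does not spell out):
\[
\mathrm{wpm} = \frac{\text{number of words in the passage}}{\text{total passage reading time (in minutes)}}.
\]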
Data preparation We collapsed results from both cohorts to create one large sample of
students who had completed a full test battery (i.e., eye-tracking and standardized language
assessments) at both time-points (n = 340). The original dataset contained 4,220 passage
readings. We removed 87 readings (2% of data-points) because of signal loss and excessive
blinks. We also eliminated data-points from the top and bottom 1% of the wpm distribution
(83 data-points). The final dataset contained 4,050 separate passage readings (n readings at t1 =
2,005; n readings at t2 = 2,045).
4.1 Passage reading change
Statistical method In this analysis we used a (generalized) linear mixed-effects approach
(Baayen, Davidson, & Bates, 2008). To determine the impact of the MELD program on reading
behaviour, we examined the effect of testing session (pre-MELD vs. post-MELD) on the two
measures of interest. To this end, Time (two levels: t1 and t2) was included as a fixed effect
in models. Passage Complexity was also included as a fixed effect, since it is possible that more
complex texts were more difficult to process and comprehend. The random effects structure of
models included by-participant random intercepts with random slopes for Time. By-participant
random slopes for Time are necessary to appropriately model longitudinal data since they account
for possible participant heterogeneity over time and dependency among repeated measures (Long,
2012).
The model for passage comprehension scores (proportion of answers correct) was fitted as-
suming a binomial underlying distribution, and the model for words per minute (log-transformed
to reduce skewness of the distribution) was fitted using the Gaussian underlying distribution. As
control variables, both models included the fixed effects of GORT version (Form A vs. Form B)
and Order (AB vs. BA) to test whether main effects persisted in the presence of any influence
of form version and the order of the forms, despite counterbalancing. Finally, we refitted models
after removing outliers from all datasets by excluding absolute standardized residuals exceeding
2.5 standard deviations (Crawley, 2002; Baayen & Milin, 2010). For further details on modelling
procedures and programming software used in this analysis, see Schmidtke and Moro (accepted).
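A minimal sketch of the Gaussian model for (log-transformed) reading rate is shown below, using
statsmodels' MixedLM purely for illustration; the column names are hypothetical, and the binomial model
for comprehension would require a GLMM implementation that is not shown here.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_reading_rate_model(df: pd.DataFrame):
    """Fixed effects: Time, Passage Complexity, GORT form and form order (controls).
    Random effects: by-participant intercepts and slopes for Time."""
    df = df.assign(log_wpm=np.log(df["wpm"]))            # log transform to reduce skewness
    model = smf.mixedlm(
        "log_wpm ~ time + complexity + gort_form + order",
        data=df,
        groups="participant",
        re_formula="~time",
    )
    return model.fit()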
4.1.1 Results
Reading comprehension First, there was a main effect for Time. The passage comprehension
of MELD students improved, with M = 48% (SE = 0.62%) at t1 and M = 52% (SE = 0.62%)
at t2. The improvement from t1 to t2 (Δ = 4%) was significant [β = 0.08; SE = 0.03; z = 2.99;
p < .01]. Second, the main effect for Passage Complexity was significant: less complex stories
were better understood than more complex stories [β = -0.44; SE = 0.05; z = -8.18; p <
.01]. Figure 8 visualizes the estimated partial effects of Time and Passage Complexity on passage
comprehension. We did not observe an interaction between Time and Passage Complexity, which
indicates that longitudinal change in passage comprehension did not differ across stories of varying
complexity.
[Figure 8 near here: passage comprehension (%) plotted against passage complexity (3-9), with separate
lines for the pre-MELD and post-MELD time-points.]
Figure 8: Partial effects of the main effects of Time and Passage Complexity on passage comprehension.
Error bars represent the 95% confidence interval.
Silent reading rate The results for silent reading rate showed a significant main effect of
Time. Overall, reading rates of MELD students increased significantly, with M = 111 wpm (SE
= 1.01) at t1 and M = 117 wpm (SE = 1.03) at t2. The increase in wpm from t1 to t2 (Δ = 6
wpm) was significant [β = 0.03; SE = 0.01; t = 3.9; p < .001]. Note that β and SE estimates
from the regression models are expressed in log units of wpm. Second, a main effect for Passage
Complexity was explained by fewer words per minute for more complex stories [β = -0.11; SE =
0.01; t = -8.63; p < .001]. As in the comprehension data, there was no interaction between
Time and Passage Complexity, suggesting that silent reading rate increased at the same rate for
both easy and difficult passages.
The relationship between reading rate and comprehension Next, we examined the
relationship between reading rate and passage comprehension. We formed two quasi-experimental
‘passage comprehension’ conditions by performing a median split on comprehension scores. The
median split was based on the median value for each story, per each GORT form and time-
point. The resulting factor, ‘Comprehension’ (two levels: below average and above average
comprehension), was added to the model predicting silent reading rate.
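A sketch of how such a per-story, per-form, per-time-point median split can be constructed is shown
below (pandas; column names are hypothetical, and ties are assigned to the above-average group here
since the report does not specify how they were handled):

import pandas as pd

def add_comprehension_factor(df: pd.DataFrame) -> pd.DataFrame:
    """Label each passage reading as above/below-average comprehension, where the
    median is computed within each story x GORT form x time-point cell."""
    medians = df.groupby(["story", "gort_form", "time"])["comprehension"].transform("median")
    above = df["comprehension"] >= medians
    return df.assign(comprehension_group=above.map({True: "above average", False: "below average"}))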
The results showed that there was an effect of Comprehension on silent reading rate: the
average silent reading rate for passages that were poorly understood (M = 119 wpm; SE =
1.31) was greater than for passages that were better understood (M = 111 wpm; SE = .87),
Δ = 8 wpm [β = -0.05; SE = 0.01; t = -5.68; p < .001]. We argue that the relationship
between slow reading rates and more successful passage comprehension can be explained by more
effortful mental structuring and organization of semantic information and cohesive structure of
texts, which allows for a richer memory of the information conveyed by the passage (Meyer, Talbot
& Florencio, 1999). This effect did not interact with Passage Complexity or Time, and did not
enter into a three-way interaction with Passage Complexity and Time. This indicates that the
magnitude of the difference in wpm between good and poor comprehenders did not covary with
the difficulty of the passage. Moreover, and interestingly, the difference in reading rate between good
and poor comprehension persisted from the beginning to the end of the MELD program, indicating
that this is a general pattern of reading for these EFL students. Figure 9 shows the predicted
effects of Time and Passage Complexity on wpm, broken down by Comprehension.
How do MELD students’ reading rates compare with L1 readers and other L2
populations? According to a meta-analysis by Brysbaert (2019), the average silent reading
rate of an L1 English adult without a diagnosed reading impairment is 238 wpm for non-fiction
and 260 wpm for fiction. The observed results of the present study show that the average reading
rate for MELD students ranges between 150 wpm (simple texts) and 80 wpm (difficult texts).
These estimates fall below the expected L1 average for fiction or non-fiction, despite a significant
average increase of 6 wpm during the course of the MELD program. Available studies in the
L2 literature confirm that reading speed is slower for L2 readers. For example, reading rates
are 17% slower in Dutch-English bilinguals than in L1 readers (Cop, Drieghe & Duyck, 2015).
Moreover, the average reading rate for Japanese university students reading English texts with
good comprehension is 139 wpm, but falls below 60 wpm for readers with poor comprehension
ability (Hirai, 1999). Though the estimates reported by Hirai (1999) are comparable with those
reported in the present study, contrary to the present data, Hirai found that faster readers tended
to have better overall comprehension. We propose that the reason for the difference in the
relationship between comprehension and reading rate across studies may be attributed to the
difference in the motivation of participants between each study.
4.2 Early precursors of reading outcomes
In this analysis, we evaluated whether early phonological awareness and vocabulary knowledge
made a unique contribution to the prediction of post-MELD passage comprehension and silent
reading rate, in addition to control variables. A multiple hierarchical linear regression analysis was
conducted to test whether phonological awareness at t1 (measured via the CTOPP phonological
awareness composite score) and vocabulary knowledge at t1 (measured via the PPVT Receptive
Vocabulary knowledge score) each uniquely predict passage comprehension and silent reading rate
at t2. Phonological awareness and vocabulary knowledge were selected based on prior research
on the same EFL population which showed that these facets of L2 language skill are impor-
tant contributors to word-level reading behaviour (Schmidtke & Moro, submitted). Based on
these results, we expect that MELD students’ incoming phonological awareness and vocabulary
knowledge would each explain significant unique amounts of variance in reading rate and passage
comprehension at the end of the MELD program.

[Figure 9 near here: words per minute plotted against passage complexity (3-9) for the pre-MELD and
post-MELD time-points, with separate panels for below-average and above-average comprehension.]

Figure 9: Partial effects of the main effects of Time and Passage Complexity on silent reading rate
broken down by Comprehension. Error bars represent the 95% confidence interval.

Furthermore, phonological awareness and vocabulary tap into two facets of L2 reading comprehension:
phonological decoding and language
comprehension (Farnia & Geva, 2013; Jeon & Yamashita, 2014). These two facets are outlined
in the simple view of reading (SVR) model (Hoover & Gough, 1990), as the two main ‘pillars’ of
L1 and L2 reading comprehension.
Statistical method We conducted two sets of multiple hierarchical regression models. The
first set was conducted to predict t2 passage comprehension and the second set was conducted to
predict t2 silent reading rate (wpm). The hierarchical modelling procedure was identical for both
sets of models. As control variables, Step 1 estimated the amount of variance explained by the
relevant dependent variable measured at t1 (either passage comprehension or silent reading rate),
Passage Complexity of the GORT passage at t2, GORT form at t2 (two levels: Form A and Form
B), and MELD cohort (two levels: 2017-18 and 2018-19). The amount of variance explained (R²)
by this model is compared to the amount of variance explained by an intercept-only regression
model fitted to the same measure. This step establishes whether the set of control variables
explains any variance beyond a model with no predictors. In Step 2, we added the Phonological Awareness
measure as assessed at t1, and at Step 3 we added Vocabulary Knowledge as measured at t1.
At each step, the amount of variance explained by the more complex model is compared to the
amount of variance explained at the previous step (ΔR²), and the significance of the gain in
model fit is assessed through an ANOVA test. The comparison of models at Step 1 and Step 2
identifies the unique contribution of Phonological Awareness at t1. The comparison of models at
Step 2 and Step 3 identifies the unique contribution of Vocabulary Knowledge as measured at t1.
Pearson correlations between Vocabulary Knowledge and Phonological Awareness at t1 are shown
in Table B1 in Appendix B.
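A minimal sketch of this three-step comparison for one outcome is given below, using ordinary least
squares and anova_lm for the nested-model F-tests; the column names are hypothetical and the sketch does
not reproduce the full modelling details of the original analysis.

import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def hierarchical_steps(df, dv="log_wpm_t2"):
    """Step 1: controls; Step 2: + phonological awareness (t1); Step 3: + vocabulary (t1)."""
    controls = f"{dv} ~ dv_t1 + complexity_t2 + gort_form_t2 + cohort"
    step1 = smf.ols(controls, data=df).fit()
    step2 = smf.ols(controls + " + phon_awareness_t1", data=df).fit()
    step3 = smf.ols(controls + " + phon_awareness_t1 + vocab_t1", data=df).fit()
    delta_r2 = (step2.rsquared - step1.rsquared, step3.rsquared - step2.rsquared)
    f_tests = anova_lm(step1, step2, step3)               # significance of each gain in fit
    return delta_r2, f_tests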
We restricted the dataset used in the prior analysis to include only eye-movement behaviour
and passage comprehension measured at t2. Since our models included the response variable
measured at t1 as a predictor variable, this necessitated restricting the data to include a student’s
observations for which passage readings of equivalent complexity were available for both time-
points after data-cleaning (1,951 data-points; n = 340). All variables were z-transformed before
they were added to the model. Models for silent reading rate were fitted to log-transformed wpm,
and passage comprehension was measured as the total number of correct responses per story.
4.2.1 Results
Early predictors of post-MELD reading comprehension Table 2 presents the results
of the hierarchical regression models fitted to t2 reading comprehension data. The final model
accounted for 27.98% of the variance in t2 reading comprehension data (F(6, 1944) = 125.89,
p < .0001). The unique contributions of Phonological Awareness at t1 (0.96%) and Vocabulary
Knowledge at t1 (1.12%) were statistically significant (Table 2), confirming that the mastery of
these skills on arrival is positively related to students’ reading comprehension outcomes at the
end of the MELD program. Both variables explained variance above that explained by the control
variables, including reading comprehension assessed at t1 and Passage Complexity. The effects
indicate that after Passage Complexity, Vocabulary Knowledge at t1 had the strongest effect
on reading comprehension at t2, followed by Phonological Awareness at t1. In sum, this analysis
supports the hypothesis that phonological awareness and vocabulary knowledge are early indicators
of reading development in the MELD program.
Early predictors of post-MELD silent reading rate The final model for silent reading
rate (Table 2) accounted for 34.27% of the variance in wpm measured at t2 (F(6, 1944) =
168.94, p < .0001). The effects, displayed in Table 2, show that the unique contributions of
Phonological Awareness at t1 (0.42%) and Vocabulary Knowledge at t1 (0.18%) were significant.
The addition of both variables improved model fit, over and above the contribution of control
variables, including wpm at t1 and Passage Complexity. The effects confirm that the level of
phonological awareness and vocabulary knowledge with which MELD students start the program
positively affects their silent reading rate at the end of MELD. That is, students with stronger
phonological awareness and larger vocabularies at the beginning of MELD are able to read passages
more rapidly for comprehension at the end of the MELD program.
Table 2: Multiple hierarchical regression analyses predicting Reading Comprehension and Silent Reading
Rate from t1 Phonological Awareness and t1 Vocabulary Knowledge.

                                 Reading Comprehension              Silent Reading Rate
                                 β̂           t                      β̂          t
Step 1
  DV t1                          -0.24       -0.38                  0.17       20.18***
  Passage Complexity t2          -14.35      -23.45***              -0.1       -11.8***
  GORT Form t2                   -1.38       -2.52*                 0.03       3.37***
  MELD Cohort                    -2.24       -3.38***               0.01       1.23
  R² = 25.91***, F(4, 1,946) = 170.2         R² = 33.59***, F(4, 1,946) = 246.1
Step 2
  DV t1                          -0.64       -1.04                  0.17       19.93***
  Passage Complexity t2          -14.55      -23.87***              -0.1       -11.96***
  GORT Form t2                   -1.31       -2.4*                  0.03       3.4***
  MELD Cohort                    -0.75       -1.03                  0.02       2.53*
  Phonological Awareness t1      3.01        5.04***                0.03       3.52***
  ΔR² = 0.96***, F(5, 1,945) = 142.9         ΔR² = 0.42***, F(5, 1,945) = 200.5
Step 3
  DV t1                          -1.02       -1.65                  0.16       19.38***
  Passage Complexity t2          -14.72      -24.3***               -0.1       -12.14***
  GORT Form t2                   -0.35       -0.62                  0.03       4.05***
  MELD Cohort                    -1.29       -1.78                  0.02       2.09***
  Phonological Awareness t1      1.77        2.79**                 0.02       2.28*
  Vocabulary Knowledge t1        3.34        5.49***                0.02       2.76*
  ΔR² = 1.12***, F(5, 1,944) = 125.9         ΔR² = 0.26**, F(5, 1,944) = 168.9
Note: R² and ΔR² are given as percentages of variance explained.
DV t1 refers to the dependent measure (either comprehension or silent reading rate) of the respective model measured at t1.
*p < .05. **p < .01. ***p < .001.
References
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with
crossed random effects for subjects and items. Journal of Memory and Language,
59(4), 390–412.
Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of
Psychological Research,3(2), 12–28.
Beglar, D., & Nation, P. (2007). A vocabulary size test. The Language Teacher,31(7),
9–13.
Bowey, J. A. (1996). On the association between phonological memory and receptive
vocabulary in five-year-olds. Journal of Experimental Child Psychology, 63(1), 44–
78.
Brysbaert, M. (2019). How many words do we read per minute? A review and meta-
analysis of reading rate. Journal of Memory and Language.
Castro-Schilo, L., & Grimm, K. J. (2018). Using residualized change versus difference
scores for longitudinal research. Journal of Social and Personal Relationships,35 (1),
32–58.
Cop, U., Drieghe, D., & Duyck, W. (2015). Eye movement patterns in natural reading:
A comparison of monolingual and bilingual reading of a novel. PloS one,10 (8),
e0134008.
Crawley, M. J. (2002). Statistical computing: An introduction to data analysis using
S-Plus. New York: Wiley.
Daller, M. H., & Phelan, D. (2013). Predicting international student study success.
Applied Linguistics Review,4(1), 173–193.
Dunn, L. M., & Dunn, D. M. (2007). PPVT-4: Peabody picture vocabulary test. Pearson
Assessments.
Farnia, F., & Geva, E. (2013). Growth and predictors of change in English language
learners’ reading comprehension. Journal of Research in Reading ,36 (4), 389–421.
Hirai, A. (1999). The relationship between listening and reading rates of Japanese EFL
learners. The Modern Language Journal, 83(3), 367–384.
Hoover, W. A., & Gough, P. B. (1990). The simple view of reading. Reading and Writing,
2, 127–160.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: a statistical approach to
defining meaningful change in psychotherapy research. Journal of Consulting and
Clinical Psychology, 59(1), 12–19.
Jeon, E. H., & Yamashita, J. (2014). L2 reading comprehension and its correlates: A
meta-analysis. Language Learning,64(1), 160–212.
Long, J. D. (2012). Longitudinal data analysis for the behavioral sciences using R. Sage.
Masrai, A., & Milton, J. (2017). Recognition vocabulary knowledge and intelligence as
predictors of academic achievement in EFL context. TESOL International Journal ,
12(1), 128–142.
Messbauer, V. C., & de Jong, P. F. (2003). Word, nonword, and visual paired associate
learning in Dutch dyslexic children. Journal of Experimental Child Psychology,
84(2), 77–96.
Meyer, B. J., Talbot, A. P., & Florencio, D. (1999). Reading rate and prose retrieval.
Scientific Studies of Reading,3(4), 303–329.
Sawilowsky, S. S. (2009). New effect size rules of thumb. Journal of Modern Applied
Statistical Methods,8(2), 467–474.
Schmidtke, D., & Moro, A. (accepted). Determinants of word reading development in
EFL university students: a longitudinal eye-movement study. Reading Research
Quarterly.
Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual
differences in the acquisition of literacy. Reading Research Quarterly, 21(4),
360–407.
Trenkic, D., & Warmington, M. (2019). Language and literacy skills of home and interna-
tional university students: How different are they, and does it matter? Bilingualism:
Language and Cognition,22 (2), 349–365.
Wagner, R., Torgesen, J., Rashotte, C., & Pearson, N. A. (1999). CTOPP-2: Compre-
hensive test of phonological processing–second edition. Pro-ed Austin, TX.
Wiederholt, J., & Bryant, B. (2012). Gray oral reading tests–fifth edition (GORT-5).
Austin, TX: Pro-Ed.
Williams, K. T., & Williams, K. T. (2007). EVT-2: Expressive vocabulary test. Pearson.
Windfuhr, K. L., & Snowling, M. J. (2001). The relationship between paired associate
learning and phonological skills in normally developing readers. Journal of experi-
mental child psychology,80 (2), 160–173.
Yixin, W., & Daller, M. (2014, September). Predicting Chinese students’ academic
achievement in the UK. In J. Angouri, T. Harrison, S. Schnurr, & S. Wharton
(Eds.), Learning, working and communicating in a global context: Proceedings of
the 47th annual meeting of the British Association for Applied Linguistics (pp. 217–
227).
Appendix A: Overview of tests and assessments
Phonological processing
Phonological awareness
The Phonological Awareness Composite is composed of three subtests from the Comprehensive
Test of Phonological Processing, Second Edition (CTOPP-2; Wagner, Torgesen, Rashotte &
Pearson, 1999): Elision (EL), Blending Words (BW), and Phoneme Isolation (PI). For the Elision
(EL) subtest, participants are asked to delete a phonological segment from a spoken word to
form a new word (e.g., say “farm” without saying /f/) [34 items]. For the Blending Words (BW)
subtest, participants listen to individual sounds and combine them into a recognizable word (e.g.,
what word is /n/ /æ/ /p/?) [33 items]. For the Phoneme Isolation (PI) subtest, participants are
tested on the ability to recognize individual sounds in words (e.g., what is the first sound in the
word “fan”?) [32 items]. The range of standardized scores achieved was 54 to 131. Cronbach’s alpha values for the Phonological Awareness subtests were .92 (Elision), .83 (Blending Words), and .77 (Phoneme Isolation).
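For readers who wish to reproduce reliability estimates like those reported above, the following sketch shows how Cronbach’s alpha can be computed from a participants-by-items matrix of scored responses. It is a generic illustration rather than the code used for this report, and the variable name elision_responses is hypothetical.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (participants x items) matrix of item scores."""
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]                               # number of items in the subtest
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of participants' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# e.g., alpha for a 34-item Elision subtest scored 0/1 per participant (hypothetical data):
# cronbach_alpha(elision_responses)  # value reported for this sample: .92
```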
Phonological memory
The Phonological Memory Composite is composed of two subtests from the Comprehensive Test
of Phonological Processing, Second Edition (CTOPP-2; Wagner, Torgesen, Rashotte & Pearson,
1999): Memory for Digits (MD) and Nonword Repetition (NR). For the Memory for Digits (MD) subtest, participants listen to strings of numbers and repeat them in the correct order. Testing begins with two-digit strings and ends with nine-digit strings [28 items]. For the Nonword Repetition (NR) subtest, participants listen to a nonword and repeat it accurately (e.g., /tg/). Nonwords range in length from 3 to 15 sounds [30 items]. Cronbach’s alpha estimates for the Phonological Memory subtests were .82 (Memory for Digits) and .82 (Nonword Repetition).
Rapid symbolic naming
The Rapid Symbolic Naming Composite is composed of two subtests from the Comprehensive
Test of Phonological Processing, Second Edition (CTOPP-2; Wagner, Torgesen, Rashotte &
Pearson, 1999): Rapid Digit Naming (RD) and Rapid Letter Naming (RL). Rapid Digit Naming
(RD) requires participants to rapidly name 36 randomly assorted numbers on a grid (4 rows of
9 numbers) beginning at the top row from left to right. Rapid Letter Naming (RL) requires
participants to rapidly name 36 randomly assorted letters on a grid (4 rows of 9 letters) beginning
at the top row from left to right [36 items]. Cronbach’s alpha for accuracy scores was .76 for both the Rapid Digit Naming and Rapid Letter Naming tests.
Vocabulary knowledge
Receptive vocabulary knowledge
Vocabulary knowledge was assessed using the Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4; Dunn & Dunn, 2007) and the Expressive Vocabulary Test, Second Edition (EVT-2; Williams, 2007). For each trial of the PPVT, participants were presented with an array of four pictures. The examiner said aloud a word that described one of the pictures, and the participant was asked to point to the picture that the word described. The test contains 228 items presented in order of increasing difficulty, split into 19 blocks of 12 items each. Testing stopped when a participant provided 8 or more incorrect answers within a block. There are two versions of the PPVT-4: Form A and Form B. At t1, half of the participants were administered Form A, and the remaining half were administered Form B. At t2, the forms were counterbalanced such that participants who were administered Form A at t1 were administered Form B, and vice versa. Cronbach’s alpha for internal consistency was .86 for Form A and .78 for Form B.
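As an illustration of the block-based discontinue rule described above, the sketch below scores a set of ordered response blocks and stops after the first block containing 8 or more errors. It is a simplified sketch of the administration logic only; the published PPVT-4 scoring procedure (basal rules, conversion to standard scores) is not reproduced here.

```python
def ppvt_raw_score(block_responses, errors_to_stop=8):
    """Sum correct responses across 12-item blocks, applying the discontinue
    rule: stop after the first block containing `errors_to_stop` or more errors.

    block_responses: blocks in order of difficulty, each a list of 0/1 item
    scores (1 = correct).
    """
    total_correct = 0
    for block in block_responses:
        total_correct += sum(block)
        if block.count(0) >= errors_to_stop:  # ceiling block reached
            break
    return total_correct
```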
Expressive vocabulary knowledge
For each trial of the Expressive Vocabulary Test (EVT), the experimenter displayed a picture and read a stimulus question aloud. The examinee answered with a single word that provided an acceptable label, answered a specific question, or gave a synonym for a word that fit the picture. The EVT includes 190 test items presented in order of increasing difficulty. Testing is stopped when the participant provides five consecutive incorrect answers. There are two versions of the EVT-2: Form A and Form B. At t1, half of the participants were randomly administered Form A, and the remaining half were administered Form B. At t2, the forms were counterbalanced such that participants who were administered Form A at t1 were administered Form B, and vice versa.
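The EVT discontinue rule differs from the PPVT rule in that it depends on consecutive errors rather than errors within a block. A minimal, purely illustrative sketch of that logic:

```python
def evt_raw_score(item_scores, consecutive_to_stop=5):
    """Count correct EVT responses, stopping once five consecutive incorrect
    answers occur (item_scores: ordered list of 0/1 scores, 1 = correct)."""
    correct, error_run = 0, 0
    for score in item_scores:
        if score == 1:
            correct += 1
            error_run = 0
        else:
            error_run += 1
            if error_run >= consecutive_to_stop:  # discontinue rule met
                break
    return correct
```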
Vocabulary size estimates
The Vocabulary Size Test measures an individual’s written receptive vocabulary size in English
(Beglar & Nation, 2007). Participants are given a word in a sentence followed by four descriptions.
They are asked to choose the description that correctly describes the word. Although the word appears within a sentence, the sentence provides a non-defining context. The test is composed of 100 items presented in order of frequency (most to least frequent), and participants are given 45 minutes to complete it. Participants receive one point for each item answered correctly. The total score is multiplied by 200 to estimate total written receptive vocabulary size, expressed as the number of word families an individual knows; the maximum possible estimate is 20,000 word families.
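The conversion from raw score to vocabulary size is simple arithmetic; the short sketch below (an assumed helper, not part of the published test materials) makes the scaling explicit.

```python
def vocabulary_size_estimate(correct_items, families_per_item=200):
    """Estimate written receptive vocabulary size (in word families) from the
    number of correct Vocabulary Size Test items (0-100)."""
    return correct_items * families_per_item

print(vocabulary_size_estimate(63))   # 12600 word families
print(vocabulary_size_estimate(100))  # 20000, the maximum possible estimate
```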
Passage reading
Reading comprehension
Reading comprehension was assessed using the Gray Oral Reading Tests, Fifth Edition (GORT-5; Wiederholt & Bryant, 2012). Students read a series of GORT stories of increasing complexity. Measures of syntactic complexity, referential cohesion, lexical diversity, and L2 readability (taken from the Coh-Metrix online tool; McNamara, Louwerse, Cai & Graesser, 2013) confirmed that passage complexity increased monotonically with GORT story number. Each story is followed by five comprehension questions. The stories are presented one by one in order of increasing difficulty. The 2017-18 cohort read stories 3-9 and the 2018-19 cohort read stories 3-8 (we removed story 9 because of testing time restrictions). As with the PPVT and EVT, there is an A and a B version of each GORT story; forms were counterbalanced across participants and testing time-points. The percentage of comprehension questions answered correctly for each story is computed as the measure of reading comprehension skill.
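Since each story is followed by five questions, the per-story comprehension score is a simple percentage. A minimal sketch of the computation, using hypothetical response data:

```python
def story_accuracy(responses):
    """Percentage of comprehension questions answered correctly for one GORT
    story (responses: list of 0/1 scores, normally five per story)."""
    return 100.0 * sum(responses) / len(responses)

print(story_accuracy([1, 1, 0, 1, 1]))  # 80.0
```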
Eye-movements during reading
Three eye-tracking systems (system 1, EyeLink 1000; systems 2 and 3, EyeLink 1000 Plus) manufactured by SR Research Ltd. (Ottawa, Ontario, Canada) were used to record eye movements as students silently read the GORT passages. This offers the opportunity to tap into the online processes of reading at the word and passage levels. Eye-movement measures at the word level include total reading time, number of fixations, refixation probability, and number of regressions. Eye-movement measures at the passage level include words per minute, total passage reading time, and total number of fixations (per word). For further methodological and technological documentation regarding the eye-tracking procedure, see Schmidtke and Moro (accepted).
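The passage-level measures named above can be derived from a per-passage fixation report. The sketch below assumes a simplified input (a list of fixation durations and a word count for one passage) and ignores complications such as saccade time, blinks, and trial-level exclusions handled in the full analysis pipeline; it is illustrative only.

```python
def passage_level_measures(fixation_durations_ms, n_words):
    """Approximate passage-level reading measures for one participant reading
    one GORT passage, computed from fixation durations in milliseconds."""
    total_reading_time_s = sum(fixation_durations_ms) / 1000.0
    return {
        "total_reading_time_s": total_reading_time_s,
        "words_per_minute": n_words / (total_reading_time_s / 60.0),
        "fixations_per_word": len(fixation_durations_ms) / n_words,
    }
```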
Appendix B: Correlation matrices
Table B1. Pearson correlation matrix of language skills assessed at t1.
1. 1a. 1b. 1c. 2. 2a. 2b. 3. 3a. 3b. 4.
1. Phonological Awareness^a
1a. Elision 0.82***
1b. Blending Words 0.77*** 0.50***
1c. Phoneme Isolation 0.79*** 0.44*** 0.41***
2. Phonological Memory^a 0.44*** 0.32*** 0.39*** 0.34***
2a. Memory for Digits 0.28*** 0.18*** 0.26*** 0.22*** 0.77***
2b. Nonword Repetition 0.40*** 0.31*** 0.35*** 0.30*** 0.76*** 0.17**
3. Rapid Symbolic Naming^a 0.20*** 0.09 0.20*** 0.19*** 0.15** 0.08 0.14**
3a. Rapid Digit Naming 0.24*** 0.12* 0.26*** 0.20*** 0.16** 0.11* 0.13* 0.88***
3b. Rapid Letter Naming 0.11* 0.04 0.09 0.13* 0.10 0.04 0.12* 0.86*** 0.51***
4. Receptive Vocabulary^b 0.34*** 0.31*** 0.21*** 0.27*** 0.22*** 0.16** 0.17** 0.04 0.05 0.01
5. Expressive Vocabulary^c 0.26*** 0.22*** 0.24*** 0.17** 0.13* 0.06 0.15** 0.17** 0.17** 0.12* 0.39***
^a Composite score reported.
^b Peabody Picture Vocabulary Test.
^c Expressive Vocabulary Test.
* p < .05. ** p < .01. *** p < .001.
Table B2. Pearson correlation matrix of language skills assessed at t2.
1. 1a. 1b. 1c. 2. 2a. 2b. 3. 3a. 3b. 4.
1. Phonological Awareness^a
1a. Elision 0.78***
1b. Blending Words 0.72*** 0.32***
1c. Phoneme Isolation 0.74*** 0.39*** 0.31***
2. Phonological Memory^a 0.36*** 0.26*** 0.33*** 0.21***
2a. Memory for Digits 0.27*** 0.18** 0.29*** 0.14** 0.81***
2b. Nonword Repetition 0.32*** 0.25*** 0.26*** 0.21*** 0.81*** 0.31***
3. Rapid Symbolic Naming^a 0.21*** 0.20*** 0.12* 0.16** 0.12* 0.12* 0.06
3a. Rapid Digit Naming 0.21*** 0.20*** 0.12* 0.17** 0.12* 0.12* 0.07 0.88***
3b. Rapid Letter Naming 0.15** 0.15** 0.08 0.11* 0.08 0.09 0.04 0.85*** 0.50***
4. Receptive Vocabulary^b 0.36*** 0.28*** 0.35*** 0.16** 0.13* 0.13* 0.07 0.12* 0.15** 0.06
5. Expressive Vocabulary^c 0.34*** 0.31*** 0.28*** 0.17** 0.20*** 0.15** 0.17** 0.11* 0.14* 0.04 0.51***
^a Composite score reported.
^b Peabody Picture Vocabulary Test.
^c Expressive Vocabulary Test.
* p < .05. ** p < .01. *** p < .001.
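The significance stars in Tables B1 and B2 follow the conventional thresholds given in the table notes. A hedged sketch of how a single cell of such a matrix could be computed (not the report’s actual analysis code; input vectors are assumed to be paired scores on two skill measures):

```python
from scipy import stats

def correlation_cell(x, y):
    """Pearson correlation between two skill measures, formatted with the
    significance stars used in Tables B1 and B2."""
    r, p = stats.pearsonr(x, y)
    stars = "***" if p < .001 else "**" if p < .01 else "*" if p < .05 else ""
    return f"{r:.2f}{stars}"
```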
Researchers interested in studying change over time are often faced with an analytical conundrum: Whether a residualized change model versus a difference score model should be used to assess the effect of a key predictor on change that took place between two occasions. In this article, the authors pose a motivating example in which a researcher wants to investigate the effect of cohabitation on pre- to post-marriage change in relationship satisfaction. Key features of this example include the likely self-selection of dyads with lower relationship satisfaction to cohabit and the impossibility of using experimentation procedures to attain equivalent groups (i.e., cohabitants vs. not cohabitants). The authors use this example of a nonrandomized study to compare the residualized change and difference score models analytically and empirically. The authors describe the assumptions of the models to explain Lord’s paradox; that is, the fact that these models can lead to different inferences about the effect under investigation. They also provide recommendations for modeling data from nonrandomized studies using a latent change score framework.