Language Testing 2007; 24(2): 185–208. DOI: 10.1177/0265532207076363. © 2007 SAGE Publications
The challenges of the Ontario
Secondary School Literacy Test for
second language students
Liying Cheng, Don A. Klinger and Ying Zheng, Queen's University

Address for correspondence: Liying Cheng, Faculty of Education, Queen's University, Kingston, ON K7L 3N6, Canada; email: email@example.com
Results from the Ontario Secondary School Literacy Test (OSSLT) indicate
that English as a Second Language (ESL) and English Literacy Development
(ELD) students have comparatively low success and high deferral rates. This
study examined the 2002 and 2003 OSSLT test performances of ESL/ELD
and non-ESL/ELD students in order to identify and understand the factors
that may help explain why ESL/ELD students failed the test at relatively high
rates. The analyses also attempted to determine if there were signiﬁcant and
systematic differences in ESL/ELD students' test performance. The perform-
ance of ESL/ELD students was consistently and similarly lower across item
formats, reading text types, skills and strategies, and the four writing tasks.
Using discriminant analyses, it was found that the narrative text type, the indirect understanding skill, the vocabulary reading strategy, and the news report writing task were significant predictors of ESL/ELD membership. The
results of this study provide direction for further research and instruction
regarding English literacy achievement for these second language students
within the context of having to complete large-scale English literacy tests
designed and constructed for first language English students.
Over the past twenty years, immigration policies have resulted in an
increasing proportion of immigrants entering educational systems
throughout North America with little or no experience or education in
English. These English as a Second Language (ESL) and English
Literacy Development (ELD)1 students are generally provided with
extra support for only a short period to help them quickly achieve fun-
damental English literacy skills (see, for example, Proposition 203,
Arizona, 2000; Proposition 227, California, 1998; Question 2,
Massachusetts, 2002). Further, such support requires additional school
expenditures that may not be readily available when funding for edu-
cation is being restricted. For example, although the number of
ESL/ELD students in the province of Ontario in Canada increased by
23% in one year alone (2001–2002), the number of ESL/ELD teach-
ers and support programs in Ontario schools has declined by 30% over
the past ﬁve years (Blackett, 2002). However, this situation is not
unique to Ontario, as jurisdictions throughout North America report
increased numbers of second language students and a lack of resources
to support these students (Mueller et al., 2004; Wright, 2005).
This shift in the student population is occurring alongside increas-
ing educational expectations and accountability. The accountability
framework has resulted in increased use of standards-based curricula
and assessments to address the fears of declining standards (Mazzeo,
2001). These accountability frameworks have resulted in increasing
performance expectations throughout North American jurisdictions.
Recent examples include, but are not limited to, the grade 3 Alberta
provincial achievement test, the Ontario Secondary Schools Literacy
Test (OSSLT), and, likely the most extreme example, the ‘No Child
Left Behind' Act (2002). In Alberta, Grade 3 students who fail the provincial achievement test will have to write a supplemental examination in Grade 4. In Ontario, successful completion of the OSSLT, or of the Ontario Secondary School Literacy Course (OSSLC), recently implemented for students who have failed the OSSLT after its first administrations, is a graduation requirement. With 'No Child Left Behind' in the United States, schools and districts have until
2014 to ensure that all students, with few, if any, student exemptions, meet educational expectations. Schools must document their adequate yearly progress, and those with less than satisfactory performance may face a variety of sanctions.

1 The Ontario curriculum (Ministry of Education and Training, 1999) defines ESL students as students who are learning English as the language of instruction, can read and write in their first language (L1), and mostly have had continued schooling before arriving in Canada. ELD students are those who may not read and/or write in their first language (L1) and may have missed years of schooling. They could come from countries where Standard English is the official language but where other varieties of English are in common use, and still others live in communities in Ontario where access to English is limited. We have used the term ESL/ELD as it is used in the Ontario curriculum. ESL/ELD students are treated as a single group of test-takers on the OSSLT, although we recognize the highly heterogeneous characteristics of the group (Cumming et al., 1993). In this paper, the term 'second language students' is also used to refer to the ESL/ELD students in the context of this study.
The conﬂuence of both increased numbers of ESL/ELD students
and this expanding assessment (testing) framework has created a
new and largely unanticipated educational problem – alarmingly
high failure rates for these students (Watt and Roessingh, 2001). In
addition, these large-scale tests are designed and constructed for ﬁrst
language English speakers. Research suggests, however, that these
assessments may have lower reliability and validity for second lan-
guage students and should be interpreted differently (Abedi, 2004;
Abedi et al., 2003). Clearly, these tests result in extra challenges to
the academic success of these second language students and the
teachers who are responsible for their success. The high failure rates
of these students also highlight a systemic and urgent educational
issue that will likely become more problematic as the numbers of
second language students continue to increase and the requirement for all students to be successful becomes increasingly important.
The OSSLT was developed by the Education Quality and
Accountability Ofﬁce (EQAO). The purpose of the Ontario
Secondary School Literacy Test (OSSLT) is to ensure that students
have acquired the essential reading and writing skills that apply to all
subject areas in the provincial curriculum up to the end of Grade 9.
All students in public and private secondary schools who are work-
ing toward an Ontario Secondary School Diploma must complete the
OSSLT, or for those who have not been successful on the OSSLT, the
OSSLC. Given the large geographic size of Ontario and the variability in community settings (urban and rural locations), Ontario education is expected to meet the needs of a very diverse student population.
According to the EQAO, the OSSLT provides 'a useful quality assurance measure that shows the extent to which all Ontario students, regardless of geographic location, curriculum streams, and language of instruction (English or French) are meeting a common, basic standard for literacy across the province' (EQAO, 2002: 1). Since its trial run in 2000, there have been five administrations of the test (February 2002, October 2002, October 2003, October 2004, and October 2005).
The OSSLT is a cross-curricular literacy test consisting of a sepa-
rate 2.5-hour reading and a 2.5-hour writing component, both of
which must be successfully completed for credit to be obtained. The
OSSLT is administered in high schools throughout Ontario during
two specified administration days in October of each year.2 Students
registered in Grade 10 are eligible to write the test, although they
may choose to defer writing the assessment until a subsequent admin-
istration. Students who fail one or both components are also expected
to write the incomplete component(s) in a subsequent administration.
There are two versions of the OSSLT, English and French. The majority of ESL/ELD students in Ontario take the English version, and it is to this version that the implications of this study apply.
The reading component consists of a total of 12 short passages
within three EQAO-defined text types: information3 (50%), graphic
(25%), and narrative (25%). The passages include ﬁctional, non-
ﬁctional, and graphical texts (e.g. graphs, schedules) that vary across
administrations. In response to the passages, the students are
expected to demonstrate their understanding and comprehension
using test items having three different formats: multiple-choice (MC)
(40%), constructed response (CR) (35%), and constructed response
with explanation (CRE) (25%). The CR items require a short answer
and the CRE items require a written explanation, i.e. the students are
asked to justify or explain the thinking behind their answers. The test
items are also designed to measure three reading skills (direct understanding, indirect understanding, and connections) and four reading strategies (vocabulary, syntax, organization, and graphical features).
These skills and strategies are deﬁned by EQAO based on the
Ontario learning expectations. The writing component consists of four writing tasks: a summary, a series of paragraphs expressing an opinion, a news report, and an information paragraph. The purpose and audience for each task are provided (e.g. to report on an event for
the readers of a newspaper). Students are also provided with guide-
lines regarding the length and methods to structure each writing sam-
ple. For example, to complete the news report writing task, the
students are provided with a headline and picture (e.g. school
receives computers as a reward) and are expected to make up the
facts and information to address questions of who, what, where, why,
how using 'the limited space provided' (a page with lines) as a guide-
line for the text length.
2 Reviews of the OSSLT have been conducted by the EQAO, resulting in upcoming changes. Among the changes, separate reading and writing scores are no longer to be used, and the test is shortened from two days to one day (with two 75-minute sessions).
3 Facets of the OSSLT item formats, text types, skills and strategies of reading, and the four writing tasks are italicized in the paper as they are defined by EQAO.
The completed tests are sent to EQAO4 and are centrally marked
by trained teachers and markers. The MC and CR items on the
reading component are scored on a 2-point (0, 2) scale and the CRE
items are scored using item-specific scoring rubrics on a 3-point scale (2 points for correct, 1 point for partly correct, or 0 for incorrect). Each of the four writing tasks is scored using a 4-point (1–4)5 scoring scale with specific performance descriptors. A score of 0 is given for responses classified as 'blank/illegible and irrelevant content/off-task.' A score of 4 is given to the highest quality written responses.

4 The Education Quality and Accountability Office (EQAO) develops, distributes, collects, and scores the OSSLT.
5 A Level 1–4 system is used in all Ontario student report cards for school academic achievement.
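To make the arithmetic behind the percentage scores reported later (e.g. in Table 1) concrete, here is a minimal sketch, under our own assumptions rather than EQAO's actual scoring procedures, of how item-level scores on these scales aggregate into component percentages:

```python
# Hypothetical item-level scores for one student. MC and CR items are
# scored 0 or 2; CRE items are rubric-scored 0, 1, or 2.
scores = {
    "MC":  [2, 2, 0, 2, 2],   # each item out of 2 points
    "CR":  [2, 0, 2],         # each item out of 2 points
    "CRE": [1, 2, 0, 2],      # each item out of 2 points, partial credit possible
}

for fmt, items in scores.items():
    max_points = 2 * len(items)
    pct = 100 * sum(items) / max_points
    print(f"{fmt}: {sum(items)}/{max_points} = {pct:.1f}%")
```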
II The interaction of test method effects and other factors
Not surprisingly, ESL/ELD students ﬁnd it very difﬁcult to succeed
in large-scale English literacy examinations. In addition, evidence
suggests that these difﬁculties extend beyond language abilities.
Taylor and Tubianasa (2001) suggest the interpretation of test results
must include the learner as a context and accommodate learners’
potential from a variety of measures. The content, types and context
of reading passages can have a signiﬁcant impact on students’ read-
ing performance (Anderson et al., 1991; Freedle and Kostin, 1993;
Kobayashi, 2002; Lee, 2002; Peretz and Shoham, 1990; Perkins,
1992). Familiarity with the content of a passage can affect perform-
ance (see also Jennings et al., 1999; Pulido, 2004). For instance,
Peretz and Shoham (1990) pointed out that a group of English as a
Foreign Language (EFL) university students found texts related to
their fields of study more comprehensible than texts related to other
topics. Carrell and Wallace (1983) investigated the individual and
interactive effects of both context and familiarity on the reading
comprehension of both native English and non-native English read-
ers (i.e. second language readers) to see if these two components of
background knowledge – context and familiarity – would interact.
The ﬁndings indicated that native (ﬁrst language) English readers
utilize context as part of a processing strategy to make cognitive pre-
dictions of what a text is going to be about as it is being read, while
nonnative (second language) English readers do not process a text in
this way. They concluded that the more second language students are
familiar with the context, content and types of passages, the better
they would perform in measures of reading comprehension. Carrell
(1983) also argued that non-native English readers do not make the
necessary connections between the text and appropriate background
knowledge as do their native English counterparts.
In addition to test content, item format has also been shown to dif-
ferentially affect test performance (Bachman and Palmer, 1982;
Shohamy, 1983; 1984; Hancock, 1994; Bennett et al., 1991). Various
test item formats are used to assess reading comprehension
(Anderson et al., 1991). Bachman (1990) emphasized the impor-
tance of research into the effects of personal attributes and test
method facets on test performance. He claimed that ‘test developers
will have better information about which characteristics interact with
which test method facets, and should utilize this information in
designing tests that are less susceptible to such effects, that provides
the greatest opportunity for test takers to exhibit their 'best' perform-
ance, and which are hence better and fairer measures of the language
abilities of interest’ (1990: 156). As an example, the construct valid-
ity of multiple-choice (MC) items has been criticized as students
may use test-wiseness to rule out or identify correct answers rather
than the process involved in actual reading comprehension
(Anderson et al., 1991; Bachman, 1990). Second language students
have been shown to employ such strategies to succeed in large-scale
testing situations (Cheng and Gao, 2002). It is also not clear that MC
items measure the comprehension of a given passage, or instead, ‘the
reader’s world knowledge and his or her ability to reason and think
about the contents of a passage’ (Royer, 1990: 162).
Although much attention has been given to the MC item format
(e.g. Freedle and Kostin, 1993; Katz et al., 1990; Royer, 1990), com-
paratively less research has focused on other test item formats (e.g.
constructed response) or the relationship between students’ perform-
ance and test item formats. Anderson et al. (1991) investigated the
relationships among student test-taking strategies, content analyses of
test items, and student scores on these items using a triangulation
approach. They found that by combining these data sources, greater
meaning was obtained about the interactions among learner strategies,
test content and test performance. Kobayashi (2002) focused on three
other variables: text organization/text types (from loosely to tightly organized text, based on Meyer's (1985) model of content structure analysis), item format (cloze, open-ended questions, and summary),
and learners’ English proﬁciency. Her ﬁndings demonstrated that
different item formats, even different types of items within the same
format measured different aspects of reading comprehension. Among
them, summary writing best distinguished learners of different lan-
guage proﬁciency (see also Riley and Lee, 1996). Her ﬁndings also
support the concept of a 'linguistic threshold' (2002: 210); that is, learners below a certain level of proficiency will have difficulty moving beyond sentence-level or literal understanding.
The types of writing tasks can create different challenges for
second language students (e.g. Connor-Linton, 1995a; 1995b;
Hamp-Lyons, 1996; Kobayashi and Rinnert, 1996). In addition, the
difference in second language students’ writing performance may be
caused by the familiarity with certain writing tasks derived from
their ﬁrst language (Connor-Linton, 1995b). Silva (1997) summa-
rized rhetorical, linguistic, conventional, and strategic issues that
would seriously disadvantage second language writers. He argued
that, besides the linguistic aspect, in which second language students exhibit simpler forms of writing, these writers usually demonstrate, at the discourse level, distinct features of exposition, argumentation, and narration which, more often than not, do not meet the expecta-
tions of native English markers. Connor-Linton (1995b: 102)
claimed that different raters’ ‘characterization of the writing’ and
their ‘construction of the text’ reﬂected not only the respective
instructional goals and emphases, but also different societies’ theo-
ries of the uses and values of written English. Given the importance
of examinations such as the OSSLT, it is essential to explore and
understand the factors associated with the differences in literacy
achievement of ESL/ELD and non-ESL/ELD students.
Based on the 2002 results, 63% of the publicly schooled ESL/ELD students who wrote the OSSLT in February 2002 failed
at least one component of the test (EQAO, 2002), as compared to the
population failure rate of 25%. Further, 52% of the eligible
ESL/ELD students deferred writing the examination for at least one
year. The relatively poor performance of ESL/ELD students
indicates the challenging nature of this test for these students and
potentially hinders their academic success and graduation. Further,
in an accountability framework, the performance of these students
will have an increasing inﬂuence on the conclusions drawn regard-
ing education quality and opportunity to learn (Gee, 2003).
It is within this context that we examined ESL/ELD students’
performance in two test administrations (February 2002 and October
2003) of the OSSLT. The purpose of the study was to determine
if there were signiﬁcant and systematic differences across item
formats, reading text types, underlying skills and strategies and the
four writing tasks that may help explain why ESL/ELD students
failed the test at such high rates relative to non-ESL/ELD students.
The research was guided by two research questions. First, what are
the challenges experienced by ESL/ELD students on the OSSLT?
Second, are there signiﬁcant and systematic differences in test per-
formance between ESL/ELD and non-ESL/ELD students in relation
to specific constructs of reading and writing tasks? The comparisons between ESL/ELD students' test performance and that of their non-ESL/ELD counterparts were made from the following perspectives, on the basis of the previous research reviewed above:
1) item formats: multiple-choice, constructed response, and
constructed response with explanations;
2) reading constructs: text types, skills, and strategies and their
associated facets; and
3) writing tasks: a summary, a series of paragraphs expressing an
opinion, a news report, and an information paragraph.
Item-level OSSLT achievement data from the 2002 and 2003 administrations, written by Grade 10 secondary school students across Ontario, were used in this study to address these issues. The data from the two years
were used to cross-validate the ﬁndings. Analyses of the reading com-
ponent focused on item formats, text types, skills and strategies as
deﬁned by EQAO. The analyses of the writing component focused on
the proportion of students obtaining each score level (0–4) on the four
writing tasks. For the 2002 data, a total of 2686 ESL/ELD students
actually wrote the OSSLT (2164 ESL/ELD deferred taking the test in
that year). These students were compared to a stratiﬁed random sam-
ple of non-ESL/ELD students (4068 out of 138 392 students). For the
2003 data, only the 3635 first-time eligible ESL/ELD students, from the total sample of 4311 ESL/ELD students, who wrote both components of the test were included in the analyses. This reduced sample was comparable to the 2002 sample, which likewise included only first-time eligible students and excluded repeating students. This 2003 sample
of ESL/ELD students was compared to a random selection of 5003
ﬁrst-time eligible non-ESL/ELD students.
Item level data were used to obtain measures of achievement sub-
divided by item formats, text types, skills, or strategies for reading
and the four writing tasks. Descriptive statistics and correlations
were ﬁrst calculated to determine the overall distributions of item
level responses, the interrelationships amongst responses, and the
relative performance differences for ESL/ELD and non-ESL/ELD
students. Second, separate discriminant analyses using step-wise
procedures were conducted with ESL/ELD membership as a depend-
ent variable and the reading constructs (item formats, text types,
skills or strategies and their associated sub-constructs) and writing
tasks as independent variables. Discriminant analysis provides a
method to determine which variables best distinguish group mem-
bership. Discriminant analysis also has an advantage over logistic
regression as it provides better protection against multicollinearity
that could exist due to the nature of the constructs being measured
(Huberty, 1994; Tabachnick and Fidell, 1989).
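To illustrate the analytic procedure, the sketch below pairs group-mean t-tests with a discriminant function whose standardized coefficients indicate which facets best separate the groups. This is a sketch only: the data are simulated, the facet names are hypothetical stand-ins for the EQAO-defined facets, and scikit-learn's LDA fits all predictors at once rather than performing the step-wise selection used in the study.

```python
import numpy as np
from scipy import stats
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Simulated percentage scores on three reading facets, one row per student.
esl = rng.normal(loc=[51, 52, 57], scale=15, size=(2686, 3))
non_esl = rng.normal(loc=[71, 72, 78], scale=15, size=(4068, 3))
X = np.vstack([esl, non_esl])
y = np.r_[np.zeros(len(esl)), np.ones(len(non_esl))]  # ESL/ELD membership

# t-tests on each facet (the study used alpha = 0.01).
for j, name in enumerate(["information", "graphic", "narrative"]):
    t, p = stats.ttest_ind(esl[:, j], non_esl[:, j])
    print(f"{name}: t = {t:.1f}, p = {p:.3g}")

# Discriminant analysis with group membership as the dependent variable.
lda = LinearDiscriminantAnalysis().fit(X, y)

# Scale raw coefficients by the predictors' standard deviations so they
# are comparable across facets, analogous to standardized canonical
# discriminant function coefficients.
std_coef = lda.coef_[0] * X.std(axis=0)
print("standardized coefficients:", np.round(std_coef, 3))
```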
The reading results are provided in Table 1. The ﬁrst vertical panel
contains the results for 2002, and the second vertical panel contains
the results for 2003. Descriptive statistics provide the overall distri-
butions of item level responses for ESL/ELD and non-ESL/ELD stu-
dents and illustrate the relative performance differences to answer
the ﬁrst research question of the study. In terms of item format, the
analyses of the 2002 OSSLT data demonstrated that both ESL/ELD and non-ESL/ELD students performed relatively better on the multiple-choice (MC) test items, and that the CRE was the most difficult item format for both groups. For example, ESL/ELD students achieved an
average score of 58.8% (47/80) on the MC items, 54.3% (38/70) on
the CR items, and 42.0% (21/50) on the CRE items. In contrast, non-
ESL/ELD students averaged 77.5% (62/80), 75.7% (53/70), and
62.0% (31/50) on each set of items. ESL/ELD students scored
approximately 20% lower on each format, although the difference
was slightly larger for the CR items. Significance testing using t-tests with alpha set to 0.01 indicated that these differences were sig-
niﬁcant. The three item formats were highly correlated (between
0.80 and 0.88). Using the standardized canonical discriminant func-
tion coefﬁcients, it was found that each of the three formats served
to determine ESL/ELD and non-ESL/ELD group membership. The
discrimination coefﬁcients were similar for each of the three formats
with the constructed response items (CR) appearing to be the best single predictor of ESL/ELD membership (0.420).

Table 1 Percentage scores and discriminant function coefficients for reading item formats, text types, skills and strategies for ESL/ELD and non-ESL/ELD students

                                    2002                                                2003
                                    ESL/ELD   Non-ESL/ELD  Difference  Discriminant    ESL/ELD   Non-ESL/ELD  Difference  Discriminant
                                    (n=2686)  (n=4068)                 coefficient*    (n=3635)  (n=5003)                 coefficient*
Reading item formats
  Multiple choice (MC) (80)         58.8      77.5         18.8        .336            60.0      74.1         14.1        .927
  Constructed response (CR) (70)    54.3      75.7         21.4        .420            58.4      72.7         14.3        .062
  CR with explanations (CRE) (50)   42.0      62.0         20.0        .297            51.6      65.2         13.6        .028
  Wilks' Lambda = .771 (2002)
Reading text types
  Information (90)                  51.1      71.1         20.0        .189            53.9      69.0         15.1        .545
  Graphic (50)                      52.0      72.0         20.0        .158            65.2      75.7         10.5        -.489
  Narrative (60)                    56.7      78.3         21.7        .639            56.1      71.3         15.2        .877
  Wilks' Lambda = .766 (2002), .813 (2003)
Reading skills
  Understanding directly stated ideas and information (60)
                                    60.0      76.7         16.7        -.579           63.3      74.7         11.4        -.303
  Understanding indirectly stated ideas and information (90)
                                    51.1      72.2         21.1        1.191           57.3      72.3         15.0        .963
  Making connections (50)           48.0      68.0         20.0        .327            50.4      65.7         15.3        .323
  Wilks' Lambda = .740 (2002), .832 (2003)
Reading strategies
  Vocabulary (30)                   50.0      73.3         23.3        .956            43.2      67.1         23.9        1.131
  Syntax (30)                       43.3      63.3         20.0        —               48.7      64.3         15.6        .202
  Organization (36)                 52.8      69.4         16.7        -.113           54.0      67.9         13.9        -.136
  Graphic features (24)             50.0      66.7         16.7        .176            68.1      76.0         7.9         .359
  Wilks' Lambda = .738 (2002), .748 (2003)

Note: The scores have been converted into percentages; maximum raw scores appear in parentheses. A dash indicates a facet not retained as a significant discriminator in the step-wise analysis.
*Standardized discriminant coefficient, p < 0.001
Similar performance differences between the two groups were
found for the facets of the reading constructs of text types, skills, and
strategies. All of the differences were found to be signiﬁcant using
t-tests. The information text type, the making connections skill,
and the syntax strategy were the most difﬁcult for both the ESL/ELD
and non-ESL/ELD students. The facets of reading text types, skills
and strategies were also moderately high to highly correlated
(between 0.75 and 0.91). Each of the three text types (information,
graphic, and narrative) served to separate ESL/ELD and non-
ESL/ELD group membership, with narrative being the single best
discriminator (0.639). Each facet of the three reading skills (direct
understanding, indirect understanding, and making connections)
was signiﬁcantly related to ESL/ELD and non-ESL/ELD member-
ship, with indirect understanding being the single best discriminator
(1.191). Three of the four facets of strategies (vocabulary, organiza-
tion, and graphic features) were associated with ESL/ELD and non-
ESL/ELD group membership. Of the three, the vast majority of the
discriminatory power was due to vocabulary (0.956).
The 2003 results produced some notable differences and similari-
ties. Overall, ESL/ELD performance was higher during the 2003
administration. Of particular interest, ESL/ELD students had much
higher performance on the 2003 graphic text type and the strategy
using graphic features relative to the ESL/ELD student reading per-
formance on these facets in 2002; the differences being 13 and 18 per-
cent respectively. At the same time, differences between ESL/ELD
and non-ESL/ELD students were smaller; however, the differences
remained signiﬁcant based on the use of t-tests (alpha ⫽0.01) across
item formats, facets of reading text types, skills and strategies.
Summarizing Table 1, the average performance difference between
ESL/ELD and non-ESL/ELD across all facets of reading text types,
skills and strategies was 19.7% in 2002 and 14.3% in 2003.
The relative difficulty of the item formats and the facets of reading text types, skills, and strategies was similar in 2003 to that found in 2002. Those facets of reading text types, skills, and strategies that were the most difficult in the 2002 test administration were also found to be the most difficult in the 2003 administration. For example, the CRE item format, the information text type, and the making connections skill were the most difficult for both the ESL/ELD and non-ESL/ELD students in 2003, as they were in 2002, with the exception of the strategy facet. Vocabulary turned out to be the most difficult for ESL/ELD students in 2003, while syntax remained the most difficult for non-ESL/ELD students in both years.
The correlations between item formats were similar albeit slightly
higher (0.84 to 0.90). In contrast, the correlations amongst the facets
of reading text types, skills, and strategies were slightly lower (0.66
to 0.87). In terms of group separation, the multiple-choice format
served to best separate the two groups during the 2003 test adminis-
tration, with the other two formats providing little further separation.
With respect to the reading constructs, the 2003 data produced different discriminant coefficients, and the larger Wilks' Lambda values indicate that the discrimination of the ESL/ELD and non-ESL/ELD groups was less pronounced in 2003 compared with the 2002 data. In terms of the text types, narrative remained the single
best discriminator. The reading skill of indirect understanding was
found to similarly distinguish between ESL/ELD and non-ESL/ELD
students in both years. Lastly, each of the four facets of reading strate-
gies was found to separate the ESL/ELD and non-ESL/ELD students
and vocabulary was again the single best predictor.
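For reference, the Wilks' Lambda statistic reported with each discriminant function is the standard ratio of within-group to total variation, so values closer to 1 mean weaker group separation; this is why the larger 2003 values indicate less pronounced discrimination:

$$\Lambda \;=\; \frac{\det(\mathbf{W})}{\det(\mathbf{T})} \;=\; \frac{\det(\mathbf{W})}{\det(\mathbf{W} + \mathbf{B})},$$

where $\mathbf{W}$ and $\mathbf{B}$ are the within-group and between-group sums-of-squares-and-cross-products matrices, and $\mathbf{T} = \mathbf{W} + \mathbf{B}$ is their total.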
The writing results are provided in Table 2 and Table 3. The 2002
results are provided in the ﬁrst vertical panels of each table.
Examining the overall 2002 results, the summary was the most difﬁ-
cult for both the ESL/ELD and non-ESL/ELD students (see Table 2).
Overall, ESL/ELD students performed lower on each of the four
writing tasks, summary, paragraphs expressing an opinion, news
report, and information paragraph, with the largest average differ-
ences occurring for the news report (0.84) and the smallest for the
summary paragraph (0.38). Table 3 contains the proportion of
ESL/ELD and non-ESL/ELD students who obtained each score
point on the four writing tasks. In the 2002 administration, larger
proportions of ESL/ELD students obtained lower scores (0, 1, or 2)
than their non-ESL/ELD counterparts, especially scores of 0 or 2.
Based on signiﬁcance tests using t-tests and an alpha set to 0.01,
these differences were found to represent signiﬁcant differences in
test performance. The correlations amongst the writing tasks were
lower than those found for the facets of reading text types, skills and
strategies (0.35 to 0.53) of the 2002 data. Based on the discriminant
analysis, each of the four tasks provided separation between
ESL/ELD and non-ESL/ELD students, with the news report being
the single best discriminator among the four.
Table 2 Average writing task scores and discriminant coefficients for ESL/ELD and non-ESL/ELD students

                2002 holistic score                                       2003 holistic score
                ESL/ELD   Non-ESL/ELD  Difference  Standardized          ESL/ELD   Non-ESL/ELD  Difference  Standardized
                (n=2686)  (n=4068)                 discriminant          (n=3635)  (n=5003)                 discriminant
Summary         1.94      2.32         0.38        -.134                 2.38      2.36         -0.02       -.544
Opinion         2.15      2.87         0.72        .415                  2.68      2.97         0.29        .332
News report     2.14      2.98         0.84        .578                  2.21      2.73         0.52        .736
Information     2.03      2.77         0.74        .311                  2.24      2.53         0.29        .304
Wilks' Lambda = .857 (2002), .936 (2003)
Examining the 2003 writing results, the overall score averages for the ESL/ELD students were higher on each task than in 2002; however,
these students continued to have signiﬁcantly lower average scores
than the non-ESL/ELD students (see Table 2). The relative difﬁculty
of the writing tasks was different in 2003 for the ESL/ELD students
with both the news report and the information paragraph being more
difﬁcult than the summary task. The widest observed gap between
ESL/ELD and non ESL/ELD students occurred for the news report
(0.52) and the smallest for the summary paragraph (⫺0.02). The
score distributions for the 2003 administration were also somewhat
different from those of the 2002 administration (see Table 3). As with the 2002 administration, a greater percentage of ESL/ELD students obtained lower scores than non-ESL/ELD students. However, with the exception of the non-ESL/ELD students completing the news report task, a smaller percentage of both ESL/ELD and non-ESL/ELD students obtained a score of 0 on the 2003 writing tasks, and a larger percentage of the students in both groups obtained a score of 1; again, the news report was an exception to this trend.

Table 3 Distribution of ESL/ELD vs. non-ESL/ELD students' performance on the four writing tasks

Writing task        2002                          2003
                    ESL/ELD     Non-ESL/ELD       ESL/ELD     Non-ESL/ELD
                    (n=2686)    (n=4068)          (n=3635)    (n=5003)
Summary
  0 points*         30.3%       17.9%             8.7%        12.1%
  1 point           2.5         2.1               8.7         9.0
  2 points          21.9        26.8              26.6        22.2
  3 points          34.0        36.7              47.4        44.6
  4 points          11.3        16.4              8.6         12.1
Opinion
  0 points          22.1%       6.7%              4.7%        2.4%
  1 point           0.7         0.2               7.5         5.3
  2 points          28.5        16.8              16.9        12.6
  3 points          38.2        51.7              57.2        51.9
  4 points          10.6        24.7              13.7        27.8
News report
  0 points          24.7%       6.3%              20.5%       11.6%
  1 point           1.9         0.2               0.2         0.0
  2 points          20.3        11.4              25.2        13.2
  3 points          40.9        52.8              46.2        54.4
  4 points          12.1        29.3              7.9         20.7
Information
  0 points          26.7%       11.2%             7.4%        4.7%
  1 point           2.0         0.3               11.0        6.8
  2 points          22.1        13.1              37.8        32.6
  3 points          39.1        51.0              38.3        42.8
  4 points          9.9         24.3              5.5         13.1

*A score of 0 is awarded for blank/illegible or irrelevant content, or if the response is off-task.
The correlations among the four tasks were similar to, albeit slightly lower than, those found in 2002 (0.34 to 0.50). Based on the discriminant analysis, and similar to the 2002 results, each of the four tasks provided separation between ESL/ELD and non-ESL/ELD students, with the news report again being the single best discriminator among the four. The 2003 data did produce different discriminant coefficients, and the larger Wilks' Lambda value indicates that the discrimination of the ESL/ELD and non-ESL/ELD groups was less pronounced in 2003 compared with the 2002 data.
The current study examined ESL/ELD students’performance in both
the reading constructs and writing tasks of the OSSLT in comparison
with non-ESL/ELD students in order to better understand why
ESL/ELD students fail at a higher rate. The research also examined
if signiﬁcant and systematic performance differences exist between
ESL/ELD and non-ESL/ELD students in relation to the item format,
text types, skills and strategies of reading and the four writing tasks.
This study was conducted over two OSSLT administrations in order to cross-validate the results and reflect a more systematic and accurate account of ESL/ELD students' test performance.
Given that students are required to pass both the reading and writ-
ing components prior to graduation, the high ESL/ELD failure and
deferral rates reported by the EQAO (56% in 2002 for example, see
also EQAO, 2004) illustrate the potentially negative ramiﬁcations for
ESL/ELD students writing the OSSLT. Similarly, the ﬁndings of the
current study also clearly show that ESL/ELD students have a lower
overall level of performance compared to their non-ESL/ELD coun-
terparts across all reading constructs, regardless of item formats, text
types, skills, strategies, and writing tasks. Such ﬁndings are of
increasing concern given that it is estimated that ESL/ELD students
form 20–50% of the general student population in urban K-12 sys-
tems across Canada (Roessingh, 1999). Recent studies conducted in
urban schools indicate that the dropout rates of ESL/ELD students
were as high as 74% (Watt and Roessingh, 1994) and 52% (Norrid-
Lacey and Spencer, 1999). Research has also indicated that high-
stakes, large-scale testing is one of the main factors contributing to such high dropout rates (Catterall, 1989; Madaus and Clarke, 2001; Mehan, 1997; Reardon and Galindo, 2002; Shepard, 1991).
While the differences in average performance across the reading
constructs and writing tasks provide evidence of lower ESL/ELD
performance (see Tables 1, 2, and 3), they also indicate that the dif-
ferences may be declining as the performance differences were
smaller in 2003 as compared to 2002. The fact that more ESL/ELD students passed the OSSLT in 2003 (42% in 2003 vs. 37% in 2002) could indicate greater familiarity with the test content among teachers and students, and therefore potentially better preparation for the test.
Taking a closer look at the group performance differences, there
was a consistent pattern over the two test administrations across item
formats, reading text types, skills, and strategies. The patterns of test
performances of ESL/ELD and non-ESL/ELD students were similar
in each of the item formats. For example, the CRE item format, the information text type, and the making connections reading skill consistently proved to be the most difficult for both groups in 2002 and 2003.
The one exception was that the syntax reading strategy was the most
difﬁcult for ESL/ELD students in 2002 and the vocabulary strategy
was so in 2003 (see Table 1).
The discriminant analysis results for item formats were not consistent over the two years. For the 2002 data, the CR
item format best separated the two groups, whereas the MC item for-
mat did so for the 2003 data. Coupled with the similar differences in
the average scores across item formats, these results indicate that
item format does not provide systematic separation between
ESL/ELD and non-ESL/ELD students. Certainly, these ﬁndings
require further examination. In contrast, the consistent discriminant
analysis results for the reading text types, skills and strategies over
the two test administrations indicate that there were small but sys-
tematic performance differences that could be used to help distin-
guish between ESL/ELD and non-ESL/ELD membership. The most
difﬁcult constructs mentioned above did not necessarily distinguish
group membership. Rather, the narrative reading text type, the indi-
rect understanding reading skill, and the vocabulary reading strategy
best distinguished the groups. These results are theoretically sup-
ported. For example, the narrative genre of reading potentially
requires comprehension of more embedded social, cultural, and con-
ventional meanings. The setting of such narratives tends to be quite
elaborately detailed or implied rather than overtly stated (Emmitt
et al., 2003). Given this, it may take longer for ESL/ELD students to
achieve such levels of reading comprehension compared with other
types of reading (Cummins, 1982). Information and graphic text types are more dependent on a formatted text structure, such as a museum or bus schedule, and are therefore less dependent on the interpretation of embedded meanings (see Kobayashi, 2002). In
addition, ESL/ELD students may have less exposure to narrative
types of reading compared with information and graphic types in the
school settings (Carrell and Wallace, 1983).
In terms of reading skills, indirect understanding provided the
best discrimination, arguably the most cognitively demanding of the
three measured reading skills. These results point to a potentially
unique characteristic of reading development for ESL/ELD students
and the development of higher learning skills (see Chan et al., 2002).
However, more evidence needs to be gathered through systematic item analysis or test item review with both ESL/ELD and non-ESL/ELD students to investigate how these students understand the requirements of the items and how these skills interact with their test performance.
The pattern for reading strategies demonstrated that vocabulary
best separated the two groups. Again these results are supported by
many research studies conducted in second/foreign language research
(e.g. Qian and Schedl, 2004; Hulstijn and Laufer, 2001; Read, 2000;
2004). Vocabulary is a key attribute of the developmental learning process and an indicator of language proficiency for second language students, for whom both incidental and intentional vocabulary learning are important. In addition, vocabulary growth for ESL students is achieved most efficiently through increased exposure to an English medium of instruction over time, so that ESL/ELD students can eventually be on a par with their native English-speaking counterparts.
As to the differences in the four writing tasks, the overall perform-
ance varied over the two administrations. For 2002, ESL/ELD stu-
dents performed least well on the summary task, whereas in 2003,
they performed least well on the news report task. The performance
in the summary task for non-ESL/ELD students remained the same
(most difﬁcult among the four tasks) in both years. Such variation
could have many causes, including variability of scoring or
contextual changes in the tasks. However, such causes seemed to
affect ESL/ELD students alone. Further, ESL/ELD student writing
performance was higher in the 2003 administration of the OSSLT.
These increases may be due to more speciﬁc coaching or tutoring of
ESL/ELD students in schools associated with increasing teacher and
student familiarity of the OSSLT. Such changes may also be due to
an evolving scoring process that provides raters with better training
on how to score the writing tasks, resulting in a systematic difference
in the manner in which the writing of ESL/ELD students was scored.
Given the importance of the writing component of the OSSLT, fur-
ther examination of factors such as raters, tasks, or context that may
affect ESL/ELD student writing performance is warranted (e.g.
Connor-Linton, 1995b; Silva, 1997).
Regardless, the news report writing task best separated the two
groups in both administrations. This task requires students to use a
picture prompt and accompanying headline, a common activity for
non-ESL/ELD students. Given the importance of contextual knowl-
edge (Cummins, 1982; 1996), such visual stimuli and context may
be less familiar to some ESL/ELD students who recently came to
Ontario secondary schools, making the task relatively more difﬁcult
for these students. If this is true, it is possible that these students may
still need to adjust to a very different media literacy environment in
order to be successful on such tasks. In this sense, a large-scale test
such as the OSSLT, designed and constructed for the non-ESL/ELD
student population, will inevitably cause certain potential problems
for ESL/ELD students.
The combined results from both the 2002 and 2003 OSSLT data
analyses have provided evidence for understanding the challenges
experienced by ESL/ELD students on the OSSLT and for exploring
any signiﬁcant and systematic differences in test performance
between ESL/ELD and non-ESL/ELD students in relation to speciﬁc
constructs of reading and writing tasks. While the relative perform-
ance differences in both years were similar across reading text types,
skills, and strategies, and the four writing tasks, the best predictors
of ESL/ELD membership were generally those supported by previ-
ous second/foreign language research. However, inconsistent results
were found regarding reading item formats over the two years, which
indicates the complexity of test method facets (Bachman, 1990).
While the tests are equated across years and the test constructs and
the test formats remained the same, there is not sufficient evidence to demonstrate that the two tests (2002 and 2003) were equivalent. Hence
variability in speciﬁc reading constructs or writing tasks may have
been due to these differences. Further examination into the OSSLT
data from future administrations could be used to determine if these
results are found across subsequent administrations. Other analytic
procedures such as Differential Item or Bundle Functioning may also
provide insights regarding systematic performance differences for
speciﬁc items or sets of related items (bundles).
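As a pointer to what such an analysis involves, the sketch below shows the common logistic-regression DIF procedure for one dichotomous item, comparing nested models for uniform and non-uniform DIF. This is a generic illustration with simulated data, not an analysis of the OSSLT data, and the built-in DIF effect is purely hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, n)            # 0 = non-ESL/ELD, 1 = ESL/ELD (hypothetical)
total = rng.normal(60 - 10 * group, 15)  # matching criterion: total reading score

# Simulate one dichotomous item with built-in uniform DIF against group 1.
logit = -6 + 0.1 * total - 0.8 * group
item = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Nested logistic models: total score only; plus group (uniform DIF);
# plus group x total interaction (non-uniform DIF).
X1 = sm.add_constant(np.column_stack([total]))
X2 = sm.add_constant(np.column_stack([total, group]))
X3 = sm.add_constant(np.column_stack([total, group, total * group]))

m1 = sm.Logit(item, X1).fit(disp=0)
m2 = sm.Logit(item, X2).fit(disp=0)
m3 = sm.Logit(item, X3).fit(disp=0)

# Likelihood-ratio tests flag DIF after conditioning on ability.
print("uniform DIF LR chi2:", 2 * (m2.llf - m1.llf))
print("non-uniform DIF LR chi2:", 2 * (m3.llf - m2.llf))
```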
Furthermore, the ﬁndings inform ESL/ELD educators of the item
formats, reading text types, skills and strategies, and writing tasks that
are found to be most problematic for these students. Based on these
results, it appears that ESL/ELD students require a greater under-
standing of the cultural and contextual aspects of reading and writing
as characterized by narrative texts. Such a focus should also be com-
bined with increased emphasis on vocabulary and understanding of
nuance (indirect understanding). Similarly, ESL/ELD students need
continued and focused instruction on writing. Based on the writing
results, the continued focus on cultural and contextual aspects (using
visuals depicting Canadian activities as in the news report tasks) may
provide the most beneﬁt for these second language students.
We hope to use these results, as a ﬁrst step, to begin to understand
not only the areas of challenges in reading and writing that ESL/ELD
students are facing in successfully completing the OSSLT, but also
the potential validity issues of such high-stakes testing programs for
these students. Such an understanding may help create targeted sup-
port to these students so that construct irrelevant factors and negative
ramiﬁcations of such testing can be minimized. These results can
also inform us about the unique aspects of literacy performance of
the increasing ESL/ELD population in Ontario. Further, these ﬁnd-
ings also provide direction for further research and instruction
regarding English literacy development for these second language
students within the context of having to complete large-scale English
literacy tests designed and constructed for ﬁrst language students.
The results suggest the Ontario Secondary School Literacy Test is
not only an English literacy test (an academic content test), but also
a language proﬁciency test for ESL/ELD students. Such a testing
practice deserves serious ethical consideration (see Bailey and
Butler, 2004; Shepard, 1991; Shohamy, 1997).
As evidenced by current educational policies, using large-scale,
high-stakes examinations as high school graduation requirements is an
increasingly common practice in many parts of the world including
European and Asian countries. However, completing a high-stakes lit-
eracy examination in a second language, as in the case of the OSSLT,
provides a unique challenge to students whose English language pro-
ﬁciency is not on par with their native English-speaking counterparts.
This challenge is more prominent in English-speaking countries such
as the USA, Canada, the UK and Australia, where there is a large second lan-
guage student population in schools. In this sense, this study con-
tributes to the limited, yet increasing literature on how well immigrant
children do in schools in the context of system accountability (see also
Christian et al., 2005). The study also helps to highlight the ongoing
issues associated with these accountability frameworks. These juris-
dictions are often faced with the dilemma of ensuring literacy stan-
dards while also meeting accountability frameworks.
In attempts to be fair to all students, most high-stakes testing pro-
grams provide test accommodations to second language students, for
example, extra time or instructions in students’ ﬁrst languages.
However, such accommodations may not be sufﬁcient, resulting in
new policies designed to address the negative consequences of these
testing programs. The Ontario Secondary School Literacy Course
was implemented to provide a graduation mechanism for students
unable to pass the Ontario Secondary School Literacy Test. The
timeline for the ‘No Child Left Behind’ act has been extended to
2014. Further, minimum sizes for sub-groups have been imple-
mented. Hence schools with fewer than 30 students in a sub-group
would not be required to track their progress. In the case of Arizona,
this policy kept 680 schools from being designated as 'Failing'
under the No Child Left Behind Act (Kossan, 2004). While pursuing
methods to ensure graduation or avoid sanctions, are the required
English literacy skills of these immigrant children being obtained or
are policy makers resigned to the belief that the school system cannot adequately support literacy achievement for these students? The
current study has provided evidence, by understanding the system-
atic performance differences of these students, that it may be possi-
ble to implement educational practices that will truly support the
literacy development and achievement of ESL/ELD students.
Acknowledgements

The authors acknowledge the support from the Social Sciences and Humanities Research Council (SSHRC) of Canada, and from the Education Quality and Accountability Office (EQAO) for releasing
the February 2002 and October 2003 OSSLT data for this study. We
would particularly like to thank the three anonymous reviewers
whose detailed and constructive comments helped us to make the
paper stronger and more succinct.
References

Abedi, J. 2004: The No Child Left Behind Act and English language learners:
Assessment and accountability issues. Educational Researcher 33(1): 4–14.
Abedi, J., Leon, S. and Mirocha, J. 2003: Impact of students' language back-
ground on content-based assessment: Analyses of extant data (CSE.
Tech. Rep. No. 603). Los Angeles: University of California, National
Centre for Research on Evaluation, Standards, and Students Testing.
Anderson, N.J., Bachman, L.F., Perkins, K. and Cohen, A. 1991: An
exploratory study into the construct validity of a reading comprehension
test: Triangulation of the data sources. Language Testing 8: 41–66.
Bachman, L.F. 1990: Fundamental considerations in language testing.
Oxford: Oxford University Press.
Bachman, L.F. and Palmer, A.S. 1982: The construct validation of some com-
ponents of communicative proﬁciency. TESOL Quarterly 16: 449–65.
Bailey, A.L. and Butler, F.A. 2004: Ethical considerations in the assessment of
the language and content knowledge of U.S. school-age English learners.
Language Assessment Quarterly 1(2–3): 177–93.
Bennett, R.E., Rock, D. and Wang, M. 1991: Equivalence of free-response
and multiple choice items. Journal of Educational Measurement 28:
Blackett, K. 2002: Ontario schools losing English as a second language pro-
grams – despite increase in immigration. Retrieved on 28 July 2004 from
Carrell, P.L. 1983: Three components of background knowledge in reading
comprehension. Language Learning 33: 183–203.
Carrell, P.L. and Wallace, B. 1983: Background knowledge: Context and
familiarity in reading comprehension. Paper presented at the annual con-
vention of teachers of English to speakers of other languages, Honolulu,
HI. (ERIC document reproduction service No. ED228901).
Catterall, J.S. 1989: Standards and school dropouts: A national study of tests
required for high school graduation. American Journal of Education
Chan, C.C., Tsui, M.S., Chan, M.Y. and Hong, J.H. 2002: Applying the
structure of the observed learning outcomes (SOLO) taxonomy on stu-
dent’s learning outcomes: an empirical study. Assessment and Education
in Higher Education 27: 511–27.
Cheng, L. and Gao, L. 2002: Passage dependence in standardized reading
comprehension: Exploring the College English Test. Asian Journal of
English Language Teaching 12: 161–78.
Christian, D., Saunders, B., Genesee, F., Lindholm-Leary, K. and
Goldenberg, C. 2005: Educating English language learners: A synthe-
sis of research evidence. Symposium presented at the American
Educational Research Association, Montreal, Quebec, Canada.
Connor-Linton, J. 1995a: Looking behind the curtain: What do L2 composi-
tion ratings really mean? TESOL Quarterly 29: 762–65.
—— 1995b: Crosscultural comparison of writing standards: American ESL
and Japanese EFL. World Englishes 14(1): 99–115.
Cumming, A., Hart, D., Corson, D., Labrie, N. and Cummins, J. 1993:
Provisions and demands for ESL, ESD, and ALF education in Ontario
schools. Toronto: Ontario Institute for Studies in Education. Report sub-
mitted to the Ontario Ministry of Education and Training.
Cummins, J. 1982: Tests, achievement, and bilingual students. Arlington, VA:
National Clearinghouse for Bilingual Education.
—— 1996: Negotiating identities: Education for empowerment in a diverse society. Los Angeles, CA: California Association for Bilingual Education.
Education Quality and Accountability Ofﬁce 2002: Ontario Secondary
School Literacy Test, February 2002: Report of Provincial Results.
Retrieved 6 February 2003 from http://www.eqao.com/pdf_e/02/
—— 2004: Ontario Secondary School Literacy Test, October 2003: Report of
Provincial Results. Retrieved 20 January 2005 from http://www.eqao.
Emmitt, M., Pollock, J. and Komesaroff, L. 2003: Language and learning,
third edition. Oxford: Oxford University Press.
Freedle, R. and Kostin, I. 1993: The prediction of TOEFL reading item
difﬁculty: Implications for construct validity. Language Testing 10:
Gee, J.P. 2003: Opportunity to learn: A language-based perspective on assess-
ment. Assessment in Education 10: 27–46.
Hamp-Lyons, L. 1996: The challenges of second language writing assessment.
In White, E., Lutz, W., and Kamusikiri, S., editors, Assessment of writing:
Policies, politics, practice. New York: Modern Language Association,
Hancock, G.R. 1994: Cognitive complexity and the comparability of multiple-
choice and constructed-response test formats. Journal of Experimental
Education 62: 143–57.
Huberty, C.J. 1994: Applied discriminant analysis. New York: Wiley.
Hulstijn, J.H. and Laufer, B. 2001: Some empirical evidence for the involve-
ment load hypothesis in vocabulary acquisition. Language Learning 51:
Jennings, M., Fox, J., Graves, B. and Shohamy, E. 1999: The test-takers’
choice: An investigation of the effect of topic on language test perform-
ance. Language Testing 16(4): 426–56.
Katz, S., Lautenschlager, G., Blackburn, A. and Harris, F. 1990: Answering
reading comprehension items without passages on the SAT.
Psychological Science 1: 122–27.
Kobayashi, H. and Rinnert, C. 1996: Factors affecting composition evalua-
tion in an EFL context: Cultural rhetorical pattern and readers’ back-
ground. Language Learning 46: 397–437.
Kobayashi, M. 2002: Method effects on reading comprehension test perform-
ance: Text organization and response format. Language Testing 19:
Kossan, P. 2004: Ariz easing fed’s rules for school standards. Arizona
Republic, 4 April, p. B1.
Lee, G. 2002: The inﬂuence of several factors on reliability for complex read-
ing comprehension tests. Journal of Educational Measurement 39:
Madaus, G. and Clarke, M. 2001: The impact of high-stakes testing on minor-
ity students. In Kornhaber, M. and Orﬁeld G., editors, Raising standards
or raising barriers: Inequality and high stakes testing in public educa-
tion, New York: Century Foundation, 85–106.
Mazzeo, C. 2001: Frameworks of state: Assessment policy in historical per-
spective. Teachers College Record 103: 367–97.
Mehan, H. 1997: Contextual factors surrounding Hispanic dropouts.
Retrieved on 16 January 2004 from http://www.ncela.gwu.edu/pubs/
Meyer, B.J.F. 1985: Prose analysis: Purpose, procedures, and problems: Parts
I and II. In Britton, B. and Black, J.B., editors, Understanding expository
text. Hillsdale, NJ: Lawrence Erlbaum, 269–304.
Mueller, T.G., Singer, G.H.S. and Grace, E.J. 2004: The individuals with dis-
abilities education act and California’s Proposition 227: Implications for
English language learners with special needs. Bilingual Research
Journal 28(2): 231–51.
No Child Left Behind Act 2002: Pub. L. No. 107–110.
Norrid-Lacey, B. and Spencer, D.A. 1999: Dreams I wanted to be reality:
Experience of Hispanic immigrant students at an urban high school.
Paper presented at the Annual Meeting of the American Educational
Research Association, Montreal, Canada.
Ontario Ministry of Education and Training. 1999: The Ontario Curriculum
Grade 9 to 12 English as a Second Language and English Literacy
Development. Toronto: Queen’s Printer for Ontario.
Peretz, A.S. and Shoham, M. 1990: Testing reading comprehension in LSP:
Does topic familiarity affect assessed difﬁculty and actual performance?
Reading in a Foreign Language 7: 447–55.
Perkins, K. 1992: The effect of passage and topical structure types on ESL
reading comprehension difﬁculty. Language Testing 9: 163–72.
Proposition 203, English for the Children, Arizona voter initiative. 2000.
Proposition 227, English for the Children, California voter initiative. 1998.
Pulido, D. 2004: The relationship between text comprehension and second
language incidental vocabulary acquisition: a matter of topic familiarity?
Language Learning 54(3): 469–523.
Qian, D.D. and Schedl, M. 2004: Evaluation of an in-depth vocabulary knowl-
edge measure for assessing reading comprehension. Language Testing
Question 2, English Language Education in Public Schools, Massachusetts
voter initiative 2002.
Read, J. 2000: Assessing vocabulary. Cambridge: Cambridge University Press.
—— 2004: Research in teaching vocabulary. Annual Review of Applied
Linguistics 24: 146–61.
Reardon, S.F. and Galindo, C. 2002: Do high-stakes tests affect student’s
decisions to drop out of school? Evidence from NELS. Retrieved on
28 December 2003 from http://www.pop.psu.edu/general/pubs/
Riley, S. and Lee, J.F. 1996: A comparison of recall and summary protocols as
measures of second language reading comprehension. Language Testing
Roessingh, H. 1999: Adjunct support for high school ESL learners in main-
stream English classes: Ensuring success. TESL Canada Journal 17(1):
Royer, J. 1990: The sentence veriﬁcation technique: A new direction in the
assessment of reading comprehension. In Legg, S. and Algina, J., editors,
Cognitive assessment of language and math outcomes. Norwood, NJ:
Shepard, L. 1991: When does assessment and diagnosis turn into sorting and
segregation? In Hiebert, E., editor, Literacy for a diverse society:
Perspectives, practices, and policies. New York: Teachers College Press,
Shohamy, E. 1983: The stability of oral proﬁciency assessment on the oral
interview testing procedure. Language Learning 33: 527–40.
—— 1984: Does the testing method make a difference? The case of reading
comprehension. Language Testing 1: 147–70.
—— 1997: Testing methods, testing consequences: Are they ethical? Are they
fair? Language Testing 14(3): 340–49.
Silva, T. 1997: On the ethical treatment of ESL writers. TESOL Quarterly
Tabachnick, B.G. and Fidell, L.S. 1989: Using multivariate statistics. New
York: Harper Collins.
Taylor, A.R. and Tubianasa, T. 2001: Student Assessment in Canada:
Improving the learning environment through effective evaluation. Society
for the Advancement of Excellence in Education: Kelowna, BC.
Watt, D. and Roessingh, H. 1994: ESL drop-out: The myth of educational
equity. The Alberta Journal of Educational Research 40(3): 283–96.
—— 2001: The dynamics of ESL drop-out: Plus ça change. The Canadian
Modern Language Review 58: 203–22.
Wright, W.E. 2005: English language learners left behind in Arizona: The
nulliﬁcation of accommodations in the intersection of federal and state
policies. Bilingual Research Journal 29(1): 1–29.