Top Lang Disorders
Vol. 25, No. 1, pp. 33–50
© 2005 Lippincott Williams & Wilkins, Inc.
The Assessment of Reading Comprehension
Considerations and Cautions
Lynn Snyder, PhD; Donna Caccamise, PhD;
Barbara Wise, PhD
This article discusses the main purposes of reading comprehension assessment and identifies the key features of good assessment. The article also identifies pitfalls that clinicians and educators should avoid to conduct valid assessments of reading comprehension, such as the degree to which the measure taps the constructive and integrative processes of reading comprehension, the effects of socioeconomic and cultural–linguistic differences on student performance, and sampling variation. Lastly, it discusses recent technological innovations to the assessment of reading comprehension, including 2 specific computer-supported tools, Measures of Academic Progress and Independent Comprehensive Adaptive Reading Evaluation. Key words: inferences, measurement, reading comprehension, technology
From the Center for Language and Learning, Department of Speech, Language and Hearing Sciences (Dr Snyder); Institute of Cognitive Science, Department of Psychology (Dr Caccamise); and Center for Spoken Language Research (Dr Wise), University of Colorado at Boulder.

The work on this article has been supported, in part, by an InterAgency Educational Research Initiative National Science Foundation award #030.05.0431B to Walter Kintsch, the first two authors, Ron Cole, and Richard Olson, and by an Institute for Education Sciences U.S. Department of Education award #R305G040097 to the third author. Any opinions, conclusions, and recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the National Science Foundation, the Institute for Education Sciences, or the official policies, either expressed or implied, of the sponsors or of the U.S. Government.

Corresponding author: Lynn Snyder, PhD, Department of Speech, Language and Hearing Sciences, Campus Box 409, University of Colorado at Boulder, Boulder, CO 80309 (e-mail: lynn.snyder@colorado.edu).

PURPOSES of reading comprehension assessment vary widely. Correspondingly, assessment tools and activities can take many forms, ranging from statewide "high stakes" assessments, to district- or schoolwide paper-and-pencil silent reading tests, to formal tests administered individually as part of diagnostic protocols, and to dynamic qualitative assessments of how deeply an individual student understands a particular curriculum-related selection.
When preparing to conduct assessments,
most clinicians and educators are aware of
the need to guide their choice of appropriate
quantitative measures of language and founda-
tional reading skills with standard psychome-
tric criteria for judging reliability and validity
(Anastasi & Urbina, 1997). They may be less
knowledgeable about implications of recent
theoretical developments in reading compre-
hension theory (e.g., Kintsch, 1998) for judg-
ing construct and content validity of reading
comprehension assessments. In particular, ex-
aminers should be cautious about assuming
that content validity is present simply because
reading passages for measuring comprehen-
sion have been selected from grade-level cur-
ricula or evaluated with traditional readabil-
ity formulae. Such simplistic approaches do
not necessarily yield assessment tools with
the anticipated properties. Clinicians and ed-
ucators need sophisticated strategies to evalu-
ate and interpret reading comprehension pas-
sages, using criteria that reflect the state of the
discipline.
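To make the concern concrete, consider what a traditional readability formula actually measures. The Flesch–Kincaid grade level, offered here as a representative example of the genre rather than one named by the authors, estimates difficulty from surface features alone:

```latex
\text{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right)
            + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
```

Because such formulae register only sentence length and word length, two passages can earn the same grade level while differing greatly in the inferencing and integration they demand of readers.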
Consequently, this article first exam-
ines the expanded purposes of reading
comprehension assessment. Second, it ad-
dresses frequent problems encountered in
interpreting reading comprehension findings
within expanded contexts, emphasizing key
features of construct and content validity
for different forms of assessment. Finally,
technological innovations in the assessment
of reading comprehension are introduced.
EXPANDED PURPOSES OF READING
COMPREHENSION ASSESSMENT
When considering purposes for assessing
reading comprehension, the prominent focus is often placed on assessing comprehension in children with known reading problems and on monitoring their progress during the course of intervention. In recent years,
however, the purposes for assessing compre-
hension have expanded. According to Carlisle
and Rice (2004), the assessment of reading
comprehension in school settings has at least
four essential purposes:
1. state and district evaluation and account-
ability of programs and curricula;
2. identification of children at risk for prob-
lems;
3. differential diagnosis of children with
reading problems; and
4. measurement of student progress/out-
comes during the course of intervention.
The first purpose, that is, evaluating school
programs and curricula, is nothing new in and
of itself. What has changed is the role that as-
sessment of student learning outcomes plays
in this process. In recent years, assessment
has not only become the definitive tool for accountability, but has also led to the develop-
ment of reading comprehension assessments
with unique features.
Federally driven state and district
evaluation and accountability
In the current climate of educational re-
form, national policy has raised the bar, with
the aim that every child will be able to read
adequately by the end of third grade (Bush,
2002). This far-reaching goal is legislated and
spelled out in the provisions of the No Child
Left Behind Act (NCLB, 2002, PL 107–110).
The NCLB Act has a dominant theme of ac-
countability, setting Federal standards to en-
sure that state and local educational agencies
are motivated to achieve the new national ed-
ucational goal. Among the standards of NCLB
are three that relate directly to accountability,
namely, that schools are to
1. use evidence-based reading practices;
2. make “adequate yearly progress,” doc-
umented by assessing academic out-
comes; and
3. respond when labeled “in need of im-
provement,” that is, when they do not
meet the standards.
Such conditions, especially when coupled
with stipulations that parents may elect to
move their children if their school does not
achieve adequate yearly progress, give assess-
ment a very public, important, and controver-
sial role.
In using assessment to demonstrate edu-
cational outcomes for students and identify
schools “in need of improvement,” there is
room for multiple interpretations of process.
As educational policy commentators have
observed, the wording of “adequate yearly
progress” may be sufficiently vague that it
has resulted in misinterpretations, particularly
at the state level (Haycock & Weiner, 2003;
Silliman & Wilkinson, 2004). For example, "adequate yearly progress" does not necessarily mean annual yearly progress, and identifying a school as "in need of improvement" does not necessarily mean that the school is failing (Haycock & Weiner, 2003). This wording
of the legislation has led to a number of dif-
ferent interpretations of the law and resulted
in varying types of implementation by states
(Silliman & Wilkinson, 2004).
Regardless of variability across states, read-
ing assessment at the state and local levels provides the data used in implementing each state's educational standards. States
have committees that develop and articulate
standards concerning what proficient reading
comprehension includes at each grade level.
In turn, many states have developed their own
statewide assessments to measure student
mastery of these skills, and they have begun
to administer the assessments, often on an an-
nual basis. States’ success in developing mean-
ingful assessments in reading comprehension
varies considerably. Snow (2003) mentioned both favorable reports from researchers about the tests designed by some states and much less positive reports about others.
For example, she pointed out that a problem common to many tests of reading comprehension occurs when test items can be answered without ever reading the selection, because students can use their prior knowledge (Snow, 2003).
Many of these statewide assessments use
rating scales similar to that used by the Na-
tional Assessment of Educational Progress
(2000). That is, students’ reading achievement
is judged along a descriptive continuum that
corresponds to a set of standards. The descriptors of student performance levels are usually anchored to standards that characterize reading proficiency at each grade level. For example, in the
state of Colorado, the Colorado Student As-
sessment Program, the state's "high stakes" assessment, characterizes students' reading proficiency along a 4-point scale: unsatisfactory, partially proficient, proficient, and advanced. (Visit http://www.cde.co.state.us for additional information.) The state provides a breakdown of each student's performance
within each area of academic achievement,
for example, reading vocabulary versus comprehension, but without clear direction about
how to use the information to plan instruc-
tion or interventions. In other words, some
assessments that are designed for accountabil-
ity purposes are not necessarily suited to the
purposes of guiding instruction and interven-
tion for individual students. They may provide feedback about student performance relative to grade-level standards without relating that information more directly to instructional benchmarks. For educators and clinicians to use these data for other purposes, that is, to guide intervention decisions, additional analytical steps must be taken.
Some states even link student performance
on statewide measures to decisions about pro-
motion from a grade and high school grad-
uation (Thompson & Thurlow, 2003). This
poses considerable problems for many stu-
dents with reading difficulties. In such in-
stances, it is crucial that clinicians and edu-
cators understand how to interpret student
performance on these “high stakes” assess-
ments. What may have begun as a way to coordinate assessment, curriculum, and instruction may in some instances have fostered a negative climate and could even be hindering reform. See Silliman and Wilkinson (2004) for an excellent discussion of these issues and
the challenges they pose. District and school
program evaluation and accountability is not
the only area in which public policy has in-
fluenced and expanded the purposes of as-
sessment; the use of screening tests to iden-
tify children at risk for reading problems is
another such instance.
Identification of children at risk for
problems in reading comprehension
The second purpose that Carlisle and Rice
(2004) identified for the assessment of reading
comprehension in schools is to identify chil-
dren at risk before their difficulties have be-
come entrenched. That is, assessment is now
expected to be able to identify, with a reason-
able level of accuracy and reliability, students
at risk for future reading problems, including
problems in reading comprehension, with the
goal of providing prevention services.
PL 107–110, the No Child Left Behind Act,
initiated educational reforms that reached fur-
ther than just accountability for educational
practices. Motivated by research findings in-
dicating that most children who do not learn
to read by the end of third grade fail to catch
up with their peers (Simmons, Kuykendall, Cornachione, & Kame'enui, 2000), the law for-
mulated a clear national educational goal: ev-
ery child must learn to read by the end of third
grade. Its sweeping reforms not only placed
assessment squarely in the middle of educa-
tion, but also expanded its role. If all children
must learn to read by the end of third grade,
educators needed a way to identify those chil-
dren at risk for reading failure early enough to
help them before then. Screening assessments
have become the highlighted measures with
which to accomplish this.
School districts’ need for cost efficiency
demands screening tools that demonstrate a
low rate of false positives in identification,
so that students without problems will not
be misidentified and given help they do not
need. The motive to help all children who
need it demands tools that yield a low rate of
false negatives, so that children in need will not be missed and thus fail to receive the programmatic help they need to be successful by third grade. Screening sensitivity,
hit rates, and degree of risk became crucial
criteria for evaluating and selecting screening
assessments (Catts, Fey, Zhang, & Tomblin,
2001). This was further compounded by the
need to identify students sufficiently early,
even as early as kindergarten, to begin in-
tervention (O’Connor & Jenkins, 1999). Re-
cent intervention research has established
that fewer and less intensive hours of reme-
diation are needed to help children identi-
fied as at risk in kindergarten or first grade than for those identified in third grade (Torgesen, Wagner, & Rashotte, 1997).
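To make these screening indices concrete, the following sketch computes them from a hypothetical 2 × 2 screening table; the counts are invented for illustration and are not drawn from Catts et al. (2001).

```python
# Hypothetical screening outcomes for 200 students, cross-classified by
# screener decision (flagged / not flagged) and later reading status.
true_pos = 40    # flagged at risk; later showed reading problems
false_pos = 15   # flagged at risk; later read normally (overidentified)
false_neg = 10   # not flagged; later showed reading problems (missed)
true_neg = 135   # not flagged; later read normally

sensitivity = true_pos / (true_pos + false_neg)      # hit rate: 0.80
specificity = true_neg / (true_neg + false_pos)      # correct rejections: 0.90
false_pos_rate = false_pos / (false_pos + true_neg)  # 0.10
false_neg_rate = false_neg / (false_neg + true_pos)  # 0.20

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"false positives = {false_pos_rate:.2f}, false negatives = {false_neg_rate:.2f}")
```

A district weighing cost efficiency against the goal of helping every child in need is, in effect, trading the false positive rate against the false negative rate when it sets a screener's cutoff.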
Screening measures have become remark-
ably successful in identifying children at
risk for word reading and other reading
problems in the primary grades (Riordan &
Snyder, 2002). They are particularly good at
identifying students at young ages who are
likely to exhibit poor reading comprehension
later as a result of their slower and less fluent
decoding (Perfetti, 1985; Rayner, Foorman,
Perfetti, Pesetsky, & Seidenberg, 2001) and re-
sulting diminished exposure to print and com-
plex printed language and content (Stanovich,
1986, 2000). On the other hand, early screen-
ings focused on word decoding may over-
look semantic differences demonstrated by
many students whose cultural or socioeco-
nomic backgrounds have provided them with
vocabularies, discourse structures, and world
knowledge that are discrepant from the ma-
terials they encounter in school. Further, this
strategy may also fail to identify other students
at risk for reading comprehension deficits. See
Nation and Norbury’s (2005) article in this
issue for a description of the range of stu-
dents who experience problems with reading
comprehension.
Motivated by the clear need to identify
at-risk students early, most screening assess-
ments are constructed to identify students at
risk for reading difficulties by the end of sec-
ond grade. For example, Catts et al.’s (2001)
screening includes measures of phonological
awareness, alphabet knowledge, verbal mem-
ory for sentences, and serial rapid naming
skills. The inclusion of these tests was de-
termined by statistical analyses that demon-
strated that these measures and maternal ed-
ucational level best accounted for variance
in students’ composite reading scores near
the end of second grade. Reading compre-
hension at later grade levels (fourth grade
and above), on the other hand, seems best
accounted for by early measures of oral re-
ceptive and expressive vocabulary, syntax,
and verbal memory for sentences and stories
(Snow, 2002; Snow, Burns, & Griffin, 1998).
Scarborough (1998) also found that rapid au-
tomatic naming (RAN) at the second grade
level accounted for variance in eighth-grade
reading scores. Thus, most current screening
measures seem best able to identify those stu-
dents at risk for failure because they demon-
strate low foundational reading skills that are
needed to learn to decode print. The prob-
lem is that similarly sophisticated and ac-
curate screening assessments for identifying
students whose reading comprehension prob-
lems emerge later because of more specific
deficits in working memory and language
comprehension have not yet become avail-
able. It is in this arena of comprehensive lan-
guage abilities that speech–language pathol-
ogists may make a particularly significant
contribution to literacy practices.
Diagnosis of children with problems
in reading comprehension
Assessment and diagnosis of children with
reading comprehension problems is the
third purpose outlined by Carlisle and Rice
(2004). Assessment for this purpose often
looks at a student’s reading performance
from a componential perspective. That is,
in addition to a student’s performance on
measures of oral or silent reading compre-
hension, skills thought to underlie the ability
to read are assessed (Wise & Snyder, 2002).
The Report of the National Reading Panel
(2000) recommended that educators and clin-
icians assess students’ alphabet knowledge,
phonemic awareness skills, phonological
processing, reading vocabulary, and reading
fluency. Recent research suggests that even
this scope of testing, though necessary,
may not be sufficient to evaluate the range
of students with reading comprehension
problems.
A broad research consensus has shown
that the language difficulties that most stu-
dents with reading problems exhibit in the
area of phonological processing (NRP, 2000;
Stanovich, 2000; Wise & Snyder, 2002) are ac-
companied for some students by additional
problems in oral language. A smaller group
of students also have significant problems in
the domain of language comprehension, de-
spite average word decoding skills (Carlisle
& Rice, 2004; Nation, Adams, Bowyer-Crane,
& Snowling, 1999; Nation & Snowling, 1998;
Wise & Snyder, 2002). These children appear
to struggle with the semantic knowledge, lin-
guistic, and cognitive constructive processes
needed to comprehend language, whether
written or spoken (see Nation and Norbury,
2005, for a more thorough discussion of this
type of reading difficulty). Catts and Hogan
(2002) found that students with adequate de-
coding but poor comprehension abilities ac-
counted for 3% of all fourth graders they stud-
ied, and close to 20% of the students identified
with reading difficulties in the fourth grade.
Clinicians and educators need to assess the
knowledge and processes related to broader
language comprehension problems if they are
to understand the difficulties experienced by
students whose semantic and syntactic oral
language problems impact their reading com-
prehension.
It is unclear whether existing standardized
measures of reading comprehension can iden-
tify the key features of comprehension prob-
lems encountered by students whose oral
language difficulties contribute to their poor
comprehension. To understand whether as-
sessments can meet this need, we examine
the properties of both word reading and lan-
guage comprehension and their assessment in
this section of the article. Factors crucial to
the content and construct validity of assess-
ments in general will be covered later in the
section on features of good assessments.
Effects of poor word reading
on comprehension
Conventional wisdom in reading research
points to issues of resource bottlenecks
in reading comprehension (Perfetti, 1985;
Rayner et al., 2001). Many students with read-
ing disabilities have underlying problems with
phonological processing and are weak or slow
at decoding print. That is, they have not de-
veloped accurate and automatic word read-
ing. As a result, they need to devote cognitive
resources to decoding, creating a bottleneck
and leaving few resources for constructive
comprehension, or for getting the meaning
from the text. Obviously comprehension suf-
fers if a student misreads an important word,
but these children’s comprehension also suf-
fers even when they read a word correctly
but slowly. According to the resource lim-
itation view, this occurs not because they
have difficulties with language comprehen-
sion processes per se, but because their slow
word reading takes so much effort and so many resources that they lack the resources needed to invest in comprehension.
A reading comprehension problem stem-
ming from poor word reading can be fur-
ther compounded by another concern, re-
lated to what Stanovich (1986, 2000) has
called “Matthew Effects.” According to this
view, students who are slow to develop read-
ing skills are more likely to read materials at
lower grade levels, and to read less material
overall. The books they do read contain sim-
pler vocabulary and syntactic constructions
with simpler text structures than texts at their
grade level. The resulting lack of exposure to
print and to print containing higher order lan-
guage structures and content can keep these
students at less mature levels of comprehen-
sion processing and at slower reading rates,
a long-term effect of problems with decod-
ing print. In terms of the “Matthew Effect,”
the early poor readers get poorer over time
relative to the good readers, who get increas-
ingly richer, that is, acquire larger vocabular-
ies, expand their world knowledge, and de-
velop higher order language structures.
Inaccurate word reading, resource bottle-
necks, and Matthew effects, according to
these explanations, may compromise many
students’ reading comprehension as sec-
ondary effects of decoding problems, but not
as a result of an inherent problem with the
constructive comprehension processes that
underlie language comprehension. A single
assessment of comprehending text in silent
reading is insufficient to sort out secondary
effects from effects associated with a pri-
mary language disorder. However, assessment
can easily identify these types of problems
when findings from text reading are viewed in
conjunction with the student’s performance
on other measures like word reading or de-
coding, alphabetic knowledge, phonological
processing, reading automaticity, and listen-
ing comprehension. These more comprehen-
sive measures provide clinicians and special
educators with useful tools for differential
diagnosis that they then can supplement
with information obtained from qualitative
measures.
Effects of underlying oral language
problems on reading comprehension
The recent research emphasis on phono-
logical processing deficits has resulted in
a currently prevalent view that if students
are poor or slow decoders, their reading
comprehension deficits are almost surely a
secondary result of the resource bottleneck
discussed above (Perfetti, 1985). Such does
appear to be the case for those students who
show little or no problem in listening com-
prehension. However, it is not the case for
students with underlying language problems
that cross all components of language, includ-
ing the semantic component, or for students
with specific significant deficits in listening
as well as in reading comprehension whose
language problems are primarily character-
ized by underlying semantic issues (Nation
et al., 1999; Nation & Norbury, 2005; Nation
& Snowling, 1998; Wise & Snyder, 2002).
If instructors and clinicians adopt the view
that reading comprehension problems are al-
ways secondary to a bottleneck from slow
or inaccurate word reading, they might as-
sume that the comprehension problem will
resolve without further intervention when
that student learns to read automatically and
fluently. However, this is an unsafe assump-
tion unless assessments have been completed
to show the student has no difficulties in
listening comprehension, by using materials
that are sufficiently complex to measure key
components of discourse comprehension. If
a student shows difficulties with language
knowledge such as weak vocabulary or se-
mantic and cognitive processes for making
connections across discourse or text, inter-
ventions aimed at treating these deficits must
be part of his or her program of remedi-
ation. Speech–language pathologists’ profes-
sional training includes an extensive base of
knowledge and clinical experience with the
assessment of oral language. They can provide
important data to illuminate an educational
team’s understanding of the language deficits
that underlie poor reading comprehension.
They also can assess key oral language abilities
directly and provide interventions to address
any problems.
Monitoring students’ progress in
reading comprehension
The fourth and final purpose for read-
ing comprehension assessments identified by
Carlisle and Rice (2004) was to allow teach-
ers and schools to monitor student progress
in the context of instruction and intervention.
To assess progress, clinicians and educators
rely on varying tools, including standardized
measures and different types of more "authentic" measures, such as criterion-referenced testing, curriculum-based measures, and portfolios.
The discussion here is restricted to just a few
types of measures and cautions to keep in
mind when using them for this purpose.
Standardized measures
Using standardized measures to monitor
student progress in reading comprehension
is valid only if clinicians can be sure that
the population to be tested matches the
population among whom the norms were
developed. This is not always the case.
For example, Rathvon (2004) found that a well-known norm-based reading compre-
hension measure, the Standardized Read-
ing Inventory—Second Edition (Newcomer,
1999), suffered from restrictions in the demo-
graphics of the sample and errors in the scor-
ing examples. Also, there were problems in
the sensitivity of items for students below the
age of 9 years, so that very young children who
earned a raw score of 1 placed in the “average”
range when it was transformed into a standard
score. Such weaknesses made the measure in-
sensitive to small differences in younger read-
ers, rendering it invalid for the group of chil-
dren who are of particular interest if early
identification is to occur.
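To see how such a floor effect produces this anomaly, recall the standard score transformation:

```latex
SS = 100 + 15z, \qquad z = \frac{x - \bar{x}}{s}
```

If the norm-group mean raw score at the youngest ages sits near the test floor, say a hypothetical mean of 1.5 with a standard deviation of 1.2 (values chosen for illustration, not taken from the test manual), then a raw score of 1 yields z of about -0.42 and a standard score of about 94, comfortably within the average range of 85–115 even though the child answered almost nothing correctly.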
Another problem is that many standard-
ized measures of reading comprehension do
not provide teachers with specific informa-
tion about crucial core comprehension profi-
ciencies, even when the tests contain items
that tap competencies in these areas. They
simply provide a total score/standard score
for reading comprehension. Core competencies, such as the ability to construct a gist of the text, draw inferences, and integrate new information with existing world knowledge
(Caccamise & Snyder, 2005), are skills that
clinicians, classroom teachers, and special ed-
ucators often need to target in their work
with students. If baseline levels of proficiency
in such competencies cannot be determined
from standardized measures, formal test re-
sults must be supplemented with informa-
tion obtained from other types of informal
assessment, particularly criterion-referenced
and curriculum-based measures.
Criterion-referenced measures
Many teachers use criterion-referenced
measures of reading comprehension to moni-
tor student progress. Such measures describe
student performance on graded passages in
terms of mastery (or independent), instruc-
tional, and frustration levels of performance.
These levels allow teachers to select the most
appropriate reading materials and avoid those
that would either be too far below or too
far above the student's optimal learning level,
and therefore, likely to induce either boredom
or frustration.
Informal reading inventories (e.g., the
Qualitative Reading Inventory-3 by Leslie
and Caldwell, 2001) are frequently used
criterion-referenced measures. When we ex-
amined protocols obtained with one infor-
mal reading inventory, however, our results
showed low reliability across teachers in sev-
eral aspects of scoring (Snyder, 2002). Al-
though this seems to be a serious short-
coming, it could probably be ameliorated
with teacher training and refresher courses.
It should be noted, in fact, that the authors of
this particular inventory (Leslie and Caldwell)
reported higher levels of interscorer reliabil-
ity in their manual than we observed in our
independent research.
Beyond concerns with scoring reliability,
an even more serious shortcoming of many
inventories involves methods for measuring
the construction of meaning. Keenan and
Betjemann (2003) reported that well over half of the comprehension questions on the reading inventory they evaluated could
be answered on the basis of world knowl-
edge alone, without ever having read the pas-
sages. For example, a fourth-grade reading
passage on Johnny Appleseed assessed com-
prehension of information that is taught rou-
tinely in fourth-grade curricula that contain
units on Johnny Appleseed. Many students
in Keenan and Betjemann’s study, therefore,
could answer the questions correctly because
of the elaborated background knowledge they
had about the topic without reading the pas-
sage. Performance on measures with such
traits cannot reflect students’ comprehension
of the text or the ability to construct meaning.
Some other inventories, such as the An-
alytic Reading Inventory (Woods & Moe,
1999), do evaluate a student’s ability to
make predictions and measure a wide vari-
ety of strategies at different levels of read-
ing and listening comprehension. The cau-
tion to be observed by examiners is that
criterion-referenced measures have an un-
even track record. Educators and clinicians
need to examine inventories carefully, to de-
termine whether the student behaviors rated
and described actually reflect the construc-
tion of meaning (Jacobs, 2000), which is at the
core of reading comprehension.
Curriculum-based measures
In the last decade, some teachers, clin-
icians, and researchers have begun us-
ing curriculum-based measures to assess
progress. These measures are especially
appealing because they relate directly to the
content of instruction. Curriculum-based
reading comprehension measures often in-
volve asking students to read text passages or
selections silently or orally in a short period
of time and then to retell or write about what
they have read (Goode & Kaminski, 2000).
In other instances, students are asked to
read short selections and complete a maze
task (Shin, Deno, & Espin, 2000). In maze
tasks, every Nth word is deleted from the text,
and students must select, from three options,
the word that belongs in the sentence. Espin
and Foegen (1996) suggested that maze tasks
are valid ways to assess reading comprehen-
sion. However, like cloze tasks (Carlisle &
Rice, 2004), this form of testing may only ad-
dress a student’s knowledge of lexical choice,
syntactic and morphological restrictions on
sentence formulation, and local knowledge.
They may not tap the processes involved in
making sense of larger selections in which stu-
dents must draw inferences across stretches
of text and construct a coherent representa-
tion of what they have read.
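To illustrate the format, here is a minimal sketch of maze construction; it is our simplification, and published maze measures use controlled passages and principled distractor selection rather than the random sampling shown here.

```python
import random

def build_maze(text: str, n: int = 7, seed: int = 0) -> list:
    """Turn every nth word of a passage into a three-option maze item."""
    rng = random.Random(seed)
    words = text.split()
    originals = list(words)  # unmutated copy used as the distractor pool
    items = []
    for i in range(n - 1, len(words), n):
        correct = words[i]
        # Distractors here are other words from the passage; operational
        # maze tasks pick distractors that are plausible but wrong in context.
        pool = [w for w in originals if w.lower() != correct.lower()]
        options = rng.sample(pool, 2) + [correct]
        rng.shuffle(options)
        items.append({"position": i, "options": options, "answer": correct})
        words[i] = "____"
    return items

passage = ("The river rose quickly after the storm, and the small town "
           "worked together to pile sandbags along the banks before dark.")
for item in build_maze(passage):
    print(item["options"], "->", item["answer"])
```

The caution above applies regardless of implementation: selecting the right word tests local sentence constraints, not the reader's ability to build a coherent representation across the passage.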
Fuchs and Fuchs (1998) recommended selecting only curriculum-based measures (CBMs) that
demonstrate robust reliability and validity. We
take that recommendation one step further to
caution examiners to restrict their choice of
CBMs to measures that use selections of suffi-
cient length and complexity to allow children
to make constructive connections across text,
similar to texts encountered in classrooms. In
this recommendation, we are advocating for
text selection that demonstrates a higher level
of construct validity.
Portfolio assessment
To accomplish this goal of construct valid-
ity, clinicians and educators often use other
types of descriptive curriculum-based perfor-
mance assessments in which they collect a
portfolio of samples of work that students
produce in the context of actual classroom
curricular activities. They then inspect these
portfolios to glean information about reading
comprehension skills not captured by other
types of measures.
Using portfolios for assessment of read-
ing comprehension involves collecting a wide
range of samples of students’ work in the
form of written responses to selections they
have read. Shapely and Bush (1999) suggested
that grading criteria and rubrics should be
used to assess portfolio data to document
students’ progress in relation to instructional
goals. In the past, states, such as Califor-
nia, that have required the use of portfolio
assessments have developed scoring rubrics
for this purpose. Uniform grading within and
across teachers and samples may be difficult
to achieve, however. In fact, Valencia and Wix-
son (2000) found that educators’ portfolio
scoring rubrics for reading were neither reli-
able nor valid for measuring progress.
In summary, clinicians and educators have
access to different forms of reading com-
prehension assessment as they meet the ex-
panded purposes of assessment. Best prac-
tices require that assessments be viewed
within those different contexts, and that
they be selected, used, and interpreted with
caution. Further, the complexity of reading
comprehension and the knowledge and processes that underlie it require a fresh look at
the construct and content validity of reading
comprehension assessments. These are the
concerns we address in the next section.
KEY FEATURES OF TEXTS AND
TEST VALIDITY
Thus far, we have emphasized that effec-
tive assessment of reading comprehension
must address all the crucial knowledge and
processes thought to undergird reading and
language comprehension, not only issues re-
lated to inaccurate or slow word reading
that might constrain available resources. To
achieve full construct and content validity,
assessment practices must include and ac-
count for constructive reading comprehen-
sion. It is such constructive comprehension
processes that are involved in the making
of meaning, which research suggests is the
very heart and soul of skilled reading (Jacobs,
2000). Thus, the ultimate outcome of effec-
tive assessment of reading comprehension for purposes of instruction and educational programming must be a measure of students' abil-
ity to construct deeper meanings from what
they read in relation to what they already
know.
The construct of reading
comprehension: Key features
to be assessed
Reading comprehension is a complex, mul-
tidimensional, dynamic process whose pri-
mary focus is constructing or interpreting the
meaning of what is read. (See Caccamise &
Snyder, 2005, for an overview.) It involves
a vast array of linguistic and semantic cog-
nitive processes. Graesser, McNamara, and
Louwerse (2003) suggested that readers build meaning at many levels as they construct a coherent representation of what they read. According
to their observations, readers engage in each
of the following processes:
1. extract meaning from the wording and
grammar of the text;
2. build a basic understanding of the text
base explicitly contained in the selec-
tion;
3. go beyond what is explicitly stated in the
text, drawing inferences to build a men-
tal representation or idea of what the
text is about;
4. deal with genre-specific text structures
(e.g., narrative vs. rhetorical text orga-
nization) and apprehend the different
channels of communication used by the
writer (e.g., narrator, participant, or au-
dience); and
5. follow culturally appropriate ground
rules for respective text types.
Graesser et al. (2003) also reported that
when readers interact enough with the text
to make sensible connections among these
different levels, they construct a coherent
representation of what they have read. This
certainly is more complex than merely iden-
tifying a story’s theme and remembering sig-
nificant details. See Kintsch (2005) for an ex-
tended description of deep comprehension.
Although some well-designed high stakes as-
sessments and standardized measures do sam-
ple some skills at higher levels of comprehen-
sion, many fail to examine processes explicitly
or sufficiently at many of these levels. For this
reason, they may be lacking in validity across
a number of dimensions.
Factors that influence validity
Among the many factors that influence va-
lidity, those that relate to the normative pop-
ulation, test content, and the construct it is
designed to measure are particularly likely to
have been overlooked with respect to reading
comprehension assessments. As noted previ-
ously, to be valid, assessments must be used
with students similar to the population on
which the measure was normed and standard-
ized. Test items also must have content va-
lidity, including consideration of cultural dif-
ferences. None of this can occur unless tests
have construct validity in terms of their ability
to assess important aspects of comprehension
in relevant ways.
Accounting for variations in normative
sampling and demographics
Of course, assessment findings from mea-
sures are only as good as the tools themselves.
That is, one cannot assume that a standardized
measure will necessarily test what it purports
to unless evidence supports that the demographics of the normative sample on which the measure was standardized are sufficiently similar to those of the students to be tested. When it
comes to standardized measures of any kind,
clinicians and educators must pick up the
technical manual and read the fine print to
be sure the test is valid for the students to be
tested.
Although this caution holds for tests of
all kinds, it is even more important when it
comes to measures of reading comprehen-
sion. This is because the central processes of
reading comprehension are semantic in na-
ture (Kintsch, 1998; and the articles in this
issue), and it is well known that knowledge
of word meanings, related concepts, and fa-
miliarity with text structures (all crucial for
comprehension) are highly sensitive to demo-
graphic variations in socioeconomic level, cul-
ture, and language. Therefore, it is especially
important for assessors of comprehension to
check that the normative sample on which a
test was developed is similar to the population
to be measured.
Effects of socioeconomic differences
on test performance
In issuing her caution related to this con-
cern, Payne (2003) demonstrated the perva-
sive influence of poverty on the types of dis-
course structures understood and used by both children and adults in economically challenged circumstances, effects that occur in addition to any cultural influences that are present. For example,
informal narrative recounts of events are the
discourse events most often heard and pro-
duced by individuals living in poverty. Such
experiences are likely insufficient to prepare
readers to comprehend the formal discourse
structures of expository texts needed for suc-
cess in school.
Living in low-income families or communi-
ties may limit students’ access to some other
reading-related opportunities, as well. Snow
(2003) noted that low socioeconomic sta-
tus can hinder the development of strong
lexical and listening comprehension skills as
used in “school language,” because a com-
munity’s typical transactional lexicon and dis-
course register influences students’ oral lan-
guage skills. These differences in discourse
processing that affect listening comprehen-
sion should similarly affect reading compre-
hension. Thus, it is often through schooling,
rather than at home, that students from low-
income families must develop proficiency
with the abilities on Snow’s summary of pre-
requisites for successful reading comprehen-
sion. That list includes the following:
1. good oral language skills, particularly
well-developed oral vocabularies and
strong listening comprehension;
2. a knowledge base that covers a wide
range of topics;
3. social interactions in the home, class-
rooms, and communities that encourage
reading;
4. lots of practice with reading; and
5. the availability of many kinds of reading
materials.
Snow (2002) confirmed the influences of
these differences in the findings from her lon-
gitudinal study of literacy development in chil-
dren from low-income sectors.
Also, students who have been raised in
conditions of poverty may find a significant
mismatch between their world knowledge
and life experiences and the situations de-
picted in selections on reading comprehen-
sion measures. This mismatch may lead assess-
ments to underestimate the comprehension
abilities of students with world experiences
that do not include situations covered in a
narrative.
Research has supported observations that
the vocabularies of students from low-income
families may not match, or grow in the same direction as, those of students from families with higher SES levels. Chall, Jacobs, and Baldwin (1990)
studied why children from some low-income
families succeeded at reading when others did
not. They found that even the above-average
lower income readers slipped in their per-
formance over time on national norms, es-
pecially in the area of word-reading fluency.
One primary reason for the students’ declin-
ing performance in fluency was found to be
their limited knowledge of word meanings.
By the sixth and seventh grades, they only
understood words at the fourth-grade level,
despite adequate decoding skills. These stu-
dents’ knowledge of vocabulary and associ-
ated concepts was not expanding at a pace
to allow them to apprehend the language and
concepts of middle school texts.
Effects of cultural–linguistic differences
on test performance
Students who speak English as a second
language also are affected in that they may
use text and discourse structures from their
first language that do not always conform to
the Western European text structures found
on most reading comprehension measures. In
fact, sometimes the text structures used in
one language may be at considerable odds
with the second. For example, Navajo dis-
course structures do not express causality in
the same way that English does (Lindstedt,
2000); yet causality is one of the main
ways of structuring plots in the mainstream
culture. Lindstedt found that when Navajo third-grade students who spoke English as a second language recounted events, their dis-
course structures differed markedly from
those produced by students who spoke En-
glish as a first language. Therefore, com-
prehension summaries produced by students
who are English language learners can be af-
fected by this mismatch in narrative style.
Similarly, stories from African American cul-
tures tend to use more topic associative story
lines than the Western European story plots
that populate narratives read in American
schools. As Pisha (cited in Frase-Blunt, 2000)
pointed out, when African American students
are asked to interpret or summarize the West-
ern European stories prevalent in American
classrooms, they may be starting in an entirely
different place and using a different cognitive
style.
Factors related to sample characteristics
In addition to examining information about
the demographics of the normative sample,
clinicians should ask whether the technical
manual provides other information about stu-
dents included or excluded from the sample.
When samples are limited to students who
do not receive special services, the sample is
said to be “truncated.” McFadden (1996) ar-
gued that a truncated sample does not rep-
resent all of the children in the group, and
when lower achieving students have been excluded from the sample, the true distribution of students' scores is skewed. Students who would have scored in the low average range had the sample not been truncated may instead appear to fall in the range of marked deficit. This results in many more stu-
dents demonstrating problems than would be
identified with a more complete sample with
the full range of ability of the original popula-
tion distribution considered "typical." That is,
norms from a truncated sample would overi-
dentify students with reading problems.
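A brief simulation makes the distortion concrete. The sketch below, our illustration rather than McFadden's analysis, norms the same raw score against a full sample and against a sample truncated above the 10th percentile:

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw scores for a full population of students (arbitrary scale).
full = rng.normal(loc=100, scale=15, size=100_000)

# A truncated norming sample: the lowest-achieving 10% are excluded,
# as happens when students receiving special services are left out.
truncated = full[full > np.percentile(full, 10)]

def standard_score(raw: float, norms: np.ndarray) -> float:
    """Standard score (mean 100, SD 15) of a raw score on the given norms."""
    return 100 + 15 * (raw - norms.mean()) / norms.std()

raw = 80.0  # a low-average raw score in the full population
print(f"vs. full norms:      {standard_score(raw, full):6.1f}")       # about 80
print(f"vs. truncated norms: {standard_score(raw, truncated):6.1f}")  # about 73
```

Against the truncated norms, the distribution's mean rises and its spread narrows, so the same low-average raw score drops from roughly 80 to roughly 73 and appears deficient.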
Another way to look at this concern is to
examine the manual for clear evidence that
the test discriminates performance well on
the basis of the intended variable rather than
on some unintended variable (such as socioe-
conomic status or cultural/ethnic difference).
That is, the manual should report sensitivity
data about the percentage of students with
known poor comprehension (as measured by
some accepted “gold standard”) who also are
identified by the test as having disorders. It
should report selectivity (i.e., specificity) data, as well, about
the percentage of students who are known to
have acceptable comprehension on the gold
standard, who also score in the acceptable
range on the test. For this reason it is very im-
portant that clinicians and educators choose
norm-referenced standardized measures with
strong psychometric properties (Riegelman,
2000; Sackett, Straus, Richardson, Rosenberg,
& Haynes, 2000).
Summary of desirable features
In summary, standardized measures of read-
ing comprehension can be used to meet the
four purposes of assessment when key criteria
are met. These criteria include the following:
1. assessing vocabulary, listening compre-
hension, and other foundational reading
skills identified by the National Reading
Panel (2000);
2. using standardized measures with ad-
equate psychometric properties, espe-
cially with regard to sampling; and
3. selecting standardized measures that
elicit and assess constructive compre-
hension skills.
These criteria may be difficult to meet
by administering a single standardized test.
The diversity of students who must be assessed and the particular challenges of assessing reading comprehension suggest the need for new tools. It is our observation that assess-
ment results can be improved by adding some
authentic measures, such as curriculum-based
or other descriptive measures of tasks specif-
ically related to classroom content. We also
have been involved in developing new theory-
driven technological tools to support the as-
sessment of reading comprehension. The last
section of this article introduces some of these
tools.
TECHNOLOGICAL INNOVATIONS
On the one hand, there is no substitute for
highly effective and sensitive clinicians and
teachers who can establish a comfortable rap-
port for testing, adjust the pace of testing and
delivery of instructions in response to stu-
dents’ needs, and motivate and encourage stu-
dents at appropriate times. On the other hand,
clinicians and educators are often pressed for
time to test students and harried by unre-
lenting paperwork demands. Thus, they are
likely to cut corners in selecting tests, or to
make errors in measuring time or accuracy
on mundane tasks of test administration and
scoring.
Technology innovations, supported by
computerized delivery systems, offer partic-
ular strengths to help clinicians handle some
of these issues. Computers can
1. be remarkably patient and encouraging
with students;
2. streamline assessment with adaptive pro-
grams;
3. present, time, and score items reliably
no matter how much repetition they in-
volve; and
4. reduce and even eliminate the paper-
work load for teachers.
Good computerized assessment programs
could, in fact, make a strong contribution
to the valid assessment of reading compre-
hension. In reality, the complex, multidi-
mensional nature of reading comprehension
may actually require an informed infrastruc-
ture with an advanced technological engine
to cover reliably the key features identified
here.
Many recent group- and individually admin-
istered paper-and-pencil measures of reading
comprehension have responded to the need
to ask better comprehension questions, ask
questions that probe for necessary inferences,
and determine whether readers have formed
a coherent mental understanding of the se-
lections. However, in their present form,
they cannot allow assessment to be suffi-
ciently interactive so that readers can quickly
be presented with items at their instruc-
tional levels, as good criterion-referenced
and performance-based measures attempt
to do.
Computer technology should help do all of
this in a reliable and valid manner. As an exam-
ple, the Intelligent Essay Assessor™ (Foltz,
Landauer, & Laham, 1999) is a computer pro-
gram that uses latent semantic analysis (LSA;
Landauer, Foltz, & Laham, 1998) to assess
the content and conciseness of essays col-
lege students write to demonstrate their deep
comprehension of what they have read. Foltz
et al. (1999) found that this program was able
to grade students’ essays as reliably as teach-
ers. Kintsch (2005) has provided a description
of LSA and its application to evaluating the se-
mantic complexity of both assessment texts
and the essays or summaries that students
write on the basis of their comprehension
of those texts. Kintsch also described Sum-
mary Street® as another Web-based reading
and writing intervention that uses LSA as its
analytic engine to successfully support inter-
mediate and older readers to improve their
ability to write concise and complete sum-
maries of expository texts they have read
(Kintsch, Steinhardt, Matthews, Lamb, & LSA
Research Group, 2000; Wade-Stein & Kintsch,
2004).
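As a rough sketch of how an LSA-style engine can compare a student's summary with a source text, the following uses scikit-learn; it is a generic illustration of the technique, not the code behind the Intelligent Essay Assessor or Summary Street, and real LSA spaces are trained on far larger corpora.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# A tiny stand-in corpus defining the semantic space.
corpus = [
    "Volcanoes erupt when magma rises through cracks in the crust.",
    "Pressure from gases builds until magma breaks through the surface.",
    "Lava, ash, and gases are released during a volcanic eruption.",
    "Plants need sunlight and water to grow.",
    "Photosynthesis converts light energy into chemical energy.",
]
source_text = " ".join(corpus[:3])
student_summary = "Magma under pressure rises and erupts, releasing lava and ash."

# Term-document matrix, reduced to a few latent semantic dimensions.
tfidf = TfidfVectorizer().fit_transform(corpus + [source_text, student_summary])
latent = TruncatedSVD(n_components=3, random_state=0).fit_transform(tfidf)

# Cosine similarity in the latent space approximates semantic overlap,
# crediting the summary even where it uses different surface wording.
score = cosine_similarity(latent[-2:-1], latent[-1:])[0, 0]
print(f"semantic similarity of summary to source: {score:.2f}")
```

Because the comparison happens in the reduced semantic space, a summary can score well without copying the source's wording, which is what makes the approach useful for judging gist rather than verbatim recall.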
This ability of computer programs to mea-
sure the semantic complexity of expository
texts expands their power for assessing and
instructing reading comprehension (Kintsch
& Kintsch, 2005). Combine that new power
with the well-demonstrated capabilities for
computer programs to measure foundational
reading skills (e.g., Olson, Forsberg, Wise, &
Rack, 1994), and their advantage to assess-
ment becomes apparent.
In addition to the applications for sec-
ondary and postsecondary students men-
tioned previously, several computer-based as-
sessments of reading comprehension exist for
elementary and middle school children. Some
of these commercial tools have been widely
adopted across the nation. However, our in-
spection of a sample of test items from the
most popular of these measures indicates that
most of the inferences it tapped were lex-
ical, not the inferences that bridge mean-
ing across text. For example, one item on
a computer-based assessment describes the
Mall of the Americas in Minnesota, explicitly
stating that it is enormous. The question, in
the form of a cloze response, requires that
the student choose the word “huge” as the
correct answer to describe it. In contrast to
this limited representation of "comprehension" at a purely lexical level, current theoret-
ical advances clearly identify inferencing as a
hallmark of constructive reading comprehen-
sion (Best, Rowe, Ozuru, & McNamara, 2005;
Kintsch, 1998). As mentioned previously, if a
student can answer the comprehension ques-
tions correctly just by knowing the vocabu-
lary, examiners do not know how to interpret
the results. This is a serious shortcoming.
In this section, we describe two computer-
based assessments that are designed to take
advantage of the computational power, flex-
ibility, adaptability, and efficiency of technol-
ogy, but also to avoid the pitfalls described in
this article. One of those tools, known as MAP,
is already available; the other, ICARE, is in the
research stage of development at the Univer-
sity of Colorado. The former tool was devel-
oped by the Northwest Evaluation Association
(NWEA, 2000b); Wise leads the research ef-
fort to develop the latter.
A Web-based assessment tool: The MAP
The NWEA’s Measures of Academic
Progress (MAP) is a Web-based version of
their paper-and-pencil Achievement Level
Tests (NWEA, 2000a, 2000b). This tool mea-
sures five reading subcomponents: word
analysis, literal comprehension, interpretive
comprehension, evaluative comprehension,
and literary response.
In taking this assessment, students read se-
lections and items online and enter their re-
sponses. The program presents items at the student's ability level, which is initially estimated by adaptive testing when the student begins the test. The computational
base built into the program then allows the
MAP to adjust the reading level after a stu-
dent’s response to each item.
The skills tapped by the MAP provide a
comprehensive view of the student’s ability
to build an understanding at different levels
of meaning, from word, to expository text, to
literature. The MAP also provides information
on the student’s ability to draw inferences,
solve problems, and engage in literary anal-
ysis, all elements of higher level, or deeper
comprehension (Best et al., 2005; Kintsch,
2005). It also represents a learning contin-
uum that provides specific instructional goals
to correspond to different levels indexed
by students’ scores. Results are provided for
individual schools and for districts in a remark-
ably short turnaround time, for example, 48
hr at the school level. This reading measure
not only addresses the constructive reading
comprehension skills that form the core of
reading comprehension, but also is aligned so
that it corresponds to key features of state
high stakes assessments. The reading tests of
the MAP are designed to be used with spring
term second graders through 10th graders,
and they can be administered twice a year
to document progress. Normed on a national
sample of more than 1 million students, the
MAP provides schools and districts not only
with students’ scores, percentiles, and per-
centile ranges, but also with the average on
each measure for the district as well as the
norm group.
More than 1,000 school districts are
now using this assessment, resulting in a
highly sophisticated and responsive database.
This database allows educators and districts to
examine student performance with respect to
groups of students with similar demographic
characteristics. The item pool developed for
the MAP is extensive and the readability of all
items is yoked to Stenner’s lexiles (Stenner,
2000). Lexiles® are a readability metric based
on the vocabulary level and sentence length
of reading selections. The MAP also uses
Rasch’s item response formulation (Rasch,
1980), so that it forms a precise scale teach-
ers can use to measure progress in reading in
a reliable manner.
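Under the Rasch model, the probability that a student of ability theta answers an item of difficulty b correctly is

```latex
P(X = 1 \mid \theta, b) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}
```

Because ability and item difficulty are placed on the same logit scale, growth can be read as movement along that scale, which is what allows a Rasch-scaled test to report progress in approximately equal-interval units.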
The MAP is impressive in its construct and
content validity, and in the reliability and
ease with which it seems to handle the de-
mands of high stakes testing, including lo-
cal district accountability, as well as in moni-
toring student progress. Information obtained
on the subcomponents of the reading mea-
sure can help inform comprehensive assess-
ments when combined with other paper-and-
pencil tests that measure foundational skills.
Primarily designed for use by school districts,
the MAP has a large database that allows dis-
tricts to make comparisons with samples with
similar demographic characteristics, address-
ing some concerns about students from low-
income groups.
A computer-based assessment tool
in development: ICARE
A second promising technology-driven as-
sessment is called ICARE for Independent
Comprehensive Adaptive Reading Evalua-
tion. This project, headed by Wise (2004),
is being developed with colleagues Cole,
Pellom, and Van Vuuren at the Center for
Spoken Language Research and with other
key Colorado personnel at the University of
Colorado including Snyder and Olson. Still in
its first year of development, ICARE is an in-
teractive and adaptive computer-based assess-
ment system for reading comprehension.
ICARE’s Intermediate version is designed
for use with students from third through
eighth grades. It aims to screen students quickly and to exit those who read fluently with good comprehension at grade level. It also aims
to provide (in approximately 20–45 min) a
performance-based initial instructional profile
for children whose difficulties in reading com-
prehension result from problems related to
word reading, listening comprehension, or
both. ICARE is being designed to assess the se-
mantic and constructive processes thought to
underlie comprehension of oral and written
texts, as well as processes related to accurate
and automatic word reading, as identified by
the National Reading Panel (2000).
As an adaptive system, ICARE initially screens
three skills: time-limited single word reading,
reading or listening comprehension at grade
level, and spelling. If a student succeeds with
these at grade level, testing is complete. If a
student scores below grade level on any of
these tests, further testing continues. Tests
and items within tests are continuously adap-
tive, adjusting according to previous perfor-
mance. This continuous automatic adaptivity,
item by item, is intended to keep testing times
low.
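The screen-then-branch logic just described can be pictured with a minimal sketch; this is our illustration, not ICARE's code, and the skill names and cutoff are hypothetical placeholders:

    GRADE_LEVEL_CUTOFF = 0.0  # screening scores expressed relative to grade level

    def screening_decision(scores):
        # scores: skill name -> score relative to grade level
        below = [skill for skill, value in scores.items()
                 if value < GRADE_LEVEL_CUTOFF]
        if not below:
            return "exit: at or above grade level on all three screens"
        # otherwise, adaptive follow-up testing continues for the weak areas
        return "continue testing: " + ", ".join(below)

    print(screening_decision(
        {"word reading": 0.4, "comprehension": -0.8, "spelling": 0.1}))
    # -> continue testing: comprehension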
Word-level measures being investigated for
ICARE include untimed word reading, timed
decoding, untimed decoding, rapid number
and letter naming, orthographic coding, and
phoneme deletion or reversal. If a student
performs poorly on the screening measures
of listening and/or reading comprehension,
ICARE will administer additional related measures. These include vo-
cabulary, vocabulary analogies, and listening
and reading comprehension measures that
score gist comprehension, major fact finding,
and inferences identified by type. By includ-
ing oral language measures such as listening
comprehension and oral vocabulary, the sys-
tem avoids the pitfall of identifying only those students whose reading comprehension difficulties stem primarily from slow decoding skills. At the same time, it can iden-
tify students whose primary difficulties lie in
constructive comprehension processes and
can also identify students with mixed deficits.
ICARE is designed in such a way that stu-
dents will be able to use it with minimal
teacher effort. A teacher should be present
to answer questions, but in a laboratory situ-
ation with multiple computers, ICARE could
assess more than one student at a time. This
would greatly ease the burden on districts and teachers of providing comprehensive assessment for all children at risk for reading
problems. The system will accomplish this
independence in at least two ways. First, an engaging animated “virtual assessor”
gives all instruction, support, and feedback.
Second, many tests also have oral versions
that are scored by Sonic, a speech recognition
system with remarkable and ever-increasing
levels of accuracy even with young children
(Pellom & Hacioglu, 2003).
Unlike the MAP, ICARE does not align with
state curricula. However, its proposed scope
and adaptive and independent nature are re-
markable, improving validity and reducing
time for developing initial instructional pro-
files. The instructional profiles ICARE pro-
vides can aid instructional planning in class-
rooms and/or on a related remedial system,
Foundations to Literacy (FTL; Cole et al., 2003;
Wise et al., in press). Finally, a version of ICARE designed for English language learners is planned.
This assessment system thus seems to promise
many attractive features.
Summary of new tools
Both the MAP and ICARE can pro-
vide key information to clinicians and edu-
cators about students’ reading comprehen-
sion in valid, reliable, and efficient ways.
Both measure the semantic heart of read-
ing comprehension—making meaning across
large stretches of narrative and expository
texts. They avoid asking comprehension ques-
tions that can be answered solely on the ba-
sis of vocabulary and world knowledge with-
out having to read the selection. Both online
assessments can be used with multiple stu-
dents at the same time, giving clinicians and
teachers more time for in-depth observation
in classrooms or learning laboratories. They
also both adapt on the basis of students’ an-
swers, reducing testing time while refining
the individualization of assessment.
CONCLUSIONS AND
RECOMMENDATIONS FOR THE FUTURE
This discussion of the assessment of read-
ing comprehension has focused on the com-
plex, constructive nature of reading compre-
hension and on factors that can selectively in-
fluence student performance. Some serious
pitfalls to the successful assessment of read-
ing comprehension have been highlighted
without associating these weaknesses with specific measures, so that clinicians can use these criteria as guidelines to evaluate assessments themselves.
Clinicians and educators understand the
policy and pedagogical contexts of their dis-
tricts, their schools, and the students they
serve, and consequently, they need to make
judicious assessment choices within that
framework. What began as a discussion of
the purposes, features, and validity issues in-
volved in the assessment of reading compre-
hension has developed into a roadmap to
guide clinicians and educators in making
prudent choices and informed decisions for
better understanding students’ individual dif-
ferences in reading comprehension.
REFERENCES
Anastasi, A., & Urbina, S. (1997). Psychological test-
ing (7th ed.). Upper Saddle River, NJ: Prentice-Hall,
Inc.
Best, R. M., Rowe, M., Ozuru, Y., & McNamara, D. S. (2005). Deep-level comprehension of science texts: The role of the reader and the text. Topics in Language Disorders, 25(1), 65–83.
Bush, G. W. (2002, January 8). No child left behind frame-
work. Retrieved from http://www.ed.gov/nclb
Caccamise, D., & Snyder, L. (2005). Theory and pedagogical practices of text comprehension. Topics in Language Disorders, 25(1), 5–20.
Carlisle, J., & Rice, M. (2004). Assessment of reading com-
prehension. In A. Stone, E. Silliman, B. Ehren, & K.
Apel (Eds.), Handbook of language and literacy (pp.
521–555). New York: Guilford Press.
Catts, H., Fey, H., Zhang, X., & Tomblin, B. (2001). Evaluating the risk of future reading difficulties in kindergarten children: A research-based model and its clinical implications. Language, Speech, and Hearing Services in Schools, 32, 38–50.
Catts, H., & Hogan, T. (2002). The fourth grade slump: Late emerging poor readers. Poster presented at the annual meeting of the Society for the Scientific Study of Reading, Chicago.
Chall, J., Jacobs, V., & Baldwin, L. (1990). The reading crisis: Why poor children fall behind. Cambridge, MA: Harvard University Press.
Cole, R., van Vuuren, S., Pellom, B., Hacioglu, K., Ma, J., Movellan, J., et al. (2003). Perceptive animated interfaces: First steps toward a new paradigm for human computer interaction. Proceedings of the IEEE: Special Issue on Human Computer Interaction, 91, 1391–1405.
Espin, C., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students’ performance on content-area tasks. Exceptional Children, 62, 497–515.
Foltz, P. W., Landauer, T., & Laham, D. (1999). The Intelligent Essay Assessor: Applications to educational technology. Interactive Multimedia Electronic Journal of Computer-Enhanced Learning, 1.
Frase-Blunt, M. (2000). New roads to reading comprehension. Today, 7(1), 9, 15.
Fuchs, L., & Fuchs, D. (1998). Treatment validity: A unifying concept for reconceptualizing the identification of learning disabilities. Learning Disabilities Research & Practice, 13, 204–219.
Good, R., & Kaminski, R. (2002). Dynamic indicators of basic early literacy skills (6th ed.). Eugene, OR: Institute for the Development of Educational Achievement.
Graesser, A., McNamara, D., & Louwerse, M. (2003). What
do readers need to learn in order to process coher-
ence relations in narrative and expository text? In
A. P. Sweet & C. E. Snow (Eds.), Rethinking read-
ing comprehension (pp. 82–98). New York: Guilford
Press.
Haycock, K., & Weiner, R. (2003, April). Adequate yearly progress under NCLB. Paper presented at the National Center on Education and the Economy Policy Forum, New York.
Jacobs, V. (2000). Using reading to learn: The matter of understanding. Perspectives, 26, 38–40.
Keenan, J., & Betjemann, R. (2003, June). Comprehension and lexical computations in skilled reading and dyslexia. Paper presented at the annual meeting of the Society for the Scientific Study of Reading, Boulder, CO.
Kintsch, E. (2005). Comprehension theory as a guide for the design of thoughtful questions. Topics in Language Disorders, 25(1), 51–64.
Kintsch, E., Steinhardt, D., Matthews, C., Lamb, R., & LSA Research Group. (2000). Developing summarization skills through the use of LSA-based feedback. Interactive Learning Environments, 8, 87–109.
Kintsch, W. (1998). Comprehension: A paradigm for
cognition. Cambridge, England: Cambridge University
Press.
Kintsch, W., & Kintsch, E. (2005). Comprehension. In S. Paris & S. Stahl (Eds.), Current issues in reading comprehension and assessment (pp. 71–92). Mahwah, NJ: Erlbaum.
Landauer, T., Foltz, P., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259–284.
Leslie, L., & Caldwell, J. (2001). Qualitative Reading
Inventory-3. New York: Addison-Wesley Longman,
Inc.
Lindstedt, D. E. (2000). Eyewitness reporting by Navajo
and mainstream-culture children. Communication
Disorders Quarterly, 21, 166–175.
McFadden, T. (1996). Creating language impairments in typically achieving children: The pitfalls of “normal” language sampling. Language, Speech, and Hearing Services in Schools, 27, 3–9.
Nation, K., Adams, J., Bowyer-Crane, N., & Snowling,
M. (1999). Working memory deficits in poor com-
prehenders reflect underlying language impairments.
Journal of Experimental Child Psychology, 73, 139–
158.
Nation, K., & Norbury, C. (2005). Specific comprehen-
sion deficits in children with reading disabilities and
specific language impairment. Topics in Language
Disorders, 25(1), 21–32.
Nation, K., & Snowling, M. (1998). Individual differences
in contextual facilitation: Evidence from dyslexia and
poor reading comprehension. Child Development,
69, 996–1011.
National Center for Education Statistics. (2000). National
assessment of educational progress. Washington, DC:
Department of Education.
National Reading Panel. (2000). Teaching children to
read. Bethesda, MD: Department of Health and Hu-
man Services, NICHD Clearinghouse.
Newcomer, P. L. (1999). Standardized Reading Inven-
tory (2nd ed.). Austin, TX: PRO-ED.
No Child Left Behind Act. (2002). Bill summary and sta-
tus for the 107th Congress, PL 107–110. Retrieved
from http://www.thomas.loc.gov
Northwest Evaluation Association. (2000a). Measures of
academic progress. Portland, OR: Author.
Northwest Evaluation Association. (2000b). Achievement
level tests. Portland, OR: Author.
O’Connor, R., & Jenkins, J. (1999). Prediction of reading
disabilities in kindergarten and first grade. Scientific
Studies of Reading, 3, 159–197.
Olson, R., Forsberg, H., Wise, B., & Rack, J. (1994).
Measurement of word recognition, orthographic, and
phonological skills. In G. R. Lyon (Ed.), Frames of ref-
erence for the assessment of learning disabilities:
New views on measurement issues (pp. 243–277).
Baltimore: Brookes.
Payne, R. (2003). A framework for understanding
poverty (3rd Rev. ed.). Bayswater, TX: Aha Process,
Inc.
Pellom, B., & Hacioglu, K. (2003, April). Recent improvements in the CU Sonic ASR system for noisy speech. Paper presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong.
Perfetti, C. (1985). Reading ability. Oxford, England:
Oxford University Press.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (Expanded ed.). Chicago:
University of Chicago Press.
Rathvon, N. (2004). Advances in early reading assess-
ment. New York: Guilford Press.
Rayner, K., Foorman, B. R., Perfetti, C. A., Pesetsky, D., & Seidenberg, M. S. (2001). How psychological science informs the teaching of reading. Psychological Science in the Public Interest, 2, 31–74.
Riegelman, R. K. (2000). Studying a study and testing a test: How to read the medical evidence (4th ed.).
Philadelphia: Lippincott Williams & Wilkins.
Riordan, J., & Snyder, L. (2002, February). Identifying
kindergarten students at risk for reading difficul-
ties. Paper presented at the Courage to Risk Coun-
cil for Exceptional Children Conference, Colorado
Springs.
Sackett, D. L., Straus, S. E., Richardson, W. S., Rosenberg,
W., & Haynes, R. B. (2000). Evidence-based medicine:
How to practice and teach EBM (2nd ed.). Edinburgh,
UK: Churchill Livingstone.
Scarborough, H. (1998). Predicting the future achieve-
ment of second graders with reading disabilities: Con-
tributions of phonemic awareness, verbal memory,
rapid naming, and I.Q. Annals of Dyslexia, 48, 115–
136.
Shapely, K. S., & Prash, A. J. (1999). Developing a valid and reliable portfolio assessment in the primary grades. Applied Measurement in Education, 17, 111–132.
Shin, J., Deno, S., & Espin, C. (2000). Technical adequacy
of the maze task for curriculum-based measurement
of reading growth. Journal of Special Education, 34,
164–172.
Silliman, E., & Wilkinson, L. C. (2004). Collaboration for
language and literacy learning: Three challenges. In E.
Silliman & L. C. Wilkinson (Eds.), Language and liter-
acy learning in schools (pp. 3–38). New York: Guil-
ford Press.
Simmons, D., Kuykendall, K., Cornachione, C., &
Kame’enui, E. (2000). Implementation of a school-
wide reading improvement model. Learning Disabil-
ities Research & Practice, 15, 92–100.
Snow, C. (2002, February 23). The centrality of lan-
guage in literacy development. Paper presented at
the Katherine G. Butler Symposium on Child Lan-
guage: Innovations in Research and Practice, San Jose,
CA.
Snow, C. (2003). Assessment of reading comprehen-
sion. In A. P. Sweet & C. E. Snow (Eds.), Rethinking
reading comprehension (pp. 192–206). New York:
Guilford Press.
Snow, C., Burns, M., & Griffin, P. (Eds.). (1998). Pre-
venting reading difficulties in young children. Wash-
ington, DC: National Research Council, National
Academy Press.
Snyder, L. (2002, August). Report of the Summer Literacy Academy assessment pilot study. Laboratory Technical Paper, Center for Spoken Language Research, University of Colorado at Boulder.
Stanovich, K. (1986). Matthew effects in reading: Some
consequences of individual differences in acquisition
of literacy. Reading Research Quarterly, 21, 1–29.
Stanovich, K. (2000). Progress in understanding read-
ing. New York: Guilford Press.
Stenner, A. J. (2000). Measuring reading ability, read-
ability and reading comprehension. Durham, NC:
MetaMetrics, Inc.
Thompson, S., & Thurlow, M. (2003). 2003 state special
education outcomes: Marching on. Minneapolis, MN:
National Center on Educational Outcomes Report.
Torgesen, J., Wagner, R., & Rashotte, C. (1997). Preven-
tion and remediation of severe reading disabilities:
Keeping the end in mind. Scientific Studies of Read-
ing, 1, 217–234.
Valencia, S., & Wixson, K. (2000). Policy oriented re-
search on literacy standards and assessment. In M.
Kamil, P. Mosenthal, P. D. Pearson, & R. Barr (Eds.),
Handbook of reading research (Vol. III, pp. 909–
935). Mahwah, NJ: Erlbaum.
Wade-Stein, D., & Kintsch, E. (2004). Summary Street®:
Interactive computer support for writing. Cognition
and Instruction, 22, 333–362.
Wise, B. (2004, November 4). Past and present re-
search with talking computers in Colorado. Part
of a symposium at the annual meeting of the
International Dyslexia Association on Computers &
Reading Disabilities Research & Research-Based Appli-
cations, Philadelphia.
Wise, B., Cole, R., van Vuuren, S., Schwartz, S., Snyder, L.,
& Ngampatipatpong, N. (in press). Learning to read
with a virtual tutor: Foundational exercises and inter-
active books. In C. Kinzer & L. Verhoeven (Eds.), In-
teractive literacy education. Mahwah, NJ: Erlbaum.
Wise, B., & Snyder, L. (2002). Clinical judgments in
identifying and teaching children with language-based
reading difficulties. In R. Bradley, L. Danielson, & D.
Hallahan (Eds.), Identification of learning disabili-
ties: Research to practice (pp. 653–692). Mahwah, NJ:
Erlbaum.
Woods, M., & Moe, A. (1999). Analytical reading inven-
tory (6th ed.). Columbus, OH: Merrill.