Journal of Educational Psychology
2000, Vol. 92, No. 4, 605-619
Copyright 2000 by the American Psychological Association, Inc.
0022-0663/00/$5.00 DOI: 10.1037//0022-0663.92.4.605
How Effective Are One-to-One Tutoring Programs in Reading for
Elementary Students at Risk for Reading Failure?
A Meta-Analysis of the Intervention Research
Batya Elbaum
University of Miami
Sharon Vaughn
University of Texas at Austin
Marie Tejero Hughes and Sally Watson Moody
University of Miami
A meta-analysis of supplemental, adult-instructed one-to-one reading interventions for elementary
students at risk for reading failure was conducted. Reading outcomes for 42 samples of students
(N = 1,539) investigated in 29 studies reported between 1975 and 1998
had a mean weighted effect size of 0.41
when compared with controls. Interventions that used trained volunteers or college students were
highly effective. For Reading Recovery interventions, effects for students identified as discontinued were
substantial, whereas effects for students identified as not discontinued were not significantly different
from zero. Two studies comparing one-to-one with small-group supplemental instruction showed no
advantage for the one-to-one programs.
One-to-one instruction, provided as a supplement to classroom
teaching, is generally considered to be the most effective way of
increasing students' achievement. The effectiveness of one-to-one
instruction has been validated by empirical research, especially for
students who are considered at risk for school failure or have been
identified as having reading or learning disabilities (Bloom, 1984;
Jenkins, Mayhall, Peschka, & Jenkins, 1974; Juel, 1991; Wasik &
Slavin, 1993). According to Adler (1998), more and more parents,
dissatisfied with their children's academic progress, are hiring
tutors to provide additional instruction to their children.
Classroom teachers identify adult-delivered one-to-one instruc-
tion as the ideal teaching practice but report that they are rarely
able to implement it in their classrooms (Moody, Vaughn, &
Schumm, 1997). Corroborating these teachers' reports is a study
indicating that when one-to-one instruction is provided within the
general education classroom, it is usually implemented for less
than 1 min and serves largely to clarify information, answer
questions, or check for understanding (McIntosh, Vaughn,
Schumm, Haager, & Lee, 1993) rather than to provide systematic,
remedial instruction. Even in special education classrooms, one-
to-one instruction may occur in only a limited way (Vaughn,
Moody, & Schumm, 1998).
Batya Elbaum, Department of Teaching and Learning and Department
of Psychology, University of Miami; Sharon Vaughn, Department of
Special Education, University of Texas at Austin; Marie Tejero Hughes
and Sally Watson Moody, Department of Teaching and Learning,
University of Miami.
This research was supported by U.S. Department of Education, Office of
Special Education Programs Grant H023E5005-96.
Correspondence concerning this article should be addressed to Batya
Elbaum, School of Education, P.O. Box 248065, Coral Gables, Florida
33124. Electronic mail may be sent to elbaum@miami.edu.
In the 1970s and early 1980s, many schools adopted schoolwide
tutoring programs for students with academic difficulties. In a
meta-analysis of tutoring outcomes for elementary and secondary
students, Cohen, Kulik, and Kulik (1982) wrote the following:
The tutoring programs offered in many elementary and secondary
schools today differ in an important way from yesterday's tutorial
programs. In most modern programs, children are tutored by peers or
paraprofessionals rather than by regular school teachers or profes-
sional tutors. The use of peer and paraprofessional tutors has dramat-
ically affected the availability of tutoring programs. No longer a
luxury available only to an aristocratic elite, tutoring programs today
are open to boys and girls in ordinary classrooms throughout the
country. (p. 237)
Students who were tutored by their classmates or by older students
made greater academic gains than did untutored students (Cohen et al.,
1982; Mathes & Fuchs, 1994).
During the 1980s, concern mounted over the high percentage of
students who were not reading at grade level. Students who did not
acquire basic reading skills in the early grades were shown to be at
risk not only for school failure but also for negative outcomes
beyond the school years (Karweit & Wasik, 1992; Kennedy,
Birman, & Demaline, 1986). Systematic, one-to-one instruction by
trained adults was advanced as a way of ensuring that all children
would learn to read in the first years of elementary school.
Subsequently, educational decision makers recognized the limited
extent to which effective, systematic, individual instruction
could be provided by teachers in the context of their classrooms.
As a result, schools invested in additional personnel to provide
one-to-one instruction to students experiencing the greatest
difficulty in reading. Depending on the program, the one-to-one
instruction was provided by teachers (e.g., Clay, 1985), by
paraprofessionals and volunteers (e.g., Invernizzi, Juel, & Rosemary,
1997), or by college students (e.g., Butler, 1991). The huge
investment of resources that some of these programs required was
justified by the belief that students who were given intensive,
one-to-one instruction by trained adults before second grade would
attain average levels of performance in reading and would need no
further remedial help.
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Wasik and Slavin (1993) conducted a review of five adult-delivered,
one-to-one instructional programs in reading for first graders with
reading difficulties: Reading Recovery, Success for All, Prevention
of Learning Disabilities, the Wallach Tutoring Program, and
Programmed Tutorial Reading. They concluded that all five programs
yielded significant positive effects, with larger effects for those
programs that used certified teachers rather than paraprofessionals.
Since Wasik and Slavin's (1993) review, many issues have been
raised concerning not only the cost-effectiveness but the efficacy
of programs involving adult, one-to-one tutoring of at-risk readers
(Shanahan, 1998). For example, Hiebert (1994), Shanahan and Barr
(1995), and Grossen, Coulter, and Ruggles (1997) have reviewed the
available evidence on Reading Recovery interventions and have
concluded that numerous flaws in the methodology used by proponents
of Reading Recovery to evaluate and report intervention outcomes
have resulted in inflated claims as to what the intervention
achieves. Other researchers have implemented multitreatment studies
to assess the contribution of additional components to the standard
Reading Recovery intervention (Iversen & Tunmer, 1993). Yet others
have investigated whether one-to-one interventions provide greater
benefits than small-group interventions (Evans, 1996).
In the light of the extreme importance of ensuring that as many
children as possible acquire adequate literacy skills in the early
years of schooling and given the debate over the efficacy of
one-to-one reading interventions for children at risk for reading
failure, we undertook a rigorous meta-analysis of the empirical
findings related to adult-delivered, one-to-one instructional
interventions in reading for elementary school children identified
as being at risk for reading failure. Cognizant of the
methodological pitfalls of many of the primary studies in this area
(for discussions, see Center, Wheldall, & Freeman, 1992; Grossen
et al., 1997; Hiebert, 1994; Shanahan, 1998; Shanahan & Barr, 1995;
Wasik & Slavin, 1993), we applied stringent parameters not only for
the inclusion of studies in the synthesis (cf. White, 1994) but
also for the inclusion of individual effect size comparisons in the
meta-analysis.
Our goal in conducting this meta-analysis was to answer the
following questions:
How effective are adult-delivered, one-to-one instructional
interventions in reading for children at risk for reading failure?
Decisions concerning the adoption and implementation of educational
interventions are often based on projections (or, lacking data, on
conjectures) about expected outcomes. We wished to provide educators
and policymakers with a reasonable estimate of the gains (as
measured immediately after an intervention) that students at risk
for reading failure are likely to achieve as a result of
participating in a one-to-one reading intervention.
How do key features of the intervention relate to intervention
outcomes? Variables related to the intervention can affect not only
its effectiveness but also its cost. These variables include the
expertise of the individuals who implement the program, the
training they undergo before beginning the intervention, the
frequency of tutoring sessions, and the total hours of instruction
provided to each student.
To what extent are variables related to studies' research
methodology associated with study outcomes? An accurate
interpretation of intervention outcomes requires an examination of
the relation between effect size variation and methodological
variables. Among those we examined are the method of assigning
students to treatment groups, whether the researchers implemented
a fidelity of treatment check, and whether the outcomes were
assessed by means of standardized or nonstandardized tests.
How do the outcomes of Reading Recovery compare with outcomes
produced by other interventions? Reading Recovery is the most
widespread teacher-implemented, one-to-one intervention currently
in use in schools in the United States. In this meta-analysis, we
compared Reading Recovery with other one-to-one interventions
designed to prevent or remediate reading failure in young children.
How do the outcomes of one-to-one reading interventions compare
with those of small-group interventions? One-to-one interventions
place severe practical limits on the number of students that can
receive supplemental instruction. Despite the popular belief that
one-to-one instruction is more effective than instruction delivered
to larger numbers of students, there is actually little systematic
evidence to support this belief. Each additional student that can
be accommodated in an instructional group represents a substantial
reduction in the per-student cost of the intervention, or,
alternatively, a substantial increase in the number of students
that can be served (cf. Hiebert, 1994; Shanahan & Barr, 1995).
Adult-delivered, one-to-one reading interventions for students at
risk have achieved widespread currency in the United States. The
present meta-analysis was designed to provide researchers,
educators, and policymakers with information that would inform
options and improve academic outcomes for elementary students who
experience severe difficulty in reading.
Method
The design of this meta-analysis followed best practices for research
synthesis as described by Cooper and Hedges (1994) and applied analytic
procedures described by Glass, McGaw, and Smith (1981) and Cooper and
Hedges (1994).
Literature Search
Key terms related to one-to-one instruction in reading were identified on
the basis of previous research and from database thesauruses (e.g., the
Thesaurus of ERIC Descriptors [Educational Research Information Center,
1995]). These key terms were used to conduct multiple computer and hand
searches of the literature. Criteria for the inclusion of studies in the
research synthesis were as follows: (a) The study was published or
available between 1975 and 1998; (b) study participants were elementary
students identified as at risk for reading failure, scoring in the lowest
20-30 percentile on grade level reading assessments, or possessing
learning disabilities; (c) outcomes of students who received one-to-one
instruction in reading were compared with those of students who exhibited
comparably low performance in reading but did not receive one-to-one
instruction in reading; and (d) outcome data amenable to the calculation
of an effect size (e.g., means and standard deviations or t tests) were
reported. The criterion regarding students in the control group was
especially important in that findings from studies using a higher
performing comparison group have been seriously confounded by the
phenomenon of regression to the mean (for a discussion of this issue, see
Shanahan & Barr, 1995).
Coding of Studies
An extensive code sheet was used to record and organize pertinent
information from each of the identified studies. The four authors and a
research assistant, all of whom had experience coding studies for a previ-
ous synthesis, each coded a portion of the studies. Batya Elbaum reviewed
all code sheets for completeness and accuracy. In the very small number of
cases in which an assigned code was questionable, Elbaum consulted with
the coder to resolve ambiguities and reach a decision by consensus.
Calculation of Effect Sizes
Standardized effect sizes, computed as the difference between the mean
posttest score of the intervention group minus the mean posttest score of
the control or comparison group divided by the standard deviation of the
control or comparison group, were calculated for all reading outcomes for
which means and standard deviations were available (n = 221). When only
a test statistic such as t or F was available (n = 8), we applied formulas
provided by Rosenthal (1994). In the 12 cases in which a statistical test of
the difference between groups was reported as nonsignificant and no other
data were provided, we assumed an effect size of 0.
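The effect size definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the example numbers are hypothetical, and the conversion from a t statistic shown here is one common textbook form, assumed for illustration, since the text does not say which of Rosenthal's (1994) formulas was applied. Note that a t-based conversion yields a difference scaled by a pooled standard deviation rather than by the control-group standard deviation.

```python
import math


def control_sd_effect_size(mean_treat, mean_ctrl, sd_ctrl):
    """Posttest mean difference divided by the control-group SD."""
    if sd_ctrl <= 0:
        raise ValueError("control-group SD must be positive")
    return (mean_treat - mean_ctrl) / sd_ctrl


def d_from_t(t, n_treat, n_ctrl):
    """Approximate standardized difference from an independent-samples t.

    Assumed conversion: d = t * sqrt(1/n1 + 1/n2).
    """
    return t * math.sqrt(1.0 / n_treat + 1.0 / n_ctrl)


# Hypothetical values, for illustration only.
print(control_sd_effect_size(52.0, 47.0, 10.0))
print(round(d_from_t(2.0, 20, 20), 3))
```

Under the rule stated in the text, a comparison reported only as nonsignificant would simply be entered as 0 rather than passed through either function.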
Data Screening
The initial data set consisted of 241 effect sizes from 32 studies de-
scribed in 31 reports. Because some studies reported outcomes for more
than one group of students receiving one-to-one tutoring, the data set
included effect size comparisons for 45 independent samples of students
(M = 0.91, SD = 1.77, Mdn = 0.56).
Through visual inspection of the means, standard deviations, and score
ranges for all outcome measures, we determined that floor effects were
present in some effect size comparisons. That is, there were instances in
which the mean score for a group was very close to the bottom of the score
range. When a floor effect is present in a control group, the restricted
variation in scores results in an underestimate of the population variation
on the measure and, consequently, in a spuriously inflated effect size. Our
decision was to exclude such comparisons from the meta-analysis, using an
algorithm we developed specifically for this purpose (see Footnote 1). A total of 22 effect
size comparisons involving floor effects were excluded from the analysis.
Procedure for Handling Outliers
In the present meta-analysis, we applied Tukey's definition of extreme
values, namely, those that are 3 or more interquartile ranges below the first
quartile or above the third quartile (Cooper, Charlton, Valentine, &
Muhlenbruck, 2000; Tukey, 1977). Six unweighted effect sizes (hereafter
denoted by ES), all at the positive end of the distribution, had values
exceeding the Tukey boundaries. Four extreme values were from the study
by Iversen and Tunmer (1993; ES = 4.59 and 3.57 for one sample and 4.28
and 3.80 for the second sample); one extreme value (ES = 4.36) was from
the study by Ramaswami (1994), and one (ES = 4.81) was from the study
by Graves (1986). These six extreme values were winsorized, that is, they
were set at a uniform maximum value. We used the value of 3.45, equal
to 3 interquartile ranges beyond Tukey's upper hinge. Winsorization makes
it possible to make maximum use of the available data while limiting the
impact of extreme values in a distribution.
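As a rough illustration of the winsorizing rule just described, the following Python sketch caps values at a fence placed k interquartile ranges above the third quartile. It is an assumption-laden sketch: the function names and the sample data are hypothetical, and the quartiles come from the standard library's "inclusive" method, which need not coincide exactly with Tukey's hinges or with the software the authors used.

```python
from statistics import quantiles


def upper_fence(values, k=3.0):
    """Third quartile plus k interquartile ranges (quartile rule is assumed)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def winsorize_upper(values, k=3.0):
    """Set every value above the upper fence equal to the fence."""
    cap = upper_fence(values, k)
    return [min(v, cap) for v in values]


# Hypothetical data: one extreme value at the positive end.
sample = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
print(upper_fence(sample))
print(winsorize_upper(sample))
```

Passing k=2.0 gives the stricter variant that the text applies to sample sizes.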
Given the weighting of within-sample effect sizes in computing a
population estimate (Cooper, 1989)—a procedure that gives greater weight
to effect sizes based on larger samples—meta-analytic findings are highly
influenced not only by effect size outliers but also by sample size outliers
(cf. Cooper et al., 2000). In the present meta-analysis, the mean interven-
tion sample size was 36. Only 3 independent samples of students out of 44
(see Table 1) had intervention sample sizes greater than 63; these sample
sizes were 96, 170, and 266. Using a somewhat more stringent version of
Tukey's criterion (2 instead of 3 interquartile ranges beyond the upper
hinge), we winsorized these sample sizes to 80. The mean size of
comparison groups was 44; 4 very high values (99, 138, 165, and 217) were
similarly set to the value of 80. Sample size was therefore winsorized for
a total of 6 independent samples of students, affecting 24 individual effect
size comparisons.
Calculation of Mean Weighted Effect Sizes
Mean weighted effect sizes (for the weighting formula, see Cooper, 1999,
p. 137) were calculated for various aggregations of effect sizes to
examine particular substantive and methodological variables of interest.
Following Cooper (1989), we used a shifting unit of analysis to ensure the
independence of data in a given aggregation. Thus, effect sizes from
multiple measures administered to the same group of students were aver-
aged to yield a single effect size for that sample of students. For the
analysis by outcome measure, each sample of students contributed only one
effect size for each type of measure.
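The aggregation described above (average the effect sizes within each independent sample, then take a weighted mean across samples) might be sketched as follows. The weight used here is the usual inverse-variance weight for a standardized mean difference; it is assumed, not confirmed, to match the formula the authors cite. Function names and the data layout are hypothetical.

```python
from statistics import mean


def d_weight(d, n_treat, n_ctrl):
    """Inverse of the large-sample variance of a standardized mean difference.

    variance = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    """
    n_sum = n_treat + n_ctrl
    n_prod = n_treat * n_ctrl
    return (2.0 * n_sum * n_prod) / (2.0 * n_sum ** 2 + n_prod * d ** 2)


def weighted_mean_es(samples):
    """samples: list of (within_sample_effect_sizes, n_treat, n_ctrl)."""
    numerator = 0.0
    denominator = 0.0
    for effect_sizes, n_treat, n_ctrl in samples:
        d = mean(effect_sizes)  # one averaged ES per independent sample
        w = d_weight(d, n_treat, n_ctrl)
        numerator += w * d
        denominator += w
    return numerator / denominator


# Hypothetical samples: larger groups pull the estimate toward their ES.
print(weighted_mean_es([([0.2, 0.4], 80, 80), ([1.5], 8, 8)]))
```

Because the weight grows with group size, a few very large samples can dominate the estimate, which is the reason the text also winsorizes sample sizes.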
In meta-analysis, a weighted least squares analysis analogous to a
multifactorial analysis of variance (Hedges & Olkin, 1985) is typically
used to test the significance of main effects and interactions among the
independent variables. However, the highly unbalanced distribution of
cases across levels of the individual variables and the presence of empty
cells in the multivariate design made it impossible to conduct an interpret-
able multivariate analysis. The significance of the independent variables
was therefore tested in a series of single-factor homogeneity tests (Cooper
& Hedges, 1994). The test statistic Q (subscripted Qw to refer to the
homogeneity of effect size comparisons within an aggregation and QB to
refer to the homogeneity of categories of a moderator variable) is
distributed as chi-square with degrees of freedom equal to k - 1, where k
denotes the number of independent effect sizes in an aggregation, for Qw,
and one less than the number of categories of the variable, for QB. Values
of Q are reported to be significant at p < .05.
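A minimal Python sketch of the two homogeneity statistics follows, assuming the standard definitions: Qw is the weighted sum of squared deviations of effect sizes from their weighted mean, and QB is the corresponding sum for category means around the grand mean. The function names and data layout are hypothetical; the critical values are the standard chi-square values at p = .05.

```python
# Chi-square critical values at p = .05 for small degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def weighted_mean(ds, ws):
    return sum(w * d for d, w in zip(ds, ws)) / sum(ws)


def q_within(ds, ws):
    """Homogeneity of effect sizes within one aggregation (df = k - 1)."""
    d_bar = weighted_mean(ds, ws)
    return sum(w * (d - d_bar) ** 2 for d, w in zip(ds, ws))


def q_between(groups):
    """Homogeneity across categories (df = number of categories - 1).

    groups: dict mapping category name -> (effect sizes, weights).
    """
    all_d = [d for ds, _ in groups.values() for d in ds]
    all_w = [w for _, ws in groups.values() for w in ws]
    grand = weighted_mean(all_d, all_w)
    return sum(sum(ws) * (weighted_mean(ds, ws) - grand) ** 2
               for ds, ws in groups.values())


def exceeds_critical(q, df):
    return q > CHI2_CRIT_05[df]


# Values reported later in the text, checked against the critical values.
print(exceeds_critical(77.05, 4), exceeds_critical(9.27, 3))
```

With these definitions the total Q for a set of effect sizes equals QB plus the sum of the within-category Qw values, which is a useful consistency check on any implementation.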
1 One indication of a floor effect in the control group is a large discrep-
ancy in the standard deviations of the intervention and control groups. We
calculated the ratio of the treatment standard deviation and control group
standard deviation and found that the distribution of this ratio was
M = 1.07, SD = 0.87, Mdn = 0.93. The mean value near 1 indicated that
on average, the standard deviations for intervention and control groups
were very similar. We determined that values exceeding 1.83 (1.5 times the
interquartile range beyond the 75th percentile) would be considered ex-
treme (this is the algorithm used, for example, by the SPSS [1999] Explore
procedure). A total of 17 such values were identified. However, it is
possible for floor effects to be present even if standard deviations are
comparable. This occurs when the means of both the treatment and control
groups are close to the bottom of the range of the measure. In such cases,
the effect size is not meaningful because the measure used is apparently not
sensitive enough to detect differences in outcomes at the level at which
students are performing. To capture these cases, we identified all effect size
comparisons in which the mean of the control group was in the bottom 15%
of the range for the particular measure. Eleven effect size comparisons
were so identified, including 6 that had also been identified as having
extreme values with regard to the ratio of the groups' standard deviations.
A total of 22 effect size comparisons, involving 11 samples of students
from 8 studies, were therefore excluded from the analysis. The exclusion of
these cases resulted in the elimination of one study (Towner & Davidson,
1998) that had been represented by a single effect size. The 22 excluded
effect sizes had a mean of 3.75 and a standard deviation of 3.87; the 219
effect sizes retained in the analysis had a mean of 0.59 and a standard
deviation of 0.99.
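The two screening rules in this footnote can be expressed as a single check. The sketch below is illustrative only: the function and parameter names are hypothetical, the 1.83 cutoff and the 15% band are taken from the footnote, and the treatment of boundary cases (for example, a control mean exactly at the 15% mark, or a control standard deviation of zero) is an assumption.

```python
def flag_floor_effect(sd_treat, sd_ctrl, mean_ctrl, score_min, score_max,
                      ratio_cutoff=1.83, bottom_fraction=0.15):
    """Return True if a comparison shows either sign of a floor effect.

    Rule 1: treatment SD / control SD exceeds the cutoff.
    Rule 2: the control mean lies in the bottom fraction of the score range.
    """
    # Assumption: a control SD of zero is treated as an extreme ratio.
    ratio_extreme = sd_ctrl == 0 or (sd_treat / sd_ctrl) > ratio_cutoff
    floor_limit = score_min + bottom_fraction * (score_max - score_min)
    near_floor = mean_ctrl <= floor_limit
    return ratio_extreme or near_floor


# Hypothetical comparisons on a 0-40 point measure.
print(flag_floor_effect(10.0, 5.0, 20.0, 0.0, 40.0))   # SD ratio of 2.0
print(flag_floor_effect(10.0, 10.0, 5.0, 0.0, 40.0))   # control mean near floor
print(flag_floor_effect(10.0, 10.0, 20.0, 0.0, 40.0))  # neither rule applies
```

Because a comparison is excluded when either rule fires, the counts reported in the footnote combine as 17 + 11 - 6 = 22.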
Table 1
Summary Information on Interventions Included in the Meta-Analysis by Sample
Study | Grade level | Intervention sample size | Instructor | Focus of instruction | n of within-sample ES | Mean within-sample ES
Arnold et al. (1977) | 1 | 23 | No information | V-P skills | 2 | 0.00
Butler (1991) | 4-6 | 20 | College students | Mixed | 4 | 0.75
Center, Wheldall, Freeman, Outhred, and McNaught (1995)a | 1 | 22 | Teachers | Mixed | 4 | 1.08
Chapman, Tunmer, and Prochnow: Sample 1 (1998)a | 1 | 26 | Teachers | Mixed | 8 | -0.43
Chapman, Tunmer, and Prochnow: Sample 2 (1998)a | 1 | 6 | Teachers | Mixed | 8 | -1.32
Compton (1992)a | 1 | 80b | College students | Mixed | 1 | 1.83
Dorval, Wallach, and Wallach (1978) | 1 | 20 | Paraprofessionals | PA-phonics | 1 | 0.68
Graves: Sample 1 (1986) | 4-6 | 8 | Teachers | Comprehension | 2 | 1.85
Graves: Sample 2 (1986) | 4-6 | 8 | Teachers | Comprehension | 2 | 3.34
Hagin, Silver, and Beecher (1978) | Range | 63 | Teachers | Mixed | 3 | 0.49
Hedrick (1996) | 1 | 80b | Teachers | Mixed | 1 | -0.01
Iversen and Tunmer: Sample 1 (1993)a | 1 | 32 | Teachers | Mixed | 6 | 2.46
Iversen and Tunmer: Sample 2 (1993)a | 1 | 32 | Teachers | Mixed | 6 | 2.53
Juel (1996) | 2-3 | 6 | College students | Mixed | 1 | 3.15
Knapp and Winsor (1998) | 2-3 | 8 | Teachers | Underspecified | 2 | 0.71
Lafave (1995)a | 1 | 23 | Teachers | Mixed | 6 | 0.36
Mantzicopoulos, Morrison, Stone, and Setrakian: Sample 1 (1992) | 1 | 59 | Teachers | V-P skills | 6 | 0.05
Mantzicopoulos, Morrison, Stone, and Setrakian: Sample 2 (1992) | 1 | 52 | Teachers | PA-phonics | 6 | 0.09
McCarthy, Newby, and Recht (1995) | 1 | 19 | Teachers | Mixed | 15 | 0.68
McGrady (1994) | 4-6 | 35 | No information | Mixed | 1 | -0.37
Morris, Shaw, and Perney (1990) | 2-3 | 30 | Volunteers | Mixed | 6 | 0.52
Nielson (1991) | 2-3 | 14 | Volunteers | Mixed | 1 | 0.38
Pinnell: Sample 1 (1988)a | 1 | 80b | Teachers | Mixed | 9 | 0.65
Pinnell: Sample 2 (1988)a | 1 | 37 | Teachers | Mixed | 9 | 0.58
Pinnell, Lyons, DeFord, Bryk, and Seltzer: Sample 1 (1994)a | 1 | 31 | Teachers | Mixed | 4 | 0.74
Pinnell, Lyons, DeFord, Bryk, and Seltzer: Sample 2 (1994)a | 1 | 38 | Teachers | Mixed | 4 | 0.13
Pinnell, Lyons, DeFord, Bryk, and Seltzer: Sample 3 (1994) | 1 | 30 | Teachers | Mixed | 4 | -0.05
Ramaswami: Sample 1 (1994)a | 1 | 12 | Teachers | Mixed | 5 | 2.17
Ramaswami: Sample 2 (1994)a | 1 | 5 | Teachers | Mixed | 5 | 1.20
Ramaswami: Sample 3 (1994)a | 1 | 19 | Teachers | Mixed | 6 | 0.35
Ramaswami: Sample 4 (1994)a | 1 | 13 | Teachers | Mixed | 6 | -0.37
Ramey: Sample 1 (1991) | 2-3 | 18 | Volunteers | Underspecified | 1 | 0.00
Ramey: Sample 2 (1991) | 4-6 | 59 | Volunteers | Underspecified | 1 | -0.25
Saginaw Public Schools (1992)a | 1 | 35 | Teachers | Mixed | 6 | 0.92
Torgeson et al.: Sample 1 (1998) | 2-3 | 33 | Teachers | PA-phonics | 8 | 0.68
Torgeson et al.: Sample 2 (1998) | 2-3 | 36 | Teachers | Mixed | 8 | 0.16
Torgeson et al.: Sample 3 (1998) | 2-3 | 37 | Teachers | Underspecified | 8 | 0.05
Vadasy, Jenkins, Antil, Wayne, and O'Connor: Sample 1 (1997) | 1 | 6 | Volunteers | Mixed | 11 | 0.85
Vadasy, Jenkins, Antil, Wayne, and O'Connor: Sample 2 (1997) | 1 | 14 | Volunteers | Mixed | 11 | 0.06
Vadasy, Jenkins, and Pool (1998) | 1 | 23 | Volunteers | Mixed | 9 | 0.98
Wallach and Wallach (1976) | 1 | 36 | Volunteers | PA-phonics | 6 | 0.67
Weeks (1992)a | 1 | 20 | Teachers | Mixed | 3 | -0.35
Note. ES = unweighted effect size; V-P = visual-perceptual skills; PA-phonics = phonemic awareness-phonics.
a Reading Recovery intervention.
b Winsorized sample size.
Results
Results
Disaggregation of Studies by Research Design
Twenty-nine of the 31 studies that met our search criteria
contrasted one or more groups of students who participated in a
supplemental one-to-one instructional intervention in reading with
a group of students who did not receive any one-to-one instruction.
Although students in some of the control groups received supple-
mental academic support through federally funded after-school
programs, the programs did not include any systematic one-to-one
tutoring in reading. Two studies, comprising three effect size
comparisons, contrasted outcomes for students participating in a
one-to-one reading intervention with outcomes for students partic-
ipating in a small-group reading intervention. Because the effect
sizes of these latter studies must be interpreted differently than
effect sizes that contrast a one-to-one instructional intervention
with a control group, we analyzed the treatment-comparison stud-
ies separately from the treatment-control studies.
The Treatment-Control Database
This database consisted of effect sizes from 29 treatment-
control studies, described in 28 separate reports. Sources were 14
published articles, 5 doctoral dissertations, 4 technical reports, 1
conference presentation, 1 book, and 3 manuscripts prepared for
submission to professional journals. Two studies were conducted
in the late 1970s, 5 in the 1980s, and the remainder in the 1990s.
The students in 2 studies (3 samples of students) had learning
disabilities; students in all other studies (39 samples) had no
identified disability but were identified as at risk for reading
difficulties. The preponderance of students represented in the
present synthesis were first graders. Included in the synthesis
were 28 samples of first graders (n = 1,164), 8 samples of students
in Grades 2 or 3 (n = 182), 5 samples of students in Grades 4-6
(n = 130), and 1 sample of students ranging from 1st through 4th
grade (n = 63). Summary information on the interventions is
provided in Table 1; a description of intervention procedures is
provided in Table 2.
The treatment-control studies yielded a total of 216 individual
effect size comparisons. When the individual effect sizes were
aggregated by independent sample, so that each sample of students
contributed a single, averaged effect size for that group of students,
the distribution of the 42 effect sizes was somewhat positively
skewed (skewness = 2.85), M = 0.67, SD = 0.98, Mdn = 0.55.
Six effect sizes were markedly negative (-1.32 to -0.25); 10
were very small or close to 0 (-0.12 to 0.16); 14 were moderately
large (0.35 to 0.75); 5 were large (0.85 to 1.20); and 7 were very
large (1.83 to 3.34; see Footnote 2). The mean weighted effect size
was 0.41. The various disaggregations of effect sizes, with their
accompanying weighted means and homogeneity statistics, are
presented in Table 3.
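The descriptive figures in this paragraph can be cross-checked against the 42 mean within-sample effect sizes listed in Table 1. The Python sketch below is such a check, not part of the original analysis: the values are transcribed from Table 1 in row order, and the band boundaries are assumptions chosen to fall in the gaps between the ranges reported in the text.

```python
from statistics import mean, median

# Mean within-sample ES values transcribed from Table 1, in row order.
TABLE1_ES = [
    0.00, 0.75, 1.08, -0.43, -1.32, 1.83, 0.68, 1.85, 3.34, 0.49,
    -0.01, 2.46, 2.53, 3.15, 0.71, 0.36, 0.05, 0.09, 0.68, -0.37,
    0.52, 0.38, 0.65, 0.58, 0.74, 0.13, -0.05, 2.17, 1.20, 0.35,
    -0.37, 0.00, -0.25, 0.92, 0.68, 0.16, 0.05, 0.85, 0.06, 0.98,
    0.67, -0.35,
]

# Assumed cut points; the text reports ranges, not explicit thresholds.
BANDS = [
    ("markedly negative", float("-inf"), -0.20),
    ("very small or close to 0", -0.20, 0.25),
    ("moderately large", 0.25, 0.80),
    ("large", 0.80, 1.50),
    ("very large", 1.50, float("inf")),
]


def tally(values):
    """Count values falling in each half-open band [low, high)."""
    return {name: sum(low <= v < high for v in values)
            for name, low, high in BANDS}


print(tally(TABLE1_ES))
print(round(mean(TABLE1_ES), 2), round(median(TABLE1_ES), 2))
```

The tally and the unweighted mean and median computed this way agree with the counts and the M and Mdn values given in the text; the weighted mean of 0.41 cannot be reproduced from Table 1 alone because comparison-group sizes are not listed there.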
Qualifications of Instructors, Training, and Treatment Fidelity
The interventions included in the meta-analysis were conducted
by individuals with varying qualifications for teaching. The
significant homogeneity test associated with the instructor variable,
QB(4) = 77.05, indicated that the variation in effect sizes was
significantly associated with the qualifications of the instructor.
The tutors whose students made the greatest gains as a result of
one-to-one instruction were college students (d = 1.65, k = 3).
The mean weighted effect sizes for teachers and community
volunteers were 0.36 (k = 28) and 0.26 (k = 8), respectively. The
effect size for the single sample of students taught by
paraprofessionals was 0.68. Two studies provided no information on the
individuals carrying out the intervention. The mean weighted
effect size associated with these studies was -0.20.
Six studies investigated outcomes of one-to-one interventions in
which community volunteers served as the tutors. Five of the 6
studies described the training that was provided to tutors; 1 study
did not specify whether the tutors received any training. The mean
weighted effect size associated with studies that described the
tutors' training was 0.59 (k = 6), compared with -0.17 (k = 2) for
the study that did not indicate whether the volunteer tutors
received training.
Because treatment fidelity is a particular concern with regard to
volunteers, Vadasy, Jenkins, Antil, Wayne, and O'Connor (1997)
explicitly contrasted the outcomes of two subgroups of volunteer
tutors: those who maintained treatment fidelity (e.g., came to each
tutoring session, tutored for the full amount of time) and those who
did not. The effect size associated with consistent tutors was 0.85;
that associated with inconsistent tutors was 0.06. If volunteers who
performed inconsistently were excluded from the aggregation of
volunteers who were known to have received training before the
intervention, the resulting mean weighted effect size for volunteers
was 0.67 (k = 5).
Students' Grade Level
The significant homogeneity statistic for this variable,
QB(3) = 9.27, indicated that students' grade level was significantly
associated with the variation in effect sizes. Mean effects for
all except the oldest students were in the moderate range
(d = 0.37-0.49). The mean effect of one-to-one instruction for
students in Grades 4-6 was not significantly different from 0.
Note that as indicated by the significant Q value associated with
the effect size estimate for older students, Qw(4) = 37.60, the five
within-sample effect sizes constituting this aggregation were quite
disparate. Effect sizes from the studies by Butler (1991;
ES = 0.75) and Graves (1986; ES = 1.85 and 3.34) were very
high, whereas those from the studies by Ramey (1991; ES = -0.25)
and McGrady (1984; ES = -0.37) were negative (see
footnote 2 for a further treatment of negative effect sizes). The
variation in effects for first graders is examined in a subsequent
analysis (see below).
Focus of the Intervention
Interventions were coded in terms of their primary instructional
focus. The categories used for this variable were (a) decoding-word
recognition, (b) comprehension, (c) mixed (a combination of
2 The effect sizes in the present meta-analysis include six with moderate
to large negative values. The occurrence of negative values is problematic
in that it is difficult to imagine how implementation of a one-to-one
intervention could lower outcomes for participating students. Two of the
six samples of students in this category were students who failed to
successfully complete the Reading Recovery program. The largest negative
effect size was for the small sample (n = 6) of students in the study by
Chapman et al. (1998) who did not successfully complete the program and
were "referred on for additional remedial reading assistance" (p. 7). The
other negative effect size for a not-discontinued sample was from
Ramaswami (1994). The negative effect sizes for two other samples of
students might be due to the nonequivalence of treatment and comparison
groups at pretest. Weeks (1992) reported that the Reading Recovery group
performed worse at pretest on several reading measures than did the
Supported Control group. Fall to spring gains were approximately equiv-
alent for the two groups. Thus, had initial differences been statistically
controlled, the effect size for the sample would have been close to zero.
McGrady (1984) reported that students in the study were not randomly
assigned and that the treatment and control groups differed significantly at
pretest. The intervention, according to the author, narrowed the gap be-
tween the groups. Two samples of students receiving one-to-one instruc-
tion, from studies by Ramey (1991) and Chapman et al. (1998), did not
perform as well as students in the comparison group. In the study by
Ramey, students in the tutoring program were compared with students
receiving instruction in a Reading Resource Specialist model, which is not
described in the report. In the study by Chapman et al. (1998), students
participating in Reading Recovery did not do as well as students in the
comparison group. According to the authors, students in the Reading
Recovery group did not demonstrate average grade-level performance
either immediately after the intervention or 6 or 12 months later.
Table 2
Description of Interventions
Study    Intervention procedures
Treatment-control studiesᵃ
Arnold et al. (1977) Channel-specific perceptual stimulation (ES = 0.00): Students were tutored for 30 min twice a week for 6 months.
The tutoring was based on stimulation of the specific deficits noted in the students' perceptual profile. Techniques
developed by Silver and Hagin (1976) were used to train out perceptual deficits in visual, auditory, kinesthetic,
haptic, and body-image modalities. Control group: Regular classroom instruction.
Butler (1991) Reading Assistance Tutorial (R.A.T.) Pack (ES = 0.75): Children were tutored by university students for 20 min 3
times a week for 8 weeks. The R.A.T. Pack is a psycholinguistic and social semiotic approach to literacy that aims
at developing students' phonological processing strategies, linguistic awareness, and sight word vocabulary.
Functional language use is promoted through sentence construction, cloze passages, puzzles, games, and creative
manipulations of the surface features of language. The R.A.T. Pack consists of 12 books varying in difficulty from
early sounds through vocabulary development and comprehension. Control group: Regular classroom instruction.
Center et al. (1995) Reading Recovery (ES = 1.08): Students were tutored for 30 min daily for an average of 15 weeks. Each tutoring
session consisted of the following components: (a) rereading of two or more familiar books, (b) independent
reading of the previous day's new book while the teacher took a running record (miscue analysis), (c) letter
identification (if needed), (d) writing a story the child composed, with emphasis on hearing sounds in words, (e)
reassembling a cut-up story, (f) introducing a new book, and (g) reading the new book. Control group: Regular
classroom instruction plus any support in reading typically available at the school. Remedial assistance consisted of
up to 2 hr per week of additional instruction.
Chapman et al. (1998) Reading Recoveryᵇ (successfully discontinued; ES = -0.43). Reading Recovery (not discontinued and
referred on for remedial services; ES = -1.32). Control group: Regular classroom instruction.
Compton (1992) Reading Connection (ES = 1.83): Students were tutored 30 min a day, 4 days a week, for 14 weeks by university
students using the Reading Recoveryᵇ method. Control group: Regular classroom instruction plus 30 min of
instruction daily through Chapter 1 for 14 weeks.
Dorval, Wallach, and Wallach (1978) Wallach and Wallach (ES = 0.68): Students were tutored 30 min daily for 28 weeks using
the Wallach and Wallach program designed to teach phonemic awareness. Control group: Regular classroom instruction.
Graves (1986) Direct instruction (ES = 1.85): Over eight individual tutoring sessions, students were taught to find the main idea of
stories using techniques described by Carnine and Silbert (1979). Direct instruction plus metacomprehension
(ES = 3.34): The metacomprehension training, based on Loper (1980), emphasized self-monitoring as a way of
recording one's progress during instructional tasks. Control group: Students were prompted to read stories and
answer questions at the end about the main ideas.
Hagin, Silver, and Beecher (1978) TEACH (ES = 0.49): Students were tutored three to four times per week using a perceptual
stimulation approach aimed at developing the accuracy of students' perceptions within single modalities and across
modalities. The learning tasks proceeded through three stages: recognition, copying, and recall. Control group: Regular
classroom instruction.
Hedrick (1996) ICARE (ES = -0.01): Students were tutored 30 min daily for one semester. The intervention was based on Reading
Recoveryᵇ with the addition of a phonemic awareness component. The ICARE Program also required parents to
read to the student for 15 min each night. Control group: Regular classroom instruction plus any other available
support services.
Iversen and Tunmer (1993) Standard Reading Recoveryᵇ (ES = 2.46): Students were tutored for 30 min four times per week for
an average of 10.5 weeks. Modified Reading Recovery (ES = 2.53): Students were tutored for 30 min four times per week for
an average of 14.3 weeks. The lessons involved the seven standard Reading Recovery activities; however, explicit
instruction in the letter-phoneme patterns took the place of the letter-identification segment when the children
demonstrated that they could identify at least 35 of the 54 alphabetic characters. The explicit training in
phonological skills involved asking the students to manipulate magnetic letters to make, break, and build new
words having similar visual and phonological elements. The teacher chose suitable words from one of the books
the student had read earlier in the lesson or from a list of frequently occurring words. Control group: Students
received whatever additional support (generally funded by Chapter 1) was normally offered to at-risk readers at
their schools. Typical support consisted of out-of-class, small-group (6-7 students) instruction four times a week.
Juel (1996) Literacy tutoring (ES = 3.15): Students were tutored by university student athletes for 45 min two times a week
for 1 school year. Tutoring sessions consisted of three or four of the following activities: (a) reading children's
literature, (b) writing, (c) introducing high-frequency words from the basal readers, (d) journal (the tutor wrote
words, and the child copied), (e) alphabet book (the child selected words to add for each letter), (f) hearing word
sounds (phonemic awareness), and (g) letter-sound activities. Control group: Students received mentoring from the
same university students in regularly scheduled weekly meetings. Mentoring included reading to the students (but
not other tutoring activities) or reading and talking with the student in the school library or on the playground.
Knapp and Winsor (1998) Cognitive apprenticeship in reading (ES = 0.71): Students were tutored three times a week for 10 weeks by adult
volunteers. The program was based on the cognitive apprenticeship model explicated by Collins, Brown, and
Holum (1991). The student and tutor first read a book of the student's choice, alternately reading aloud to each
other and commenting on what was read. The tutor modeled reading strategies and fluent reading, helped with the
decoding of difficult words, and offered questions and explanations to clarify text meaning. Students selected
personally interesting books and were allowed to discontinue reading any books they found uninteresting or too
difficult. Control group: Regular classroom instruction.
Lafave (1995) Reading Recoveryᵇ (ES = 0.36): Students were tutored for 30 min daily for 5 months. Control group: Regular
classroom instruction plus additional instruction through Chapter 1 consisting of small-group (3-5 students)
instruction 5 days a week for 5 months. Chapter 1 sessions included instruction in reading aloud, letter-sound
relationships, word families, and writing stories.
Mantzicopoulos, Morrison, Stone, and Setrakian (1992) TEACH (ES = 0.05): Students were tutored for 30 min twice a week for
25 weeks. The intervention consisted of 55
teaching activities organized into 5 clusters: visual (e.g., visual discrimination, visual sequencing), visual-motor,
auditory (e.g., recognition of rhyming words, ordering, blending), body image (left-to-right progression of reading),
and intermodal (e.g., matching sounds to visual symbols). Phonetic tutoring (ES = 0.09): Students received 20 min
of reading drill and 10 min of spelling drill per session. Students were expected to read lists of words as quickly
and accurately as possible within a set time period, with the tutor keeping count of the number of words read
correctly. Control group: Regular classroom instruction.
McCarthy, Newby, and Recht (1995) Early Intervention Program (ES = 0.68): Students were tutored for 30 min daily using a
program based on Reading Recovery.ᵇ Each tutoring session involved three 10-min segments: (a) students reread books
they had covered in previous lessons, (b) students wrote a message of their own composition in standard spelling, with explicit
instruction from the tutor in sound segmentation and relations with the alphabetic code, (c) tutors presented new
reading material using a guided-reading format. Phonological training involved two strategies: the Elkonin "boxes"
strategy (the student slowly articulated the sounds in a word sequentially while manipulating corresponding
counters) and the "stretch it out" strategy (the student slowly articulated sounds in a word while choosing the
appropriate alphabetic symbols to represent sounds). Control group: Regular classroom instruction.
McGrady (1984) Programed [sic] tutoring (ES = -0.37): Students were tutored for 15 min a day for 1 school year using the
Houghton Mifflin Tutorials, a set of materials designed to be used as a supplement to classroom teaching based on
the Houghton Mifflin Reading Series. Activities included oral reading, comprehension, and word attack. Control
group: Regular classroom instruction plus available Chapter 1 services.
Morris, Shaw, and Perney (1990) Tutoring program (ES = 0.52): Students were tutored 1 hr a day, 4 days a week, for 8 months
by adult volunteers.
The premise of the tutoring approach was that children who are having difficulty learning to read need the
semantic and syntactic support offered by good stories written in natural (as opposed to formulaic) language and
that children should be led to automatize basic one-syllable spelling patterns as a means of building word
knowledge. Tutoring involved (a) 15-20 min of easy contextualized reading at the student's instructional level, (b)
10-12 min of word study, (c) 15 min of writing, (d) 10-15 min easy reading in trade books, (e) 5-10 min reading
the student a good piece of literature, for example, a fairy tale, fable, short picture book, or chapter from a longer
book. Control group: Regular classroom instruction.
Nielson (1991) Tutoring (ES = 0.38): Students were tutored over a period of 9 months by adult volunteers. Students did oral reading
and were drilled on vocabulary items that were missed during the reading; tutors also provided some instruction in
improving reading comprehension. Tutors maintained a log of voluntary home reading, oral in-school reading, and
flash and card drill. Students were given a point for each activity and for each sentence read correctly the first
time. At the end of each month, points could be exchanged for prizes such as pencils, erasers, balls, books, and
school logo shirts. Control group: Regular classroom instruction.
Pinnell (1988) Reading Recoveryᵇ (ES = 0.65): Students received the standard Reading Recovery intervention and were in
classrooms whose teachers were trained in Reading Recovery. The students therefore received group as well as
individual instruction using the Reading Recovery approach. Reading Recovery (ES = 0.58): Students received the
standard Reading Recovery intervention; however, their regular classroom teachers were not trained in Reading
Recovery. Control group: Regular classroom instruction plus a compensatory program involving skill-oriented and
drill activities conducted by a paraprofessional.
Pinnell et al. (1994) Reading Recoveryᵇ (ES = 0.74): Students were tutored for 30 min daily for 5 months. Reading Success (ES = 0.13):
Students were tutored for 30 min daily for 5 months using a program modeled on Reading Recovery and taught by
certified teachers who received a condensed 2-week version of the Reading Recovery training. Direct Instruction
Skills Plan (ES = -0.05): Students were tutored for 30 min daily for 5 months using a program focusing on
systematic instruction in skills considered to be basic to the performance of reading tasks. For each child, tutorial
sessions were linked to the classroom instruction the child was receiving. Lessons included work on letters and
sounds, words, and text-level strategies such as sequencing, filling in the blanks, and answering questions, as well
as reading extended texts. Reading and Writing Group (ES compared with one-to-one interventions = 0.12):
Students received Reading Recovery-based tutoring in a small-group format. The teachers, who had been trained as
Reading Recovery teachers, could modify Reading Recovery procedures to adjust to group instruction and could
adopt any techniques they believed to be consistent with the theoretical base developed during their training.
Control group: Regular classroom instruction plus any existing Chapter 1 services for first graders.
Ramaswami (1994) 1991-1992 cohort: Reading Recoveryᵇ (successfully discontinued; ES = 2.17). Reading Recovery (not discontinued;
ES = 1.20). Control group: Regular classroom instruction plus compensatory instruction for students who
qualified. 1992-1993 cohort: Reading Recoveryᵇ (successfully discontinued; ES = 0.35). Reading Recovery (not
discontinued; ES = -0.37). Control group: Regular classroom instruction plus compensatory instruction for
students who qualified.
Ramey (1991) HOSTS (Helping One Student to Succeed; Grades 2 and 3; ES = 0.00): Students were tutored by community
volunteers for 1 year. HOSTS (Grades 4-5; ES = -0.25): Same. Control group: Students received compensatory
education for 1 year through a traditional pull-out approach.
Saginaw Public Schools (1992) Reading Recoveryᵇ (ES = 0.92). Control group: Regular classroom instruction.
Torgesen et al. (1998) Students in all tutoring conditions were tutored for 20 min, four times a week for 2½ years beginning in the second
semester of kindergarten. Phonological Awareness Plus Synthesis Phonics (ES = 0.68): Tutors provided implicit
instruction in phonemic awareness by leading students to discover and label auditory gestures associated with each
phoneme. Then students were engaged in activities to build skills in tracking sounds in words and to represent
sounds with letters. Students learned to spell syllables with letters and then to read syllables by blending separate
phonemes together. Students then read short stories containing the words they could decode. During second grade,
children received direct fluency-building practice and were taught strategies for multisyllabic words. Embedded
Phonics (ES = 0.16): Tutoring consisted of (a) learning to recognize small groups of whole words by using word-
level drill and word games, (b) instruction in letter-sound correspondences in the context of the sight words being
learned, (c) writing the words in sentences, and (d) reading the sentences that were written. Stimulation of
phonological awareness was done during writing activities in which students were asked to identify the sounds in
words before writing them. Most grapheme-phoneme correspondences were taught in the context of word reading
and writing activities. Basal readers were also used. Regular classroom support (ES = 0.05): Students received
tutoring in the activities and skills taught in their regular classroom reading programs. The activities varied from
phonics-oriented activities to sight word drill to writing in journals. Control group: Regular classroom instruction.
Vadasy et al. (1997) Tutoring by community volunteers (high treatment fidelity; ES = 0.85): Students received 30 min of tutoring 4 days a
week for up to 23 weeks. Each lesson included six to eight activities selected from the following: (a) letter sounds
and beginning sound instruction, (b) rhyming, (c) auditory blending, (d) segmenting, (e) spelling and analogy use,
(f) story reading, and (g) writing. Tutoring by community volunteers (low treatment fidelity; ES = 0.06): Same.
Control group: Regular classroom instruction.
Vadasy, Jenkins, and Pool (1998) Tutoring (ES = 0.98): Students were tutored for 30 min, 4 days a week, for the school year.
Tutoring included
instruction in phonological skills, letter-sound correspondence, explicit decoding, rime analysis, writing, spelling,
and reading phonetically controlled text. Control group: Regular classroom instruction plus Title 1 services where
available.
Wallach and Wallach (1976) Wallach and Wallach tutorial program (ES = 0.67): Students were tutored for 30 min, five times a week, for 30
weeks by community volunteers. First, students were taught to recognize sounds at the start of words, to recognize
the shape of letters, and to connect letter shapes with sounds. Second, students gained skill at recognizing and
manipulating the sounds in words and blending sounds in the context of short, regularly spelled words. Third,
students practiced applying previously acquired skills using regular classroom reading materials. Control group:
Regular classroom instruction.
Weeks (1992) Reading Recoveryᵇ (ES = -0.35): Students were tutored 5 days a week for a maximum of 12 weeks. Control group:
Students received regular classroom instruction from teachers who were participating in a 7-month in-service
program similar to that of the Reading Recovery teacher training and emphasizing whole language literacy
instruction.
Treatment-comparison studiesᶜ
Acalin (1995) Reading Recoveryᵇ (ES = -0.12): Students were tutored 30 min daily for 1 school year. Comparison group: Project
READ, based on the Orton-Gillingham method emphasizing phonics. Students received instruction in groups of 2
to 5, 30 min daily, for 1 school year. Lessons progressed from a focus on phonology to comprehension and then to
writing, with an emphasis on instruction in vowel and consonant sounds, blends, and word syllabication. Each
lesson introduced five to ten new words, with sentence complexity increasing over the sequences of lessons.
Evans (1996) Reading Recoveryᵇ (ES = 0.05): Students received tutoring for 16 weeks. Comparison group: Students received
small-group (4 students) instruction in the regular classroom for 30 min daily, for 16 weeks. The instruction was
based on the principles of Reading Recovery and included the following components: (a) independent reading, (b)
shared reading, (c) shared journal, and (d) introduction to the new text.
Note. ES = unweighted effect size.
ᵃ Studies compared students who received supplementary one-to-one tutoring in reading with students who received no
supplementary one-to-one or other systematic instructional intervention in reading but may have received additional
support through Chapter 1 or Title 1 programs. ᵇ For a description of the standard Reading Recovery intervention, see
table entry under Center et al. (1995). ᶜ Studies compared students who received one-to-one tutoring in reading with
students who received a systematic, small-group reading intervention.
decoding, word recognition, and comprehension), (d) phonemic
awareness-phonics, (e) visual-perceptual skills, and (f) underspecified
(not sufficiently well described to be coded). The majority of
interventions were coded as mixed; these interventions
accounted for 30 samples of students, including the 16 samples of
students that received Reading Recovery instruction. The distribution
of samples across the remaining categories was reading comprehension
(two samples), phonemic awareness-phonics (four
samples), visual-perceptual skills (two samples), and underspecified
(four samples). The significant homogeneity statistic,
QB(4) = 42.44, indicated that focus of instruction was reliably
associated with the variation in effect sizes. The focus associated
with the largest effect (d = 2.41) was reading comprehension; this
effect was derived from two interventions that used direct instruction
to improve the comprehension of upper elementary students
with learning disabilities. Interventions that had a mixed focus or
a focus on phonemic awareness-phonics yielded mean weighted
effect sizes in the moderate range (d = 0.50 and 0.43, respectively).
Interventions that focused on visual-perceptual skills and
those that were not adequately described in reports had mean
weighted effect sizes close to 0.
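The between-groups statistic reported here carries 4 degrees of freedom because five focus categories contained samples. As an illustration of how such a statistic arises, the following sketch partitions a total homogeneity statistic into between-group and within-group parts under a fixed-effects, inverse-variance weighting scheme. The weighting scheme is an assumption about the general method, and the effect sizes and variances in hypothetical_samples are invented for illustration only; they are not the values analyzed in this article.

```python
from collections import defaultdict


def weighted_mean(effects):
    """Inverse-variance weighted mean of (d, variance) pairs."""
    weights = [1.0 / v for _, v in effects]
    return sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)


def q_statistic(effects):
    """Homogeneity statistic: weighted squared deviations from the weighted mean."""
    mean = weighted_mean(effects)
    return sum((d - mean) ** 2 / v for d, v in effects)


def partition_q(samples):
    """Split Q_total into Q_between and Q_within for (label, d, variance) rows."""
    groups = defaultdict(list)
    for label, d, v in samples:
        groups[label].append((d, v))
    q_total = q_statistic([(d, v) for _, d, v in samples])
    q_within = sum(q_statistic(group) for group in groups.values())
    q_between = q_total - q_within
    return q_total, q_between, q_within, len(groups) - 1


# HYPOTHETICAL rows (category, effect size, variance); not data from the article.
hypothetical_samples = [
    ("mixed", 0.50, 0.04),
    ("mixed", 0.70, 0.05),
    ("comprehension", 1.90, 0.20),
    ("comprehension", 2.60, 0.25),
    ("visual-perceptual", 0.00, 0.06),
    ("visual-perceptual", 0.10, 0.08),
]

q_total, q_between, q_within, df_between = partition_q(hypothetical_samples)
print(f"Q_total = {q_total:.2f}, Q_between({df_between}) = {q_between:.2f}, "
      f"Q_within = {q_within:.2f}")
```

With three categories in the hypothetical input, the between-groups statistic has 2 degrees of freedom, just as five categories yield the 4 degrees of freedom reported above.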
Outcome Measures
To investigate whether the aspect of reading measured by the
outcome measure was significantly associated with the variation in
effect sizes, individual effect sizes were aggregated by measure
type within independent samples. Thus, each sample of students
contributed one effect size for each type of measure used to assess
outcomes. The significant homogeneity statistic, QB(9) = 54.81,
indicated that the variation in effect sizes was significantly
associated with the aspect of reading or language that was measured
after the intervention. Modest effects were found with measures of
reading comprehension (d = 0.28) and spelling (d = 0.14). Measures
of decoding, oral reading of words, oral reading of passages,
composites based on subtests of different skills, and writing
yielded moderate effects (d = 0.41-0.54). The single listening
comprehension outcome in the corpus had an effect size of 0.68.
Writing vocabulary, as measured by the Clay Writing Vocabulary
Test, produced the largest effects (d = 0.94). The only mean effect
size of negative valence was for measures of phonemic awareness
(d = -0.29).
Within the category of oral reading of passages, outcomes
measured by the Text Reading Level measure (Clay, 1985) accounted
for half of the 18 effect size comparisons. This measure
has been criticized as having poor psychometric properties, in that
growth between levels is much smaller at the lower end of the
scale than at the higher end (cf. Iversen & Tunmer, 1993). When
effect sizes produced by the Text Reading Level measure (Clay,
1985; d = 0.64) were contrasted with effect sizes produced by
other measures of oral reading of passages (d = 0.30), the difference
was statistically significant, QB(1) = 7.23.
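When only two groups are contrasted, a fixed-effects between-groups statistic reduces to the squared z score of the difference between the two weighted means, QB = (d1 - d2)^2 / (v1 + v2). The sketch below uses that identity, which is assumed rather than stated in the article, to back out the standard error of the difference implied by the reported values. The function names are illustrative, and the result is approximate because the reported figures are rounded.

```python
import math


def q_between_two_groups(d1, v1, d2, v2):
    """Fixed-effects Q_B for two group means with sampling variances v1 and v2."""
    w1, w2 = 1.0 / v1, 1.0 / v2
    pooled = (w1 * d1 + w2 * d2) / (w1 + w2)
    return w1 * (d1 - pooled) ** 2 + w2 * (d2 - pooled) ** 2


def implied_se_of_difference(d1, d2, q_between):
    """Standard error of (d1 - d2) implied by a two-group Q_B.

    Follows from Q_B = (d1 - d2)**2 / (v1 + v2).
    """
    if q_between <= 0:
        raise ValueError("q_between must be positive")
    return abs(d1 - d2) / math.sqrt(q_between)


# Reported contrast: Text Reading Level (d = 0.64) versus other measures of
# oral reading of passages (d = 0.30), QB(1) = 7.23.
z_score = math.sqrt(7.23)
se_diff = implied_se_of_difference(0.64, 0.30, 7.23)
print(f"z = {z_score:.2f}, implied SE of the difference = {se_diff:.3f}")
```

The resulting z exceeds the two-tailed critical value of 1.96, in line with the reported significance of the contrast.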
Standardized measures generally yield smaller effects than
nonstandardized measures, because the latter are typically more
closely aligned with particular interventions. We coded each
outcome measure as standardized or not standardized; measures
coded as standardized had to use a standard set of stimulus
materials, a standard administration procedure, and a standard scoring
procedure and be supported by norming information. In 12 studies
(k = 14), outcomes were assessed by means of standardized
measures only; in 5 studies (k = 7), outcomes were assessed by
means of nonstandardized measures only; 12 studies (k = 21) used
both types of measures. To conduct the most stringent test of
differences owing to whether the outcome was assessed by means
of a standardized measure, we compared effect sizes associated
with standardized and nonstandardized measures for the 21 samples
of students (11 Reading Recovery and 10 other interventions)
for which both types of measures were used. The difference
between the mean weighted effect sizes calculated for the two
types of measures was not statistically significant, QB(1) = 1.09.
Reading Recovery and other intervention samples were also con-
sidered separately, to examine whether the difference between
effect size estimates based on standardized and nonstandardized
measures was significant for either group alone. For samples
receiving interventions other than Reading Recovery, the mean
weighted effect sizes for standardized and nonstandardized mea-
sures were almost identical (d = 0.46 vs. 0.42, respectively). For
Reading Recovery samples, the mean weighted effect size for
standardized measures was less than that for nonstandardized
measures (d = 0.60 vs. 0.80, respectively), but the difference was
not statistically reliable.
Intervention Intensity
Intervention intensity was examined in two ways: by duration,
coded as the number of weeks over which the intervention was
carried out, and by total instructional time, coded as the number of
hours of instruction provided to each student. Information on the
duration of the intervention was available for 30 samples of
students; information on total instructional time was available
for 27 samples. The interventions ranged in duration from 8 to 90
weeks and in total instructional time from 8 to 150 hr. Duration of
the intervention was reliably associated with the variation in effect
sizes, QB(1) = 7.9; interventions lasting up to 20 weeks had a
mean weighted effect size of 0.65, compared with 0.37 for those
lasting longer than 20 weeks. Total instructional time, however,
was not reliably associated with effect size variation,
QB(1) = 0.35.
We further examined the relation between intervention duration
and intensity. The mean instructional time for interventions lasting
up to 20 weeks was 63 hr; the mean time for interventions lasting
longer than 20 weeks was 61 hr. Duration and total instructional
time did not significantly covary (r = .116, ns). This finding
suggested that the same amount of instructional time, delivered
more intensively, tends to have more powerful effects.
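The two intensity codes can be illustrated with schedules listed in Table 2. The sketch below assumes that total instructional time is simply session length multiplied by sessions per week and by number of weeks; the article does not spell out its computation, and the function names are illustrative.

```python
def total_instructional_hours(minutes_per_session, sessions_per_week, weeks):
    """Total hours of tutoring per student for a fixed weekly schedule."""
    return minutes_per_session * sessions_per_week * weeks / 60.0


def hours_per_week(minutes_per_session, sessions_per_week):
    """Weekly dose of tutoring, in hours."""
    return minutes_per_session * sessions_per_week / 60.0


# Schedules as described in Table 2.
butler_hours = total_instructional_hours(20, 3, 8)    # Butler (1991): 20 min, 3 times a week, 8 weeks
compton_hours = total_instructional_hours(30, 4, 14)  # Compton (1992): 30 min, 4 days a week, 14 weeks

print(f"Butler (1991): {butler_hours:.1f} hr total, {hours_per_week(20, 3):.1f} hr per week")
print(f"Compton (1992): {compton_hours:.1f} hr total, {hours_per_week(30, 4):.1f} hr per week")
```

Under this assumption the Butler (1991) schedule works out to 8 hr over 8 weeks, which coincides with the lower bound of both reported ranges.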
Methodological Variables
Two methodological variables were examined for their potential
impact on study outcomes. Homogeneity tests revealed that the use
of a check on the fidelity of treatment was not significantly
associated with the variation in effect sizes, QB(1) = 0.42. In
contrast, the method that researchers used to assign students to
treatments was reliably associated with such variation,
QB(1) = 20.24, so that studies that used random assignment or
matching yielded significantly higher effect sizes (d = 0.56) than
studies that used other procedures (e.g., teacher judgment,
convenience; d = 0.17).
Table 3
Meta-Analysis
Aggregation
95%
CI for d
Lower Upper QW
Samples compared with a
control
Instructor
Teachers
Community volunteers
College students
Paraprofessionals
No information
Training of volunteers
Samples tutored by volunteers
Volunteers were trained
Training not reported
Grade level
1
2-3
4-6
Range
Focus of instruction
Mixed
Reading comprehension
Phonemic awareness-phonics
Visual-perceptual skills
Underspecified
Type of outcome measure
All measure types
Reading comprehension
Oral reading of words
Decoding
Oral reading of passages
Composite reading
Spelling
Writing
Listening comprehension
Writing Vocabulary
(Clay, 1985)
Standardized versus
nonstandardized measures
All interventions
Standardized
Nonstandardized
Reading Recovery
Standardized
Nonstandardized