Meta-Analysis of the Impact of Reading Interventions for Students in the Primary Grades


Abstract

This meta-analysis systematically reviewed the most up-to-date literature to determine the effectiveness of reading interventions on measures of word and pseudoword reading, reading comprehension, and passage fluency, and to determine the role intervention and study variables play in moderating the impacts for students at risk for reading difficulties in Grades 1–3. We used random-effects meta-regression models with robust variance estimates to summarize overall effects and to explore potential moderator effects. Results from a total of 33 rigorous experimental and quasi-experimental studies conducted between 2002 and 2017 that met WWC evidence standards revealed a significant positive effect for reading interventions on reading outcomes, with a mean effect size of 0.39 (SE = .04, p < .001, 95% CI [0.32, 0.46]). Moderator analyses demonstrated that mean effects varied across outcome domains and areas of instruction.
Journal of Research on Educational Effectiveness
ISSN: 1934-5747 (Print) 1934-5739 (Online)
To cite this article: Russell Gersten, Kelly Haymond, Rebecca Newman-Gonchar, Joseph Dimino & Madhavi Jayanthi (2020) Meta-Analysis of the Impact of Reading Interventions for Students in the Primary Grades, Journal of Research on Educational Effectiveness, 13:2, 401–427.
Published online: 09 Jan 2020.
Meta-Analysis of the Impact of Reading Interventions for
Students in the Primary Grades
Russell Gersten, Kelly Haymond, Rebecca Newman-Gonchar, Joseph Dimino and Madhavi Jayanthi
Received 29 January 2019; Revised 22 October 2019; Accepted 31 October 2019
Keywords: Reading; response to intervention; multi-tiered system of support; Tier 2 intervention; meta-analysis
Multi-tiered systems of support (MTSS), also referred to as Response to Intervention (RtI), have become routine in American elementary schools, especially in the area of literacy/reading. In 2010–2011, for example, full implementation of MTSS in Grade 1 reading occurred in 71 percent of schools from a demographically representative sample (Balu et al., 2015). The massive scale-up of MTSS was fueled by major pieces of federal legislation, such as the Reading First portion of the No Child Left Behind Act (NCLB, 2002), the Individuals with Disabilities Education Act (IDEA, 2004), and the Every Student Succeeds Act (ESSA, 2015). ESSA explicitly called for an emphasis on evidence-based interventions with "strong" and "moderate" levels of evidence, based on the What Works Clearinghouse (WWC) standards (U.S. Department of Education [U.S. ED], Institute of Education Sciences [IES], & What Works Clearinghouse, 2013).
With the rapid and widespread implementation of MTSS in reading across schools, a
national study was undertaken to examine the impact of high-quality MTSS in reading
(identified as high-quality by experts and onsite evaluation teams) on the performance
of students in primary grades (Balu et al., 2015). To the surprise of many in the reading
research community, the evaluation found statistically significant negative effects on
Grade 1 reading performance and non-significant impacts on Grades 2 and 3 reading
performance in 146 elementary schools in 13 states using a regression discontinuity design (Imbens & Lemieux, 2008).

© 2019 Taylor & Francis Group, LLC
CONTACT: Kelly Haymond, Instructional Research Group, 4281 Katella Avenue, Suite 205, Los Alamitos, California 90720, USA.
Some in the reading research community raised concerns about the study design, the
use of a regression design to answer questions about effectiveness, how impacts were com-
bined across districts that use different curricula, and the limited monitoring of fidelity of
implementation (e.g., Fuchs & Fuchs, 2017; Gersten, Jayanthi, & Dimino, 2017). Yet, the findings are hard to ignore. They raise questions about the effectiveness of interventions in authentic settings, specifically: How could interventions based on principles from scientific research, or those actually shown to be evidence-based according to current standards, be ineffectual or even slightly negative in practice?
There are at least three plausible reasons for the finding. The first is that the evidence
based on the effectiveness of beginning reading interventions may not be as robust or
consistent as previously believed. The second is that there is a body of rigorously con-
ducted research documenting the effectiveness of beginning reading interventions and
the instructional practices used in the research, but these interventions were not implemented with fidelity in practice. The third could be a result of the types of outcome measures used. Measure types have been found to moderate the size of the impacts (Elleman, Lindo, Morphy, & Compton, 2009). Students in the national evaluation were assessed on comprehensive measures of reading performance, as opposed to measures that assess discrete reading proficiencies, often well aligned with the intervention's focus.
Rather than listing the names of interventions with rigorous research support, as
done for example on the WWC website, we thought it more important to conduct a
careful examination and rigorous meta-analysis of the body of contemporary research
on reading interventions relevant to MTSS. Doing so would allow us to articulate the instructional practices and principles that underlie the interventions found to be effective, and to delineate the outcomes and the grade levels for which there is strong evidence to support intervention. It also seemed important to examine factors such as the type of interventionist and the level of support and monitoring provided to the interventionists. First, we briefly review prior meta-analyses and literature reviews on this topic.
A Brief Summary of Meta-Analyses and Literature Reviews on Reading Interventions
Six syntheses of research (either literature reviews or meta-analyses) address the topic of
reading interventions in the primary grades, albeit from different perspectives. Three of
these do not examine the impacts of reading interventions: Al Otaiba and Fuchs (2002),
Stuebing et al. (2015), and Tran, Sanchez, Arellano, and Swanson (2011) explored the relationship between students' prior reading abilities and other learner characteristics and their responsiveness to reading intervention.
Slavin, Lake, Davis, and Madden (2011) examined 70 studies of programs geared
toward providing support to struggling readers in elementary school (K-5), including
not only the small-group interventions (20 studies) and one-on-one interventions (20
studies) typical for Tier 2 of MTSS, but also whole-class instructional practices (16
studies) and computer-based (14 studies) instruction for at-risk readers. The studies in
the review were not evaluated for quality, though the authors did limit their review to
studies that included randomization or matching to form a comparison group, a far lower standard than the WWC standards applied in the present review. Studies were only
included if the program lasted at least 12 weeks. Effects ranged from 0.09 for computer-
based interventions to 0.56 for whole-class interventions. One-on-one and small group
interventions resulted in effects of 0.39 and 0.31, respectively, suggesting that both
approaches show evidence of promise for Tier 2 intervention.
Wanzek and colleagues conducted two meta-analyses of reading intervention studies involving students in kindergarten through Grade 3. The first (Wanzek et al., 2016) examined studies of shorter interventions (lasting less than 100 sessions); the latter (Wanzek et al., 2018) included only longer interventions (lasting more than 100 sessions). The methodology was similar, incorporating RCTs and QEDs only, but not ensuring that the studies met rigorous WWC standards. The first set included 72 studies, published between 1995 and 2013. The second, more recent meta-analysis of longer interventions located only 25 studies published from 1995 to 2015, using similar methodology.
For the first meta-analysis of shorter interventions, the authors classified outcomes
into one of two domains developed to correspond with the simple view of reading
(Francis, Kulesz, & Benoit, 2018). The first addressed the broad area of decoding,
including pre-reading skills (phonological awareness, rhyming, letter identification), as
well as measures of decoding of phonetically regular words and pseudowords, and word
reading and fluency. The second domain was called multicomponent and was essentially
a composite of both listening comprehension and reading comprehension, as well as
oral vocabulary and reading vocabulary. Using a random effects model, the authors
found positive mean effects ranging from 0.54 to 0.62 on the composite outcomes of
foundational reading and reading-related skills and 0.36 to 1.02 on language and comprehension measures. There was no evidence that group size, intervention type, grade level, or interventionist were related to the magnitude of impacts.
Findings from the 2018 meta-analysis of longer interventions produced a mean effect size of 0.28 when corrected for publication bias. The effects were found to be homogeneous, precluding the use of moderator analysis or meta-regression.
Rationale for the Present Meta-Analysis of Tier 2 Reading Interventions in Grades 1–3
We decided to conduct a new meta-analysis for several reasons. One reason is that we
wanted to use robust variance estimation (RVE; Hedges, Tipton, & Johnson, 2010), a
more contemporary approach that addresses dependent effect sizes arising from multiple
outcomes and comparisons within studies. RVE allows researchers to model all dependencies statistically, compared to traditional meta-regression approaches, which address dependencies by selecting specific comparisons, selecting a single measure, or aggregating all measures by computing an average effect. Of the six related syntheses we found,
only the most recent meta-analysis by Wanzek and colleagues (2018) used RVE but that
study focused on longer reading interventions. The current study used RVE on a set of
studies that have not been previously examined with this type of analysis.
A second reason for conducting a new meta-analysis is to limit the grade levels to those in which students are expected to begin reading. Typically, kindergarten intervention studies include very few, if any, reading measures. Instead, they often include measures of pre-reading or reading-related skills such as listening comprehension, rhyming, and phonemic awareness. The three previous reviews (Slavin et al., 2011; Wanzek et al., 2016, 2018) included kindergarten studies and measures of pre-reading skills (i.e., phonological awareness, listening comprehension). As our goal was to determine whether students receiving the intervention progressed beyond the pre-reading stage and truly learned to read, we only included studies from Grades 1, 2, and 3 to reflect the grades included in the national RtI evaluation study (Balu et al., 2015). We also limited the outcomes to include only measures of reading performance (word reading, passage fluency, reading comprehension), which is in line with the ESSA standards for evaluating study outcomes in the primary grades (Center for Research and Reform in Education & Johns Hopkins University, 2019) and the framework used for assessing student reading performance in the Reading First national evaluation (Gamse, Jacob, Horst, Boulay, & Unlu, 2008). (Reading First used reading comprehension and decoding to assess the reading performance of struggling students in Grades K–2.) We did not include studies of interventions above Grade 3 because the interventions in Grades 4 and 5, for example, are very different from those in Grades 1–3: They focus more on comprehension, vocabulary development, and fluency building, and less on decoding.
Finally, given the focus of ESSA (2015) on using interventions with "moderate" to "strong" levels of evidence from studies that have met WWC standards (Version 3.0; U.S. ED et al., 2013) for high-quality causal studies, we wanted to conduct a formal review of the studies using WWC standards and include only those studies in the meta-analysis that met those standards. The current set of studies is thus a more focused set that has been screened for the designs' rigor and the findings' trustworthiness.
Purpose of the Present Meta-Analysis
The purpose of this meta-analysis is to synthesize rigorously conducted randomized controlled trials and quasi-experimental studies on reading interventions for students who are at risk for reading difficulty in Grades 1–3. The research questions guiding this project are:

1. Overall, how effective are reading interventions that are designed to improve the reading outcomes (i.e., reading of words and pseudowords, passage reading fluency, and reading comprehension) of Grades 1–3 students who are considered at risk for reading difficulties?
2. Do study characteristics (i.e., nature of comparison, design, grade level, risk status, and outcome domain/type) or intervention characteristics (i.e., group size, interventionist, average hours per week of intervention, whether the intervention was scripted, areas of instruction within the interventions, and support provided to interventionists) moderate the effect of reading interventions on reading outcomes?
Literature Search and Selection of Relevant Studies for the Meta-Analysis
The goal of the search was to locate all studies published from January 2002 to March 2017 focused on reading interventions for students in Grades 1–3. The literature search
began with a keyword search of the following databases: Academic Search Premier,
Campbell Collaboration, Educators Reference Complete, ERIC, PsycINFO, Social
Sciences Citation Index, and WorldCat. The following keywords were used: reading, literacy, fluency, decoding, vocabulary, comprehension, reading ability, reading proficiency, reading achievement, response to intervention and instruction, reading intervention, RtI, response to intervention, response to instruction, Tier 2 intervention, tutoring, small-group instruction, one-on-one instruction, intensive intervention, at-risk students, at-risk, continued risk, non-responders, responders, reading difficulties, reading disabilities, and struggling readers. In addition, we examined all WWC intervention reports in beginning
reading and two relevant WWC Practice Guides, Assisting Students Struggling with
Reading and Improving Reading Comprehension in Kindergarten Through 3rd Grade. We
performed a version of hand-searching known as snowballing, by checking the reference
lists of research syntheses on the topic. Finally, we solicited recommendations from key
researchers in the field on studies likely to be eligible.
Toward the end of the search and review process, we examined any studies not previously located but included in the foundational reading practice guide (Foorman et al., 2016) and other meta-analyses and research syntheses (e.g., Wanzek & Vaughn, 2007; Wanzek et al., 2016).
The search resulted in the identification of 2,423 publications. All studies were
screened for eligibility based on the title, keywords, and abstracts. The studies were then
examined to determine whether they met the following inclusion criteria:
(a) Location. To be eligible, studies had to take place in the United States.
(b) Publication date. We limited the search to studies published between 2002 and
2017. The 2002 start date was chosen because it marks a transition in how teachers
approached reading interventions. Beginning circa 2002, initiatives in states such as Texas and California (and numerous others) were reinforced by the Reading First program's (NCLB, 2002) emphasis on small-group preventative reading interventions based on early screening in the primary grades. Research after this date focused more on the effectiveness of these preventative interventions. Therefore, studies published after 2002
seemed most relevant to the research questions.
(c) Reading intervention. The study had to focus on the effectiveness of a reading
intervention: that is, preventative instructional practices and activities designed to help
students who are considered at risk for reading difficulties (e.g., Gersten, Compton,
et al., 2009). The interventions had to be at least 8 h in duration and could be provided
to small groups of students or individually to one student. We did not exclude any studies based on the size of the small groups. The interventions could be conducted at school, either during school or after school, or at non-school clinics. The intervention could be conducted during the school year or during summer break. They could be delivered by teachers, researchers, tutors, volunteers, parents, or paraprofessionals, provided they followed a specific intervention program or a clearly outlined approach. (This procedure is documented in Gersten et al., 2017.)
Although we included interventions that taught phonological awareness, we did not
include interventions that focused solely on phonological awareness without providing
any instruction on reading words and/or pseudowords. We also did not include
whole-class (Tier 1) interventions (even if it was noted that the entire class or school
was considered at risk for reading failure) or intensive Tier 3 interventions that were
meant to meet the individual needs of students who failed to benefit from evidence-
based interventions (e.g., Fuchs, Fuchs, et al., 2008; Gersten, Compton, et al., 2009). In
other words, studies that selected only students who were nonresponders to a Tier 1 or
2 intervention were excluded. Denton et al. (2013), for example, examined the impact of an intervention for students who had failed to show progress in both Tier 1 and Tier 2 interventions and, therefore, was excluded from the meta-analysis. Finally, we excluded
interventions that were delivered only at home, conducted in a language other than
English, or included only a professional development component for teachers on the
topic and lacked a specific intervention or intervention approach.
(d) Study design. Only RCTs and QEDs were included.
(e) Sample. The participants had to be students in Grades 1–3 who were considered at risk for reading difficulties. To be considered at risk, students had to have (a) a score
on a valid screener or screening battery indicating that the student was likely to be at
risk for possible reading failure at the end of the school year or (b) a score on a norm-
referenced standardized test (such as Woodcock Reading Mastery) indicating that the
student performed below the 40th percentile at the beginning of the school year or at
the end of the previous school year. If a study sample included students from grades
that were outside the scope of the review (e.g., Grades K, 4, or 5), then the study had to
meet one of the following criteria: (a) the study findings disaggregated the results of stu-
dents in eligible grades or (b) students in eligible grades represented over 50 percent of
the aggregated mixed-age sample.
(f) Outcomes. Studies had to include outcome measures of reading proficiencies and
skills (i.e., word reading, passage fluency, reading comprehension, or overall reading
achievement). Studies that only included measures of pre-reading skills such as phonemic awareness, rhyming, and oral comprehension were excluded.
Of the 2,423 publications that were examined, 54 met the initial criteria for inclusion.
See Figure 1 for a pictorial representation of the screening process.
Coding of Studies
The 54 publications that met initial inclusion criteria were coded in three phases. In Phase 1, publications were coded for quality of research design. In Phase 2, publications were coded to identify study characteristics and intervention characteristics. Finally, in Phase 3, publications were coded to explore the areas of reading covered in the intervention lessons.
Phase 1 Coding: Quality of Research Design
In the first phase, two members of the research team (who are certified WWC reviewers) independently examined each publication for the study design's strength and quality using the WWC Procedures and Standards Handbook (Version 3.0; U.S. ED et al., 2013).
Only studies that met WWC standards (with or without reservations) were included in
Phases 2 and 3.
Several publications we reviewed included more than one study (e.g., Denton,
Fletcher, Taylor, Barth, & Vaughn, 2014; Lane, Pullen, Hudson, & Konold, 2009).
For the purposes of this project, we defined a study as any comparison with a
unique treatment group compared to a unique, business-as-usual control condition.
Studies comparing the impacts of two researcher-controlled interventions, as well as
comparisons of variations in treatments, were excluded. For example, in Lane et al.
(2009), researchers report the effects of an intervention, as well as three variations of
that intervention, when compared with that of a business-as-usual control condition.
We considered each intervention and variation as a unique treatment group, and
each comparison of a unique treatment group with a business-as-usual control as a
separate study; therefore, we counted four studies in this publication (i.e., T0 vs. C,
T1 vs. C, T2 vs. C, and T3 vs. C). The comparisons of each variation in treatment
with the others were excluded. Studies of variations in treatments focus on a much
more precise research question than the effectiveness of reading interventions. The
framework of this meta-analysis could not account for the more experimental manipulations of specific components.

Figure 1. Literature search, screening, and reviewing.
Publications screened for eligibility: 2,423
Publications excluded at screening: 2,369 (not conducted in the U.S.; not published between 2002 and 2017; no eligible reading intervention; not an eligible study design, sample, or outcome)
Publications that met screening criteria and were coded for quality: 54
Publications that met WWC Evidence Standards: 34
Publications that did not meet WWC Evidence Standards: 20 (design quality [a]; confounding factor [b])
Publications excluded because they compare two unique treatments: 9
Publications and study comparisons used in meta-analysis: 25 publications including 33 studies
[a] The study is a randomized controlled trial with high attrition or a quasi-experimental design study with analysis groups that are not shown to be equivalent. [b] There was only one unit assigned to at least one of the conditions, or the intervention was always used in combination with another intervention.
In total, of the 54 publications reviewed, 25 publications included 33 separate studies
that met standards (with or without reservations). See Figure 1.
Phase 2 Coding: Study and Intervention Characteristics
For studies that met WWC group design standards (with or without reservations), we coded the following study characteristics: nature of the comparison, design (either RCT or QED), grade level, participants' risk level, and outcome domain and type. Coding of the intervention characteristics addressed the following: What was the size of the intervention group, who implemented the intervention (i.e., interventionist), was the intervention scripted, was monitoring and feedback provided to the interventionist, and how many hours of instruction were provided per week. See Table 1 for the operational definitions. Two members of the research team coded all study and intervention characteristics. The researchers discussed and rectified any discrepancies. After the initial coding, a third researcher coded a randomly selected 20 percent of the studies for reliability purposes. Reliability was 90.6 percent.
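A simple percent-agreement figure like those reported here can be computed as follows. This is an illustrative sketch with invented codes; the article does not specify the exact formula used, and `percent_agreement` is our own name:

```python
def percent_agreement(coder_a, coder_b):
    """Inter-rater reliability as simple percent agreement: the share
    of items on which two coders assigned the same code."""
    matches = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return 100.0 * matches / len(coder_a)

# Two coders agree on 9 of 10 hypothetical study codings
agreement = percent_agreement(list("AABBCABCAB"), list("AABBCABCAA"))
# agreement == 90.0
```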
Phase 3 Coding: Area/Focus of Instruction
We examined descriptions of the interventions provided in the publications and cataloged the interventions in two ways: (a) the target area of instruction, and (b) the focus of instruction. Each study was examined to determine whether any of the following reading areas were addressed during the intervention: phonological awareness, decoding, encoding (spelling), fluency, vocabulary, comprehension, and writing. If an intervention covered a reading area in any manner, minimally or extensively, then the study was coded for that area.
During our coding, we noticed that some studies covered the main areas of reading minimally during the intervention while others gave evidence of extended explicit instruction. For instance, many studies mentioned that they included reading comprehension in the lesson, but then only described that they asked comprehension questions as or after students read a passage, without providing any explicit instruction in comprehension strategies. Consequently, instruction in each intervention was further examined to determine whether the reading areas were taught routinely and explicitly. If so, the studies were also coded for having a focus in that area of reading. This additional level of coding, the focus of instruction, was limited to decoding, fluency, reading comprehension, and vocabulary.
An example to illustrate how the research team determined whether a component
was a focus of instruction is the coding of the reading comprehension component in
Denton et al. (2014). This study consisted of two treatment conditions, explicit instruction and guided reading, and a control group. Comprehension was coded as a focus of instruction in the explicit instruction condition but not the guided reading condition. In the former, instruction consisted of teachers modeling comprehension strategies using "think-alouds" and providing specific feedback when students practiced in small groups.
Although the guided reading condition included discussion activities, teachers never
modeled or provided any clear guidance on how and when to use various strategies to
discern a cause-effect relationship or for succinct retelling.
Coding of studies during this phase was done collaboratively by two members of the
research team (who are experts in beginning reading). Reliability was calculated on 20
percent of randomly selected studies. Reliability was 85.71 percent.
Data Analysis
Calculation of Effect Sizes
To determine each intervention's impact, we calculated the average effect size for each domain of reading (i.e., word and pseudoword reading, passage reading fluency, vocabulary, and reading comprehension). The effect size was calculated for each outcome using the means and pooled standard deviations for intervention and comparison groups, and corrected for small-sample bias using Hedges' (1981) procedures. In cases where means and standard deviations were not available, the t or F statistics and the treatment and comparison group sample sizes were used to calculate Hedges' g.

Table 1. Study and intervention characteristic definitions.
Design: RCT = randomized controlled trial; QED = quasi-experimental design.
Grade level [a]: 1 = first grade only; 2/3 = second- and third-grade combination class.
Nature of the comparison group: Core reading instruction only = business-as-usual, whole-class reading instruction with no additional support (i.e., Tier 1); School-provided intervention = reading interventions typically provided by the school/district (i.e., some form of preventative intervention provided in addition to core Tier 1 reading).
Participants' risk level [b]: At risk = only students in the 25th percentile or lower on a standardized norm-referenced screener; Minimal risk = students considered potentially at risk who score below the 40th percentile on a standardized norm-referenced screener.
Outcome measure domains [c]: Word or pseudoword reading = e.g., TOWRE and Woodcock-Johnson Word Attack; Passage reading fluency = e.g., AIMSweb Standard Reading Assessment Passages; Reading comprehension = Woodcock Reading Mastery Tests (WRMT) Passage Comprehension subtest, GRADE reading comprehension subtest.
Measure type: Standardized tests = existing measures administered, scored, and interpreted in the same way for all test-takers; Researcher-developed measures = only those the researcher developed for the study.
Group size: Small group = groups of more than one student; One-on-one = 1 student with 1 interventionist.
Interventionist [d]: Certified teacher = had a teaching credential, even if they were not employed as a full-time teacher at the schools where the studies took place; Paraprofessional = anyone who worked or volunteered at the school as part of the study and had no teaching credential; Research staff = were typically graduate students at a university.
Avg. hrs./week of instruction: Low = less than 1.5 hours per week; Medium = 1.5 to 2.0 hours per week; High = 2.0 or more hours per week.
Scripted: Yes = interventionist provided with step-by-step instructions on what to say and do during each session; No = interventionist was not provided with step-by-step instructions.
Monitoring and feedback: Yes = interventionists were observed conducting the intervention and were provided feedback after they were observed; No = interventionists were not observed and no feedback was provided.
Notes. [a] If a study included students from more than one grade (e.g., from Grades 1 and 2), then the study was assigned to the grade level of the majority of the sample. [b] If the authors did not provide a percentile on a nationally normed test to describe the at-risk sample, the study was not coded for this variable. [c] Measures of pre-reading skills such as phonological awareness, rhyming, and letter naming were excluded, as were measures of listening comprehension, spelling, and writing. [d] Studies with a mix of interventionists (e.g., both teachers and paraprofessionals) were coded by the most prevalent category.
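The effect size computation described above can be sketched as follows. This is a minimal illustration of the standard Hedges' g formulas with the Hedges (1981) small-sample correction, not the authors' code; the function names and example values are ours:

```python
import math

def hedges_g(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Hedges' g from group means, standard deviations, and sample
    sizes, using the pooled SD and the small-sample bias correction."""
    # Pooled standard deviation across treatment and comparison groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    d = (m_t - m_c) / sd_pooled
    # Small-sample correction factor: J ~= 1 - 3 / (4*df - 1)
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)
    return j * d

def hedges_g_from_t(t, n_t, n_c):
    """Recover g from a reported independent-samples t statistic
    when means and standard deviations are unavailable."""
    d = t * math.sqrt(1 / n_t + 1 / n_c)
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)
    return j * d

# e.g., treatment mean 55 vs. comparison mean 50, common SD 10, n = 30 each
g = hedges_g(55, 50, 10, 10, 30, 30)  # ~0.49
```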
Meta-Analytic Procedures
To account for dependencies in our data, we used random-effects robust variance estimation (RVE) techniques (Hedges et al., 2010). RVE permits the comparison of effect sizes across studies in which multiple, dependent effect sizes are drawn from the same sample. Random-effects analyses were conducted using the statistical software Stata (StataCorp, 2015) and the Robumeta package (Hedberg, 2011), a macro that applies the RVE techniques. In RVE, the mean correlation between all pairs of effect sizes within a study (ρ) must be specified to estimate the study weights and calculate the between-study variance. We used a ρ value of .80 to estimate the between-study variance and then conducted sensitivity analyses using ρ values of 0 to .90.
The small-sample correction developed by Tipton (2015) was
implemented in Robumeta for all models, as RVE results have been shown to
inflate the Type I error rate when the meta-analysis includes fewer than 40 studies
(Tipton, 2015).
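The correlated-effects RVE weighting just described can be illustrated with a stripped-down sketch. This is not the Robumeta implementation: τ² is supplied directly rather than estimated by the method of moments, the Tipton (2015) small-sample correction is omitted, and the data in the example are hypothetical.

```python
def rve_intercept(effects, variances, study_ids, tau2=0.02):
    """Simplified correlated-effects RVE intercept-only model
    (after Hedges et al., 2010): one weight per study, sandwich SE."""
    by_study = {}
    for y, v, s in zip(effects, variances, study_ids):
        by_study.setdefault(s, []).append((y, v))
    num = den = 0.0
    parts = []
    for pairs in by_study.values():
        ys = [y for y, _ in pairs]
        vbar = sum(v for _, v in pairs) / len(pairs)  # mean sampling variance
        k = len(ys)
        w = 1.0 / (k * (vbar + tau2))  # one weight shared by the study's k effects
        num += w * sum(ys)
        den += w * k
        parts.append((w, ys))
    beta = num / den
    # robust (sandwich) variance from squared study-level weighted residual sums
    vr = sum((w * sum(y - beta for y in ys)) ** 2 for w, ys in parts) / den ** 2
    return beta, vr ** 0.5
```

Each study contributes one weight, 1/(k(v̄ + τ²)), shared by its k effects, and the standard error aggregates study-level residual sums, which is what makes the estimate robust to misspecifying ρ.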
We estimated a series of meta-regression models using RVE. First, we ran an intercept-only model in which the estimate for the constant represented the average weighted effect size across all 33 studies (Tanner-Smith & Tipton, 2014). The Robumeta package calculated the following indices of heterogeneity: the Q statistic and its p-value, and estimates of I² (the percentage of between-study heterogeneity not due to chance variation in effects) and τ² (the true variance in the population of effects).
Next, we examined the role of moderators such as group size, grade level, and level
of support provided to interventionists. Though one meta-regression model with covari-
ates for all moderators is preferable, the number of studies that met the inclusion crite-
ria was small, and not every study included information that permitted coding of all
moderators. Thus, this approach was not taken because results would be uninterpretable
due to insufficient degrees of freedom. Instead, we examined potential moderators using
separate RVE meta-regression models with only the moderator of interest entered as a
predictor. We interpret these results with caution due to potential confounding effects
of other moderators that are unaccounted for in these single-predictor models.
Moreover, a small number of single-predictor models remained underpowered (df < 4), likely a result of large imbalances in the data (Tipton, 2015).
The moderator variables were dummy coded and included as covariates in each model. To estimate a mean effect size for each level of the moderator variables (i.e., RCT and QED are levels of the design moderator variable), intercept-only models also were run for each level of the moderator. The p-value for determining statistical significance in each of the moderator analyses was set to p < .05.
Hedges et al. (2010) demonstrated that the value selected for ρ generally does not affect results much and recommended implementing a sensitivity analysis by analyzing models with varying ρ values. We conducted sensitivity analyses using ρ values of 0 to .90 and found no meaningful differences in the results across models, indicating that our findings were robust across estimates of ρ.
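The per-level intercept-only idea can be illustrated with a toy sketch that splits effects by a dummy-coded moderator level and computes an inverse-variance weighted mean for each level. It deliberately ignores the within-study dependence that the full RVE models handle, and the data in the example are hypothetical.

```python
def weighted_mean_by_level(effects, variances, levels):
    """Inverse-variance weighted mean effect size per moderator level.
    A toy stand-in for per-level intercept-only models; it ignores
    within-study dependence for brevity."""
    means = {}
    for lev in set(levels):
        # keep only effects at this level, weighting each by 1/variance
        pairs = [(y, 1.0 / v)
                 for y, v, g in zip(effects, variances, levels) if g == lev]
        total_w = sum(w for _, w in pairs)
        means[lev] = sum(y * w for y, w in pairs) / total_w
    return means
```

With effects 0.5 and 0.3 (equal variances) at one level and 0.2 at another, the function returns level means of 0.4 and 0.2, mirroring how a per-level model yields that level's weighted average effect.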
Publication Bias
We examined the potential impact of publication bias using the trim-and-fill method (Duval & Tweedie, 2000) by constructing a funnel plot of the effect sizes and noting any asymmetry in the distribution of effects. The plot was then systematically trimmed, removing the effect sizes causing the asymmetry, and filled in with any effect sizes that may have been missing from unpublished studies that resulted in small and non-significant treatment effects. The analysis estimated the number of missing effect sizes and recalculated the overall mean effect size in a way that reflects the presence of these missing effects.
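The trim-and-fill loop can be sketched as follows, using Duval and Tweedie's L0 estimator on unweighted effects. The published analyses use inverse-variance weighting and a full meta-analysis package; this simplified version, with hypothetical data, only illustrates the trim/re-estimate/fill cycle.

```python
def _avg_ranks(values):
    """Ranks (1-based) with ties sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def trim_and_fill(effects, max_iter=50):
    """Simplified (unweighted) Duval & Tweedie trim-and-fill with the
    L0 estimator of the number of suppressed studies."""
    y = sorted(effects)
    n = len(y)
    k0 = 0
    center = sum(y) / n
    for _ in range(max_iter):
        trimmed = y[:n - k0] if k0 > 0 else y
        center = sum(trimmed) / len(trimmed)  # re-estimate mean from trimmed set
        dev = [v - center for v in y]
        ranks = _avg_ranks([abs(d) for d in dev])
        t_n = sum(r for r, d in zip(ranks, dev) if d > 0)  # Wilcoxon-type rank sum
        new_k0 = max(0, round((4 * t_n - n * (n + 1)) / (2 * n - 1)))  # L0
        if new_k0 == k0:
            break
        k0 = new_k0
    # fill: mirror the k0 most extreme effects around the trimmed center
    filled = [2 * center - v for v in y[n - k0:]] if k0 > 0 else []
    adjusted = (sum(y) + sum(filled)) / (n + len(filled))
    return k0, adjusted
```

On a right-skewed toy set of effects, the estimator flags one missing study and the filled mean drops below the naive mean, mirroring how the adjustment pulled this meta-analysis's estimate from 0.39 toward 0.32.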
Study Characteristics
A total of 33 studies from 25 publications met WWC group design standards and were included in the final meta-analysis. These 33 studies spanned 13 years (2004–2016) and provided a total of 128 effect sizes. The sample sizes in the studies ranged from 21 to 6,888 students. Total sample size across all studies was 11,737 students (median sample size = 89).
Thirty of the studies were RCTs; the remaining three were QEDs. The comparison condition for the majority of studies (k = 21) was Tier 1 core classroom instruction (i.e., nothing other than what the classroom teacher chose to provide). In the remaining 12 studies, the comparison was typical school- or district-provided intervention. Twenty-two studies were conducted in Grade 1, and 11 studies were in Grades 2 and/or 3. Only 16 studies provided the information necessary for coding participants' risk level. Of those, 12 studies included students in the minimal-risk category, and only 4 studies included students in the at-risk category (i.e., 25th percentile or lower). The most common outcome domain was word or pseudoword reading (included in all but one of the 33 studies). Nineteen studies included outcomes in reading comprehension, 16 included passage reading fluency outcomes, and only two studies included outcomes in vocabulary. The study characteristics for each study included in the meta-analysis are presented in Table 2.
Intervention Characteristics
Interventions were delivered to students in one-on-one settings in 21 of the studies and
in small-group settings (group sizes in the included studies ranged from 2 to 5 students)
If the authors did not provide a percentile on a nationally normed test to describe the at-risk sample, the study was
not coded for this variable. Across the studies, a wide range of screening measures and operational definitions were
used, and the screeners were typically not nationally normed.
Table 2. Study and intervention characteristics for each study included in the meta-analysis.
Columns: Author; Design; Comparison condition; Grade; At-risk level; Outcome domain; Grouping; Interventionist; Scripted; Avg. hrs./wk.; Monitoring and feedback.
Allor and McCathren (2004) Study 1 RCT CR 1 RC 1:1 P Y L Y
Allor and McCathren (2004) Study 2 RCT CR 1 PF, RC, WR 1:1 P Y L Y
Berninger, Abbott, Vermeulen, and Fulton (2006) RCT CR 2/3 A WR SG R N M N
Blachman et al. (2004) RCT SI 2/3 A PF, RC, WR 1:1 CT N H Y
Case et al. (2010) RCT CR 1 WR SG R Y M Y
Case et al. (2014) RCT CR 1 PF, WR SG R Y M Y
Denton et al. (2010) RCT SI 1 RC, WR SG CT N H Y
Denton et al. (2014) RCT SI 2/3 M PF, RC, WR SG P Y H Y
Denton et al. (2014) RCT SI 2/3 M PF, RC, WR SG P Y H Y
Fien et al. (2015) RCT SI 1 M PF, WR SG P N H Y
Fuchs, Compton, Fuchs, Bryant, and Davis (2008) RCT CR 1 WR SG R N H N
Gunn et al. (2005) RCT CR 2/3 PF, RC, WR SG P Y M Y
Jacob, Armstrong, Bowden, and Pan (2016) RCT SI 2/3 PF, RC, WR 1:1 P Y L Y
Jenkins, Peyton, Sanders, and Vadasy (2004) QED CR 1 A RC, WR 1:1 P Y M Y
Lane et al. (2009) RCT CR 1 M WR 1:1 R N M N
Lane et al. (2009) RCT CR 1 M WR 1:1 R N M N
Lane et al. (2009) RCT CR 1 M WR 1:1 R N L N
Lane et al. (2009) RCT CR 1 M WR 1:1 R N L N
May, Sirinides, Gray, and Goldsworthy (2016) RCT SI 1 RC, WR 1:1 CT N H N
O'Connor et al. (2010) RCT CR 2/3 PF, RC, WR 1:1 P N L N
O'Connor et al. (2010) RCT CR 2/3 PF, RC, WR 1:1 P N L N
Pullen, Lane, and Monaghan (2004) RCT CR 1 M WR 1:1 P N L N
Scanlon, Vellutino, Small, Fanuele, and Sweeney (2005) RCT SI 1 RC, WR 1:1 CT H Y
Scanlon et al. (2005) RCT SI 1 RC, WR 1:1 CT H Y
Schwartz (2005) RCT CR 1 RC, WR 1:1 CT N H N
Smith et al. (2016) RCT SI 1 M PF, WR SG P N H Y
Vadasy and Sanders (2011) RCT CR 1 PF, RC, WR 1:1 P Y L Y
Vadasy, Sanders, and Peyton (2006) QED QED CR 2/3 M PF, RC, WR 1:1 P Y M Y
Vadasy et al. (2006) RCT RCT CR 2/3 M PF, RC, WR 1:1 P Y M Y
Vadasy et al. (2007) RCT CR 2/3 M PF, WR 1:1 P Y M Y
Vellutino and Scanlon (2002) RCT SI 1 A WR 1:1 CT N H Y
Wang and Algozzine (2008) RCT CR 1 RC, WR SG P Y L N
Wanzek and Vaughn (2008)
Identifies studies also included in Wanzek et al. (2016).
RCT = randomized controlled trial. QED = quasi-experimental design.
CR = core reading instruction. SI = school-provided intervention.
A = at risk, a sample with students only in the 25th percentile or lower. M = minimal risk, a sample that included students below the 40th percentile. Blank = authors did not provide a percentile or the information to calculate the percentile from a nationally normed test given for screening purposes.
WR = Word & Pseudoword Reading. RC = Reading Comprehension. PF = Passage Fluency.
SG = small group.
P = paraprofessional. R = researcher. CT = certified teacher.
Y = scripted. N = not scripted.
L = low, M = medium, H = high.
Y = support included. N indicates that monitoring and feedback was either not included or not reported.
in 12 studies. The interventions were implemented by a researcher in nine studies, a certified teacher in seven studies, and a paraprofessional in 15 studies. In 21 studies, additional support in the form of monitoring and feedback was provided to the interventionists. Nearly half of the studies (k = 15) included scripted interventions. The average hours per week of intervention ranged from less than an hour (i.e., 45 min) to 4.17 hours per week (median = 2). The intervention characteristics for each study included in the meta-analysis are described in Table 2.
Instructional Area/Focus of the Intervention
All the studies included in the meta-analysis examined interventions that focused on building students' reading skills in more than one instructional area (e.g., decoding, fluency, comprehension). In many respects, the interventions appeared to be similar to each other, mainly addressing decoding and fluency, while also attending to one or more areas of reading instruction, such as encoding, comprehension, vocabulary, phonological awareness, and writing.
All but two studies (k = 31) addressed decoding. Many studies also included instruction in passage fluency (k = 29) and encoding (k = 23). Over 50 percent of the studies addressed phonological awareness (k = 19) and comprehension (k = 18). Vocabulary (k = 13) and writing (k = 7) were addressed less frequently.
All studies with decoding and fluency were coded for both the area of instruction and the focus of instruction. Only nine of the 18 studies coded for comprehension were also given a Yes under the focus code, as they showed evidence of systematic teacher-led explicit instruction in comprehension that went beyond asking literal questions, questions about title and pictures, or monitoring comprehension strategies. Only one of the nine studies coded for vocabulary was also given a Yes for focus. Vocabulary instruction rarely involved explicit teaching and interaction; it typically involved defining words if students asked, or asking students to look at pictures to derive the meaning of a word.
Meta-Analytic Results
The meta-analysis included 128 effect sizes from 33 studies. See Table 3 for effect sizes (Hedges' g) for all outcome measures by domain for each study. Effect sizes ranged widely, from −0.20 to 1.37. The mean effect size for these studies was 0.39 (SE = .04, p < .001, 95% CI [0.32, 0.46]), indicating that the reading interventions were generally effective across students, settings, and measures. As expected, treatment effects varied considerably. The I² estimate of the percentage of between-study heterogeneity not due to chance was 50.75%, with a τ² estimate of the true variance in the population of effects of 0.02.
Eleven categorical moderator analyses of study characteristics (six variables) and inter-
vention characteristics (five variables) were individually tested using each as a single
Table 3. Outcomes and effect sizes.
Columns: Author; Total N; Outcome domain; Measure type; Effect size(s) (Hedges' g).
Allor and McCathren (2004) Study 1 86 RC S 0.50
Allor and McCathren (2004) Study 2 157 WR S 0.05, 0.13, 0.33, 0.44, 0.78
RC S −0.16
PF R 0.13
Berninger et al. (2006) 93 WR S 0.35
Blachman et al. (2004) 69 WR S 0.74, 0.87
RC S 0.53
PF S 0.70
Case et al. (2010) 30 WR S 0.48, 0.73, 0.73
Case et al. (2014) 123 WR S 0.02, 0.17, 0.21, 0.23
PF S 0.20
Denton et al. (2010) 422 WR S 0.42
RC S 0.51
Denton et al. (2014) 103 WR S 0.34, 0.40, 0.50
RC S 0.08, 0.13
PF S 0.16
Denton et al. (2014) 112 WR S 0.31, 0.50, 0.63
RC S 0.29, 0.46
PF S 0.45
Fien et al. (2015) 239 WR S 0.38, 0.45
PF S 0.30
Fuchs et al. (2008) 64 WR S 0.26, 0.26, 0.38, 0.46, 0.65
Gunn et al. (2005) 245 WR S 0.30, 0.52
RC S 0.32
PF S 0.24
Jacob et al. (2016) 1,166 WR S 0.11
RC S 0.10
PF S 0.09
Jenkins et al. (2004) 99 WR S 0.37, 0.50, 0.52, 0.73, 0.76, 1.12
RC S 0.74
Lane et al. (2009) 41 WR R 0.64, 0.71
WR S 1.24
Lane et al. (2009) 42 WR R 0.24, 0.29
Lane et al. (2009) 43 WR R 0.39, 0.55
Lane et al. (2009) 46 WR R 0.52, 0.59
WR S 1.02
May et al. (2016) 6,888 WR S 0.41
RC S 0.42
O'Connor et al. (2010) 40 WR S 0.10, 0.56
RC S 0.48, 0.53
PF S 0.60, 0.75, 0.76, 0.87
O'Connor et al. (2010) 43 WR S 0.25, 0.57
RC S 0.37, 0.44
PF S 0.81, 0.84, 0.93, 1.33
Pullen et al. (2004) 47 WR R 0.24, 0.81
WR S 0.54, 0.59
Scanlon et al. (2005) 114 WR S 0.31, 0.55
RC S 0.41
Scanlon et al. (2005) 117 WR S 0.51, 0.62
RC S 0.35
Schwartz (2005) 74 WR S 0.93, 1.37
RC S 0.14
Smith et al. (2016) 743 WR S 0.19
PF S 0.12
Smith et al. (2016) 729 WR S 0.23, 0.32
Smith et al. (2016) 749 WR S 0.24
PF S 0.18
Vadasy and Sanders (2011) 89 WR S 0.51
RC S 0.29
PF R 0.69
Vadasy et al. (2006) QED 31 WR S 0.61, 0.72
predictor in the meta-regression models. Some of these variables emerged as significant moderators of the relationship between reading interventions and effect sizes on measures of students' reading proficiency.
Findings are summarized in Table 4. The coefficients from the intercept-only models
should be interpreted as the weighted effect size for studies with that level of moderator,
and statistically significant results indicate that the mean effect size for studies with that
level of the moderator is significantly different from zero.
Study Characteristics. The outcome domain for the measures, when comparing the three areas of reading (word or pseudoword reading, reading comprehension, passage reading fluency), significantly moderated effect size. On average, outcomes in the word or pseudoword reading domain yielded the largest effect size (b = 0.41 [0.33, 0.50], p < .001, k = 32). Reading comprehension domain outcomes produced a slightly smaller effect (b = 0.32 [0.20, 0.43], p < .001, k = 19), followed by passage reading fluency outcomes, which generated the smallest effect size (b = 0.31 [0.17, 0.44], p < .001, k = 16). The only significant difference, however, was on outcomes in the word or pseudoword reading domain, which yielded significantly larger effect sizes than outcomes in the domains of reading comprehension or passage reading fluency (b = 0.10 [0.00, 0.19], p = .049, k = 32).
Study characteristic variables that did not significantly moderate the effect size included grade level (b = −0.06 [−0.26, 0.15], p = .542, k = 33), research design (b = −0.06 [−1.03, 0.92], p = .823, k = 33), the nature of the comparison group (b = −0.11 [−0.25, 0.04], p = .139, k = 33), participants' risk level (b = 0.13 [−0.18, 0.44], p = .330, k = 16), and standardized measures versus researcher-developed measures (b = 0.11 [−0.07, 0.28], p = .189, k = 33).
Table 3. Continued.
RC S 0.50
PF R 0.81
Vadasy et al. (2006) RCT 21 WR S 0.67, 0.75
RC S 0.21
PF R 0.55
Vadasy et al. (2007) 43 WR S 0.47
PF S 0.52
Vellutino and Scanlon (2002) 118 WR S 0.38
Wang and Algozzine (2008) 139 WR S −0.03, 0.39, 0.45
RC S 0.17
Wanzek and Vaughn (2008) 50 WR S 0.12, 0.18
PF S −0.20
WR = Word & Pseudoword Reading. RC = Reading Comprehension. PF = Passage Fluency.
R = researcher developed. S = standardized.
One effect size per measure. If four effect sizes are listed, it means four measures were used in this outcome domain.
The analysis could not include a meta-analysis of effects from outcomes in the vocabulary domain due to the small number of studies (k = 2).
Intervention Characteristics. None of the intervention characteristics led to significant moderator effects. These included interventions implemented by researchers (b = −0.01 [−0.21, 0.20], p = .939, k = 33), interventions implemented by certified teachers (b = 0.14 [0.00, 0.28], p = .053, k = 33), interventions implemented by a paraprofessional (b = −0.12 [−0.25, 0.02], p = .086, k = 33), whether an intervention was scripted (b = −0.13 [−0.28, 0.02], p = .094, k = 31), whether or not monitoring and feedback was provided (b = −0.12 [−0.26, 0.03], p = .103, k = 33), average hours per week of intervention for the low (b = −0.08 [−0.30, 0.15], p = .464, k = 33), medium (b = 0.04 [−0.14, 0.23], p = .610, k = 33), or high (b = 0.03 [−0.13, 0.18], p = .722, k = 33) categories, and grouping (either small-group or one-on-one interventions) (b = −0.13 [−0.27, 0.01], p = .075, k = 33). When tested per grade level, however, grouping was a significant moderator for Grade 1 but not Grades 2 and 3. For Grade 1 specifically, effects were larger if the intervention was delivered to students individually rather than to groups of students (b = −0.16 [−0.32, −0.01], p = .042, k = 22).
Area/Focus of Instruction. Given that all the studies examined interventions with multiple areas of instruction, we tested a meta-regression analysis model using all seven areas of instruction. Analyzing the areas of instruction simultaneously allowed us to examine each area's moderating influence while holding the other areas constant. Of the seven areas of instruction, phonological awareness, encoding (spelling), and writing appeared to be significant moderators of effect sizes when holding all other areas of instruction constant (see Table 4). Interventions that included phonological awareness tended to result in smaller effects across the word or pseudoword reading, reading comprehension, and passage reading fluency outcomes (b = −0.19 [−0.32, −0.05], p = .010). However, this meta-analysis did not examine phonological awareness outcomes, so we do not know the impact on those. In contrast, providing instruction in encoding (b = 0.18 [0.01, 0.35], p = .045) or writing (b = 0.18 [0.02, 0.34], p = .028) yielded significantly higher effect sizes when they were included as a component in the intervention.
We were also interested in whether studies that included a focus in a particular area (that is, more in-depth and explicit instruction) significantly moderated impacts. Effect sizes were not significantly associated with providing a more in-depth instructional focus in decoding (b = −0.17 [−0.89, 0.55], p = .216, k = 33), fluency (b = −0.04 [−0.34, 0.26], p = .694, k = 33), or comprehension (b = −0.06 [−0.30, 0.17], p = .572, k = 33). Note that vocabulary was not examined here as a focus due to the limited number of studies.
Publication Bias
Finally, to determine whether the findings suffered from publication/small-study bias, we implemented the trim-and-fill method (Duval & Tweedie, 2000). The results indicated that 23 effect sizes were estimated missing from the current meta-analysis of 128 total effects. Including these in the random-effects model would minimally decrease the mean effect size from g = 0.39 to g = 0.32 (p < .001, 95% CI [0.27, 0.36]).
Table 4. Moderator analysis.
Columns: Moderator; Coeff; SE; 95% CI; p; df; Q; I²; τ²; n; k; ρ.
Grade level
1 vs. 2/3: −0.06, 0.09, (−0.26, 0.15), .542, 12, 54.82, 41.63, 0.02, 128, 33, .8
Small Group vs. 1:1: −0.13, 0.07, (−0.27, 0.01), .075, 20, 64.31, 50.24, 0.02, 128, 33, .8
Research design
RCT vs. QED: −0.06, 0.23, (−1.03, 0.92), .823, <4, 66.80, 52.10, 0.02, 128, 33, .8
Paraprofessional vs. Other: −0.12, 0.06, (−0.25, 0.02), .086, 19, 49.68, 35.59, 0.01, 128, 33, .8
Certified Teacher vs. Other: 0.14, 0.06, (0.00, 0.28), .053, 8, 50.87, 37.09, 0.02, 128, 33, .8
Researcher vs. Other: −0.01, 0.09, (−0.21, 0.20), .939, 10, 67.00, 52.24, 0.02, 128, 33, .8
Scripted intervention
Scripted vs. Non-Scripted: −0.13, 0.07, (−0.28, 0.02), .094, 18, 50.53, 40.63, 0.02, 122, 31, .8
Nature of the comparison group
SI vs. CR: −0.11, 0.07, (−0.25, 0.04), .139, 24, 65.88, 51.43, 0.02, 128, 33, .8
At-risk sample
At risk vs. Minimal risk: 0.13, 0.12, (−0.18, 0.44), .330, 5, 16.24, —, 0.01, 58, 16, .8
Hours of treatment
Low vs. Other: −0.08, 0.10, (−0.30, 0.15), .464, 10, 53.19, 39.84, 0.02, 128, 33, .8
Medium vs. Other: 0.04, 0.08, (−0.14, 0.23), .610, 10, 66.89, 52.16, 0.02, 128, 33, .8
High vs. Other: 0.03, 0.07, (−0.13, 0.18), .722, 21, 57.57, 44.42, 0.02, 128, 33, .8
Provided monitoring/feedback
Yes vs. No: −0.12, 0.07, (−0.27, 0.03), .103, 12, 54.27, 41.04, 0.02, 128, 33, .8
Measure type
Standardized vs. Researcher: 0.11, 0.07, (−0.07, 0.28), .189, 7, 66.36, 51.78, 0.02, 128, 33, .8
Outcome domain
Word/Pseudoword Reading vs. Other: 0.10, 0.05, (0.00, 0.19), .049, 19, 66.10, 51.59, 0.02, 128, 33, .8
Passage Reading Fluency vs. Other: −0.09, 0.06, (−0.23, 0.05), .188, 10, 60.01, 46.68, 0.02, 128, 33, .8
Reading Comprehension vs. Other: −0.05, 0.05, (−0.23, 0.05), .308, 13, 66.58, 51.94, 0.02, 128, 33, .8
Instructional Area
Constant: 0.49, 0.15, (−0.02, 1.00), .055, <4, 44.07, 27.39, 0.02, 128, 33, .8
Decoding: −0.12, 0.11, (−0.53, 0.29), .356, <4, .8
Passage fluency: 0.00, 0.10, (−0.27, 0.26), .974, 5, .8
Reading comprehension: −0.08, 0.07, (−0.23, 0.06), .234, 13, .8
Vocabulary: 0.08, 0.09, (−0.12, 0.27), .414, 11, .8
Phonological awareness: −0.19, 0.06, (−0.32, −0.05), .010, 13, .8
Encoding: 0.18, 0.07, (0.01, 0.35), .045, 8, .8
Writing: 0.18, 0.07, (0.02, 0.34), .028, 8, .8
Note. Coeff = coefficient; SE = standard error; CI = confidence interval; p = significance; df = degrees of freedom; Q = test of homogeneity of effect sizes; I² = measure of effect size variability; τ² = between-study variance; n = number of effect sizes; k = number of studies; ρ = corrected correlation. In all RVE models, we used a ρ value of .80 to estimate the between-study variance; — = could not be estimated. Bolded coefficient values indicate statistically significant estimates at p < .05.
CR = core reading instruction. SI = school-provided intervention.
Results from this meta-analysis of 33 studies of reading interventions conducted
between 2002 and 2017 reveal significant, positive effects on a range of reading out-
comes. The significant mean effect size (Hedges' g) across 33 studies was 0.39 (p < .001),
indicating that students from Grades 1, 2, and 3 who score in the at-risk category on a
screening battery or on a normed test do, on average, benefit from the set of reading
interventions studied. This leads us to conclude that the research base underlying read-
ing interventions is sound and not the primary reason for the lack of impacts in the
national RtI evaluation (Balu et al., 2015), which found null, or in one case negative,
impacts on reading outcomes for students at or near the cut point on screening.
Mean effect sizes (Hedges' g) for each outcome domain ranged from 0.41 in the area of word or pseudoword reading to 0.32 in comprehension and 0.31 in passage reading fluency. All were statistically significant at p < .001. Note that the mean effect size was
the highest in the outcome domain of word and pseudoword reading. This is unsurpris-
ing, given the large body of evidence supporting the use of various forms of systematic,
explicit, small-group instruction in phonemic awareness, phonics instruction, and sight
word reading to help students who are likely to fall behind when experiencing more
traditional instruction (e.g., Gersten, Compton, et al., 2009; National Institute of Child
Health and Human Development [NICHD], 2000).
The reading interventions examined showed many commonalities. Every intervention
addressed multiple aspects of foundational reading: phonological awareness, decoding,
passage reading fluency, encoding (spelling) and, on occasion, writing. Nearly all inter-
ventions addressed comprehension in some fashion, although few provided much in the
way of detail. Vocabulary and comprehension instruction were rarely emphasized.
Virtually all interventions included systematic, explicit instruction. Typically, this occurred during instruction in phonics/word reading skills and passage reading fluency, often with some activities geared toward fluency building and phonological awareness.
Interventions that included instruction on phonological awareness were associated
with significantly smaller effects, whereas interventions that addressed encoding or writ-
ing yielded significantly higher effect sizes. Perhaps focusing on pre-reading skills
such as phonological awareness after students have started to learn to decode is coun-
ter-productive as it takes time and focus away from gaining proficiency in decoding
skills. We speculate that an encoding component may help reinforce phonics rules and
decoding, and we note that this has been a feature of some core reading programs and
intervention programs.
Variables for Future Exploration
The percentage of between-study heterogeneity not due to chance was 50.75%, suggest-
ing both a good deal of variance in the pattern of effects and the need to use moderator
analyses to begin to understand salient factors. Although many of the moderators
explored in the current meta-analysis were non-significant (p > .05), future research is
needed to explore aspects of the interventions that may moderate the relationship
between the intervention and reading outcomes. In particular, researchers should
continue to investigate variables that could provide us with possible explanations for the
null and negative impacts from the Balu et al. (2015) study.
One finding worth exploring further is whether the interventionist moderates student impacts on measures of reading achievement. Interventions implemented by certified teachers did not yield significantly higher effect sizes (p = .053) than those conducted by others (primarily paraeducators or university students working for a researcher). Yet, results of the Balu et al. (2015) survey revealed that teachers provided intervention in over a third of schools implementing RtI in Grades 1–3. Similarly, the effect sizes for interventions delivered by paraprofessionals did not differ significantly from those delivered by certified teachers or researchers (p = .086) in our analyses. These results conflict with those reached by Slavin et al. (2011) in an earlier review of the literature on reading interventions, which suggests this is an area that warrants further investigation.
The results from the Balu et al. (2015) study also suggest that schools implementing RtI often used small groups ranging from 2 to 10 students, as opposed to the interventions in the meta-analysis, which were implemented in smaller groups (2 to 5 students) or one-on-one. We found an average effect size of 0.46 for interventions that were delivered one-on-one and 0.31 for those delivered to small groups of students; however, this moderator variable was not statistically significant (p = .075). Further analyses revealed that grouping moderated effects for Grade 1 but not for Grades 2 and 3 (p = .042). One-on-one instruction may be more beneficial for beginning readers. Similarly, even in small groups of 2–5 students, it may be easier to meet students' needs if all the students in the group are similar in their basic knowledge of rhyming, alphabet, phonemes, and decoding skills. A recent study by Al Otaiba, Connor, et al. (2014) supports this notion. They found that it was necessary to make small groups more homogeneous by adjusting both the text's readability level and the lesson pacing to meet students' individual needs.
It could also be that the schools in the Balu et al. (2015) study used more scripted
interventions, though we cannot know for sure since the Balu survey did not ask
whether scripted interventions were used. Previous research has indicated that programs
where teachers are given some autonomy tend to produce higher results in reading
comprehension (Fang, Fu, & Lamme, 2004; Tivnan & Hemphill, 2005; Wilson, Martens,
& Arya, 2005). One reason for this might be that scripted interventions leave little room
for even slight adaptations to meet individual student needs when compared to those
with a lesson plan and no exact wording. Our results, however, found that the effect sizes for research interventions that allowed teachers to adapt the intervention to students' needs did not differ significantly from scripted interventions (p = .094). Future research
should explore this area.
Before overgeneralizing from these findings, it is important to note that other var-
iables may be confounding the relationship. For example, all but 3 of the 15 scripted
interventions were implemented by paraprofessionals. Typically, paraprofessionals
implement scripted programs because most do not have the training to make appro-
priate instructional decisions when using a traditional lesson plan. Because our lim-
ited number of studies hindered our ability to model all the potential moderators at
once (i.e., controlling for other variables), the moderator findings should be inter-
preted with caution.
Relation to Previous Relevant Meta-Analysis
It is difficult to draw a direct comparison between the current study and the Slavin et al. (2011) and Wanzek et al. (2016, 2018) meta-analyses. Though the studies included in this meta-analysis overlap with some of the studies included in the other meta-analyses, this meta-analysis is the first to use rigorous standards of evidence in the inclusion criteria. The other meta-analyses included studies that were not as rigorous as those in the current study and included kindergarten interventions, which typically focus heavily on reading-related skills such as phonological awareness, rhyming, and basic decoding.
The impacts in the current meta-analysis (0.39) are smaller than several impacts in
the Wanzek et al. (2016) meta-analysis of studies of shorter interventions: 0.54 on stand-
ardized foundational skill measures, 0.62 for non-standardized foundational skill meas-
ures, and 1.02 for non-standardized multicomponent measures. The differences in the
magnitude of effect sizes may be due to studies that were not as rigorous as those in the
current study or to the inclusion of kindergarten interventions. However, in the Wanzek et al. (2016) study, domain-level impacts were reported for composite domains: foundational reading/reading-related skills (including phonological awareness, rhyming, and letter identification, as well as measures of decoding; 0.54 to 0.62) and multicomponent measures (a composite of listening and reading comprehension; 0.36 to 1.02). This makes it difficult to compare against our domain-level impacts, which ranged from 0.31 to 0.41.
Yet, effects in the Wanzek et al. (2018) meta-analysis of studies of longer interven-
tions, and the impacts of one-on-one and small group interventions in Slavin et al.
(2011), and the impacts on standardized multicomponent measures in Wanzek et al.
(2016) are similar to those found in our analysis. These findings suggest consistency in
the impact of reading interventions for struggling readers.
Challenges and Limitations in Conducting the Meta-Analysis
Issues in Using Rigorous Design Standards and Contemporary Meta-Analytic Techniques
A unique feature of this meta-analysis is that it included only those studies that met
what is often called the gold standard, What Works Clearinghouse (WWC 3.0) standards
for RCTs and quasi-experimental design. Ninety-one percent of the studies included in
this meta-analysis were RCTs (k = 30), which is a much higher proportion of RCTs
than in similar previous meta-analyses (e.g., Wanzek et al., 2016 [55% RCTs]; Swanson,
1999 [47.9% RCTs]). Including only those studies that met these rigorous standards
allows for more confidence in the meta-analytic findings. This is an especially important
contemporary issue given the lack of replicability of findings in the social sciences
(Ioannidis, 2005) and the general concern about false positives in both individual
research studies (Benjamini & Hochberg, 1995) and meta-analyses (Greco, Zangrillo,
Biondi-Zoccai, & Landoni, 2013).
The gain in trustworthiness, however, resulted in less statistical power for analyses,
including the crucial moderator analyses, which help in understanding possible underly-
ing themes in the data, as invariably, fewer studies met the rigorous design standards in
this meta-analysis. This is likely to become an issue in future meta-analyses, as the
tradeoff between the quality and validity of the research findings conflicts with the need
for a large number of studies in conducting important moderator analyses with suffi-
cient power. We suspect it will take some time for the field to produce enough high-
quality studies to result in statistically significant findings from which we could draw
conclusions across studies.
As studies most often contain more than one outcome measure and at times more
than one comparison, meta-analyses must address the dependencies arising from such
multiple outcomes and comparisons within studies. This issue was pertinent for our
meta-analysis, as 90% of the studies included multiple measures and 15% contained
multiple comparisons. Thus, to account for the dependencies in the data, we used robust
variance estimation (RVE; Hedges et al., 2010), a contemporary statistical technique.
One problem in using RVE is that it yields low statistical power unless the
meta-analysis includes a large number of studies (López-López, Van den Noortgate,
Tanner-Smith, Wilson, & Lipsey, 2017). Tipton (2015) notes that at least 40 studies are
needed for adequate statistical power to conduct moderator analyses.
Our meta-analysis included 33 high-quality experimental and quasi-experimental studies
(meeting WWC standards), a number not typically seen for a topic as specific as this in
other areas of educational research. However, it still fell below the minimum number of
40 studies for adequate statistical power to conduct the moderator analyses.
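As a rough illustration of how RVE handles such dependencies, the sketch below fits an intercept-only (mean effect) model with a cluster-robust standard error. The effect sizes, variances, and study assignments are made up for illustration, and between-study heterogeneity (tau²) is omitted from the weights; the analysis reported here was actually run with the robumeta module in Stata (Hedberg, 2011).

```python
import numpy as np

# Hypothetical effect sizes (Hedges' g), sampling variances, and study IDs.
# Several estimates share a study ID, creating the dependencies RVE handles.
g = np.array([0.42, 0.35, 0.51, 0.28, 0.60, 0.33, 0.45])
v = np.array([0.02, 0.02, 0.03, 0.04, 0.05, 0.03, 0.02])
study = np.array([1, 1, 2, 2, 3, 4, 5])

# Correlated-effects working model: every estimate in study j gets the same
# weight, 1 / (k_j * mean variance in j). (tau^2 is omitted in this sketch.)
w = np.empty_like(g)
for s in np.unique(study):
    mask = study == s
    w[mask] = 1.0 / (mask.sum() * v[mask].mean())

beta = np.sum(w * g) / np.sum(w)  # overall weighted mean effect

# Robust (sandwich) variance: sum, over studies, of the squared weighted
# residual totals -- consistent even if the working weights are misspecified.
resid = g - beta
num = sum(np.sum((w * resid)[study == s]) ** 2 for s in np.unique(study))
se_robust = np.sqrt(num) / np.sum(w)

print(f"mean effect = {beta:.3f}, robust SE = {se_robust:.3f}")
```

Because the robust variance sums one squared residual total per study, its reliability depends on the number of studies (the clusters), not the number of effect sizes, which is why so many studies are needed.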
A meta-regression model with all moderators entered simultaneously (e.g., Gersten,
Chard, et al., 2009; Wanzek et al., 2016) would have been preferable to our series of
analyses, which tested each moderator individually. However, the overall number of
studies that met the inclusion and study quality criteria was small, and not every study
included information that permitted coding of all moderators. Thus, analyzing all varia-
bles within one regression model was not feasible, as results would be
uninterpretable due to insufficient degrees of freedom. Therefore, the single-predictor
RVE meta-regression models used in the meta-analysis must be interpreted with caution
due to the potential confounding effects of other moderators that are not accounted for
in these models.
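To see why Tipton's (2015) 40-study benchmark matters, a crude Monte Carlo sketch can be used. The parameter values below (moderator slope, sampling variances, heterogeneity) are hypothetical, and the simulation uses one effect size per study with a simple z-test rather than full RVE with small-sample corrections:

```python
import numpy as np

rng = np.random.default_rng(0)

def moderator_power(k, reps=1000, slope=0.20, tau2=0.02):
    """Monte Carlo power of a z-test on a binary moderator's slope in a
    weighted single-predictor meta-regression with k studies."""
    hits = 0
    for _ in range(reps):
        x = rng.integers(0, 2, size=k)        # binary moderator
        if x.min() == x.max():                # avoid a singular design matrix
            x[0] = 1 - x[0]
        v = rng.uniform(0.02, 0.06, size=k)   # sampling variances
        g = 0.30 + slope * x + rng.normal(0.0, np.sqrt(tau2 + v))
        X = np.column_stack([np.ones(k), x])
        W = np.diag(1.0 / (tau2 + v))         # inverse-variance weights
        cov = np.linalg.inv(X.T @ W @ X)
        b = cov @ X.T @ W @ g
        hits += abs(b[1]) / np.sqrt(cov[1, 1]) > 1.96
    return hits / reps

print(moderator_power(20), moderator_power(40), moderator_power(80))
```

Under these assumptions, power rises steeply between roughly 20 and 80 studies, consistent with the concern that 33 studies fall short for well-powered moderator analyses.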
Issues in Coding Studies
Only a few studies (Denton et al., 2014; Vadasy, Sanders, & Tudor, 2007) provided a
rich description of the nature of instruction in the intervention. Most articles did
not provide sufficient detail on how reading was taught; instead, they merely listed
the areas of instruction that were covered, provided a brief cursory explanation, or
addressed them in a figure or table with little sense of the amount of time devoted to
activities or what those activities actually entailed. Because written descriptions
were often not detailed enough, coding or classifying these areas was at times guesswork.
Given the difficulty we had in coding the instructional focus categories, we recommend
that future intervention research articles include more detailed descriptions of each
component of the intervention. However, this may be easier said than done, as journal
submission guidelines usually impose strict space limits. Understanding this, we would
encourage authors to write detailed descriptions with sample lessons, post them on a
website noted in the article, and provide access to the information. That would allow
those involved in research syntheses, and those interested in replication, to access
this material and ultimately gain a better understanding of the nature of
the research.
Coding the at-risk category was a challenging task, as out of the 33 studies, only 16
could be used for the analysis examining the moderating role of the at-risk status variable.
This is because, across the 33 studies, there was little commonality in how at-risk status
was operationally defined (i.e., described as below grade-level performance; based on local
norms, national norms, researcher-developed measures, or validated screening measures),
making comparisons across studies difficult and underpowered. We also found the use of
norms from standardized tests to be problematic because some of the norms were much
older than others, and the field of early literacy instruction has undergone massive changes
in the past 15 years. In addition, there are likely to be shifts in national norms on some
measures, especially those involving phonological awareness, phonics, and possibly oral
reading fluency. It would be helpful if the field could adopt more consistent means of
determining the suitable samples for a Tier 2 reading intervention.
Fidelity of implementation was another area that was difficult to code due to the lack
of consistency across studies in how fidelity was explained and measured. For instance,
if different measurement systems are used, 80% fidelity in one study is not comparable
to 80% fidelity in another study. As a result, though this was very much an area of
interest for us, we could not code for fidelity as a moderator.
Implications for Future Research
Most intervention studies examine impacts immediately at the end of an intervention.
An important next step in reading intervention research, one only occasionally
attempted to date (e.g., Al Otaiba, Kim, Wanzek, Petscher, & Wagner, 2014; Blachman
et al., 2014; Vaughn et al., 2008), is to see whether the impacts on reading performance
are maintained, both with and without further intervention, in follow-up studies.
Additional intervention research is also needed in the area of vocabulary. Few studies
in our meta-analysis addressed reading vocabulary in a comprehensive manner during
the intervention, and only two studies (Gunn, Smolkowski, Biglan, Black, & Blair, 2005;
O'Connor, Swanson, & Geraghty, 2010) included vocabulary as an outcome measure.
We were therefore unable to draw conclusions on this crucial aspect of reading profi-
ciency. Future intervention research, especially in Grades 2 and 3, should include a sys-
tematic vocabulary instruction component in the interventions and assess its
effectiveness using reading vocabulary outcomes.
We would also encourage more intervention research in the areas of reading and lan-
guage comprehension, since these were areas of weaker impacts. Newer intervention
research (e.g., Foorman, Herrera, & Dombek, 2018) increasingly includes both reading
and listening comprehension, and these may lead to stronger impacts in the reading
comprehension domain.
Acknowledgments
The authors wish to acknowledge the sage advice provided by Nancy Lewis and Terri Pigott, and
recognize Samantha Spallone, Pam Foremski, and Christopher Tran for their assistance.
Funding
This research was supported in part by Contract Number [ED-IES-12-C-0011]. The views do not
represent those of the U.S. Department of Education.
References
Al Otaiba, S., Connor, C. M., Folsom, J. S., Wanzek, J., Greulich, L., Schatschneider, C., & Wagner, R. K. (2014). To wait in Tier 1 or intervene immediately: A randomized experiment examining first-grade response to intervention in reading. Exceptional Children, 81(1), 11–27.
Al Otaiba, S., & Fuchs, D. (2002). Characteristics of children who are unresponsive to early literacy intervention: A review of the literature. Remedial and Special Education, 23(5), 300–316.
Al Otaiba, S., Kim, Y. S., Wanzek, J., Petscher, Y., & Wagner, R. K. (2014). Long-term effects of first-grade multitier intervention. Journal of Research on Educational Effectiveness, 7(3), 250–267. doi:10.1080/19345747.2014.906692
Allor, J., & McCathren, R. (2004). The efficacy of an early literacy tutoring program implemented by college students. Learning Disabilities Research and Practice, 19(2), 116–129. doi:10.1111/j.
Balu, R., Zhu, P., Doolittle, F., Schiller, E., Jenkins, J., & Gersten, R. (2015). Evaluation of response to intervention practices for elementary school reading (NCEE 2016-4000). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x
Berninger, V. W., Abbott, R. D., Vermeulen, K., & Fulton, C. M. (2006). Paths to reading comprehension in at-risk second-grade readers. Journal of Learning Disabilities, 39(4), 334–351.
Blachman, B. A., Schatschneider, C., Fletcher, J. M., Francis, D. J., Clonan, S. M., Shaywitz, B. A., & Shaywitz, S. E. (2004). Effects of intensive reading remediation for second and third graders and a 1-year follow-up. Journal of Educational Psychology, 96(3), 444–461. doi:10.1037/0022-
Blachman, B. A., Schatschneider, C., Fletcher, J. M., Murray, M. S., Munger, K. A., & Vaughn, M. G. (2014). Intensive reading remediation in grade 2 or 3: Are there effects a decade later? Journal of Educational Psychology, 106(1), 46–57. doi:10.1037/a0033663
Case, L. P., Speece, D. L., Silverman, R., Ritchey, K. D., Schatschneider, C., Cooper, D. H., … Jacobs, D. (2010). Validation of a supplemental reading intervention for first-grade children. Journal of Learning Disabilities, 43(5), 402–417. doi:10.1177/0022219409355475
Case, L., Speece, D., Silverman, R., Schatschneider, C., Montanaro, E., & Ritchey, K. (2014). Immediate and long-term effects of tier 2 reading instruction for first-grade students with a high probability of reading failure. Journal of Research on Educational Effectiveness, 7(1), 28–53. doi:10.1080/19345747.2013.786771
Center for Research and Reform in Education & Johns Hopkins University. (2019). Evidence for ESSA: Standards and procedures. Retrieved from
Denton, C. A., Fletcher, J. M., Taylor, W. P., Barth, A. E., & Vaughn, S. (2014). An experimental evaluation of guided reading and explicit interventions for primary-grade students at-risk for reading difficulties. Journal of Research on Educational Effectiveness, 7(3), 268–293. doi:10.1080/
Denton, C. A., Nimon, K., Mathes, P. G., Swanson, E. A., Kethley, C., Kurz, T. B., & Shih, M. (2010). Effectiveness of a supplemental early reading intervention scaled up in multiple schools. Exceptional Children, 76(4), 394–416. doi:10.1177/001440291007600402
Denton, C. A., Tolar, T. D., Fletcher, J. M., Barth, A. E., Vaughn, S., & Francis, D. J. (2013). Effects of tier 3 intervention for students with persistent reading difficulties and characteristics of inadequate responders. Journal of Educational Psychology, 105(3), 633–648. doi:10.1037/
Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463. doi:10.1111/j.0006-
Elleman, A. M., Lindo, E. J., Morphy, P., & Compton, D. L. (2009). The impact of vocabulary instruction on passage-level comprehension of school-age children: A meta-analysis. Journal of Research on Educational Effectiveness, 2(1), 1–44. doi:10.1080/19345740802539200
Every Student Succeeds Act of 2015, Pub. L. No. 114-95, § 8101(21)(A), 129 Stat. 1939 (2015).
Fang, Z., Fu, D., & Lamme, L. L. (2004). From scripted instruction to teacher empowerment: Supporting literacy teachers to make pedagogical transitions. Literacy (Formerly Reading), 38(1), 58–64. doi:10.1111/j.0034-0472.2004.03801010.x
Fien, H., Smith, J. L. M., Smolkowski, K., Baker, S. K., Nelson, N. J., & Chaparro, E. (2015). An examination of the efficacy of a multitiered intervention on early reading outcomes for first grade students at risk for reading difficulties. Journal of Learning Disabilities, 48(6), 602–621.
Foorman, B., Beyler, N., Borradaile, K., Coyne, M., Denton, C. A., Dimino, J., … Wissel, S. (2016). Foundational skills to support reading for understanding in kindergarten through 3rd grade (NCEE 2016-4008). Washington, DC: National Center for Education Evaluation and Regional Assistance (NCEE), Institute of Education Sciences, U.S. Department of Education. Retrieved from
Foorman, B. R., Herrera, S., & Dombek, J. (2018). The relative impact of aligning Tier 2 intervention materials with classroom core reading materials in grades K–2. The Elementary School Journal, 118(3), 477–504. doi:10.1086/696021
Francis, D. J., Kulesz, P. A., & Benoit, J. S. (2018). Extending the simple view of reading to account for variation within readers and across texts: The complete view of reading (CVRi). Remedial and Special Education, 39(5), 274–288. doi:10.1177/0741932518772904
Fuchs, D., Compton, D. L., Fuchs, L. S., Bryant, J., & Davis, G. N. (2008). Making "secondary intervention" work in a three-tier responsiveness-to-intervention model: Findings from the first-grade longitudinal reading study of the National Research Center on Learning Disabilities. Reading and Writing, 21(4), 413–436. doi:10.1007/s11145-007-9083-9
Fuchs, D., & Fuchs, L. S. (2017). Critique of the national evaluation of response to intervention: A case for simpler frameworks. Exceptional Children, 83(3), 255–268. doi:10.1177/
Fuchs, L. S., Fuchs, D., Powell, S. R., Seethaler, P. M., Cirino, P. T., & Fletcher, J. M. (2008). Intensive intervention for students with math disabilities: Seven principles of effective practice. Learning Disability Quarterly, 31(2), 79–92. doi:10.2307/20528819
Gamse, B. C., Jacob, R. T., Horst, M., Boulay, B., & Unlu, F. (2008). Reading first impact study final report (NCEE 2009-4038). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
Gersten, R., Chard, D., Jayanthi, M., Baker, S., Morphy, P., & Flojo, J. (2009). Mathematics instruction for students with learning disabilities: A meta-analysis of instructional components. Review of Educational Research, 79(3), 1202–1242. doi:10.3102/0034654309334431
Gersten, R., Compton, D., Connor, C. M., Dimino, J., Santoro, L., Linan-Thompson, S., & Tilly, W. D. (2009). Assisting students struggling with reading: Response to Intervention and multi-tier intervention for reading in the primary grades. A practice guide (NCEE 2009-4045). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from
Gersten, R., Jayanthi, M., & Dimino, J. (2017). Too much, too soon? A commentary on what the national RtI evaluation left unanswered and what reading intervention research tells us. Exceptional Children, 83(3), 244–254. doi:10.1177/0014402917692847
Greco, T., Zangrillo, A., Biondi-Zoccai, G., & Landoni, G. (2013). Meta-analysis: Pitfalls and hints. Heart, Lung and Vessels, 5(4), 219–225.
Gunn, B., Smolkowski, K., Biglan, A., Black, C., & Blair, J. (2005). Fostering the development of reading skill through supplemental instruction: Results for Hispanic and non-Hispanic students. The Journal of Special Education, 39(2), 66–85. doi:10.1177/00224669050390020301
Hedberg, E. C. (2011). ROBUMETA: Stata module to perform robust variance estimation in meta-regression with dependent effect size estimates. Boston, MA: Boston College.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6(2), 107–128. doi:10.3102/10769986006002107
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39–65. doi:10.1002/
Imbens, G. W., & Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2), 615–635. doi:10.1016/j.jeconom.2007.05.001
Individuals with Disabilities Education Act, Pub. L. No. 108-446, 20 U.S.C. § 1400, 118 Stat. 2649 (2004).
Ioannidis, J. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Jacob, R., Armstrong, C., Bowden, A. B., & Pan, Y. (2016). Leveraging volunteers: An experimental evaluation of a tutoring program for struggling readers. Journal of Research on Educational Effectiveness, 9(Suppl. 1), 67–92. doi:10.1080/19345747.2016.1138560
Jenkins, J. R., Peyton, J. A., Sanders, E. A., & Vadasy, P. F. (2004). Effects of reading decodable texts in supplemental first-grade tutoring. Scientific Studies of Reading, 8(1), 53–85. doi:10.
Lane, H. B., Pullen, P. C., Hudson, R. F., & Konold, T. R. (2009). Identifying essential instructional components of literacy tutoring for struggling beginning readers. Literacy Research and Instruction, 48(4), 277–297. doi:10.1080/19388070902875173
López-López, J. A., Van den Noortgate, W., Tanner-Smith, E. E., Wilson, S. J., & Lipsey, M. W. (2017). Assessing meta-regression methods for examining moderator relationships with dependent effect sizes: A Monte Carlo simulation. Research Synthesis Methods, 8(4), 435–450.
May, H., Sirinides, P., Gray, A., & Goldsworthy, H. (2016). Reading recovery: An evaluation of the four-year i3 scale-up. Philadelphia, PA: Consortium for Policy Research in Education, University of Pennsylvania.
National Institute of Child Health and Human Development [NICHD]. (2000). Report of the National Reading Panel. Teaching children to read: Reports of the subgroups (NIH Publication No. 00-4754). Washington, DC: U.S. Department of Health and Human Services. Retrieved
No Child Left Behind Act of 2001 [NCLB], Pub. L. No. 107-110, § 1201, 115 Stat. 1425 (2002).
O'Connor, R. E., Swanson, H. L., & Geraghty, C. (2010). Improvement in reading rate under independent and difficult text levels: Influences on word and comprehension skills. Journal of Educational Psychology, 102(1), 1–19. doi:10.1037/a0017488
Pullen, P. C., Lane, H. B., & Monaghan, M. C. (2004). Effects of a volunteer tutoring model on the early literacy development of struggling first grade students. Reading Research and Instruction, 43(4), 21–40. doi:10.1080/19388070409558415
Scanlon, D. M., Vellutino, F. R., Small, S. G., Fanuele, D. P., & Sweeney, J. M. (2005). Severe reading difficulties: Can they be prevented? A comparison of prevention and intervention approaches. Exceptionality, 13(4), 209–227. doi:10.1207/s15327035ex1304_3
Schwartz, R. M. (2005). Literacy learning of at-risk first-grade students in the reading recovery early intervention. Journal of Educational Psychology, 97(2), 257–267. doi:10.1037/0022-0663.97.
Slavin, R. E., Lake, C., Davis, S., & Madden, N. A. (2011). Effective programs for struggling readers: A best-evidence synthesis. Educational Research Review, 6(1), 1–26. doi:10.1016/j.edurev.
Smith, J. L. M., Nelson, N. J., Fien, H., Smolkowski, K., Kosty, D., & Baker, S. K. (2016). Examining the efficacy of a multitiered intervention for at-risk readers in grade 1. The Elementary School Journal, 116(4), 549–573. doi:10.1086/686249
StataCorp. (2015). Stata statistical software (Release 14). College Station, TX: StataCorp LP.
Stuebing, K. K., Barth, A. E., Trahan, L. H., Reddy, R. R., Miciak, J., & Fletcher, J. M. (2015). Are child cognitive characteristics strong predictors of responses to intervention? A meta-analysis. Review of Educational Research, 85(3), 395–429. doi:10.3102/0034654314555996
Swanson, H. L. (1999). Reading research for students with LD: A meta-analysis of intervention outcomes. Journal of Learning Disabilities, 32(6), 504–532. doi:10.1177/002221949903200605
Tanner-Smith, E. E., & Tipton, E. (2014). Robust variance estimation with dependent effect sizes: Practical considerations including a software tutorial in Stata and SPSS. Research Synthesis Methods, 5(1), 13–30. doi:10.1002/jrsm.1091
Tipton, E. (2015). Small sample adjustments for robust variance estimation with meta-regression. Psychological Methods, 20(3), 375–393. doi:10.1037/met0000011
Tivnan, T., & Hemphill, L. (2005). Comparing four literacy reform models in high-poverty schools: Patterns of first-grade achievement. The Elementary School Journal, 105(5), 419–441.
Tran, L., Sanchez, T., Arellano, B., & Swanson, H. L. (2011). A meta-analysis of the RTI literature for children at risk for reading disabilities. Journal of Learning Disabilities, 44(3), 283–295. doi:
U.S. Department of Education [U.S. ED], Institute of Education Sciences [IES], & What Works Clearinghouse [WWC]. (2013). What Works Clearinghouse: Procedures and standards handbook (Version 3.0). Retrieved from
Vadasy, P. F., & Sanders, E. A. (2011). Efficacy of supplemental phonics-based instruction for low-skilled first graders: How language minority status and pretest characteristics moderate treatment response. Scientific Studies of Reading, 15(6), 471–497. doi:10.1080/10888438.2010.
Vadasy, P. F., Sanders, E. A., & Peyton, J. A. (2006). Paraeducator-supplemented instruction in structural analysis with text reading practice for second and third graders at risk for reading problems. Remedial and Special Education, 27(6), 365–378. doi:10.1177/07419325060270060601
Vadasy, P. F., Sanders, E. A., & Tudor, S. (2007). Effectiveness of paraeducator-supplemented individual instruction: Beyond basic decoding skills. Journal of Learning Disabilities, 40(6), 508–525. doi:10.1177/00222194070400060301
Vaughn, S., Cirino, P. T., Tolar, T., Fletcher, J. M., Cardenas-Hagan, E., Carlson, C. D., & Francis, D. J. (2008). Long-term follow-up of Spanish and English interventions for first-grade English language learners at risk for reading problems. Journal of Research on Educational Effectiveness, 1(3), 179–214. doi:10.1080/19345740802114749
Vellutino, F. R., & Scanlon, D. M. (2002). The Interactive Strategies approach to reading intervention. Contemporary Educational Psychology, 27(4), 573–635. doi:10.1016/S0361-476X(02)00002-4
Wang, C., & Algozzine, B. (2008). Effects of targeted intervention on early literacy skills of at-risk students. Journal of Research in Childhood Education, 22(4), 425–439. doi:10.1080/
Wanzek, J., Stevens, E. A., Williams, K. J., Scammacca, N., Vaughn, S., & Sargent, K. (2018). Current evidence on the effects of intensive early reading interventions. Journal of Learning Disabilities, 51(6), 612–624. doi:10.1177/0022219418775110
Wanzek, J., & Vaughn, S. (2007). Research-based implications from extensive early reading interventions. School Psychology Review, 36(4), 541–561.
Wanzek, J., & Vaughn, S. (2008). Response to varying amounts of time in reading intervention for students with low response to intervention. Journal of Learning Disabilities, 41(2), 126–142.
Wanzek, J., Vaughn, S., Scammacca, N., Gatlin, B., Walker, M. A., & Capin, P. (2016). Meta-analyses of the effects of tier 2 type reading interventions in grades K–3. Educational Psychology Review, 28(3), 551–576. doi:10.1007/s10648-015-9321-7
Wilson, P., Martens, P., & Arya, P. (2005). Accountability for reading and readers: What the numbers don't tell. The Reading Teacher, 58(7), 622–631. doi:10.1598/RT.58.7.3
... Fluency is linked with reading comprehension and can be defined as the ability to read a text quickly, accurately and expressively (Gersten et al., 2020). Fluent readers can identify words accurately and automatically, allowing them to focus their attention on reading comprehension and in the connections between the ideas presented in a text and their prior knowledge (National Institute of Child Health and Human Development, 2000). ...
... Less fluent readers, on the other hand, must focus their attention on word recognition, resulting in less attention devoted to reading comprehension. As a result, while the ability to read words accurately is required in the process of learning to read, the fluency with which this process is carried out is critical for children's reading comprehension (Gersten et al., 2020). Rasinski et al. (2011) suggest that fluent readers tend to have more positive attitudes toward reading, as well as a more positive perception of themselves as readers. ...
... Recent meta-analyses show that a plethora of variables influence the efficacy of the strategies and interventions used for reading fluency and accuracy (Gersten et al., 2020;Kim et al., 2020;Zimmermann et al., 2021). These include the duration of the intervention, the number of sessions, the session length, and the size of the group. ...
Full-text available
The global COVID-19 pandemic disrupted face-to-face teaching, having a significant impact on the teaching-learning process. As a result, many students spent less time reading (and learning to read) than they did during face-to-face instruction, requiring the use of alternative approaches of instruction. A combined online and peer tutoring intervention was designed to improve reading skills such as fluency and accuracy. Following a quasi-experimental design, this study sought to evaluated the impact of implementing an online peer tutoring intervention on the development of reading fluency and accuracy in a sample of 91 2nd and 4th graders (49.6% female). Children were aged 6–10 years old (M = 7.81, SD = 1.10) and were enrolled in five classrooms (A, B, C, D, and E) from three schools in the Portuguese district of Porto, between January and May 2021. A set of 10 texts were chosen from official textbooks to assess reading fluency and accuracy. Classes were evaluated in three moments: initial (pre-intervention), intermediate (after 10 sessions) and final (post-test, after other 10 sessions). In order to examine the effects of the intervention, there was a 8-week lag between the start of the intervention in classes A, B, and C (experimental group) and classes D and E (control group). Moreover, classes D and E started intervention with a gap of 5 weeks between them. Students in the experimental group registered significant higher improvements in reading accuracy and fluency than in the control group. Interaction effects revealed that students with an initial lower performance (i.e., at the frustration level) showed higher increases in reading accuracy. Furthermore, 2nd graders showed higher increases throughout the intervention while the 4th graders stablished their progress after the first 10 sessions of intervention. 
Despite the study’s limitations, the findings support the positive impact that online peer tutoring can have on promoting students’ reading skills, adding to the ongoing discussion—which has gained a special emphasis with the COVID-19 pandemic—about the development of effective strategies to promote reading abilities in the first years of school.
... Although individual intervention studies and descriptive syntheses have furnished information about what works to improve outcomes for young students with dyslexia, there is a need for meta-analytic research examining the effects of reading instruction on reading outcomes for this population specifically. Previous metaanalyses (e.g., Gersten et al., 2020;Neitzel et al., 2022;Slavin et al., 2011;Suggate, 2010;Swanson et al., 1999;Wanzek et al., 2016Wanzek et al., , 2018 report robust findings about effective early reading instruction for elementary-grade students with or at risk for RDs broadly defined (i.e., defined as including a wide range of reading and language difficulties). Specifically, these systematic reviews reveal that multicomponent reading interventions that provide explicit, systematic instruction in foundational skills (going forward, we use the term foundational skills to refer to phonological awareness [PA], phonics knowledge, word reading, spelling, and connected-text reading) and simultaneously focus on meaning (i.e., both word meanings and comprehension of connected text) are associated with significant positive effects. ...
... Specifically, these systematic reviews reveal that multicomponent reading interventions that provide explicit, systematic instruction in foundational skills (going forward, we use the term foundational skills to refer to phonological awareness [PA], phonics knowledge, word reading, spelling, and connected-text reading) and simultaneously focus on meaning (i.e., both word meanings and comprehension of connected text) are associated with significant positive effects. Three of the most recent among these meta-analyses, all of which employed stringent standards for inclusion and sophisticated meta-analytic methods, reported mean intervention effects of 0.23 (Neitzel et al., 2022) and 0.39 (Gersten et al., 2020;Wanzek et al., 2018). ...
... During the last two decades, seven meta-analyses (Donegan & Wanzek, 2021;Gersten et al., 2020;Neitzel et al., 2022;Slavin et al., 2011;Suggate, 2010;Wanzek et al., 2016Wanzek et al., , 2018; see Table 1) have investigated the immediate effects of reading instruction on word reading outcomes for elementary-grade students with or at risk for RDs. One additional research meta-analysis (Galuschka et al., 2020) investigated the immediate effects of spelling instruction on spelling outcomes for students with reading or spelling difficulties. ...
This meta‐analysis included experimental or quasi‐experimental intervention studies conducted between 1980 and 2020 that aimed to improve reading outcomes for Grade K‐5 students with or at risk for dyslexia (i.e., students with or at risk for word reading difficulties, defined as scoring at or below norm‐referenced screening or mean baseline performance thresholds articulated in our inclusion criteria). In all, 53 studies reported in 52 publications met inclusion criteria (m = 351; total student N = 6,053). We employed robust variance estimation to address dependent effect sizes arising from multiple outcomes and comparisons within studies. Results indicated a statistically significant main effect of instruction on norm‐referenced reading outcomes (g = 0.33; p < .001). Because there was significant heterogeneity in effect sizes across studies (p < .01), we used meta‐regression to identify the degree to which student characteristics (i.e., grade level), intervention characteristics (i.e., dosage, instructional components, multisensory nature, instructional group size), reading outcome domain (i.e., phonological awareness, word reading/spelling, passage reading, or reading comprehension), or research methods (i.e., sample size, study design) influenced intervention effects. Dosage and reading outcome domain were the only variables that significantly moderated intervention effects (p = .040 and p = .024, respectively), with higher dosage studies associated with larger effects (b = 0.002) and reading comprehension outcomes associated with smaller effects than word reading/spelling outcomes (b = −0.080). 
This meta‐analysis included experimental or quasi‐experimental intervention studies conducted between 1980 and 2020 that aimed to improve reading outcomes for Grade K‐5 students with or at risk for dyslexia (i.e., students with or at risk for word reading difficulties, defined as scoring at or below norm‐referenced screening or mean baseline performance thresholds articulated in our inclusion criteria).
... It can be difficult to discern whether an EL student's WLRD stem primarily from (a) still-developing English language skills, (b) lack of exposure to instruction, or (c) underlying weaknesses in phonological processing and other foundational skills that support decoding and encoding. When educators attribute ELs' WLRD to gaps in language knowledge alone, they may fail to implement reading interventions that seek to improve foundational skills or knowledge (e.g., phonemic awareness, grapheme-phoneme correspondences) that research has shown to be critical for reading success (e.g., Gersten et al., 2020;Wanzek et al., 2016Wanzek et al., , 2018. Better understanding the effectiveness of early literacy instruction of students who are both EL and have WLRD is important to inform instruction for this population of students. ...
... A considerable research base has identified features of effective instructional interventions for EM students with RD (e.g., Wanzek et al., 2016Wanzek et al., , 2018. Gersten et al. (2020) systematically synthesized 33 rigorous experimental and quasi-experimental reading intervention studies with students with RD in Grades 1-3. The authors reported a mean effect of reading interventions on combined reading outcomes of .39 ...
... Findings from the current study are similar to those that only included EMs-larger effects are generally found for students with RD compared to typically developing readers. Two recent meta-analyses of research examining the effects of extensive reading interventions on reading outcomes for K-3 EMs with RD (Wanzek et al., 2018) and at-risk EM readers in Grades 1-3 (Gersten et al., 2020) found statically significant weighted mean ES on combined reading outcomes of g = 0.39. The larger ES found by Gersten et al. (2020) and Wanzek et al. (2018) may be partly attributable to the narrower grade band in those studies (i.e., K-3) than in the current synthesis (i.e., K-5), as prior work with EMs (Ehri et al., 2007;Suggate, 2010;Wanzek & Vaughn, 2007) and ELs (Ludwig et al., 2019) suggests interventions serving younger readers may have larger effects than those that serve older students. ...
This study meta‐analyzed the last four decades (1980–2020) of reading intervention research focused on improving reading outcomes for English language (EL) students in Grades K–5 with or at risk for word reading difficulties. Experimental and quasi‐experimental group design and single‐case experimental design (SCED) studies were included; 10 group design and 7 SCED studies met inclusion criteria (m = 61; total student N = 2,270). Visual inspection of the effect size distribution revealed that the assumption of between‐study heterogeneity was not supported; therefore, the findings were synthesized for SCED studies separately from those reported in group design studies. Implications for practice, policy, and future research are discussed.
... Despite the ample studies on various reading interventions and their efficacy (e.g., Galuschka et al., 2020; Gersten et al., 2020; Neitzel et al., 2021), deciding on the best-suited reading intervention, or combination of reading interventions, for each grade becomes complicated. For example, according to one meta-analysis, phonics instruction was the most thoroughly investigated intervention and the only one that had a significant effect on the reading and spelling performance of individuals with reading disabilities. ...
Reading skills are among the most important basic skills in society. However, not all readers are able to adequately understand texts or decode individual words. Findings from the Progress in International Reading Literacy Study (PIRLS; German: IGLU) show that about one fifth of fourth graders can only establish coherence at the local level, and in some cases they have only a rudimentary understanding of the text they read (Bremerich-Vos et al., 2017). In addition, these reading deficits persist and have a negative impact on academic and professional success (Jimerson, 1999). Therefore, identifying the causes of these deficits and creating opportunities for interventions at an early stage is an important research objective. The aim of this dissertation was to examine the relationship between the aspects of reading fluency and their influence on reading comprehension. Despite the increasing scientific interest in reading fluency in recent years, a research gap remains concerning the relationship between word-recognition accuracy and speed, and concerning the relevance of prosodic patterns for reading comprehension. Study 1 investigated whether German fourth graders (N = 826) were required to reach a certain word-recognition accuracy threshold before their word-recognition speed improved. In addition, a sub-sample (n = 170) with a pre-/posttest design was examined to assess the extent to which existing word-recognition accuracy can influence the effects of a syllable-based reading intervention on word-recognition accuracy and word-recognition speed. Results showed that word-recognition speed improved after children achieved a word-recognition accuracy of 71%. A positive intervention effect was also found on word-recognition accuracy for children who were below the 71% threshold before the intervention, whereas the intervention effect on word-recognition speed was positive for all children.
However, a positive effect on reading comprehension was found only for children who were above the 71% threshold before the intervention. Study 2 investigated the relationship between the word-recognition accuracy threshold and word-recognition speed shown in the first study in a longitudinal design with German students (N = 1,095). Word-recognition accuracy and speed were assessed from the end of Grade 1 to Grade 4, whereas reading comprehension was assessed from the end of Grade 2 to Grade 4. The results showed that the developmental trajectories of word-recognition speed and reading comprehension were steeper in children who reached the word-recognition accuracy threshold by the end of first grade than in children who reached this threshold later or not at all. In Study 3, recurrence quantification analysis (RQA) was used to extract prosodic patterns from reading recordings of struggling and skilled readers in the second (n = 67) and fourth grade (n = 69) and to classify readers as struggling or skilled. In addition, the classification based on the prosodic patterns from the recurrence quantification analysis was compared with the classification based on prosodic features from the manual transcription of the reading recordings. The results showed that second-grade struggling readers have lengthier pauses within or between words and take more time between pauses on average, whereas fourth-grade struggling readers spend more time between recurring stresses and show multiple diverse patterns in pitch and more recurring accents. Although the RQA-based model had good fit and provided additional information about the relationship of prosody with reading comprehension, the model using prosodic features from transcription had a better fit. In summary, the three studies in this dissertation provide four important insights into reading fluency in German. First, a threshold in word-recognition accuracy must be reached before word-recognition speed improves.
Second, the earlier this accuracy level is reached, the greater the gain in word-recognition speed and reading comprehension. Third, the intervention effects of a primary school reading intervention are influenced by the accuracy level. Fourth, although incorrect pauses within or between words play an important role in identifying and describing struggling readers in second grade, the importance of prosodic patterns increases in fourth grade.
... In contrast to our hypothesis, the overall effect of reading interventions for children with ADHD was generally greater than the overall effect found for reading interventions in studies that did not specifically recruit children with ADHD (e.g., g = 0.21-0.95; Gersten et al., 2020; Hall & Burns, 2018; Scammacca et al., 2007, 2015; Swanson, 1999), and instead more comparable to the large-magnitude benefits of reading interventions previously reported in meta-analyses of children with behavioral/emotional difficulties more generally (g = 0.90-1.02; Benner et al., 2010; Roberts et al., 2020). ...
Objective Utilizing a multi-level meta-analytic approach, this review is the first to systematically quantify the efficacy of reading interventions for school-aged children with ADHD and identify potential factors that may increase the success of reading-related interventions for these children. Method 18 studies (15 peer-reviewed articles, 3 dissertations) published from 1986 to 2020 ( N = 564) were meta-analyzed. Results Findings revealed reading interventions are highly effective for improving reading skills based on both study-developed/curriculum-based measures ( g = 1.91) and standardized/norm-referenced achievement tests ( g = 1.11) in high-quality studies of children with rigorously-diagnosed ADHD. Reading interventions that include at least 30 hours of intervention targeting decoding/phonemic awareness meet all benchmarks to be considered a Level 1 (Well-Established) Evidence-Based Practice with Strong Research Support for children with ADHD based on clinical and special education criteria. Conclusions Our findings collectively indicate that reading interventions should be the first-line treatment for reading difficulties among at-risk readers with ADHD.
... Correlational studies based on international comparisons (PISA, TIMSS) (Cairns & Areepattamannil, 2017; Chi et al., 2018; Grabau & Ma, 2017) observe a negative correlation between the use of inquiry-based pedagogy and achievement in science. Finally, research and meta-analyses in general pedagogy present evidence that points toward favoring direct and explicit instruction (Chall, 2000; Gersten et al., 2009; Gersten et al., 2020; Kaldenberg, Watt, & Therrien, 2015; Rosenshine, 2009; Stevens, Rodgers, & Powell, 2018; Stockard et al., 2018; Watkins, 1997) or, at the very least, toward greatly limiting the use of constructivist pedagogies. ...
The progress of medicine over the past two centuries stems essentially from its ever-deeper anchoring in scientific research, reflected in a reverence for rigor, rationality, the scientific method, and evidence. At the beginning of the 20th century, the Flexner report (1910) accelerated this decisive momentum. At the heart of this history lies the content of learning, but also the pedagogical framework of the training designed to teach that content. In the 21st century, the content of medical training is clearly on the side of evidence and rigor. But what about the pedagogical framework of medical training and, in the case of interest here, of medical residency? In 2009, the University of Toronto experimented in its medical residency with a pedagogical approach developed by the Royal College of Physicians and Surgeons of Canada (CRMCC), named competence by design (CPC). CPC follows directly in the line of competency-based pedagogy (PADC), a pedagogical movement whose birth in the United States reportedly followed the launch of the Soviet Sputnik in 1957. The PADC movement is found today in general pedagogy (primary, secondary, postsecondary), in vocational pedagogy, and in medical pedagogy. On what theoretical framework does the CRMCC's PADC (CPC) rest? Has this framework been scientifically validated, in whole or in part? This text argues that the CRMCC's PADC did not follow an exemplary rigorous process and is not supported by evidence from scientific research, either in general pedagogy or in medical pedagogy.
More precisely, we argue that the Canadian and Quebec medical world, in adopting the CRMCC's PADC, has moved away from the evidence, the necessary rigor, and the conscientious prudence it usually shows toward medical innovations. Along the way, we also briefly examine the effectiveness of pedagogical methods that sometimes co-occur with PADC but can also offer an at least partial alternative to traditional residency and to PADC as applied to residency (e.g., technology-assisted simulation, standardized simulated patients, deliberate practice, mastery learning, and competency-based progression).
... 32 Cairns & Areepattamannil, 2017; Chi et al., 2018; Grabau & Ma, 2017. 33 Chall, 2000; Gersten et al., 2009; Gersten et al., 2020; Kaldenberg, Watt, & Therrien, 2015; Rosenshine, 2009; Stevens, Rodgers, & Powell, 2018; Stockard et al., 2018; Watkins, 1997. ISBN 978-2-923805-69-6. Abridged version of: . ...
The progress of medicine over the past two centuries stems essentially from its ever-deeper anchoring in scientific research, reflected in a reverence for rigor, rationality, the scientific method, and evidence. At the heart of this history lies the content of learning, but also the pedagogical framework of the training designed to teach that content. In the 21st century, the content of medical training is clearly on the side of evidence and rigor. But what about the pedagogical framework of medical training and, in the case of interest here, of medical residency? In 2009, the University of Toronto experimented in its medical residency with a pedagogical approach developed by the Royal College of Physicians and Surgeons of Canada (CRMCC), named competence by design (CPC). CPC follows directly in the line of competency-based pedagogy (PADC). This movement is found today in general pedagogy (primary, secondary, postsecondary), in vocational pedagogy, and in medical pedagogy. On what theoretical framework does the CRMCC's PADC (CPC) rest? Has this framework been scientifically validated, in whole or in part? This text argues that the Canadian and Quebec medical world, in adopting the CRMCC's PADC, has moved away from the evidence, the necessary rigor, and the conscientious prudence it usually shows toward medical innovations.
... Electronic searches were carried out through educational databases (e.g., ERIC, EBSCO, JSTOR, PsycINFO, ScienceDirect, Scopus, Dissertation Abstracts, ProQuest, WorldCat, CNKI), web-based repositories (e.g., Google, Google Scholar), and gray literature databases (e.g., OpenGrey, OpenDOAR). The key words for the search included 'formative assessment,' 'formative evaluation,' 'feedback,' 'assessment for learning,' 'assessment as learning,' 'curriculum-based assessment,' 'differentiated instruction,' 'portfolio assessment,' 'performance assessment,' 'process assessment,' 'progress monitoring,' and 'response to intervention' (Gersten et al., 2020), as well as the subset forms under the formative assessment umbrella suggested by Klute et al. (2017) (e.g., self-monitoring, self-assessment, self-direct, peer assessment). (3) Relevant contextualized assessments. ...
This quantitative synthesis included 48 qualified studies with a total sample of 116,051 K-12 students. Aligned with previous meta-analyses, the findings suggested that formative assessment generally had a positive though modest effect (ES = +0.19) on students' reading achievement. Meta-regression results revealed that: (a) studies with 250 or fewer students yielded significantly larger effect sizes than large-sample studies, (b) the effects of formative assessment embedded with differentiated instruction equated to an increase of 0.13 SD in the reading achievement score, and (c) integration of teacher- and student-directed assessment was more effective than assessments initiated by teachers. Our subgroup analysis indicated that the effect sizes of formative assessment interventions on reading were significantly different between Confucian-heritage and Anglophone cultures and had divergent effective features. The result cautions against generalizing formative assessment across different cultures without adaptation. We suggest that effect sizes be calculated and intervention features be investigated in various cultural settings so that practitioners and policymakers can implement tailored formative assessment.
This convergent parallel mixed-methods pilot study explored the collaboration of preservice teachers (PSTs) in a university reading clinic. PSTs from a reading course and special education course were paired and shared responsibility for tutoring one child. Tutor surveys and focus group interview transcripts were used as data sources. Topics addressed by tutors related to benefits and barriers of collaboration, including the influence of collaborative relationships on their personal growth and on the growth of their tutee, strategies for establishing relationships and trust, and the ways they perceive collaboration as practice for future teaching. This study has implications for how teacher preparation program faculty prepare teachers to work alongside colleagues of various disciplines in future school settings.
One challenge in understanding “what works” in education is that effect sizes may not be comparable across studies, raising questions for practitioners and policymakers using research to select interventions. One factor that consistently relates to the magnitude of effect sizes is the type of outcome measure. This article uses study data from the What Works Clearinghouse to determine average effect sizes by outcome measure type. Outcome measures were categorized by whether the group who developed the measure potentially had a stake in the intervention (non-independent) or not (independent). Using meta-analysis and controlling for study quality and intervention characteristics, we find larger average effect sizes for non-independent measures than for independent measures. Results suggest that larger effect sizes for non-independent measures are not due to differences in implementation fidelity, study quality, or intervention or sample characteristics. Instead, non-independent and independent measures appear to represent partially but minimally overlapping latent constructs. Findings call into question whether policymakers and practitioners should make decisions based on non-independent measures when they are ultimately responsible for improving outcomes on independent measures.
The report of the national response to intervention (RTI) evaluation study, conducted during 2011–2012, was released in November 2015. Anyone who has read the lengthy report can attest to its complexity and the design used in the study. Both these factors can influence the interpretation of the results from this evaluation. In this commentary, we (a) explain what the national RTI evaluation examined and highlight the strengths and weaknesses of the design, (b) clarify the results of the evaluation and highlight some key implementation issues, (c) describe how rigorous efficacy trials on reading interventions can supplement several issues left unanswered by the national evaluation, and (d) discuss implications for future research and practice based on the findings of the national evaluation and reading intervention research.
CPRE released its evaluation of one of the most ambitious and well-documented expansions of a U.S. instructional curriculum. The rigorous independent evaluation of the Investing in Innovation (i3) scale-up of Reading Recovery, a literacy intervention for struggling first graders, was a collaboration between CPRE and the Center for Research on Education and Social Policy (CRESP) at the University of Delaware. The CPRE/CRESP evaluation revealed that students who participated in Reading Recovery significantly outperformed students in the control group on measures of overall reading, reading comprehension, and decoding. These effects were similarly large for English language learners and students attending rural schools, which were the student subgroups of priority interest for the i3 scale-up grant program. The study included an in-depth analysis of program implementation. Key findings focus on the contextual factors of the school and teachers that support the program’s success and the components of instructional strength in Reading Recovery.
The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses — the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
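The sequential procedure this abstract describes (Benjamini-Hochberg) is simple to implement: sort the m p-values, find the largest rank k with p_(k) ≤ (k/m)·q, and reject the k hypotheses with the smallest p-values. A minimal Python sketch with made-up p-values:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the corresponding hypothesis
    is rejected while controlling the false discovery rate at level q.
    """
    m = len(pvals)
    # Sort p-values ascending, remembering original positions
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k such that p_(k) <= (k / m) * q
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
print(benjamini_hochberg(pvals, q=0.05))  # → [True, True, False, False, False]
```

Note the step-up character: the decision for each p-value depends on the largest rank that clears its threshold, not on each comparison in isolation, which is where the power gain over Bonferroni comes from.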
This study leverages advances in multivariate cross-classified random effects models to extend the Simple View of Reading to account for variation within readers and across texts, allowing for both the personalization of the reading function and the integration of the component skills and text and discourse frameworks for reading research. We illustrate the Complete View of Reading (CVRi) using data from an intensive longitudinal design study with a large sample of typical (N = 648) and struggling readers (N = 865) in middle school and using oral reading fluency as a proxy for comprehension. To illustrate the utility of the CVRi, we present a model with cross-classified random intercepts for students and passages and random slopes for growth, Lexile difficulty, and expository text type at the student level. We highlight differences between typical and struggling readers and differences across students in different grades. The model illustrates that readers develop differently and approach the reading task differently, showing differential impact of text features on their fluency. To be complete, a model of reading must be able to reflect this heterogeneity at the person and passage level, and the CVRi is a step in that direction. Implications for reading interventions and 21st century reading research in the era of “Big Data” and interest in phenotypic characterization are discussed.
Many students at risk for or identified with reading disabilities need intensive reading interventions. This meta-analysis provides an update to the Wanzek and Vaughn synthesis on intensive early reading interventions. Effects from 25 reading intervention studies are analyzed to examine the overall effect of intensive early reading interventions as well as relationships between intervention and student characteristics related to outcomes. The weighted mean effect size estimate (ES = 0.39) and the mean effect size adjusted for publication bias (ES = 0.28), both significantly different from zero, suggested that intensive early reading interventions resulted in positive outcomes for early struggling readers in kindergarten through third grade. There was no statistically significant or meaningful heterogeneity in the study-wise effect sizes. Exploratory examinations of time in intervention, instructional group size, initial reading achievement, and date of publication are provided.
This randomized controlled trial in 55 low-performing schools across Florida compared 2 early literacy interventions—1 using stand-alone materials and 1 using materials embedded in the existing core reading/language arts program. A total of 3,447 students who were below the 30th percentile in vocabulary and reading-related skills participated in the study. Both interventions were implemented with fidelity for 45 minutes daily for 27 weeks in small groups of 4 students (or 5 in grade 2). The stand-alone intervention significantly improved grade 2 spelling outcomes relative to the embedded intervention; there were some differential impacts due to cohort and baseline and, in kindergarten, to English-learner status. On average, students in schools in both interventions showed similar improvement in reading and language outcomes and similar percentile gains to those in recent systematic reviews. Results are discussed with respect to alignment of Tier 2 instruction with Tier 1 instruction.
In 2010, the Institute of Education Sciences commissioned a much-needed national evaluation of response to intervention (RTI). The evaluators defined their task very narrowly, asking “Does the use of universal screening, including a cut-point for designating students for more intensive Tier 2 and Tier 3 interventions, increase children’s performance on a comprehensive reading measure?” Their regression-discontinuity analysis showed that first-grade children designated for (but not necessarily receiving) more intensive intervention in the 146 study schools performed significantly worse than children not designated for it. There were no reliable differences between designated and nondesignated students in Grades 2 or 3. The provocativeness of these findings notwithstanding, the evaluation’s focus and design weakens its importance. RTI implementation data were also collected in the 146 study schools. These data suggest many of them were not conducting RTI in a manner supported by research and policy. Such findings and others’ evaluations of RTI advance the idea that simpler frameworks may encourage more educators to implement RTI’s most important components with fidelity.
Dependent effect sizes are ubiquitous in meta-analysis. Using Monte Carlo simulation, we compared the performance of two methods for meta-regression with dependent effect sizes, robust variance estimation (RVE) and 3-level modeling, with the standard meta-analytic method for independent effect sizes. We further compared bias-reduced linearization and jackknife estimators as small-sample adjustments for RVE and Wald-type and likelihood ratio tests for 3-level models. The bias in the slope estimates, width of the confidence intervals around those estimates, and empirical type I error and statistical power rates of the hypothesis tests from these different methods were compared for mixed-effects meta-regression analysis with one moderator either at the study or at the effect size level. All methods yielded nearly unbiased slope estimates under most scenarios, but as expected, the standard method ignoring dependency provided inflated type I error rates when testing the significance of the moderators. Robust variance estimation methods yielded not only the best results in terms of type I error rate but also the widest confidence intervals and the lowest power rates, especially when using the jackknife adjustments. Three-level models showed a promising performance with a moderate to large number of studies, especially with the likelihood ratio test, and yielded narrower confidence intervals around the slope and higher power rates than those obtained with the RVE approach. All methods performed better when the moderator was at the effect size level, the number of studies was moderate to large, and the between-studies variance was small. Our results can help meta-analysts deal with dependency in their data.
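For an intercept-only model (estimating the overall mean effect, as in the meta-analysis this page describes), RVE reduces to a weighted mean with a cluster-robust (sandwich) standard error: within-study residuals are summed before squaring, so dependence among a study's effect sizes inflates the variance estimate rather than being ignored. A deliberately simplified Python sketch, not the full Hedges, Tipton, and Johnson (2010) estimator (study-level weights 1/(k·mean variance), between-study variance ignored, all data made up):

```python
def rve_mean_effect(studies):
    """Overall mean effect with a cluster-robust (RVE-style) standard error.

    `studies` is a list of (effect_sizes, variances) pairs, one per study;
    effect sizes within a study may be dependent. Each effect in study j
    gets the simplified correlated-effects weight w_j = 1 / (k_j * mean(v_j)).
    """
    weights = []
    for es, vs in studies:
        k = len(es)
        w = 1.0 / (k * (sum(vs) / k))
        weights.append([w] * k)

    # Weighted mean of all effect sizes
    w_total = sum(w for ws in weights for w in ws)
    beta = sum(w * t for (es, _), ws in zip(studies, weights)
               for t, w in zip(es, ws)) / w_total

    # Sandwich variance: square the *within-study sums* of weighted residuals,
    # so correlated effects within a study are not treated as independent
    v_r = sum(sum(w * (t - beta) for t, w in zip(es, ws)) ** 2
              for (es, _), ws in zip(studies, weights)) / w_total ** 2
    return beta, v_r ** 0.5

# Hypothetical data: three studies, multiple (dependent) outcomes each
studies = [
    ([0.45, 0.35], [0.02, 0.02]),
    ([0.30], [0.03]),
    ([0.50, 0.40, 0.42], [0.04, 0.04, 0.04]),
]
beta, se = rve_mean_effect(studies)
print(round(beta, 3))  # → 0.378
```

Production analyses (e.g., the `robumeta` R package) add a tau-squared estimate, an assumed within-study correlation, and small-sample corrections; this sketch only illustrates the clustering idea.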
This study reports the results of a cluster RCT evaluating the impact of Enhanced Core Reading Instruction on reading achievement of grade 1 at-risk readers. Forty-four elementary schools, blocked by district, were randomly assigned to condition. In both conditions, at-risk readers received 90 minutes of whole-group instruction (Tier 1) plus an additional 30 minutes of daily, small-group intervention (Tier 2). In the treatment condition, Tier 1 instruction included enhancements to the core program and Tier 2 intervention was highly aligned with the core program. In the comparison condition, Tier 1 instruction used the same core program as treatment schools in the district and Tier 2 intervention followed standard district protocol. Significant treatment effects were found on measures of phonemic decoding and oral reading fluency from fall to winter and word reading from fall to spring. Student- and classroom-level variables predicted student response to instruction differentially by condition.
This study evaluates the impacts and costs of the Reading Partners program, which uses community volunteers to provide one-on-one tutoring to struggling readers in under-resourced elementary schools. The evaluation uses an experimental design. Students were randomly assigned within 19 different Reading Partners sites to a program or control condition to answer questions about the impact of the program on student reading proficiency. A cost study, using a subsample of six of the 19 study sites, explores the resources needed to implement the Reading Partners program as described in the evaluation. Findings indicate that the Reading Partners program has a positive and statistically significant impact on all three measures of reading proficiency assessed with an effect size equal to around 0.10. The cost study findings illustrate the potential value of the Reading Partners program from the schools' perspective because the financial and other resources required by the schools to implement the program are low. Additionally, the study serves as an example of how evaluations can rigorously examine both the impacts and costs of a program to provide evidence regarding effectiveness.