Journal of Research on Educational Effectiveness
ISSN: 1934-5747 (Print) 1934-5739 (Online) Journal homepage: https://www.tandfonline.com/loi/uree20
Meta-Analysis of the Impact of Reading
Interventions for Students in the Primary Grades
Russell Gersten, Kelly Haymond, Rebecca Newman-Gonchar, Joseph Dimino
& Madhavi Jayanthi
To cite this article: Russell Gersten, Kelly Haymond, Rebecca Newman-Gonchar, Joseph Dimino & Madhavi Jayanthi (2020) Meta-Analysis of the Impact of Reading Interventions for Students in the Primary Grades, Journal of Research on Educational Effectiveness, 13:2, 401-427.
To link to this article: https://doi.org/10.1080/19345747.2019.1689591
Published online: 09 Jan 2020.
THEORY, CONTEXT, AND MECHANISMS
Meta-Analysis of the Impact of Reading Interventions for Students in the Primary Grades
Russell Gersten, Kelly Haymond, Rebecca Newman-Gonchar, Joseph Dimino, and Madhavi Jayanthi
This meta-analysis systematically reviewed the most up-to-date literature to determine the effectiveness of reading interventions on measures of word and pseudoword reading, reading comprehension, and passage fluency, and to determine the role intervention and study variables play in moderating the impacts for students at risk for reading difficulties in Grades 1–3. We used random-effects meta-regression models with robust variance estimates to summarize overall effects and to explore potential moderator effects. Results from a total of 33 rigorous experimental and quasi-experimental studies conducted between 2002 and 2017 that met WWC evidence standards revealed a significant positive effect for reading interventions on reading outcomes, with a mean effect size of 0.39 (SE = .04, p < .001, 95% CI [0.32, 0.46]). Moderator analyses demonstrated that mean effects varied across outcome domains and areas of instruction.
ARTICLE HISTORY: Received 29 January 2019; Revised 22 October 2019; Accepted 31 October
KEYWORDS: Reading; response to intervention; multi-tiered system of support; Tier 2
Multi-tiered systems of support (MTSS), also referred to as Response to Intervention (RtI), have become routine in American elementary schools, especially in the area of literacy/reading. In 2010–2011, for example, full implementation of MTSS in Grade 1 reading occurred in 71 percent of schools from a demographically representative sample (Balu et al., 2015). The massive scale-up of MTSS was fueled by major pieces of federal legislation, such as the Reading First portion of the No Child Left Behind Act (NCLB, 2002), the Individuals with Disabilities Education Act (IDEA, 2004), and the Every Student Succeeds Act (ESSA, 2015). ESSA explicitly called for an emphasis on evidence-based interventions with “strong” and “moderate” levels of evidence, based on the What Works Clearinghouse (WWC) standards (U.S. Department of Education [U.S. ED], Institute of Education Sciences [IES], & What Works Clearinghouse, 2013).
With the rapid and widespread implementation of MTSS in reading across schools, a
national study was undertaken to examine the impact of high-quality MTSS in reading
(identified as high-quality by experts and onsite evaluation teams) on the performance
of students in primary grades (Balu et al., 2015). To the surprise of many in the reading
research community, the evaluation found statistically significant negative effects on
Grade 1 reading performance and non-significant impacts on Grades 2 and 3 reading
performance in 146 elementary schools in 13 states, using a regression discontinuity design (Imbens & Lemieux, 2008).

© 2019 Taylor & Francis Group, LLC
CONTACT Kelly Haymond email@example.com Instructional Research Group, 4281 Katella Avenue, Suite 205, Los Alamitos, California 90720, USA.
Some in the reading research community raised concerns about the study design, the use of a regression design to answer questions about effectiveness, how impacts were combined across districts that use different curricula, and the limited monitoring of fidelity of implementation (e.g., Fuchs & Fuchs, 2017; Gersten, Jayanthi, & Dimino, 2017). Yet, the findings are hard to ignore. They raise questions about the effectiveness of interventions in authentic settings, specifically: “How could interventions based on principles from scientific research or those that were actually shown to be evidence-based according to current standards be ineffectual or even slightly negative in practice?”
There are at least three plausible reasons for the finding. The first is that the evidence base on the effectiveness of beginning reading interventions may not be as robust or consistent as previously believed. The second is that there is a body of rigorously conducted research documenting the effectiveness of beginning reading interventions and the instructional practices used in the research, but these interventions were not implemented with fidelity in practice. The third is the types of outcome measures used: measure types have been found to moderate the size of the impacts (Elleman, Lindo, Morphy, & Compton, 2009). Students in the national evaluation were assessed on comprehensive measures of reading performance, as opposed to measures that assess discrete reading proficiencies, which are often well aligned with the interventions.
Rather than listing the names of interventions with rigorous research support, as
done for example on the WWC website, we thought it more important to conduct a
careful examination and rigorous meta-analysis of the body of contemporary research
on reading interventions relevant to MTSS. Doing so would allow us to articulate the instructional practices and principles that underlie the interventions found to be effective, and to delineate the outcomes and the grade levels for which there is strong evidence to support intervention. It also seemed important to examine factors such as the type of interventionist and the level of support and monitoring provided to the interventionists.

First, we briefly review prior meta-analyses and literature reviews on this topic.
A Brief Summary of Meta-Analyses and Literature Reviews on Reading Interventions
Six syntheses of research (either literature reviews or meta-analyses) address the topic of
reading interventions in the primary grades, albeit from different perspectives. Three of
these do not examine the impacts of reading interventions: Al Otaiba and Fuchs (2002),
Stuebing et al. (2015), and Tran, Sanchez, Arellano, and Swanson (2011) explored the relationship between students’ prior reading abilities and other learner characteristics and their responsiveness to reading intervention.
Slavin, Lake, Davis, and Madden (2011) examined 70 studies of programs geared
toward providing support to struggling readers in elementary school (K-5), including
not only the small-group interventions (20 studies) and one-on-one interventions (20 studies) typical for Tier 2 of MTSS, but also whole-class instructional practices (16 studies) and computer-based instruction (14 studies) for at-risk readers. The studies in the review were not evaluated for quality, though the authors did limit their review to studies that included randomization or matching to form a comparison group, a far lower standard than the WWC standards applied in the present review. Studies were only
included if the program lasted at least 12 weeks. Effects ranged from 0.09 for computer-
based interventions to 0.56 for whole-class interventions. One-on-one and small group
interventions resulted in effects of 0.39 and 0.31, respectively, suggesting that both
approaches show evidence of promise for Tier 2 intervention.
Wanzek and colleagues conducted two meta-analyses of reading intervention studies involving students in kindergarten through Grade 3. The first (Wanzek et al., 2016) examined studies of shorter interventions (lasting fewer than 100 sessions); the second (Wanzek et al., 2018) included only longer interventions (lasting more than 100 sessions). The methodology was similar, incorporating RCTs and QEDs only, but not requiring that studies meet rigorous WWC standards. The first set included 72 studies, published between 1995 and 2013. The second, more recent meta-analysis of longer interventions located only 25 studies, published from 1995 to 2015, using similar methodology.
For the first meta-analysis of shorter interventions, the authors classified outcomes
into one of two domains developed to correspond with the simple view of reading
(Francis, Kulesz, & Benoit, 2018). The first addressed the broad area of decoding,
including pre-reading skills (phonological awareness, rhyming, letter identification), as
well as measures of decoding of phonetically regular words and pseudowords, and word
reading and fluency. The second domain was called multicomponent and was essentially
a composite of both listening comprehension and reading comprehension, as well as
oral vocabulary and reading vocabulary. Using a random-effects model, the authors found positive mean effects ranging from 0.54 to 0.62 on the composite outcomes of foundational reading and reading-related skills and 0.36 to 1.02 on language and comprehension measures. There was no evidence that group size, intervention type, grade level, or interventionist were related to the magnitude of impacts.
Findings from the 2018 meta-analysis of longer interventions produced a mean effect size of 0.28 when corrected for publication bias. The effects were found to be homogeneous, precluding the use of moderator analysis or meta-regression.
Rationale for the Present Meta-Analysis of Tier 2 Reading Interventions in Grades 1–3
We decided to conduct a new meta-analysis for several reasons. One reason is that we wanted to use robust variance estimation (RVE; Hedges, Tipton, & Johnson, 2010), a more contemporary approach that addresses dependent effect sizes arising from multiple outcomes and comparisons within studies. RVE allows researchers to model all dependencies statistically, whereas traditional meta-regression approaches address dependencies by selecting specific comparisons, selecting a single measure, or aggregating all measures by computing an average effect. Of the six related syntheses we found, only the most recent meta-analysis by Wanzek and colleagues (2018) used RVE, but that study focused on longer reading interventions. The current study used RVE on a set of studies that have not been previously examined with this type of analysis.
A second reason for conducting a new meta-analysis is to limit the grade levels to those in which students are expected to begin reading. Typically, kindergarten intervention studies include very few, if any, reading measures. Instead, they often include measures of pre-reading or reading-related skills such as listening comprehension, rhyming, and phonemic awareness. The three previous reviews (Slavin et al., 2011; Wanzek et al., 2016, 2018) included kindergarten studies and measures of pre-reading skills (i.e., phonological awareness, listening comprehension). As our goal was to determine whether students receiving the intervention progressed beyond the pre-reading stage and truly learned to read, we only included studies from Grades 1, 2, and 3, reflecting the grades included in the national RtI evaluation study (Balu et al., 2015). We also limited the outcomes to include only measures of reading performance (word reading, passage fluency, reading comprehension), in line with the ESSA standards for evaluating study outcomes in the primary grades (Center for Research and Reform in Education, Johns Hopkins University, 2019) and the framework used for assessing student reading performance in the Reading First national evaluation (Gamse, Jacob, Horst, Boulay, & Unlu, 2008). (Reading First used reading comprehension and decoding to assess the reading performance of struggling students in Grades K–2.) We did not include studies of interventions above Grade 3 because the interventions in Grades 4 and 5, for example, are very different from those in Grades 1–3: They focus more on comprehension, vocabulary development, and fluency building, and less on decoding.
Finally, given the focus of ESSA (2015) on using interventions with “moderate” to “strong” levels of evidence from studies that have met WWC standards (Version 3.0; U.S. ED et al., 2013) for high-quality causal studies, we wanted to conduct a formal review of the studies using WWC standards and include in the meta-analysis only those studies that met those standards. The current set of studies is thus a more focused set that has been screened for the rigor of the designs and the trustworthiness of the findings.
Purpose of the Present Meta-Analysis
The purpose of this meta-analysis is to synthesize rigorously conducted randomized
controlled trials and quasi-experimental studies on reading interventions for students
who are at risk for reading difficulty in Grades 1–3. The research questions guiding this meta-analysis were:
1. Overall, how effective are reading interventions that are designed to improve the reading outcomes (i.e., reading of words and pseudowords, passage reading fluency, and reading comprehension) of Grades 1–3 students who are considered at risk for reading difficulties?
2. Do study characteristics (i.e., nature of comparison, design, grade level, risk status, and outcome domain/type) or intervention characteristics (i.e., group size, interventionist, average hours per week of intervention, whether the intervention was scripted, areas of instruction within the interventions, and support provided to interventionists) moderate the effect of reading interventions on reading outcomes?
Literature Search and Selection of Relevant Studies for the Meta-Analysis
The goal of the search was to locate all studies published from January 2002 to March
2017 focused on reading interventions for students in Grades 1–3. The literature search
began with a keyword search of the following databases: Academic Search Premier,
Campbell Collaboration, Educator’s Reference Complete, ERIC, PsycINFO, Social
Sciences Citation Index, and WorldCat. The following keywords were used: reading, literacy, fluency, decoding, vocabulary, comprehension, reading ability, reading proficiency, reading achievement, response to intervention and instruction, reading intervention, RtI, response to intervention, response to instruction, Tier 2 intervention, tutoring, small-group instruction, one-on-one instruction, intensive intervention, at-risk students, at-risk, continued risk, non-responders, responders, reading difficulties, reading disabilities, and struggling readers. In addition, we examined all WWC intervention reports in beginning reading and two relevant WWC Practice Guides, Assisting Students Struggling with Reading and Improving Reading Comprehension in Kindergarten Through 3rd Grade. We performed a version of hand-searching known as snowballing, checking the reference lists of research syntheses on the topic. Finally, we solicited recommendations from key researchers in the field on studies likely to be eligible. Toward the end of the search and review process, we examined any studies not previously located but included in the foundational reading practice guide (Foorman et al., 2016) and other meta-analyses and research syntheses (e.g., Wanzek & Vaughn, 2007; Wanzek et al., 2016).
The search resulted in the identification of 2,423 publications. All studies were
screened for eligibility based on the title, keywords, and abstracts. The studies were then
examined to determine whether they met the following inclusion criteria:
(a) Location. To be eligible, studies had to take place in the United States.
(b) Publication date. We limited the search to studies published between 2002 and
2017. The 2002 start date was chosen because it marks a transition in how teachers
approached reading interventions. Beginning circa 2002, initiatives in states such as Texas and California (and numerous others) were reinforced by the Reading First program’s (NCLB, 2002) emphasis on small-group preventative reading interventions based on early screening in the primary grades. Research after this date focused more on the effectiveness of these preventative interventions. Therefore, studies published after 2002 seemed most relevant to the research questions.
(c) Reading intervention. The study had to focus on the effectiveness of a reading
intervention: that is, preventative instructional practices and activities designed to help
students who are considered at risk for reading difficulties (e.g., Gersten, Compton,
et al., 2009). The interventions had to be at least 8 h in duration and could be provided
to small groups of students or individually to one student. We did not exclude any studies based on the size of the small groups. The interventions could be conducted at school, either during school or after school, or at non-school clinics. The intervention could be conducted during the school year or during summer break. They could be delivered by teachers, researchers, tutors, volunteers, parents, or paraprofessionals, provided they followed a specific intervention program or a clearly outlined approach. (This procedure is documented in Gersten et al., 2017.)
Although we included interventions that taught phonological awareness, we did not
include interventions that focused solely on phonological awareness without providing
any instruction on reading words and/or pseudowords. We also did not include
whole-class (Tier 1) interventions (even if it was noted that the entire class or school
was considered at risk for reading failure) or intensive Tier 3 interventions that were
meant to meet the individual needs of students who failed to benefit from evidence-
based interventions (e.g., Fuchs, Fuchs, et al., 2008; Gersten, Compton, et al., 2009). In other words, studies that selected only students who were nonresponders to a Tier 1 or Tier 2 intervention were excluded. Denton et al. (2013), for example, examined the impact of an intervention for students who had failed to show progress in both Tier 1 and Tier 2 interventions and was therefore excluded from the meta-analysis. Finally, we excluded
interventions that were delivered only at home, conducted in a language other than
English, or included only a professional development component for teachers on the
topic and lacked a specific intervention or intervention approach.
(d) Study design. Only RCTs and QEDs were included.
(e) Sample. The participants had to be students in Grades 1–3 who were considered
at risk for reading difficulties. To be considered at risk, students had to have (a) a score
on a valid screener or screening battery indicating that the student was likely to be at
risk for possible reading failure at the end of the school year or (b) a score on a norm-
referenced standardized test (such as Woodcock Reading Mastery) indicating that the
student performed below the 40th percentile at the beginning of the school year or at
the end of the previous school year. If a study sample included students from grades
that were outside the scope of the review (e.g., Grades K, 4, or 5), then the study had to
meet one of the following criteria: (a) the study findings disaggregated the results of stu-
dents in eligible grades or (b) students in eligible grades represented over 50 percent of
the aggregated mixed-age sample.
(f) Outcomes. Studies had to include outcome measures of reading proficiencies and
skills (i.e., word reading, passage fluency, reading comprehension, or overall reading
achievement). Studies that only included measures of pre-reading skills such as phon-
emic awareness, rhyming, and oral comprehension were excluded.
Of the 2,423 publications that were examined, 54 met the initial criteria for inclusion.
See Figure 1 for a pictorial representation of the screening process.
Coding of Studies
The 54 publications that met initial inclusion criteria were coded in three phases. In Phase 1, publications were coded for quality of research design. In Phase 2, publications were coded to identify study characteristics and intervention characteristics. Finally, in Phase 3, publications were coded to explore the areas of reading covered in the interventions.
Phase 1 Coding: Quality of Research Design
In the first phase, two members of the research team (who are certified WWC reviewers) independently examined each publication for the strength and quality of the study design, using the WWC Procedures and Standards Handbook (Version 3.0; U.S. ED et al., 2013).
Only studies that met WWC standards (with or without reservations) were included in
Phases 2 and 3.
Several publications we reviewed included more than one study (e.g., Denton,
Fletcher, Taylor, Barth, & Vaughn, 2014; Lane, Pullen, Hudson, & Konold, 2009).
For the purposes of this project, we defined a study as any comparison with a
unique treatment group compared to a unique, business-as-usual control condition.
Studies comparing the impacts of two researcher-controlled interventions, as well as
comparisons of variations in treatments, were excluded. For example, in Lane et al.
(2009), researchers report the effects of an intervention, as well as three variations of
that intervention, when compared with that of a business-as-usual control condition.
We considered each intervention and variation as a unique treatment group, and
each comparison of a unique treatment group with a business-as-usual control as a
separate study; therefore, we counted four studies in this publication (i.e., T0 vs. C,
T1 vs. C, T2 vs. C, and T3 vs. C). The comparisons of each variation in treatment
with the others were excluded. Studies of variations in treatments focus on a much
more precise research question than the effectiveness of reading interventions. The framework of this meta-analysis could not account for the more experimental manipulations of specific components.

Figure 1. Literature search, screening, and reviewing. [Flowchart: 2,423 publications were screened for eligibility; 2,369 were excluded at screening (not conducted in the U.S.; not published between 2002 and 2017; no eligible reading intervention; not an eligible study design, sample, or outcome). The 54 publications that met screening were coded for quality. Publications that did not meet WWC Evidence Standards were excluded (a study did not meet standards if it was a randomized controlled trial with high attrition or a quasi-experimental design study with analysis groups that are not shown to be equivalent, if there was only one unit assigned to at least one of the conditions, or if the intervention was always used in combination with another intervention). Nine publications were excluded because they compared two unique treatments. The remaining 25 publications, comprising 33 studies, were used in the meta-analysis.]
In total, of the 54 publications reviewed, 25 publications included 33 separate studies
that met standards (with or without reservations). See Figure 1.
Phase 2 Coding: Study and Intervention Characteristics
For studies that met WWC group design standards (with or without reservations), we
coded the following study characteristics: nature of the comparison, design (either RCT
or QED), grade level, participants’ risk level, and outcome domain and type. Coding of the intervention characteristics addressed the following: the size of the intervention group, who implemented the intervention (i.e., the interventionist), whether the intervention was scripted, whether monitoring and feedback were provided to the interventionist, and how many hours of instruction were provided per week. See Table 1 for the operational definitions. Two members of the research team coded all study and intervention characteristics. The researchers discussed and rectified any discrepancies. After the initial coding, a third researcher coded a randomly selected 20 percent of the studies for reliability purposes. Reliability was 90.6 percent.
Phase 3 Coding: Area/Focus of Instruction
We examined descriptions of the interventions provided in the publications and cataloged
the interventions in two ways: (a) the target area of instruction, and (b) the focus of instruc-
tion. Each study was examined to determine if any of the following reading areas—phono-
logical awareness, decoding, encoding (spelling), fluency, vocabulary, comprehension, and
writing—were addressed during the intervention. If an intervention covered a reading area
in any manner, minimally or extensively, then the study was coded for that area.
During our coding, we noticed that some studies covered the main areas of reading minimally during the intervention while others gave evidence of extended explicit instruction. For instance, many studies mentioned that they included reading comprehension in the lesson, but then described only that they asked comprehension questions as or after students read a passage, without providing any explicit instruction in comprehension strategies. Consequently, instruction in each intervention was further examined to determine whether the reading areas were taught routinely and explicitly. If so, the studies were also coded as having a focus in that area of reading. This additional level of coding—the focus of instruction—was limited to decoding, fluency, reading comprehension, and vocabulary.
An example to illustrate how the research team determined whether a component
was a focus of instruction is the coding of the reading comprehension component in
Denton et al. (2014). This study consisted of two treatment conditions, explicit instruc-
tion and guided reading, and a control group. Comprehension was coded as a focus of
instruction in the explicit instruction condition but not the guided reading condition. In
the former, instruction consisted of teachers modeling comprehension strategies using
“think-alouds” and providing specific feedback when students practiced in small groups. Although the guided reading condition included discussion activities, teachers never modeled or provided any clear guidance on how and when to use various strategies to discern a cause-effect relationship or for succinct retelling.
Coding of studies during this phase was done collaboratively by two members of the
research team (who are experts in beginning reading). Reliability was calculated on 20
percent of randomly selected studies. Reliability was 85.71 percent.
Calculation of Effect Sizes
To determine each intervention’s impact, we calculated the average effect size for
each domain of reading (i.e., word and pseudoword reading, passage reading fluency, vocabulary, and reading comprehension). The effect size was calculated for each outcome using the means and pooled standard deviations for the intervention and comparison groups, and corrected for small-sample bias using Hedges (1981) procedures. In cases where means and standard deviations were not available, the t or F statistics and the treatment and comparison group sample sizes were used to calculate the effect sizes.

Table 1. Study and intervention characteristic definitions (moderator: levels and operational definitions).
Design: RCT = randomized controlled trial; QED = quasi-experimental design.
Grade level: 1 = first grade only; 2/3 = second- and third-grade combination class.
Nature of the comparison group: Core reading instruction only = business-as-usual, whole-class reading instruction with no additional support (i.e., Tier 1); School-provided intervention = reading interventions typically provided by the school/district (i.e., some form of preventative intervention provided in addition to core Tier 1 reading).
Risk status: At risk = only students at the 25th percentile or lower on a standardized norm-referenced screener; Minimal risk = students considered potentially at risk who score below the 40th percentile on a standardized norm-referenced screener.
Outcome measure domains: Word or pseudoword reading = e.g., TOWRE and Woodcock-Johnson Word Attack; Passage reading fluency = e.g., AIMSweb Standard Reading Assessment Passages; Reading comprehension = e.g., Woodcock Reading Mastery Tests (WRMT) Passage Comprehension subtest, GRADE reading comprehension subtest.
Outcome measure types: Standardized tests = existing measures administered, scored, and interpreted in the same way for all test-takers; Researcher-developed measures = only those the researcher developed for the study.
Group size: Small group = groups of more than one student; One-on-one = 1 student with 1 interventionist.
Interventionist: Certified teacher = had a teaching credential, even if not employed as a full-time teacher at the schools where the studies took place; Paraprofessional = anyone who worked or volunteered at the school as part of the study and had no teaching credential; Research staff = typically graduate students at a university.
Avg. hrs./week of instruction: Low = less than 1.5 hours per week; Medium = 1.5 to 2.0 hours per week; High = 2.0 or more hours per week.
Scripted: Yes = interventionist provided with step-by-step instructions on what to say and do during each session; No = interventionist was not provided with step-by-step instructions.
Monitoring and feedback: Yes = interventionists were observed conducting the intervention and were provided feedback after they were observed; No = interventionists were not observed and no feedback was provided.
Notes. If a study included students from more than one grade (e.g., from Grades 1 and 2), then the study was assigned to the grade level of the majority of the sample. If the authors did not provide a percentile on a nationally normed test to describe the at-risk sample, the study was not coded for this variable. Measures of pre-reading skills such as phonological awareness, rhyming, and letter naming were excluded, as were measures of listening comprehension, spelling, and writing. Studies with a mix of interventionists (e.g., both teachers and paraprofessionals) were coded by the most prevalent interventionist type.
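The two computations described here, g from means and standard deviations and g recovered from a t or F statistic, can be sketched in Python. This is an illustration of the standard Hedges (1981) small-sample correction; the function names are ours, and it is not the authors' actual analysis code.

```python
import math

def hedges_g(m_t, m_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with the Hedges (1981) small-sample correction."""
    # Pooled standard deviation across treatment and comparison groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (m_t - m_c) / sp                   # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c) - 9)      # small-sample correction factor J
    return j * d

def g_from_t(t, n_t, n_c):
    """Recover g from a reported t statistic when means and SDs are unavailable
    (for a two-group comparison, an F statistic gives t = sqrt(F))."""
    d = t * math.sqrt(1 / n_t + 1 / n_c)
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    return j * d
```

For example, two groups of 50 with means 105 and 100 and SDs of 10 give d = 0.50 and g of roughly 0.496, the same value recovered from the corresponding t statistic of 2.5.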
To account for dependencies in our data, we used random-effects robust variance estimation (RVE) techniques (Hedges et al., 2010). RVE permits the comparison of effect sizes across studies in which multiple, dependent effect sizes are drawn from the same sample. Random-effects analyses were conducted using the statistical software Stata (StataCorp, 2015) and the “Robumeta” package (Hedberg, 2011), a macro that applies the RVE techniques. In RVE, the mean correlation between all pairs of effect sizes within a study (ρ) must be specified to estimate the study weights and calculate the between-study variance. We used a ρ value of .80 to estimate the between-study variance and then conducted sensitivity analyses using ρ values of 0 to .90. The small-sample correction developed by Tipton (2015) was implemented in Robumeta for all models, as RVE results have been shown to inflate the Type I error rate when the meta-analysis includes fewer than 40 studies.
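The correlated-effects working model behind RVE can be sketched as follows: each study's effects share one weight, 1/(k(v̄ + τ²)), the pooled estimate is the weighted mean, and the robust variance is built from study-level weighted residual sums. This is a simplified sketch that takes τ² as given (in Robumeta, τ² is estimated from the data, which is where ρ enters, and Tipton's small-sample correction is applied); the function name is ours.

```python
import numpy as np

def rve_intercept(es, v, study, tau2=0.0):
    """Correlated-effects RVE (Hedges, Tipton, & Johnson, 2010), intercept-only.
    es: effect sizes; v: their sampling variances; study: study labels.
    Returns the weighted mean effect and its robust standard error."""
    es, v, study = np.asarray(es, float), np.asarray(v, float), np.asarray(study)
    w = np.empty_like(es)
    for s in np.unique(study):
        m = study == s
        k = m.sum()                              # number of effects in study s
        w[m] = 1.0 / (k * (v[m].mean() + tau2))  # equal weights within a study
    beta = np.sum(w * es) / np.sum(w)            # weighted mean effect
    # Robust variance: squared per-study sums of weighted residuals
    num = sum(np.sum(w[study == s] * (es[study == s] - beta)) ** 2
              for s in np.unique(study))
    se = np.sqrt(num) / np.sum(w)
    return beta, se
```

With one effect per study this reduces to an inverse-variance weighted mean with a sandwich-style standard error; with multiple effects per study, each study contributes a single residual term, which is what protects the standard error against within-study dependence.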
We estimated a series of meta-regression models using RVE. First, we ran an intercept-only model in which the estimate for the constant represented the average weighted effect size across all 33 studies (Tanner-Smith & Tipton, 2014). The Robumeta package calculated the following indices of heterogeneity: the Q statistic and its p-value, I² (the percentage of between-study heterogeneity not due to chance variation in effects), and τ² (the true variance in the population of effects).
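For independent effects, these heterogeneity indices can be computed as in the textbook sketch below, using fixed-effect inverse-variance weights. Robumeta's versions account for the dependence structure among effect sizes, which this simplification ignores; the function name is ours.

```python
import numpy as np

def q_and_i2(es, v):
    """Cochran's Q and I^2 for a set of independent effect sizes.
    es: effect sizes; v: their sampling variances."""
    es, v = np.asarray(es, float), np.asarray(v, float)
    w = 1.0 / v                           # fixed-effect inverse-variance weights
    beta = np.sum(w * es) / np.sum(w)     # weighted mean effect
    q = np.sum(w * (es - beta) ** 2)      # Cochran's Q
    df = len(es) - 1
    # I^2: percentage of variability beyond what chance (df) would predict
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2
```

When Q is at or below its degrees of freedom, I² is truncated at zero, indicating no detectable heterogeneity beyond sampling error.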
Next, we examined the role of moderators such as group size, grade level, and level
of support provided to interventionists. Though one meta-regression model with covari-
ates for all moderators is preferable, the number of studies that met the inclusion crite-
ria was small, and not every study included information that permitted coding of all
moderators. Thus, this approach was not taken because results would be uninterpretable
due to insufficient degrees of freedom. Instead, we examined potential moderators using
separate RVE meta-regression models with only the moderator of interest entered as a
predictor. We interpret these results with caution due to potential confounding effects
of other moderators that are unaccounted for in these single-predictor models.
Moreover, a small number of single-predictor models remained underpowered (df < 4), likely a result of large imbalances in the data (Tipton, 2015).
Hedges et al. (2010) demonstrated that the value selected for ρ generally does not affect results much and recommended implementing a sensitivity analysis by analyzing models with varying ρ values. We conducted sensitivity analyses using ρ values of 0 to .90 and found no meaningful differences in the results across models, indicating that our findings were robust across estimates of ρ.
The moderator variables were dummy coded and included as covariates in each model. To estimate a mean effect size for each level of the moderator variables (i.e., RCT and QED are levels of the design moderator variable), intercept-only models also were run for each level of the moderator. The p-value for determining statistical significance in each of the moderator analyses was set to p < .05.
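Conceptually, each single-predictor model described above reduces to a weighted regression of effect sizes on one dummy-coded moderator. The sketch below shows only that point-estimate skeleton with inverse-variance weights and illustrative data; the actual analyses used RVE weights and cluster-robust standard errors, and the function name and τ² value are assumptions.

```python
import numpy as np

def moderator_model(es, v, dummy, tau2=0.02):
    """Meta-regression point estimates for one dummy-coded moderator
    (e.g., QED = 1 vs. RCT = 0) with inverse-variance weights."""
    es, v, dummy = (np.asarray(a, float) for a in (es, v, dummy))
    w = 1.0 / (v + tau2)
    X = np.column_stack([np.ones_like(es), dummy])
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ es)
    return beta  # [mean at reference level, difference for the dummy level]
```

With equal weights, the intercept recovers the reference-group mean and the coefficient recovers the between-group difference.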
We examined the potential impact of publication bias using the trim-and-fill methodology (Duval & Tweedie, 2000) by constructing a funnel plot of the effect sizes and noting any asymmetry in the distribution of effects. The plot was then systematically trimmed, removing the effect sizes causing the asymmetry, and filled in with any effect sizes that may have been missing from unpublished studies that resulted in small and non-significant treatment effects. The analysis estimated the number of missing effect sizes and recalculated the overall mean effect size in a way that reflects the presence of these missing effects.
A total of 33 studies from 25 publications met WWC group design standards and were included in the final meta-analysis. These 33 studies spanned 13 years (2004–2016) and provided a total of 128 effect sizes. The sample sizes in the studies ranged from 21 to 6,888 students. Total sample size across all studies was 11,737 students.
Thirty of the studies were RCTs; the remaining three were QEDs. The comparison condition for the majority of studies (k = 21) was Tier 1 core classroom instruction (i.e., nothing other than what the classroom teacher chose to provide). In the remaining 12 studies, the comparison was typical school- or district-provided intervention. Twenty-two studies were conducted in Grade 1, and 11 studies were in Grades 2 and/or 3. Only 16 studies provided the information necessary for coding for participants' risk level. Of those, 12 studies included students in the minimal risk category, and only 4 studies included students in the at-risk category (i.e., 25th percentile or lower). The most common outcome domain was word or pseudoword reading (included in all but one of the 33 studies). Nineteen studies included outcomes in reading comprehension, 16 included passage reading fluency outcomes, and only two studies included outcomes in vocabulary. The study characteristics for each study included in the meta-analysis are presented in Table 2.
Interventions were delivered to students in one-on-one settings in 21 of the studies and
in small-group settings (group sizes in the included studies ranged from 2 to 5 students)
If the authors did not provide a percentile on a nationally normed test to describe the at-risk sample, the study was
not coded for this variable. Across the studies, a wide range of screening measures and operational definitions were
used, and the screeners were typically not nationally normed.
Table 2. Study and intervention characteristics for each study included in the meta-analysis.
Columns: study; design; comparison condition; grade; at-risk level; outcome domains; grouping; interventionist; scripted; hours of treatment; support (monitoring and feedback).
Allor and McCathren (2004) Study 1 RCT CR 1 RC 1:1 P Y L Y
Allor and McCathren (2004) Study 2 RCT CR 1 PF, RC, WR 1:1 P Y L Y
Berninger, Abbott, Vermeulen, and Fulton (2006) RCT CR 2/3 A WR SG R N M N
Blachman et al. (2004) RCT SI 2/3 A PF, RC, WR 1:1 CT N H Y
Case et al. (2010) RCT CR 1 WR SG R Y M Y
Case et al. (2014) RCT CR 1 PF, WR SG R Y M Y
Denton et al. (2010) RCT SI 1 RC, WR SG CT N H Y
Denton et al. (2014) RCT SI 2/3 M PF, RC, WR SG P Y H Y
Denton et al. (2014) RCT SI 2/3 M PF, RC, WR SG P Y H Y
Fien et al. (2015) RCT SI 1 M PF, WR SG P N H Y
Fuchs, Compton, Fuchs, Bryant, and Davis (2008) RCT CR 1 WR SG R N H N
Gunn et al. (2005) RCT CR 2/3 PF, RC, WR SG P Y M Y
Jacob, Armstrong, Bowden, and Pan (2016) RCT SI 2/3 PF, RC, WR 1:1 P Y L Y
Jenkins, Peyton, Sanders, and Vadasy (2004) QED CR 1 A RC, WR 1:1 P Y M Y
Lane et al. (2009) RCT CR 1 M WR 1:1 R N M N
Lane et al. (2009) RCT CR 1 M WR 1:1 R N M N
Lane et al. (2009) RCT CR 1 M WR 1:1 R N L N
Lane et al. (2009) RCT CR 1 M WR 1:1 R N L N
May, Sirinides, Gray, and Goldsworthy (2016) RCT SI 1 RC, WR 1:1 CT N H N
O’Connor et al. (2010) RCT CR 2/3 PF, RC, WR 1:1 P N L N
O’Connor et al. (2010) RCT CR 2/3 PF, RC, WR 1:1 P N L N
Pullen, Lane, and Monaghan (2004) RCT CR 1 M WR 1:1 P N L N
Scanlon, Vellutino, Small, Fanuele, and Sweeney (2005) RCT SI 1 RC, WR 1:1 CT H Y
Scanlon et al. (2005) RCT SI 1 RC, WR 1:1 CT H Y
RCT CR 1 RC, WR 1:1 CT N H N
Smith et al. (2016) RCT SI 1 M PF, WR SG P N H Y
Vadasy and Sanders (2011) RCT CR 1 PF, RC, WR 1:1 P Y L Y
Vadasy, Sanders, and Peyton (2006) QED CR 2/3 M PF, RC, WR 1:1 P Y M Y
Vadasy et al. (2006) RCT CR 2/3 M PF, RC, WR 1:1 P Y M Y
Vadasy et al. (2007) RCT CR 2/3 M PF, WR 1:1 P Y M Y
Vellutino and Scanlon (2002) RCT SI 1 A WR 1:1 CT N H Y
Wang and Algozzine (2008) RCT CR 1 RC, WR SG P Y L N
Wanzek and Vaughn (2008) QED SI 1 PF, WR SG R Y H Y
Identifies studies also included in Wanzek et al. (2016).
RCT = randomized controlled trial. QED = quasi-experimental design.
CR = core reading instruction. SI = school-provided intervention.
A = at risk, a sample with students only in the 25th percentile or lower. M = minimal risk, a sample that included students below the 40th percentile. Blank = authors did not provide a percentile or the information to calculate the percentile from a nationally normed test given for screening purposes.
WR = Word & Pseudoword Reading. RC = Reading Comprehension. PF = Passage Fluency.
SG = small group.
P = paraprofessional. R = researcher. CT = certified teacher.
Y = scripted. N = not scripted.
L = low, M = medium, H = high.
Y = support included. N indicates that monitoring and feedback was either not included or not reported.
in 12 studies. The interventions were implemented by a researcher in nine studies, a
certified teacher in seven studies, and a paraprofessional in 15 studies. In 21 studies,
additional support in the form of monitoring and feedback was provided to the interventionists. Nearly half of the studies (k = 15) included scripted interventions. The average hours per week of intervention ranged from less than an hour (i.e., 45 min) to 4.17 h per week (median = 2). The intervention characteristics for each study included in the meta-analysis are described in Table 2.
Instructional Area/Focus of the Intervention
All the studies included in the meta-analysis examined interventions that focused on building students' reading skills in more than one instructional area (e.g., decoding, fluency, comprehension). In many respects, the interventions appeared to be similar to each other, mainly addressing decoding and fluency, while also attending to one or more areas of reading instruction, such as encoding, comprehension, vocabulary, phonological awareness, and writing.
All but two studies (k = 31) addressed decoding. Many studies also included instruction in passage fluency (k = 29) and encoding (k = 23). Over 50 percent of the studies addressed phonological awareness (k = 19) and comprehension (k = 18). Vocabulary (k = 13) and writing (k = 7) were addressed less frequently.
All studies with decoding and fluency were coded for both the area of instruction and the focus of instruction. Only nine of the 18 studies that were coded for comprehension were also given a Yes under the focus code, as they showed evidence of systematic, teacher-led explicit instruction in comprehension that went beyond asking literal questions, questions about the title and pictures, or monitoring comprehension strategies. Only one of the nine studies coded for vocabulary was also given a Yes for focus. Vocabulary instruction rarely involved explicit teaching and interaction; it typically involved defining words if students asked, or asking students to look at pictures to derive the meaning of a word.
The meta-analysis included 128 effect sizes from 33 studies. See Table 3 for effect sizes (Hedges' g) for all outcome measures by domain for each study. Effect sizes ranged widely, from −0.20 to 1.37. The mean effect size for these studies was 0.39 (SE = .04, p < .001, 95% CI [0.32, 0.46]), indicating that the reading interventions were generally effective across students, settings, and measures. As expected, treatment effects varied considerably. The I² estimate of the percentage of between-study heterogeneity not due to chance was 50.75%, with a τ² estimate of the true variance in the population of effects of 0.02.
Eleven categorical moderator analyses of study characteristics (six variables) and inter-
vention characteristics (five variables) were individually tested using each as a single
Table 3. Outcomes and effect sizes.
Columns: author; total N; outcome domain; measure type (S/R); effect sizes (Hedges' g).
Allor and McCathren (2004) Study 1 86 RC S 0.50
Allor and McCathren (2004) Study 2 157 WR S 0.05, 0.13, 0.33, 0.44, 0.78
RC S −0.16
PF R 0.13
Berninger et al. (2006) 93 WR S 0.35
Blachman et al. (2004) 69 WR S 0.74, 0.87
RC S 0.53
PF S 0.70
Case et al. (2010) 30 WR S 0.48, 0.73, 0.73
Case et al. (2014) 123 WR S 0.02, 0.17, 0.21, 0.23
PF S 0.20
Denton et al. (2010) 422 WR S 0.42
RC S 0.51
Denton et al. (2014) 103 WR S 0.34, 0.40, 0.50
RC S 0.08, 0.13
PF S 0.16
Denton et al. (2014) 112 WR S 0.31, 0.50, 0.63
RC S 0.29, 0.46
PF S 0.45
Fien et al. (2015) 239 WR S 0.38, 0.45
PF S 0.30
Fuchs et al. (2008) 64 WR S 0.26, 0.26, 0.38, 0.46, 0.65
Gunn et al. (2005) 245 WR S 0.30, 0.52
RC S 0.32
PF S 0.24
Jacob et al. (2016) 1,166 WR S 0.11
RC S 0.10
PF S 0.09
Jenkins et al. (2004) 99 WR S 0.37, 0.50, 0.52, 0.73, 0.76, 1.12
RC S 0.74
Lane et al. (2009) 41 WR R 0.64, 0.71
WR S 1.24
Lane et al. (2009) 42 WR R 0.24, 0.29
Lane et al. (2009) 43 WR R 0.39, 0.55
Lane et al. (2009) 46 WR R 0.52, 0.59
WR S 1.02
May et al. (2016) 6,888 WR S 0.41
RC S 0.42
O’Connor et al. (2010) 40 WR S 0.10, 0.56
RC S 0.48, 0.53
PF S 0.60, 0.75, 0.76, 0.87
O’Connor et al. (2010) 43 WR S 0.25, 0.57
RC S 0.37, 0.44
PF S 0.81, 0.84, 0.93, 1.33
Pullen et al. (2004) 47 WR R 0.24, 0.81
WR S 0.54, 0.59
Scanlon et al. (2005) 114 WR S 0.31, 0.55
RC S 0.41
Scanlon et al. (2005) 117 WR S 0.51, 0.62
RC S 0.35
Schwartz (2005) 74 WR S 0.93, 1.37
RC S 0.14
Smith et al. (2016) 743 WR S 0.19
PF S 0.12
Smith et al. (2016) 729 WR S 0.23, 0.32
Smith et al. (2016) 749 WR S 0.24
PF S 0.18
Vadasy and Sanders (2011) 89 WR S 0.51
RC S 0.29
PF R 0.69
Vadasy et al. (2006) QED 31 WR S 0.61, 0.72
predictor in the meta-regression models. Some of these variables emerged as significant moderators of the relationship between reading interventions and effect sizes on measures of students' reading proficiency.
Findings are summarized in Table 4. The coefficients from the intercept-only models should be interpreted as the weighted effect size for studies with that level of the moderator, and statistically significant results indicate that the mean effect size for studies with that level of the moderator is significantly different from zero.
Study Characteristics. The outcome domain for the measures, when comparing the three areas of reading (word or pseudoword reading, reading comprehension, passage reading fluency), significantly moderated effect size. On average, outcomes in the word or pseudoword reading domain yielded the largest effect size (b = 0.41 [0.33–0.50], p < .001, k = 32). Reading comprehension domain outcomes produced a slightly smaller effect (b = 0.32 [0.20–0.43], p < .001, k = 19), followed by passage reading fluency outcomes, which generated the smallest effect size (b = 0.31 [0.17–0.44], p < .001, k = 16). The only significant difference, however, was on outcomes in the word or pseudoword reading domain, which yielded significantly larger effect sizes than outcomes in the domains of reading comprehension or passage reading fluency (b = 0.10 [0.00–0.19], p = .049).
Study characteristic variables that did not significantly moderate the effect size included grade level (b = 0.06 [−0.26–0.15], p = .542, k = 33), research design (b = −0.06 [−1.03–0.92], p = .823, k = 33), the nature of the comparison group (b = −0.11 [−0.25–0.04], p = .139, k = 33), participants' risk level (b = 0.13 [−0.18–0.44], p = .330, k = 16), and standardized measures versus researcher-developed measures (b = 0.11 [−0.07–0.28], p = .189, k = 33).
Table 3. Continued.
Columns: author; total N; outcome domain; measure type (S/R); effect sizes (Hedges' g).
RC S 0.50
PF R 0.81
Vadasy et al. (2006) RCT 21 WR S 0.67, 0.75
RC S 0.21
PF R 0.55
Vadasy et al. (2007) 43 WR S 0.47
PF S 0.52
Vellutino and Scanlon (2002) 118 WR S 0.38
Wang and Algozzine (2008) 139 WR S −0.03, 0.39, 0.45
RC S 0.17
Wanzek and Vaughn (2008) 50 WR S 0.12, 0.18
PF S −0.20
WR = Word & Pseudoword Reading. RC = Reading Comprehension. PF = Passage Fluency.
R = researcher developed. S = standardized.
One effect size per measure. If four effect sizes are listed, it means four measures were used in this outcome domain.
The analysis could not include a meta-analysis of effects from outcomes in the vocabulary domain due to the small number of studies (k = 2).
Intervention Characteristics. None of the intervention characteristics led to significant moderator effects. These included interventions implemented by researchers (b = −0.01 [−0.21–0.20], p = .939, k = 33), interventions implemented by certified teachers (b = 0.14 [0.00–0.28], p = .053, k = 33), interventions implemented by a paraprofessional (b = −0.12 [−0.25–0.02], p = .086, k = 33), whether an intervention was scripted (b = −0.13 [−0.28–0.02], p = .094, k = 31), whether or not monitoring and feedback was provided (b = −0.12 [−0.26–0.03], p = .103, k = 33), average hours per week of intervention for the low (b = −0.08 [−0.30–0.15], p = .464, k = 33), medium (b = 0.04 [−0.14–0.23], p = .610, k = 33), or high (b = 0.03 [−0.13–0.18], p = .722, k = 33) categories, and grouping (either small group or one-on-one interventions) (b = −0.13 [−0.27–0.01], p = .075, k = 33). When tested per grade level, however, grouping was a significant moderator for Grade 1 but not Grades 2 and 3. For Grade 1 specifically, effects were larger if the intervention was delivered to students individually rather than to groups of students (b = −0.16 [−0.32–0.01], p = .042, k = 22).
Area/Focus of Instruction. Given that all the studies examined interventions with multiple areas of instruction, we tested a meta-regression model using all seven areas of instruction. Analyzing the areas of instruction simultaneously allowed us to examine each area's moderating influence while holding the other areas constant. Of the seven areas of instruction, phonological awareness, encoding (spelling), and writing appeared to be significant moderators of effect sizes when holding all other areas of instruction constant (see Table 4). Interventions that included phonological awareness tended to result in smaller effects across the word or pseudoword reading, reading comprehension, and passage reading fluency outcomes (b = −0.19 [−0.32, −0.05], p = .010). However, the studies did not specifically address phonological awareness outcomes, so we do not know the impact on those. In contrast, providing instruction in encoding (b = 0.18 [0.01, 0.35], p = .045) or writing (b = 0.18 [0.02, 0.34], p = .028) yielded significantly higher effect sizes when they were included as a component of the intervention.
We were also interested in whether studies that included a focus in a particular area—that is, more in-depth and explicit instruction—significantly moderated impacts. Effect sizes were not significantly associated with providing a more in-depth instructional focus in decoding (b = −0.17 [−0.89–0.55], p = .216, k = 33), fluency (b = −0.04 [−0.34–0.26], p = .694, k = 33), or comprehension (b = −0.06 [−0.30–0.17], p = .572, k = 33). Note that vocabulary was not examined here as a focus due to the limited number of studies.
Finally, to determine whether the findings suffered from publication/small-study bias, we implemented the trim-and-fill method (Duval & Tweedie, 2000). The results indicated that 23 effect sizes were estimated missing from the current meta-analysis of 128 total effects. Including these in the random-effects model would minimally decrease the mean effect size from g = 0.39 to g = 0.32 (p < .001, 95% CI [0.27, 0.36]).
Table 4. Moderator analysis.
Columns: moderator; coefficient; SE; 95% CI; p; df; Q; I²; τ²; n; k; ρ.
1 vs. 2/3 0.06 0.09 (−0.26, 0.15) 0.542 12 54.82 41.63 0.02 128 33 .8
Small Group vs. 1:1 −0.13 0.07 (−0.27, 0.01) 0.075 20 64.31 50.24 0.02 128 33 .8
RCT vs. QED −0.06 0.23 (−1.03, 0.92) 0.823 <4 66.80 52.10 0.02 128 33 .8
Paraprofessional vs. Other −0.12 0.06 (−0.25, 0.02) 0.086 19 49.68 35.59 0.01 128 33 .8
Certified Teacher vs. Other 0.14 0.06 (0.00, 0.28) 0.053 8 50.87 37.09 0.02 128 33 .8
Researcher vs. Other −0.01 0.09 (−0.21, 0.20) 0.939 10 67.00 52.24 0.02 128 33 .8
Scripted vs. Non-Scripted −0.13 0.07 (−0.28, 0.02) 0.094 18 50.53 40.63 0.02 122 31 .8
Nature of the comparison group
SI vs. CR −0.11 0.07 (−0.25, 0.04) 0.139 24 65.88 51.43 0.02 128 33 .8
At risk vs. Minimal risk 0.13 0.12 (−0.18, 0.44) 0.330 5 16.24 — 0.01 58 16 .8
Hours of treatment
Low vs. Other −0.08 0.10 (−0.30, 0.15) 0.464 10 53.19 39.84 0.02 128 33 .8
Medium vs. Other 0.04 0.08 (−0.14, 0.23) 0.610 10 66.89 52.16 0.02 128 33 .8
High vs. Other 0.03 0.07 (−0.13, 0.18) 0.722 21 57.57 44.42 0.02 128 33 .8
Yes vs. No −0.12 0.07 (−0.27, 0.03) 0.103 12 54.27 41.04 0.02 128 33 .8
Standardized vs. Researcher 0.11 0.07 (−0.07, 0.28) 0.189 7 66.36 51.78 0.02 128 33 .8
Word/Pseudoword Reading vs. Other 0.10 0.05 (0.00, 0.19) 0.049 19 66.10 51.59 0.02 128 33 .8
Passage Reading Fluency vs. Other −0.09 0.06 (−0.23, 0.05) 0.188 10 60.01 46.68 0.02 128 33 .8
Reading Comprehension vs. Other −0.05 0.05 (−0.23, 0.05) 0.308 13 66.58 51.94 0.02 128 33 .8
Constant 0.49 0.15 (−0.02, 1.00) 0.055 <4 44.07 27.39 0.02 128 33 .8
Decoding 0.12 0.11 (−0.53, 0.29) 0.356 <4 .8
Passage fluency −0.00 0.10 (−0.27, 0.26) 0.974 5 .8
Reading comprehension −0.08 0.07 (−0.23, 0.06) 0.234 13 .8
Vocabulary 0.08 0.09 (−0.12, 0.27) 0.414 11 .8
Phonological awareness −0.19 0.06 (−0.32, −0.05) 0.010 13 .8
Encoding 0.18 0.07 (0.01, 0.35) 0.045 8 .8
Writing 0.18 0.07 (0.02, 0.34) 0.028 8 .8
Note. Coeff = coefficient; SE = standard error; CI = confidence interval; p = significance; df = degrees of freedom; Q = test of homogeneity of effect sizes; I² = measure of effect size heterogeneity; τ² = between-study variance; n = number of effect sizes; k = number of studies; ρ = corrected correlation. In all RVE models, we used a ρ value of .80 to estimate the between-study variance; — = could not be estimated. Bolded coefficient values indicate statistically significant estimates at p < .05.
CR = core reading instruction. SI = school-provided intervention.
Results from this meta-analysis of 33 studies of reading interventions conducted between 2002 and 2017 reveal significant, positive effects on a range of reading outcomes. The significant mean effect size (Hedges' g) across 33 studies was 0.39 (p < .001), indicating that students from Grades 1, 2, and 3 who score in the at-risk category on a screening battery or on a normed test do, on average, benefit from the set of reading interventions studied. This leads us to conclude that the research base underlying reading interventions is sound and not the primary reason for the lack of impacts in the national RtI evaluation (Balu et al., 2015), which found null, or in one case negative, impacts on reading outcomes for students at or near the cut point on screening.
Mean effect sizes (Hedges' g) for each outcome domain ranged from 0.41 in the area of word or pseudoword reading to 0.32 in comprehension and 0.31 in passage reading fluency. All were statistically significant at p < .001. Note that the mean effect size was the highest in the outcome domain of word and pseudoword reading. This is unsurprising, given the large body of evidence supporting the use of various forms of systematic, explicit, small-group instruction in phonemic awareness, phonics instruction, and sight word reading to help students who are likely to fall behind when experiencing more traditional instruction (e.g., Gersten, Compton, et al., 2009; National Institute of Child Health and Human Development [NICHD], 2000).
The reading interventions examined showed many commonalities. Every intervention
addressed multiple aspects of foundational reading—phonological awareness, decoding,
passage reading fluency, encoding (spelling) and, on occasion, writing. Nearly all inter-
ventions addressed comprehension in some fashion, although few provided much in the
way of detail. Vocabulary and comprehension instruction were rarely emphasized.
Virtually all interventions included systematic, explicit instruction. Typically, this occurred during instruction in phonics/word-reading skills and passage reading fluency, often with some activities geared toward fluency building and phonological awareness.
Interventions that included instruction on phonological awareness were associated with significantly smaller effects, whereas interventions that addressed encoding or writing yielded significantly higher effect sizes. Perhaps focusing on pre-reading skills such as phonological awareness after students have started to learn to decode is counterproductive, as it takes time and focus away from gaining proficiency in decoding skills. We speculate that an encoding component may help reinforce phonics rules and decoding, and we note that this has been a feature of some core reading programs.
Variables for Future Exploration
The percentage of between-study heterogeneity not due to chance was 50.75%, suggest-
ing both a good deal of variance in the pattern of effects and the need to use moderator
analyses to begin to understand salient factors. Although many of the moderators
explored in the current meta-analysis were non-significant (p > .05), future research is
needed to explore aspects of the interventions that may moderate the relationship
between the intervention and reading outcomes. In particular, researchers should
continue to investigate variables that could provide us with possible explanations for the
null and negative impacts from the Balu et al. (2015) study.
One finding worth exploring further is whether the interventionist moderates student impacts on measures of reading achievement. Interventions implemented by certified teachers did not yield significantly higher effect sizes (p = .053) than those conducted by others (primarily paraeducators or university students working for a researcher). Yet, results of the Balu et al. (2015) survey revealed that teachers provided intervention in over a third of schools implementing RtI in Grades 1–3. Similarly, the effect sizes for interventions delivered by paraprofessionals did not differ significantly from those delivered by certified teachers or researchers (p = .086) in our analyses. These results conflict with those reached by Slavin et al. (2011) in an earlier review of the literature on reading interventions, which suggests this is an area that warrants further investigation.
The results from the Balu et al. (2015) study also suggest that schools implementing RtI often used small groups ranging from 2 to 10 students, as opposed to the interventions in the meta-analysis, which were implemented in smaller groups (2 to 5 students) or one-on-one. We found an average effect size of 0.46 for interventions that were delivered one-on-one and 0.31 for those delivered to small groups of students; however, this moderator variable was not statistically significant (p = .075). Further analyses revealed that grouping moderated effects for Grade 1 but not for Grades 2 and 3 (p = .042). One-on-one instruction may be more beneficial for beginning readers. Similarly, even in small groups of 2–5 students, it may be easier to meet students' needs if all the students in the group are similar in their basic knowledge of rhyming, the alphabet, phonemes, and decoding skills. A recent study by Al Otaiba, Connor, et al. (2014) supports this notion. They found that it was necessary to make small groups more homogeneous by adjusting both the text's readability level and the lesson pacing to meet students' individual needs.
It could also be that the schools in the Balu et al. (2015) study used more scripted
interventions, though we cannot know for sure since the Balu survey did not ask
whether scripted interventions were used. Previous research has indicated that programs
where teachers are given some autonomy tend to produce higher results in reading
comprehension (Fang, Fu, & Lamme, 2004; Tivnan & Hemphill, 2005; Wilson, Martens,
& Arya, 2005). One reason for this might be that scripted interventions leave little room
for even slight adaptations to meet individual student needs when compared to those
with a lesson plan and no exact wording. Our results, however, found that the effect sizes for interventions that allowed teachers to adapt the intervention to students' needs did not differ significantly from scripted interventions (p = .094). Future research
should explore this area.
Before overgeneralizing from these findings, it is important to note that other var-
iables may be confounding the relationship. For example, all but 3 of the 15 scripted
interventions were implemented by paraprofessionals. Typically, paraprofessionals
implement scripted programs because most do not have the training to make appro-
priate instructional decisions when using a traditional lesson plan. Because our lim-
ited number of studies hindered our ability to model all the potential moderators at
once (i.e., controlling for other variables), the moderator findings should be inter-
preted with caution.
Relation to Previous Relevant Meta-Analysis
It is difficult to draw a direct comparison between the current study and the Slavin et al. (2011) and Wanzek et al. (2016, 2018) meta-analyses. Though the studies included in this meta-analysis overlap with some of the studies included in the other meta-analyses, this meta-analysis is the first to use rigorous standards of evidence in the inclusion criteria. The other meta-analyses included studies that were not as rigorous as those in the current study and included kindergarten interventions, which typically focus heavily on reading-related skills such as phonological awareness, rhyming, and basic decoding.
The impacts in the current meta-analysis (0.39) are smaller than several impacts in the Wanzek et al. (2016) meta-analysis of studies of shorter interventions: 0.54 on standardized foundational skill measures, 0.62 for non-standardized foundational skill measures, and 1.02 for non-standardized multicomponent measures. The differences in the magnitude of effect sizes may be due to studies that were not as rigorous as those in the current study or to the inclusion of kindergarten interventions. However, in the Wanzek et al. (2016) study, domain-level impacts were reported for composite domains—foundational reading/reading-related skills (including phonological awareness, rhyming, letter identification, as well as measures of decoding; 0.54 to 0.62)—and multicomponent measures (a composite of listening and reading comprehension; 0.36 to 1.02), which makes it difficult to compare against our domain-level impacts, which ranged from 0.31 to 0.41.
Yet, effects in the Wanzek et al. (2018) meta-analysis of studies of longer interventions, the impacts of one-on-one and small-group interventions in Slavin et al. (2011), and the impacts on standardized multicomponent measures in Wanzek et al. (2016) are similar to those found in our analysis. These findings suggest consistency in the impact of reading interventions for struggling readers.
Challenges and Limitations in Conducting the Meta-Analysis
Issues in Using Rigorous Design Standards and Contemporary Meta-Analytic Techniques
A unique feature of this meta-analysis is that it included only those studies that met
what is often called the gold standard, What Works Clearinghouse (WWC 3.0) standards
for RCTs and quasi-experimental designs. Ninety-one percent of the studies included in this meta-analysis were RCTs (k = 30), which is a much higher proportion of RCTs than in similar previous meta-analyses (e.g., Wanzek et al., 2016 [55% RCTs]; Swanson, 1999 [47.9% RCTs]). Including only those studies that met these rigorous standards
allows for more confidence in the meta-analytic findings. This is an especially important
contemporary issue given the lack of replicability of findings in the social sciences
(Ioannidis, 2005) and the general concern about false positives in both individual
research studies (Benjamini & Hochberg, 1995) and meta-analyses (Greco, Zangrillo,
Biondi-Zoccai, & Landoni, 2013).
The gain in trustworthiness, however, resulted in less statistical power for analyses, including the crucial moderator analyses that help in understanding possible underlying themes in the data, because invariably fewer studies met the rigorous design standards in this meta-analysis. This is likely to become an issue in future meta-analyses, as the
tradeoff between the quality and validity of the research findings conflicts with the need
for a large number of studies in conducting important moderator analyses with suffi-
cient power. We suspect it will take some time for the field to produce enough high-
quality studies to result in statistically significant findings from which we could draw
conclusions across studies.
As studies most often contain more than one outcome measure and at times more
than one comparison, meta-analyses must address the dependencies arising from such
multiple outcomes and comparisons within studies. This issue was pertinent for our
meta-analysis, as 90% of the studies included multiple measures and 15% contained
multiple comparisons. Thus, to account for the dependencies in the data, we used Robust Variance Estimation (RVE; Hedges et al., 2010), a contemporary statistical technique. One problem in using RVE is that it results in low statistical power unless the meta-analysis includes a large number of studies (López, Van den Noortgate, Tanner-Smith, Wilson, & Lipsey, 2017). Tipton (2015) notes that at least 40 studies are needed for adequate statistical power to conduct the moderator analyses.
Our meta-analysis included 33 high-quality experimental and quasi-experimental studies
(meeting WWC standards), a number not typically seen for a topic as specific as this in
other areas of educational research. However, it still fell below the minimum number of
40 studies for adequate statistical power to conduct the moderator analyses.
A meta-regression model with all moderators entered simultaneously (e.g., Gersten,
Chard, et al., 2009; Wanzek et al., 2016) would have been preferable to our series of
analyses, which tested each moderator individually. However, the overall number of
studies that met the inclusion and study quality criteria was small, and not every study
included information that permitted coding of all moderators. Thus, analyzing all varia-
bles within one regression model was not feasible, as results would be
uninterpretable due to insufficient degrees of freedom. Therefore, the single-predictor
RVE meta-regression models used in the meta-analysis must be interpreted with caution
due to the potential confounding effects of other moderators that are not accounted for
in these models.
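For the single-predictor models, a minimal fixed-effects meta-regression sketch shows the mechanics of testing one moderator at a time. This is a hypothetical simplification: it assumes one independent effect size per study and uses a per-observation sandwich variance, whereas the actual analyses used RVE with clustering by study (e.g., via robumeta; Hedberg, 2011):

```python
import numpy as np

def single_moderator_metareg(effects, variances, moderator):
    """Inverse-variance weighted meta-regression with one moderator.

    Fits effect_i = b0 + b1 * x_i by weighted least squares and returns
    the coefficients with sandwich-type robust standard errors. Assumes
    one independent effect size per study (no clustering).
    """
    T = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    x = np.asarray(moderator, dtype=float)

    w = 1.0 / v                                 # inverse-variance weights
    X = np.column_stack([np.ones_like(x), x])   # intercept + moderator
    bread = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(bread, X.T @ (w * T))

    resid = T - X @ beta
    u = (w * resid) ** 2                        # squared weighted residuals
    meat = X.T @ (u[:, None] * X)
    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv
    se = np.sqrt(np.diag(cov))
    return beta, se
```

Testing moderators one at a time keeps the degrees of freedom manageable with 33 studies but, as noted above, leaves each estimate open to confounding by the moderators omitted from that model.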
Issues in Coding Studies
Only a few studies (Denton et al., 2014; Vadasy, Sanders, & Tudor, 2007) provided a rich description of the nature of instruction in the intervention. Most articles did not provide sufficient detail on how reading was taught; instead, they merely listed the areas of instruction that were covered, provided a brief, cursory explanation, or addressed them in a figure or table that gave little sense of the amount of time devoted to activities or what the activities actually entailed. Because written descriptions were often not detailed enough, coding or classifying these areas was at times guesswork.
Given the difficulty we had with coding the instructional focus categories, we recommend that future intervention research articles include more detailed descriptions of each component of the intervention. However, this may be easier said than done, as journal article submission usually entails strict space limits. Understanding this, we would
encourage authors to write detailed descriptions with sample lessons, place them in a
website noted in the article, and provide access to the information. That would allow those involved in research syntheses and those interested in replications to access this material and ultimately gain a better understanding of the nature of the intervention.
Coding the at-risk category was a challenging task, as out of the 33 studies, only 16
could be used for the analysis examining the moderating role of the at-risk status variable.
This is because, across the 33 studies, there was little commonality in how at-risk status
was operationally defined (e.g., described as below grade-level performance; based on local norms, national norms, researcher-developed measures, or validated screening measures), making cross-study comparisons difficult and the resulting analyses underpowered.
norms from standardized tests to be problematic because some of the norms were much
older than others, and the field of early literacy instruction has undergone massive changes
in the past 15 years. In addition, there are likely to be shifts in national norms on some
measures, especially those involving phonological awareness, phonics, and possibly oral
reading fluency. It would be helpful if the field could adopt more consistent means of determining suitable samples for Tier 2 reading interventions.
Fidelity of implementation was another area that was difficult to code due to the lack
of consistency across studies in how fidelity was explained and measured. For instance,
if different measurement systems are used, 80% fidelity in one study is not comparable
to 80% fidelity in another study. As a result, though this was very much an area of
interest for us, we could not code for fidelity as a moderator.
Implications for Future Research
Most intervention studies examine impacts immediately at the end of an intervention.
An important next step in reading intervention research, one only occasionally
attempted to date (e.g., Al Otaiba, Kim, Wanzek, Petscher, & Wagner, 2014; Blachman
et al., 2014; Vaughn et al., 2008), is to see whether the impacts on reading performance
are maintained, both with and without further intervention, in follow-up studies.
Additional intervention research is also needed in the area of vocabulary. Few studies
in our meta-analysis addressed reading vocabulary in a comprehensive manner during
the intervention, and only two studies (Gunn, Smolkowski, Biglan, Black, & Blair, 2005;
O’Connor, Swanson, & Geraghty, 2010) included vocabulary as an outcome measure.
We were therefore unable to draw conclusions on this crucial aspect of reading profi-
ciency. Future intervention research, especially in Grades 2 and 3, should include a sys-
tematic vocabulary instruction component in the interventions and assess its
effectiveness using reading vocabulary outcomes.
We would also encourage more intervention research in the areas of reading and lan-
guage comprehension, since these were areas of weaker impacts. Newer intervention
research (e.g., Foorman, Herrera, & Dombek, 2018) increasingly includes both reading
and listening comprehension, and these approaches may lead to stronger impacts in the reading comprehension domain.
Acknowledgments
The authors wish to acknowledge the sage advice provided by Nancy Lewis and Terri Pigott, and recognize Samantha Spallone, Pam Foremski, and Christopher Tran for their assistance.
Funding
This research was supported in part by Contract Number [ED-IES-12-C-0011]. The views expressed do not represent those of the U.S. Department of Education.
References
Al Otaiba, S., Connor, C. M., Folsom, J. S., Wanzek, J., Greulich, L., Schatschneider, C., & Wagner, R. K. (2014). To wait in Tier 1 or intervene immediately: A randomized experiment examining first-grade response to intervention in reading. Exceptional Children, 81(1), 11–27.
Al Otaiba, S., & Fuchs, D. (2002). Characteristics of children who are unresponsive to early liter-
acy intervention: A review of the literature. Remedial and Special Education,23(5), 300–316.
Al Otaiba, S., Kim, Y. S., Wanzek, J., Petscher, Y., & Wagner, R. K. (2014). Long-term effects of
first-grade multitier intervention. Journal of Research on Educational Effectiveness,7(3),
Allor, J., & McCathren, R. (2004). The efficacy of an early literacy tutoring program implemented
by college students. Learning Disabilities Research and Practice,19(2), 116–129. doi:10.1111/j.
Balu, R., Zhu, P., Doolittle, F., Schiller, E., Jenkins, J., & Gersten, R. (2015). Evaluation of response
to intervention practices for elementary school reading (NCEE 2016-4000). Washington, DC:
National Center for Education Evaluation and Regional Assistance, Institute of Education
Sciences, U.S. Department of Education. Retrieved from http://ies.ed.gov/ncee/pubs/20164000/
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and power-
ful approach to multiple testing. Journal of the Royal Statistical Society: Series B
(Methodological),57(1), 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x
Berninger, V. W., Abbott, R. D., Vermeulen, K., & Fulton, C. M. (2006). Paths to reading com-
prehension in at-risk second-grade readers. Journal of Learning Disabilities,39(4), 334–351.
Blachman, B. A., Schatschneider, C., Fletcher, J. M., Francis, D. J., Clonan, S. M., Shaywitz, B. A.,
& Shaywitz, S. E. (2004). Effects of intensive reading remediation for second and third graders
and a 1-year follow-up. Journal of Educational Psychology,96 (3), 444–461. doi:10.1037/0022-
Blachman, B. A., Schatschneider, C., Fletcher, J. M., Murray, M. S., Munger, K. A., & Vaughn,
M. G. (2014). Intensive reading remediation in grade 2 or 3: Are there effects a decade later?
Journal of Educational Psychology,106 (1), 46–57. doi:10.1037/a0033663
Case, L. P., Speece, D. L., Silverman, R., Ritchey, K. D., Schatschneider, C., Cooper, D. H., …
Jacobs, D. (2010). Validation of a supplemental reading intervention for first-grade children.
Journal of Learning Disabilities,43(5), 402–417. doi:10.1177/0022219409355475
Case, L., Speece, D., Silverman, R., Schatschneider, C., Montanaro, E., & Ritchey, K. (2014).
Immediate and long-term effects of tier 2 reading instruction for first-grade students with a
high probability of reading failure. Journal of Research on Educational Effectiveness,7(1),
Center for Research and Reform in Education & Johns Hopkins University. (2019). Evidence for
ESSA: Standards and procedures. Retrieved from https://content.evidenceforessa.org/sites/
Denton, C. A., Fletcher, J. M., Taylor, W. P., Barth, A. E., & Vaughn, S. (2014). An experimental
evaluation of guided reading and explicit interventions for primary-grade students at-risk for
reading difficulties. Journal of Research on Educational Effectiveness,7(3), 268–293. doi:10.1080/
Denton, C. A., Nimon, K., Mathes, P. G., Swanson, E. A., Kethley, C., Kurz, T. B., & Shih, M.
(2010). Effectiveness of a supplemental early reading intervention scaled up in multiple schools.
Exceptional Children,76 (4), 394–416. doi:10.1177/001440291007600402
Denton, C. A., Tolar, T. D., Fletcher, J. M., Barth, A. E., Vaughn, S., & Francis, D. J. (2013).
Effects of tier 3 intervention for students with persistent reading difficulties and characteristics
of inadequate responders. Journal of Educational Psychology,105(3), 633–648. doi:10.1037/
Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot-based method of testing and
adjusting for publication bias in meta-analysis. Biometrics,56 (2), 455–463. doi:10.1111/j.0006-
Elleman, A. M., Lindo, E. J., Morphy, P., & Compton, D. L. (2009). The impact of vocabulary
instruction on passage-level comprehension of school-age children: A meta-analysis. Journal of
Research on Educational Effectiveness,2(1), 1–44. doi:10.1080/19345740802539200
Every Student Succeeds Act of 2015, Pub. L. No. 114–95, § 8101(21)(A), 129 Stat.1939 (2015).
Fang, Z., Fu, D., & Lamme, L. L. (2004). From scripted instruction to teacher empowerment:
Supporting literacy teachers to make pedagogical transitions. Literacy (Formerly Reading),
38(1), 58–64. doi:10.1111/j.0034-0472.2004.03801010.x
Fien, H., Smith, J. L. M., Smolkowski, K., Baker, S. K., Nelson, N. J., & Chaparro, E. (2015). An
examination of the efficacy of a multitiered intervention on early reading outcomes for first
grade students at risk for reading difficulties. Journal of Learning Disabilities,48(6), 602–621.
Foorman, B., Beyler, N., Borradaile, K., Coyne, M., Denton, C. A., Dimino, J., …Wissel, S.
(2016). Foundational skills to support reading for understanding in kindergarten through 3rd
grade (NCEE 2016-4008). Washington, DC: National Center for Education Evaluation and
Regional Assistance (NCEE), Institute of Education Sciences, U.S. Department of Education.
Retrieved from https://ies.ed.gov/ncee/wwc/practiceguide/21
Foorman, B. R., Herrera, S., & Dombek, J. (2018). The relative impact of aligning Tier 2 interven-
tion materials with classroom core reading materials in grades K–2. The Elementary School
Journal,118(3), 477–504. doi:10.1086/696021
Francis, D. J., Kulesz, P. A., & Benoit, J. S. (2018). Extending the simple view of reading to
account for variation within readers and across texts: The complete view of reading (CVR i).
Remedial and Special Education,39(5), 274–288. doi:10.1177/0741932518772904
Fuchs, D., Compton, D. L., Fuchs, L. S., Bryant, J., & Davis, G. N. (2008). Making “secondary
intervention”work in a three-tier responsiveness-to-intervention model: Findings from the
first-grade longitudinal reading study of the National Research Center on Learning Disabilities.
Reading and Writing,21(4), 413–436. doi:10.1007/s11145-007-9083-9
Fuchs, D., & Fuchs, L. S. (2017). Critique of the national evaluation of response to intervention:
A case for simpler frameworks. Exceptional Children,83(3), 255–268. doi:10.1177/
Fuchs, L. S., Fuchs, D., Powell, S. R., Seethaler, P. M., Cirino, P. T., & Fletcher, J. M. (2008).
Intensive intervention for students with math disabilities: Seven principles of effective practice.
Learning Disability Quarterly,31(2), 79–92. doi:10.2307/20528819
Gamse, B. C., Jacob, R. T., Horst, M., Boulay, B., & Unlu, F. (2008). Reading first impact study
final report (NCEE 2009-4038). Washington, DC: National Center for Education Evaluation
and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
Gersten, R., Chard, D., Jayanthi, M., Baker, S., Morphy, P., & Flojo, J. (2009). Mathematics
instruction for students with learning disabilities: A meta-analysis of instructional components.
Review of Educational Research,79(3), 1202–1242. doi:10.3102/0034654309334431
Gersten, R., Compton, D., Connor, C. M., Dimino, J., Santoro, L., Linan-Thompson, S., & Tilly,
W. D. (2009). Assisting students struggling with reading: Response to Intervention and multi-tier
intervention for reading in the primary grades. A practice guide (NCEE 2009-4045).
Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute
of Education Sciences, U.S. Department of Education. Retrieved from http://ies.ed.gov/ncee/
Gersten, R., Jayanthi, M., & Dimino, J. (2017). Too much, too soon? A commentary on what the
national RtI evaluation left unanswered and what reading intervention research tells us.
Exceptional Children,83(3), 244–254. doi:10.1177/0014402917692847
Greco, T., Zangrillo, A., Biondi-Zoccai, G., & Landoni, G. (2013). Meta-analysis: Pitfalls and
hints. Heart, Lung and Vessels,5(4), 219–225.
Gunn, B., Smolkowski, K., Biglan, A., Black, C., & Blair, J. (2005). Fostering the development of
reading skill through supplemental instruction results for Hispanic and non-Hispanic students.
The Journal of Special Education,39(2), 66–85. doi:10.1177/00224669050390020301
Hedberg, E. C. (2011). ROBUMETA: Stata module to perform robust variance estimation in meta-
regression with dependent effect size estimates. Boston, MA: Boston College.
Hedges, L. V. (1981). Distribution theory for Glass’s estimator of effect size and related estima-
tors. Journal of Educational Statistics,6(2), 107–128. doi:10.3102/10769986006002107
Hedges, L. V., Tipton, E., & Johnson, M. C. (2010). Robust variance estimation in meta-regres-
sion with dependent effect size estimates. Research Synthesis Methods,1(1), 39–65. doi:10.1002/
Imbens, G. W., & Lemieux, T. (2008). Regression discontinuity designs: A guide to practice.
Journal of Econometrics,142(2), 615–635. doi:10.1016/j.jeconom.2007.05.001
Individuals with Disabilities Education Act, Pub. L. No. 108-446, 20 U.S.C. § 1400, 118 Stat. 2649 (2004).
Ioannidis, J. (2005). Why most published research findings are false. PLoS Medicine,2(8), e124.
Jacob, R., Armstrong, C., Bowden, A. B., & Pan, Y. (2016). Leveraging volunteers: An experimen-
tal evaluation of a tutoring program for struggling readers. Journal of Research on Educational
Effectiveness,9(Supp 1), 67–92. doi:10.1080/19345747.2016.1138560
Jenkins, J. R., Peyton, J. A., Sanders, E. A., & Vadasy, P. F. (2004). Effects of reading decodable
texts in supplemental first-grade tutoring. Scientific Studies of Reading,8(1), 53–85. doi:10.
Lane, H. B., Pullen, P. C., Hudson, R. F., & Konold, T. R. (2009). Identifying essential instruc-
tional components of literacy tutoring for struggling beginning readers. Literacy Research and
Instruction,48(4), 277–297. doi:10.1080/19388070902875173
López-López, J. A., Van den Noortgate, W., Tanner-Smith, E. E., Wilson, S. J., & Lipsey, M. W.
(2017). Assessing meta-regression methods for examining moderator relationships with
dependent effect sizes: A Monte Carlo simulation. Research Synthesis Methods,8(4), 435–450.
May, H., Sirinides, P., Gray, A., & Goldsworthy, H. (2016). Reading recovery: An evaluation of the
four-year i3 scale-up. Philadelphia, PA: Consortium for Policy Research in Education,
University of Pennsylvania.
National Institute of Child Health and Human Development [NICHD]. (2000). Report of the
National Reading Panel. Teaching children to read: Reports of the subgroups (NIH Publication
No. 00-4754). Washington, DC: U.S. Department of Health and Human Services. Retrieved
No Child Left Behind Act of 2001 [NCLB], Pub. L. No. 107-110, § 1201, 115 Stat. 1425 (2002).
O’Connor, R. E., Swanson, H. L., & Geraghty, C. (2010). Improvement in reading rate under
independent and difficult text levels: Influences on word and comprehension skills. Journal of
Educational Psychology,102(1), 1–19. doi:10.1037/a0017488
Pullen, P. C., Lane, H. B., & Monaghan, M. C. (2004). Effects of a volunteer tutoring model on
the early literacy development of struggling first grade students. Reading Research and
Instruction,43(4), 21–40. doi:10.1080/19388070409558415
Scanlon, D. M., Vellutino, F. R., Small, S. G., Fanuele, D. P., & Sweeney, J. M. (2005). Severe
reading difficulties–Can they be prevented? A comparison of prevention and intervention
approaches. Exceptionality,13(4), 209–227. doi:10.1207/s15327035ex1304_3
Schwartz, R. M. (2005). Literacy learning of at-risk first-grade students in the reading recovery
early intervention. Journal of Educational Psychology,97(2), 257–267. doi:10.1037/0022-0663.97.
Slavin, R. E., Lake, C., Davis, S., & Madden, N. A. (2011). Effective programs for struggling read-
ers: A best-evidence synthesis. Educational Research Review,6(1), 1–26. doi:10.1016/j.edurev.
Smith, J. L. M., Nelson, N. J., Fien, H., Smolkowski, K., Kosty, D., & Baker, S. K. (2016).
Examining the efficacy of a multitiered intervention for at-risk readers in grade 1. The
Elementary School Journal,116 (4), 549–573. doi:10.1086/686249
StataCorp. (2015). Stata statistical software (Release 14). College Station, TX: StataCorp LP.
Stuebing, K. K., Barth, A. E., Trahan, L. H., Reddy, R. R., Miciak, J., & Fletcher, J. M. (2015). Are
child cognitive characteristics strong predictors of responses to intervention? A meta-analysis.
Review of Educational Research,85(3), 395–429. doi:10.3102/0034654314555996
Swanson, H. L. (1999). Reading research for students with LD: A meta-analysis of intervention
outcomes. Journal of Learning Disabilities,32(6), 504–532. doi:10.1177/002221949903200605
Tanner-Smith, E. E., & Tipton, E. (2014). Robust variance estimation with dependent effect sizes:
Practical considerations including a software tutorial in Stata and SPSS. Research Synthesis
Methods,5(1), 13–30. doi:10.1002/jrsm.1091
Tipton, E. (2015). Small sample adjustments for robust variance estimation with meta-regression.
Psychological Methods,20(3), 375–393. doi:10.1037/met0000011
Tivnan, T., & Hemphill, L. (2005). Comparing four literacy reform models in high-poverty
schools: Patterns of first-grade achievement. The Elementary School Journal,105(5), 419–441.
Tran, L., Sanchez, T., Arellano, B., & Swanson, H. L. (2011). A meta-analysis of the RTI literature
for children at risk for reading disabilities. Journal of Learning Disabilities,44(3), 283–295. doi:
U.S. Department of Education [U.S. ED], Institute of Education Sciences [IES], & What Works
Clearinghouse [WWC]. (2013). What Works Clearinghouse: Procedures and standards handbook
(Version 3.0). Retrieved from https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_proce-
Vadasy, P. F., & Sanders, E. A. (2011). Efficacy of supplemental phonics-based instruction for
low-skilled first graders: How language minority status and pretest characteristics moderate
treatment response. Scientific Studies of Reading,15(6), 471–497. doi:10.1080/10888438.2010.
Vadasy, P. F., Sanders, E. A., & Peyton, J. A. (2006). Paraeducator-supplemented instruction in
structural analysis with text reading practice for second and third graders at risk for reading
problems. Remedial and Special Education,27(6), 365–378. doi:10.1177/07419325060270060601
Vadasy, P. F., Sanders, E. A., & Tudor, S. (2007). Effectiveness of paraeducator-supplemented
individual instruction: Beyond basic decoding skills. Journal of Learning Disabilities,40(6),
Vaughn, S., Cirino, P. T., Tolar, T., Fletcher, J. M., Cardenas-Hagan, E., Carlson, C. D., &
Francis, D. J. (2008). Long-term follow-up of Spanish and English interventions for first-grade
English language learners at risk for reading problems. Journal of Research on Educational
Effectiveness,1(3), 179–214. doi:10.1080/19345740802114749
Vellutino, F. R., & Scanlon, D. M. (2002). The Interactive Strategies approach to reading interven-
tion. Contemporary Educational Psychology,27(4), 573–635. doi:10.1016/S0361-476X(02)00002-4
Wang, C., & Algozzine, B. (2008). Effects of targeted intervention on early literacy skills of at-risk
students. Journal of Research in Childhood Education,22(4), 425–439. doi:10.1080/
Wanzek, J., Stevens, E. A., Williams, K. J., Scammacca, N., Vaughn, S., & Sargent, K. (2018).
Current evidence on the effects of intensive early reading interventions. Journal of Learning
Disabilities,51(6), 612–624. doi:10.1177/0022219418775110
Wanzek, J., & Vaughn, S. (2007). Research-based implications from extensive early reading inter-
ventions. School Psychology Review,36 (4), 541–561.
Wanzek, J., & Vaughn, S. (2008). Response to varying amounts of time in reading intervention
for students with low response to intervention. Journal of Learning Disabilities,41(2), 126–142.
Wanzek, J., Vaughn, S., Scammacca, N., Gatlin, B., Walker, M. A., & Capin, P. (2016). Meta-anal-
yses of the effects of tier 2 type reading interventions in grades K-3. Educational Psychology
Review,28(3), 551–576. doi:10.1007/s10648-015-9321-7
Wilson, P., Martens, P., & Arya, P. (2005). Accountability for reading and readers: What the
numbers don’t tell. The Reading Teacher,58(7), 622–631. doi:10.1598/RT.58.7.3