Article

Curriculum-Based Measurement of Oral Reading: Evaluation of Growth Estimates Derived With Pre-Post Assessment Methods

Abstract

Curriculum-based measurement of oral reading (CBM-R) is used to index the level and rate of student growth across the academic year. The method is frequently used to set student goals and monitor student progress. This study examined the diagnostic accuracy and quality of growth estimates derived from pre-post measurement of CBM-R data. A linear mixed effects regression model was used to simulate progress-monitoring data for multiple levels of progress-monitoring duration (6, 8, 10, ⋯, 20 weeks) and data set quality, operationalized as the residual error in the model (σε = 5, 10, 15, and 20). Results indicate that the duration of instruction, the quality of the data, and the method used to estimate growth influenced the reliability and precision of estimated growth rates, as well as diagnostic accuracy. Pre-post methods for deriving CBM-R growth estimates are likely to require 14 or more weeks of instruction between the pre and post occasions. Implications and future directions are discussed.
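To make the simulation design concrete, here is a minimal sketch of the general approach: CBM-R scores are generated from a linear growth model with person-specific intercepts and slopes plus residual error, and growth is estimated pre-post as (post − pre) / weeks. The population parameters (a mean intercept of 50 WCPM, mean growth of 1.5 WCPM per week, and their spreads) are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_prepost(n_students=1000, weeks=14, sigma_e=10.0,
                     mean_intercept=50.0, sd_intercept=15.0,
                     mean_slope=1.5, sd_slope=0.5):
    """Return true weekly slopes and their pre-post estimates."""
    intercepts = rng.normal(mean_intercept, sd_intercept, n_students)
    true_slopes = rng.normal(mean_slope, sd_slope, n_students)
    # One probe at week 0 (pre) and one at the final week (post),
    # each perturbed by residual error sigma_e.
    pre = intercepts + rng.normal(0, sigma_e, n_students)
    post = intercepts + true_slopes * weeks + rng.normal(0, sigma_e, n_students)
    return true_slopes, (post - pre) / weeks

for weeks in (6, 10, 14, 20):
    true, est = simulate_prepost(weeks=weeks, sigma_e=10.0)
    r2 = np.corrcoef(true, est)[0, 1] ** 2  # reliability as squared correlation
    print(f"{weeks:2d} weeks: pre-post growth reliability = {r2:.2f}")
```

Under this sketch, reliability improves with duration, consistent with the abstract's conclusion that pre-post estimates require longer monitoring windows.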
... Regardless of the method, existing research demonstrates that multiple factors influence the accuracy of RtI decisions. The rate at which students improve, the duration of progress monitoring, and the quantity and quality of the data points all appear to be relevant for decisions about students' response to intervention (Christ, Monaghen, Zopluoglu, & Van Norman, 2013;Van Norman & Christ, 2016). Further, the manner in which goals are set for students is also likely to impact decision accuracy. ...
... A subset of students in the current study met exit criteria for the program prior to the end of the 2014-2015 school year (approximately 49%). Although empirical guidance for decision-making with progress monitoring data exists (e.g., Christ et al., 2013;Van Norman & Christ, 2016), that research is primarily concerned with whether a student is responding to an intervention, not necessarily when the intervention should be discontinued. Thus, the program adopted exit criteria that were relatively consistent with broad-based recommendations (e.g., using a goal line based on grade-level benchmarks) and adapted them to increase confidence in long-term outcomes (e.g., requiring at least two scores above the next benchmark). ...
... Previous research on decision-making frameworks provides an empirically-based reference for practitioners tasked with deciding what constitutes a response to intervention (Burns et al., 2010;Christ et al., 2013;Maki et al., 2017). Those guidelines help educators decide if an intervention is working and help to increase the confidence behind decisions to increase resources or refer for a full psychoeducational evaluation (VanDerHeyden et al., 2007). ...
Article
Full-text available
The current study examined reading skills at two distal time-points for 6828 students who received support from a tier II reading intervention program in the 2015 and 2016 school years. The first follow-up assessment occurred at the end of the year in which intervention was provided and the second assessment occurred at the beginning of the next year. Multilevel models were fit to the data to predict the log odds that a student would meet spring and fall reading benchmarks depending on a variety of student- and school-level predictors. Of most interest was the probability of future success as a function of whether students met intervention exit criteria, defined as consistent grade-level performance on a progress monitoring measure. Meeting intervention exit criteria was a statistically and practically significant predictor of scoring above the spring and fall benchmark the following school year. Yet despite improved outcomes relative to students not exited from the intervention, many students who met exit criteria due to grade-level performance failed to meet spring and fall benchmarks. The proportion of students meeting state-defined proficiency criteria, duration of intervention, and proportion of students receiving free or reduced lunch at the school-level did not influence the association between meeting exit criteria and scoring above benchmark at either screening period. Results suggest that future research is needed to evaluate and guide “downward movement” in an RtI model (i.e., ensuring gains made during tier II intervention are maintained after that support is removed).
... Hence, issues related to the reliability of progress monitoring slopes, such as schedule and duration (i.e., number of occasions per week and overall number of weeks of data collection) or dataset quality (as operationalized by the amount of residual variance in growth models), have been extensively examined in simulation studies (Christ et al., 2012; Christ, Monaghen, et al., 2013; Van Norman, Christ, and Zopluoglu, 2013). One major dependent variable in such simulation studies is the true reliability of slope estimates (i.e., the squared correlation between estimated slopes and their true values). ...
... These studies have shown that acceptable levels of slope reliability (i.e., .70) can only be achieved for data collection durations of at least six or eight weeks (depending further on the schedules; e.g., Christ, Monaghen, et al., 2013), a conclusion that was later backed up with empirical data (Thornblad & Christ, 2014). ...
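The excerpts above describe a standard simulation logic; the sketch below, with illustrative (not study-specific) parameters, fits a per-student OLS slope to weekly series and takes slope reliability as the squared correlation between estimated and true slopes.

```python
import numpy as np

rng = np.random.default_rng(1)

def slope_reliability(n_students=2000, weeks=8, per_week=1, sigma_e=10.0):
    """Squared correlation between true slopes and per-student OLS slopes."""
    t = np.arange(weeks * per_week) / per_week  # occasion times in weeks
    true_slopes = rng.normal(1.5, 0.5, n_students)
    intercepts = rng.normal(50.0, 15.0, n_students)
    est = np.empty(n_students)
    for i in range(n_students):
        y = intercepts[i] + true_slopes[i] * t + rng.normal(0, sigma_e, t.size)
        est[i] = np.polyfit(t, y, 1)[0]  # OLS slope for this student
    return np.corrcoef(true_slopes, est)[0, 1] ** 2

for weeks in (4, 6, 8, 12):
    print(f"{weeks:2d} weeks of weekly probes: r^2 = {slope_reliability(weeks=weeks):.2f}")
```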
Article
Full-text available
Reliable learning progress information is crucial for teachers’ interpretation and data-based decision making in everyday classrooms. Slope estimates obtained from simple regression modeling or more complex latent growth models are typically used in this context as indicators of learning progress. Research on progress monitoring has mainly used two ways to estimate the reliability of learning progress, namely (a) split-half reliability and (b) multilevel reliability. In this work we introduce empirical reliability as another attractive alternative for quantifying the measurement precision of slope estimates (and intercepts) in learning progress monitoring research. Specifically, we extended previous work on slope reliability in two ways: (a) we evaluated in a simulation study how well multilevel reliability and empirical reliability work as estimates of slope reliability, and (b) we sought to better understand the reliability of slopes as a latent variable (by means of empirical reliability) vs. slopes as an observed variable (by means of multilevel reliability). Our simulation study demonstrates that reliability estimation works well over a variety of simulation conditions, while also identifying conditions in which reliability estimation was biased (i.e., with very poor data quality, 8 measurement points, and when empirical reliability was estimated). Furthermore, we employ multilevel reliability and empirical reliability to estimate the reliability of intercepts (i.e., initial level) and slopes for the quop-L2 test. Multilevel and empirical reliability estimates were comparable in size, with only slight advantages for latent variable scores. Future avenues for research and practice are discussed.
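As a point of reference, here is a small sketch assuming the textbook ratio definition of slope reliability (not necessarily the exact estimators evaluated in this article): true slope variance divided by the total of true plus sampling-error variance, with the OLS sampling variance σ²/Σ(t − t̄)² for a given probe schedule.

```python
import numpy as np

def slope_reliability_analytic(tau2_slope, sigma_e, occasions):
    """tau2_slope: between-student slope variance; sigma_e: residual SD;
    occasions: measurement times in weeks."""
    t = np.asarray(occasions, dtype=float)
    sampling_var = sigma_e**2 / np.sum((t - t.mean())**2)  # OLS slope variance
    return tau2_slope / (tau2_slope + sampling_var)

weeks = np.arange(10)  # ten weekly occasions
print(slope_reliability_analytic(tau2_slope=0.25, sigma_e=10.0, occasions=weeks))
```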
... Much more can be said about reliability and validity, but we refer readers to Section 2 of this book for information on this topic. In terms of the number of forms, DBI requires administration weekly, or even more often (see Christ, Monaghen, Zopluoglu, & Van Norman, 2013). So it is critical that teachers have enough different versions at the same level, so students do not take the exact same test repeatedly, as increased exposure to the same items could potentially affect their scores. ...
... Researchers have cautioned that there is considerable error in the point estimation of CBM, with estimates commonly ranging from eight to 11 words correct per minute (WCPM) in the case of oral reading fluency (ORF; Christ, Monaghen, Zopluoglu, & Van Norman, 2012; Christ & Silberglitt, 2007; Poncy, Skinner, & Axtell, 2005). For example, if a second-grade student scored 55 WCPM on a skill condition ORF probe, the 20% threshold would be 66 WCPM for decision-making with BEA. ...
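The 20% threshold in that example is simple arithmetic; a one-line helper (hypothetical, not from the cited work) makes the computation explicit.

```python
def bea_threshold(baseline_wcpm: float, pct: float = 0.20) -> float:
    """Score the contrasting condition must reach to beat baseline by pct."""
    return baseline_wcpm * (1 + pct)

print(bea_threshold(55))  # 66.0 WCPM, matching the excerpt's example
```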
Article
Brief experimental analysis (BEA) is a well-researched approach to conducting problem analysis, in which potential interventions are pilot-tested using a single-subject alternating treatment design. However, its brevity may lead to a high frequency of decision-making errors, particularly in situations where one tested condition is rarely optimal for students (i.e., the base rate). The current study explored the accuracy of a specific variant of BEA, skill vs. performance deficit analysis (SPA), across different variations of the basic BEA design, score difference thresholds, and reading and math curriculum-based measurements (CBMs). Findings indicate that the ABAB design provides reasonable control of such error rates when using reading CBM, whereas subtraction CBM required the use of an ABABAB design. Such error rates could not be controlled, regardless of design, when using multiplication CBM. Implications for best practice in the use of BEA are discussed.
... Findings revealed that reliable and valid estimates of growth for making low-stakes decisions were possible when monitoring once monthly, but only after 2-3 months using a very good (SEE = 5) passage set and more than 4 months with a good (SEE = 10) passage set. Similar results were observed in a related simulation study when data were collected using a pre-post schedule (Christ, Monaghen, Zopluoglu, & Van Norman, 2012). Despite the encouraging findings, the simulated nature of the data used in the studies limits their external validity. ...
Article
The present study examined the utility of two progress monitoring assessment schedules (bimonthly and monthly) as alternatives to monitoring once weekly with curriculum-based measurement in reading (CBM-R). General education students (N = 93) in Grades 2-4 who were at risk for reading difficulties but not yet receiving special education services had their progress monitored via three assessment schedules across 1 academic year. Four mixed-factorial analyses of variance tested the effect of progress monitoring schedule (weekly, bimonthly, monthly), grade (2, 3, and 4), and the interaction effect between schedule and grade on four progress monitoring outcomes: intercept, slope, standard error of the estimate, and standard error of the slope. Results indicated that (a) progress monitoring schedule significantly predicted each outcome, (b) grade predicted each progress monitoring outcome except the standard error of the slope, and (c) the effect of schedule on each outcome did not depend on students' grade levels. Overall, findings from this study reveal that collecting CBM-R data less frequently than weekly may be a viable option for educators monitoring the progress of students in Grades 2-4 who are at risk for reading difficulties.
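A hedged sketch of the schedule comparison described above: the same 36-week growth series is sampled weekly, bimonthly (every other week), and monthly (every 4 weeks), and the standard error of the OLS slope is computed for each schedule. The parameters and the 36-week span are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(7)

def slope_se(times, sigma_e=8.0, slope=1.0, intercept=40.0):
    """Standard error of the OLS slope for one simulated CBM-R series."""
    t = np.asarray(times, dtype=float)
    y = intercept + slope * t + rng.normal(0, sigma_e, t.size)
    b, a = np.polyfit(t, y, 1)
    resid = y - (a + b * t)
    see = np.sqrt(resid @ resid / (t.size - 2))      # residual SD (SEE)
    return see / np.sqrt(np.sum((t - t.mean())**2))  # SE of the slope

schedules = {"weekly": np.arange(0, 36, 1),
             "bimonthly": np.arange(0, 36, 2),   # every other week
             "monthly": np.arange(0, 36, 4)}
for name, times in schedules.items():
    print(f"{name:9s}: SE(slope) ~ {slope_se(times):.3f} WCPM/week")
```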
... This is particularly important in the context of learning progress monitoring because the tests are sensitive not only to changes in performance but also to specific conditions of the particular testing situation (e.g., testing location, test administrator), which are only partially controllable in the case of progress monitoring (Christ, Monaghen, Zopluoglu & Van Norman, 2012). ...
Article
Full-text available
In this brief report, we describe challenges in norming instruments for learning progress monitoring that do not arise in this form for status assessment. These concern in particular the question of whether norms are needed for regular instruction or for intensive intervention, but also the variability of learning gains depending on the competency assessed, the measurement instrument used, the assessment period, and particular student characteristics. Moreover, in contrast to one-time testing, learning progressions have the statistical peculiarity that the size of the confidence intervals for learning gains depends on the number of available measurements. Based on an analysis of these challenges, we propose design features and analysis steps for norming learning progress monitoring instruments.
Article
Curriculum-based measurement (CBM) for oral passage reading (OPR) is among the most commonly used tools for making screening decisions regarding academic proficiency status for students in first through sixth grades. Multiple publishers make OPR tools available, and while they are designed to measure the same broad construct of reading, research suggests that student performance varies within grades and across publishers. Despite the existence of multiple publishers of CBM tools for OPR, many of which include publisher-specific recommendations comparing student performance to a proficiency standard, the use of normative-based cut scores to interpret student performance remains prevalent. In the current study, three commercially available CBM tools for OPR were administered to 1,482 students in first through sixth grade. Results suggest differences between normative- and criterion-based approaches to determining cut scores for screening decisions. Implications regarding resource allocation for students in need of additional intervention are discussed.
Article
Progress monitoring has been adopted as an integral part of multi-tiered support systems. Oral reading fluency (ORF) is the most established assessment for progress-monitoring purposes. To generate valid trend lines or slopes, ORF passages must be of equivalent difficulty. Recently, however, evidence indicates that ORF passages are not equivalent, potentially hindering our ability to generate valid student trend lines for decision making. This study examines passage and order effects on the estimation of ORF scores using a set of second-grade passages. A single-group design with counterbalancing was employed to randomly assign 156 second-grade students to three different orders of passages. Scores from the passages were examined using growth curve modeling and empirical Bayes estimates. Results indicate that passage effects were substantial, whereas order effects were small but significant. The impact of passage and order effects on research design, equating methods, and measure development is considered.
Article
Full-text available
This study investigated the complexity of leveled passages used in four classroom reading assessments. A total of 167 passages leveled for Grades 1–6 from these assessments were analyzed using four analytical tools of text complexity. More traditional, two-factor measures of text complexity found a general trend of fairly consistent across-grade progression of average complexity among the four assessments. However, considerable cross-assessment variability was observed in terms of the size of increase in complexity from grade to grade, the overall range of complexity, and the within-grade text complexity. These cross-assessment differences were less pronounced with newer, multi-factor analytical tools. The four assessments also differed in the extent to which their passages met the text complexity guidelines of the Common Core State Standards. The authors discuss implications of the differences found among and within the classroom assessment systems, on one hand, and among the measures of text complexity, on the other.
Article
Curriculum-Based Measurement of Oral Reading (CBM-R) is often used to monitor student progress and guide educational decisions. Ordinary least squares regression (OLSR) is the most widely used method to estimate the slope, or rate of improvement (ROI), even though published research demonstrates OLSR’s lack of validity and reliability, and the imprecision of its ROI estimates, especially after brief durations of monitoring (6-10 weeks). This study illustrates and examines the use of Bayesian methods to estimate ROI. Conditions included four progress monitoring durations (6, 8, 10, and 30 weeks), two schedules of data collection (weekly, biweekly), and two ROI growth distributions that broadly corresponded with ROIs for general and special education populations. A Bayesian approach with alternate prior distributions for the ROIs is presented and explored. Results demonstrate that Bayesian estimates of ROI were more precise than OLSR estimates with comparable reliabilities, and Bayesian estimates were consistently within the plausible range of ROIs, in contrast to OLSR, which often produced unrealistic estimates. Results also showcase the influence the priors had on estimated ROIs and the potential dangers of prior distribution misspecification.
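A minimal sketch of the general idea, not the authors' implementation: with a normal prior on the ROI and a normal likelihood for the OLS slope, the posterior mean is a precision-weighted average of prior and data, which keeps short-series estimates inside a plausible range. The prior (mean 1.0, SD 0.5 WCPM per week) is an illustrative assumption.

```python
import numpy as np

def bayes_roi(ols_slope, ols_se, prior_mean=1.0, prior_sd=0.5):
    """Posterior mean and SD of the ROI under a conjugate normal prior."""
    w_prior = 1 / prior_sd**2   # prior precision
    w_data = 1 / ols_se**2      # precision of the OLS estimate
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * ols_slope)
    return post_mean, np.sqrt(post_var)

# A noisy 6-week OLS estimate of -3.2 WCPM/week (implausible for most
# students) is pulled back toward the prior mean of 1.0 WCPM/week:
print(bayes_roi(ols_slope=-3.2, ols_se=2.0))  # ~ (0.75, 0.49)
```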
Article
Full-text available
Reading traditionally is characterized as having two major components, decoding and comprehension. Published reading tests are created using these two components. Reading fluency, a combination of reading speed and accuracy, typically is not measured. Attention to reading fluency has increased through the emerging literature on Curriculum-Based Measurement (CBM), which employs standardized oral reading tests derived from basal readers to make decisions about students' general reading skills. Despite a series of published validation studies, questions about what CBM oral reading fluency measures persist. This study examined the relation of CBM oral reading fluency to the reading process from a theoretical perspective. Reading models were tested using confirmatory factor analysis procedures with 114 third- and 124 fifth-grade students. Subjects were tested on tasks requiring decoding of phonetically regular words and regular nonsense words, literal comprehension, inferential comprehension, cloze items, written retell, and CBM oral reading fluency. For third graders, a unitary model of reading was validated with all measures contributing significantly. For fifth graders, a two-factor model was validated paralleling current conceptions of reading measurement. Regardless of the factor model employed, CBM oral reading fluency provided a good index of reading proficiency, including comprehension.
Article
There are relatively few studies that evaluate the quality of progress monitoring estimates derived from curriculum-based measurement of reading. Those studies that are published provide initial evidence for relatively large magnitudes of standard error relative to the expected magnitude of weekly growth. A major contributor to the observed magnitudes of standard error is the inconsistency of passage difficulty within progress monitoring passage sets. The purpose of the current study was to evaluate and estimate the magnitudes of standard error across an experimental passage set referred to as the Formative Assessment Instrumentation and Procedures for Reading (FAIP-R) and two commercially available passage sets (AIMSweb and Dynamic Indicators of Basic Early Literacy Skills [DIBELS]). Each passage set was administered twice weekly to 68 students. Results indicated significant differences in intercept, weekly growth, and standard error. Estimates of standard error were smallest in magnitude for the FAIP-R passage set followed by the AIMSweb and then DIBELS passage sets. Implications for choosing a progress monitoring passage set and estimating individual student growth are discussed.
Article
The purpose of this article is to illustrate how one well-developed, technically strong measurement system, curriculum-based measurement (CBM), can be used to establish academic growth standards for students with learning disabilities in the area of reading. An introduction to CBM and to the basic concepts underlying the use of CBM in establishing growth standards is provided. Using an existing database accumulated over various localities under typical instructional conditions, the use of CBM to provide growth standards is illustrated. Next, normative growth rates under typical instructional conditions are contrasted with CBM growth rates derived from studies of effective practices. Finally, based on these two data sets, issues and conclusions about appropriate methods for establishing academic growth rates using CBM are discussed.
Article
This paperback edition is a reprint of the 2000 edition. This book provides a comprehensive treatment of linear mixed models for continuous longitudinal data. Next to model formulation, this edition puts major emphasis on exploratory data analysis for all aspects of the model, such as the marginal model, subject-specific profiles, and residual covariance structure. Further, model diagnostics and missing data receive extensive treatment. Sensitivity analysis for incomplete data is given a prominent place. Several variations to the conventional linear mixed model are discussed (a heterogeneity model, conditional linear mixed models). This book will be of interest to applied statisticians and biomedical researchers in industry, public health organizations, contract research organizations, and academia. The book is explanatory rather than mathematically rigorous. Most analyses were done with the MIXED procedure of the SAS software package, and many of its features are clearly elucidated. However, some other commercially available packages are discussed as well. Great care has been taken in presenting the data analyses in a software-independent fashion. Geert Verbeke is Professor in Biostatistics at the Biostatistical Centre of the Katholieke Universiteit Leuven in Belgium. He is Past President of the Belgian Region of the International Biometric Society, a Board Member of the American Statistical Association, and past Joint Editor of the Journal of the Royal Statistical Society, Series A (2005-2008). He is the director of the Leuven Center for Biostatistics and statistical Bioinformatics (L-BioStat), and vice-director of the Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat), a joint initiative of the Hasselt and Leuven universities in Belgium. Geert Molenberghs is Professor of Biostatistics at Universiteit Hasselt and Katholieke Universiteit Leuven in Belgium. He was Joint Editor of Applied Statistics (2001-2004) and Co-Editor of Biometrics (2007-2009). He was President of the International Biometric Society (2004-2005), and has received the Guy Medal in Bronze from the Royal Statistical Society and the Myrto Lefkopoulou award from the Harvard School of Public Health. He is founding director of the Center for Statistics and also the director of the Interuniversity Institute for Biostatistics and statistical Bioinformatics. Both authors received the American Statistical Association's Excellence in Continuing Education Award in 2002, 2004, 2005, and 2008. Both are elected Fellows of the American Statistical Association and elected members of the International Statistical Institute.
Article
Curriculum-based measurement of oral reading fluency (CBM-R) is an established procedure used to index the level and trend of student growth. A substantial literature base exists regarding best practices in the administration and interpretation of CBM-R; however, research has yet to adequately address the potential influence of measurement error. This study extends results of Hintze and Christ (2004) by incorporating research-based estimates of the standard error of the estimate (SEE) to generate likely magnitudes for the standard error of the slope (SEb) across a variety of progress monitoring durations and measurement conditions. Fourteen progress monitoring durations (2-15 weeks) and nine levels of SEE (2, 4, 6, 8, 10, 12, 14, 16, 18) were used to derive SEb. The outcomes are discussed in relation to assessment practices, such as selecting optimal progress monitoring durations to reduce measurement error. Implications and limitations are discussed.
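The abstract's computation can be sketched with the standard OLS identity SE_b = SEE / sqrt(Σ(t − t̄)²); whether this matches the study's exact time coding (probes per week, starting week) is an assumption.

```python
import numpy as np

def se_slope(see: float, weeks: int) -> float:
    """SE of the OLS slope for weekly probes at t = 0, 1, ..., weeks - 1."""
    t = np.arange(weeks, dtype=float)
    return see / np.sqrt(np.sum((t - t.mean())**2))

for see in (2, 10, 18):
    row = ", ".join(f"{w}wk={se_slope(see, w):.2f}" for w in (2, 5, 10, 15))
    print(f"SEE={see:2d}: {row}")
```

As the output shows, SE_b shrinks rapidly as the monitoring duration grows, which is why longer durations reduce measurement error in the slope.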
Article
This study examined the educational effects of repeated curriculum-based measurement and evaluation. Thirty-nine special educators, each having three to four pupils in the study, were assigned randomly to a repeated curriculum-based measurement/evaluation (experimental) treatment or a conventional special education evaluation (contrast) treatment. Over the 18-week implementation, pedagogical decisions were surveyed twice; instructional structure was observed and measured three times; students' knowledge about their learning was assessed during a final interview; reading achievement was tested before and after treatment. Analyses of covariance revealed that experimental teachers effected greater student achievement. Additional analyses indicated that (a) experimental teachers' decisions reflected greater realism about and responsiveness to student progress, (b) their instructional structure demonstrated greater increases, and (c) their students were more aware of goals and progress.
Article
Curriculum-based measurement (CBM) is an approach for assessing the growth of students in basic skills that originated uniquely in special education. A substantial research literature has developed to demonstrate that CBM can be used effectively to gather student performance data to support a wide range of educational decisions. Those decisions include screening to identify students at risk, evaluating prereferral interventions, determining eligibility for and placement in remedial and special education programs, formatively evaluating instruction, and evaluating reintegration and inclusion of students in mainstream programs. Beyond those fundamental uses of CBM, recent research has been conducted on using CBM to predict success in high-stakes assessment, to measure growth in content areas in secondary school programs, and to assess growth in early childhood programs. In this article, best practices in CBM are described and empirical support for those practices is identified. Illustrations of the successful uses of CBM to improve educational decision making are provided.