Article

Curriculum-Based Measurement of Oral Reading: Multi-study evaluation of schedule, duration, and dataset quality on progress monitoring outcomes


Abstract

Curriculum-Based Measurement of Oral Reading (CBM-R) is used to collect time series data, estimate the rate of student achievement, and evaluate program effectiveness. A series of five studies was carried out to evaluate the validity, reliability, precision, and diagnostic accuracy of progress monitoring across a variety of progress monitoring durations, schedules, and dataset quality conditions. A sixth study evaluated the relation between the various conditions of progress monitoring (duration, schedule, and dataset quality) and the precision of weekly growth estimates. Model parameters were derived from a large extant progress monitoring dataset of second-grade (n = 1,517) and third-grade (n = 1,561) students receiving supplemental reading intervention as part of a Tier II response-to-intervention program. A linear mixed effects regression model was used to simulate true and observed CBM-R progress monitoring data. The validity and reliability of growth estimates were evaluated with squared correlations between true and observed scores along with split-half reliabilities of observed scores. The precision of growth estimates was evaluated with the root mean square error between true and observed estimates of growth. Finally, receiver operating characteristic curves were used to evaluate diagnostic accuracy and optimize decision thresholds. Results are interpreted to guide progress monitoring practices and inform future research.
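To make the simulation described above concrete, here is a minimal sketch of the generative logic and two of the evaluation metrics (squared correlation and RMSE between true and observed weekly growth). All parameter values in the sketch (mean intercept and slope, their SDs, and the residual SEE) are hypothetical placeholders; the study's actual parameters were derived from the extant Tier II dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
n_students, n_weeks = 1000, 14

# True growth model: person-specific intercepts and weekly slopes (values assumed).
true_int = rng.normal(100.0, 25.0, n_students)   # WRCM at week 0
true_slope = rng.normal(1.5, 0.6, n_students)    # WRCM gained per week
see = 10.0                                       # residual SD, a "good quality" dataset

weeks = np.arange(n_weeks)
true_scores = true_int[:, None] + true_slope[:, None] * weeks
observed = true_scores + rng.normal(0.0, see, true_scores.shape)

# OLS slope for each student's observed time series.
obs_slope = np.polyfit(weeks, observed.T, 1)[0]

# Validity: squared correlation between true and observed slopes.
r2 = np.corrcoef(true_slope, obs_slope)[0, 1] ** 2
# Precision: root mean square error of weekly growth estimates.
rmse = np.sqrt(np.mean((obs_slope - true_slope) ** 2))
print(f"r^2 = {r2:.2f}, RMSE = {rmse:.2f} WRCM/week")
```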


... Decoding is a crucial skill for literacy and for the consolidation of fluent reading and, consequently, reading comprehension [16][17][18]. The assessment of decoding through reading aloud is currently the most frequently used measure to monitor the acquisition and progress of the skill, both in school assessments and in verifying the effectiveness of intervention programs [19][20][21]. In addition, the results of the oral decoding assessment are an important predictor of the reading performance of the individual. ...
... In addition, the results of the oral decoding assessment are an important predictor of the reading performance of the individual. In the United States, oral reading assessment measures are analyzed by the Federal Education Department to monitor the academic development and to develop stimulation and/or intervention programs [19]. The type of material used for the evaluation must be adequate to the objective that has been set, as the results differ according to measures, such as isolated words or texts [19][20][21]. ...
... In the United States, oral reading assessment measures are analyzed by the Federal Education Department to monitor the academic development and to develop stimulation and/or intervention programs [19]. The type of material used for the evaluation must be adequate to the objective that has been set, as the results differ according to measures, such as isolated words or texts [19][20][21]. The oral reading of isolated words is the most frequently used task to assess the individual's proficiency in decoding [22]. ...
Article
Full-text available
Decoding skills are crucial for literacy development and they tend to be acquired early in transparent languages, such as Brazilian Portuguese. It is essential to better understand which variables may affect the decoding process. In this study, we investigated the processes of decoding as a function of the age of children who are exposed to a transparent language. To this end, we examined the effects of grade, stimulus type, and stimulus length on the decoding accuracy of children between the ages of six and 10 years who are monolingual speakers of Brazilian Portuguese. The study included 250 children, enrolled from the first to the fifth grade. A list of words and pseudowords of variable length was created, based on Brazilian Portuguese structure. Children were assessed using the computer program E-prime®, which presented the stimuli on the screen in a random order; children were instructed to read them. The results indicate two important moments for decoding: the acquisition and the mastery of decoding skills. Additionally, the results highlight an important effect of stimulus length and type and how they interact with school progress. Moreover, the data indicate the multifactorial nature of decoding acquisition and the different interactions between variables that can influence this process. We discuss its medium- and long-term implications and possible individual and collective actions that can improve this process.
... Perhaps most often, researchers use the split-half odd-even method to estimate the reliability of student growth estimates (VanDerHeyden and Burns, 2008; Christ et al., 2013b; Van Norman et al., 2013). This method requires measurement timepoints to be split into odd and even subsets. ...
... Among other outcomes, previous simulation studies typically focus on true reliability as well as estimated split-half reliability (Christ et al., 2012, 2013b) and, thus, split-half reliability is the only method for which we know how well it works. The match between estimated split-half reliability and true reliability decreased as a function of the number of measurement timepoints as well as data quality (operationalized as the amount of residual variance). ...
... Yet, given that uncorrected split-half reliability refers to the reliability of slopes based on only half the timepoints, this is not surprising. True reliability of OLS slopes has also been quantified in simulation studies on learning growth in the context of curriculum-based measurement as the squared correlation between estimated and true learning growth (Christ et al., 2012, 2013b). However, these studies did not estimate multilevel reliability. ...
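The odd-even split-half procedure described in these excerpts can be sketched as follows, assuming an observed-score matrix of students by weekly timepoints (for example, the `observed` array from the simulation sketch above); the Spearman-Brown step corrects for each half-series containing only half the timepoints.

```python
import numpy as np

def split_half_slope_reliability(observed):
    """observed: (n_students, n_weeks) matrix of progress monitoring scores."""
    n_weeks = observed.shape[1]
    t = np.arange(n_weeks)
    odd, even = t[::2], t[1::2]
    # OLS slopes estimated separately from the odd and even timepoints.
    s_odd = np.polyfit(odd, observed[:, odd].T, 1)[0]
    s_even = np.polyfit(even, observed[:, even].T, 1)[0]
    r = np.corrcoef(s_odd, s_even)[0, 1]
    # Spearman-Brown correction: each half uses only half the timepoints.
    return 2 * r / (1 + r)
```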
Article
Full-text available
Reliable learning progress information is crucial for teachers’ interpretation and data-based decision making in everyday classrooms. Slope estimates obtained from simple regression modeling or more complex latent growth models are typically used in this context as indicators of learning progress. Research on progress monitoring has used mainly two ways to estimate reliability of learning progress, namely a) split-half reliability and b) multilevel reliability. In this work we introduce empirical reliability as another attractive alternative to quantify measurement precision of slope estimates (and intercepts) in learning progress monitoring research. Specifically, we extended previous work on slope reliability in two ways: a) We evaluated in a simulation study how well multilevel reliability and empirical reliability work as estimates of slope reliability, and b) we wanted to better understand reliability of slopes as a latent variable (by means of empirical reliability) vs. slopes as an observed variable (by means of multilevel reliability). Our simulation study demonstrates that reliability estimation works well over a variety of different simulation conditions, while at the same time conditions were identified in which reliability estimation was biased (i.e., with very poor data quality, 8 measurement points, and when empirical reliability was estimated). Furthermore, we employ multilevel reliability and empirical reliability to estimate reliability of intercepts (i.e., initial level) and slopes for the quop-L2 test. Multilevel and empirical reliability estimates were comparable in size with only slight advantages for latent variable scores. Future avenues for research and practice are discussed.
... However, based on an estimate of the SEM of such a probe (11.29 wcpm; Dynamic Measurement Group, 2013), an 80% confidence interval for this datum extends to 69 wcpm. Importantly, the SEM of CBM varies by grade, as well as by CBM type (i.e., math or reading; Christ & Silberglitt, 2007; Christ, Zopluoglu, Monaghen, & Van Norman, 2013; Seethaler & Fuchs, 2011; Shapiro, Edwards, & Zigmond, 2005), suggesting there is no fixed precision that should be assumed across BEAs. ...
... Table 1, assuming student scores in all conditions followed a normal distribution, where X ~ N(μ = X̄, σ = SEM). This assumption is common in the CBM literature (Christ & Silberglitt, 2007; Christ et al., 2013; Van Norman & Christ, 2016). Here, X̄ equals the mean reported score, given the type of CBM and grade, and the SEM is equal to the SD of the sampling distribution (see Table 1) for one administration. ...
... As the current results suggest, it is plausible for the SEM to be high enough such that the FDR and/or FNR may render BEA invalid, regardless of implementation design. These results fit within a history of research highlighting the consequences of probe variability on decision making (Christ & Silberglitt, 2007; Christ et al., 2013; Poncy et al., 2005). ...
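The interval logic referenced in these excerpts, treating a single observed score as X ~ N(μ = observed score, σ = SEM), can be sketched as below. The score value is hypothetical; the SEM of 11.29 wcpm is the estimate cited in the excerpt.

```python
from scipy import stats

observed_score = 54.5   # hypothetical wcpm from a single probe
sem = 11.29             # SEM cited above (Dynamic Measurement Group, 2013)

z = stats.norm.ppf(0.90)            # two-sided 80% interval
lo, hi = observed_score - z * sem, observed_score + z * sem
print(f"80% CI: [{lo:.1f}, {hi:.1f}] wcpm")
```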
Article
Brief experimental analysis (BEA) is a well-researched approach to conducting problem analysis, where potential interventions are pilot-tested using a single-subject alternating treatment design. However, its brevity may lead to a high frequency of decision-making errors, particularly in situations where one tested condition is rarely optimal for students (i.e., the base rate). The current study explored the accuracy of a specific variant of BEA, skill vs. performance deficit analysis (SPA), across different variations of the basic BEA design, score difference thresholds, and reading and math curriculum-based measurements (CBMs). Findings indicate that the ABAB design provides a reasonable control of such error rates when using reading CBM, whereas subtraction CBM required the use of an ABABAB design. Such error rates could not be controlled, regardless of design, when using multiplication CBM. Implications for best practice in the use of BEA are discussed.
... Unfortunately, evaluating student progress is extremely complex. In addition to identifying a suitable measure, conflicting recommendations have emerged regarding, among other things, optimal methods to summarize performance (Parker, Vannest, Davis, & Clemens, 2012; Shinn, Good, & Stein, 1989), methods to account for errors in time series data (Albano & Rodriguez, 2012), and the minimum duration and frequency at which data ought to be collected (Christ, Zopluoglu, Monaghen, & Van Norman, 2013; Jenkins, Graff, & Miglioretti, 2009). Further, the most promising recommendations generally outpace the capacity of practitioners to carry them out (e.g., Mercer, Lyons, Johnston, & Millhoff, 2015; Solomon & Forsberg, 2017). ...
... Initial research on CBM-R decision rules evaluated the accuracy of decisions in conjunction with a variety of other implementation characteristics. For example, Christ et al. (2013) used simulation methodology to evaluate the impact of data collection duration (i.e., number of weeks progress was monitored), frequency of data collection (number of observations collected per week), and measurement error on progress monitoring outcomes. The researchers concluded that when data were collected under typical conditions with instruments often used in practice, or when measurement error resembled what is observed in applied practice, low-stakes, or easily reversible, decisions were feasible after 12-14 weeks when one observation was collected once a week; after 10 weeks when one observation was collected three times per week; and after 12-13 weeks when three observations were collected once a month using an ordinary least squares (OLS) trend line rule. ...
... Three observed scores were generated at Week 1 to enable baseline estimation to create goal lines. Based upon previous simulations (e.g., Christ et al., 2013), observations collected a week or more apart were assumed to be uncorrelated across time. However, error terms associated with observations collected on the same day, to estimate baseline, were assumed to be correlated. ...
Article
Full-text available
School psychologists regularly use decision rules to interpret student response to intervention in reading. Recent research suggests that the accuracy of those decision rules depends on the duration of progress monitoring, the number of observations available, and the amount of measurement error present. In this study, we extended existing research to evaluate the influence of a student's initial level of performance, goal line type, and decision rule type on the accuracy of interpretations of progress monitoring data. Normative goal lines performed best for students scoring far below benchmark at the beginning of the year, while goal lines based upon a spring benchmark score were appropriate for students performing just below expectations at the beginning of the year. The data point rule performed poorly across all progress monitoring conditions, while comparing the median of the 3 most recent observations to a goal line performed similarly to the trend line rule.
... For instance, the SEb value can be used to create a confidence interval around the slope (Christ, 2006); smaller confidence intervals equate to greater confidence in the progress monitoring decisions that are made. Alternatively, one can examine the relation between slope and SEb (Fuchs & Fuchs, 2005), which can be characterized as a ratio of the signal (i.e., slope) to noise (i.e., error; Christ, Zopluoglu, Monaghen, & Van Norman, 2013). Although no research-based guidelines are available, the magnitude of the ratio should be large, as a ratio of 1 would mean that the estimate of growth is as large as the error associated with that growth. ...
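A sketch of the slope-to-SEb "signal to noise" ratio described above, computed with standard OLS formulas on a hypothetical 10-week CBM-R series:

```python
import numpy as np

weeks = np.arange(10)
scores = np.array([42, 45, 43, 49, 50, 48, 55, 53, 58, 60], dtype=float)  # hypothetical wcpm

slope, intercept = np.polyfit(weeks, scores, 1)
resid = scores - (intercept + slope * weeks)
see = np.sqrt(np.sum(resid**2) / (len(weeks) - 2))        # standard error of estimate
seb = see / np.sqrt(np.sum((weeks - weeks.mean())**2))    # standard error of the slope

print(f"slope = {slope:.2f}, SEb = {seb:.2f}, ratio = {slope / seb:.1f}")
```

Because the denominator of SEb shrinks as the monitoring window lengthens and observations accumulate, the ratio grows with duration, which is consistent with the duration findings cited throughout this page.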
... For example, January et al. (2018) found that, in comparison with a once-weekly schedule, a bimonthly schedule was more appropriate than monthly progress monitoring when data were collected across 30 weeks. Although a few other studies have examined the potential utility of monthly monitoring (e.g., Christ et al., 2013; Mercer & Keller-Margulis, 2015), little research exists investigating the technical characteristics and utility of monitoring more frequently than once weekly. In consideration of time and resources, once-weekly progress monitoring may be desirable; however, it is important that the frequency and density of monitoring are balanced with obtaining technically adequate data. ...
... than weekly. In the first study, Christ et al. (2013) used simulated data to investigate the technical characteristics of a variety of progress monitoring schedules. Relevant to the current study was their examination of once per week (with 3 probes), twice weekly (with 3 probes), three times per week (with 1 probe), and daily (with 1 probe) schedules across 20 weeks. ...
Article
School-based professionals often use curriculum-based measurement of reading (CBM-R) to monitor the progress of students with reading difficulties. Much of the extant CBM-R progress monitoring research has focused on its use for making group-level decisions, and less is known about using CBM-R to make decisions at the individual level. To inform the administration and use of CBM-R progress monitoring data, the current study evaluated the utility of 4 progress monitoring schedules that differed in frequency (once or twice weekly) and density (1 or 3 probes). Participants included 79 students (43% female; 51% White, 25% Hispanic or Latino, 11% Black or African American, 1% other, 12% unknown) in Grades 2 (n = 45) and 4 (n = 34) who were monitored across 10 weeks (February to May). Consistent with a focus on individual-level decision making, we used regression and mixed-factorial analyses of variance (ANOVAs) to evaluate the effect of progress monitoring schedule frequency, schedule density, grade level, and their interaction effects on CBM-R intercept, slope, SE of the slope (SEb), and SE of the estimate (SEE). Results revealed that (a) progress monitoring schedule frequency and density influenced the magnitude of SEb, (b) density had a significant but negligible impact on SEE, and (c) grade level had a significant effect on slope and intercept. None of the interaction effects were statistically significant. Findings from this study have implications for practitioners and researchers aiming to monitor students’ progress with CBM-R.
... Specifically, if the median cSEM of the progress monitoring data for a particular student exceeds the value of 16 USS points (see Table 1), the student should be retested or testing should continue until median cSEM values for the data collected are below this criterion. Christ et al. (2013) argued that "rate-based measures, such as CBM-R, are often more sensitive to variations in performances as compared to measures based on frequency (e.g., the number of instances of a behavior) or accuracy (e.g., the percentage of items correct)" (p. 21). ...
... Despite Star Reading not being a rate-based measure, it appears that it is possible to use its results for the purpose of progress monitoring. Furthermore, a CAT approach to progress monitoring allows the test user to avoid the potential sources of measurement variability that are likely to appear in CBM-R measures, such as examiner characteristics, setting, delivery of directions, and alternate forms (see Christ et al., 2013, for a review). ...
... Regardless, as progress monitoring practices are developed, it is important to remember that one of the fundamental characteristics of measures used for this purpose is that they need to be sensitive to student growth over a relatively short period of time. As noted by Christ et al. (2013), "if the scenario requires 20 or more weeks of data, then the utility of progress monitoring and reality of inductive hypothesis testing are seriously threatened" (p. 55). ...
Article
Full-text available
The increasing use of computerized adaptive tests (CATs) to collect information about students' academic growth or their response to academic interventions has led to a number of questions pertaining to the use of these measures for the purpose of progress monitoring. Star Reading is an example of a CAT-based assessment with considerable validity evidence to support its use for progress monitoring. However, additional validity evidence could be gathered to strengthen the use and interpretation of Star Reading data for progress monitoring. Thus, the purpose of the current study was to focus on three aspects of progress monitoring that will benefit Star Reading users. The specific research questions to be answered are: (a) how robust are the estimation methods in producing meaningful progress monitoring slopes in the presence of outliers; (b) what is the length of the time interval needed to use Star Reading for the purpose of progress monitoring; and (c) how many data points are needed to use Star Reading for the purpose of progress monitoring? The first research question was examined using a Monte Carlo simulation study. The second and third research questions were examined using real data from 6,396,145 students who took the Star Reading assessment during the 2014–2015 school year. Results suggest that the Theil-Sen estimator is the most robust estimator of student growth when using Star Reading. In addition, it appears that five data points and a progress monitoring window of approximately 20 weeks appear to be the minimum parameters for Star Reading to be used for the purpose of progress monitoring. Implications for practice include adapting the parameters for progress monitoring according to a student's current grade-level performance in reading.
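The Theil-Sen estimator highlighted in the abstract is simply the median of all pairwise slopes, which is what makes it resistant to the outliers that distort OLS trend lines. A minimal sketch:

```python
import numpy as np
from itertools import combinations

def theil_sen_slope(t, y):
    """Median of all pairwise slopes between measurement occasions."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i, j in combinations(range(len(t)), 2) if t[j] != t[i]]
    return float(np.median(slopes))
```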
... Each student was assessed with NWF and CBM-R measures once a week for progress-monitoring purposes. All measures were developed by FastBridge Learning (Christ & FastBridge Learning, 2015). ...
... Predictive validity with the Group Reading Assessment and Diagnostic Evaluation test was reported as .67 (Christ & FastBridge Learning, 2015). ...
... The number of words read correct in 1 min was recorded by tutors. Median alternate-form and internal-consistency reliability for FAST CBM-R probes was equal to .92 (Christ & FastBridge Learning, 2015). Fall-to-winter and winter-to-spring test-retest reliability estimates were equal to .90 and .82, ...
Article
Student response to instruction is a key piece of information that school psychologists use to make instructional decisions. Curriculum-based measures (CBMs) are the most commonly used and researched family of academic progress-monitoring assessments. There are a variety of reading CBMs that differ in the type and specificity of skills they assess. The purpose of this study was to determine the degree to which the CBM of oral reading (CBM-R) progress-monitoring data differed from nonsense-word fluency (NWF) progress-monitoring data in the presence of a common intervention. We used multivariate multilevel modeling to compare growth trajectories from CBM-R and NWF progress-monitoring data from a geographically diverse sample of 3,000 1st-grade students receiving Tier-2 phonics interventions. We also evaluated differences in sensitivity to improvement and reliability of improvement from each measure. Improvement on CBM-R was statistically, but not practically, significantly greater than NWF. Although CBM-R was not as direct a measure of decoding, it still captured student response to phonics instruction similarly to NWF. NWF demonstrated slightly better sensitivity to growth, but CBM-R yielded more reliable growth estimates.
... When observed growth approximates target growth, however, measurement error and related factors are much more influential on decision rule accuracy. Results from simulation studies suggest that when collecting one observation per week using commercial probe sets and standardized administration and scoring procedures, measurement error is minimized and 12-14 weeks' worth of data are required to obtain a sufficiently reliable estimate of growth (Christ, Zopluoglu, Monaghen, & Van Norman, 2013). When CBM-R probes are not of roughly equal difficulty and consistent standardized data collection procedures are not followed, as many as 18-20 observations may be necessary to obtain a sufficiently reliable estimate of growth. ...
... When CBM-R probes are not of roughly equal difficulty and consistent standardized data collection procedures are not followed, as many as 18-20 observations may be necessary to obtain a sufficiently reliable estimate of growth. Collecting more observations per week decreases the amount of time required to obtain reliable estimates of growth; however, the duration of data collection and the measurement error are more influential factors on the reliability of CBM-R growth estimates (Christ, 2006; Christ et al., 2013). ...
... Dataset quality was based upon the magnitude of the standard error of estimate (SEE) of WRCM scores around an OLS trend line. SEE quantifies the average distance of WRCM observations from a line of best fit, and has been used in previous studies to describe the reliability or quality of progress monitoring outcomes (Christ, 2006; Christ et al., 2013; Van Norman & Christ, 2016). To simplify matters, cases that had SEE values above 10 WRCM were considered "poor quality," and cases with SEE values equal to or less than 10 WRCM were considered "good quality." ...
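A sketch of the dataset-quality classification described above (SEE of WRCM observations around an OLS trend line, with 10 WRCM as the good/poor cutoff):

```python
import numpy as np

def dataset_quality(weeks, wrcm):
    """Classify a progress monitoring series by its SEE around the OLS trend."""
    slope, intercept = np.polyfit(weeks, wrcm, 1)
    resid = wrcm - (intercept + slope * weeks)
    see = np.sqrt(np.sum(resid**2) / (len(wrcm) - 2))
    return see, ("good" if see <= 10 else "poor")
```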
Article
The accuracy of decision rules for progress monitoring data is influenced by multiple factors. This study examined the accuracy of decision rule recommendations with over 4,500 second- and third-grade students receiving a Tier II reading intervention program. The sensitivity and specificity of three decision rule recommendations for predicting year-end spring benchmark targets was evaluated over different data collection durations under good and poor dataset conditions. Across grade level and dataset quality, the sensitivity of decisions made using trend lines and the median of recent data points tended to improve to acceptable levels after 18 weeks. Specificity tended to decrease around that same time but was less pronounced with the median decision rule method. Limitations and directions for future research are discussed.
... Using simulated data, Christ, Zopluoglu, Monaghen, and Van Norman (2013) examined the precision of slope estimates from monthly progress monitoring schedules over the course of 20 weeks. Findings revealed that reliable and valid estimates of growth for making low-stakes decisions were possible when monitoring once monthly, but only after 2-3 months using a very good (SEE = 5) passage set and more than 4 months with a good (SEE = 10) passage set. ...
... Although this is an ambitious goal, it is likely that data collected within school settings would yield slopes that are more or less steep than 1.5 WRCM per week. Further, Christ et al. (2013) did not investigate progress monitoring schedules with frequencies that fell between weekly and monthly. ...
... Other considerations associated with the aforementioned CBM-R simulation studies (e.g., Christ et al., 2013) warrant further discussion. When simulating hypothetical intercepts and slopes for students, previous findings that growth magnitude depends on a student's initial level of performance (Silberglitt & Hintze, 2007) were not modeled. ...
Article
The present study examined the utility of two progress monitoring assessment schedules (bimonthly and monthly) as alternatives to monitoring once weekly with curriculum-based measurement in reading (CBM-R). General education students (N = 93) in Grades 2-4 who were at risk for reading difficulties but not yet receiving special education services had their progress monitored via three assessment schedules across 1 academic year. Four mixed-factorial analyses of variance tested the effect of progress monitoring schedule (weekly, bimonthly, monthly), grade (2, 3, and 4), and the interaction effect between schedule and grade on four progress monitoring outcomes: intercept, slope, standard error of the estimate, and standard error of the slope. Results indicated that (a) progress monitoring schedule significantly predicted each outcome, (b) grade predicted each progress monitoring outcome except the standard error of the slope, and (c) the effect of schedule on each outcome did not depend on students' grade levels. Overall, findings from this study reveal that collecting CBM-R data less frequently than weekly may be a viable option for educators monitoring the progress of students in Grades 2-4 who are at risk for reading difficulties.
... The graphs from each of the 19 remaining studies were evaluated using guidelines set forth by Kratochwill et al. (2010) and Parker et al. (2005). We elected to use the criteria from Parker et al. (2005) for the total number of observations to limit the impact of imprecise slope estimates on the PEBT ES, as CBM-R progress monitoring research suggests that slope estimates based upon fewer than six observations contain extremely high levels of measurement error (Christ, 2006; Christ, Zopluoglu, Monaghen, & Van Norman, 2013). Studies were excluded if they contained fewer than five baseline data points (n = 2), fewer than five intervention phase data points (n = 2), or fewer than 14 data points across both phases (n = 9). ...
... Based upon the multilevel analysis, the residual variance of WRCM scores was approximately 100 WRCM (SD = 10; Table 2), which coincides with values typically observed in the research literature, and is consistent with the qualitative descriptor of a "good" or "typical" data set in previous CBM-R simulation studies (Christ, 2006; Christ, Zopluoglu, et al., 2013). To better understand the influence of SEM on ES outcomes, we also selected residual values that represented ideal or "very good" quality data sets (SEM = 5), as well as suboptimal or "poor" quality data sets (SEM = 15; Table 1). ...
... Computation of PEBT involves calculating and projecting a monotonic linear trend line based upon baseline data into the intervention phase. The standard error of the slope of OLS trend lines from CBM-R data is largely influenced by the duration of data collection and the number of observations collected (Christ, 2006;Christ, Zopluoglu, et al., 2013). For PEBT, projected trend lines based upon fewer observations are likely to contribute to the inconsistency of observations that exceed that line. ...
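A sketch of the core PEBT computation described above: fit an OLS trend to baseline, project it into the intervention phase, and report the percentage of intervention observations exceeding the projection. Published PEBT variants additionally adjust the baseline trend (e.g., to keep it monotonic), which is omitted here for brevity.

```python
import numpy as np

def pebt(baseline, intervention):
    """Percentage of intervention data exceeding the projected baseline trend."""
    baseline, intervention = np.asarray(baseline), np.asarray(intervention)
    t_base = np.arange(len(baseline))
    slope, intercept = np.polyfit(t_base, baseline, 1)
    t_int = np.arange(len(baseline), len(baseline) + len(intervention))
    projected = intercept + slope * t_int
    return 100.0 * np.mean(intervention > projected)
```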
Article
Researchers and practitioners frequently use curriculum-based measures of reading (CBM-R) within single-case design (SCD) frameworks to evaluate the effects of reading interventions with individual students. Effect sizes (ESs) developed specifically for SCDs are often used as a supplement to visual analysis to gauge treatment effects. The degree to which measurement error associated with academic measures like CBM-R influences said ESs has not been fully explored. We used simulation methodology to evaluate how common magnitudes of error influenced the consistency and accuracy of outcomes from two nonparametric SCD ESs, percentage of data exceeding baseline trend and Tau-U. After accounting for other data characteristics, measurement error accounted for a statistically and practically significant amount of variance in the consistency and accuracy of outcomes from both ESs. This article suggests that the psychometric properties of academic measures are important to consider when interpreting ESs from SCDs.
... Systematic lines of research have begun to emerge that have explored various influences on the accuracy of treatment recommendations from decision rules. In general, researchers have found that the accuracy of recommendations improves if: (a) educators use the median or trend line rule as opposed to the data point rule, (b) commercial probe sets and standardized directions are used to minimize measurement error, (c) at least one observation is collected per week for at least 10 occasions, and (d) observed growth is highly discrepant from the goal line (Christ, Zopluoglu, Monaghen, & Van Norman, 2013; Hintze, Wells, Marcotte, & Solomon, 2018; Van Norman & Christ, 2016a; Van Norman, Christ, & Newell, 2017). It is worth noting that all studies reviewed thus far have assumed that goal lines, as well as student growth, were monotonic and linear across the school year. ...
... Generally, at least twelve weeks of data were required to support low-stakes treatment decisions (i.e., sensitivity and specificity > .70; Christ et al., 2013). The data point rule yielded suboptimal sensitivity across all durations. ...
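For reference, the three decision-rule types discussed in these excerpts can be sketched as follows; the window size k and the goal-line representation are hypothetical simplifications, not the exact operationalizations of any one study.

```python
import numpy as np

def data_point_rule(scores, goal_line, k=3):
    # Recommend a change if the k most recent observations all fall below the goal line.
    s, g = np.asarray(scores), np.asarray(goal_line)
    return bool(np.all(s[-k:] < g[-k:]))

def median_rule(scores, goal_line, k=3):
    # Recommend a change if the median of the k most recent observations
    # falls below the current goal-line value.
    return float(np.median(np.asarray(scores)[-k:])) < np.asarray(goal_line)[-1]

def trend_line_rule(weeks, scores, goal_slope):
    # Recommend a change if the OLS trend is shallower than the goal slope.
    return float(np.polyfit(weeks, scores, 1)[0]) < goal_slope
```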
Article
Full-text available
Despite repeated findings that within-year growth in oral reading rate is nonlinear for many students, existing decision-making frameworks to evaluate response to intervention assume that growth is linear across an entire school year. The purpose of this study was to evaluate the consequences of failing to account for nonlinear growth among students in Grade 3 receiving supplemental interventions progress monitored with FastBridge Learning probes when using existing curriculum-based measurement of reading decision rules. Not accounting for nonlinear growth when using a goal line based upon expected growth between fall and spring assessment periods led to suboptimal outcomes for the data point, trend line, and median rules through 16 weeks of progress monitoring. Using a goal line based upon expected improvement between fall and winter benchmarks helped improve the accuracy of identifying cases that needed an instructional change (sensitivity) but led to lower levels of accuracy in identifying students that were benefiting from intervention (specificity). Using a gated framework in which growth was compared to both types of goal lines led to slight improvements in specificity among cases that showed nonlinear growth at the expense of sensitivity.
... The trend line rule has shown some promise when appropriate conditions are met. Collecting more observations per data collection occasion (e.g., three vs. one) tends to reduce the number of weeks data need to be collected before one can make a decision (e.g., 20 weeks vs. 16 weeks; Christ et al., 2013). ...
... Individual time-series data are often autocorrelated (Busk & Marascuilo, 1988) and much more research has been conducted to establish typical levels of autocorrelation in CBM-R data (Parker, Vannest, Davis, & Clemens, 2012) relative to LSF. Despite this, a number of CBM-R simulation studies have not modeled autocorrelation (e.g., Christ et al., 2013; Christ & Desjardins, 2018). ...
Article
Full-text available
A growing body of research suggests that growth in early literacy skills, including letter sound fluency, is predictive of later reading outcomes. In turn, educators and school psychologists use measures of letter sound fluency to monitor student response to early reading instruction. Limited research has evaluated whether decision-making frameworks that educators apply to early reading time series data to make instructional decisions (e.g., to continue or change the intervention) yield accurate recommendations. Further, it is unclear how long data need to be collected and which type of decision rule (e.g., data point, trend line) will produce the most accurate recommendations. We conducted a series of simulations to investigate the impact of data collection duration (4 to 16 weeks) and decision-rule type (data point, trend line, and median) on the accuracy of data-based decisions using early literacy data. Results suggest that the median and trend-line rules produced recommendations that were sufficiently accurate in identifying students that were not making adequate progress (sensitivity) after about 12 weeks when data were collected once a week. However, the median and trend-line rules did not produce recommendations that were sufficiently accurate to continue an intervention (specificity) across all progress monitoring durations. The opposite pattern was observed for the data-point rule. Outcomes suggest that recommendations developed from other progress monitoring measures (e.g., oral reading fluency) should not be extrapolated to other measures without empirical investigation. More research is needed to identify appropriate decision rules to evaluate early literacy progress monitoring data.
... First, a teacher administers a test of math computation in a nonstandardized manner across students. Such administrative variability increases measurement error, thereby reducing the reliability of scores (Christ, Zopluoglu, Monaghen, & Van Norman, 2013). Second, a school administers multiple highly correlated universal screening tools to inform a single decision about student risk for reading difficulties. ...
... For instance, research regarding assessment procedures is particularly plentiful in the area of curriculum-based measurement (CBM). Studies have yielded a number of procedural recommendations, including those related to directions for standardized administration, duration of progress monitoring (e.g., number of weeks), and schedule of data collection (e.g., number of times per week; Christ et al., 2013; Shinn, 1989). Taken together, the CBM literature provides a rich set of recommendations regarding which specific tools are supported for use and the procedures by which they should be applied. ...
Article
Full-text available
Assessing for applied decision-making is an essential role of a school psychologist. Advances in complex statistical analyses (e.g., item response theory, structural equation modeling) have allowed for more robust psychometric evaluation and assessment tools than ever before. School psychologists now have access to a wide range of evidence-based assessments across academic, behavioral, and cognitive domains. In contrast, relatively less research has examined the procedures through which assessment data are collected and analyzed. Such limitations restrict the utility of assessment data, as well as the validity of decisions made in consideration of said data. The primary goal of this special issue is to feature research that critically examines factors that facilitate or inhibit accurate and efficient decision-making in schools, including that which elucidates best practice for improving the input and output of assessment data. Articles provide empirical support and procedural guidance to improve decision-making in schools. Commentaries reflect on the current state of evidence and offer suggestions for future research and practice.
... It is likely that this is due to the ability to more readily link mastery measures to an instructional approach; however, because mastery measures are so directly tied to instruction, it is possible that teachers might not be able to draw conclusions about the overall efficacy of their instruction (Fuchs & Deno, 1991). To determine whether monitoring progress using CBM is a more accurate approach, or whether more systematic decision-making methods such as slope rules are more effective as some research indicates (e.g., Christ et al., 2013), more research is needed. ...
... A majority of the studies collected data once a week or more, which is consistent with previous research that recommends collecting data one to three times per week (e.g., Christ et al., 2013; Van Norman & Christ, 2016). Importantly, only one study reported how often decisions were made based on these data (Fuchs et al., 1989). ...
Article
For students with persistent reading difficulties, research suggests one of the most effective ways to intensify interventions is to individualize instruction through use of performance data—a process known as data-based decision making (DBDM). This article reports a synthesis and meta-analysis of studies of reading interventions containing DBDM for struggling readers, as well as the characteristics and procedures that support the efficacy of these interventions. A systematic search of peer-reviewed literature published between 1975 and 2017 was conducted, resulting in 15 studies of reading interventions that incorporated DBDM for struggling readers in Grades K–12. A comparison of students who received reading interventions with DBDM with those in business-as-usual (BAU) comparison groups yielded a weighted mean effect of g = .24, 95% confidence interval (CI) = [.01, .46]. A subset of six studies that compared students receiving similar reading interventions with and without DBDM yielded a weighted mean effect of g = .27, 95% CI = [.07, .47]. Implications for DBDM in reading interventions for struggling readers and areas for future research are described. In particular, experimental investigation is necessary to establish DBDM as an evidence-based practice for struggling readers.
... If inaccurate decisions are made based on variable data, students who need a change in intervention may not receive it, whereas students who do not need a change may experience an unneeded change in intervention. For CBM ORF, researchers have suggested that very low variability exists when most (i.e., 2/3) of the data points fall within five correctly read words per minute (five above and five below) of a trend line, and acceptable variability exists when most of the data points fall within 10 correctly read words per minute (10 above and 10 below) of a trend line (Christ, Zopluoglu, Monaghen, & Van Norman, 2013). These values are based on ranges across grade levels (e.g., Christ & Silberglitt, 2007); therefore, students who read more slowly would have lower limits of variability that are acceptable. ...
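A sketch of the variability heuristic described above, checking what share of data points fall within ±5 and ±10 WCPM of an OLS trend line:

```python
import numpy as np

def variability_check(weeks, wcpm):
    """Classify series variability by distance of points from the OLS trend line."""
    slope, intercept = np.polyfit(weeks, wcpm, 1)
    dist = np.abs(wcpm - (intercept + slope * weeks))
    within5, within10 = np.mean(dist <= 5), np.mean(dist <= 10)
    if within5 >= 2 / 3:
        return "very low variability"
    if within10 >= 2 / 3:
        return "acceptable variability"
    return "high variability"
```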
... Once data have been collected using the highest-quality assessment passages according to standardized instructions and procedures and in a consistent, quiet location, educators can graph and interpret the CBM ORF data in ways that minimize the impact of some variability. Recent research suggests that better decisions for individual students were made when educators (a) used graphical supports, such as a trend line and a goal line for comparison of progress (e.g., Van Norman & Christ, 2016; Van Norman, Nelson, Shin, & Christ, 2013); (b) collected data for longer periods of time in the presence of variability and low slope (e.g., 12-14 weeks of once-weekly measures, as opposed to 6 weeks of once-weekly measures in the presence of low variability and higher slope, such as growth of at least 1.5 WCPM per week; Christ et al., 2013; Van Norman & Christ, 2016); (c) received training on graph and data interpretation, such as identifying and removing extreme values that can skew a trend line; and (d) used visual analysis procedures (e.g., comparing the trend line with the goal line in the context of the data) rather than decision rules (e.g., 3 data points above or below the line; Van Norman & Christ, 2016). It is important to note that educators may need more advanced training to conduct some of these procedures (i.e., treatment of extreme values) to ensure that accuracy of measurement is preserved. ...
... Accordingly, researchers have highlighted the need for more refined analyses of the error inherent in CBM-R scores when making decisions at the individual level. A growing body of research suggests that the error in CBM-R scores and the observed trends in student performance must be accounted for when interpreting student performance (Christ, 2006; Christ, Zopluoglu, Monaghen, & Van Norman, 2013). ...
... SEM. Three levels of measurement error were evaluated in this study. Researchers (e.g., Christ et al., 2013) have anecdotally identified SEM and standard error of the estimate magnitudes of 5 WRCM to reflect optimal, if rarely achievable, levels of precision, and 15 WRCM to reflect suboptimal precision. Often, high SEM values (≥15) are reflective of data collected with poorly constructed instruments under unstandardized conditions. ...
Article
Recently, researchers have argued that using quantitative effect sizes in single-case design (SCD) research may facilitate the identification of evidence-based practices. Indices to quantify nonoverlap are among the most common methods for quantifying treatment effects in SCD research. Tau-U represents a family of effect size indices that were developed to address criticisms of previously developed measures of nonoverlap. However, more research is necessary to determine the extent to which Tau-U successfully addresses proposed limitations of other nonoverlap methods. This study evaluated Tau-U effect sizes, derived from multiple-baseline designs, where researchers used curriculum-based measures of reading (CBM-R) to measure reading fluency. Specifically, we evaluated the distribution of the summary Tau-U statistic when applied to a large set of CBM-R data and assessed how the variability inherent in CBM-R data may influence the obtained Tau-U values. Findings suggest that the summary Tau-U statistic may be susceptible to ceiling effects. Moreover, the results provide initial evidence that error inherent in CBM-R scores may have a small but meaningful influence on the obtained effect sizes. Implications and recommendations for research and practice are discussed.
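The pairwise-comparison core of Tau (A-versus-B nonoverlap) can be sketched as below; the full Tau-U family evaluated in the abstract additionally corrects for baseline trend, which is omitted here.

```python
import numpy as np

def tau_nonoverlap(baseline, intervention):
    """Tau: net proportion of improving baseline-intervention pairs."""
    a, b = np.asarray(baseline), np.asarray(intervention)
    diffs = b[None, :] - a[:, None]          # all baseline-intervention pairs
    pos, neg = np.sum(diffs > 0), np.sum(diffs < 0)
    return (pos - neg) / (len(a) * len(b))
```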
... Research on the reliability and validity of measures of change shows that their quality is influenced in particular by the reliability of the individual measurements and by the length of the period over which change is assessed (Christ, Zopluoglu, Monaghen & Van Norman, 2013). ... (Christ et al., 2010), data collection must be deliberately scheduled, with vacation periods and vacation effects in particular taken into account (Fink et al., 2015). ...
... The larger the difference between a student's individual learning trajectory and the norm, the earlier this deviation can be identified, even while the confidence intervals of the slope are still large. According to Christ et al. (2013), learning growth must be at least twice as large as the standard error of that growth, which in their simulation studies was the case only after about eight to ten weeks of progress monitoring. Until reliable estimates of learning growth are possible, learning trajectories should accordingly be interpreted with caution. ...
Article
Full-text available
In this brief article, we describe challenges in norming instruments for learning progress monitoring that do not arise in this form for status assessment. These concern in particular the question of whether norms are needed for regular instruction or for intensive support, but also the variability of learning gains as a function of the competence assessed, the measurement instrument used, the assessment period, and particular student characteristics. In addition, learning trajectories, unlike one-time testing, have the statistical peculiarity that the size of the confidence intervals for learning growth depends on the number of available measurements. Based on an analysis of these challenges, we propose design features and analysis steps for norming learning progress monitoring instruments.
... Computerized formative assessments in K-12 are often used to screen and monitor student performance, evaluate instruction effectiveness and make informed decisions about students and educational programs (Bulut & Cormier, 2018; Christ et al., 2013; January et al., 2018). However, the validity of test scores and inferences made based on these scores is jeopardized by the lack of test-taking engagement (Finn, 2015; Wise & DeMars, 2005). ...
Article
Full-text available
The purpose of this study was to develop predictive models of student test-taking engagement in computerized formative assessments. Using different machine learning algorithms, the models utilize student data with item responses and response time to detect aberrant test behaviors such as rapid guessing. The dataset consisted of 7,602 students (grades 1 to 4) who responded to 90 multiple-choice questions in a computerized reading assessment two times (i.e., fall and spring) during the 2017-2018 school year. We completed data analysis in four phases: (1) a response time method was used to label student engagement in both semesters; (2) the training data from the fall semester were used for training the machine learning models; (3) the testing data from the fall semester were used for evaluating the models; and (4) the spring semester data were used for model evaluation. Among the different algorithms, naive Bayes and support vector machine, which were built on response time data from the fall semester, outperformed other algorithms in predicting student engagement in the spring semester in terms of accuracy, sensitivity, specificity, area under the curve, kappa, and absolute residual values. The results are promising for early prediction of student test-taking engagement to intervene with the test administration and ensure the validity of test scores and inferences made based on them.
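A sketch of response-time-based engagement labeling of the kind described in the abstract. The 10%-of-median-item-time cutoff and the 10% disengagement threshold are assumptions (one common normative-threshold variant), not necessarily the study's rule.

```python
import numpy as np

def flag_rapid_guesses(rt):
    """rt: (n_students, n_items) array of response times in seconds."""
    thresholds = 0.10 * np.median(rt, axis=0)   # per-item rapid-guess cutoff
    rapid = rt < thresholds                     # True where a response was too fast
    # Label a student disengaged if more than 10% of their responses are rapid guesses.
    disengaged = rapid.mean(axis=1) > 0.10
    return rapid, disengaged
```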
... For example, from the perspective of CBM, a comprehensive simulation study revealed that the validity and reliability of slope estimates depend on the overall duration (i.e., in weeks) of progress monitoring as well as the number of assessments within each week. Christ et al. (2013) found that valid and reliable slope estimation required at least four weeks of progress monitoring. While the overall duration of progress monitoring in LPA tends to be longer (e.g., 31 weeks in this study), the overall schedule must be considered clearly less dense, with successive measurement timepoints separated by approximately three-week intervals, for example. ...
Article
Full-text available
Monitoring the progress of student learning is an important part of teachers’ data-based decision making. One such tool that can equip teachers with information about students’ learning progress throughout the school year and thus facilitate monitoring and instructional decision making is learning progress assessments. In practical contexts and research, estimating learning progress has relied on approaches that seek to estimate progress either for each student separately or within overarching model frameworks, such as latent growth modeling. Two recently emerging lines of research for separately estimating student growth have examined robust estimation (to account for outliers) and Bayesian approaches (as opposed to commonly used frequentist methods). The aim of this work was to combine these approaches (i.e., robust Bayesian estimation) and extend these lines of research to the framework of linear latent growth models. In a sample of N = 4970 second-grade students who worked on the quop-L2 test battery (to assess reading comprehension) at eight measurement points, we compared three Bayesian linear latent growth models: (a) a Gaussian model, (b) a model based on Student’s t-distribution (i.e., a robust model), and (c) an asymmetric Laplace model (i.e., Bayesian quantile regression and an alternative robust model). Based on leave-one-out cross-validation and posterior predictive model checking, we found that both robust models outperformed the Gaussian model, and both robust models performed comparably well. While the Student’s t model performed statistically slightly better (yet not substantially so), the asymmetric Laplace model yielded somewhat more realistic posterior predictive samples and a higher degree of measurement precision (i.e., for those estimates that were either associated with the lowest or highest degree of measurement precision). The findings are discussed for the context of learning progress assessment.
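A much-simplified sketch in the spirit of the robust models described above: fitting one student's linear growth under a Student's t likelihood (df fixed at 4) by maximum likelihood, rather than the article's Bayesian latent growth models.

```python
import numpy as np
from scipy import optimize, stats

def robust_line_fit(t, y, df=4):
    """Fit y = a + b*t with Student's t residuals (robust to outliers)."""
    t, y = np.asarray(t, float), np.asarray(y, float)

    def nll(params):
        a, b, log_scale = params
        return -np.sum(stats.t.logpdf(y - (a + b * t), df, scale=np.exp(log_scale)))

    start = np.array([y.mean(), 0.0, np.log(y.std() + 1e-6)])
    res = optimize.minimize(nll, start, method="Nelder-Mead")
    return res.x[0], res.x[1]  # intercept, slope
```

The heavy-tailed likelihood downweights extreme observations, which is the same motivation the abstract gives for preferring the Student's t and asymmetric Laplace models over the Gaussian model.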
... All assessments were double scored by a different tutor or the primary author, who were not involved in assessing the student. The CBM (Christ et al., 2013) measures the words students can read correctly during a one-minute reading of an instructional-level passage. ...
Article
Progress monitoring data are central to making informed decisions on intervention intensification for struggling learners. The general outcome measure (GOM) of curriculum-based measurement of oral reading fluency (CBM-R) has been found to correlate with high-stakes assessment; however, data are highly variable, resulting in decisions that must be made 15 weeks after implementation of intervention. Recent researchers have recommended the use of both GOM and specific subskill mastery measurement (SSMM) to overcome the challenges presented with the use of GOM alone, but research on the efficacy of this approach is limited. Using Bayesian and ordinary least squares regression, we compared the GOM of CBM-R with SSMM slopes for words read correctly per minute (wcpm) per week at 5, 7, and 12 weeks, and explored the relation of the respective slopes with subsequent standardized assessment tools for struggling upper elementary students receiving word reading intervention. We found that the SSMM had a similar slope to that noted in prior research (i.e., β = 1.46 wcpm per week). This slope was significant and related to future standardized assessment outcomes across the various time points. The slope for CBM-R was not significant or related to future assessment outcomes. Implications for research and practice are discussed.
... Universal screening and progress monitoring measures aim to evaluate both student performance and instruction effectiveness while helping educators make informed decisions about the educational programme (Bulut & Cormier, 2018; Christ et al., 2013; January et al., 2018; Van Norman et al., 2017). These measures are often used for quantifying students' academic growth rates in the basic skill areas of reading, mathematics, and writing. ...
Article
This study investigated the impact of students’ test-taking effort on their growth estimates in reading. The sample consisted of 7,602 students (Grades 1 to 4) in the United States who participated in the fall and spring administrations of a computer-based reading assessment. First, a new response dataset was created by flagging both rapid-guessing and slow-responding behaviours and recoding these non-effortful responses as missing. Second, students’ academic growth (i.e., daily increase in ability levels) from fall to spring was calculated based on their original responses and responses in the new dataset excluding non-effortful responses. The results indicated that students’ growth estimates changed significantly after recoding non-effortful responses as missing. Also, the difference in the growth estimates varied depending on the grade level. Overall, students’ test-taking effort appeared to be influential in the estimation of students’ reading growth. Implications for practice were discussed.
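A sketch of the recode-and-recompute logic described in the abstract, using percent correct on effortful items as a stand-in for the study's ability estimates (an assumption made to keep the example self-contained):

```python
import numpy as np

def growth_per_day(fall_resp, spring_resp, fall_flags, spring_flags, days=180):
    """resp: (n_students, n_items) 0/1 scores; flags: True where non-effortful."""
    def ability(resp, flags):
        masked = np.where(flags, np.nan, resp)   # recode non-effortful responses as missing
        return np.nanmean(masked, axis=1)        # percent correct on effortful items only
    return (ability(spring_resp, spring_flags) - ability(fall_resp, fall_flags)) / days
```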
... K. Baker et al., 2008; Christ et al., 2010; Goffreda & DiPerna, 2010; L. S. Fuchs et al., 2001; Wanzek et al., 2010), as a screening measure (Reschly et al., 2009) and as a progress-monitoring tool (Christ et al., 2013). Furthermore, Connor et al. (2009) found that fall oral reading fluency and vocabulary predicted reading intervention effects in first-, second-, and third-grade students. ...
Article
Universal screening is the first stage in identifying students at risk for reading difficulties within a Response to Intervention model. However, there is a lack of validated screening tools for assessing reading abilities in first-grade students from Spain. This pilot study examines the technical adequacy, classification accuracy, and best predictors within a set of curriculum-based measures (CBMs) in Spanish. A sample of 178 first graders from urban and peripheral areas of Santa Cruz de Tenerife (the Canary Islands, Spain) was assessed in the fall, winter, and spring. Receiver operating characteristic curves and logistic regression models were conducted to evaluate the predictive validity of each CBM and their composite score. In addition, students’ learning growth on each CBM was analyzed using hierarchical linear models. Although results suggested that most of the CBMs had adequate reliability and validity throughout first grade and were able to detect students’ growth, some measures showed low reliability and validity coefficients. Practice or policy: Some of the studied CBMs could potentially be used as universal screeners for the early detection of reading difficulties in this population. A two-stage gated screening procedure is proposed for future research and practical implementation, using oral reading fluency in the first step.
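A sketch of the ROC-based classification-accuracy analysis described in the abstract, assuming scikit-learn is available; the screening scores and risk labels are simulated stand-ins, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
at_risk = rng.integers(0, 2, 200)                 # hypothetical year-end criterion labels
scores = rng.normal(50, 10, 200) - 8 * at_risk    # at-risk students tend to score lower

# Negate the score so that higher values indicate greater risk.
fpr, tpr, thresholds = roc_curve(at_risk, -scores)
auc = roc_auc_score(at_risk, -scores)
best = np.argmax(tpr - fpr)                       # Youden's J optimal cutoff
print(f"AUC = {auc:.2f}, optimal cutoff = {-thresholds[best]:.1f}")
```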
... The three reading fluency tests therefore involve different processing times per item, which can lead to larger learning gains for the same amount of processing time. These observations strengthen the arguments of Christ et al. (2013) that test construction must be taken into account when interpreting growth rates. It is noteworthy that neutral or negative learning trajectories were observed for up to 15% of students in all classes. Neutral and negative learning trajectories are frequently reported in the outer performance percentiles (Fuchs et al., 1993; Walter, 2011a). ...
Article
Full-text available
Individual learning trajectories in reading can be observed longitudinally by means of learning progress monitoring. For this purpose, progress monitoring tests must also measure reliably and be sensitive to change over time. This pilot study is the first to examine the longitudinal reliability and sensitivity to change of progress monitoring tests of reading fluency and basic reading comprehension that had previously been evaluated cross-sectionally. At four measurement points within one school year, third- and fourth-grade students of a regular elementary school (N = 90) were tested by external examiners with three reading fluency tests (syllables, words, pseudowords) and a sentence comprehension test on the online platform www.levumi.de. Teachers received the results at the end of the study. The results show satisfactory parallel-test reliability values for both grade levels and all measurement points (r = .72-.90). On average, reading performance increased by between 2.15 (0.55) and 5.41 (0.98) items over 12 weeks. Between 4.26% and 14.80% of students showed negative learning trajectories (b = -0.8 to -4.0), which can partly be explained by ceiling effects attributable to the study design. The findings of the pilot study point to the basic usability of the tests in the field, so that they can be used in larger studies.
... Other factors that influence defensibility include rater training, procedural guidance (e.g., frequency or timing of assessment), and guidelines for the interpretation of data to inform the decision. For example, research has begun to delineate essential procedures that improve the quality of decision making within academic curriculum-based measures (e.g., number of times per week, influence of goal line on intervention determination; Christ et al., 2013; Van Norman & Parker, 2018) and rater training on progress monitoring tools for behavioral intervention (e.g., collection schedules for direct behavior ratings; Chafouleas et al., 2015). New research has informed guidelines on how to interpret data to inform intervention (e.g., ordinate scaling on single-case graphs; Dart & Radley, 2017). ...
Article
Teachers are often called upon to identify students at behavioral and emotional risk by completing a variety of assessment tools. However, many teachers may lack the requisite skills to reliably identify students at risk or use data derived from assessment tools to inform intervention. A series of trainings was developed to improve decision making on the Intervention Selection Profile–Social Skills (ISP-SS), with a focus on improving accuracy and use of data. Specifically, a two-study randomized controlled design was employed to evaluate the efficacy of a basic informational training and a training with a practice component, relative to a control condition, on the collection and use of social-emotional assessment data on the ISP-SS. Results suggest limited influence of training on the accuracy of data collection, yet significant influence on improving how data are used to inform intervention. Implications for practice and research, as well as limitations, are discussed.
... PM schedules can differ in how frequently probes are administered and in the number of CBM probes administered at each data collection time point. Both of these factors, combined with the length of time across which data are collected, affect the accuracy with which CBM data represent a student's true performance (Christ et al., 2013). Christ and colleagues (2013) suggest that a minimum of 8 to 12 weeks of data is necessary to accurately use CBM for instructional decision making, depending on the quality of the data set (i.e., the size of the error residuals of the data). ...
Article
Curriculum-based measurement (CBM) is a systematic, ongoing assessment framework that allows special educators to monitor students’ progress and determine the need for instructional adaptations. Jenkins and colleagues examined the accuracy and timeliness of six different schedules of CBM progress monitoring (PM). The authors found that weekly and intermittent PM schedules were similarly accurate and timely. This study replicated and extended the work of Jenkins and colleagues by examining the accuracy and timeliness of different PM schedules for 51 students with disabilities. Results indicated that the accuracy and timeliness of the PM schedules for the current sample were poorer than the accuracy and timeliness reported by Jenkins and colleagues. In line with the results of the original study, however, these results indicated that intermittent PM schedules sufficiently predicted student true growth compared to a weekly PM schedule. Implications for research and practice are discussed.
... Follow-up research done by Ardoin (2006) suggested that this number is closer to 20 data points across 10 or 12 weeks. Even more recently, Christ, Zopluoglu, Monaghen, and Van Norman (2013) noted that it could take 2 to 4 months to make meaningful estimates of intervention gains when progress monitoring once per month, depending on the standard error of estimate of the measure. As Ardoin (2006) mentioned, this is not a problem if the intervention is working, but it can be detrimental for students who have a less-than-optimal intervention. ...
Article
Brief experimental analysis (BEA) is frequently used to drive intervention selection decisions for students in need of intensive reading fluency intervention. Researchers have demonstrated that most BEA results for students with reading fluency difficulties are undifferentiated when considering the standard error of measurement (SEM) of curriculum-based measurement reading (CBM-R) passages. When confronted with this situation, practitioners must rely on some other characteristic of interventions to drive selection decisions. Considering intervention efficiency alongside effectiveness may be one way to inform selection decisions in the face of undifferentiated BEA results. The current study was undertaken to determine how frequently BEA results are undifferentiated when considering the SEM of CBM-R passages and demonstrate how efficiency data may be considered in tandem with effectiveness data to inform intervention selection. One-trial brief experimental analyses (OTBEAs) consisting of three reading interventions and a control condition were conducted with 14 1st-11th (M = 4th) graders in the southeastern region of the United States. Results indicated the confidence interval of the most effective intervention overlapped with that of the control condition 50% of the time and that of a more efficient intervention 37% of the time. Limitations and future directions are also discussed.
... Finally, as with the example of progress monitoring for students, the precision of estimates of change in practices over time should be expected to play a major role in planning schedules of observation and data-based decision making, including considerations for the duration of data collection, as well as the density (number of data points collected per week; e.g., Christ, Zopluoglu, Long, & Monaghen, 2012; Christ, Zopluoglu, Monaghen, & Van Norman, 2013; Jenkins, Graff, & Miglioretti, 2009). Knowing that measurement error will reduce the confidence we can place in decisions based on estimated rates of change, it is necessary to determine how many classroom observations should be completed before data would be of sufficient reliability to be helpful in roles such as feedback for individual teachers during coaching (e.g., Reinke, Lewis-Palmer, & Martin, 2007) or as indicators for coach decision making (e.g., Fabiano, Reddy, & Dudek, 2018). ...
Article
Full-text available
Despite growing interest in formative assessment of teacher practices, research on rates of change in teachers’ practices is sparse. This is the first study to examine the characteristics of observed change in classroom practices using the Classroom Strategies Assessment System (CSAS) across alternative schedules of data collection during instructional coaching. Our primary objectives included examining: (1) the magnitude, variability, and precision of estimates of average rates of change in teacher practices, and (2) the impact of data collection duration (i.e., the number of weeks of data collection) and density (i.e., the number of classroom observations per week) on the precision of estimates of rates of change over time. A sample of teachers (N = 63) participating in instructional coaching were observed 14 times during coaching using the CSAS. Findings revealed a significant gradual improvement in strategy use, with significant between-teacher variation in rates of change. The frequency of observations was associated with the precision of estimates for average rates of change across teachers and for individual teachers, providing initial guidance on the minimum number of observations required to monitor change in practice over time. Impact and Implications: This study offers a first look at teacher formative assessment using an observational instrument designed to monitor progress in teaching strategies during the process of instructional coaching. The results suggest a minimum frequency of data collection that might be necessary to gather useful information about rates of change in teachers’ use of strategies.
... The outcomes of this study suggest that the accuracy of predictions can be improved significantly if one collects data for longer periods of time. This finding is consistent with previous research documenting a clear association between the precision of OLS trend lines, the number of data points available to estimate growth, and the number of weeks progress was monitored (Christ, 2006; Christ et al., 2013; Willett, 1989). Relatedly, the precision of predictions seems to be influenced by how far one wishes to forecast performance into the future. ...
Article
Estimating a trend line through words read correct per minute scores collected across successive weeks is a preferred method to evaluate student response to instruction with curriculum-based measurement of reading (CBM-R). This is due, in part, to the fact that the slope of the line of best fit is used to predict the trajectory of student performance if the current intervention is maintained. In turn, trend lines should predict future scores with a high degree of accuracy when an intervention is maintained. We evaluated the forecasting accuracy of a trend estimation method currently used in practice (i.e., ordinary least squares), and five alternate methods recently evaluated in CBM-R simulation studies, using actual student data. Results suggest that alternate trend estimation methods predicted future performance with a similar level of accuracy as ordinary least squares trend lines across most conditions, with the exception of slopes estimated via Bayesian analysis. Bayesian trend lines estimated using informed prior distributions yielded noticeably less biased and more precise predictions when applied to short data series relative to all other estimation methods across most conditions. Outcomes from the current study highlight the need to further explore the viability of Bayesian analysis to evaluate individual time series data.
... That is, in addition to how an intervention is changed, there are the issues of when to administer the assessments and when to change the instruction, both of which are malleable factors of the intervention delivery. Research has focused on the assessment schedule of progress monitoring, when to assess and for how long, under the premise that assessment schedule is a necessary component for consequential validity of progress monitoring outcomes (Christ, Zopluoglu, Monaghen, & Van Norman, 2013). A general rule of thumb used by researchers is that instruction should be adapted if three to five consecutive data points fall above or below the aim line (Burns et al., 2010), but current research indicates that these rules should be abandoned (Jenkins & Terjeson, 2011). ...
Article
We used data from the 2014–2015 easyCBM assessment system to explore the applied reading intervention characteristics in a sample of 3,074 Grade 1 students (and 5,145 interventions) in school districts applying a multitiered systems of support (MTSS) framework. We describe the number of interventions, number of assessments, the intervention start dates, curricula, instructional strategies, tier, group size, frequency, dosage, total time, and quantitative intensity. We found variance across all instructional variables, with 156 curricula and 59 instructional strategies applied. Based on our data, a “typical” intervention was a Tier 2 intervention that began before October, was delivered for 30 minutes/day for 5 days/week in a group with three to five students, was changed once if at all, and student progress was most likely monitored with word reading fluency measures.
... The use of IRIs and CBM-Rs to gather data on students' reading performance is recommended for improving instructional match and has been linked to increased student achievement (January & Ardoin, 2015). Hence, ensuring the quality of data gathered from the assessment has been deemed a priority (Christ, Zopluoglu, Monaghen, & Van Norman, 2013). The taxonomy of scorers' mismarkings that reduce the precision with which students' true scores can be estimated suggests work remains to be done in this area. ...
Article
Full-text available
Informal reading inventories (IRI) and curriculum-based measures of reading (CBM-R) have continued importance in instructional planning, but raters have exhibited difficulty in accurately identifying students’ miscues. To identify and tabulate scorers’ mismarkings, this study employed examiners and raters who scored 15,051 words from 108 passage readings by students in Grades 5 and 6. Word-by-word scoring from these individuals was compared with a consensus score obtained from the first author and two graduate students after repeated replaying of the audio from the passage readings. Microanalysis conducted on all discrepancies identified a cumulative total of 929 mismarkings (range = 1–37 per passage) that we categorized in 37 unique types. Examiners scoring live made significantly more mismarkings than raters scoring audio recordings, t(214) = 4.35, p = .0001, with an effect size of d = 0.59. In 98% of the passages, scorers disagreed on the number of words read correctly—the score used for screening and progress monitoring decisions. Results suggest that IRIs and CBM-Rs may not be accurate as diagnostic tools for determining students’ particular word-level difficulties.
... For instance, there is considerable research on effective interventions and the use of curriculum-based measures (CBM) to monitor student progress in reading but far less research on such tools in other academic areas (McMaster, Parker, & Jung, 2012). Even under the best conditions such as using multiple assessments, which increases CBM reliability, the data can still result in unreliable decisions (Christ, Zopluoglu, Monaghen, & Van Norman, 2013). Third, RTI implementation at the secondary level is not well understood (Vaughn & Fletcher, 2012) and is implemented less frequently than in elementary schools (Spectrum K-12 School Solutions, 2010). ...
... Therefore, the interventionist may incorrectly conclude that the current intervention is not working when in reality the instrument is not sufficiently sensitive to the student's rate of improvement in the targeted skill. Moreover, decisions based upon limited CBM-R data (fewer than 12-14 observations across three months) often yield inaccurate conclusions regarding student progress (Christ, Zopluoglu, Monaghen, & Van Norman, 2013). The precision of CBM-R growth estimates depends upon the number of data points collected, variability of observations, and duration of data collection (Christ, 2006). ...
Article
Full-text available
Interventionists often monitor the progress of students receiving supplemental interventions with general outcome measures (GOMs) such as curriculum-based measurement of reading (CBM-R). However, some researchers have suggested that interventionists should collect data more closely related to instructional targets, specific subskill mastery measures (SSMMs), because outcomes from GOMs such as CBM-R may not be sufficiently sensitive to gauge intervention effects. In turn, interventionists may prematurely terminate an effective intervention or continue to deliver an ineffective intervention if they do not monitor student progress with the appropriate measure. However, such recommendations are based upon expert opinion or studies with serious methodological shortcomings. We used multivariate multilevel modeling to compare pre-intervention intercepts and intervention slopes between GOM and SSMM data collected concurrently in a sample of 96 first-, 44 second-, and 53 third-grade students receiving Tier 2 phonics interventions. Statistically significant differences were observed between slopes from SSMM consonant-vowel-consonant words and CBM-R data. Statistically significant differences in slopes were not observed for consonant blend, digraph or consonant-vowel-consonant-silent e (CVCe) SSMMs. Results suggest that using word lists to monitor student response to instruction for early struggling readers is beneficial, but as students are exposed to more complex phonetic patterns, the distinction between SSMMs and CBM-R becomes less meaningful.
... First, both should be aware of the impact that DPPXYR has on visual analysis of single-case data. Visual analysis is frequently performed to determine response to academic and behavioral interventions (e.g., Barnett et al., 2006; Christ et al., 2013; Munger, Snell, & Loyd, 1989). ...
Article
Research based on single-case designs (SCD) are frequently utilized in educational settings to evaluate the effect of an intervention on student behavior. Visual analysis is the primary method of evaluation of SCD, despite research noting concerns regarding reliability of the procedure. Recent research suggests that characteristics of the graphic display may contribute to poor reliability and overestimation of intervention effects. This study investigated the effect of increasing or decreasing the data points per x- to y-axis ratio (DPPXYR) on rater evaluations of functional relation and effect size in SCD data sets. Twenty-nine individuals (58.6% male) with experience in SCD were asked to evaluate 40 multiple baseline data sets. Two data sets reporting null, small, moderate, and large intervention effects (8 total) were modified by manipulating the ratio of the x- to y-axis (5 variations), resulting in 40 total graphs. Results indicate that raters scored effects as larger as the DPPXYR decreased. Additionally, a 2-way within-subjects analysis of variance (ANOVA) revealed a significant main effect of DPPXYR manipulation on effect size rating, F(2.11, 58.98) = 58.05, p < .001, η2 = .675, and an interaction between DPPXYR manipulation and magnitude of effect, F(6.71, 187.78) = 11.45, p < .001, η2 = .29. Overall, results of the study indicate researchers and practitioners should maintain a DPPXYR of .14 or larger in the interest of more conservative effect size judgments.
... A final reason for differences in growth rates might be the study design, more specifically, the duration and schedule employed in the studies. Christ et al. (2013) found that the stability of growth trajectories produced by reading-aloud scores differed with duration (number of weeks) and schedule (weekly vs. biweekly data collection). Espin et al. (2010) and Tichá et al. (2009) collected data over a short duration (10 weeks) using a dense schedule (weekly), whereas Tolar et al. (2012) collected data over a long duration (school year) using a less dense schedule (every 6-8 weeks). ...
Article
Full-text available
The technical adequacy of CBM maze-scores as indicators of reading level and growth for seventh-grade secondary-school students was examined. Participants were 452 Dutch students who completed weekly maze measures over a period of 23 weeks. Criterion measures were school level, dyslexia status, scores and growth on a standardized reading test. Results supported the technical adequacy of maze scores as indicators of reading level and growth. Alternate-form reliability coefficients were significant and intermediate to high. Mean maze scores showed significant increase over time, students’ growth trajectories differed, and students’ initial performance levels (intercepts) and growth rates (slopes) were not correlated. Maze reading level and growth were related to reading level and/or growth on criterion measures. A nonlinear model provided a better fit for the data than a linear model. Implications for use of CBM maze-scores for data-based decision-making are discussed.
Article
Full-text available
One of the main goals of the teacher and the school system as a whole is to close learning gaps and support children with difficulties in learning. The identification of those children as well as the monitoring of their progress in learning is crucial for this task. The derivation of comparative standards that can be applied well in practice is a relevant quality criterion in this context. Continuous normalization is particularly useful for progress monitoring tests that can be conducted at different points in time. Areas that were not available in the normalization sample are extrapolated, closing gaps in applicability due to discontinuity. In Germany, teachers participated in a state-funded research project to formatively measure their children’s spelling performance in primary school. Data (N = 3000) from grade two to four were scaled, linked, and translated into comparative values that can be used in classrooms independently of specific testing dates. The tests meet the requirements of item response models and can be transferred well to continuous norms. However, we recommend using the 10th or 20th percentile as cut-off points for educational measures, as the 5th percentile is not discriminating enough.
Article
Full-text available
The purpose of this study was to measure and describe students’ learning development in mental computation of mixed addition and subtraction tasks up to 100. We used a learning progress monitoring (LPM) approach with multiple repeated measurements to examine the learning curves of second-and third-grade primary school students in mental computation over a period of 17 biweekly measurement intervals in the school year 2020/2021. Moreover, we investigated how homogeneous students’ learning curves were and how sociodemographic variables (gender, grade level, the assignment of special educational needs) affected students’ learning growth. Therefore, 348 German students from six schools and 20 classes (10.9% students with special educational needs) worked on systematically, but randomly mixed addition and subtraction tasks at regular intervals with an online LPM tool. We collected learning progress data for 12 measurement intervals during the survey period that was impacted by the COVID-19 pandemic. Technical results show that the employed LPM tool for mental computation met the criteria of LPM research stages 1 and 2. Focusing on the learning curves, results from latent growth curve modeling showed significant differences in the intercept and in the slope based on the background variables. The results illustrate that one-size-fits-all instruction is not appropriate, thus highlighting the value of LPM or other means that allow individualized, adaptive teaching. The study provides a first quantitative overview over the learning curves for mental computation in second and third grade. Furthermore, it offers a validated tool for the empirical analysis of learning curves regarding mental computation and strong reference data against which individual learning growth can be compared to identify students with unfavorable learning curves and provide targeted support as part of an adaptive, evidence-based teaching approach. Implications for further research and school practice are discussed.
Article
Replication studies in special education are necessary to strengthen the foundation upon which instruction and intervention for students with disabilities are built. J. Jenkins et al. (2017) found intermittent reading fluency progress monitoring schedules did not delay decision-making and were similar in decision-making accuracy to the traditional weekly progress monitoring schedule. Results of the current pilot study, although underpowered, conceptually replicated the original claims and extended their work by investigating their questions in the area of mathematics computation. Implications for research and practice are shared.
Article
Full-text available
There are still many unanswered questions regarding the application of response to intervention (RTI) to making eligibility decisions for specific learning disabilities (SLD). Both federal regulations and research support that students identified with SLD using RTI should be deficient in both level of academic functioning and rate of growth in response to scientifically based instruction. To date, there is little research examining whether these eligibility criteria are predictive in identifying students with SLD by evaluation teams in schools. Two studies conducted in different states examined if level of academic performance and rate of improvement (ROI) using curriculum-based measurement in reading (CBM-R) predicted student eligibility for special education. Logistic regression results indicated that level of performance predicted special education eligibility across sites and that ROI did not. Implications for research and practice are discussed.
Article
Despite evidence that frequent progress monitoring to identify children at-risk of delays and inform early intervention services improves child outcomes, this practice is rare in infant–toddler settings where children could benefit the most from early intervention. Using a descriptive research design within an Implementation Science framework, we evaluated how 10 community-based infant–toddler agencies implemented a standardized progress monitoring assessment using a web application to monitor children’s growth and identify children at-risk for delay. An Implementation Index was developed to quantify implementation progress for each agency, which included their percent of tasks completed, and rate of task implementation over time. Staff turnover and high staff:child ratios were associated with low implementation of progress monitoring. The Implementation Index differentiated between agencies that otherwise demonstrated similar implementation rates. Implications for supporting progress monitoring and other evidence-based practices in community-based infant–toddler childcare settings are discussed.
Article
This study explored the validity of growth on two computer adaptive tests, Star Reading and Star Math, in explaining performance on an end-of-year achievement test for a sample of students in Grades 3 through 6. Results from quantile regression analyses indicate that growth on Star Reading explained a statistically significant amount of variance in performance on end-of-year tests after controlling for baseline performance in all grades. In Grades 3 through 5, the relationship between growth on Star Reading and the end-of-year test was stronger among students who scored higher on the end-of-year test. In math, Star Math explained a statistically significant amount of variance in end-of-year scores after statistically controlling for baseline performance in all grades. The strength of the relationship did not differ among students who scored lower or higher on the end-of-year test across grades.
Chapter
The reauthorization of the Individuals with Disabilities Act (IDEA) in 2004 and passage of the Every Student Succeeds Act in 2015 have increasingly led to a paradigm shift in both education and school psychology practice (Fan et al. Contemp Sch Psychol 20:383–391, 2016). This paradigm shift has slowly allowed for school psychologists to expand beyond the confines of their traditional roles of assessor and tester and assume expanded leadership roles at the school and district levels (Eagle et al. J Educ Psychol Consult 25:160–177, 2015; National Association of School Psychologists, Building capacity for student success every student succeeds act opportunities: Engaging school psychologists to improve multi-tiered systems of support. Author, Bethesda, 2016). Despite school psychologists increasingly assuming such leadership roles, many in the field continue to engage in the “traditionalistic” isolated practices of conducting psychoeducational evaluations and writing reports. Historically, over 50% of school psychologist’s work has been consumed by heavy psychoeducational assessment caseloads (Brown et al. Psychol Rep 98:486–496, 2006; Stoiber and Vanderwood, J Educ Psychol Consul 18:264–292, 2008). However, as more schools shift away from reactionary models of providing academic, behavioral, and social-emotional services, there is an increased need to utilize school psychologists’ unique training in preventative practices to lead efforts under the framework of Multi-Tiered Systems of Support (MTSS).
Article
Bayesian regression has emerged as a viable alternative for the estimation of curriculum-based measurement (CBM) growth slopes. Preliminary findings suggest such methods may yield improved efficiency relative to other linear estimators and can be embedded into data management programs for high-frequency use. However, additional research is needed, as Bayesian estimators require multiple specifications of the prior distributions. The current study evaluates the accuracy of several combinations of prior values, including three distributions of the residuals, two values of the expected growth rate, and three possible values for the precision of slope when using Bayesian simple linear regression to estimate fluency growth slopes for reading CBM. We also included traditional ordinary least squares (OLS) as a baseline contrast. Findings suggest that the prior specification for the residual distribution had, on average, a trivial effect on the accuracy of the slope. However, specifications for growth rate and precision of slope were influential, and virtually all variants of Bayesian regression evaluated were superior to OLS. Converging evidence from both simulated and observed data now suggests Bayesian methods outperform OLS for estimating CBM growth slopes and should be strongly considered in research and practice.
Article
The current availability of research examining the precision of single-skill mathematics (SSM) curriculum-based measurements (CBMs) for progress monitoring is limited. Given the observed variance in administration conditions across current practice and research use, we examined potential differences between student responding and precision of slope when SSM-CBMs were administered individually and in group (classroom) conditions. No differences in student performance or measure precision were observed between conditions, indicating flexibility in the practical and research use of SSM-CBMs across administration conditions. In addition, findings contributed to the literature examining the stability of SSM-CBMs slopes of progress when used for instructional decision-making. Implications for the administration and interpretation of SSM-CBMs in practice are discussed.
Article
Full-text available
Curriculum-based measurement (CBM) represents a critical strategy for data-based decision making within educational settings. Visual analysis is frequently used to analyze CBM data; thus, CBM vendors often automatically generate graphs based on student data to facilitate analysis. Differences in graph formatting are apparent across CBM vendors, and currently, the extent to which these differences may influence rater interpretation of data is unknown. As such, the current study sought to evaluate whether there were differences in rater evaluation of the presence and magnitude of an intervention effect when identical data were plotted on graphs of four CBM vendors: AIMSweb, FastBridge Learning, easyCBM, and DIBELS Next. Results of the study indicated that the probability of identifying the presence of a treatment effect was similar across vendors. However, differences emerged with respect to ratings of magnitude of effect, with raters indicating the greatest magnitude of intervention effect for easyCBM and DIBELS Next graphs. As data graphs generated by CBM vendors may be utilized by school personnel to make decisions regarding the continuation, discontinuation, or alteration of intervention procedures, the results of the study indicate the need for increased consideration of the manner in which data are presented visually such that accurate decisions are made regarding student progress and response to intervention.
Article
This study examined the effect of progress monitoring frequency and scoring metric on curriculum-based measurement of written expression (CBM-WE) progress monitoring estimates. The writing progress of 116 second-grade students receiving a classwide writing fluency intervention in their general education classrooms was monitored across 13 weeks (i.e., 1 week of baseline and 12 weeks during the intervention) using CBM-WE. The writing samples were scored for total words written, words spelled correctly, correct writing sequences, and correct minus incorrect writing sequences. Repeated measures analyses of variance (ANOVAs) revealed that progress monitoring frequency (weekly, bimonthly, monthly, or every 6 weeks) had a small effect on intercept and slope but a moderate effect on standard error of the estimate (SEE) and standard error of the slope (SEb). Scoring metric had a moderate effect on intercept but a small effect on slope, SEE, and SEb. Students’ intercepts were not related to their slopes across any scoring metrics. Overall, the results of this study suggest that monitoring students’ writing progress every 6 weeks may reduce error while preserving their intercept and slope estimates.
Article
Comparatively little research exists on single-skill math (SSM) curriculum-based measurements (CBMs) for the purpose of monitoring growth, as may be done in practice or when monitoring intervention effectiveness within group or single-case research. Therefore, we examined a common variant of SSM-CBM: 1 digit × 1 digit multiplication. Reflecting how such measures are often used in contemporary research and practice, we examined the comparative reliability of three representative SSM-CBM set sizes of 8, 16, and 32 unique problems. In a separate study, we investigated the possible benefit of stratifying problems within operation and probe relative to random assignment. Findings suggest that SSM-CBM slope reliability benefits from explicit stratification and that set size is a relevant consideration. Implications for the selection and interpretation of SSM-CBMs when engaging in practice and research are discussed.
Article
Full-text available
Response to Intervention (RtI) is a commonly used framework to identify students in need of additional or specialized instruction. Special education eligibility decisions within RtI rely on the assumption that there are subpopulations of students: those who demonstrate appropriate growth and those who do not demonstrate appropriate growth, when provided specialized instruction. The purpose of the present study was to illustrate the use of random-effects mixture models (RMMs) to estimate the likely number of (unobserved) subpopulations within one curriculum-based measurement of oral reading (CBM-R) progress monitoring dataset. The dataset comprised second grade students’ CBM-R data collected weekly over 20 weeks. RMMs were fit with several numbers of classes, and a two-class model best fit the data. Results suggest that RMMs are useful to understand subpopulations of students who need specialized instruction. Results also provide empirical support to some extent for the use of a dual-discrepancy model of learning disability identification within RtI.
Article
In an online experiment, a sample of N = 109 pre-service teachers were presented with 14 graphs mimicking graphs used in curriculum-based measurement. Graphs depicted a student’s weekly test scores for the first part of a semester, and participants were instructed to use the graphs to predict students’ achievement at the end of the semester. Relative to a linear regression model, participants generally tended to underestimate future achievement (i.e., predictions were negatively biased). Predictions were more negatively biased when data variability was low rather than high, when improvement was steep rather than flat, and when the most recent score indicated a performance upturn as opposed to downturn. The results are interpreted in the light of models of judgmental anchoring (Kahneman & Tversky, 1973; Mussweiler & Strack, 1999). Implications for practice are discussed.
Article
Full-text available
The purpose of this study was to demonstrate the use of Generalizability (G) theory as an alternative method of validating direct behavioral measures. Reliability and validity from a classic test score theory are explored and rephrased in terms of G theory. Two studies that used oral reading fluency measures within a curriculum-based measurement (CBM) approach are examined with G theory. Results indicate that CBM oral reading fluency measures are highly dependable and can be reliably used to make both between individual (nomothetic) and within individual (idiographic) decisions.
Article
Full-text available
The aim of this study was to investigate the effect of instructions of curriculum-based measurement (CBM) of reading on (a) the number of words read correctly and incorrectly per minute and (b) the relationship between CBM reading and reading achievement. Results indicated that the specific instructions used have a significant impact on CBM reading outcomes. Statistically significant mean differences were found among the fast, best, and baseline reading conditions in the number of words read correctly and in the number of errors. Correlations between words read correctly per minute and a test of reading achievement were statistically significant and substantial for all three conditions, but differences among their correlations were not. These results underscore the importance of using standardized instructions on CBM results both within and across settings. Implications of these results for the responsiveness-to-intervention method for identifying children with learning difficulties are discussed.
Article
Full-text available
Generalizability (G) theory was used with a sample of 37 third-grade students to assess the variability in words correct per minute (WCPM) scores caused by student skill and passage variability. Reliability-like coefficients and the SEM based on a specific number of assessments using different combinations of passages demonstrated how manipulating probe variability could reduce measurement error. Results showed that 81% of the variance was due to student skill, 10% was due to passage or probe variability, and 9% was due to unaccounted sources of error. Reliability-like coefficients ranged from .81 to .99, and SEMs ranged from 18 to 4 WCPM depending on the number of probes given. When passage variability was controlled, SEMs were decreased and ranged from 12 to 4 WCPM. Results indicated that WCPM scores yield high reliability-like coefficients but also have a large SEM that can be reduced by administering multiple alternate passages. Discussion focuses on conducting research designed to identify more equivalent passages in order to reduce erroneous relative and absolute decisions.
Article
Full-text available
This study examined the effects of controlling the level of difficulty on the sensitivity of repeated curriculum-based measurement (CBM). Participants included 99 students in Grades 2 through 5 who were administered CBM reading passage probes twice weekly over an 11-week period. Two sets of CBM reading progress monitoring materials were compared: (a) grade level material that was controlled for difficulty, and (b) uncontrolled randomly selected material from graded readers. Students' rate of progress in each progress monitoring series was summarized for slope, standard error of estimate, and standard error of slope. Results suggested that controlled reading passages significantly reduced measurement error as compared to uncontrolled reading passages, leading to increased sensitivity and reliability of measurement.
Article
Full-text available
Curriculum-based measurement of oral reading (CBM-R) is used to index the level and rate of student growth across the academic year. The method is frequently used to set student goals and monitor student progress. This study examined the diagnostic accuracy and quality of growth estimates derived from pre-post measurement using CBM-R data. A linear mixed effects regression model was used to simulate progress-monitoring data for multiple levels of progress-monitoring duration (6, 8, 10, ..., 20 weeks) and data set quality, which was operationalized as residual/error in the model (σε = 5, 10, 15, and 20). Results indicate that the duration of instruction, quality of data, and method used to estimate growth influenced the reliability and precision of estimated growth rates, in addition to the diagnostic accuracy. Pre-post methods to derive CBM-R growth estimates are likely to require 14 or more weeks of instruction between pre-post occasions. Implications and future directions are discussed.
Article
Full-text available
Curriculum-based measurement of oral reading (CBM-R) is frequently used to set student goals and monitor student progress. This study examined the quality of growth estimates derived from CBM-R progress monitoring data. The authors used a linear mixed effects regression (LMER) model to simulate progress monitoring data for multiple levels of progress monitoring duration (i.e., 6, 8, 10, ..., 20 weeks) and data set quality, which was operationalized as residual/error in the model (σϵ = 5, 10, 15, and 20). The number of data points, quality of data, and method used to estimate growth all influenced the reliability and precision of estimated growth rates. Results indicated that progress monitoring outcomes are sufficient to guide educational decisions if (a) ordinary least-squares regression is used to derive trend line estimates, (b) a very good progress monitoring data set is used, and (c) the data set comprises a minimum of 14 CBM-R observations. The article discusses implications and future directions.
Article
Full-text available
Examined the forecasting accuracy of 2 slope estimation procedures (ordinary-least-squares regression and split-middle trend lines) for reading curriculum-based measurement (CBM), a behavioral approach to the assessment of academic skills that emphasizes the direct measurement of academic behaviors. Participants were 19 students in Grades 2-6 receiving special education services. Results support the superiority of ordinary-least-squares slope estimates for reading CBM data. This study extended the work of M. R. Shinn et al. by using moving forecasts in which a sequence of slope estimates, forecasts, and forecast errors was generated for each participant.
Article
Full-text available
Compared 2 methods of measuring slope of student skill acquisition: split middle (SPM) and ordinary least squares (OLS). Twenty mildly handicapped students in Grades 2-6 had their reading progress monitored for 1 school year, using curriculum-based measurement procedures. Special education teachers plotted the students' performance data on equal-interval graphs. SPM and OLS procedures were applied to the data to estimate trend in students' progress and to predict future performance. Results indicate that OLS provided a more precise estimate of the slope of students' progress than SPM.
Article
Full-text available
In this article, the authors review the research on curriculum-based measurement (CBM) in reading published since the time of Marston's 1989 review. They focus on the technical adequacy of CBM related to measures, materials, and representation of growth. The authors conclude by discussing issues to be addressed in future research, and they raise the possibility of the development of a seamless and flexible system of progress monitoring that can be used to monitor students' progress across students, settings, and purposes.
Article
Full-text available
This study examined several aspects of Passage Reading Fluency (PRF), including performance variability across passages, alternative designs for measuring PRF gain, and effects on PRF level from retesting with the same passages. Participants were 33 students from grades 2 to 10 attending a school for students with learning disabilities. PRF was measured at three test points. Time-2 tests occurred 10 weeks after time-1 tests, and time-3 tests occurred 5 weeks after the time-2 tests. At test points 2 and 3, students read old passages (same-passage design) and new passages (different-passage design). Results showed substantial individual variation on concurrent PRF measures, smaller variation in gains measured with the same-passage design, and no passage memory effects (i.e., from retested passages). Results are discussed in relation to measuring reading gains in Response to Intervention models.
Article
There are relatively few studies that evaluate the quality of progress monitoring estimates derived from curriculum-based measurement of reading. Those studies that are published provide initial evidence for relatively large magnitudes of standard error relative to the expected magnitude of weekly growth. A major contributor to the observed magnitudes of standard error is the inconsistency of passage difficulty within progress monitoring passage sets. The purpose of the current study was to evaluate and estimate the magnitudes of standard error across an experimental passage set referred to as the Formative Assessment Instrumentation and Procedures for Reading (FAIP-R) and two commercially available passage sets (AIMSweb and Dynamic Indicators of Basic Early Literacy Skills [DIBELS]). Each passage set was administered twice weekly to 68 students. Results indicated significant differences in intercept, weekly growth, and standard error. Estimates of standard error were smallest in magnitude for the FAIP-R passage set followed by the AIMSweb and then DIBELS passage sets. Implications for choosing a progress monitoring passage set and estimating individual student growth are discussed.
Article
Four simple line-fitting procedures are presented for practitioners to quickly summarize student time series performance data. Two are in common use — Koenig's “quarter-intersect” and White's “split-middle” adjustment, while two — Tukey I and Tukey II — are less known. Each of the four can be performed on a medium-size classroom dataset in less than 3 minutes. The four procedures were assayed against three criteria: (a) matching line slopes to an ordinary least squares (OLS) standard; (b) “best fit” to the data (minimizing residuals); and (c) prediction of a future reading score at Week 16. Weekly oral reading fluency data were collected on 45 Grade 4 and 5 students with reading disabilities, over a period of 12 weeks. Tukey I and II techniques generally outperformed the Koenig and White line-fitting methods, especially White's “split-middle” adjustment. Performance differences were generally large enough to be educationally meaningful. Given the small database supporting the popular White and Koenig procedures, the authors recommend that practitioners cautiously try out Tukey I and II procedures, comparing results with Koenig's and White's procedures. Of course, further psychometric studies of all four procedures are needed also. The authors discuss three notable study limitations: limited generalizability, use of the future score prediction criterion, and no instructional use of the best-fit lines.
Article
The purpose of this article is to illustrate how one well-developed, technically strong measurement system, curriculum-based measurement (CBM), can be used to establish academic growth standards for students with learning disabilities in the area of reading. An introduction to CBM and to the basic concepts underlying the use of CBM in establishing growth standards is provided. Using an existing database accumulated over various localities under typical instructional conditions, the use of CBM to provide growth standards is illustrated. Next, normative growth rates under typical instructional conditions are contrasted with CBM growth rates derived from studies of effective practices. Finally, based on these two data sets, issues and conclusions about appropriate methods for establishing academic growth rates using CBM are discussed.
Article
Curriculum-based measurement of oral reading fluency (CBM-R) is an established procedure used to index the level and trend of student growth. A substantial literature base exists regarding best practices in the administration and interpretation of CBM-R; however, research has yet to adequately address the potential influence of measurement error. This study extends results of Hintze and Christ (2004) by incorporating research-based estimates of the standard error of the estimate (SEE) to generate likely magnitudes for the standard error of the slope (SEb) across a variety of progress monitoring durations and measurement conditions. Fourteen progress monitoring durations (2-15 weeks) and nine levels of SEE (2, 4, 6, 8, 10, 12, 14, 16, 18) were used to derive SEb. The outcomes are discussed in relation to assessment practices, such as selecting optimal progress monitoring durations to reduce measurement error. Implications and limitations are discussed.
Article
The current aptitude-treatment interaction (ATI) approach to accommodating individual differences has not proved instructionally useful. In this paper, reasons for this failure are identified and considered, and the desirable characteristics of an alternative model are identified. Formative evaluation is offered as a promising alternative that addresses individual uniqueness rather than dimensionalized individual differences. The notion that the failure of ATI research to be practically useful in designing instruction justifies ignoring individual difference is rejected.
Book
Data sets and errata are available on ResearchGate. The book is available from Sage: https://us.sagepub.com/en-us/nam/longitudinal-data-analysis-for-the-behavioral-sciences-using-r/book234770
Article
Curriculum-based measurement (CBM) is an approach for assessing the growth of students in basic skills that originated uniquely in special education. A substantial research literature has developed to demonstrate that CBM can be used effectively to gather student performance data to support a wide range of educational decisions. Those decisions include screening to identify students at risk, evaluating prereferral interventions, determining eligibility for and placement in remedial and special education programs, formatively evaluating instruction, and evaluating reintegration and inclusion of students in mainstream programs. Beyond those fundamental uses of CBM, recent research has been conducted on using CBM to predict success in high-stakes assessment, to measure growth in content areas in secondary school programs, and to assess growth in early childhood programs. In this article, best practices in CBM are described and empirical support for those practices is identified. Illustrations of the successful uses of CBM to improve educational decision making are provided.
Article
This study was designed to examine the effects of setting and method accuracy in Curriculum-Based Assessment (CBA), based on Cone's (1981, 1987) elaboration of a methodology for validating behavioral assessment procedures. The effects of who administered the assessment (teacher vs. author), the physical location of the assessment (reading group vs. teacher's desk vs. office outside classroom), and whether the performance was timed or untimed were examined for oral reading rates of 26 third- and fourth-grade students in a regular education classroom. The effects of these conditions on the number of words read correctly per minute (WC) and the percentage of errors (%E) were examined in three separate studies. Results showed significant effects for the tester, setting, and task demand variables on WC scores and significant effects for the task demand variables on %E scores. Implications of these results are addressed, and the needed research in this area is discussed.
Article
Monte Carlo studies are being used in item response theory (IRT) to provide information about how validly these methods can be applied to realistic datasets (e.g., small numbers of examinees and multidimensional data). This paper describes the conditions under which Monte Carlo studies are appropriate in IRT-based research, the kinds of problems these techniques have been applied to, available computer programs for generating item responses and estimating item and examinee parameters, and the importance of conceptualizing these studies as statistical sampling experiments that should be subject to the same principles of experimental design and data analysis that pertain to empirical studies. The number of replications that should be used in these studies is also addressed.
Article
Monte Carlo studies in item response theory have been used in a number of ways, for example, to evaluate new parameter estimation procedures, to compare item analysis programs, and to study the effects of multidimensional data on parameter estimation. These studies typically rely on simple descriptive methods to analyze Monte Carlo results, implying that complex effects are unlikely to be detected or their magnitudes estimated. These problems are exacerbated when Monte Carlo studies lack an experimental design to guide the data analyses. The results from two Monte Carlo studies in item response theory are analyzed with inferential methods to illustrate the strengths of these procedures. It is recommended that researchers in item response theory employ both descriptive and inferential methods to analyze Monte Carlo results.
Article
Research and policy have established that data are necessary to guide decisions within education. Many of these decisions are made within problem solving and response to intervention frameworks for service delivery. Curriculum-Based Measurement in Reading (CBM-R) is a widely used data collection procedure within those models of service delivery. Although the evidence for CBM-R as a screening and benchmarking procedure has been summarized multiple times in the literature, there is no comprehensive review of the evidence for its application to monitor and evaluate individual student progress. The purpose of this study was to identify and summarize the psychometric and empirical evidence for CBM-R as it is used to monitor and evaluate student progress. There was an emphasis on the recommended number of data points collected during progress monitoring and interpretive guidelines. The review identified 171 journal articles, chapters, and instructional manuals using online search engines and research databases. Recommendations and evidence from 102 documents that met the study criteria were evaluated and summarized. Results indicate that most decision-making practices are based on expert opinion and that there is very limited psychometric or empirical support for such practices. There is a lack of published evidence to support program evaluation and progress monitoring with CBM-R. More research is required to inform data collection procedures and interpretive guidelines.
Article
Presented is an empirically oriented, data based program modification (DBPM) manual for individualizing educational plans for any child with a learning or behavioral problem. The rationale for an empirically based program, the socio-legal context, and specific measurement and evaluation procedures (e.g. time series procedures and discrepancy measurement) are described in Part I. Covered in Part II is the sequencing of initial assessment and in Part III a program planning sequence is provided. Program implementation, adjustment, and certification are discussed in Parts IV, V, and VI. Consultation, training, and the indirect role of the resource teacher are treated in Part VII. Featured throughout is the application of DBPM to the case of a hypothetical child. Three appendixes provide appropriate questions for each decision area of the DBPM, case report summaries, and a list of change strategies.
Article
Educational accountability and its counterpart, high-stakes assessment, are at the forefront of the educational agenda in this era of standards-based reform. In this article, we examine assessment and accountability in the context of a prevention-oriented assessment and intervention system designed to assess early reading progress formatively. Specifically, we explore the utility of a continuum of fluency-based indicators of foundational early literacy skills to predict reading outcomes, to inform educational decisions, and to change reading outcomes for students at risk of reading difficulty. First, we address the accountability era, discuss the promise of prevention-oriented assessment, and outline a continuum of fluency-based indicators of foundational reading skills using Dynamic Indicators of Basic Early Literacy Skills and Curriculum-Based Measurement Oral Reading Fluency. Next, we describe a series of linked, short-term, longitudinal studies of 4 cohorts examining the utility and predictive validity of the measures from kindergarten through 3rd grade with the Oregon Statewide Assessment-Reading/Literature as a high-stakes reading outcome. Using direct measures of key foundational skills, predictive validities ranged from .34 to .82. The utility of the fluency-based benchmark goals was supported with the finding that 96% of children who met the 3rd-grade oral reading fluency benchmark goal met or exceeded expectations on the Oregon Statewide Assessment, a high-stakes outcome measure. We illustrate the utility of the measures for evaluating instruction, modifying the instructional system, and targeting children who need additional instructional support to achieve benchmark goals. Finally, we discuss the instructional and policy implications of our findings and their utility in an active educational accountability environment.
Article
Discusses proposed solutions to the challenges of developing assessment practices for mildly handicapped students, including curriculum-based assessment (CBA) and in particular 1 type of CBA—curriculum-based measurement—that offers technical adequacy for making decisions with students. A description of the development and use of local norms for use in eligibility decisions concerning students classified as learning disabled, mildly mentally retarded, or emotionally disturbed is presented. It is asserted that local norms provide a consistent and continuous database that links the data collected for screening and eligibility purposes to student progress decisions. Implications of the use of these measures on the practice of school psychology are addressed.
Article
Examined the differential effects of using grade vs goal level reading material on curriculum-based measurement (CBM) progress-monitoring procedures. Participants included a total of 80 students, 20 each from Grades 1–4. CBM reading passage probes from both grade and goal level material were administered to all students, twice weekly during an 11-wk period. Students' rate of progress in each level of materials was indexed using the slope of their data series calculated by ordinary least-squares regression. Results indicate that the amount of progress observed (i.e., slope of improvement) varied as a function of grade and whether student progress was monitored in grade or goal level material.