Measurement, 13: 75–99, 2015
Copyright © Taylor & Francis Group, LLC
ISSN: 1536-6367 print / 1536-6359 online
DOI: 10.1080/15366367.2015.1042814
FOCUS ARTICLE
Using Learning Progressions to Design Vertical Scales that
Support Coherent Inferences about Student Growth
Derek C. Briggs and Frederick A. Peck
University of Colorado at Boulder
The concept of growth is at the foundation of the policy and practice around systems of educational
accountability. It is also at the foundation of what teachers concern themselves with on a daily basis
as they help children learn. Yet there is a disconnect between the criterion-referenced intuitions that
parents and teachers have for what it means for students to demonstrate growth and the primarily
norm-referenced metrics that are used to infer growth. One way to address this disconnect would
be to develop vertically linked score scales that could be used to support both criterion-referenced
and norm-referenced interpretations, but this hinges upon having a coherent conceptualization of
what it is that is growing from grade to grade. In this paper, a learning-progression approach to
the conceptualization of growth and the subsequent design of a vertical score scale is proposed and
illustrated in the context of the Common Core State Standards for Mathematics.
Keywords: growth, vertical scaling, learning progressions, educational accountability
More than 10 years have passed since the advent of No Child Left Behind, and if anything has
changed about the nature of educational accountability it is the increasing emphasis on using evi-
dence of growth in student learning to evaluate the efficacy of teachers and schools. To a great
extent this represents an improved state of affairs, since it implicitly recognizes that it is unfair
to compare teachers on the basis of what their students have achieved at the end of a school
year without taking into consideration differences in where the students began at the outset. Yet
when researchers build models to quantify the contribution of teachers to growth in student learn-
ing, growth does not always mean what laypeople naturally think it means. This can lead to
fundamental misunderstandings.
Correspondence should be addressed to Derek C. Briggs, School of Education, University of Colorado at Boulder,
Campus Box 249, Boulder, CO 80309-0249. E-mail: Derek.Briggs@colorado.edu
Color versions of one or more of the figures in this article can be found online at www.tandfonline.com/hmes.
FIGURE 1 Growth, effectiveness and two hypothetical teachers.
To appreciate why, consider the graphic shown in Figure 1. The axes of the plot represent
scores from the same test given at the beginning of a school year (pretest on the horizontal axis)
and the end of a school year (posttest on the vertical axis). The ellipses within the plot capture dif-
ferent collections of data points corresponding to the students of two different teachers, Teacher
A and Teacher B. The dashed line at the 45-degree angle indicates a score on the posttest that
is identical to a score on the pretest. To keep the scenario simple, assume each teacher has the
same number of students. On the basis of this data collection design, two researchers are asked
to compare the teachers and make a judgment as to which is better. Researcher 1 computes the
average test score gains for both groups of students and gets identical numbers. This researcher
concludes that students in each classroom have grown by the same amount, hence neither teacher
can be inferred to be better than the other. This can be seen in Figure 1 by noticing that each
teacher’s class of students has about the same proportion of data points above the dashed line
(indicating a pre to post gain) as they do below (indicating a pre to post loss). Researcher 2 takes
a different approach. This researcher takes all the available data for both classes of students and
proceeds to regress posttest scores on pretest scores and an indicator variable for Teacher B. The
parameter estimate for the Teacher B indicator variable is large and statistically significant. This
can be seen in Figure 1 by noticing that the regression line (solid black line) passing through the
data ellipse for Teacher B is higher (has a larger y-intercept) than the regression line for Teacher
A. Researcher 2 concludes that B is the better teacher because given how they scored on the
pretest, the students of Teacher B scored higher than the students of Teacher A. Which researcher
is right?
Many readers will have immediately recognized the example above as a retelling of Lord’s
Paradox (Lord, 1967) with the classrooms of Teachers A and B substituted for males and females
and test scores substituted for weight. Holland and Rubin (1983) reconciled Lord’s Paradox by
essentially pointing out that the 2 cases involved analyses pertaining to fundamentally different
causal inferences. The same logic can be used for the example above. Researcher 1 is inferring the
effect of Teacher B relative to Teacher A through a comparison of average score gains. Researcher
2 is inferring the effect of Teacher B relative to Teacher A by comparing the average difference
in posttest scores for those students with the same pretest scores. Both researchers could argue
that they are making comparisons on the basis of student growth. Researcher 1 defines growth as
the change in magnitude from pretest to posttest. Researcher 2 defines growth as the increment
in achievement we would predict if 2 students with the same pretest score had Teacher B instead
of Teacher A. Which one has come to the right conclusion about the effect of one teacher relative
to the other?
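To make the contrast concrete, the short Python sketch below simulates the scenario in Figure 1: two classrooms whose students start at different places but show essentially the same average pretest-to-posttest gain. The numbers, sample sizes, and variable names are invented for illustration; the point is only that the two analytic choices answer different questions about "growth."

import numpy as np

rng = np.random.default_rng(0)
n = 200  # students per teacher (hypothetical)

# True ability differs between the two classrooms; neither class gains anything on average.
theta_a = rng.normal(50, 8, n)   # Teacher A's students
theta_b = rng.normal(70, 8, n)   # Teacher B's students
pre_a, post_a = theta_a + rng.normal(0, 6, n), theta_a + rng.normal(0, 6, n)
pre_b, post_b = theta_b + rng.normal(0, 6, n), theta_b + rng.normal(0, 6, n)

# Researcher 1: compare average pre-to-post gains.
print("mean gain, Teacher A:", (post_a - pre_a).mean())   # near zero
print("mean gain, Teacher B:", (post_b - pre_b).mean())   # near zero

# Researcher 2: regress posttest on pretest plus an indicator for Teacher B.
pre = np.concatenate([pre_a, pre_b])
post = np.concatenate([post_a, post_b])
is_b = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([np.ones(2 * n), pre, is_b])
coef, *_ = np.linalg.lstsq(X, post, rcond=None)
print("coefficient on Teacher B indicator:", coef[2])     # clearly positive

Researcher 1's comparison of mean gains shows essentially no difference, while Researcher 2's regression assigns a clearly positive coefficient to the Teacher B indicator; both are faithful summaries of the same data.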
Most of the growth and value-added models that play a central role in teacher evaluation
follow the approach of Researcher 2 (cf. Chetty, Friedman, & Rockoff, 2014; Kane & Staiger,
2008; McCaffrey, Lockwood, Koretz, & Hamilton, 2003). A root of considerable confusion in the
interpretation of estimates from such models is that important stakeholders in K–12 education—
teachers, parents, the general public—assume that inferences about effectiveness derive from the
sort of approach taken by Researcher 1. Put differently, judgments about the quality of a student’s
schooling are not based on direct estimates of the amount that a student has learned but, rather,
on how well a student has performed relative to peers who are comparable with respect to vari-
ables such as prior achievement, free and reduced-price lunch status, race/ethnicity, and so on.
Yet while econometricians and statisticians may notice and appreciate the distinction between
growth as measured by differences in quantity versus growth as inferred by normative compar-
ison, teachers, parents, and the general public do not. And to some extent, this misconception
is encouraged by the way results from these models are presented. Consider, for example, the
Policy and Practitioner Brief released by the Measures of Effective Teaching Project (MET) enti-
tled Ensuring Fair and Reliable Measures of Effective Teaching. In the Executive Summary, the
first key finding is presented as follows:
Effective teaching can be measured. We collected measures of teaching during 2009–10. We adjusted those mea-
sures for the backgrounds and prior achievement of the students in each class. But, without random assignment,
we had no way to know if the adjustments we made were sufficient to discern the markers of effective teaching
from the unmeasured aspects of students’ backgrounds. In fact, we learned that the adjusted measures did identify
teachers who produced higher (and lower) average student achievement gains following random assignment in
2010–11. The data show that we can identify groups of teachers who are more effective in helping students learn.
Moreover, the magnitude of the achievement gains that teachers generated was consistent with expectations.
(MET Project, 2013, pp. 4–5, italics added for emphasis)
The Measures of Effective Teaching Policy Brief was very intentionally written for a general
audience of policymakers and practitioners in education. Note that in the passage above “learn-
ing” is equated to “achievement gains.” Since student achievement is typically inferred from test
performance, most readers of this policy brief would interpret achievement gains as implying test
score gains. The larger the magnitude of test score gains, the more that a student has learned.
However, this reading of the passage above would be incorrect. The MET study was able to show
that differences in prior estimates of teacher value-added were strongly predictive of differences
in relative student achievement following random assignment. Teachers flagged as effective only
produced “gains” in the sense that their students scored higher, on average, than they would have
had they instead been assigned to a less effective teacher. In this context then, a score “gain”
could plausibly mean a true decrease in learning that was less than expected. Notions of growth
in such contexts are fundamentally normative; effective and ineffective teachers are guaranteed
to be found in any population of teachers—whether the actual amount of student learning is high,
low, or even nonexistent.
For another example, this time with individual students as the units of analysis, consider the
way that growth is communicated in Colorado (and many other states) using student growth
percentiles (SGPs) computed using the Colorado Growth Model (CGM; Betebenner, 2009). The
publicly available tutorial about the CGM can be found at http://www.cde.state.co.us/schoolview/
growthmodeltutorials; also see Castellano and Ho (2013a, 2013b). In a nutshell, an SGP attempts
to show how a student’s achievement at the end of the year compares with that of other students
who started the year at the same level. SGPs can be interpreted as indicating the probability of
observing a score as high or higher than a student’s current score, given what has been observed
on all of the student’s prior scores. A student with an SGP of 75 has a current-year test score
that is higher than 75% of peers with a comparable test score history. It follows that the prob-
ability of observing a score this high or higher for any student with a comparable test score
history is 25%. An SGP supports inferences about growth in the sense that if 2 students started
at the same achievement level at the beginning of the year and one scores higher than the other
on a test at the end of the year, it seems reasonable to infer that the student with the higher
score has demonstrated more growth. Betebenner and colleagues have also made it possible to
weave criterion-referenced information into the CGM by comparing each student’s SGP to her
adequate growth percentile—the growth percentile that would be needed to achieve a desired per-
formance level on a test. This makes it possible to answer the question, Is the growth a student
has demonstrated good enough relative to the standards that have been established and enacted
by the state?
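The operational Colorado Growth Model estimates SGPs with quantile regression over a student's full score history (Betebenner, 2009). The Python sketch below is only a simplified, single-prior-year stand-in for the idea, with simulated scores and a crude score band of our own choosing to define "peers with a comparable test score history."

import numpy as np

rng = np.random.default_rng(1)
n = 50_000
prior = rng.normal(500, 50, n)                    # last year's scale scores (simulated)
current = 0.8 * prior + rng.normal(100, 30, n)    # this year's scale scores (simulated)

def toy_sgp(prior_score, current_score, band=5.0):
    """Percentile of current_score among students whose prior score falls
    within +/- band points of prior_score (a rough stand-in for
    'students with a comparable test score history')."""
    peers = current[np.abs(prior - prior_score) <= band]
    return 100.0 * np.mean(peers < current_score)

# A student who scored 520 last year and 540 this year:
print(round(toy_sgp(520.0, 540.0)))

For this student the toy calculation lands roughly in the high 70s under these simulated parameters; the operational model conditions on the full score history and smooths across the prior-score continuum rather than using a hard band.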
Yet results from the Colorado Growth Model are also easy to misinterpret. Many teachers and
parents are likely to equate a student’s score with “math knowledge.” Teachers and parents with
this interpretation would think that a student’s score should be steadily increasing across grades.
If presented with a scenario in which a student has a score of 500 across grades 6 through 8, it
would be natural for a parent to think that the student has not “learned anything” during these
years. However, if the meaning of a score of 500 changes every year, this would not be a correct
inference.
SGPs can easily be misunderstood as “changes in math knowledge”—that is, “amount of learn-
ing.” For example, if a student has an SGP of 90, of 75, and of 60 across grades 6 through 8, it
would be natural for a parent to interpret this to mean that the student is learning less in grade
8 than in grade 7, and less in grade 7 than in grade 6. But such an inference would be impossible
to support on the basis of SGP comparisons alone. For all its advantages, the CGM cannot be
used to infer whether the amount a student has learned in the most recent year is significantly
more or less than the amount a student learned in the past year.
Nonetheless, there is good reason to suspect that parents and teachers are implicitly encour-
aged to use it in this manner, as illustrated by the plot in Figure 2. This exemplar plot is made
available to parents in order to help them interpret their child’s SGP.

FIGURE 2 Example of a student growth report in Colorado.
Source. http://www.schoolview.org/documents/ISR_explanation.pdf

The vertical axis consists of scale scores in mathematics, organized into proficiency levels. The thresholds for these levels change
from grade to grade because Colorado has standards that become more and more difficult to reach
as students enter middle school. Grade levels are shown along the x (horizontal) axis. Below the
horizontal axis are scale scores and SGPs. Data points are represented by small circles that indi-
cate the student’s scale score and a gray gradation indicates proficiency levels, shown along the
y (vertical) axis. The location of the small circle thus indicates a student’s scale score in a given
grade and where that scale score is located relative to 3 proficiency-level thresholds. Note that
in addition to the circles there are color-coded arrows that indicate whether a student’s SGP in a
given grade is “low” (1–34), “typical” (35–65), or “high” (66–99).
The “next year” spot on the x-axis is meant to reflect the most likely proficiency levels of the
student if the student were to have a low, typical, or high SGP in the following year. Visually, the
first thing a parent is likely to interpret is the trajectory implied by the collective slopes of the
individual arrows, and the height of the bar segments in the “next year” prediction. The visual
interpretation suggests that the student represented in this plot showed flat or slightly negative
growth between grades 6 and 7, positive growth between grades 7 and 8, and negative growth
between grades 8 and 9. If the student has positive growth between grades 9 and 10, he or she
will fall within the proficient performance level; if the student has flat or negative growth, he
or she will fall within the partially proficient performance level. Across grades 6 through 9, the
overall growth trajectory appears relatively flat. Since the likely interpretation of this trajectory
is “change in knowledge,” it appears that the student has learned nothing between grades 6 and
8. This inference is supported by the direction and color of the arrows that constitute the trajec-
tory: the downward-pointing red arrows support the inference that the student endured 2 years
of negative growth, which was compensated by 1 year of positive growth indicated by the green
upward-pointing arrow. The overall picture is that the student’s knowledge has not really changed.
In education and in life there is a constant tension between norm- and criterion-referenced
interpretations. Neither can be sustained in perpetuity without eventually encountering the need
to invoke the other. In this article we argue that normative interpretations about student growth
and teacher effectiveness need to be complemented by criterion-referenced interpretations about
how much and of what? How much has my child grown this year? How much more has she
grown relative to last year? What did my child learn and how can the effectiveness of my child’s
teacher be quantified relative to the amount that was learned? In theory, the best way to answer
such questions would be through the development of tests that could be expressed on vertically
linked scales. In the next section we explain why, to date, vertical scaling appears to have been
unsuccessful at meeting such ambitions. In the section that follows, we propose a new approach
to the design of vertical scales that is premised upon a priori hypotheses about growth in the form
of a learning progression hypothesis. In a nutshell, our argument is that meaningful criterion-
referenced interpretations of growth magnitudes can only be supported when they follow from a
coherent conceptualization of what it is that is growing over time. To speak of a student’s growth
in “mathematics” is incoherent, because mathematics is just a generic label for the content domain
of interest and not an attribute for which it makes sense to speak of a student having more or less.
A benefit of designing a vertical scale according to a learning progression is that it becomes
possible to speak about growth in terms of specific knowledge and skills that are hypothesized
to build upon one another over time. We illustrate this using a learning progression that shows
how students develop the knowledge and skills necessary to be able to analyze and reason about
proportional relationships.
SOME BACKGROUND ON CONVENTIONAL VERTICAL-SCALING
METHODOLOGY
The conventional method for creating a vertical scale is documented in books1 such as
Educational Measurement (4th edition); Test Equating, Linking, and Scaling; and The Handbook
of Test Development. Although there are a number of different ways to create a vertical score
scale, the approach generally consists of 2 interdependent stages: a data collection stage and a
data calibration stage. In the data collection stage, the key design principle is to select a set of
common test items (also known as “linking” items) that will be administered to students across
2 or more adjacent grade levels (e.g., grades 3 through 4 or grades 3 through 8). This is in contrast
to a unique test item, which would only be administered to students at any single grade. In some
designs, the common items consist of an external test given to students across multiple grades; in
others they consist of an external test given only across adjacent grades; and in others they con-
sist of items embedded within operational test forms. Once item responses have been gathered for
representative students at each grade level, the next task is to analyze differences in performance
on the common items. These differences become the basis for the data calibration stage. In order
to calibrate the responses from students at different grade levels onto a single scale, either the abil-
ity of the students, or the characteristics of the items (e.g., difficulty) needs to be held constant
across grades. Since growth in student ability across successive grades is the underlying basis
for the vertical scale, the only reasonable option is to hold the item characteristics constant. There are
2 known approaches to accomplishing this: Thurstone Scaling (Thurstone, 1925, 1927) and Item
Response Theory scaling (IRT; Lord & Novick, 1968; Rasch, 1960). IRT-based methods are by
far the predominant approach and have been used since the mid-1980s. The selling point of IRT
is the property of parameter invariance, which will hold so long as the assumption of local inde-
pendence has been satisfied and the data can be shown to fit the item response function that has
1 See, respective of titles, Kolen, 2006, pp. 171–180; Kolen & Brennan, 2004, pp. 372–414; Young, 2006, pp. 469–485.
been specified. Parameter invariance is the critical property of IRT models that makes it possible
to establish values for the characteristics of common items that do not depend on the particu-
lar group of students responding to them. When parameter invariance holds, the same difficulty
parameter will be estimated for an item whether it is administered to a 3rd grade student or an
8th grade student. An even stronger invariance property, that of invariance of comparisons (i.e.,
specific objectivity), must hold when specifying the Rasch Model, and this can have implications
for claims that a scale has equal intervals (Briggs, 2013).
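As a rough illustration of what parameter invariance buys, the sketch below simulates Rasch-model responses to the same 8 items from a lower-ability group and a higher-ability group and recovers item difficulties within each group using a simple pairwise (Choppin-style) estimator. The item difficulties, sample sizes, and group means are invented, and operational calibrations rely on marginal or conditional maximum likelihood rather than this shortcut; the point is only that the two sets of difficulty estimates agree, up to the arbitrary origin of each calibration, even though the groups differ markedly in ability.

import numpy as np

rng = np.random.default_rng(7)
b_true = np.linspace(-1.5, 1.5, 8)          # item difficulties (logits), invented

def simulate(theta_mean, n=5000):
    """Simulate Rasch item responses for a group with the given mean ability."""
    theta = rng.normal(theta_mean, 1.0, n)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b_true[None, :])))
    return (rng.random((n, len(b_true))) < p).astype(int)

def pairwise_difficulties(x):
    """Centered Rasch difficulties from pairwise right/wrong counts.
    For items i, j: log(N[i wrong, j right] / N[i right, j wrong]) estimates b_i - b_j,
    and under the Rasch model this ratio does not depend on the ability distribution."""
    k = x.shape[1]
    b = np.zeros(k)
    for i in range(k):
        logits = []
        for j in range(k):
            if i == j:
                continue
            n_i0_j1 = np.sum((x[:, i] == 0) & (x[:, j] == 1))
            n_i1_j0 = np.sum((x[:, i] == 1) & (x[:, j] == 0))
            logits.append(np.log(n_i0_j1 / n_i1_j0))
        b[i] = np.mean(logits)
    return b - b.mean()

b_low = pairwise_difficulties(simulate(theta_mean=-0.5))   # e.g., lower-grade sample
b_high = pairwise_difficulties(simulate(theta_mean=+1.0))  # e.g., higher-grade sample
print(np.round(b_low - b_high, 2))   # differences hover near zero

The leftover difference of origins between the two calibrations is precisely what a vertical scale exploits: once the common items are forced to agree, the shift between origins becomes the estimate of average growth.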
Much of the research literature on vertical scales has focused on choices that must be made in
the calibration of the scale (cf. Skaggs & Lissitz, 1986). Two choices in particular have received
considerable attention: the functional form of the IRT model, and the manner in which test
scores across grades are concatenated. The first choice is typically a contrast between the use of
the 3 parameter logistic model (3PLM; Birnbaum, 1968) or the Rasch Model (Rasch, 1960). The
second choice is a contrast between a separate or concurrent calibration approach. In the separate
approach, item parameters are estimated separately for each grade-specific test. Then a base grade
for the scale is established and other grades are linked to the base grade after estimating linking
constants for each grade-pair using the Stocking-Lord approach (Stocking & Lord, 1983). In the
concurrent approach, all item parameters are estimated simultaneously. Although there is very
little in the way of consensus in the research literature about the best way to calibrate a vertical
scale, when different permutations of approaches have been applied to create distinct scales from
the same data, this application has been shown to have an impact on the magnitudes of grade-
to-grade growth (Briggs & Weeks, 2009). One message that has been communicated by this
research base is that there is no “right answer” when it comes to creating a vertical scale. If this
message is taken to its extreme, it implies that nonlinear transformations can be applied to the
scale following the calibration stage to produce whatever depiction of growth is most desirable
to stakeholders, since no one depiction can be said to be more accurate than the other.
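The sketch below reduces the separate-calibration logic to its simplest Rasch-style case: each grade's calibration is centered at its own zero, a mean-mean linking constant aligns the common items, and the same shift applied to the grade 4 ability estimates expresses them on the grade 3 scale. All numbers are invented; operational linking with 2PL or 3PL item parameters typically uses a characteristic-curve method such as Stocking-Lord rather than this mean-mean shortcut.

import numpy as np

# Difficulty estimates for the items common to grades 3 and 4, as produced by
# two separate calibrations (each grade's scale centered at its own zero).
b_common_g3 = np.array([-0.40, 0.10, 0.55, 0.95])    # on the grade 3 scale
b_common_g4 = np.array([-1.10, -0.62, -0.15, 0.28])  # on the grade 4 scale

# Mean-mean linking constant: how far the grade 4 scale must be shifted
# so that the common items line up with their grade 3 estimates.
shift = b_common_g3.mean() - b_common_g4.mean()

# Apply the same shift to grade 4 ability estimates to express them on the
# grade 3 (base) scale; the averages below are made up for the illustration.
theta_mean_g3 = 0.00             # grade 3 mean ability on its own scale
theta_mean_g4_own_scale = 0.05   # grade 4 mean ability on its own scale
theta_mean_g4_linked = theta_mean_g4_own_scale + shift

print("linking constant:", round(shift, 2))
print("grade 3 to grade 4 growth on the vertical scale:",
      round(theta_mean_g4_linked - theta_mean_g3, 2))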
In a review of the vertical scaling practices among states as of 2009, Dadey and Briggs (2012)
found that 21 out of 50 states had vertically scaled criterion-referenced assessments spanning
grades 3 through 8. Notably, Dadey and Briggs found no evidence that those states with verti-
cal scales used their scales to make inferences about criterion-referenced growth at the student,
school, or state levels. In many cases, it appears that states did not actually trust the aggregate
inferences about student growth implied by their vertical scales. In one of the more ironic examples, Colorado, the originator of the norm-referenced Colorado Growth Model, also expressed
its criterion-referenced tests in math and reading along a vertical scale. This fact would come
as a surprise to most Colorado educators,2 because grade-to-grade scale score gains are never
emphasized in conjunction with the reporting of SGPs. In another instance, as part of the process
of designing their vertical scale, contractors for the state of Arizona applied a nonlinear transfor-
mation to ensure that grade-to-grade reading scale score means would increase monotonically,
even though the empirical evidence prior to applying the transformation indicated that students in
some upper grades had performed slightly worse on items that were common to the lower grade.
2 Indeed, the second author of this paper, who taught high school mathematics in Colorado as recently as 2012–2013, was completely unaware that math scale scores in Colorado had been linked vertically until informed of this by the first author. This even extends to personnel at the Colorado Department of Education who work in the educational accountability group, who on one occasion in correspondence with the first author insisted that Colorado’s tests were not vertically scaled.
One possible explanation for the reluctance of states to use their vertical scales to report growth
in terms of grade-to-grade changes in magnitudes is that there is a disconnect between the infor-
mation about growth that such scales imply and the intuitive expectations about growth that are
common among teachers, parents, and the public—the primary audience for the communica-
tion of growth—namely, the intuitive expectation that as students learn they build a larger and
larger repertoire of knowledge and skills that they can use to navigate the world around them.
As such, irrespective of the subject in which this repertoire of knowledge and skills is to be mea-
sured, from year to year one would expect to see significant evidence of growth. In contrast to
this intuition, many vertical scales show evidence of a large deceleration of growth, particularly
as students transition from the elementary school grades to the middle school grades (Dadey &
Briggs, 2012; Tong & Kolen, 2007). In addition, because the concept of growth borrows so heav-
ily from the analogy of measuring height, it is intuitive to believe that the interpretation of gains
from one grade to another along a vertical scale does not depend upon a student’s initial location
on the scale. Indeed, Briggs (2013) argues that the premise of equal-interval interpretations has
been central to the way that some testing companies have marketed the advantages of creating a
vertical scale.
One response to this disconnect between intuition and practice is to say that both of the intu-
itions described above are wrong or at least in some sense misguided (Yen, 1986). For example,
it could be argued that if students were tested repeatedly across grades to make inferences about
their ability to decode and extract meaning from selected vocabulary words in a reading passage,
then larger gains would be observed in the early grades of a child’s schooling when decoding
is a focus of instruction and these gains would be smaller in the later grades when the instruc-
tional focus shifts from “learning to read” to “reading to learn.” Similarly, it could be argued
that there is nothing inherent to the process of creating a vertical scale that would guarantee that
the scale has equal-interval properties. Because of this, statements along the lines of Student X
has grown twice as much as Student Y are meaningless unless both students started at the same
baseline—which brings us back to normative growth inferences.
The problem with the approach of discrediting “faulty” intuitions in this manner is that it
defeats the purpose of creating a vertical scale in the first place. In the first example we have a
clear instance of construct underrepresentation if a test claims to measure “reading” or “English
Language Arts” yet really only measures the decoding of words. This would explain why growth
decelerates but would certainly not validate the inferences about growth that were purported.
In the second example, if a vertical scale can only support inferences about ordinal differences
among students, why create the vertical scale at all? As Briggs (2013) argues, the purpose of
vertical scales is to facilitate inferences about changes in magnitude with respect to a common
unit of measurement. The warrant behind this use is the assumption that changes along any point
of the scale have an equal-interval interpretation. Therefore, to validate that a given vertical scale
can be used for its intended purpose, evidence must be presented to support the equal-interval
assumption.
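A small numeric example shows what is at stake. Any strictly increasing rescaling of a vertical scale preserves the ordering of students, but it does not preserve statements such as "twice as much growth"; the transformation below is an arbitrary one of our own choosing.

# Scale scores before and after a year, for two hypothetical students.
x_pre, x_post = 10.0, 30.0    # Student X: gain of 20
y_pre, y_post = 50.0, 60.0    # Student Y: gain of 10

print((x_post - x_pre) / (y_post - y_pre))   # 2.0: "X grew twice as much as Y"

# A strictly increasing (order-preserving) but nonlinear rescaling of the scale.
t = lambda s: s ** 2 / 10.0

print((t(x_post) - t(x_pre)) / (t(y_post) - t(y_pre)))   # ~0.73: the ratio is not preserved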
We take the position that the best way to move the science behind vertical scaling forward
is to place a greater emphasis on design issues. In making this case we are essentially sounding
the same drum that was first pounded in the National Research Council’s 2001 report Knowing
What Students Know (Pellegrino, Chudowsky, & Glaser, 2001), which emphasized that principled
assessment design always involves an implicit model of cognition and learning. Yet while this
message has resulted in some important improvements in assessment design over the past decade
(e.g., the application of “Evidence Centered Design” principles; Mislevy, Steinberg, & Almond,
2002), it is less clear that the message has had much influence on the design of vertical scales.
In the next section we use the subject area of mathematics to illustrate an approach to vertical
scale design that is premised on what we call a learning progression conceptualization of growth.
USING LEARNING PROGRESSION HYPOTHESES TO DESIGN VERTICAL SCALES
Domain sampling versus learning progression conceptualizations of growth
Fundamental to the development of large-scale assessments for use in systems of educational
accountability is a collection of content-specific targets for what students are expected to know
and be able to do within and across grades. At present, through their participation in 1 of the
2 large-scale assessment consortia (the Partnership for Assessment of Readiness for College and
Careers [PARCC] and the Smarter Balanced Assessment Consortium [SBAC]), many American
states are using the Common Core State Standards for Mathematics and English Language
Arts (CCSS-M and CCSS-ELA) as the basis for these targets. A good case can be made that
the Common Core State Standards are especially amenable to the creation of vertical scales to
support inferences about growth because these standards were written with an eye toward how
students’ knowledge and skills in mathematics and English language arts would be expected to
become more sophisticated over time. However, there are still 2 different ways that growth could be conceptualized before choosing a data collection design that could result in
the calibration of a vertical scale. These different conceptualizations are illustrated in Figure 3 in
the context of mathematics.
FIGURE 3 Different construct conceptualizations and implications for growth.

The left side of Figure 3 contains planes that are intended to encompass what it means to be
“proficient” or “on track for college and career readiness in mathematics” at a given grade level
(e.g., grade 3). Within each plane are light-colored shapes, and within each shape is a series of
dots. The shapes are meant to represent different “content domains” (e.g., Numerical Operations,
Measurement & Data, Geometry); the dots represent domain-specific performance standards that
delineate grade-level expectations for students (e.g., within the domain of Measurement & Data:
“Generate measurement data by measuring lengths using rulers marked with halves and fourths
of an inch.”). This sort of taxonomy has traditionally been used in the design of large-scale
assessments to deconstruct the often amorphous notion of “mathematical ability” into the discrete
bits of knowledge, skills, and abilities that should, in principle, be teachable within a grade-level
curriculum. Such an approach facilitates the design of grade-specific assessments because test
items can be written to correspond to specific statements about what students should know and
be able to do. The growth target in such designs is not a cognitive attribute of the test taker, but
a composite of many, possibly discrete pieces of knowledge, skills, and abilities. We refer to the
assessment design implied by the left side of Figure 3 as the domain-sampling approach.
Under the domain-sampling approach, the intent is for growth to be interpreted as the extent
to which a student has demonstrated increased mastery of the different domains that constitute
mathematical ability. This is represented by the single arrow indicating movement from the plane
for a lower grade to the plane for a higher grade. Note that if both the domains and the content
specifications within each plane change considerably from grade to grade, then it becomes possi-
ble for students to appear to “grow” even if entirely different content is tested across years. This
is represented in Figure 3 by the fact that 2 domains (circles and triangle shapes) are shown in
each grade, while 1 domain (hexagon shape) is only present in grades x and x + 1 and another
(pentagon shape) is only present in grades x + 1 and x + 2. In the best-case scenario for growth
inferences, considerable thought has been put into the vertical articulation of the changes among
content domains and standards from grade to grade. For example, according to the CCSS-M, a
composite “construct” of mathematical ability could be defined from grade to grade as a function
of 5 content domains and 6 skill domains (i.e., mathematical practices). Yet this leaves ample
room for growth in terms of the composite to have an equivocal interpretation depending upon
the implicit or explicit weighting of the domains in the assessment design and scoring of test
items. Furthermore, the number of items required to make inferences about all CCSS domains
at one point in time in addition to change over time is likely to be prohibitive. The problem of
changing domains over time has been described as the problem of “construct shift” in the context
of research conducted by Joseph Martineau (Martineau, 2004, 2005, 2006). The basic argument is that most achievement tests are only unidimensional to a degree. At one point in time, for a specific grade level, ignoring minor secondary dimensions is unlikely to cause large distortions in inferences about student achievement. However, when the nature of the primary and secondary dimensions and their relative importance are themselves changing over time, the calibration of a single unidimensional vertical scale has much greater potential to lead to distortions about student
growth.
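A toy 2-dimensional example, with weights and ability values invented for the purpose, illustrates the construct-shift concern: when the grade-specific composites weight the underlying dimensions differently, 2 students with the same amount of "true" growth can show very different gains on a single composite scale, and their rank order can even reverse.

import numpy as np

# Hypothetical weights that the grade 3 and grade 4 tests place on two
# latent dimensions (e.g., number/operations vs. algebraic reasoning).
w_grade3 = np.array([0.8, 0.2])
w_grade4 = np.array([0.3, 0.7])

# Two students, each of whom grows by exactly 1.0 on a single dimension.
student_1 = {"grade3": np.array([0.0, 0.0]), "grade4": np.array([1.0, 0.0])}
student_2 = {"grade3": np.array([0.0, 0.0]), "grade4": np.array([0.0, 1.0])}

for name, s in [("student 1", student_1), ("student 2", student_2)]:
    gain_shifting = w_grade4 @ s["grade4"] - w_grade3 @ s["grade3"]   # weights change across grades
    gain_stable = w_grade3 @ s["grade4"] - w_grade3 @ s["grade3"]     # same weights in both grades
    print(name, "apparent gain with shifting weights:", round(float(gain_shifting), 2),
          "| with stable weights:", round(float(gain_stable), 2))

Under grade 3's weighting, student 1 shows the larger apparent gain; once the weights shift toward the second dimension, student 2 does, even though each grew by exactly 1 unit on a single dimension.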
A different basis for a growth conceptualization comes from what we refer to as the learning-
progression approach. Learning progressions have been defined as empirically grounded and
testable hypotheses about how students’ understanding of core concepts within a subject domain
grows and becomes more sophisticated over time with appropriate instruction (Corcoran, Mosher,
& Rogat, 2009). Learning progressions provide “likely paths” (Confrey, 2012, p. 157) for learn-
ing, along with the instructional activities that support this path. The key feature of learning
progressions is that they are developed by coupling learning theories with empirical studies of
student reasoning over time. This is in contrast to some curricula that are developed based on dis-
ciplinary logic, or “reductionist techniques to break a goal competence into subskills, based on
an adult’s perspective” (Clements & Sarama, 2004, p. 83). Therefore, while there are many ways
that understanding can develop over time, learning progressions capture particularly robust path-
ways that are supported by both learning theory and empirical studies of learning in situ (Daro,
Mosher, & Corcoran, 2011; Sarama & Clements, 2009). As Daro et al. (2011, p. 45) explain,
“Evidence establishes that learning trajectories are real for some students, a possibility for any
student and probably modal trajectories for the distribution of students.” At the same time, learn-
ing progressions are always somewhat hypothetical, and should be refined over time (Shea &
Duncan, 2013).
This key idea is shown in the right panel of Figure 3, which depicts a hypothesis about the
nature of growth: the way that students’ understanding of some core concept or concepts within
the same domain is expected to become qualitatively more sophisticated from grade to grade.
The notion that this constitutes a hypothesis about growth to be tested empirically is represented
by the question marks placed next to the arrows that link one grade to the next. In contrast to
inferences about growth based on domain sampling, changes in a student’s depth of knowledge
and skills within a single well-defined domain over time are fundamental to a learning progression
conceptualization.
In mathematics, the distinction between across-domain and within-domain inferences about what students know and can do is evident in the fact that the CCSS-M makes it possible to view standards by grade (across-domain emphasis, single point in time) or by domain (within-domain emphasis, multiple points in time).3 Importantly, when math standards from the CCSS are viewed
by domain and by grades 3 through 8, as in Table 1, it becomes evident that there is in fact good
reason to be concerned about the potential for construct shift in how “mathematics” is being
defined between grades 3 through 5 (elementary school) and grades 6 through 8 (middle school).
Notice that the only content domain that remains present across all 6 grades is geometry.
TABLE 1
Math Content Domains Associated With Grades 3 to 8 in the CCSS

Content Standards by Domain               Grade in Which CCSS Includes Domain
                                          3    4    5    6    7    8
Operations & Algebraic Thinking           X    X    X
Number & Operations in Base 10            X    X    X
Number & Operations—Fractions             X    X    X
Measurement & Data                        X    X    X
Geometry                                  X    X    X    X    X    X
Ratios & Proportional Relationships                      X    X
The Number System                                        X    X    X
Expressions & Equations                                  X    X    X
Functions                                                               X
Statistics & Probability                                 X    X    X

3 http://www.corestandards.org/Math
This is why, well before worrying about technical issues in calibrating a vertical scale, it is important
to first ask whether the vertical scale would allow for inferences about growth over time that
are conceptually coherent. If all the content domains shown in Table 1 were to be the basis for
a domain-sampling approach to the creation of a single vertical scale, what would it mean if a
student grew twice as much between grades 4 and 5 as between grades 5 and 6? At least on the
basis of the CCSS-M content domains, this would seem to be an apples to oranges comparison.
When taking a learning-progression approach, one would eschew the notion of representing
growth with a single composite scale for mathematics across grades 3 through 8 and instead
choose a cluster of standards within a given domain and across a subset of grades as candidates
for quantifying growth. So, for example, a single learning progression might be hypothesized
with respect to how students in grades 3 through 5 become increasingly sophisticated in the way
that they reason about and model numbers and operations that involve fractions. After designing and
calibrating a vertical scale associated with this learning progression, 2 different pieces of infor-
mation could be provided to a 4th grade student: a number summarizing the student’s composite
achievement across all math content domains tested in grade 4 and a measure pertinent to the
student’s growth along the vertical scale for numbers and operations that involve fractions. To be
clear, these 2 numbers would derive from 2 different scales for 2 different purposes: one scale
to characterize achievement status across domains and another scale to measure growth within a
single, well-defined domain.
Example: A learning progression for proportional reasoning
The content domains in the CCSS-M, and the ways they are expected to change across grades
as a function of their standards, provide a starting point for math education researchers and
psychometricians—working together—to flesh out learning-progression hypotheses. As stated
in the online introduction to the CCSS-M:4
What students can learn at any particular grade level depends upon what they have learned before.
Ideally then, each standard in this document might have been phrased in the form, “Students who
already know A should next come to learn B.” But at present this approach is unrealistic—not least
because existing education research cannot specify all such learning pathways. Of necessity therefore,
grade placements for specific topics have been made on the basis of state and international compar-
isons and the collective experience and collective professional judgment of educators, researchers,
and mathematicians. One promise of common state standards is that over time they will allow research
on learning progressions to inform and improve the design of standards to a much greater extent than
is possible today.
The last sentence of this paragraph is important because it makes clear that within-domain con-
tent standards (“clusters” of standards) in the CCSS-M are unlikely to serve as an adequate
basis for a learning progression without further elaboration and also that the domain concep-
tualizations in the Common Core are by no means sacrosanct as models for student learning.
Finally, this sentence explicitly calls for more research on learning progressions. An encourag-
ing development along these lines is the recent efforts by Jere Confrey and colleagues at North
4 http://www.corestandards.org/Math/Content/introduction/how-to-read-the-grade-level-standards
Carolina State University to “unpack” the CCSS-M in terms of multiple learning progressions—
18 in all (Confrey, Nguyen, Lee, Panorkou, Corley, & Maloney, 2012; Confrey, Nguyen, &
Maloney, 2011). Building on Confrey’s work, we provide an example of a learning progression
for proportional reasoning that could be used to conceptualize growth along a vertical scale.5
Proportional reasoning involves reasoning about 2 quantities, x and y, that are multiplicatively related. This relationship can be expressed formally as a linear equation in the form y = mx, or 2 value pairs can be expressed as equivalent ratios in the form y1/x1 = y2/x2. For example, the following
questions involve proportional reasoning: (A) If 3 pizzas can feed 18 people, how many pizzas
would you need to feed 30 people? (B) At one table, there are 3 pizzas for 8 people. At another
table, there are 7 pizzas for 12 people. At each table, the people share the pizza equally. Which
table would you rather sit at, if you want to get the most pizza? Question (A) involves finding a
missing value in a proportional situation, and question (B) involves comparing 2 ratios.
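For readers who want the arithmetic spelled out, the short sketch below works both questions with exact fractions; the variable names are ours.

from fractions import Fraction

# (A) Missing value: 3 pizzas feed 18 people. How many pizzas for 30 people?
pizzas_per_person = Fraction(3, 18)          # 1/6 pizza per person
print(pizzas_per_person * 30)                # 5 pizzas

# (B) Comparing ratios: 3 pizzas for 8 people vs. 7 pizzas for 12 people.
share_table_1 = Fraction(3, 8)               # pizza per person at table 1
share_table_2 = Fraction(7, 12)              # pizza per person at table 2
print(share_table_2 > share_table_1)         # True: the second table gives more pizza per person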
The first 5 levels of this progression are based upon a detailed learning progression for
equipartitioning developed by Confrey and colleagues (Confrey, Maloney, Nguyen, Mojica, &
Myers, 2009; Confrey, 2012),6 while levels 6 and 7 come from a progression developed by Peck
and Matassa to extend the equipartitioning progression into algebra I (Matassa & Peck, 2012;
Migozuchi, Peck, & Matassa, 2013). This progression, like all learning progressions, is grounded
in studies of student learning. To develop the equipartitioning progression, Confrey et al. first
engaged in a comprehensive synthesis of the literature related to student learning of rational
numbers. From this, they developed a number of “researcher conjectured” learning progressions
for different aspects of rational number and multiplicative reasoning. One of these aspects was
equipartitioning, which Confrey et al. (2009, p. 347) describe as “behaviors to create equal-sized
groups” in sharing situations; for example, students use equipartitioning to find the fair share
when 7 pizzas are shared by 12 people.
To refine the progression for equipartitioning, they conducted 52 clinical interviews with stu-
dents in grades kindergarten through 6. Peck and Matassa’s work to extend this progression into
middle- and high school followed a similar path of creating a researcher-conjectured progression
based on the research literature and testing and refining it through work with students (Peck and
Matassa conducted classroom design studies rather than clinical interviews for this step). Because
the progression is grounded in studies of student learning, it is not simply an abstract construction
developed by researchers but rather an empirically supported description of learning over time.
The concepts that are developed in this learning progression are foundational for school math-
ematics. The progression begins with equipartitioning, which Confrey and colleagues (Confrey
et al., 2009; Confrey & Smith, 1995) have argued ought to be considered a “primitive” (along with
counting) for the development of fractions, multiplicative reasoning, and proportional reasoning.
Thus, the levels in the equipartitioning portion of the learning progression (Levels 1–5) set the
stage for many of the standards that students are expected to master in elementary school (e.g.,
fair sharing as a basis for division and fractions and reversing the process—i.e., reassembling
5 What we show here is a snapshot view of the full learning progression, which is too large to fit on a single page and is much easier to convey on a website. For the full learning progression, please visit http://www.colorado.edu/education/cadre/learning-progression
6 In the mathematics education literature, the term learning trajectory is typically used in place of learning progression, and the work of Confrey and colleagues also invokes the trajectory terminology. However, for the sake of consistency, we use the term progression throughout.
shares into a whole—as a basis for multiplication). Moreover, mastery of equipartitioning sets
the stage for proportional reasoning. This is important because just as equipartitioning provides
a fertile environment for so much subsequent mathematics, so too does proportional reasoning
(Post, Behr, & Lesh, 1988). In fact, the National Council of Teachers of Mathematics identifies
proportional reasoning as one of 5 “foundational ideas” (NCTM, 2000, p. 11) in mathematics
(rate of change—which is also developed in the progression—is another foundational idea). Thus
the progression represents what is arguably the most important thread in elementary- and middle
school mathematics.
Figures 4 and 5 present an overview of the 7 distinct levels of the proportional-reasoning learn-
ing progression. The figures are 2 sides of the same coin in that Figure 4 describes, for each level,
the attributes students are mastering in order to demonstrate increasing sophistication in their
proportional reasoning, while Figure 5 describes the essence of the instructional and assessment
activities that can be used both to develop and to gather evidence of mastery. The lowest level of
the learning progression is premised on a student who has just begun to receive formal instruction
in mathematics (perhaps in kindergarten, perhaps in 1st grade) and is being asked to complete
activities that require the first building blocks in the development of proportional reasoning—
sharing collections of objects with a fixed number of people. The highest level of the learning
progression represents the targeted knowledge and skills in mathematics that would be expected
of a student at the end of grade 8. At this level, when faced with problems that involve making
predictions from linear relationships, students are able to apply modified proportional reasoning
to solve for unknowns, to calculate unit rates (the rate at which one quantity changes with respect
to a unit change in a different quantity, e.g., “miles per hour”), and to interpret the algebraic
construct of “slope” flexibly both as a rate of change and as steepness. The levels in between
represent intermediate landmarks for students and teachers to aim for as they move along from
the elementary school grades to the middle school grades.
Note that in this learning progression, at least as it has been initially hypothesized, there is
not a one-to-one relationship between the number of distinct levels of the progression and the
number of grades through which a student will advance over time. It may be the case that, as we gather empirical evidence about student learning along this progression, we discover additional levels or collapse existing ones. Rather than associating a single grade with a single level,
we might instead associate grade bands with each level, recognizing that grade designations are
largely arbitrary and that a student’s sophistication in proportional reasoning is likely to depend
upon the quality of focused instruction he or she has received on this concept rather than the age
the student happens to be. Notice also that the levels of the learning progression are not always
defined by standards pulled from a single grade of the CCSS-M. In fact, standards from grade 4 of
the CCSS-M do not fit within this particular progression at all because the grade 4 standards for
fractions and rational number are focused on fraction-as-number. This subconstruct is the focus
of a separate (but related) learning progression based on the synthesis discussed above (Confrey,
2012).
It is the key activities that have been linked to each level of the progression in Figure 5 that
ground proportional reasoning within the curriculum and teaching that are expected to take place
behind classroom doors. These activities also serve as a basis for the design of assessment tasks or
items that could be used in support of both formative and summative purposes. This is facilitated
by the construction of item design templates for each level of the progression. These item design
templates are similar in nature to the design pattern templates associated with evidence-centered design.

FIGURE 4 Learning progression for proportional reasoning: Student attributes.
FIGURE 5 Learning progression for proportional reasoning: Key activities.
FIGURE 6 Item design template for level 5 of proportional reasoning progression

Title: Multiple people sharing multiple wholes

Overview: This family of activities involves finding equal shares when there are multiple items to be shared among multiple “sharers” (e.g., people), and the number of sharers is not a multiple of the number of items (i.e., some or all of the individual items will have to be partitioned).

Factors that change the difficulty of the task: Sharing multiple wholes [p = items; n = sharers]
p = n + 1; p = n − 1
p is odd & n = 2j
p … n or p is close to n
all p; all n

Task in general form: <n> <sharers> share <p> <items> equally.
Either
Representation given: The <items> are shown below. Mark the <items> to show how the <sharers> could share the <items> and shade in one <sharer>’s equal share. Explain your reasoning.
or
Representation not given: Find one <sharer>’s equal share. Explain your reasoning.
How many different names can you use to describe each <sharer>’s share numerically? Write as many ways as you can think of.

Task in exemplar form: Ten chickens share 4 pounds of food.
(a) Find 1 chicken’s equal share. Explain your reasoning.
(b) How many different names can you use to describe each chicken’s share numerically? Write as many ways as you can think of.
However, one feature of the templates we develop that makes them unique for the context
of designing a vertical scale is the specification of item design factors that could be purposefully
manipulated to make any given item harder or easier to solve. To illustrate this, and more gener-
ally the way that an item design template is linked to the learning progression, we describe the
attributes of level 5 in more depth, using the exemplar task given at the bottom of Figure 6 to
ground the discussion.
For attribute 1, students can name a fair share in multiple ways and can explain why the differ-
ent names represent equivalent quantities. In general, this means that students can use different
referent units when naming a share and can coordinate the numerical value with the referent unit.
In the exemplar task, this would result in share names of “1/10 of the four pounds,” “4/10 of one
pound,” or “4/10 pounds per chicken.” For attribute 2, students use and justify multiple strate-
gies when sharing multiple wholes to multiple sharers. In the exemplar task, students might use a
“partition-all” strategy or an “equivalent ratio” strategy (Lamon, 2012). In the partition-all strat-
egy, students would partition each pound into 10ths, and then distribute 1/10 from each pound
to each chicken. In the equivalent-ratio strategy, students would reason that 10 chickens sharing
4 pounds of food results in the same shares as if 5 chickens shared 2 pounds of food, and then
share the food according to this reduced ratio. For attribute 3, students assert, use, and justify
the general principle that whenever p items are shared by n sharers, the fair shares will have size
of p/n items per sharer (or equivalent names as discussed above). In the exemplar task, students
would write a correct name for the fair share and would justify this share by using a strategy as
described above.
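The arithmetic behind the exemplar task, and the 2 strategies just described, can be made explicit with exact fractions; the sketch below is only a restatement of the worked example in code, with labels of our own choosing.

from fractions import Fraction
from math import gcd

p, n = 4, 10            # p = pounds of food (items), n = chickens (sharers)

# General principle (attribute 3): each sharer's fair share is p/n items.
share = Fraction(p, n)
print(share)                        # 2/5 (i.e., 4/10) pound per chicken

# Equivalent names for the share (attribute 1).
print(f"1/{n} of the {p} pounds = {p}/{n} of one pound = {p}/{n} pound per chicken")

# Partition-all strategy: split each pound into n equal pieces (here, tenths)
# and give one piece of each pound to each chicken.
print(p * Fraction(1, n))           # 4 * 1/10 = 2/5

# Equivalent-ratio strategy: reduce the ratio 10 chickens : 4 pounds to 5 : 2.
k = gcd(p, n)
print(f"{n} chickens sharing {p} pounds is equivalent to {n // k} chickens sharing {p // k} pounds")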
The task family implicit in Figure 6 is designed to help students master these attributes, and
also to help test developers and teachers assess student mastery of these attributes. The task can
be varied by changing the number and type of items to be shared as well as the number and type
of sharer. By varying these task features, test developers and teachers can (a) create novel learning
and assessment experiences, (b) vary the difficulty of the task, and (c) create conditions that are
conducive to particular teaching strategies. Perhaps most obviously, for the level-5 task family
the number of objects (p) to be shared and the number of people with whom the objects are to
be shared (n) can be changed (e.g., chocolate bars and people or chicken food and chickens).
This does more than change the surface appearance of the task; it can also adjust the “distance”
between the real-world activity and the mathematical activity. For example, in the chocolate-bars-
and-people situation, the real-world activity of breaking chocolate bars and passing out pieces is
closely related to the mathematical activity of partitioning and distributing. For the chicken-food-
and-chickens situation, the activities are less closely related. In this way the task can become
more or less abstract as the items and sharers are varied. The difficulty of the task can also be
varied by changing p and n according to the schedule given in row 3 of Figure 6 (this progression
of difficulty comes from Confrey, 2012). In classroom settings, teachers could modify p and n
to create conditions that are conducive to particular strategies. For example, situations in which
p and n have a common divisor are more conducive to the equivalent-ratio strategy than are
situations in which p and n are relatively prime.
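As a rough sketch of how the design factors in row 3 of Figure 6 could drive item generation, the code below fills in the general-form prompt for chosen values of p and n and assigns an illustrative difficulty tier. The tier boundaries (in particular the threshold for "p is close to n") and the function names are our assumptions, not specifications from the template.

```python
def difficulty_tier(p: int, n: int) -> int:
    """Assign an illustrative difficulty tier to a level-5 sharing task,
    following the order of the factors in row 3 of Figure 6 (1 = easiest)."""
    if abs(p - n) == 1:              # p = n + 1 or p = n - 1
        return 1
    if p % 2 == 1 and n % 2 == 0:    # p is odd and n = 2j
        return 2
    if p < n or abs(p - n) <= 2:     # p < n, or p is close to n (assumed threshold)
        return 3
    return 4                         # all p; all n

def make_task(p: int, n: int, items: str, sharers: str, sharer: str) -> dict:
    """Fill the general-form prompt (representation-not-given variant)."""
    prompt = (f"{n} {sharers} share {p} {items} equally. "
              f"Find one {sharer}'s equal share. Explain your reasoning.")
    return {"p": p, "n": n, "tier": difficulty_tier(p, n), "prompt": prompt}

# The exemplar from Figure 6: ten chickens share 4 pounds of food.
print(make_task(4, 10, "pounds of food", "chickens", "chicken"))
```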
A fully elaborated item design template would also include scoring rules for constructed-
response items and examples of student responses that would earn different scores. As evidence
is gathered about the ways that students tend to respond to such items, the template could be
extended to include rules or guidelines for writing selected-response items. From the standpoint
of extracting diagnostic information from such items, a particularly compelling feature of such
items might be to give students partial credit for responses that demonstrate mastery of some, but
not all, of the attributes associated with the level to which an item has been written.
Common item-linking designs
A challenge in designing a vertical scale is collecting data on how students at one grade level
would fare when presented with items written for students at a higher or a lower grade level.
There is understandably some concern about overwhelming younger students with items that are
much too hard, or boring older students with items that are much too easy. Adopting a learning
progression as the basis for a common item-linking design has the potential to lessen this con-
cern for 3 reasons. First, because explicit connections are being made between the mathematical
content and practices to which students are exposed from the lower (e.g., elementary school) to
upper (e.g., middle school) anchors of the learning progression, it would no longer be the case
that, for example, the activities at upper levels of a learning progression would be completely
foreign to students at the lower levels. For example, activities at level 6 of the proportional rea-
soning learning progression (see Figure 5) could still involve asking students to devise fair shares
using equipartitioning strategies, a common feature of activities from levels 1 through 5. Second,
because the items designed for each level of the progression could be manipulated to be easier or
harder, one would naturally expect to see a great deal of overlap in the ability of students to solve
these different item families correctly across grade bands; for example, a very hard level-5 item
might be just as challenging as a very easy level-6 item. This blurring of artificial grade level
boundaries makes it possible to envision field-test designs in which students in adjacent grades
could be given items that span 3 or more hypothesized learning-progression levels, because a level
would not necessarily be equivalent to a grade; for example, while it would surely be unreason-
able to ask 1st-grade students to answer level-6 or level-7 items, it might be entirely reasonable
to pose some of these items to students in 3rd or 4th grade, just as it might be reasonable to pose
level-3 through level-5 items to students in grade 7 or grade 8. Third, as noted previously, there is
no requirement that a vertical scale associated with any given learning progression design would
need to span any set number of grades; for example, instead of building a vertical scale to rep-
resent growth in proportional reasoning across grades 3 through 8, a decision could be made to
create a vertical scale that spans only grades 6 through 8. Indeed, an entirely different learning-
progression hypothesis might be the basis for another vertical scale that spans grades 3 through
5 or grades 4 through 6, and so on.
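One way to picture such a field-test design is the hypothetical layout below (ours, not a design proposed in the paper), in which each grade is administered items from a window of 3 adjacent learning-progression levels and adjacent grades overlap on 2 levels, supplying the common items for vertical linking.

```python
# Hypothetical common-item linking layout: each grade sees items from a window
# of 3 adjacent learning-progression levels, so adjacent grades overlap on 2 levels.
GRADES = range(3, 9)                          # grades 3 through 8
LOWEST_LEVEL = {g: g - 2 for g in GRADES}     # e.g., grade 3 starts at level 1

def levels_for_grade(grade: int, width: int = 3) -> set[int]:
    start = LOWEST_LEVEL[grade]
    return set(range(start, start + width))

for g in GRADES:
    print(g, sorted(levels_for_grade(g)))

# Linking items between adjacent grades come from the overlapping levels,
# e.g., grades 4 and 5 share levels 3 and 4 in this layout.
print(sorted(levels_for_grade(4) & levels_for_grade(5)))   # [3, 4]
```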
DISCUSSION
To recap, the concept of growth is at the foundation of the policy and practice around systems of
educational accountability. It is also at the foundation of what teachers concern themselves with
on a daily basis as they help their students learn. Yet there is a disconnect between the criterion-
referenced intuitions that parents and teachers have for what it means for students to demonstrate
growth and the primarily norm-referenced metrics that are used to communicate inferences about
growth. One way to address this disconnect would be to develop vertically linked score scales
that could be used to support both criterion-referenced and norm-referenced interpretations, but
this hinges upon having a coherent conceptualization of what it is that is growing from grade to
grade. In this paper we have proposed a learning-progression approach to the conceptualization
of growth and the subsequent design of a vertical score scale. We have used the context of the
CCSS-M and the “big idea” of proportional reasoning to give a concrete illustration for what such
a design approach would entail.
In their book Test Equating, Scaling, and Linking, Kolen and Brennan (2004) also distinguish
between 2 different ways that growth could be conceptualized when designing a vertical scale.
They introduce what they call the “domain” and “grade to grade” definitions of growth. In what
they refer to as a domain definition of growth, the term domain is used much more broadly than
we have used it here to encompass the entire range of test content covered by the test battery
across grades. In other words, the domain of a sequence of grade-specific tests of mathematics
as envisioned by Kolen and Brennan would include all the shapes we defined as unique content
domains in Figure 3. In contrast, Kolen and Brennan define grade to grade growth with respect
to content that is specific to 1 grade level but which has also been administered to students at an
adjacent grade level (i.e., all the shapes in Figure 3 that overlap grades). The learning-progression
definition of growth we have illustrated has some similarity to Kolen and Brennan’s domain def-
inition in the sense that a learning-progression design focuses upon growth with respect to a
common definition of focal content across grades. However, the learning-progression approach
departs from Kolen and Brennan’s domain definition in the emphasis on (a) 1 concept (or col-
lection of related concepts) at a time and (b) how students become more sophisticated in their
understanding and application of this concept as they are exposed to instruction.
A learning-progression approach to design has the potential to address 2 of the concerns that
can threaten the validity of growth inferences on existing vertical scales. The first concern is the
empirical finding that growth decelerates as students enter middle school grades to the point that
it appears that some students have not learned anything from one grade to the next (Briggs &
Dadey, 2015; Dadey & Briggs, 2012). Although such a finding could still persist even when a
vertical scale has been designed on the basis of a learning-progression hypothesis, it would be
easier to rule out construct shift as a plausible cause of score deceleration. If it were to be found,
for example, that students grew twice as fast in their proportional reasoning between grades 5 and
6 relative to between grades 6 and 7, this could raise important questions about the coherence of
curriculum and instruction in grade 7 relative to grade 6. The second concern is that gains along a
vertical scale cannot be shown to have interval properties. Although there is nothing about taking
a learning-progression approach that guarantees a resulting scale with interval properties, there
are in fact novel empirical methods that could be used to evaluate this proposition (Briggs, 2013;
Domingue, 2013; Karabatsos, 2001; Kyngdon, 2011). One of the key design features that could
make test data more likely to approximate the canonical example of an attribute with ratio scale
properties (length) or interval scale properties (temperature) is the presence of external factors
that can be used to predict the empirical difficulty of any given item or the probability of any given
person answering an item correctly. Because such factors are made explicit in the development
of a learning-progression hypothesis, this represents a step in the right direction. At a minimum,
tests designed according to a learning progression would seem to be more likely to fit the Rasch
family of IRT models and thereby inherit some of the desirable invariance properties of such
models (Andrich, 1988; Wright, 1997).
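For readers who want the model in symbols, the Rasch model referenced here expresses the probability of a correct response as a function of the difference between person ability and item difficulty; the second expression sketches, in our own notation, a linear logistic test model (LLTM)-style decomposition in which item difficulty is predicted from explicit design factors such as the learning-progression level an item targets:

$$
P(X_{pi} = 1 \mid \theta_p, b_i) \;=\; \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)},
\qquad
b_i \;\approx\; \sum_{k=1}^{K} q_{ik}\,\eta_k ,
$$

where q_ik indicates whether design factor k (for example, the targeted progression level, or one of the difficulty manipulations in Figure 6) applies to item i, and η_k is that factor's estimated contribution to item difficulty.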
Another key advantage of the learning progression approach is that it can serve as a bridge
between summative and formative uses of assessments. Although there is a great deal of rhetoric
around the need for teachers to make “data-driven” instructional decisions, there is little reason
to believe that teachers are able to extract diagnostic information from the student scores reported
on a large-scale assessment, even when scores are disaggregated into content-specific subscores.
With respect to inferences about growth in particular, finding out that in a normative sense one’s
students are not growing fast enough relative to comparable peers tells a teacher nothing about
what they need to be changing about their instruction. In contrast, if a normative SGP attached
to each student could be accompanied by information about the change and current location
of the students along a vertical scale for proportional reasoning, this would greatly expand the
diagnostic utility of the results; not only would parents and teachers have a sense for how much
a student has grown, but by referencing the canonical items and tasks associated with a student’s
current location, teachers would have actionable insights about what could be done next. Further,
by making item design templates associated with the learning progression publicly available, it
becomes possible for teachers to create and score their own tasks to assess and monitor student
progress at multiple junctures over the course of a school year.
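A minimal sketch of the kind of report this implies is given below; the cut scores, number of levels, and function names are hypothetical placeholders of our own, not values from any operational scale.

```python
from bisect import bisect_right

# Hypothetical lower-bound cut scores locating levels 2-7 of a proportional-
# reasoning learning progression on a vertical scale (illustrative numbers only).
LEVEL_CUTS = [200, 240, 280, 320, 360, 400]

def lp_level(scale_score: float) -> int:
    """Map a vertical scale score to a learning-progression level (1-7)."""
    return 1 + bisect_right(LEVEL_CUTS, scale_score)

def growth_report(name: str, prior: float, current: float, sgp: int) -> str:
    return (f"{name}: SGP = {sgp} (normative); moved from level "
            f"{lp_level(prior)} to level {lp_level(current)} on the "
            f"proportional-reasoning scale (criterion-referenced).")

print(growth_report("Student A", prior=255, current=305, sgp=62))
```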
Our focus here on the potential benefits of thoughtfully designed vertical scales is not intended
as a rebuke of the normative inferences fundamental to value-added models or the Colorado
Growth Model. Instead, it is a recognition that neither purely normative nor purely criterion-
referenced growth interpretations are sufficient to answer all the questions parents, teachers, and
students have about learning in educational settings. Economists and applied statisticians have
made great innovations in the development of, and research into, models that can flag teachers and
schools that appear to be excelling or struggling on the basis of normative comparisons. Similar
innovations in the development of, and research on, vertical scaling have lagged in the psychometric
community. If fundamental questions about how student growth should be conceptualized and
measured are not being taken up among psychometricians, they are likely to remain unanswered
altogether.
Taking a learning-progression approach to design 1 or more vertical scales within a subject
area (e.g., math, English language arts) is not incompatible with the need to also assess the breadth
of student understanding along the full range of the CCSS. Just as the salient distinction between
status and growth has become clear since the advent of No Child Left Behind in 2002, so too is
it possible to distinguish between the use of a large-scale assessment to produce different scale
scores for different purposes. If the sole purpose is to take a grade-specific inventory of the dif-
ferent knowledge and skills that students are able to demonstrate from the different domains that
define math and ELA, then domain sampling is an entirely appropriate method for building a
test blueprint. However, if an additional purpose is to support coherent and actionable inferences
of growth, this can be accomplished at the same time by adopting a stratified domain-sampling
approach, in which one or more strata might consist of the domain within which a learning pro-
gression has been specified. Naturally it would be convenient to have a single scale that could
fulfill both purposes, and this has been the impetus for conventional approaches to vertical scale
design. But what does it really mean to say that a student has grown X points in math or Y points
in ELA? This merely raises the next question: growth in what aspect of math or ELA? In our
view the latter is a question that has a much greater chance of being answered coherently when a
vertical scale is based on a learning-progression hypothesis.
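To make the stratified domain-sampling idea concrete, the toy blueprint below reserves one stratum for the learning-progression domain and samples the remaining items across the other domains; the domain labels and item counts are hypothetical and purely illustrative.

```python
# Toy stratified domain-sampling blueprint for one grade: one stratum is
# reserved for the learning-progression domain, and the remaining items are
# spread across the other content domains (all names and counts hypothetical).
TOTAL_ITEMS = 50
LP_STRATUM = {"Ratios & Proportional Relationships (LP stratum)": 15}
OTHER_DOMAINS = ["The Number System", "Expressions & Equations",
                 "Geometry", "Statistics & Probability"]

remaining = TOTAL_ITEMS - sum(LP_STRATUM.values())
per_domain, extra = divmod(remaining, len(OTHER_DOMAINS))

blueprint = dict(LP_STRATUM)
for i, domain in enumerate(OTHER_DOMAINS):
    blueprint[domain] = per_domain + (1 if i < extra else 0)

assert sum(blueprint.values()) == TOTAL_ITEMS
print(blueprint)
```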
CHALLENGES AND OPPORTUNITIES
The use of the learning-progression approach within the context of large-scale assessment design
and analysis comes with significant psychometric challenges. To begin with, the initial develop-
ment of a learning-progression hypothesis can be a time-consuming process, not always amenable
to the tight deadlines facing large-scale assessment programs. Fortunately, there is a considerable
literature on learning progressions in math education, so much of this initial work has already
been started. A thornier issue is coming up with items that are rich enough to elicit information
about the sophistication of student understanding without always requiring lengthy performance
tasks with open-ended scoring. The problem with such tasks is that while they may be ideal as a
means of eliciting the information needed to place a student at a specific location along a verti-
cal scale, the context of the task may contribute so much measurement error that it is very hard
to feel much confidence in a student’s location. And if a student’s location at one point in time
cannot be established reliably, the reliability of gain scores across 2 points or of score trajectories
across more than 2 points in time is likely to suffer even more. A possible solution to this is to
attempt to break larger performance tasks into smaller sets of selected-response and constructed-
response items. This is essentially the compromise approach presently being taken for the math
assessments that have been designed by PARCC and SBAC. The item template we illustrated for
level 5 of our proportional-reasoning learning progression also hints at this strategy, since the
target-item prompts could be expressed as short constructed-response items, selected-response
items, or some combination of the two. This challenge is likely to be hardest to overcome for a learning progression focused on the increasing sophistication of a written argument.
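The gain-score concern raised above can be stated precisely with the classical test theory expression for the reliability of a difference score (a standard result, not derived in the paper):

$$
\rho_{DD'} \;=\; \frac{\sigma_1^{2}\rho_{11'} + \sigma_2^{2}\rho_{22'} - 2\rho_{12}\sigma_1\sigma_2}{\sigma_1^{2} + \sigma_2^{2} - 2\rho_{12}\sigma_1\sigma_2},
$$

where σ1 and σ2 are the observed-score standard deviations at the two occasions, ρ11' and ρ22' their reliabilities, and ρ12 the correlation between occasions; when the two scores are highly correlated and only modestly reliable, the reliability of the gain falls well below the reliability of either status score.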
Another significant challenge to the learning-progression approach comes from the heterogeneity of the curricular sequences to which students are exposed across states, within the same state,
and even within the same school district; for example, given one state that repeatedly empha-
sizes the concepts underlying proportional reasoning in its K–8 curriculum and another state that
does not, one might expect to find differential item functioning on linking items as a function of
each state’s enacted curriculum. Of course, this is a potential problem for the assessments being
developed by PARCC and SBAC even without taking a learning progression approach.
At the same time, there is a risk that a learning-progression approach to assessment will nar-
row and homogenize learning opportunities and can lead to simplistic interpretations of complex
processes (Sikorski, Hammer, & Park, 2010). At worst, this approach might limit opportunities
for students to bring their own heterogeneous backgrounds and ways of knowing to bear on their
learning, thus “re-inscrib[ing] normative expectations in learning that have homogenizing effects”
(Anderson et al., 2012, p. 15). In part, this risk derives from a tension in the research on learning
progressions that we alluded to earlier—namely, that learning is a complicated process with mul-
tiple pathways, even as some pathways are more likely than others. While our focus in this paper
is on learning progressions, we note in passing that some researchers—for example, those in the
Dynamic Learning Maps consortium—are exploring how psychometric techniques can be incor-
porated into progressions with multiple pathways.7 The risk of homogenization is compounded
to the extent that researchers who develop learning progressions do not attend to heterogeneity in
students’ ways of knowing or simply account for this diversity in the “lower anchor” of a progres-
sion (Anderson et al., 2012). One response, then, is that it is the responsibility of the researchers
who create the learning progressions to attend to heterogeneity and to create progressions at large
enough grain sizes so as to allow for diverse learning opportunities. From this perspective, learn-
ing progressions are simply the a priori background that informs assessments and vertical scales.
However, we reject this unidirectional model and, instead, suggest that assessments and learning
progressions can—and should—be mutually informing.
A learning progression constitutes a hypothesis about growth, and as longitudinal evidence is
collected over time, the hypothesis can be proven wrong and, at a minimum, is likely to evolve.
This fact represents a challenge to conventional psychometric practices but also an opportunity.
It is an opportunity for psychometricians to partner with content specialists, cognitive and learn-
ing scientists, and teachers to gain insights about not just what students know and can do, but
what and how much they can learn. For more than a decade now every state has been testing its
students across multiple grades in math and reading, but all this testing has generated very little
insight about student learning and how it can best be facilitated. Vertical scales could provide
these kinds of insights if a case can be made that the growth indicated by test scores is a measure
of learning. Making this case coherently could be the next frontier in educational assessment.
7 We thank an anonymous reviewer for bringing this to our attention.
ORCID
Frederick A. Peck http://orcid.org/0000-0002-2212-0535
REFERENCES
Anderson, C. W., Cobb, P., Barton, A. C., Confrey, J., Penuel, W. R., & Schauble, L. (2012). Learning progressions
footprint conference: Final report. East Lansing, MI: Michigan State University.
Andrich, D. (1988). Rasch models for measurement. Beverly Hills, CA: Sage.
Betebenner, D. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice,
28(4), 42–51.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R.
Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Reading, MA: Addison-Wesley.
Briggs, D. C. (2013). Measuring growth with vertical scales. Journal of Educational Measurement, 50(2), 204–226.
Briggs, D. C., & Dadey, N. (2015). Making sense of common test items that do not get easier over time: Implications for vertical scale designs. Educational Assessment, 20(1), 1–22.
Briggs, D. C., & Weeks, J. P. (2009). The impact of vertical scaling decisions on growth interpretations. Educational Measurement: Issues and Practice, 28(4), 3–14.
Castellano, K. E., & Ho, A. D. (2013a). A practitioner’s guide to growth models. Washington, D.C.: Council of Chief
State School Officers.
Castellano, K. E., & Ho, A. D. (2013b). Contrasting OLS and quantile regression approaches to student "growth" percentiles. Journal of Educational and Behavioral Statistics, 38(2), 190–214.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review, 104(9), 2593–2632.
Clements, D. H., & Sarama, J. (2004). Learning trajectories in mathematics education. Mathematical Thinking and Learning, 6(2), 81–89. doi:10.1207/s15327833mtl0602
Confrey, J. (2012). Better measurement of higher cognitive processes through learning trajectories and diagnostic assess-
ments in mathematics: The challenge in adolescence. In V. F. Reyna, S. B. Chapman, M. R. Dougherty, & J. Confrey
(Eds.), The adolescent brain: Learning, reasoning, and decision making (pp. 155–182). Washington, D.C.: American
Psychological Association.
Confrey, J., Maloney, A., Nguyen, K. H., Mojica, G., & Myers, M. (2009). Equipartitioning/splitting as a foundation
of rational number reasoning using learning trajectories. In M. Tzekaki, M. Kaldrimidou, & C. Sakonidis (Eds.),
Proceedings of the 33rd Conference of the International Group for the Psychology of Mathematics Education (Vol. 1).
Thessaloniki, Greece: PME.
Confrey, J., Nguyen, K. H., Lee, K., Panorkou, N., Corley, A. K., and Maloney, A. P. (2012). Turn-On Common
Core Math: Learning Trajectories for the Common Core State Standards for Mathematics. Retrieved from
http://www.turnonccmath.net
Confrey, J., Nguyen, K. H., and Maloney, A. P. (2011). Hexagon map of Learning Trajectories for the K-8 Common Core
Mathematics Standards. Retrieved from: http://www.turnonccmath.net/p=map.
Confrey, J., & Smith, E. (1995). Splitting, covariation, and their role in the development of exponential functions. Journal for Research in Mathematics Education, 26(1), 66–86.
Corcoran, T., Mosher, F. A., & Rogat, A. (2009). Learning progressions in science: An evidence-based approach to
reform. New York, NY: Center on Continuous Instructional Improvement, Teachers College—Columbia University.
Dadey, N., & Briggs, D. C. (2012). A meta-analysis of growth trends from vertically scaled assessments. Practical
Assessment, Research & Evaluation, 17(14). Retrieved from http://pareonline.net/getvn.asp?v=17&n=14
Daro, P., Mosher, F. A., & Corcoran, T. (2011). Learning trajectories in mathematics: A foundation for standards, curricu-
lum, assessment, and instruction. CPRE Research Report #RR-68. Philadelphia, PA: Consortium for Policy Research
in Education. DOI:10.12698/cpre.2011.rr68
Domingue, D. (2013). Evaluating the equal-interval hypothesis with test score scales. Psychometrika, 79(1), 1–19.
Holland, P. W., & Rubin, D. B. (1983). On Lord’s paradox. In H. Wainer & S. Messick (Eds.), Principals of modern
psychological measurement. Hillsdale, NJ: Lawrence Erlbaum.
Kane, T. J., & Staiger, D. O. (2008). Estimating teacher impacts on student achievement: An experimental evaluation
(No. w14607). National Bureau of Economic Research. doi:10.3386/w14607
Karabatsos, G. (2001). The Rasch model, additive conjoint measurement, and new models of probabilistic measurement theory. Journal of Applied Measurement, 2(4), 389–423.
Kolen, M. J. (2006). Scaling and norming. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 155–186). Westport, CT: American Council on Education/Praeger.
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices. New York, NY: Springer-Verlag.
Kyngdon, A. (2011). Plausible measurement analogies to some psychometric models of test performance. British Journal of Mathematical and Statistical Psychology, 64(3), 478–497.
Lamon, S. J. (2012). Teaching fractions and ratios for understanding: Essential content knowledge and instructional
strategies for teachers (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
Lord, F. M. (1967). A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304–305.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Martineau, J. A. (2004). The effects of construct shift on growth and accountability models (Unpublished Dissertation).
Michigan State University, East Lansing, MI.
Martineau, J. A. (2005). Un-distorting measures of growth: Alternatives to traditional vertical scales. Paper presented at
the 35th Annual Conference of the Council of Chief State School Officers.
Martineau, J. A. (2006). Distorting value added: The use of longitudinal, vertically scaled student achievement data for growth-based, value-added accountability. Journal of Educational and Behavioral Statistics, 31(1), 35–62.
Matassa, M., & Peck, F. (2012). Rise over run or rate of change? Exploring and expanding student understanding of slope
in Algebra I. Proceedings of the 12th International Congress on Mathematics Education, 7440–7445. Seoul, Korea.
Retrieved from http://www.icme12.org/upload/UpFile2/WSG/0719.pdf
McCaffrey, D. F., Lockwood, J. R., Koretz, D. M., & Hamilton, L. S. (2003). Evaluating value-added models for
teacher accountability. Santa Monica, CA: RAND Education. (Vol. 158). Research Report prepared for the Carnegie
Corporation.
MET Project. (2013). Ensuring fair and reliable measures of effective teaching. Policy and Practitioner Brief. Retrieved
from http://www.metproject.org/downloads/MET_Ensuring_Fair_and_Reliable_Measures_Practitioner_Brief.pdf
Migozuchi, T., Peck, F., & Matassa, M. (2013). Developing robust understandings of slope. Elementary Mathematics
Teaching Today (Journal published in Japan), 2013(511), 31–32.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.
National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.
Pellegrino, J., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.
Post, T. R., Behr, M. J., & Lesh, R. (1988). Proportionality and the development of pre-algebra understanding. In J. Hiebert & M. J. Behr (Eds.), Number concepts and operations in the middle grades (pp. 93–118). Reston, VA: National Council of Teachers of Mathematics.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish
Institute for Educational Research.
Sarama, J., & Clements, D. H. (2009). Early childhood mathematics education research: Learning trajectories for young
children. New York, NY: Routledge.
Shea, N. A., & Duncan, R. G. (2013). From theory to data: The process of refining learning progressions. Journal of the Learning Sciences, 22(1), 7–32.
Sikorski, T., Hammer, D., & Park, C. (2010). A critique of how learning progressions research conceptualizes sophistica-
tion and progress. In Proceedings of the 9th International Conference of the Learning Sciences Vol. 1 (pp. 1032–1039).
Chicago, IL: International Society of the Learning Sciences.
Skaggs, G., & Lissitz, R. W. (1986). IRT test equating: Relevant issues and a review of recent research. Review of Educational Research, 56(4), 495–529.
Stocking, M. L., & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7(2), 201–210.
Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology,
16(7), 433–451.
Thurstone, L. L. (1927). The unit of measurement in educational scales. Journal of Educational Psychology, 18, 505–524.
Tong, Y., & Kolen, M. J. (2007). Comparisons of methodologies and results in vertical scaling for educational achievement tests. Applied Measurement in Education, 20(2), 227–253.
Wright, B. D. (1997). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33–45.
Yen, W. M. (1986). The choice of scale for educational measurement: An IRT perspective. Journal of Educational Measurement, 23(4), 299–325.
Young, M. J. (2006). Vertical scales. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 469–485). Mahwah, NJ: Lawrence Erlbaum Associates.