
Four Years of Cognitively Based Assessment of, for, and as Learning (CBAL): Learning About Through-Course Assessment (TCA)

Commissioned by the Center for K – 12 Assessment & Performance Management at ETS. Copyright © 2011 by Educational Testing Service. All rights reserved.
John P. Sabatini, Randy Elliot Bennett, and Paul Deane
Educational Testing Service
April 2011
The goal of this paper is to describe some of the many lessons learned from a multiyear
research and development (R&D) program aimed at building a model for innovative summative
and formative assessment at the K-12 level. The program, Cognitively Based Assessment of, for,
and as Learning (CBAL), is intended to generate new knowledge and capability that can be used
in the near-term for the design, administration, and scoring of innovative assessments like
those intended for use by the Common Core State Standards Assessment consortia. As of this
writing, the project is in its fourth year, and its operation has required that the research team
grapple with issues of through-course assessment (TCA) because such assessment is a central
feature of the CBAL conceptualization. Because the lessons learned are derived from this
conceptualization and experience, we will first describe some of the key elements of the CBAL
program and the theory of action that guides the R&D agenda.1 This description will set the
groundwork for the discussion of lessons learned, in which we detail some of the reasoning,
challenges, and design decisions we reached in designing TCA.
1 For more information about the CBAL initiative, including full papers, see
Cognitively Based Assessment of, for, and as Learning
The CBAL program intends to produce a model for a system of assessment that:
documents what students have achieved (of learning); helps identify how to plan instruction
(for learning); and is considered by students and teachers to be a worthwhile educational
experience in and of itself (as learning). Towards achieving these goals, CBAL consists of
multiple, integrated components: summative assessments, formative assessments, professional
development, and domain-specific cognitive competency models (Bennett, 2010; Bennett &
Gitomer, 2009).
Each of the CBAL components is informed by, and aligned with, domain-specific
cognitive competency models (which we describe in some detail in the next section). The
summative assessments consist of multiple events distributed across the school year. The
intention is that results will be aggregated for accountability purposes. The formative
assessments consist of componential item sets, classroom tasks, and extended activities, as well
as associated teacher guides that provide insights into techniques for integrating formative
tasks into instructional units. In some cases, these formative tasks are based on developmental
or learning progressions (Harris & Bauer, 2009; Heritage, 2008). Finally, the professional
development component consists both of organized teacher communities of practice and a
collaborative social networking website that further elaborates relationships among
assessments, instruction, and the cognitive competency models.
Domain-Specific Cognitive Competency Models
Domain-specific competency models are developed with the goal of integrating learning
sciences research, including learning progressions (where such progressions are available), with
content standards. The competency models help not only in the specification of knowledge,
processes, strategies, and habits of mind to be assessed, but also in identifying instructional-
practice principles for use in assessment design. In CBAL, the models also serve as a common
conceptual foundation for both summative and formative assessments.
Each model is derived from reviews of the cognitive and learning sciences literature in
mathematics and in English language arts (ELA) reading and writing. That literature speaks to
both student development and effective instructional practice. Recent versions of the models
for middle-grades students can be found in Deane (2010), Graf (2009), and O’Reilly and
Sheehan (2009). The models have been linked to the Common Core State Standards (CCSS).
They are iteratively refined via collaborations with teachers and classroom pilot data as the
CBAL project progresses through its multiyear research agenda.
Figure 1 illustrates the central role of the competency models in the overall assessment
system. In CBAL, the competency models help to integrate learning science research with
content standards such that the amalgam becomes the driver of assessment design (both
formative and summative), the basis for an evidence-based curriculum, and the starting point
for professional support to aid teachers in building their repertoire of research-based
pedagogical practices. By grounding assessment, curriculum and instruction, and professional
development in the same learning sciences and content standards foundation, we hope to
facilitate the intended outcome of improved classroom practice. This approach also represents
a deliberate attempt to ensure that learning sciences research, itself often informed by and
encapsulated in the wisdom of practice, is better disseminated throughout the educational system.
While carefully constructed and thoughtful content standards are important for setting
appropriate targets for instruction, they often remain abstract, too far removed from informing
good instructional practice. Consequently, assessments that are defined solely with respect to
the content standards run the risk of having limited instructional relevance, and, additionally,
may fail to account for results from decades of learning science research that can serve as a
principled guide to implementing sound instruction. As we will discuss in more detail later,
CBAL takes advantage of the opportunities afforded by through-course assessments to instantiate design principles derived from this competency-model foundation that would otherwise be difficult to achieve in a single, comprehensive end-of-year examination.

Figure 1. The role of competency models in CBAL.
Note. From “Cognitively Based Assessment of, for, and as Learning: A Preliminary Theory of Action for Summative and Formative Assessment,” by R. E. Bennett, 2010, Measurement: Interdisciplinary Research and Perspectives, 8, pp. 70-91. Copyright 2010 by Educational Testing Service. Reprinted with permission.
Why Through-Course Assessments?
As noted, TCAs have been a foundational attribute of the CBAL initiative since its
inception. Three primary aims justify the decision to use through-course assessments: first, the
importance of any one assessment occasion is diminished; second, tasks can be more complex
and more integrative because more time is available for assessment in the aggregate; and,
third, the assessments can provide prompt interim information to teachers while there is time
to take instructional action.
For the CBAL research program, one might say that the latter two aims are the most
critical, as the first aim would generically be true of any through-course assessment system.
However, one’s goals for designing a TCA system need not be to deploy more complex,
integrative tasks; likewise, one does not need to aim for providing interim information to
teachers. TCAs serve CBAL precisely because they create the opportunity to deliver more
complex, integrative tasks and to feed back information to teachers and learners. However, as
we describe in detail later, these goals entail a set of design decisions and complexities that
must be coordinated and managed.
Theory of Action
The CBAL system model is designed as an educational intervention, as well as an
indicator of student achievement. As such, Bennett (2010) has described a theory of action for
CBAL in terms of components, hypothesized action mechanisms, intended intermediate effects,
and intended ultimate effects. The logic model summarizing this theory of action is shown in
Figure 2. The theory of action guides assessment design and validation, not only in the sense of evaluating score claims but also in evaluating the intended impact of the assessment system on individuals and institutions.
Several ideas are important to note in Figure 2. First, the logic model makes clear that
the ultimate goals (or intended effects) of CBAL as an assessment system are to provide more
meaningful information to policy makers and to contribute to improved student learning.
Second, these intended effects are caused by a set of intended intermediate effects. The latter
effects target changes in teacher competency, classroom practice, and student engagement.
Last, these intermediate effects are, in turn, caused by action mechanisms, each of which is
associated with a particular CBAL component.
Figure 2. A logic model summarizing the CBAL theory of action.
Note. From “Cognitively Based Assessment of, for, and as Learning: A Preliminary Theory of Action for
Summative and Formative Assessment,” by R. E. Bennett, 2010, Measurement: Interdisciplinary Research and
Perspectives, 8, pp. 70-91. Copyright 2010 by Educational Testing Service. Reprinted with permission.
Two action mechanisms are associated with the CBAL competency model and concern
teacher use of that model to guide instruction and communicate learning goals. Three action
mechanisms are linked to the CBAL summative assessments. For students and teachers, these
mechanisms entail the instructional use of the tools and representations contained in the
summative assessments and the use of summative results as a starting point for formative
follow-up. For state and local policy makers, the action mechanism is the use of summative
results to identify classes, schools, and districts needing administrative attention. The CBAL
formative components have three associated action mechanisms concerning the making of
inferences about student standing, the use of those inferences to adjust instruction, and the
use of student responses to those adjustments to revise inferences and readjust. Finally, the
professional support component has as its action mechanism participation by teachers in
communities of practice to reflect upon their experiences with using CBAL to understand and
improve student performance.
Some Lessons Learned
In the remainder of the paper, we describe a few of the many lessons learned from
CBAL research that may be helpful to designers of through-course assessments. The unifying
theme concerns careful consideration of the specific purposes to be achieved by using a TCA
approach (and there are likely to be more than one) and how those purposes influence the
actual content and design of the assessments themselves. We would not advocate targeting
more than a few purposes, as optimizing across a larger set reduces how effectively each individual purpose can be achieved. Tradeoffs are inevitable, and a manageable set of purposes, especially at
the onset of a complex project with high stakes for all involved, is a prudent course to take.
Evidence Sources for Lessons Learned
To test out the CBAL theory of action would require not only the delivery of TCA at
different points in time, but also the implementation of all of the CBAL components in
authentic settings, including the use of results for accountability purposes. Such a
scenario is well beyond what an assessment research program like CBAL can hope to achieve. In
keeping with the idea of creating a system model, only parts of the CBAL system have been
developed and studied. Parts of the model that have been developed include the extensive
reviews of the cognitive science literature that constitute the basis for the cognitive
competency models; the creation of prototype assessments through collaborative activities
with talented educators to help ground design in the wisdom of practice; iterative pilots in field
sites to learn about fit in the types of environments within which the full system might operate;
and linking of CBAL assessment prototypes and competency models to the Common Core State
Standards. In 2009, the CBAL team conducted a multistate trial in which two reading or two
writing summative assessments were administered to the same (seventh and eighth grade)
students close together in time. Across 16 CBAL pilots, nearly 10,000 online tests have been
administered. Psychometric results from those pilot administrations are reported by Bennett (in
press). Finally, a second multistate study is being conducted with through-course assessments
being administered at two points in time—winter and spring semesters of 2011. This
experience provides several important lessons about the design of through-course assessments.
Lesson 1: Clearly articulate the intended purpose(s) for through-course assessment
and use those purposes to drive assessment design. In the CBAL program, we prioritized two
key aims for summative assessments. First, we sought to design summative and formative
assessments that would function as useful measurement tools. Second, we designed those
assessments so that they would be considered by students and teachers to be worthwhile
educational experiences in and of themselves, experiences that would promote the
development of higher-order thinking skills demanded for success in the English language arts
and mathematics content domains. Setting these as our primary goals guided a variety of
decisions and ultimately led us to specific design principles.
Our examination of the learning sciences research in each domain helped identify
models of effective learning and instructional approaches and how those models might lead to
the development of proficiency in critical knowledge, processes, strategies, and habits of mind.
That examination also produced examples of the kinds of activities and tasks that require
students to reason and problem solve in the domains in question. The insights gained provided
a lens by which we could observe skilled practitioners to better appreciate variations of good
instruction. Seeking to capture these practices in assessments led us to two key design
principles that now undergird almost all of our assessment designs, and that fit well with the TCA approach.
Scenario-based task sets. Such task sets are composed of a series of related tasks that
unfold within an appropriate social context. The goals include: to communicate how the tasks
fit into a larger social activity system; to set standards for performance; to give test takers a
clearer idea of how to allocate attention and give focus to their deliberations; to provide
opportunities to apply strategic processing and problem solving; and to have learners evaluate
and integrate multiple sources of information in a meaningful, purpose-driven context. The
scenarios are created to focus on targeted nodes in the respective competency models, but also
to permit the integration of other knowledge and skills that may be prerequisite or co-requisite
in performing tasks in the domain. Below we provide brief examples of scenarios from the
three domains that CBAL has explored: mathematics, reading, and writing (see Appendix for
example screen shots of scenarios).
As Harris and Bauer (2009) explain, the CBAL mathematics prototypes utilize scenario-
based task sets that draw from at least two content areas in the competency model. The
scenario functions not as simply a setting but, rather, drives the design of the task set (see also
Harris, Bauer, & Redman, 2008). For example, one scenario involves a region experiencing
drought, with particular focus on a lake whose receding water levels may no longer be high
enough to exit the dam. The focus of this task set is the cross-cutting mathematical process of
argument. The content strands of linear functions and statistics are also drawn upon. The
introductory activity sets up the big question or idea that the students will have to address at
the end of the task set, “Does action need to be taken about the water crisis?” Students are
provided with an explanation of why the lake is important to the community, that is, because it
is used to produce electricity and provides water for crops. A series of tasks is then presented
that leads students through the problem to a culminating task calling for a judgment about
whether action needs to be taken and evidence to back that assertion.
A similar scenario drives one of the ELA reading assessment designs (O’Reilly & Sheehan,
2009; Sheehan & O’Reilly, in press). Students are introduced to a scenario-based task set in
which a wind farm has been proposed for their community, and their class has decided to
create a website to help members of the community to be more informed about wind power.
The scenario unfolds across a series of tasks addressing the questions:
How does wind power work?
What are some possibilities and challenges of using wind power as an energy source?
Is the proposed idea good for the community?
Related readings drive each subsection, with a combination of selected- and
constructed-response items to which the student must respond.
Finally, a scenario-based writing assessment covering the writing competencies of
summarization and argumentation addresses the question of whether there should be a ban on
advertisements directed at children (Deane, Sabatini, & Fowles, 2011). In conjunction with
several readings on the topic, students work their way through tasks that require them to:
apply the points in a rubric to someone else’s summary of an article about children’s
advertisements; read and summarize two articles about the issue; determine whether
statements addressing the issue are presenting arguments pro or con; determine whether
specific pieces of evidence will weaken or strengthen particular arguments; critique someone
else’s argument about the issue; and, finally, write an argumentative essay taking a position on
children’s advertising.
While the CBAL scenario-based task sets share a heritage with earlier performance
assessments in education, there are important design distinctions. Specifically, earlier
performance assessments tended to be composed of a smaller number of more highly
interdependent tasks delivered under less formal conditions and without the use of technology
(e.g., conducting a specific science experiment and writing up the results). Like these
assessments, CBAL scenario-based task sets share in the goal of creating a more authentic,
meaningful, and purposeful context for deploying one’s knowledge and skills. Furthermore,
scenario-based tasks provide an opportunity to assess understanding of key related content at
deeper levels than discrete questions can; therefore, it is essential to target content identified
as critical to assess. However, the CBAL task sets are designed to gather information about
particular constellations of skill in the competency models, as well as how those skills are
integrated into a more complex performance. Thus, in each of the CBAL scenario-based task
sets described above, there are items that test discrete skills, as well as more complex
interrelated tasks most appropriately scored with a holistic rubric. The TCA design allows for
the assessment of a broad range of content when aggregated across multiple administrations.
Scenario-based task sets help in achieving a foundational learning sciences principle of
contextualizing skill and knowledge as they are applied by expert practitioners in a domain,
rather than asking students to recall isolated facts or execute procedures absent any
meaningful context. In this way, CBAL assessments can better serve as worthwhile learning
experiences because they can help students connect knowledge, processes, strategies, and
habits of mind to conditions of use. The tradeoff is that engaging students in a scenario requires
that assessment time be spent setting up the purpose and allowing students to deliberate,
reason, and reflect on the tasks with respect to that purpose. In general, however, the TCA
design allows for simultaneously achieving depth by using a focused problem set within an
individual TCA and breadth by covering the broader set of required domain competencies
across TCAs.
Tools and representations. The goal of including innovative tools and representations
derived from domain practice is to get the most accurate estimate of the student’s
achievement and to model good teaching and learning. In the category of tools and
representations, we include rubrics and guidelines providing explicit information about how the
performance will be judged; tips, checklists, and graphic organizers providing direct models of
what kinds of strategies are deployed by successful performers; appropriate reference
materials and devices to support comprehension and thinking; and simulations that encourage
exploration and understanding of conceptual relationships.
As an example, Harris and Bauer (2009) describe simulation tools used in the math
scenario-based task set described above. To assist students in becoming familiar with inflow
and outflow in the context of a dam and lake, a simulation was developed that allowed
students to experiment with inflow and outflow rates and their effect on the volume of water
in a sink. The simulation presents a familiar setting where students can set the rate of inflow
from the faucet and outflow by manipulating the drain plug. Questions accompanying the sink
simulation require students to interpret graphs of inflow/outflow rates and describe the effects
on water volume. After several such tasks, the problem context returns to the main question
around the viability of the lake.
In the wind power reading example discussed above, a student must complete graphic
organizers designed to probe her or his understanding of scientific text explanations, for
example, the difference between windmills used to generate electricity from wind versus the
operation of household fans that use electricity to generate wind. In another question, the
student must complete a graphic organizer, which helps in probing his or her understanding of
the organization and structure of information in a text, both in aligning details with main ideas
and in inferring or inducing topical categories.
The children’s advertising writing scenario includes rubrics for evaluating a summary, as
well as activities in which students are asked to use those rubrics to examine simulated peer
summaries to identify whether they adhere to or violate specific rubric elements (e.g., inserting
one’s own opinion into a summary). In another portion from the same scenario-based task set,
students evaluate a series of statements as pro or con and judge whether specific claims are
warranted. These elemental skills of argumentation are highly predictive of performance on the
culminating essay task, but also model thinking that is foundational to the formulation of a
persuasive argument.
In each of these examples, the learning sciences literature reveals insights into the
cognitive strategies that skilled individuals use in proficiently performing complex tasks in a
domain. Including these tools and representations in the assessment calls upon the student to
demonstrate strategic processing using devices common in domain practice. Further, that
inclusion encourages the student and teacher to incorporate such tools and representations
into classroom practice and, more generally, to develop the reasoning and strategic behavior
required to successfully use similar tools and representations more broadly in domain
performance. Further opportunities to use such tools and representations are provided in the
CBAL formative assessments, which offer elaborated task sequences that cover the competency
models more deeply than a summative assessment could.
Lesson 2: Use a theory of action to guide the design and evaluation of through-course
assessments. In the CBAL theory of action (Bennett, 2010), the assessment system as an
intervention becomes a key part of what it means to demonstrate technical quality. Technical
quality as such is not just instrument functioning; it is also the impact (negative and positive) of
instrument use on students, teachers, classroom practice, school functioning, and the larger
education system as a whole (Bennett, Kane, & Bridgeman, 2011). Thus, the theory of action
becomes a key component in assessment design and in evaluating the success of that design
through its implementation and impact. Following is a select set of examples of how a theory of
action (as depicted in Figure 2) fits into the design of TCA.
The theory of action states as an intermediate outcome: Teachers and students use
periodic assessment results as a starting point for formative follow-up. This outcome has
several very specific design implications. First, it suggests that results from assessments given
during the year must be scored and reported with a reasonable turnaround for student and
teacher use. Selected-response items are most efficient, as they can be scored nearly
immediately when administered electronically. Constructed-response turnaround time can also
be rapid with the strategic use of automated scoring. One of the active research areas in the
CBAL program concerns the development and evaluation of natural language processing (NLP)
approaches to the scoring of essay and other writing tasks (e.g., Deane, in press). It may be that
some, but not all, of the responses to tasks composing an assessment can be scored
immediately, allowing some types of instructionally relevant results to be provided quickly to
teachers and learners. Results requiring greater levels of quality control and statistical
postprocessing would be reported later. Such a phased approach to reporting may serve the
purpose of providing instructionally actionable information in a reasonable time period.
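The phased reporting idea above can be sketched very simply. The sketch below is purely illustrative and not part of CBAL; the item-type labels and function name are hypothetical. It partitions an assessment's responses by whether they can be machine scored and reported immediately or must be deferred for human scoring and statistical quality control.

```python
# Hypothetical sketch of phased score reporting: machine-scorable responses
# are released to teachers quickly; the rest are deferred for human scoring
# and quality control. Item-type labels here are illustrative only.
IMMEDIATELY_SCORABLE = {"selected_response", "auto_scored_essay"}

def phase_responses(items):
    """items: list of (item_id, item_type) pairs.
    Returns (report_now, report_later) lists of item ids."""
    report_now = [iid for iid, itype in items if itype in IMMEDIATELY_SCORABLE]
    report_later = [iid for iid, itype in items if itype not in IMMEDIATELY_SCORABLE]
    return report_now, report_later
```

In a real testing program the partition would of course depend on the maturity of the automated-scoring models for each task type, not on a fixed label.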
Second, the outcome obviously suggests that score reports must be designed to
encourage valid inferences about performance. Valid inferences may need to be couched as
qualified interpretive claims, that is, formative hypotheses (Bennett, 2010). A formative
hypothesis is a qualified statement suggesting that the teacher collect follow-up evidence to
confirm or refute the hypothesis. This idea is rooted in the fact that it is not often possible to
derive from a summative test sufficient information to support a reliable inference about an
individual’s skill strengths and weaknesses. Expecting summative assessments to provide
individual diagnostic information is often a bridge too far. For groups, it may be more feasible
to make test-based inferences regarding relative mastery or deficiency of subskills (e.g., when nearly all students in a class answer all items in a specific node of the competency model or standard correctly or incorrectly), but even this inference may be weakened by the underrepresentation of certain skill areas on the test, such that teachers may still need to do additional informal data gathering to confirm the suggestion.
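A group-level formative hypothesis of this kind can be illustrated with a minimal sketch (again hypothetical, not a CBAL specification): when nearly all students in a class respond the same way to a node's items, a tentative hypothesis is flagged for teacher follow-up; otherwise no group-level claim is made.

```python
# Illustrative sketch: flag a tentative class-level hypothesis about one
# competency-model node. The 0.9 threshold and data shape are assumptions.
def flag_node(proportions_correct, threshold=0.9):
    """proportions_correct: per-student proportion correct on one node's items.
    Returns a formative hypothesis to be confirmed or refuted by follow-up."""
    n = len(proportions_correct)
    all_correct = sum(1 for p in proportions_correct if p == 1.0) / n
    all_incorrect = sum(1 for p in proportions_correct if p == 0.0) / n
    if all_correct >= threshold:
        return "likely mastered"
    if all_incorrect >= threshold:
        return "likely not mastered"
    return "no group-level hypothesis"
```

The point of the hedged return values is the one made above: the test alone cannot support a definitive diagnostic claim, so the output is a hypothesis, not a verdict.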
Formative tools and processes designed to generate additional student or classroom
information might be used by teachers to carry out the needed follow-up. These formative
assessments may be designed to simply add more items or tasks targeting a specific subskill, to
sample a wider range of knowledge and skills in the subdomain, or to probe at a finer grain size
a progression of skills that comprise performance. In the CBAL program, formative assessments
are designed to serve each of these purposes.
The use of summative tests to generate formative hypotheses for teacher follow-up has
obvious implications for test security and confidentiality. Presumably, those hypotheses will be
most actionable if teachers have access to the item responses and tasks that generated the
hypotheses—that is, examples of student work. However, to give teachers access is to reveal
content that can no longer be reused. This disclosure has implications for the number of tasks in
the TCA item pool and puts pressure on the testing program to continuously refresh the item pool.
The theory of action states as an intermediate outcome: Teachers and students can use
tools and representations in instruction. We have previously described how CBAL uses tools
and representations in designing the summative (and formative) assessments. One challenge is
to guard against picking and choosing tools and representations that are too specific, and
therefore cannot be used more generally in domain performance. For example, the five-
paragraph essay, while perhaps a useful heuristic for introducing students to one basic
organizational structure, can generate unintended consequences when used repeatedly in high-
stakes assessment: students (and teachers) may focus too much attention on the lower-level features of this structure without addressing deeper writing
and thinking skills. The challenge for assessment design is to select a variety of general tools
and representations that are legitimately part of the domain so that students learn to use them
in various settings, adapting their thinking as necessary to effectively use those tools and
representations in task performance.
Lesson 3: Decide how achievement is to be conceptualized and use that
conceptualization in through-course assessment design. How TCA scores are aggregated
depends on how one conceptualizes achievement, with different conceptualizations implying
different designs and different approaches to aggregating TCA scores. The purpose here is not
to consider the (many) technical complexities, but rather to focus on how decisions might
influence the content and design of TCA. For example, if one wants to measure student growth
across the TCAs in a year, then there must be considerable overlap in what is measured each time so that growth can be assessed on a comparable basis. If one’s primary goal is to
document a student’s final status at year end, then the culminating TCA might comprehensively
cover the year’s work, with each preceding TCA used simply to refine the estimate provided by
that final measurement. Last, if one’s goal is to measure accomplishment, the individual TCAs
might each be constructed to measure different content and skills, probing those content and
skills in some depth, with the summary score across TCA taking the form of a composite.
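The three conceptualizations above imply different aggregation arithmetic. The sketch below is an illustration only (function names, weights, and score scales are assumptions, not CBAL designs): growth summarizes change over comparable measures, final status lets the culminating TCA dominate, and accomplishment forms a composite across TCAs covering different content.

```python
# Illustrative aggregation of through-course assessment (TCA) scores under
# three conceptualizations of achievement. All weights are hypothetical.
def growth(scores):
    # Growth: comparable content each time; summarize as change over the year.
    return scores[-1] - scores[0]

def final_status(scores, weight_final=0.7):
    # Final status: the culminating TCA dominates; earlier TCAs refine it.
    earlier = sum(scores[:-1]) / len(scores[:-1])
    return weight_final * scores[-1] + (1 - weight_final) * earlier

def accomplishment(scores, weights=None):
    # Accomplishment: each TCA measures different content; report a composite.
    weights = weights or [1 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))
```

The technical complexities set aside in the text (equating, reliability of difference scores, composite weighting) would dominate any real implementation; the sketch only makes the design contrast concrete.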
CBAL designs have thus far primarily explored an accomplishment conceptualization of
achievement. In mathematics, key conceptual and developmental competencies have guided
the design of each TCA (e.g., development of proportional reasoning; understanding of the
concepts of variable and equality; and functions). In reading, broad text types (e.g., literary,
informational, persuasive) are used to focus the scenario-based task sets in a TCA, but each TCA
also includes a discrete task set to broaden coverage and potentially support longitudinal
linking. In writing, each TCA targets a specific writing genre (e.g., persuasive writing, critical
interpretation, appeal building).
Ongoing and Future Directions
One of the research foci of the current CBAL agenda is to understand the development
of competency across time. A useful way to operationalize this understanding is to postulate
developmental sequences—roughly, learning progressions (e.g., Heritage, 2008). While the
CCSS emphasize increasing skill sophistication across grades, often it is not clear precisely how
to interpret differences in standards across grades; and where the differences are clear, it is not
always clear how these descriptive claims are empirically grounded. Progressions, by contrast,
are often built around clearly defined qualitative shifts reflecting the emergence of new
cognitive capacities; and these, in turn, can be related to empirical observations from the
developmental literature.
In the CBAL mathematics strand, Harris and Bauer (2009) note that a rich research-
based understanding of mathematical competency is not sufficient to connect summative
assessment, formative assessment, and professional development in ways that can deeply
support learning. They argue that it is also necessary to consider how competency develops. As
such, the CBAL mathematics team has organized its assessments around developmental models
that: define stages of competency through which students are proposed to progress from a
cognitive perspective; are explicit about changes that occur as a consequence of learning;
provide a basis for defining a meaningful scale of measurement; and offer a road map for
supporting teaching and learning. In mathematics, there is a foundation of empirically based
models of learning progression that the team draws upon.
In reading and writing, there is less agreement about empirically based learning
progressions (Heritage, 2008). Some research is available, for example, with respect to the
development of children’s understanding of narrative (McKeough, 2007; Nicolopoulou &
Bamberg, 1997; Nicolopoulou, Blum-Kulka, & Snow, 2002) and argumentative writing (Felton &
Kuhn, 2001; Kuhn, 1999; Kuhn & Udell, 2003). In other cases, the research literature is quite
sparse or inconclusive, and we have had to glean information from various sources, including
curricula and standards, in order to propose progressions that make sense in terms of what is
known about child development and the progression of standards, even if they cannot yet be
validated directly. Thus, the developmental sequences embedded in the CBAL reading and
writing model constitute hypotheses that we intend to verify and revise as research proceeds.
Conclusion
This paper reviewed some of the lessons learned from four years of work on CBAL, a
research and development activity centered around creating a model for innovative K-12
assessment. Among the more general lessons learned from our experience is the importance of
focusing on a small number of clearly articulated assessment purposes, since the purposes of
an assessment, and their relative priority, have a major impact on its design. In CBAL, our primary
purposes have been (a) to measure student achievement effectively and (b) to create
assessments that also function as worthwhile educational experiences. We are attempting to
fulfill that second purpose by grounding our assessment design in learning sciences research, as
well as in content standards, and by relying heavily on such design devices as scenario-based
task sets, along with tools and representations that model good teaching and scaffold effective
learning practice.
A second lesson learned was that a theory of action can be an indispensable tool in
guiding the design of through-course assessments and in evaluating the extent to which validity
and impact claims for the overall assessment system can be supported. The purposes that drive
an assessment are constrained by the role that that assessment is supposed to play in the
theory of action; thus, our experience suggests that significant thought should be given early on
as to exactly what role the TCAs in a particular assessment system will play in the theory of
action. This conclusion implies a prerequisite action very early in the test design process,
namely, specifying the theory of action in enough detail to make it usable for assessment
design and evaluation purposes.
Finally, we learned that much depends on how achievement is conceptualized. If
achievement is viewed in terms of growth, through-course assessments must be designed to
support measurement of change, which implies similar content across TCAs. On the other hand,
if achievement is viewed in terms of accomplishment, the contents of specific TCAs may be
strongly linked to curricular decisions and provide much less support for growth modeling (but
may provide rather more coverage of the full construct).
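Under the growth view, if successive TCAs report on a common scale, change can be summarized as simply as a least-squares slope across occasions. This is an illustrative sketch only; CBAL’s actual scales and any operational growth model are not specified here, and the scores below are invented.

```python
# Growth view sketch: when TCAs measure similar content on a common
# scale, change can be summarized by an ordinary least-squares slope
# of score on occasion (0, 1, 2, ...). Illustrative data only.

def growth_slope(scores):
    """OLS slope of score regressed on occasion index."""
    n = len(scores)
    x_mean = (n - 1) / 2          # mean of 0..n-1
    y_mean = sum(scores) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

print(growth_slope([52.0, 58.0, 61.0, 70.0]))  # 5.7 points gained per TCA
```

The accomplishment view offers no analogue of this quantity, because its TCAs are deliberately not on a common content base; that is the design trade-off the paragraph above describes.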
All of these considerations imply that the design of through-course assessment is not a
straightforward process, since we must think through how each design decision will play out
at each node in the theory of action.
References
Bennett, R. E. (in press). CBAL: Results from piloting innovative K-12 assessments. Princeton, NJ:
Educational Testing Service.
Bennett, R. E. (2011). Formative assessment: A critical review. Assessment in Education:
Principles, Policy and Practice, 18, 5–25.
Bennett, R. E. (2010). Cognitively Based Assessment of, for, and as Learning: A preliminary
theory of action for summative and formative assessment. Measurement:
Interdisciplinary Research and Perspectives, 8, 70–91.
Bennett, R. E., & Gitomer, D. H. (2009). Transforming K-12 assessment: Integrating
accountability testing, formative assessment and professional support. In C. Wyatt-
Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43–61).
New York, NY: Springer.
Bennett, R. E., Kane, M., & Bridgeman, B. (2011). Theory of action and validity argument in the
context of through-course summative assessment. Princeton, NJ: Educational Testing Service.
Deane, P. (in press). NLP methods for supporting vocabulary analysis. In J. P. Sabatini & E. R.
Albro (Eds.), Assessing reading in the 21st century: Aligning and applying advances in the
reading and measurement sciences. Lanham, MD: Rowman & Littlefield Education.
Deane, P. (2010). The skills underlying writing expertise: Implications for K-12 writing
assessment. Princeton, NJ: ETS.
Deane, P., Sabatini, J., & Fowles, M. (2011, February). Rethinking K-12 writing assessment to
support best instructional practices. Paper presented at the Writing Research Across
Borders II conference, Fairfax, VA.
Felton, M., & Kuhn, D. (2001). The development of argumentive discourse skill. Discourse
Processes, 32(2/3), 135–153.
Graf, A. E. (2009). Defining mathematics competency in the service of cognitively based
assessment for grades 6 through 8 (ETS Research Report No. RR-09-42). Princeton, NJ: ETS.
Harris, K., & Bauer, M. I. (2009, September). Using assessment to infuse a rich mathematics
disciplinary pedagogy into classrooms. Paper presented at the 35th International
Association for Educational Assessment (IAEA) Annual Conference, Brisbane, Australia.
Harris, K., Bauer, M. I., & Redman, M. (2008, September). Cognitive based developmental
models used as a link between formative and summative assessment. Paper presented
at the 34th International Association for Educational Assessment (IAEA) Annual
Conference, Cambridge, England.
Heritage, M. (2008). Learning progressions: Supporting instruction and formative assessment.
Paper prepared for the Formative Assessment for Teachers and Students (FAST) State
Collaborative on Assessment and Student Standards (SCASS) of the Council of Chief State
School Officers (CCSSO). Retrieved from the CCSSO website:
Kuhn, D. (1999). A developmental model of critical thinking. Educational Researcher, 28(2), 16–
Kuhn, D., & Udell, W. (2003). The development of argument skills. Child Development, 74(5),
McKeough, A. (2007). Best narrative writing practices when teaching from a developmental
framework. In S. Graham, C. MacArthur, & J. Fitzgerald (Eds.), Best practices in writing
instruction (pp. 50–73). New York, NY: Guilford.
Nicolopoulou, A., & Bamberg, M. G. W. (1997). Children and narratives: Toward an interpretive
and sociocultural approach. In M. Bamberg (Ed.), Narrative development: Six
approaches (pp. 179–215). Mahwah, NJ: Lawrence Erlbaum Associates.
Nicolopoulou, A., Blum-Kulka, S., & Snow, C. E. (2002). Peer-group culture and narrative
development. In S. Blum-Kulka & C. E. Snow (Eds.), Talking to adults: The contribution of
multiparty discourse to language acquisition (pp. 117–152). Mahwah, NJ: Lawrence
Erlbaum Associates.
O’Reilly, T., & Sheehan, K. M. (2009). Cognitively Based Assessment of, for, and as Learning: A
framework for assessing reading competency (ETS Research Report No. RR-09-26).
Princeton, NJ: ETS.
Sheehan, K. M., & O’Reilly, T. (in press). The case for scenario-based assessments of reading
competency. In J. P. Sabatini & E. R. Albro (Eds.), Assessing reading in the 21st century:
Aligning and applying advances in the reading and measurement sciences. Lanham, MD:
Rowman & Littlefield Education.
Screen Shots of Mathematics, Reading, and Writing Assessments