Randy E. Bennett

Educational Testing Service | ETS · Research Division

172
Publications
153,949
Reads
6,298
Citations

Publications (172)
Article
Full-text available
In the United States, opposition to traditional standardized tests is widespread, particularly obvious in the admissions context but also evident in elementary and secondary education. This opposition is fueled in significant part by the perception that tests perpetuate social injustice through their content, design, and use. To survive, as well as...
Article
Full-text available
This paper covers six interrelated issues in formative assessment (aka, ‘assessment for learning’). The issues concern the definition of formative assessment, the claims commonly made for its effectiveness, the limited attention given to domain considerations in its conceptualisation, the under‐representation of measurement principles in that conce...
Article
Full-text available
On the surface, this chapter concerns the evolution of educational assessment from a paper-based technology to an electronic one. On a deeper level, that evolution is more substantive. As has been noted, that evolution can be viewed in terms of developmental stages (Bennett, 1998, 2010b; Bunderson, Inouye, & Olsen, 1989). In the first section of th...
Article
Full-text available
CBAL (Cognitively Based Assessment of, for, and as Learning) is a research initiative intended to create a model for an innovative K–12 assessment system that documents what students have achieved (of learning); helps identify how to plan instruction (for learning); and is considered by students and teachers to be a worthwhile educational experienc...
Article
This article exemplifies how assessment design might be grounded in theory, thereby helping to strengthen validity claims. Spanning work across multiple related projects, the article first briefly summarizes an assessment system model for the elementary and secondary levels. Next the article describes how cognitive-domain theory and principles are...
Chapter
This entry focuses on automated scoring of questions calling for answers that cannot be graded using exact-matching techniques and that are used operationally in assessment programs and learning products. The entry describes the benefits of automated scoring, the contexts in which it has been used, and approaches to validation. Operational uses are...
Article
This commentary focuses on one of the positive impacts of COVID‐19, which was to tie societal inequity to testing in a manner that could motivate the reimagining of our field. That reimagining needs to account for our nation's dramatically changing demographics so that assessment generally, and standardized testing specifically, better fit the need...
Article
Grouping individuals according to a set of measured characteristics, or profiling, is frequently used in describing, understanding, and acting on a phenomenon. The advent of computer‐based assessment offers new possibilities for profiling writing because aspects can be captured that were not heretofore observable. We explored whether writing proces...
Article
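This study, based on a high-school equivalency examination, examined gender differences in the writing processes of educationally at-risk individuals. It involved more than 30,000 examinees from 23 US states, each of whom took one of 12 forms of the language test. Writing processes were inferred from features extracted from keystroke logs, and these features were aggregated into seven process indicators. Females scored higher than males on both the essay and the total language test, but only by a small margin. More importantly, after controlling for total language test score, age, and essay prompt, all seven process indicators showed significant gender differences, the most prominent involving different aspects of fluency and editing. The results are consistent in many important respects with earlier studies of school-age students and adults, and with findings from both online and paper-and-pencil writing tasks. The article closes with suggestions for how similar research might be conducted with individuals writing in character-based languages such as Chinese.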
Article
This study examined differences in the composition processes used by educationally at-risk males and females who wrote essays as part of a high-school equivalency examination. Over 30,000 individuals were assessed, each taking one of 12 forms of the examination’s language arts writing subtest in 23 US states. Writing processes were inferred using f...
Article
We evaluate how higher- vs. lower-scoring middle-school students differ in their composition processes when writing persuasive essays from source materials. We examined differences on four individual process features: time taken before beginning to write, typing speed, total time spent, and number of words started. Next, we examined differences for...
Chapter
Full-text available
This chapter focuses on making sense from test-score comparisons. The chapter begins with some basic premises. It then proceeds to a discussion of factors that can weaken the tenability of test-score comparisons. Finally, the chapter offers some suggestions for responsibly interpreting and communicating comparisons. Much of the content draws upon i...
Article
Full-text available
This study investigates the effects of a scenario-based assessment design on students' writing processes. An experimental data set consisting of four design conditions was used in which the number of scenarios (one or two) and the placement of the essay task with respect to the lead-in tasks (first vs. last) were varied. Students' writing processes...
Poster
Full-text available
A theoretically driven approach to assessment of, as, and for learning led to a scenario-based assessment (SBA) design to engage and assist students in writing. The structure of the SBA simulates a condensed writing project undertaken in an order that a skilled practitioner might follow. In this study, we analyzed students' keystroke logs and inves...
Article
We used an unobtrusive approach, keystroke logging, to examine students’ cognitive states during essay writing. Based on data contained in the logs, we classified writing process data into three states: text production, long pause, and editing. We used semi-Markov processes to model the sequences of writing states and compared the state transition...
Article
Full-text available
This paper presents a theoretical and empirical case for the value of scenario-based assessment (SBA) in the measurement of students’ written argumentation skills. First, we frame the problem in terms of creating a reasonably efficient method of evaluating written argumentation skills, including for students at relatively low levels of competency....
Book
Full-text available
The Handbook of Formative Assessment in the Disciplines addresses current developments in the field, with a focus on domain dependency. Building from an updated definition of formative assessment, the book covers the integration of measurement principles into practice; the operationalization of formative assessment within specific domains, beyond g...
Article
This study compared gender groups on the processes used in writing essays in an online assessment. Middle‐school students from four grades responded to essays in two persuasive subgenres, argumentation and policy recommendation. Writing processes were inferred from four indicators extracted from students’ keystroke logs. In comparison to males, on...
Article
Full-text available
Writing from source text is critical for developing college-and-career readiness because it is required in advanced academic environments and many vocations. Scenario-based assessment (SBA) represents one approach to measuring this ability. In such assessment, the scenario presents an issue that the student is to read and write about. Before writin...
Article
This article is a written adaptation of the Presidential address I gave at the NCME annual conference in April 2018. The article describes my thoughts on the future of assessment. I discuss eleven likely characteristics of future tests and, for each characteristic, why I think it is important and what to watch with respect to it. Next, I outline wh...
Article
Full-text available
The goal of this study is to model pauses extracted from writing keystroke logs as a way of characterizing the processes students use in essay composition. Low-level timing data were modeled, the interkey interval and its subtype, the intraword duration, thought to reflect processes associated with keyboarding skills and composition fluency. Heavy...
Book
Full-text available
This book is open access under a CC BY-NC 2.5 license. This book describes the extensive contributions made toward the advancement of human assessment by scientists from one of the world’s leading research institutions, Educational Testing Service. The book’s four major sections detail research and development in measurement and statistics, educa...
Chapter
Full-text available
This chapter reviews the historical roots of ETS from two perspectives. First, the requirements and history of the Internal Revenue Code (IRC) governing the establishment and operation of 501(c)3 organizations are described. Next, the people and events leading to the establishment of ETS as a nonprofit educational measurement organization are explo...
Chapter
Full-text available
This chapter synthesizes ETS contributions to educational research and policy analysis, psychology, and psychometrics covering seven decades. The synthesis is organized by decade, providing a picture of the persistent, as well as the changing, emphases that characterized ETS research over time.
Article
Through a synthesis of news accounts, research studies, survey results, and state and federal education department documents, this paper examines the opt-out movement and some of the dynamics that appear to underlie it. Several topics are covered, including the movement’s extent, the demographics of those participating in it, how much time students...
Chapter
Full-text available
This chapter concerns the automated scoring of answers to constructed-response items as seen through the lens of validity. That lens was chosen because scoring and validity cannot be meaningfully separated (Bennett, 2011; Bennett & Bejar, 1998). Attempts to treat automated scoring without a central focus on validity have too often led to a misunder...
Article
This study examined the relationship of a machine-scorable, constrained free-response computer science item that required the student to debug a faulty program to two other types of items: (1) multiple-choice and (2) free response requiring production of a computer program. Confirmatory factor analysis was used to test the fit of a three-fa...
Article
The use of constructed response items in large scale standardized testing has been hampered by the costs and difficulties associated with obtaining reliable scores. The advent of expert systems may signal the eventual removal of this impediment. This study investigated the accuracy with which expert systems could score a new, non-multiple choice...
Technical Report
Full-text available
Based on the changing demands of today’s workforce, advances in other nations, and original analysis, this report provides a set of criteria for high-quality student assessments. These criteria can be used by assessment developers, policymakers, and educators as they work to create and adopt assessments that promote deeper learning of 21st-century s...
Chapter
Full-text available
This chapter reviews the contribution of new information-communication technologies to the advancement of educational assessment. Improvements can be described in terms of precision in detecting the actual values of the observed variables, efficiency in collecting and processing information, and speed and frequency of feedback given to the particip...
Article
CBAL, an acronym for Cognitively Based Assessment of, for, and as Learning, is a research initiative intended to create a model for an innovative K–12 assessment system that provides summative information for policy makers, as well as formative information for classroom instructional purposes. This paper summarizes empirical results from 16 CBAL su...
Chapter
Assessment design concerns the processes of gathering background information, building assessment arguments, authoring tasks, and developing scoring algorithms to support the intended purpose of an assessment. A wide variety of processes and knowledge representations are used to carry out these steps. Each offers opportunities for technology suppor...
Article
Computer technology is now commonly used in the workplace, in university education, and in the devices we employ for entertainment, communication, and transportation. As a consequence, it should be no surprise that technology is finding its way into assessment. In this article, the recent evolution of technology for large-scale educational assessme...
Article
Full-text available
This paper describes a study intended to demonstrate how an emerging skill, problem solving with technology, might be measured in the National Assessment of Educational Progress (NAEP). Two computer-delivered assessment scenarios were designed, one on solving science-related problems through electronic information search and the other on solving sc...
Article
Full-text available
People use external knowledge representations (EKRs) to identify, depict, transform, store, share, and archive information. Learning how to work with EKRs is central to becoming proficient in virtually every discipline. As such, EKRs play central roles in curriculum, instruction, and assessment. Five key roles of EKRs in educational assessment ar...
Article
Full-text available
This paper describes a study intended to demonstrate how an emerging skill, problem solving with technology, might be measured in the National Assessment of Educational Progress (NAEP). Two computer-delivered assessment scenarios were designed, one on solving science-related problems through electronic information search and the other on solving sc...
Article
Full-text available
Many colleges and universities require handicapped students to submit scores from admissions tests such as the SAT, ACT, and GRE as part of the application process. Testing handicapped students for admissions purposes raises several critical issues. These issues relate to the accessibility of the admissions testing process, the content of admission...
Chapter
Full-text available
In this chapter, we ask whether advances in cognitive science, psychometrics and technology can transform the accountability paradigm that is currently in place in the United States. Of course, asking this question implies that there are problems with the present enactment of what is known as the No Child Left Behind Act, a system that requires eac...
Chapter
This article has no abstract.
Chapter
This article has no abstract.
Chapter
This article has no abstract.
Article
Full-text available
This article describes selected results from the Math Online (MOL) study, one of three field investigations sponsored by the National Center for Education Statistics (NCES) to explore the use of new technology in NAEP. Of particular interest in the MOL study was the comparability of scores from paper- and computer-based tests. A nationally represen...
Article
Full-text available
This study evaluated a “substantively driven” method for scoring NAEP writing assessments automatically. The study used variations of an existing commercial program, e-rater®, to compare the performance of three approaches to automated essay scoring: a brute-empirical approach in which variables are selected and weighted solely according to statist...
Article
Full-text available
This study investigated the comparability of scores for paper and computer versions of a writing test administered to eighth grade students. Two essay prompts were given on paper to a nationally representative sample as part of the 2002 main NAEP writing assessment. The same two essay prompts were subsequently administered on computer to a second s...
Article
Full-text available
The Formulating-Hypotheses (F-H) item presents a situation and asks examinees to generate as many explanations for it as possible. This study examined the generalizability, validity, and examinee perceptions of a computer-delivered version of the task. Eight F-H questions were administered to 192 graduate students. Half of the items restricted exam...
Article
In this study, we created a computer-delivered problem-solving task based on the cognitive research literature and investigated its validity for graduate admissions assessment. The task asked examinees to sort mathematical word problem stems according to prototypes. Data analyses focused on the meaning of sorting scores and examinee perceptions of...
Article
In this study we examined alternative item types and section configurations for improving the discriminant and convergent validity of the GRE General Test. A computer-based test of reasoning items and a generating-explanations measure was administered to a sample of 388 examinees who previously had taken the General Test. Confirmatory factor analys...
Article
Full-text available
This publication presents the reports from two studies, Math Online (MOL) and Writing Online (WOL), part of the National Assessment of Educational Progress (NAEP) Technology-Based Assessment (TBA) project. Funded by the National Center for Education Statistics (NCES), the Technology-Based Assessment project is intended to explore the use of new tec...
Article
Full-text available
Computer-based simulations can give a more nuanced understanding of what students know and can do than traditional testing methods. These extended, integrated tasks, however, introduce particular problems, including producing an overwhelming amount of data, multidimensionality, and local dependence. In this paper, we describe an approach to underst...
Article
Full-text available
The goal of this study was to assess the feasibility of an approach to adaptive testing using item models based on the quantitative section of the Graduate Record Examination (GRE) test. An item model is a means of generating items that are isomorphic, that is, equivalent in content and equivalent psychometrically. Item models, like items, are cali...
Article
This paper argues that the inexorable advance of technology will force fundamental changes in the format and content of assessment. Technology is infusing the workplace, leading to widespread requirements for workers skilled in the use of computers. Technology is also finding a key place in education. This is occurring not only because technology s...
Article
The purpose of this study was to evaluate whether variance due to computer-based presentation was associated with performance on a new constructed-response type, Mathematical Expression (ME), that requires examinees to enter mathematical expressions. Participants took parallel computer-based and paper-based tests consisting of ME items, plus a test...
Article
Full-text available
This article discusses some of the advantages of computer-based testing and highlights efforts by several states and organizations to introduce electronic assessment. It also describes the challenges policymakers face in planning and implementing such an initiative and details the steps they can take to pursue this type of testing. As electronic le...
Article
Shows how schema theory can be applied to the automatic generation of items, the scoring of more complex test responses, and to intelligent tutoring. Schema theory is applied to the automatic generation and variation of items, the analysis of multiple-line solutions, and delivery of instructions. The key to this analysis is the assertion that math...
Article
Full-text available
Measures of accomplishments (notable attainments that have been publicly recognized) have promise for research and practice in education and other applied fields. In this study we investigated sex and ethnic group differences on these measures for students bound for graduate school. Examined were 6 accomplishments scales (Academic Achievement, Leader...
Article
Full-text available
Large-scale assessment in the United States is undergoing enormous pressure to change. That pressure stems from many causes. Depending upon the type of test, the issues precipitating change include an outmoded cognitive-scientific basis for test design; a mismatch with curriculum; the differential performance of population groups; a lack of informa...
Article
The purpose of this study was to evaluate whether variance due to computer-based presentation was associated with performance on a new constructed-response type, Mathematical Expression, that requires examinees to build mathematical expressions using a mouse and an on-screen tool palette. Participants took parallel computer-based and p...
Article
Three open-ended response types—mathematical expression (ME), generating examples (GE), and graphical modeling (GM)—are described that could broaden the conception of mathematical problem solving used in computerized admissions tests. ME presents single-best-answer problems that call for an algebraic formalism, the correct rendition of which can ta...
Article
Full-text available
We investigated the functioning of a new computer-delivered response type for potential use in graduate admissions assessment. This response type, which is open-ended and automatically scorable, presents problems calling for the examinee to draw a graph modeling a given situation. Problem situations can be like the single-best-answer items currentl...
Article
Full-text available
Problem-solving strategy is frequently cited as mediating the effects of response format (multiple-choice, constructed response) on item difficulty, yet there are few direct investigations of examinee solution procedures. Fifty-five high school students solved parallel constructed response and multiple-choice items that differed only in the presenc...
Article
We evaluated a machine-scorable, computer-delivered response type for measuring quantitative reasoning skill. “Generating Examples” (GE) is built around items that present constraints and ask candidates to give one or more answers that meet those constraints. These items are attractive because, like many real-world problems, GE items can have multi...
Article
Full-text available
We evaluated a computer-delivered response type for measuring quantitative skill. “Generating Examples” (GE) presents under-determined problems that can have many right answers. We administered two GE tests that differed in the manipulation of specific item features hypothesized to affect difficulty. Analyses related to internal consistency reliabi...
Article
Under some circumstances, allowing examinees to choose which test items to respond to may increase test validity. In this study, we explored how choice, that is, allowing examinees to assign themselves to test questions, affected examinee performance and test characteristics for a measure of the ability to generate hypotheses about a given situatio...
Article
Addresses how new technology and advances in cognitive and measurement science can transform large-scale educational assessments, particularly testing for educational admissions. The critical assessment areas discussed are test design, item generation, task presentation, scoring, and testing purpose and location. For each area, the article identifi...
Article
To measure problem solving and related cognitive constructs effectively, future generations of tests will need to include tasks more like those actually encountered in academic and work settings. The advent of computer-based tests makes the inclusion of such performance tasks more feasible. One way in which some tasks might be made more relevant is...
Article
To measure problem solving and related cognitive constructs effectively, future generations of tests will need to include tasks more like those actually encountered in academic and work settings. The advent of computer-based tests makes the inclusion of such performance tasks more feasible. One way in which some tasks might be made more relevant is...
Article
What are the validity issues involved in automated scoring of tests? What is the nature of the interplay among construct definition, task design, examinee interface, tutorial, test development tools, and automated scoring and reporting?
Article
Acknowledgments: Many individuals contributed to this study. Bob Mislevy advised us on study design; Jan Flaugher managed development and data collection; Kelli Boyles and Lois Frankel wrote items, created rubrics, assembled test forms, and scored a subset of responses; Holly Knott produced the test packages; Peggy Redman conducted examinee intervie...
Article
The potential benefits of computer-based testing include the ability to present a wider variety of item formats and to tailor tests for specific purposes. In this study we examined the relationships among revised reasoning items that had more varied formats than the traditional items, a constructed-response generating-explanations task, and the cur...
Article
Full-text available
This paper offers a scenario for how educational assessment might change in response to market forces that affect not only the future of large-scale testing but also society in general. The scenario divides into three generations distinguished by the purpose of testing, test format and content, and the extent to which testing capitalizes on new tec...
Article
In this report, we describe the development of general, accurate, cost-effective, and immediately usable automatic analysis routines that dichotomously score rational expressions of arbitrary complexity. This development has led to the new Mathematical Expression (ME) response type, which may appear operationally in the new GRE Mathematical Reasoni...
Article
In this paper, I offer one scenario for the future of large-scale educational assessment. I argue that, although large-scale assessment has altered relatively little in recent years, this situation is about to change. The same competitive forces driving U.S. industry will compel test makers to (1) satisfy new market needs through continuous innovat...
Article
Generating Explanations (GE) is a computer-delivered item type that presents a situation and asks the examinee to pose as many plausible reasons for it as possible. Previous research suggests that GE measures a divergent thinking ability largely independent of the convergent skills tapped by the GRE General Test. This study was conducted to determi...
Article
The first generation of computer-based tests depends largely on multiple-choice items and constructed-response questions that can be scored through literal matches with a key. This study evaluated scoring accuracy and item functioning for an open-ended response type where correct answers, posed as mathematical expressions, can take many different s...
Book
Full-text available
Early work on automated scoring predated the ready availability of mechanisms for inexpensively delivering computer-based tests and collecting responses. Hence, this work used responses to conventionally delivered tasks that had somehow been translated to machine-readable form. The necessity of operating in this manner focused attention initially o...
Article
Relatively little is known about the characteristics of inner-city adults who seek assistance from literacy programs. Increased knowledge about this population will enhance the development of more effective programs, as well as policy options. This study describes the characteristics of 280 adults, ages 16 to 63, who came to an adult literacy progr...
Article
Full-text available
This study investigated the strategies subjects adopted to solve stem‐equivalent SAT‐Mathematics (SAT‐M) word problems in constructed‐response (CR) and multiple‐choice (MC) formats. Parallel test forms of CR and MC items were administered to subjects representing a range of mathematical abilities. Format‐related differences in difficulty were more...
Article
Changing goals in mathematics education have encouraged more open-ended problem solving in assessment. However, the use of these less constrained approaches has been limited by a lack of demonstrated relations between the underlying cognitive models and measurement consequences. In order to begin to characterize the cognitive basis for this emergin...
Article
Expert systems have the potential to help computer-based testing programs give qualitative feedback about examinee performance on constructed-response items. This study evaluated the accuracy of such feedback for algebra word problems. The responses of Graduate Record Examinations examinees were diagnostically analyzed by an expert system and by fo...
Article
One of the main limitations of the current generation of computer-based tests is its dependency on the multiple-choice item. Our work is aimed at extending computer-based testing by bringing limited forms of performance assessment to it in the domain of mathematics. This endeavor involves not only building task types that better reflect valued prob...
Article
The Advanced Placement Computer Science (APCS) Practice System is an instructional assessment device meant to give students practice in, and feedback on, elementary programming tasks. The system includes a programming editor, item pool, two feedback facilities, and electronic portfolio tools. Students write Pascal procedures using the editor, test...
Article
Two computer-based categorization tasks were developed and pilot tested. In study 1, the task asked examinees to sort mathematical word problem stems according to prototypes. Results with 9 faculty members and 107 undergraduates showed that those who sorted well tended to have higher Graduate Record Examination General Test scores and college grade...
Article
ETS is moving rapidly to computerize its tests for admissions to post-secondary education and occupational licensure/certification. Computerized tests offer important advantages, including immediate score reporting, the convenience of testing when the examinee wishes, and, for adaptive tests, equal accuracy throughout the score scale and a shorter t...
Article
Large-scale institutional testing, and testing in general, are in a period of rapid change. Among the more obvious dimensions is the growing use of constructed-response items and of computer-based testing. This study explores the potential for using a computer-based scoring procedure for the formulating-hypotheses item. This item type presents a si...
Article
This study evaluated expert system diagnoses of examinees' solutions to complex constructed-response algebra word problems. Problems were presented to three samples, each of which had taken the GRE General Test. One sample took the problems in paper-and-pencil form and the other two on computer. Responses were then diagnostically analyzed by an exp...
Article
Full-text available
Evaluated the hypothesis that gender and behavior, as perceived by teachers, affect judgments of the academic skill of their students. A path model was proposed to describe the relationships among tested academic skill, gender, behavior grades, and teachers' academic judgments. The model was evaluated separately in each of 3 grades (kindergarten–2n...
