About
67 Publications
44,490 Reads
1,386 Citations
Publications (67)
This paper investigates the issue of internal validity in the context of complex assessments and constructs, focusing on Information and Communication Technology (ICT) literacy as measured by the iSkills assessment. Utilizing exploratory and confirmatory factor analyses, the paper aims to explore the internal structure of the iSkills assessment vis...
We investigated how item formats influence test takers’ response tendencies under uncertainty. Adult participants solved content-equivalent math items in three formats: multiple-selection multiple-choice, grid with forced-choice (true-false) options, and grid with non-forced-choice options. Participants showed a greater tendency to commit (rather t...
Computer‐based educational assessments often include items that involve drag‐and‐drop responses. There are different ways that drag‐and‐drop items can be laid out and different choices that test developers can make when designing these items. Currently, these decisions are based on experts’ professional judgments and design constraints, rather than...
The current study examined the relationship between test-taker cognition and psychometric item properties in multiple-selection multiple-choice and grid items. In a study with content-equivalent mathematics items in alternative item formats, adult participants’ tendency to respond to an item was affected by the presence of a grid and variations of...
The current study investigated how item formats and their inherent affordances influence test‐takers’ cognition under uncertainty. Adult participants solved content‐equivalent math items in multiple‐selection multiple‐choice and four alternative grid formats. The results indicated that participants’ affirmative response tendency (i.e., judge the gi...
Few would argue about the growing importance of information and communication technology (ICT) literacy as relatively new and distinct skills that affect educational attainment, workforce readiness, and lifelong learning. There is less agreement, however, as to what ICT literacy skills and knowledge are, how best to measure them, and strategies for...
Assessment design methodologies such as evidence-centered design (ECD) provide an approach for representing the "argument" underlying an assessment. Argument-based structures articulate the chain of reasoning connecting task-level data to the evidence required to support assessment claims about the student. Intelligent tutoring syste...
Advancements in technology have led to a revolution in assessments. No longer limited to the bubble-and-booklet approach of the 20th century, today's assessments may involve rich, interactive exercises, assess new and complex constructs, and automatically record and score evidence of skills. Still, the inferences drawn from technology...
Digital information literacy (DIL)—generally defined as the ability to obtain, understand, evaluate, and use information in a variety of digital technology contexts—is a critically important skill deemed necessary for success in higher education as well as in the global networked economy. To determine whether college graduates possess the requisite...
In this chapter, we first attempt to characterize cognitive models in ways that are useful to the assessment domain and provide a rationale for developing and validating cognitive models for the applied goals in this field. Having clarified the value of cognitive models for assessment, we introduce different forms of evidence that have promise and...
Score reports have one or more intended audiences: the people who use the reports to make decisions about test takers, including teachers, administrators, parents and test takers. Attention to audience when designing a score report supports assessment validity by increasing the likelihood that score users will interpret and use assessment results a...
Trialogue-based tasks can be used to gather evidence that may be difficult to obtain using traditional assessment approaches, such as embedded questions. However, more research needs to be done in order to create valid, fair, and reliable conversation tasks that can be used for assessment purposes. This paper describes ongoing efforts at developing...
Despite coming of age with the Internet and other technology, many college students lack the information and communication technology (ICT) literacy skills—locating, evaluating, and communicating information—necessary to navigate and use the overabundance of information available today. This paper presents a study of the validity of a simulations-b...
The purpose of this study was to use G-theory variance components to determine the minimum number of items required to make stable cut-score estimates using the Angoff method. Data from operational standard setting studies were re-sampled. Proportionally stratified subsets of items were extracted from each full-length test using simple random sampl...
Traditional science assessments, such as multiple-choice tests, have been criticized as too limited to appropriately evaluate students' knowledge and skill in complex scientific reasoning. This paper explores the potential of a game-like assessment for improving measurement of middle-school students' science inquiry skills. The assessment, in which...
Today’s information and communications technology (ICT) provides unprecedented amounts of information to organizations and their employees. This overabundance challenges workers, placing an increasing premium on skills of sifting through information of sometimes dubious quality, integrating information critically, and producing well-reasoned conclusio...
Although the business community increasingly recognizes information literacy as central to its work, there remains the critical problem of measurement: How should employers assess the information literacy of their current or potential workers? In this article, we use a commercially available assessment to investigate the relationship between inform...
To engineers accustomed to the sensors and other instrumentation typically discussed in this magazine, a college entrance exam might hardly seem like measurement. Yet the construction and evaluation of educational instruments rest on familiar foundational measurement concepts. However, while engineering instruments measure physical properties, educ...
Although web-based meeting technology has advanced significantly in the past few years, few published examples of applying the technology to standard setting exist. Yet many of the costs and inconveniences of face-to-face meetings of geographically dispersed experts could be eliminated through virtual standard setting, with experts participating...
Evaluating the trustworthiness of Internet-based or other digital information has become an essential 21st century skill. The iSkills™ assessment, from Educational Testing Service (ETS), purports to measure such digital evaluation skills, along with other digital literacy skills. In this work, we use an argument-based approach to assessment validat...
This study presents an investigation of information literacy as defined by the ETS iSkills™ assessment and by the New Jersey Institute of Technology (NJIT) Information Literacy Scale (ILS). As two related but distinct measures, both iSkills and the ILS were used with undergraduate students at NJIT during the spring 2006 semester. Undergraduate stud...
Despite coming of age with the Internet and other technology, many college students lack the information and communication technology (ICT) literacy skills necessary to navigate, evaluate, and use the overabundance of information available today. This paper describes the development and early administrations of the ETS iSkills™ assessment,...
This research applied a cognitive model to identify item features that lead to irrelevant variance on the Test of Spoken English™ (TSE®). The TSE is an assessment of English oral proficiency and includes an item that elicits a description of a statistical graph. This item type sometimes appears to tap graph-reading skills, an irrelevant construct; T...
This paper investigates the predictive validity of various features of Generating Examples (GE) test items – algebra problems that pose mathematical constraints and ask examinees to provide example solutions meeting those constraints. Selection of item features was motivated by a cognitive model of how examinees solve GE items using informal soluti...
This research applies theories of graph comprehension to investigate the factors affecting how easily a graph can be described. We find that the structure of a graph, that is, the number of visual chunks (Shah, Mayer, & Hegarty, 1999) to be described, influences the communicative quality of elicited descriptions. The work extends our understanding of graph...
This report is a case study of the process of creating experimental items for a new Graduate Record Examination (GRE®) Engineering Test. Using a prototyping tool called the Free-Response Authoring and Delivery System (FRADS), a single test developer generated 55 computer-based test items that require either complex constructed responses or multiple...
The use of the Internet for course preparation is ill served by traditional, content-based search engines. This paper describes SourceFinder, a web search engine that locates text material based on linguistic characteristics, such as reading level. Combining SourceFinder with content-based searches may allow instructors more easily to iden...
Problem-solving strategy is frequently cited as mediating the effects of response format (multiple-choice, constructed response) on item difficulty, yet there are few direct investigations of examinee solution procedures. Fifty-five high school students solved parallel constructed response and multiple-choice items that differed only in the presenc...
We evaluated a machine-scorable, computer-delivered response type for measuring quantitative reasoning skill. “Generating Examples” (GE) is built around items that present constraints and ask candidates to give one or more answers that meet those constraints. These items are attractive because, like many real-world problems, GE items can have multi...
We evaluated a computer-delivered response type for measuring quantitative skill. “Generating Examples” (GE) presents under-determined problems that can have many right answers. We administered two GE tests that differed in the manipulation of specific item features hypothesized to affect difficulty. Analyses related to internal consistency reliabi...
Design is an important part of engineering education, but cannot be assessed easily through current multiple‐choice tests. In this report, we explore the possibility of including design problems in assessments of engineering students. We report a study comparing constructed‐response design problems and constructed‐response versions of typical engin...
This report describes the Free-Response Authoring, Delivery, and Scoring System (FRADSS). FRADSS allows test developers, who might have no programming experience, to put their ideas for computer-based test items directly onto computer. The system was used for the GRE Mathematical Reasoning pre-pilot and is currently being used for the GRE Engineeri...
This study investigated the strategies subjects adopted to solve stem‐equivalent SAT‐Mathematics (SAT‐M) word problems in constructed‐response (CR) and multiple‐choice (MC) formats. Parallel test forms of CR and MC items were administered to subjects representing a range of mathematical abilities. Format‐related differences in difficulty were more...
We survey how several algorithm animation systems are used in Computer Science instruction. Reported student reactions to the use of these systems are favorable, but little information is available on their effectiveness for learning. We examine several formal studies that have implications for how animation systems can most effectively be used to t...
This chapter describes two examples of computer-based constructed-response questions that represent real-world tasks and must be automatically scored. The design specifications are described for each question, along with their corresponding interfaces. Particular attention is given to the iterative evolution of each interface, as well as the design...
While user interface toolkits and managers facilitate prototyping by programmers, few systems allow nonprogrammers to create their own applications. In this paper, we report some techniques that bring prototyping to nonprogramming domain experts, namely professional test developers at Educational Testing Service. The Free-Response Authoring and Del...
A previous study of new item types for the analytical measure of the GRE General Test found that the items loaded on three of four separable factors that were labeled verbal reasoning, informal reasoning, formal-deductive reasoning, and quantitative reasoning. The present study examined the issue of how processing differed for these item types in t...
Our goal in this pilot study is to explore students’ behavior as they learn about two search algorithms, observing the role of algorithm animations. We find that alternative animations of the same algorithm may provide different information and facilitate different types of reasoning.
Contrasts between constructed response items and stem-equivalent multiple-choice counterparts have yielded only a few weak generalizations. Such comparisons typically have involved averaging item characteristics, and this aggregation has masked differences in statistical properties at the item level. Moreover, even aggregated format differences hav...
Design is a complex cognitive task that pushes the limits of human information processing. How do expert designers handle this complexity? Professional and student architects solved a real-world diagram construction task that required satisfying multiple, sometimes conflicting, constraints to achieve an acceptable design. Professionals' initial des...
This paper presents a technique for applying the Rule Space model of cognitive diagnosis (Tatsuoka, 1983) to assessment in a semantically‐rich domain. Responses to 22 architecture test items, developed to assess a range of architectural knowledge, were analyzed using Rule Space. Verbal protocol analyses guided the construction of a model of examine...
This report describes development of a new tool for assessment research in graduate education. The tool, the Algebra Assessment System, is based on GIDE, a pre-existing program that diagnostically analyzes complex constructed responses to algebra word problems. The project had three goals. The first goal was to build a generically usable interface...
This paper presents a series of four experiments investigating students' debugging of LISP programs. The experiments involve a population of subjects who know LISP reasonably well and whose errors are best classified as slips (Brown & Van Lehn, 1980). That is, students are unlikely to repeat the same errors either within their program or across programs...
This article presents a series of four experiments investigating students' debugging of LISP programs. The experiments involve a population of students who know LISP reasonably well in that their errors are best classified as slips (Brown & Van Lehn, 1980). That is, students are unlikely to repeat the same errors either within their program or acro...
Previous empirical studies that have relied on debugging (Lucas & Kaplan, 1974) and memory (Sheppard, Borst, Curtis, & Love, 1978) as measures of computer program comprehension have led to inconsistent results and conclusions regarding the benefits of structure. The present experiment utilizes a new technique for measuring comprehension that is sen...
This study investigated the influence of prior computer experience on writing computer programs in natural English. A multiple regression analysis revealed that knowing more computer languages predicted a greater total number of words in the English programs. It was also found that subjects who knew both PASCAL and BASIC used more loops in their English...
Few would argue about the growing importance of information and communication technology (ICT) skills as a relatively new and distinct skill that affects educational attainment, workforce readiness, and lifelong learning. There is less agreement, however, as to what these skills and knowledge are and how best to measure them. This paper out...