About
65
Publications
38,143
Reads
2,886
Citations
Publications (65)
Results from a sample of 1,013 Georgia principals who rated 12,617 teachers are used to compare holistic and analytic principal judgments with indicators of student growth central to the state’s teacher evaluation system. Holistic principal judgments were compared to mean student growth percentiles (MGPs) and analytic judgments from a formal observ...
Artificial Neural Networks (ANNs) have been proposed as a promising approach for the classification of students into different levels of a psychological attribute hierarchy. Unfortunately, because such classifications typically rely upon internally produced item response patterns that have not been externally validated, the instability of ANN estim...
This study examined how school and district leaders access, value, and use research. From a representative sample of school districts across the United States, we surveyed 733 school and district leaders as part of an effort to develop understanding of the prevalence of research use, the nature of leaders’ attitudes toward research, and individual...
The concept of growth is at the foundation of the policy and practice around systems of educational accountability. It is also at the foundation of what teachers concern themselves with on a daily basis as they help children learn. Yet there is a disconnect between the criterion-referenced intuitions that parents and teachers have for what it means...
This study focuses on an instance in which the mean grade-to-grade scale scores on a vertical scale showed evidence of common test items that do not get easier from one grade to the next. The issue was examined as part of a 2-day workshop in which participants were asked to predict the growth on all linking items used in the construction of vertica...
Previous research has demonstrated conclusively that value-added inferences are sensitive to the choice of outcome measure within the domains of math and reading. This implies that math and reading are multidimensional domains. However, conventional factor analyses offer little support of this interpretation. In this study some conceptual distincti...
We apply the Attribute Hierarchy Method (AHM; Gierl, Cui, & Hunka, 2008; Gierl, Leighton, & Hunka, 2007) to model responses to a novel item format known as Ordered Multiple-Choice (OMC; Briggs, Alonzo, Schwab, & Wilson, 2006). A distinguishing feature of OMC items is that they have response options linked to multiple levels of an underlying learning pr...
It is often assumed that a vertical scale is necessary when value-added models depend upon the gain scores of students across two or more points in time. This article examines the conditions under which the scale transformations associated with the vertical scaling process would be expected to have a significant impact on normative interpretations...
A vertical score scale is needed to measure growth across multiple tests in terms of absolute changes in magnitude. Since the warrant for subsequent growth interpretations depends upon the assumption that the scale has interval properties, the validation of a vertical scale would seem to require methods for distinguishing interval scales from ordin...
In a recent article published in EM:IP, Kingston and Nash report on the results of a meta-analysis on the efficacy of formative assessment. They conclude that the average effect of formative assessment on student achievement is about .20 SD units. This would seem to dispel the myth that effects between .40 and .70 can be attributed to formative ass...
A vertical scale, in principle, provides a common metric across tests with differing difficulties (e.g., spanning multiple grades) so that statements of absolute growth can be made. This paper compares 16 states' 2007-2008 effect size growth trends on vertically scaled reading and math assessments across grades 3 to 8. Two patterns common in past r...
Although previous meta-analyses have indicated a connection between inquiry-based teaching and improved student learning, the type of instruction characterized as inquiry based has varied greatly, and few have focused on the extent to which activities are led by the teacher or student. This meta-analysis introduces a framework for inquiry-based tea...
This paper focuses on the psychometric modeling of a specific item format known as ordered multiple choice (OMC). The OMC item format was developed to facilitate diagnostic assessment on the basis of levels from an underlying learning progression that is linked to constrained item response options. Though OMC items were developed by following the b...
It can be easy to stereotype psychometrics as a field that encourages research with an aim of discovering “angels dancing on the head of a pin.” This stereotype is not without merit: Many of the biggest advances in psychometric research are best understood as purely methodological improvements in the ways that the parameters of a given model can be...
Using longitudinal data for an entire state from 2004 to 2008, this article describes the results from an empirical investigation of the persistence of value-added school effects on student achievement in reading and math. It shows that when schools are the principal units of analysis rather than teachers, the persistence of estimated school effect...
Despite revealing some positive impacts, studies too often suffer from weak design and inadequate reporting.
A good aphorism can, in a few words, capture an essential truth. Of the many good aphorisms Paul Holland has coined over the years, I have found myself invoking the one above frequently enough to worry that I should be paying out royalty fees, so it is only fitting that I use it as the starting point for some ideas I wish to explore in this paper.
Most growth models implicitly assume that test scores have been vertically scaled. What may not be widely appreciated are the different choices that must be made when creating a vertical score scale. In this paper empirical patterns of growth in student achievement are compared as a function of different approaches to creating a vertical scale. Lon...
This paper presents results of an NSF project in which the goal is to provide a synthesis of research on instructional innovations that have been implemented in undergraduate courses in physics. The research questions guiding the project are: What constitutes the range of principal course innovations that are being implemented in undergraduate phy...
The purpose of this study was to evaluate the sensitivity of growth and value-added modeling to the way an underlying vertical score scale has been created. Longitudinal item-level data were analyzed with both student- and school-level identifiers for the entire state of Colorado between 2003 and 2006. Eight different vertical scales were establish...
The development of reliable and valid measures of science teacher knowledge is essential for teacher education program evaluation purposes. A particular challenge to this effort lies in the fact that most programs serve pre-service teachers who have a range of disciplinary specialties and teaching areas (e.g., biology, chemistry, physics). Is it be...
Using observational data from the Education Longitudinal Survey of 2002, the effect of coaching on the SAT is estimated via linear regression and propensity score matching approaches. The key features of taking a propensity score matching approach to support causal inferences are highlighted relative to the more traditional linear regression approa...
This article illustrates the use of an explanatory item response modeling (EIRM) approach in the context of measuring group differences in science achievement. The distinction between item response models and EIRMs, recently elaborated by De Boeck and Wilson (2004)...
When causal inferences are to be synthesized across multiple studies, efforts to establish the magnitude of a causal effect should be balanced by an effort to evaluate the generalizability of the effect. The evaluation of generalizability depends on two factors that are given little attention in current syntheses: construct validity and external va...
The aim of this last chapter is threefold. First, we want to give the reader further insights into the estimation methods for the models presented in this volume. Second, we want to discuss the available software for the models presented in this volume. We will not sketch all possibilities of the software, but only those directly relevant to item r...
An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random effe...
In this article we describe the development, analysis, and interpretation of a novel item format we call Ordered Multiple-Choice (OMC). A unique feature of OMC items is that they are linked to a model of student cognitive development for the construct being measured. Each of the possible answer choices in an OMC item is linked to developmental leve...
This article raises some questions about the usefulness of meta-analysis as a means of reviewing quantitative research in the social sciences. When a meta-analytic model for SAT coaching is used to predict results from future studies, the amount of prediction error is quite large. Interpretations of meta-analytic regressions and quantifications of...
In the social sciences, evaluating the effectiveness of a program or intervention often leads researchers to draw causal inferences from observational research designs. Bias in estimated causal effects becomes an obvious problem in such settings. This article presents the Heckman Model as an approach sometimes applied to observational data for the...
In this chapter, we discuss two extensions to the item response models presented in the first two parts of this book: more than one random effect for persons (multidimensionality) and latent item predictors. We only consider models with random person weights (following a normal distribution), and with no inclusion of person predictors (except for t...
The act of constructing a measure requires a number of important assumptions. Principal among these assumptions is that the construct is unidimensional. In practice there are many instances when the assumption of unidimensionality does not hold, and where the application of a multidimensional measurement model is both technically appropriate and su...
The purpose of this paper is to describe how some students reconcile school with paid employment. Previous studies have found that students who work a moderate number of hours while in school actually maintain better academic performance than students who do not work at all. (They also perform better than students who work long hours.) Brief essays...
The question of how some high school students succeed in reconciling school and work was examined through a review of research on the topic. Special attention was paid to the comments made by students responding to the National Center for Research in Vocational Education's longitudinal study that followed approximately 1,500 students in high school...
Rather than relying on traditional measures of student performance, new admissions procedures for four-year institutions attempt to describe what students know and can do. Current procedures do not reflect integrated curriculum and applied learning, and they penalize multidisciplinary project and work-based learning program participants. Four state...
The validity of changes in school-level measures of student academic achievement is central to state and federal educational accountability systems. If such measures are "volatile" over time, their validity comes into question. It appears that correlational analyses are becoming a predominant means of diagnosing school-level volatility (Kane & Stai...
Thesis (Ph.D. in Education)--University of California, Berkeley, 2002. Includes bibliographical references (p. 168-175). Photocopy.