
Craig Wells - University of Massachusetts Amherst
Publications: 32
Reads: 9,573
Citations: 1,002
Publications (32)
Efforts to address persistent intergenerational poverty in the Global South have focused, in part, on improving both access to and quality of schooling for all children, often including teacher training and provision of materials. Child Aid supports literacy development in hundreds of primary schools in indigenous communities in the rural highlands...
The Task Force on Foundational Competencies in Educational Measurement has produced a set of foundational competencies and invited comment on the document. The students and faculty at the University of Massachusetts Amherst provide their comments and critique of the proposed competencies. Both students and faculty agree that there needs to be more...
A randomized controlled trial of Student Success Skills (SSS) was conducted to determine the effect of the classroom program on Grade 5 students’ (N = 4,305) standardized test scores and proximal socioemotional variables associated with academic achievement. The SSS program was delivered by school counselors and reinforced through cuing and coachin...
In item response theory test scaling/equating with the three-parameter model, the scaling coefficients A and B have no impact on the c-parameter estimates of the test items since the c-parameter estimates are not adjusted in the scaling/equating procedure. The main research question in this study concerned how serious the consequences would be if c...
Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three IR...
The purpose of the present study was to develop and evaluate two procedures flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD using a critical value that was defined to represent barely tolerable IPD. The second procedure wa...
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting mod...
Background: Validity evidence based on the internal structure of an assessment is one of the five forms of validity evidence stipulated in the Standards for Educational and Psychological Testing of the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. In this paper, we...
This article describes the confirmatory factor analysis of the Student Engagement in School Success Skills (SESSS) instrument. The results of this study confirm that the SESSS has potential to be a useful self-report measure of elementary students’ use of strategies and skills associated with enhanced academic learning and achievement.
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation stud...
Item parameter drift (IPD) occurs when item parameter values change from their original value over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some i...
In a statewide achievement test, the use of a calculator is often allowed as an accommodation for students with disabilities (SWD). The purpose of the accommodation is to eliminate the disadvantage faced by SWDs relative to peers without disabilities with the same level of overall mathematics proficiency. However, it is important to determine if th...
The main objective of the present study was to examine measurement invariance of the Reynolds Adolescent Depression Scale (RADS) (Reynolds, 1987) across gender and age in a representative samp...
Investigating the fit of a parametric model is an important part of the measurement process when implementing item response theory (IRT), but research examining it is limited. A general nonparametric approach for detecting model misfit, introduced by J. Douglas and A. S. Cohen (2001), has exhibited promising results for the two-parameter logistic m...
Score equity assessment is an important analysis to ensure inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study examine...
A primary concern with testing differential item functioning (DIF) using a traditional point-null hypothesis is that a statistically significant result does not imply that the magnitude of DIF is of practical interest. Similarly, for a given sample size, a non-significant result does not allow the researcher to conclude the item is free of DIF. To...
The National Assessment Governing Board used a new method to set achievement level standards on the 2005 Grade 12 NAEP Math test. In this article, we summarize our independent evaluation of the process used to set these standards. The evaluation data included observations of the standard-setting meeting, observations of advisory committee meetings...
The purpose of this study was to use differential item functioning (DIF) and latent mixture model analyses to explore factors that explain performance differences on a large-scale mathematics assessment between examinees allowed to use a calculator or who were afforded item presentation accommodations versus those who did not receive the same accom...
It is important to check the fundamental assumption of most popular Item Response Theory models, unidimensionality. However, it is hard for educational and psychological tests to be strictly unidimensional. The tests studied in this paper are from a standardized high-stakes testing program. They feature potential multidimensionality by presenting va...
In the United States, when English language learners (ELLs) are tested, they are usually tested in English and their limited English proficiency is a potential cause of construct-irrelevant variance. When such irrelevancies affect test scores, inaccurate interpretations of ELLs' knowledge, skills, and abilities may occur. In this article, we review...
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not bee...
The validity of a hypothesis test is partly determined by whether the assumptions underlying the test are satisfied. In the past, a preliminary analysis of the data has been suggested prior to the use of the statistical test. In this article, the authors describe several limitations of preliminary tests (e.g., influence on significance levels). Ano...
Administering tests under time constraints may result in poorly estimated item parameters, particularly for items at the end of the test (Douglas, Kim, Habing, & Gao, 1998; Oshima, 1994). Bolt, Cohen, and Wollack (2002) developed an item response theory mixture model to identify a latent group of examinees for whom a test is overly speeded, and fou...
This study investigated the effect of item parameter drift on ability estimates under item response theory. Item response data for two testing occasions were simulated for the two-parameter logistic model under the following crossed conditions: test length, sample size, percentage of drifting items, and type of drift. Results indicated that item pa...
Research on mathematics education often includes some experimental manipulation of instruction in order to compare the effects of one type of instruction with that of one or more others. Tracking students' progress during treatments may be difficult in studies with special populations where numbers of participants are sometimes very small. In this...
One way that policies get enacted in higher education is through educational research. In 2000 the National Association of Graduate-Professional Students conducted the National Doctoral Program Survey (NDPS) in an effort to learn more about doctoral students' experiences and to influence doctoral education policy at both the local and national le...
Thesis (Ph. D.)--University of Wisconsin--Madison, 2004. Includes bibliographical references (p. 100-102). Photocopy.