Ronald K Hambleton

University of Massachusetts Amherst | UMass Amherst · Center for Educational Assessment

About

304
Publications
115,933
Reads
29,686
Citations
Additional affiliations
September 1990 - January 2014
University of Oviedo
Position
  • Researcher
Description
  • I have made numerous trips to Oviedo since about 1990 to work with Jose Muniz and other members of their outstanding department.
January 2011 - present
Griffith University
January 2009 - December 2011
University of Massachusetts Amherst

Publications

Article
Along with technological developments, electronic devices/tools have been affecting our lives in many aspects. Inevitably, these developments have affected the learning and teaching processes. In the last decade, there has been an increase in the usage of electronic tools/devices in teaching and learning processes, as well as the assessment of thes...
Conference Paper
Thanks to their key advantages, multiple-choice item tests are the most widely used tools in large-scale exams and large classroom assessments. Although advantages such as application and scoring practicality make them very convenient to use, they also have some disadvantages, such as the possibility of cheating and of giving the correct answer by chance....
Article
Validity is one of the psychometric properties of achievement tests. One way to examine validity is through item bias studies, which are based on Differential Item Functioning (DIF) analyses and field experts’ opinions. In this study, field experts were asked to estimate the DIF levels of the items to compare the estimations obtai...
Article
The second edition of the International Test Commission Guidelines for Translating and Adapting Tests was prepared between 2005 and 2015 to improve upon the first edition, and to respond to advances in testing technology and practices. The 18 guidelines are organized into six categories to facilitate their use: pre-condition (3), test development (...
Article
It is common for test publishers to make their most popular educational and psychological tests available in multiple languages and cultures. Occasionally, too, test items are found after publication of these new language versions of tests that may disadvantage members taking these translated tests due to biases. This means that when these tests ar...
Article
In item response theory (IRT) models, assessing model-data fit is an essential step in IRT calibration. While no general agreement has ever been reached on the best methods or approaches to use for detecting misfit, perhaps the more important comment based upon the research findings is that rarely does the research evaluate IRT misfit by focusing o...
Technical Report
These guidelines outline considerations relevant to the assessment of test takers in countries or regions that may be linguistically diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the fair and valid assessment...
Article
Repeatedly using items in high-stakes testing programs provides a chance for test takers to have knowledge of particular items in advance of test administrations. A predictive checking method is proposed to detect whether a person uses preknowledge on repeatedly used items (i.e., possibly compromised items) by using information from secure items tha...
Article
Item response theory (IRT) has become a popular methodological framework for modeling response data from assessments in education and health; however, its use is not widespread among psychologists. This paper aims to provide a didactic application of IRT and to highlight some of these advantages for psychological test development. IRT was applied t...
Article
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or hi...
Article
Application of MIRT modeling procedures is dependent on the quality of parameter estimates provided by the estimation software and techniques used. This study investigated model parameter recovery of two popular MIRT packages, BMIRT and flexMIRT, under some common measurement conditions. These packages were specifically selected to investigate the...
Article
This chapter focuses on international standards and guidelines that relate to tests and testing. It covers the work of the ITC (International Test Commission) and other international initiatives that have been influenced by that work, notably the work of EFPA (European Federation of Psychologists’ Associations) and of ISO (International Standards O...
Article
In item response theory test scaling/equating with the three-parameter model, the scaling coefficients A and B have no impact on the c-parameter estimates of the test items since the c-parameter estimates are not adjusted in the scaling/equating procedure. The main research question in this study concerned how serious the consequences would be if c...
Article
This special issue targets efforts that offer insights into current and future trends and research directions in technology-based assessment and testing. It attracted 32 submissions, which were double-blind reviewed by 42 international experts. Finally, 8 papers were selected for publication, covering a wide range of topics in this field....
Article
Research has shown that mental imagery facilitates various types of learning. This study investigated a procedure expressly designed to evoke and enhance mental imagery in spelling instruction. The purposes of the investigation were two-fold. The first compared the effects of three approaches to spelling instruction on learning and retention. The s...
Article
In objectives-based instructional programs where relatively short criterion-referenced tests are administered to estimate student mastery for the purpose of monitoring a student through the program, estimates which maximally utilize the information that can be obtained from the student during the allotted testing time are required. Bayesian estimate...
Article
Cross-national assessment of students’ competences in higher education is becoming increasingly important in many disciplines including economics but there are few available instruments that meet psychological standards for assessing students’ economic competence in higher education (HE). One of them is the internationally valid Test of Understandi...
Article
Objective: To use item response theory (IRT) methods to link scores from 2 recently developed contemporary functional outcome measures, the adult Spinal Cord Injury-Functional Index (SCI-FI) and the Pedi SCI (both the parent version and the child version). Design: Secondary data analysis of the physical functioning items of the adult SCI-FI and...
Article
The purpose of the present study was to develop and evaluate two procedures flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD using a critical value that was defined to represent barely tolerable IPD. The second procedure wa...
Article
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting mod...
Article
The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a...
Article
Due to recent research in equating methodologies indicating that some methods may be more susceptible to the accumulation of equating error over multiple administrations, the sustainability of several item response theory methods of equating over time was investigated. In particular, the paper is focused on two equating methodologies: fixed common...
Article
The goal of the present study is to develop a questionnaire, with proper psychometric properties and current norms, to evaluate the burnout syndrome in Spain. The operative definition of burnout proposed by Maslach and Jackson is used to define three dimensions (Emotional exhaustion, Depersonalization and Personal accomplishment). A total of 2,403...
Article
Background: Adapting tests across cultures is a common practice that has increased in all evaluation areas in recent years. We live in an increasingly multicultural and multilingual world in which the tests are used to support decision-making in the educational, clinical, organizational and other areas, so the adaptation of tests becomes a necessi...
Article
DIMPACK Version 1.0 for assessing test dimensionality based on a nonparametric conditional covariance approach is reviewed. This software was originally distributed by Assessment Systems Corporation and now can be freely accessed online. The software consists of Windows-based interfaces of three components: DIMTEST, DETECT, and CCPROX/HAC, which co...
Chapter
The percentage of examinees who are classified consistently and accurately into the proficiency levels is an important measurement property of the tests that are used to classify the candidates. Given the suspected discrepancies between the classical test theory (CTT)- and item response theory (IRT)-based single-administration decision consistency...
Article
Since the 1950s, applications of item response theory (IRT) models were slow to be implemented because of their complexity and a shortage of suitable software, but now they are widely used by testing agencies and researchers. Clearly, IRT models are central today in test development, test evaluation, and test data analysis. The purposes of this cha...
Article
Test scores matter these days. Test‐takers want to understand how they performed, and test score reports, particularly those for individual examinees, are the vehicles by which most people get the bulk of this information. Historically, score reports have not always met the examinees’ information or usability needs, but this is clearly changing for...
Article
The equal ability distribution assumption associated with the equivalent groups equating design was investigated in the context of a selection test for admission to higher education. The purpose was to assess the consequences for the test-takers in terms of receiving improperly high or low scores compared to their peers, and to find strong empirica...
Article
Our purpose was to determine the efficiency of the estimates of ability provided by the one-parameter logistic model as compared to the estimates provided by the more general two- and three-parameter models. Several tests were simulated with item parameters meeting the assumptions of either the two- or three-parameter model. For each test, the info...
Article
Although Birnbaum's logistic models have been known since 1957, there have been few applications to empirical data reported in the literature. In this study, the one- and two-parameter logistic models were compared with respect to their capacity to predict the distribution of statistics for estimating ability. The results of this study suggest that...
Chapter
  • technical advances and guidelines for improving;
  • Trends in International Mathematics and Science Studies (TIMSS) - assessments of 9 and 13 year olds;
  • Organization for Economic Cooperation and Development's (OECD's) Programme for International Student Assessment (PISA), assessments of 15 year olds in mathematics, science and reading;
  • IRT, in...
Article
Cross-cultural research is now an undeniable part of mainstream psychology and has had a major impact on conceptual models of human behavior. Although it is true that the basic principles of social psychological methodology and data analysis are applicable to cross-cultural research, there are a number of issues that are distinct to it, including m...
Chapter
Just as traditional computerized adaptive testing (CAT) involves adaptive selection of individual items for sequential administration to examinees as a test is in progress, multistage testing (MST) is an analogous approach that uses sets of items as the building blocks for a test. In MST terminology, these sets of items have come to be termed modul...
Article
To accurately assess the health knowledge, attitudes, and practices of students in grades four-seven, the staff of the School Health Education Evaluation (SHEE) project devoted extensive effort to identify a test appropriate for such assessment. An extensive literature review failed to produce an instrument sufficiently comprehensive or psychometri...
Article
Score equity assessment is an important analysis to ensure inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study examine...
Article
How a testing agency approaches score reporting can have a significant impact on the perception of that assessment and the usefulness of the information among intended users and stakeholders. Too often, important decisions about reporting test data are left to the end of the test development cycle, but by considering the audience(s) and the kinds o...
Article
In this study, we mapped achievement levels from the National Assessment of Educational Progress (NAEP) onto the score scales for selected assessments from the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA). The mapping was conducted on NAEP, TIMSS, and PISA Mathematics ass...
Article
The objectives of this study were to develop a functional outcome instrument for hip and knee osteoarthritis research (OA-FUNCTION-CAT) using item response theory (IRT) and computer adaptive test (CAT) methods and to assess its psychometric performance compared to the current standard in the field. We conducted an extensive literature review, focus...
Data
A figure of the scree plots for the OA-FUNCTION-CAT domains.
Data
A table listing the items in the OA-FUNCTION-CAT item bank and the item calibrations.
Article
Psychological research that involves cross-cultural comparisons has increased considerably during the last decade and is expected to escalate further. Given its growing popularity within mainstream psychology, cross-cultural research no longer can be considered the sole domain of experts trained in this specialization. Concomitant with this expansi...
Article
Contemporary clinical assessments of activity are needed across the age span for children with cerebral palsy (CP). Computerized adaptive testing (CAT) has the potential to efficiently administer items for children across wide age spans and functional levels. The objective of this study was to examine the psychometric properties of a new item bank...
Article
The objective of this project was to develop computer-adaptive tests (CATs) using parent reports of physical function in children and adolescents with cerebral palsy (CP). The specific aims of this study were to (1) examine the psychometric properties of an item bank of lower-extremity and mobility skills for children with CP; (2) evaluate a CAT us...
Article
To develop and evaluate a prototype measure (OA-DISABILITY-CAT) for osteoarthritis research using item response theory (IRT) and computer-adaptive test (CAT) methodologies. We constructed an item bank consisting of 33 activities commonly affected by lower extremity (LE) osteoarthritis. A sample of 323 adults with LE osteoarthritis reported their de...
Article
The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). Parent respondents of 306 children with cerebral palsy were recruited from four p...
Article
The specific aims of this study were to (1) examine the psychometric properties (unidimensionality, differential item functioning, scale coverage) of an item bank of upper-extremity skills for children and adolescents with cerebral palsy (CP); (2) evaluate a simulated computer-adaptive test (CAT) using this item bank; (3) examine the concurrent val...
Article
This article discusses ResidPlots-2, a computer program that provides a powerful tool for IRT graphical residual analyses. ResidPlots-2 consists of two components: a component for computing residual statistics and another component for communicating with users and for plotting the residual graphs. The features of the ResidPlots-2 software are...
Article
It is important to check the fundamental assumption of most popular Item Response Theory models, unidimensionality. However, it is hard for educational and psychological tests to be strictly unidimensional. The tests studied in this paper are from a standardized high-stakes testing program. They feature potential multidimensionality by presenting va...
Article
DETECT is a nonparametric "full" dimensionality assessment procedure that clusters dichotomously scored items into dimensions and provides a DETECT index of magnitude of multidimensionality. Four factors (test length, sample size, item response theory [IRT] model, and DETECT index) were manipulated in a Monte Carlo study of bias, standard error,...
Article
The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive t...
Article
Item response theory (IRT) provides a framework for modeling and analyzing item response data. Assessing IRT model fit to item response data is one of the crucial steps before an IRT model can be applied with confidence to estimate proficiency or ability levels of examinees, to link tests across administrations, and to assess adequate yearly progre...
Article
Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing respo...
Article
Measurement specialists routinely assume examinee responses to test items are independent of one another. However, previous research has shown that many contemporary tests contain item dependencies and not accounting for these dependencies leads to misleading estimates of item, test, and ability parameters. The goals of the study were (a) to review...
Article
Many credentialing agencies today are either administering their examinations by computer or are likely to be doing so in the coming years. Unfortunately, although several promising computer-based test designs are available, little is known about how well they function in examination settings. The goal of this study was to compare fixed-length exam...
Article
Now that many credentialing exams are being routinely administered by computer, new computer-based test designs, along with item response theory models, are being aggressively researched to identify specific designs that can increase the decision consistency and accuracy of pass–fail decisions. The purpose of this study was to investigate the impac...
Book
No topic is more central to innovation and current practice in testing and assessment today than computers and the Internet. This timely publication highlights four main themes that define current issues, technical advances and applications of computer-based testing: Advances in computer-based testing -- new test designs, item selection algorithms,...
Article
How frequently have articles about CRT appeared? What advances in measurement have been influenced by the CRT movement? What has been the lasting contribution of the CRT movement?
Article
Can measurement specialists’ current ideas about content validation be implemented with licensure examinations? Does pressure of litigation facilitate or inhibit conducting validity studies?
Chapter
This entry provides an introduction to criterion-referenced testing and approaches for resolving several technical problems. Several of the important differences between norm-referenced testing and criterion-referenced testing will be introduced first. Next, the challenge of setting performance standards on criterion-referenced tests and several of...
Chapter
This entry provides an introduction to the topic of item response theory. Shortcomings of classical test models are considered first. Second, current item response models for the analysis of dichotomously scored item response data are introduced. Estimation of model parameters, assessment of model fit, and available software, are described next. Fi...
Article
Standardized patient examinations are being used for high-stakes decisions (e.g., graduation, licensure, and certification) with growing frequency. Concurrently, research on methods to determine the passing score for these types of performance-based assessments has increased. A wide variety of approaches have been considered in the past several yea...
Article
Differential weighting of response alternatives and confidence testing have been proposed as ways to assess partial knowledge on multiple-choice tests. 211 students in an educational measurement course took their midterm examination under one of three procedures. Results from those students administered the test under conventional directions provid...
Article
Item response models are finding increasing use in achievement and aptitude test development. Item response theory (IRT) test development involves the selection of test items based on a consideration of their item information functions. But a problem arises because item information functions are determined by their item parameter estimates, which c...
