
Kingston Neal
University of Kansas | KU · Department of Educational Psychology
Ph.D.
Publications: 68 · Reads: 10,844 · Citations: 1,243
Introduction
Kingston Neal currently works at the Department of Educational Psychology and is Director of the Achievement and Assessment Institute at the University of Kansas. Kingston does research in Educational Assessment, Educational Technology and Educational Psychology. Much of his current research focuses on learning maps and their use to support instruction and assessment.
Publications (68)
Adaptive tests are more efficient than fixed‐length tests through the use of item response theory; adaptive tests also present students questions that are tailored to their proficiency level. Although the adaptive algorithm is straightforward, developing a multidimensional computer adaptive test (MCAT) measure is complex. Evidence‐centered design (...
This paper explores the rapidly changing world of higher education and the need for different ways to identify learner outcomes and evaluate student learning. In recent years, higher education has experienced significant demographic shifts in student populations. These shifts were the result of numerous variables including the increasing cost of hi...
A multivariate longitudinal DCM is developed that is the composite of two components, the log-linear cognitive diagnostic model (LCDM) as the measurement model component that evaluates the mastery status of attributes at each measurement occasion, and a generalized multivariate growth curve model that describes the growth of each attribute over tim...
Background
The importance of reading motivation has led to the development of a large number of self‐report reading motivation measures; however, there is still a need for a usable measure of adolescent reading motivation that captures a large number of theoretically and empirically distinct constructs.
Methods
The current paper details the develo...
Previous studies indicated that the assumption of logistic form of parametric item response functions (IRFs) is violated often enough to be worth checking. Using nonparametric item response theory (IRT) estimation methods with the posterior predictive model checking method can obtain significance probabilities of fit statistics in a Bayesian framew...
Under the high‐stakes accountability regime, narrower curricula and/or teaching to narrower tests can restrict the range of skills students acquire. We develop a theory of skill range restriction at the state level. The analysis focuses on math and reading skills in fourth and eighth grade between 2003 and 2009. At both grade levels, the average st...
Most research on multistage testing (MST) uses simulated data. This study adds to the literature by using both operational test data and simulated data to compare two different MST designs with regard to proficiency estimation accuracy and module exposure rates and by investigating whether simulation studies and operational test studies yield simil...
Classroom assessment and large‐scale assessment have, for the most part, existed in mutual isolation. Some experts have felt this is for the best and others have been concerned that the schism limits the potential contribution of both forms of assessment. Margaret Heritage has long been a champion of best practices in classroom assessment. Neal Kin...
Evidence-based approaches to assessment design, development, and administration provide a strong foundation for an assessment’s validity argument but can be time consuming, resource intensive, and complex to implement. This article describes an evidence-based approach used for one assessment that addresses these challenges. Evidence-centered design...
The hierarchical item response theory (H-IRT) model is very flexible and allows a general factor and subfactors within an overall structure of two or more levels. When an H-IRT model with a large number of dimensions is used for an adaptive test, the computational burden associated with interim scoring and selection of subsequent items is heavy. An...
The purpose of the current study was to develop a multivariate longitudinal diagnostic classification model to describe students' academic growth over time.
The continual supply of new items is crucial to maintaining quality for many tests. Automatic item generation (AIG) has the potential to rapidly increase the number of items that are available. However, the efficiency of AIG will be mitigated if the generated items must be submitted to traditional, time-consuming review processes. In two studies, g...
This study explored the impact of homogeneity of answer choices on item difficulty and discrimination. Twenty-two matched pairs of elementary and secondary mathematics items were administered to randomly equivalent samples of students. Each item pair comparison was treated as a separate study with the set of effect sizes analyzed using meta-analysi...
The purpose of this study was to develop a standard-setting method appropriate for use with a diagnostic assessment that produces profiles of student mastery rather than a single raw or scale score value. The condensed mastery profile method draws from established holistic standard-setting methods to use rounds of range finding and pinpointing to s...
Background:
Previous research suggested that folate levels play an important role in the etiology and course of depression. However, the literature has been inconsistent with regard to differences in folate level between individuals with and without depression. The present meta-analysis synthesized the results of previous studies to examine whethe...
Although there is widespread agreement among both special education experts and general classroom teachers that students with significant cognitive disabilities should participate in inclusive classrooms, most teachers report that they do not know how to do this effectively. One of the challenges teachers face is figuring out how to focus on grade-l...
Despite much theoretical support, meta-analysis of the efficacy of formative assessment does not provide empirical evidence commensurate with expectations. This theoretical study suggests that teachers need a better organizing structure to allow a formative assessment process to live up to its promise. We propose that the use of learning map syste...
The Dynamic Learning Maps™ Alternate Assessment is based on a different set of guiding principles than other assessments. In this article we describe its characteristics and look at the history of alternate assessment and the problems in implementing useful assessment programs for students with significant cognitive disabilities.
Recent studies have raised concerns about the vagueness of alternate assessment eligibility guidelines, specifically, that students with mild disabilities (SWMD) have been inappropriately assigned to alternate assessment-alternate achievement standards (AA-AAS). In this study, special education teachers (N = 317) were surveyed about SWMD in vignett...
We interviewed special educators (a) whose students with disabilities (SWDs) were proficient on the 2008 general education assessment but were assigned to the 2009 alternate assessment based on modified achievement standards (AA-MAS), and (b) whose students with mild disabilities took the 2008 alternate assessment based on alternate achievement sta...
Purpose
Against a backdrop of high-stakes assessment policies in the USA, this paper explores the challenges, promises and the “state of the art” with regard to designing standardized achievement tests and educational assessment systems that are instructionally useful. Authors deliberate on the consequences of using inappropriately designed tests...
Background
Despite polling data that suggests that teachers are well respected by the general public, criticism of teacher preparation by various organizations and interest groups is common, often highlighting the perceived need for increasing their rigor and performance. A number of studies and reports have critiqued teacher preparation, and high-...
Scores on state standards-based assessments are readily available and may be an appropriate alternative to traditional placement tests for assigning or accepting students into particular courses. Many community colleges do not require test scores for admissions purposes but do require some kind of placement scores for first-year English and math co...
Promoting self‐determination has been suggested as a means for students with disabilities to access the general curriculum. We surveyed 407 elementary educators to examine a) the effects of classroom setting and teaching self‐regulation strategies on the perceived importance and frequency of teaching self‐determination; and b) the severity level of...
Research today demands the application of sophisticated and powerful research tools. Fulfilling this need, this two-volume text provides the tool box to deliver the valid and generalizable answers to today's complex research questions. The Oxford Handbook of Quantitative Methods in Psychology aims to be a source for learning and reviewing current b...
Coefficient alpha (α) has been described as a lower bound for test reliability. However, previous research indicates that when certain assumptions are violated, α can either overestimate or underestimate reliability. Raykov (1997a) has shown how structural equation modeling (SEM) can be used to estimate reliability. This study has introduced method...
Research suggests that self-determination skills are positively correlated with factors that have been shown to improve academic achievement, but the direct relationship among self-determination, self-concept, and academic achievement is not fully understood. This study offers an empirical explanation of how self-determination and self-concept affe...
The purpose of this case study was to determine teachers' rationales for assigning students with mild disabilities to alternate assessment based on alternate achievement standards (AA-AAS). In interviews, special educators stated that their primary considerations in making the assignments were low academic performance, student use of extended stand...
This study examined the validity of test accommodation in third–eighth graders using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item difficu...
The discrete-option multiple-choice (DOMC) item type was developed to curtail cheating and reduce the impact of testwiseness, but to date there has been only one published study of its statistical characteristics, and that was based on a relatively small sample. This study was implemented to investigate the psychometric properties of the DOMC item...
The authors surveyed 233 elementary special educators in 23 states to determine (a) how the teaching of self-regulation strategies and classroom setting affected their perceptions of the importance of teaching self-determination, (b) the frequency with which they did so, and (c) the barriers to promoting self-determination. Results indicated that t...
An effect size of about .70 (or .40–.70) is often claimed for the efficacy of formative assessment, but is not supported by the existing research base. More than 300 studies that appeared to address the efficacy of formative assessment in grades K‐12 were reviewed. Many of the studies had severely flawed research designs yielding uninterpretable re...
This study examined the learner characteristics and performance scores of students in the 2009 alternate assessment-modified achievement standard for one Midwestern state. Comparing performance differences by disability category for each content area from the students' 2008 test type assignments and performance scores facilitated examining the appr...
This article represents one outcome from the Invitational Research Symposium on Technology-Enabled and Universally Designed Assessments, which examined technology-enabled assessments (TEA) and universal design (UD) as they relate to students with disabilities (SWD). It was developed to stimulate research into TEAs designed to better understand the...
The No Child Left Behind Act (2001) and the Individuals with Disabilities Education Improvement Act (2004) emphasize accountability to improve student academic achievement. Promoting self-determination has been proposed as a means to achieving this outcome. Elementary teachers in 30 states were surveyed to measure (a) their perceived importance of...
Saleebey, Dennis; Saving; Savoring; School Psychology; Self-Compassion; Self-Determination; Self-Efficacy; Self-Esteem; Self-Monitoring; Self-Regulation; Self-Report Inventory; Seligman, Martin; Serotonin; Smiles; Snyder, C. R.; Social Cognitive Theory; Social Skills; Social Support; Social Work; Solution-Focused Brief Therapy; Spiritual Well-Being; Spirituality; Sport Psychology; Stanton, Ann...
There have been many studies of the comparability of computer-administered and paper-administered tests. Not surprisingly (given the variety of measurement and statistical sampling issues that can affect any one study) the results of such studies have not always been consistent. Moreover, the quality of computer-based test administration systems ha...
This chapter focuses on the most important factor contributing to the value of a testing program—that is, validity. Unfortunately, it is relatively easy to develop sophisticated models to help reduce the error of estimation by a few percent, so the vast majority of recent psychometric research has focused on more accurately modeling and assessing t...
One of the major assumptions of item response theory (IRT) models is that performance on a set of items is unidimensional, that is, the probability of successful performance by examinees on a set of items can be modeled by a mathematical model that has only one ability parameter. In practice, this strong assumption is likely to be violated. An impor...
This study examined the factor structure of the Graduate Record Examinations (GRE) General Test to appraise the extent to which an analytical factor could be identified that was distinguishable from verbal and quantitative factors. Full-information factor analysis was employed in several groups of undergraduate majors on items from one edition of t...
Confirmatory multidimensional item response theory (CMIRT) was used to assess the structure of the Graduate Record Examination General Test, about which much information about factorial structure exists, using a sample of 1,001 psychology majors taking the test in 1984 or 1985. Results supported previous findings that, for this population, there ex...
A study was conducted to investigate the feasibility of using IRT equating for the GRE Subject Test in Mathematics. Two forms of the test were equated using the three-parameter logistic (3PL) model, and the results were compared to the results of the Tucker equating procedure currently used operationally, as well as to equipercentile equating. In a...
When the three-parameter logistic model and item response theory are used to analyze Graduate Management Admission Test (GMAT) data, there are problems with the assumption of unidimensionality. Linear factor analytic models, exploratory factor analysis programs, and the comparison of item parameter estimates for heterogeneous and homogeneous subset...
The original purpose of this study was to address the test-disclosure-related need to introduce more Graduate Record Examinations (GRE) General Test editions each year than formerly, in a...
Psychometric issues confronted when implementing a system of item response theory (IRT) tools for test development at the Educational Testing Service (ETS) are discussed. These issues include selecting and assessing the appropriateness of IRT models, choosing methods of IRT scaling for item pools, considering test scoring strategies, and applying I...
A necessary prerequisite to the operational use of item response theory (IRT) in any testing program is the investigation of the feasibility of such an approach. This report presents the results of such research for the Graduate Management Admission Test (GMAT). Despite the fact that GMAT data appear to violate a basic assumption of the three-param...
The use of item-ability regressions (the comparison of the regression of the observed proportion of people answering an item correctly on estimated θ with the estimated item response function) to investigate the psychometric properties of particular item types in a given population was explored using data from four administrations of 10 item types...
The incremental validity of the analytical measure of the revised Graduate Record Examination (GRE) General Test, for predicting first-year graduate grade-point average (GPA), was assessed using data submitted to the GRE Validity Study Service between March 1983 and November 1984. All selected students had data for the three General Test measures (...
The stability of scores over time of the GRE® General Test verbal, quantitative, and analytical measures was studied using data from the self-selected group of repeaters. Overall, in these analyses the new-format GRE verbal measure demonstrated the greatest stability over time and the new-format analytical measure the least.
A context effect occurs when examinees' item responding behavior is affected by the location of an item within a test. Recent advances in testing practice, most notably adaptive testing and certain innovative equating schemes, require items to be more invariant across intended usages than earlier methods. In this paper, location effects are identi...
The feasibility of using item response theory (IRT) as a psychometric model for the Graduate Record Examination (GRE) Aptitude Test was addressed by assessing the reasonableness of the assumptions of item response theory for GRE item types and examinee populations. Items from four forms and four administrations of the GRE Aptitude Test were calibra...
The research described in this paper deals solely with the effect of the position of an item within a test on examinee's responding behavior at the item level. For simplicity's sake, this effect will be referred to as practice effect when the result is improved examinee performance and as fatigue effect when the result is poorer examinee p...
Typescript; issued also on microfilm. Thesis (Ph. D.)--Columbia University. Bibliography: leaves 45-50.